How Taalas "prints" LLM onto a chip?
created: Feb. 21, 2026, 7:07 p.m. | updated: Feb. 22, 2026, 7:23 a.m.
A startup called Taalas recently released an ASIC that runs Llama 3.1 8B (3/6-bit quantization) at an inference rate of 17,000 tokens per second.
From what I could gather from their blog, they have literally "hardwired" the model's weights onto the chip.
They just engraved the 32 layers of Llama 3.1 sequentially on a chip.
It took them two months to develop the chip for Llama 3.1 8B.
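To make the "hardwired" idea a bit more concrete, here is a toy Python sketch of how I picture it. It is not Taalas's design: the tiny hidden size `WIDTH`, the random weights, the matrix-multiply-plus-ReLU "layer", and the names `fabricate_chip` and `run_token` are all assumptions for illustration. Only the 32 layers and the 17,000 tokens per second come from the post.

```python
import random

NUM_LAYERS = 32             # from the post: 32 layers of Llama 3.1
TOKENS_PER_SECOND = 17_000  # from the post: reported inference rate
WIDTH = 4                   # assumption: toy hidden size, far smaller than the real model


def fabricate_chip(seed=0):
    """Fix every layer's weights once, as nested tuples that cannot be modified."""
    rng = random.Random(seed)
    return tuple(
        tuple(
            tuple(rng.uniform(-0.5, 0.5) for _ in range(WIDTH))
            for _ in range(WIDTH)
        )
        for _ in range(NUM_LAYERS)
    )


# Immutable tuples stand in for weights etched into silicon:
# there is no "load a different model" step after this line.
CHIP = fabricate_chip()


def run_token(activations):
    """Push one activation vector through all layers in their fixed order."""
    for layer in CHIP:
        # Assumption: a toy layer (matrix-vector product, then ReLU),
        # standing in for a real transformer layer.
        activations = [
            max(0.0, sum(w * x for w, x in zip(row, activations)))
            for row in layer
        ]
    return activations


# Average time budget per generated token at the reported rate.
per_token_us = 1e6 / TOKENS_PER_SECOND

if __name__ == "__main__":
    print(len(CHIP), "layers, weights fixed at 'fabrication'")
    print(f"{per_token_us:.1f} microseconds per token on average")
```

At 17,000 tokens per second, the chip emits a token roughly every 59 microseconds on average. The sketch also shows the trade-off: because the weights are part of the "chip" itself, running a different model would mean fabricating a new one.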
13 hours, 24 minutes ago: Hacker News