However Briefly

created: Feb. 21, 2026, 7:07 p.m. | updated: Feb. 22, 2026, 7:23 a.m.

How does Taalas "print" an LLM onto a chip? A startup called Taalas recently released an ASIC chip that runs Llama 3.1 8B (3/6-bit quantization) at an inference rate of 17,000 tokens per second. I read through their blog, and they have literally "hardwired" the model's weights onto the chip: the 32 layers of Llama 3.1 are engraved sequentially on a single chip. It took them two months to develop the chip for Llama 3.1 8B.
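
To put the quoted figures in perspective, here is a rough back-of-envelope sketch in Python. It only does arithmetic on the numbers in the summary. Two things are assumptions, not statements from Taalas: the parameter count is taken as a nominal 8 billion, and the "3/6 bit" quantization is treated as two bounds (all weights at 3 bits, or all at 6 bits) because the actual mix is not given. The function names are made up for this illustration.

```python
# Back-of-envelope arithmetic on the figures quoted in the summary.
TOKENS_PER_SECOND = 17_000       # throughput quoted in the summary
NOMINAL_PARAMS = 8_000_000_000   # assumption: "8B" taken as exactly 8e9 weights


def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1_000_000 / tokens_per_second


def weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Storage needed for the weights at a uniform bit width, in bytes."""
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    latency = per_token_latency_us(TOKENS_PER_SECOND)
    print(f"average time per token: {latency:.1f} microseconds")

    # Assumed bounds: every weight at 3 bits vs. every weight at 6 bits.
    for bits in (3, 6):
        gigabytes = weight_bytes(NOMINAL_PARAMS, bits) / 1e9
        print(f"{bits}-bit weights: about {gigabytes:.0f} GB")
```

Under these assumptions, the chip spends well under a tenth of a millisecond per token on average, and the hardwired weights amount to somewhere between roughly 3 GB and 6 GB of data, depending on how the 3-bit and 6-bit quantization is actually mixed.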

13 hours, 24 minutes ago: Hacker News