The excitement is real. The quest to run LLMs on a single computer nudged OpenAI's Andrej Karpathy, known for his contributions to the field of deep learning, to embark on a weekend passion project: a simplified Llama 2 implementation in pure C – aka Baby Llama, or llama2.c.
“Yay, Llama 2 can now load and inference the Meta released models! 🙂,” shared an excited Karpathy on X (formerly Twitter), reporting that the smallest Meta-released model, the 7B, ran at ~3 tokens/second on 96 OMP threads on a cloud Linux box. “Still just CPU, fp32, expecting ~300 tokens/second tomorrow 🙂,” he added.