The excitement is real. The quest to run LLMs on a single computer nudged OpenAI's Andrej Karpathy, known for his contributions to the field of deep learning, to embark on a weekend passion project: a simplified Llama 2 implementation in pure C – aka Baby Llama, or llama2.c.
“Yay, Llama 2 can now load and inference the Meta released models! 🙂,” shared an excited Karpathy on X (formerly Twitter), reporting that the smallest Meta-released model, the 7B, ran at ~3 tokens/second on 96 OMP threads on a cloud Linux box. “Still just CPU, fp32, expecting ~300 tokens/second tomorrow 🙂,” he added.