LLMs Getting Cheaper 🤗 🔥
In a recent post, renowned computer scientist Andrej Karpathy demonstrated how sharply the cost of training large language models (LLMs) has fallen over the past five years: a GPT-2-class model can now be trained for approximately $672 on "one 8XH100 GPU node in 24 hours".
“Incredibly, the costs have come down dramatically over the past five years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g., the FineWeb-Edu dataset),” said Karpathy.
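The figures quoted above imply an hourly rate for the hardware. A quick back-of-the-envelope sketch (using only the $672 total and 24-hour runtime from the post; the per-node and per-GPU rates are derived, not stated):

```python
# Derive the implied rental rates from Karpathy's figures:
# ~$672 total for one 8xH100 node running 24 hours.
gpus_per_node = 8
hours = 24
total_cost_usd = 672  # from the post

node_rate = total_cost_usd / hours        # implied $/node-hour
gpu_rate = node_rate / gpus_per_node      # implied $/GPU-hour

print(f"~${node_rate:.2f}/node-hour, ~${gpu_rate:.2f}/GPU-hour")
```

At roughly $28 per node-hour (about $3.50 per H100-hour), the quoted total is consistent with typical cloud rental pricing for H100 nodes at the time.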