Last week, the buzz was around large language models. A day after DeepMind released Gopher, a 280-billion-parameter transformer language model, Google introduced the Generalist Language Model (GLaM), a trillion-weight model that uses sparsity.
The full version of GLaM has 1.2T total parameters across 64 experts per mixture of experts (MoE) layer wi…
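To illustrate the sparsity idea behind an MoE layer, here is a minimal sketch of top-k expert routing: a gating network scores the experts for each token, and only the few highest-scoring experts are actually run. All names, shapes, and the plain-NumPy gating below are illustrative assumptions for exposition, not GLaM's actual implementation.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Sparse mixture-of-experts layer: each token activates only top_k experts.

    x:              (tokens, d) input activations
    expert_weights: (num_experts, d, d) one weight matrix per expert (toy experts)
    gate_weights:   (d, num_experts) gating/router projection
    """
    logits = x @ gate_weights                         # (tokens, num_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]      # indices of chosen experts
    chosen = np.take_along_axis(logits, top, axis=1)  # their gate logits
    # Softmax only over the chosen experts' logits
    probs = np.exp(chosen - chosen.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):           # only top_k experts run per token,
        for k in range(top_k):            # so compute scales with top_k,
            e = top[t, k]                 # not with the total expert count
            out[t] += probs[t, k] * (x[t] @ expert_weights[e])
    return out

rng = np.random.default_rng(0)
d, tokens, num_experts = 8, 4, 64
x = rng.standard_normal((tokens, d))
experts = rng.standard_normal((num_experts, d, d))
gate = rng.standard_normal((d, num_experts))
y = moe_layer(x, experts, gate)
print(y.shape)
```

The key point is in the loop: although 64 experts exist, each token pays the compute cost of only `top_k` of them, which is how a model can hold over a trillion parameters while activating a small fraction per token.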