Sector 6 | The Newsletter of AIM
Benchmarking, the Indian Way
The Belamy

Analytics India Magazine
Sep 29, 2023

How do you test the intelligence of an LLM? The answer lies in benchmarks such as MMLU, HumanEval, AGIEval and the like.

Whether it's GPT-4 or Llama 2, creators typically begin by highlighting their LLMs' benchmark scores in their research papers. 

But how do you set up a benchmark? Typically, by having the LLM attempt various human-level examinations.

A majority of these benchmarks, primarily originating from the US, are built around existing examinations. For instance, MMLU assesses 57 tasks, encompassing subjects such as elementary mathematics, US history, computer science, and law. Similarly, AGIEval draws on assessments like the SAT, LSAT, and other examinations, including the Chinese College Entrance Exam (Gaokao), law school admission tests, math competitions, and national civil service assessments.
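To make the setup concrete, here is a minimal sketch of how an exam-style benchmark like MMLU is typically scored: multiple-choice questions are posed to the model, and accuracy is the fraction answered correctly. The `ask_model` function below is a hypothetical placeholder, not a real LLM API; a real harness would query the model there.

```python
# Minimal sketch of MMLU-style multiple-choice scoring (illustrative only).

QUESTIONS = [
    {
        "subject": "elementary_mathematics",
        "question": "What is 7 * 8?",
        "choices": ["54", "56", "58", "64"],
        "answer": "B",  # letter of the correct choice
    },
    {
        "subject": "us_history",
        "question": "In which year was the US Declaration of Independence signed?",
        "choices": ["1774", "1775", "1776", "1781"],
        "answer": "C",
    },
]

def ask_model(question: str, choices: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: returns the letter it picks.

    A real benchmark would send the question and choices to the model
    and parse its reply; here we return a fixed guess so the sketch runs.
    """
    return "C"

def score(questions: list[dict]) -> float:
    """Fraction of questions the model answers correctly (the benchmark score)."""
    correct = sum(
        ask_model(q["question"], q["choices"]) == q["answer"]
        for q in questions
    )
    return correct / len(questions)

print(f"accuracy: {score(QUESTIONS):.2f}")  # the fixed guess gets 1 of 2 right
```

Published benchmark numbers follow the same shape, just with thousands of questions per subject and careful prompt formatting.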
