feat: Add mmlu-redux and its Spanish translation as generative task definitions #2705

luiscosio · 2025-02-16T20:37:42Z

This PR adds generative task definitions for two MMLU-Redux datasets:

Original MMLU-Redux-2.0 (English) from edinburgh-dawg/mmlu-redux-2.0
Spanish translation of MMLU-Redux-2.0

The task definitions follow the same structure and evaluation metrics as existing MMLU tasks, using exact_match for scoring with weight_by_size enabled. Both datasets are organized into 4 main groups:

STEM
Other
Social Sciences
Humanities

Each group maintains consistent evaluation metrics and aggregation methods across both language versions.
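To illustrate what size-weighted aggregation of exact_match means for a group, here is a minimal sketch. It is not the harness's implementation: the `SubtaskResult` type, the `exact_match` and `aggregate` helpers, and the subtask names and numbers are all hypothetical, and the whitespace-stripping comparison is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SubtaskResult:
    """Hypothetical container for one subtask's score (not a harness type)."""
    name: str
    exact_match: float  # fraction of correct answers, in [0, 1]
    size: int           # number of evaluated documents


def exact_match(prediction: str, reference: str) -> float:
    # Simplified assumption: compare after stripping surrounding whitespace.
    return float(prediction.strip() == reference.strip())


def aggregate(results: Sequence[SubtaskResult], weight_by_size: bool = True) -> float:
    """Combine subtask scores into one group score."""
    if not results:
        raise ValueError("no subtask results to aggregate")
    if weight_by_size:
        # Each subtask contributes in proportion to its document count,
        # so the group score equals accuracy over all documents pooled.
        total = sum(r.size for r in results)
        return sum(r.exact_match * r.size for r in results) / total
    # Otherwise every subtask counts equally, regardless of size.
    return sum(r.exact_match for r in results) / len(results)


# Illustrative numbers only, not results from the PR.
group = [
    SubtaskResult("subtask_a", 0.80, 100),
    SubtaskResult("subtask_b", 0.50, 300),
]
weighted = aggregate(group, weight_by_size=True)     # (80 + 150) / 400
unweighted = aggregate(group, weight_by_size=False)  # (0.80 + 0.50) / 2
```

With weighting enabled, the larger subtask dominates the group score, which keeps the English and Spanish group numbers comparable as long as both versions have the same subtask sizes.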

Changes include:

Added task definitions for generative-format evaluation
Kept the group structure consistent between the English and Spanish versions
Kept weight_by_size set to true for all metrics
Added a version 3 metadata tag for compatibility

This enhancement allows for direct comparison of model performance between English and Spanish versions of MMLU-Redux in a generative setting.

baberabb · 2025-02-19T21:53:37Z

Hi! Thanks for the PR. Just some minor issues:

The test is failing because it can't find one of the subtask configs on the HF Hub (probably a typo).
Could you add the README from template/new_yaml_task, and also add an entry in lm_eval/tasks/README.md?
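One way to track down a misspelled subtask config is to compare the names used in the task YAMLs against the names the dataset actually exposes. The sketch below is an assumption about how such a check could look, not part of the PR: `find_missing_configs` and `hub_config_names` are hypothetical helpers, and the config names in the usage lines are made up. The Hub lookup relies on `get_dataset_config_names` from the `datasets` library and needs network access.

```python
import difflib
from typing import Dict, List, Sequence


def find_missing_configs(
    requested: Sequence[str], available: Sequence[str]
) -> Dict[str, List[str]]:
    """Map each requested config that is not available to close-match suggestions."""
    available_set = set(available)
    report: Dict[str, List[str]] = {}
    for name in requested:
        if name not in available_set:
            # Suggest up to three similarly spelled names to surface typos.
            report[name] = difflib.get_close_matches(name, available, n=3, cutoff=0.6)
    return report


def hub_config_names(dataset_path: str) -> List[str]:
    """Fetch config names from the Hugging Face Hub (requires `datasets` and network)."""
    from datasets import get_dataset_config_names  # imported lazily on purpose

    return get_dataset_config_names(dataset_path)


# Offline usage with made-up names; "astronmy" stands in for a typo.
report = find_missing_configs(
    requested=["virology", "astronmy"],
    available=["virology", "astronomy", "law"],
)
```

To check against the real dataset, pass `hub_config_names("edinburgh-dawg/mmlu-redux-2.0")` as `available` and the config names from the task YAMLs as `requested`.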

StellaAthena · 2025-03-20T16:37:26Z

Also, can you add results showing that this runs and reproduces the results from their paper?

jgcb00 · 2025-06-02T15:42:37Z

Hi @luiscosio, could you fix it? It would greatly ease adoption of this benchmark over the standard MMLU, and I’m keen to make it the new standard for our models.

luiscosio · 2025-06-04T18:18:41Z

@jgcb00 I will fix it this week.

CLAassistant · 2025-07-02T17:52:43Z

All committers have signed the CLA.

CT-6282 · 2025-07-02T18:24:04Z

Hi @jgcb00, I fixed the errors in the tasks, and the tests pass locally. I also added the READMEs.

luiscosio added 3 commits February 15, 2025 20:55

Added benchmark

e654c74

Added more testing

5a3dabf

Added task definition for mmlu_redux and mmlu_redux_spanish

fb0f575

luiscosio requested review from baberabb and lintangsutawika as code owners February 16, 2025 20:37

luiscosio mentioned this pull request Feb 16, 2025

Added MMLU-Redux to EleutherAI's LM-Evaluation-Harness aryopg/mmlu-redux#4

Open

CT-6282 added 3 commits June 25, 2025 14:52

Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs

7c8e654

Add remaining MMLU Redux YAMLs and updated tasks README

d16a9a8

Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs

aa1f202

CT-6282 requested a review from StellaAthena as a code owner July 2, 2025 17:52

CT-6282 added 3 commits July 2, 2025 11:59

Add MMLU Redux changes from pr-2705

e4c9f22

Merge branch 'pr-2705' into mmlu-redux-2.0-spanish

83dda0e

Merge branch 'main' into mmlu-redux-2.0-spanish

66cad23

CT-6282 added 4 commits July 20, 2025 14:03

Resolve pre-commit hook and pytest overlapping group issues by adding mmlu_redux_spanish task entries and unique subgroup names

37a6237

Merge branch 'main' into mmlu-redux-2.0-spanish

00d871e

Enhance retry logic to prevent 429 error when using Hugging Face API for tests, apply pre-commit fixes

95f2c8f

Merge branch 'main' into mmlu-redux-2.0-spanish

d3b264a
