feat: Add mmlu-redux and its Spanish translation as generative task definitions #2705
base: main
Hi! Thanks for the PR. Just some minor issues:
Also, can you add results showing that this runs and reproduces the results from their paper?
Hi @luiscosio, could you fix it? It would greatly ease adoption of this benchmark over the standard MMLU, and I’m keen to make it the new standard for our models.
@jgcb00 I will fix it this week.
Hi @jgcb00, I addressed the errors in the tasks, and the tests pass locally. I also added READMEs.
… mmlu_redux_spanish task entries and unique subgroup names
…for tests, apply pre-commit fixes
This PR adds generative task definitions for two MMLU-Redux datasets:
The task definitions follow the same structure and evaluation metrics as the existing MMLU tasks, using exact_match for scoring with weight_by_size enabled. Both datasets are organized into four main groups:
Each group maintains consistent evaluation metrics and aggregation methods across both language versions.
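To illustrate what size-weighted aggregation of exact_match scores means in practice, here is a minimal Python sketch. It is not the harness's own implementation: the function names `exact_match` and `aggregate`, the whitespace-stripping comparison, and the example scores and subtask sizes are all assumptions made for illustration.

```python
from typing import Sequence


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0.

    Assumption: only surrounding whitespace is ignored; the real task
    configs may apply additional filters or normalization.
    """
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def aggregate(
    scores: Sequence[float],
    sizes: Sequence[int],
    weight_by_size: bool = True,
) -> float:
    """Combine per-subtask scores into a single group score.

    With weight_by_size=True, each subtask counts in proportion to its
    number of examples (a micro-average). Otherwise every subtask counts
    equally (a macro-average).
    """
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    if not scores:
        raise ValueError("at least one subtask is required")
    if weight_by_size:
        total = sum(sizes)
        if total == 0:
            raise ValueError("total size must be positive")
        return sum(s * n for s, n in zip(scores, sizes)) / total
    return sum(scores) / len(scores)


# Hypothetical subtask results: accuracy and number of examples.
subtask_scores = [0.80, 0.50]
subtask_sizes = [100, 300]

weighted = aggregate(subtask_scores, subtask_sizes, weight_by_size=True)
unweighted = aggregate(subtask_scores, subtask_sizes, weight_by_size=False)
print(f"weighted={weighted:.3f} unweighted={unweighted:.3f}")
```

With size weighting, the larger subtask dominates the group score, so groups with uneven subject sizes are scored as if all their questions were pooled together.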
Changes include:
This enhancement allows for direct comparison of model performance between English and Spanish versions of MMLU-Redux in a generative setting.