Actions: EleutherAI/lm-evaluation-harness

Showing runs from all workflows
3,645 workflow runs

feat: Add mmlu-redux and it's spanish transaltion as generative task definitions
Tasks Modified #5348: Pull request #2705 synchronize by CT-6282
September 4, 2025 03:23 Action required amias-mx:mmlu-redux-2.0-spanish
feat: Add mmlu-redux and it's spanish transaltion as generative task definitions
Unit Tests #5320: Pull request #2705 synchronize by CT-6282
September 4, 2025 03:23 Action required amias-mx:mmlu-redux-2.0-spanish
Add support for steering individual attention heads
Unit Tests #5319: Pull request #3279 synchronize by luciaquirke
September 3, 2025 07:02 5m 25s steer-attn
Add support for steering individual attention heads
Tasks Modified #5347: Pull request #3279 synchronize by luciaquirke
September 3, 2025 07:02 9s steer-attn
Add support for steering individual attention heads
Tasks Modified #5346: Pull request #3279 opened by luciaquirke
September 3, 2025 06:48 11s steer-attn
Add support for steering individual attention heads
Unit Tests #5318: Pull request #3279 opened by luciaquirke
September 3, 2025 06:48 4m 56s steer-attn
Add the Icelandic WinoGrande benchmark
Unit Tests #5317: Pull request #3277 synchronize by jmichaelov
September 2, 2025 13:24 5m 16s jmichaelov:icelandic_winogrande
Add the Icelandic WinoGrande benchmark
Tasks Modified #5345: Pull request #3277 synchronize by jmichaelov
September 2, 2025 13:24 2m 2s jmichaelov:icelandic_winogrande
Add the Icelandic WinoGrande benchmark
Unit Tests #5316: Pull request #3277 opened by jmichaelov
September 2, 2025 13:13 5m 19s jmichaelov:icelandic_winogrande
Add the Icelandic WinoGrande benchmark
Tasks Modified #5344: Pull request #3277 opened by jmichaelov
September 2, 2025 13:13 1m 35s jmichaelov:icelandic_winogrande
Add EsBBQ and CaBBQ tasks (#3167)
Unit Tests #5315: Commit 2d7cb5c pushed by baberabb
September 2, 2025 12:11 5m 26s main
Add EsBBQ and CaBBQ tasks (#3167)
Tasks Modified #5343: Commit 2d7cb5c pushed by baberabb
September 2, 2025 12:11 2m 3s main
Add acc_norm metric to ZhoBLiMP (#3271)
Unit Tests #5314: Commit ecebf1b pushed by baberabb
September 2, 2025 12:05 5m 29s main
Add acc_norm metric to ZhoBLiMP (#3271)
Tasks Modified #5342: Commit ecebf1b pushed by baberabb
September 2, 2025 12:05 1m 37s main
Add acc_norm to BLiMP-NL (#3272)
Unit Tests #5313: Commit aff14e5 pushed by baberabb
September 2, 2025 12:05 5m 38s main
Add acc_norm to BLiMP-NL (#3272)
Tasks Modified #5341: Commit aff14e5 pushed by baberabb
September 2, 2025 12:05 1m 51s main
Add BHS benchmark (#3265)
Tasks Modified #5340: Commit 331288b pushed by baberabb
September 2, 2025 12:04 3m 40s main
Add BHS benchmark (#3265)
Unit Tests #5312: Commit 331288b pushed by baberabb
September 2, 2025 12:04 5m 34s main
Fix LongBench Evaluation
Unit Tests #5311: Pull request #3273 opened by TimurAysin
August 31, 2025 18:10 Action required TimurAysin:longbench_fix
Fix LongBench Evaluation
Tasks Modified #5339: Pull request #3273 opened by TimurAysin
August 31, 2025 18:10 Action required TimurAysin:longbench_fix
Add acc_norm metric to BLiMP-NL
Tasks Modified #5338: Pull request #3272 opened by jmichaelov
August 31, 2025 17:05 2m 2s jmichaelov:patch-2
Add acc_norm metric to BLiMP-NL
Unit Tests #5310: Pull request #3272 opened by jmichaelov
August 31, 2025 17:05 5m 17s jmichaelov:patch-2
Add acc_norm metric to ZhoBLiMP
Tasks Modified #5337: Pull request #3271 opened by jmichaelov
August 31, 2025 17:05 1m 31s jmichaelov:patch-3
Add acc_norm metric to ZhoBLiMP
Unit Tests #5309: Pull request #3271 opened by jmichaelov
August 31, 2025 17:05 5m 16s jmichaelov:patch-3
Add BHS benchmark
Tasks Modified #5336: Pull request #3265 synchronize by jmichaelov
August 31, 2025 16:57 1m 49s jmichaelov:bhs