EyeBench V1.0 Benchmark Results
Tables are generated from the latest formatted benchmark exports. Per-task values are reported as mean ± standard deviation across folds.
Overall Leaderboard (Test)
Macro-level comparison across every benchmark task on the held-out test folds. A higher Avg Normalized Score and a lower Mean Rank indicate better overall performance.
| Model | Layout | Saccade/Fixation | Word-Level | Trial-Level | Linguistic | Embeddings | Avg Normalized Score | Mean Rank |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | - | - | - | - | - | - | 0.367 | 10.1 |
| Reading Speed | - | - | - | ✓ | - | - | 0.421 | 10.29 |
| Text-Only RoBERTa | - | - | - | - | - | ✓ | 0.672 | 6.1 |
| Logistic Regression [meziere2023using] | - | - | - | ✓ | - | - | 0.571 | 7.67 |
| SVM [hollenstein2023zuco] | - | - | - | ✓ | - | - | 0.521 | 7.38 |
| Random Forest [makowski2024detection] | - | ✓ | - | ✓ | ✓ | - | 0.788 | 4.48 |
| AhnRNN [ahn2020towards] | ✓ | ✓ | - | - | - | - | 0.36 | 9.48 |
| AhnCNN [ahn2020towards] | ✓ | ✓ | - | - | - | - | 0.531 | 6.95 |
| BEyeLSTM [reich_inferring_2022] | ✓ | ✓ | ✓ | ✓ | ✓ | - | 0.414 | 9.1 |
| PLM-AS [Yang2023PLMASPL] | - | ✓ | - | - | - | ✓ | 0.494 | 8.0 |
| PLM-AS-RM [haller2022eye] | - | ✓ | ✓ | - | - | ✓ | 0.475 | 8.95 |
| RoBERTEye-W [Shubi2024Finegrained] | ✓ | - | ✓ | ✓ | ✓ | ✓ | 0.757 | 4.43 |
| RoBERTEye-F [Shubi2024Finegrained] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.654 | 6.81 |
| MAG-Eye [Shubi2024Finegrained] | ✓ | - | ✓ | ✓ | ✓ | ✓ | 0.686 | 5.62 |
| PostFusion-Eye [Shubi2024Finegrained] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.546 | 9.81 |
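
The two summary columns aggregate per-task results into a single number per model. The sketch below shows one plausible way to do this and is illustrative only. The source does not define the normalization or the tie-handling rule, so per-task min-max normalization, average ranks for ties, the `higher_is_better` flags, the function names, and the toy scores are all assumptions rather than the benchmark's actual procedure.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best); tied models share the average rank."""
    order = sorted(scores, key=lambda m: scores[m], reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of models tied with the model at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def normalize_scores(scores, higher_is_better=True):
    """Min-max normalize so the best model maps to 1.0 and the worst to 0.0 (assumed scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {m: 0.0 for m in scores}
    if higher_is_better:
        return {m: (s - lo) / (hi - lo) for m, s in scores.items()}
    return {m: (hi - s) / (hi - lo) for m, s in scores.items()}


def summarize(tasks):
    """tasks maps task name -> (scores_by_model, higher_is_better).

    Returns model -> (average normalized score, mean rank).
    """
    per_rank = [rank_scores(s, hib) for s, hib in tasks.values()]
    per_norm = [normalize_scores(s, hib) for s, hib in tasks.values()]
    models = list(per_rank[0])
    return {
        m: (mean(n[m] for n in per_norm), mean(r[m] for r in per_rank))
        for m in models
    }


# Hypothetical toy scores, not taken from the benchmark.
toy_tasks = {
    "classification_task": ({"a": 0.9, "b": 0.7, "c": 0.7}, True),   # higher is better
    "regression_task": ({"a": 10.0, "b": 12.0, "c": 11.0}, False),   # lower is better
}
summary = summarize(toy_tasks)
```

Ranking per task before averaging keeps tasks with different metric scales comparable, which is why a rank-based summary is often reported alongside a normalized score.
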
Combined Results Across Tasks
Side-by-side view of the primary metric for each EyeBench task, averaged over folds.
| Model | OneStop RC | SBSAT RC | PoTeC RC | PoTeC DE | IITBHGC CV | CopCo TYP | SBSAT STD | CopCo RCS | MECOL2 LEX |
|---|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 51.4 ± 1.3 | 50.4 ± 0.3 | 49.9 ± 0.0 | 0.73 ± 0.0 | 2.66 ± 0.1 | 12.45 ± 0.1 |
| Reading Speed | 49.6 ± 0.8 | 50.8 ± 1.6 | 51.9 ± 2.2 | 60.4 ± 1.7 | 57.3 ± 0.7 | 56.6 ± 2.1 | 0.77 ± 0.0 | 2.68 ± 0.1 | 28.95 ± 14.3 |
| Text-Only RoBERTa | 61.1 ± 1.0 | 55.9 ± 2.3 | 56.3 ± 1.6 | 62.0 ± 4.0 | 58.8 ± 1.5 | 50.1 ± 0.4 | 0.72 ± 0.0 | 2.64 ± 0.0 | 12.45 ± 0.1 |
| Logistic Regression [meziere2023using] | 53.0 ± 0.8 | 52.3 ± 1.0 | 54.1 ± 0.7 | 54.0 ± 1.7 | 54.6 ± 1.1 | 80.6 ± 2.3 | 0.82 ± 0.0 | 2.74 ± 0.1 | 10.34 ± 0.0 |
| SVM [hollenstein2023zuco] | 50.7 ± 0.7 | 50.2 ± 0.9 | 50.6 ± 0.7 | 54.5 ± 1.5 | 54.5 ± 0.6 | 72.5 ± 1.6 | 0.73 ± 0.0 | 2.76 ± 0.1 | 10.94 ± 0.0 |
| Random Forest [makowski2024detection] | 58.0 ± 0.6 | 52.3 ± 0.7 | 54.3 ± 1.2 | 62.3 ± 3.4 | 56.4 ± 0.3 | 82.9 ± 1.5 | 0.77 ± 0.0 | 2.51 ± 0.1 | 10.09 ± 0.0 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.1 | 50.9 ± 0.7 | 50.0 ± 0.1 | 0.72 ± 0.0 | 2.63 ± 0.0 | 12.46 ± 0.1 |
| AhnCNN [ahn2020towards] | 49.7 ± 0.7 | 50.8 ± 1.8 | 51.6 ± 2.0 | 60.6 ± 3.4 | 52.9 ± 1.0 | 83.4 ± 1.1 | 0.72 ± 0.0 | 2.63 ± 0.0 | 12.27 ± 0.0 |
| BEyeLSTM [reich_inferring_2022] | 52.5 ± 0.8 | 50.1 ± 0.5 | 54.7 ± 1.4 | 51.8 ± 3.5 | 51.3 ± 1.2 | 80.2 ± 1.5 | 1.43 ± 0.7 | 2.63 ± 0.0 | 12.68 ± 0.3 |
| PLM-AS [Yang2023PLMASPL] | 56.1 ± 0.9 | 49.5 ± 1.1 | 56.5 ± 0.3 | 51.3 ± 2.4 | 51.4 ± 0.6 | 57.9 ± 4.6 | 0.71 ± 0.0 | 2.64 ± 0.0 | 12.46 ± 0.1 |
| PLM-AS-RM [haller2022eye] | 58.4 ± 0.5 | 53.9 ± 1.1 | 59.0 ± 1.2 | 64.2 ± 4.0 | 53.4 ± 1.5 | 69.2 ± 1.0 | 1.21 ± 0.0 | 2.67 ± 0.0 | 31.79 ± 0.2 |
| RoBERTEye-W [Shubi2024Finegrained] | 61.4 ± 0.9 | 57.4 ± 3.7 | 56.8 ± 1.2 | 62.5 ± 7.3 | 58.0 ± 2.2 | 75.6 ± 2.3 | 0.71 ± 0.0 | 2.65 ± 0.0 | 11.16 ± 0.1 |
| RoBERTEye-F [Shubi2024Finegrained] | 61.9 ± 0.8 | 56.0 ± 2.7 | 54.7 ± 2.2 | 64.5 ± 3.1 | 58.4 ± 1.1 | 70.8 ± 1.9 | 0.73 ± 0.0 | 2.66 ± 0.1 | 11.25 ± 0.2 |
| MAG-Eye [Shubi2024Finegrained] | 62.9 ± 0.5 | 56.0 ± 2.1 | 58.3 ± 1.3 | 57.6 ± 7.1 | 58.0 ± 2.0 | 53.3 ± 3.7 | 0.72 ± 0.0 | 2.64 ± 0.0 | 12.46 ± 0.1 |
| PostFusion-Eye [Shubi2024Finegrained] | 61.1 ± 0.6 | 55.4 ± 2.6 | 53.0 ± 1.8 | 53.6 ± 0.9 | 57.5 ± 1.4 | 73.2 ± 1.5 | 0.80 ± 0.1 | 2.79 ± 0.1 | 14.00 ± 0.4 |
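
Each cell above condenses several per-fold scores into a `mean ± std` string. A minimal sketch of that aggregation follows. It assumes the sample standard deviation (the source does not say whether sample or population deviation is used), and the function name `format_fold_scores` and the example fold values are hypothetical.

```python
from statistics import mean, stdev


def format_fold_scores(fold_scores, decimals=1):
    """Format per-fold scores as 'mean ± std', with the std shown to one decimal."""
    average = mean(fold_scores)
    # stdev needs at least two values; report 0.0 for a single fold.
    spread = stdev(fold_scores) if len(fold_scores) > 1 else 0.0
    return f"{average:.{decimals}f} ± {spread:.1f}"


# Hypothetical fold scores, not taken from the benchmark.
cell = format_fold_scores([60.0, 62.0, 61.0])
```

Passing `decimals=2` gives the two-decimal means used in the last three columns.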