
EyeBench V1.0 Benchmark Results

Interactive tables sourced from the latest formatted benchmark exports. Values show the mean and standard deviation across folds.
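
As a minimal sketch of how a cell such as `61.1 ± 1.0` could be produced, the snippet below aggregates per-fold scores into a mean and a standard deviation. The fold scores and the function name `summarize_folds` are hypothetical. Using the sample standard deviation (n - 1 denominator) is an assumption, since the page does not say which estimator is used.

```python
from statistics import mean, stdev


def summarize_folds(scores, decimals=1):
    """Format per-fold scores as 'mean ± std'.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(scores) < 2:
        raise ValueError("need at least two folds to compute a standard deviation")
    return f"{mean(scores):.{decimals}f} ± {stdev(scores):.{decimals}f}"


# Hypothetical per-fold scores, not taken from the benchmark.
fold_scores = [60.0, 61.0, 62.0]
print(summarize_folds(fold_scores))  # prints "61.0 ± 1.0"
```

If the exports use the population standard deviation instead, `statistics.pstdev` would be the drop-in replacement.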

Overall Leaderboard (Test)

Macro-level comparison across every benchmark task on the held-out test folds.

| Model | Layout, Saccade/Fixation, Word-Level, Trial-Level, Linguistic, Embeddings | Avg Normalized Score | Mean Rank |
| --- | --- | --- | --- |
| Majority Class / Chance | ------ | 0.367 | 10.1 |
| Reading Speed | ----- | 0.421 | 10.29 |
| Text-Only Roberta | ----- | 0.672 | 6.1 |
| Logistic Regression [meziere2023using] | ----- | 0.571 | 7.67 |
| SVM [hollenstein2023zuco] | ----- | 0.521 | 7.38 |
| Random Forest [makowski2024detection] | --- | 0.788 | 4.48 |
| AhnRNN [ahn2020towards] | ---- | 0.36 | 9.48 |
| AhnCNN [ahn2020towards] | ---- | 0.531 | 6.95 |
| BEyeLSTM [reich_inferring_2022] | - | 0.414 | 9.1 |
| PLM-AS [Yang2023PLMASPL] | ---- | 0.494 | 8.0 |
| PLM-AS-RM [haller2022eye] | --- | 0.475 | 8.95 |
| RoBERTEye-W [Shubi2024Finegrained] | - | 0.757 | 4.43 |
| RoBERTEye-F [Shubi2024Finegrained] | | 0.654 | 6.81 |
| MAG-Eye [Shubi2024Finegrained] | - | 0.686 | 5.62 |
| PostFusion-Eye [Shubi2024Finegrained] | | 0.546 | 9.81 |
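
The page does not define how the Avg Normalized Score and Mean Rank columns are computed. The sketch below shows one plausible way to aggregate per-task scores into both quantities. It assumes min-max normalization within each task, average ranks for ties, and a per-task flag saying whether higher or lower is better. All function names, model names, and scores are hypothetical and are not the benchmark's actual procedure or data.

```python
def rank_scores(scores, higher_is_better=True):
    """Return 1-based ranks (1 = best); tied models share their average rank."""
    order = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of models tied with order[i].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for name in order[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


def normalize_scores(scores, higher_is_better=True):
    """Min-max scale to [0, 1] so that 1 is always the best model (assumption)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All models tie on this task; 0.5 is an arbitrary neutral choice.
        return {name: 0.5 for name in scores}
    if higher_is_better:
        return {name: (v - lo) / (hi - lo) for name, v in scores.items()}
    return {name: (hi - v) / (hi - lo) for name, v in scores.items()}


def summarize(tasks):
    """Average normalized score and mean rank per model across tasks."""
    models = next(iter(tasks.values()))[0].keys()
    n = len(tasks)
    norm_sum = {m: 0.0 for m in models}
    rank_sum = {m: 0.0 for m in models}
    for scores, higher_is_better in tasks.values():
        norms = normalize_scores(scores, higher_is_better)
        ranks = rank_scores(scores, higher_is_better)
        for m in models:
            norm_sum[m] += norms[m]
            rank_sum[m] += ranks[m]
    return {m: (norm_sum[m] / n, rank_sum[m] / n) for m in models}


# Hypothetical models, tasks, and scores; not taken from the benchmark.
toy_tasks = {
    "accuracy_task": ({"A": 60.0, "B": 55.0, "C": 50.0}, True),
    "error_task": ({"A": 2.5, "B": 2.0, "C": 3.0}, False),
}
for model, (avg_norm, mean_rank) in summarize(toy_tasks).items():
    print(f"{model}: avg normalized score = {avg_norm:.3f}, mean rank = {mean_rank:.2f}")
```

Normalizing before averaging keeps tasks with different scales from dominating the average, while the mean rank depends only on the ordering of models within each task.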

Combined Task Results

Side-by-side view of the primary metric for each EyeBench task, averaged over folds.

| Model | OneStop RC | SBSAT RC | PoTeC RC | PoTeC DE | IITBHGC CV | CopCo TYP | SBSAT STD | CopCo RCS | MECOL2 LEX |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 51.4 ± 1.3 | 50.4 ± 0.3 | 49.9 ± 0.0 | 0.73 ± 0.0 | 2.66 ± 0.1 | 12.45 ± 0.1 |
| Reading Speed | 49.6 ± 0.8 | 50.8 ± 1.6 | 51.9 ± 2.2 | 60.4 ± 1.7 | 57.3 ± 0.7 | 56.6 ± 2.1 | 0.77 ± 0.0 | 2.68 ± 0.1 | 28.95 ± 14.3 |
| Text-Only Roberta | 61.1 ± 1.0 | 55.9 ± 2.3 | 56.3 ± 1.6 | 62.0 ± 4.0 | 58.8 ± 1.5 | 50.1 ± 0.4 | 0.72 ± 0.0 | 2.64 ± 0.0 | 12.45 ± 0.1 |
| Logistic Regression [meziere2023using] | 53.0 ± 0.8 | 52.3 ± 1.0 | 54.1 ± 0.7 | 54.0 ± 1.7 | 54.6 ± 1.1 | 80.6 ± 2.3 | 0.82 ± 0.0 | 2.74 ± 0.1 | 10.34 ± 0.0 |
| SVM [hollenstein2023zuco] | 50.7 ± 0.7 | 50.2 ± 0.9 | 50.6 ± 0.7 | 54.5 ± 1.5 | 54.5 ± 0.6 | 72.5 ± 1.6 | 0.73 ± 0.0 | 2.76 ± 0.1 | 10.94 ± 0.0 |
| Random Forest [makowski2024detection] | 58.0 ± 0.6 | 52.3 ± 0.7 | 54.3 ± 1.2 | 62.3 ± 3.4 | 56.4 ± 0.3 | 82.9 ± 1.5 | 0.77 ± 0.0 | 2.51 ± 0.1 | 10.09 ± 0.0 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.1 | 50.9 ± 0.7 | 50.0 ± 0.1 | 0.72 ± 0.0 | 2.63 ± 0.0 | 12.46 ± 0.1 |
| AhnCNN [ahn2020towards] | 49.7 ± 0.7 | 50.8 ± 1.8 | 51.6 ± 2.0 | 60.6 ± 3.4 | 52.9 ± 1.0 | 83.4 ± 1.1 | 0.72 ± 0.0 | 2.63 ± 0.0 | 12.27 ± 0.0 |
| BEyeLSTM [reich_inferring_2022] | 52.5 ± 0.8 | 50.1 ± 0.5 | 54.7 ± 1.4 | 51.8 ± 3.5 | 51.3 ± 1.2 | 80.2 ± 1.5 | 1.43 ± 0.7 | 2.63 ± 0.0 | 12.68 ± 0.3 |
| PLM-AS [Yang2023PLMASPL] | 56.1 ± 0.9 | 49.5 ± 1.1 | 56.5 ± 0.3 | 51.3 ± 2.4 | 51.4 ± 0.6 | 57.9 ± 4.6 | 0.71 ± 0.0 | 2.64 ± 0.0 | 12.46 ± 0.1 |
| PLM-AS-RM [haller2022eye] | 58.4 ± 0.5 | 53.9 ± 1.1 | 59.0 ± 1.2 | 64.2 ± 4.0 | 53.4 ± 1.5 | 69.2 ± 1.0 | 1.21 ± 0.0 | 2.67 ± 0.0 | 31.79 ± 0.2 |
| RoBERTEye-W [Shubi2024Finegrained] | 61.4 ± 0.9 | 57.4 ± 3.7 | 56.8 ± 1.2 | 62.5 ± 7.3 | 58.0 ± 2.2 | 75.6 ± 2.3 | 0.71 ± 0.0 | 2.65 ± 0.0 | 11.16 ± 0.1 |
| RoBERTEye-F [Shubi2024Finegrained] | 61.9 ± 0.8 | 56.0 ± 2.7 | 54.7 ± 2.2 | 64.5 ± 3.1 | 58.4 ± 1.1 | 70.8 ± 1.9 | 0.73 ± 0.0 | 2.66 ± 0.1 | 11.25 ± 0.2 |
| MAG-Eye [Shubi2024Finegrained] | 62.9 ± 0.5 | 56.0 ± 2.1 | 58.3 ± 1.3 | 57.6 ± 7.1 | 58.0 ± 2.0 | 53.3 ± 3.7 | 0.72 ± 0.0 | 2.64 ± 0.0 | 12.46 ± 0.1 |
| PostFusion-Eye [Shubi2024Finegrained] | 61.1 ± 0.6 | 55.4 ± 2.6 | 53.0 ± 1.8 | 53.6 ± 0.9 | 57.5 ± 1.4 | 73.2 ± 1.5 | 0.8 ± 0.1 | 2.79 ± 0.1 | 14.0 ± 0.4 |