SBSAT RC
Reading Comprehension (SBSAT)
Test
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 51.1 ± 1.8 | 50.9 ± 0.4 | 49.9 ± 0.1 | 50.2 ± 0.9 | 52.0 ± 2.1 | 50.8 ± 0.3 | 52.4 ± 1.0 | 50.8 ± 1.6 |
| Text-Only RoBERTa | 61.5 ± 1.8 | 51.2 ± 0.7 | 50.4 ± 0.5 | 56.1 ± 1.1 | 67.1 ± 1.4 | 43.7 ± 4.5 | 46.0 ± 2.3 | 55.9 ± 2.3 |
| Logistic Regression [meziere2023using] | 53.6 ± 1.5 | 51.3 ± 0.7 | 49.9 ± 1.0 | 51.8 ± 0.9 | 53.7 ± 1.3 | 52.3 ± 0.9 | 47.8 ± 3.1 | 52.3 ± 1.0 |
| SVM [hollenstein2023zuco] | 50.7 ± 0.6 | 50.3 ± 1.6 | 48.3 ± 0.7 | 50.2 ± 0.9 | 50.7 ± 0.6 | 50.3 ± 1.6 | 48.3 ± 0.7 | 50.2 ± 0.9 |
| Random Forest [makowski2024detection] | 54.2 ± 0.9 | 51.3 ± 1.2 | 48.9 ± 1.5 | 52.1 ± 0.6 | 54.5 ± 0.9 | 51.6 ± 1.1 | 48.2 ± 0.9 | 52.3 ± 0.7 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 50.9 ± 0.4 | 51.3 ± 0.7 | 50.9 ± 0.8 | 51.1 ± 0.5 | 49.8 ± 2.0 | 52.7 ± 1.4 | 48.4 ± 3.2 | 50.8 ± 1.8 |
| BEyeLSTM [reich_inferring_2022] | 50.9 ± 0.9 | 49.9 ± 0.9 | 48.5 ± 2.0 | 49.9 ± 0.4 | 51.6 ± 1.4 | 51.1 ± 1.3 | 48.9 ± 2.9 | 50.1 ± 0.5 |
| PLM-AS [Yang2023PLMASPL] | 49.1 ± 0.7 | 50.7 ± 1.3 | 48.2 ± 1.4 | 49.5 ± 0.2 | 49.7 ± 1.6 | 49.9 ± 0.6 | 49.5 ± 3.6 | 49.5 ± 1.1 |
| PLM-AS-RM [haller2022eye] | 51.2 ± 0.4 | 51.5 ± 1.2 | 51.2 ± 1.0 | 51.2 ± 0.7 | 53.0 ± 1.7 | 54.7 ± 2.2 | 54.2 ± 1.4 | 53.9 ± 1.1 |
| RoBERTEye-W [Shubi2024Finegrained] | 55.6 ± 2.7 | 52.7 ± 2.6 | 53.9 ± 3.1 | 54.3 ± 2.5 | 59.9 ± 3.6 | 52.5 ± 4.9 | 56.1 ± 4.2 | 57.4 ± 3.7 |
| RoBERTEye-F [Shubi2024Finegrained] | 55.7 ± 4.0 | 51.0 ± 1.4 | 52.5 ± 1.3 | 53.8 ± 2.4 | 58.7 ± 5.4 | 53.4 ± 2.6 | 51.8 ± 1.0 | 56.0 ± 2.7 |
| MAG-Eye [Shubi2024Finegrained] | 60.6 ± 2.5 | 48.3 ± 2.4 | 46.6 ± 4.6 | 54.1 ± 1.9 | 65.6 ± 2.3 | 44.8 ± 4.3 | 42.0 ± 6.3 | 56.0 ± 2.1 |
| PostFusion-Eye [Shubi2024Finegrained] | 53.9 ± 2.2 | 49.1 ± 0.9 | 51.0 ± 1.0 | 51.7 ± 1.1 | 57.9 ± 3.5 | 55.9 ± 4.0 | 52.7 ± 5.9 | 55.4 ± 2.6 |
Validation
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 53.6 ± 1.7 | 50.5 ± 0.2 | 50.5 ± 0.4 | 52.4 ± 1.5 | 53.5 ± 1.9 | 49.4 ± 0.7 | 50.9 ± 1.5 | 52.2 ± 1.3 |
| Text-Only RoBERTa | 63.5 ± 2.7 | 56.7 ± 3.5 | 55.6 ± 2.2 | 60.0 ± 2.3 | 67.6 ± 2.3 | 62.8 ± 5.2 | 61.5 ± 6.0 | 64.8 ± 0.4 |
| Logistic Regression [meziere2023using] | 52.7 ± 1.4 | 52.8 ± 1.5 | 52.4 ± 2.4 | 52.5 ± 1.4 | 52.4 ± 1.8 | 51.4 ± 1.4 | 51.8 ± 3.1 | 51.9 ± 1.5 |
| SVM [hollenstein2023zuco] | 52.1 ± 1.1 | 52.6 ± 1.1 | 52.9 ± 2.6 | 52.7 ± 1.2 | 52.1 ± 1.1 | 52.6 ± 1.1 | 52.9 ± 2.6 | 52.7 ± 1.2 |
| Random Forest [makowski2024detection] | 53.1 ± 1.0 | 49.9 ± 2.3 | 54.1 ± 2.6 | 51.9 ± 1.3 | 55.1 ± 1.4 | 51.2 ± 2.2 | 54.9 ± 3.0 | 53.5 ± 1.4 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 50.8 ± 0.4 | 50.5 ± 0.7 | 48.5 ± 0.8 | 50.2 ± 0.3 | 54.5 ± 1.6 | 49.9 ± 1.5 | 50.9 ± 0.5 | 51.7 ± 1.2 |
| BEyeLSTM [reich_inferring_2022] | 52.2 ± 1.4 | 50.1 ± 1.2 | 50.0 ± 1.2 | 51.0 ± 1.2 | 55.1 ± 2.3 | 51.9 ± 1.5 | 52.6 ± 1.8 | 52.7 ± 1.7 |
| PLM-AS [Yang2023PLMASPL] | 50.1 ± 0.8 | 51.1 ± 1.5 | 48.2 ± 2.2 | 50.1 ± 1.1 | 49.2 ± 1.7 | 51.2 ± 3.3 | 50.1 ± 1.0 | 50.1 ± 1.7 |
| PLM-AS-RM [haller2022eye] | 52.0 ± 0.2 | 51.2 ± 1.1 | 51.8 ± 0.8 | 51.6 ± 0.5 | 54.0 ± 1.8 | 52.0 ± 2.9 | 53.8 ± 1.0 | 53.2 ± 1.3 |
| RoBERTEye-W [Shubi2024Finegrained] | 59.4 ± 4.6 | 50.5 ± 0.5 | 52.1 ± 1.4 | 54.2 ± 2.1 | 64.2 ± 4.9 | 50.8 ± 3.7 | 49.8 ± 2.0 | 57.3 ± 2.6 |
| RoBERTEye-F [Shubi2024Finegrained] | 58.3 ± 3.4 | 55.8 ± 2.9 | 53.7 ± 2.8 | 57.2 ± 2.9 | 60.1 ± 4.3 | 58.4 ± 4.1 | 55.2 ± 3.1 | 59.3 ± 3.4 |
| MAG-Eye [Shubi2024Finegrained] | 63.2 ± 2.9 | 53.2 ± 3.1 | 52.8 ± 2.8 | 56.8 ± 2.7 | 69.2 ± 2.9 | 49.1 ± 6.7 | 50.3 ± 7.1 | 60.2 ± 3.5 |
| PostFusion-Eye [Shubi2024Finegrained] | 55.8 ± 3.1 | 49.5 ± 0.6 | 50.4 ± 0.2 | 52.9 ± 1.7 | 57.4 ± 5.1 | 51.4 ± 2.2 | 50.6 ± 1.2 | 55.3 ± 3.1 |
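Both tables report balanced accuracy and AUROC, and two rows show how the metrics relate. A constant majority-class predictor scores exactly 50 on both. A classifier that emits only hard 0/1 labels gets an AUROC equal to its balanced accuracy, because its ROC curve has a single operating point, which gives AUROC = (TPR + TNR) / 2. The identical balanced accuracy and AUROC columns in the SVM rows are consistent with that baseline producing hard labels rather than continuous scores. This is an assumption, not something the tables state. The sketch below is a plain-Python illustration of both effects on a small hypothetical label set. It is not the evaluation code behind these results.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall for binary labels in {0, 1}."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy labels (3 positives, 5 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# Majority-class baseline: always predict 0; constant scores tie on every pair.
majority = [0] * len(y_true)
print(balanced_accuracy(y_true, majority))  # → 0.5
print(auroc(y_true, majority))              # → 0.5

# Hard-label classifier: using its 0/1 predictions as scores makes AUROC equal balanced accuracy.
hard = [1, 1, 0, 0, 0, 0, 1, 0]
print(round(balanced_accuracy(y_true, hard), 3))  # → 0.733
print(round(auroc(y_true, hard), 3))              # → 0.733
```

Balanced accuracy is used instead of plain accuracy so that the chance level stays at 50 regardless of class imbalance. The majority-class row reflects this: a constant predictor earns full recall on one class and zero on the other.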