OneStop RC
Reading Comprehension (OneStop)
Test
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 50.6 ± 1.2 | 49.4 ± 0.8 | 48.0 ± 1.5 | 49.9 ± 0.6 | 49.8 ± 1.7 | 49.3 ± 0.9 | 47.7 ± 2.2 | 49.6 ± 0.8 |
| Text-Only RoBERTa | 58.6 ± 1.7 | 51.4 ± 0.5 | 53.7 ± 1.6 | 55.0 ± 1.0 | 66.3 ± 1.1 | 55.2 ± 1.5 | 55.0 ± 2.8 | 61.1 ± 1.0 |
| Logistic Regression [meziere2023using] | 51.1 ± 1.2 | 52.2 ± 0.4 | 52.4 ± 2.5 | 51.7 ± 0.7 | 52.2 ± 1.5 | 53.4 ± 0.9 | 54.3 ± 3.2 | 53.0 ± 0.8 |
| SVM [hollenstein2023zuco] | 50.0 ± 0.9 | 51.6 ± 0.6 | 50.8 ± 1.5 | 50.7 ± 0.7 | 50.0 ± 0.9 | 51.6 ± 0.6 | 50.8 ± 1.5 | 50.7 ± 0.7 |
| Random Forest [makowski2024detection] | 56.2 ± 0.8 | 53.5 ± 1.0 | 54.7 ± 1.5 | 55.1 ± 0.5 | 59.4 ± 0.9 | 56.0 ± 1.3 | 57.2 ± 1.9 | 58.0 ± 0.6 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 50.1 ± 0.1 | 50.1 ± 0.1 | 49.7 ± 0.9 | 50.0 ± 0.0 | 48.3 ± 1.5 | 51.7 ± 0.5 | 48.1 ± 3.1 | 49.7 ± 0.7 |
| BEyeLSTM [reich_inferring_2022] | 53.0 ± 0.6 | 50.0 ± 0.7 | 51.6 ± 1.3 | 51.5 ± 0.5 | 54.8 ± 1.1 | 50.3 ± 1.2 | 51.0 ± 2.5 | 52.5 ± 0.8 |
| PLM-AS [Yang2023PLMASPL] | 56.0 ± 0.9 | 50.9 ± 0.8 | 53.8 ± 2.2 | 53.5 ± 0.8 | 59.6 ± 0.9 | 52.1 ± 1.0 | 55.9 ± 1.9 | 56.1 ± 0.9 |
| PLM-AS-RM [haller2022eye] | 58.0 ± 0.7 | 52.5 ± 0.8 | 56.2 ± 2.3 | 55.2 ± 0.4 | 62.0 ± 0.8 | 54.1 ± 1.3 | 58.6 ± 2.1 | 58.4 ± 0.5 |
| RoBERTEye-W [Shubi2024Finegrained] | 58.4 ± 1.8 | 51.1 ± 0.8 | 54.1 ± 2.2 | 54.7 ± 1.0 | 66.5 ± 1.5 | 54.7 ± 1.5 | 57.2 ± 2.7 | 61.4 ± 0.9 |
| RoBERTEye-F [Shubi2024Finegrained] | 56.6 ± 1.3 | 50.7 ± 0.4 | 52.0 ± 1.5 | 53.6 ± 0.9 | 67.3 ± 1.2 | 55.7 ± 1.1 | 54.8 ± 3.4 | 61.9 ± 0.8 |
| MAG-Eye [Shubi2024Finegrained] | 58.3 ± 1.7 | 50.9 ± 0.4 | 50.1 ± 0.5 | 54.3 ± 0.9 | 67.7 ± 1.0 | 57.7 ± 0.5 | 57.8 ± 2.2 | 62.9 ± 0.5 |
| PostFusion-Eye [Shubi2024Finegrained] | 57.0 ± 1.5 | 52.2 ± 0.7 | 52.2 ± 1.1 | 54.7 ± 0.9 | 64.5 ± 0.9 | 57.1 ± 1.1 | 54.7 ± 3.2 | 61.1 ± 0.6 |
Validation
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 48.6 ± 1.3 | 50.0 ± 0.9 | 49.4 ± 1.5 | 49.4 ± 0.7 | 48.4 ± 1.8 | 49.8 ± 1.0 | 50.6 ± 2.3 | 49.2 ± 0.8 |
| Text-Only RoBERTa | 59.2 ± 1.8 | 52.2 ± 0.8 | 53.0 ± 1.9 | 55.6 ± 1.3 | 67.4 ± 1.7 | 56.8 ± 1.2 | 58.9 ± 2.4 | 62.5 ± 1.2 |
| Logistic Regression [meziere2023using] | 52.1 ± 1.2 | 52.9 ± 0.5 | 54.0 ± 2.1 | 52.7 ± 0.6 | 53.0 ± 1.5 | 53.8 ± 0.8 | 53.1 ± 3.0 | 53.6 ± 0.7 |
| SVM [hollenstein2023zuco] | 50.9 ± 0.5 | 52.8 ± 0.8 | 52.8 ± 1.5 | 51.9 ± 0.5 | 50.9 ± 0.5 | 52.8 ± 0.8 | 52.8 ± 1.5 | 51.9 ± 0.5 |
| Random Forest [makowski2024detection] | 59.2 ± 0.8 | 54.6 ± 1.1 | 53.8 ± 1.6 | 56.9 ± 0.4 | 61.4 ± 0.8 | 56.8 ± 1.3 | 56.6 ± 1.7 | 59.3 ± 0.5 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 50.5 ± 0.5 | 50.2 ± 0.1 | 50.2 ± 0.2 | 50.3 ± 0.3 | 52.3 ± 0.7 | 51.3 ± 0.9 | 50.5 ± 2.0 | 51.8 ± 0.5 |
| BEyeLSTM [reich_inferring_2022] | 53.7 ± 0.6 | 52.0 ± 0.9 | 52.3 ± 1.7 | 52.8 ± 0.5 | 56.8 ± 1.0 | 54.9 ± 1.5 | 54.9 ± 2.3 | 55.5 ± 1.0 |
| PLM-AS [Yang2023PLMASPL] | 57.5 ± 1.2 | 52.0 ± 0.9 | 53.3 ± 1.2 | 54.8 ± 1.0 | 62.0 ± 1.2 | 54.0 ± 1.3 | 54.6 ± 1.9 | 58.1 ± 1.1 |
| PLM-AS-RM [haller2022eye] | 59.7 ± 0.6 | 52.3 ± 0.6 | 60.5 ± 2.0 | 56.4 ± 0.6 | 63.7 ± 0.8 | 55.1 ± 0.8 | 61.3 ± 1.8 | 59.9 ± 0.8 |
| RoBERTEye-W [Shubi2024Finegrained] | 59.3 ± 1.9 | 50.2 ± 0.7 | 52.5 ± 1.5 | 54.7 ± 1.1 | 68.1 ± 1.5 | 56.9 ± 1.0 | 57.8 ± 1.4 | 63.1 ± 1.0 |
| RoBERTEye-F [Shubi2024Finegrained] | 57.5 ± 1.2 | 51.0 ± 0.7 | 50.8 ± 0.7 | 54.0 ± 0.8 | 67.9 ± 1.4 | 57.4 ± 0.9 | 58.9 ± 2.8 | 63.2 ± 0.8 |
| MAG-Eye [Shubi2024Finegrained] | 58.4 ± 1.5 | 50.9 ± 0.5 | 51.4 ± 1.4 | 54.4 ± 0.8 | 68.5 ± 0.9 | 58.3 ± 0.9 | 60.4 ± 2.0 | 63.7 ± 0.8 |
| PostFusion-Eye [Shubi2024Finegrained] | 59.1 ± 2.0 | 53.2 ± 0.8 | 55.3 ± 2.1 | 56.1 ± 1.3 | 66.2 ± 0.9 | 58.5 ± 1.2 | 59.8 ± 2.8 | 62.5 ± 0.7 |
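
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how balanced accuracy and AUROC can be computed for a binary task, scaled to 0-100 to match the tables. It is not the evaluation code behind these results. The function names and the toy labels are hypothetical, and the binary setting is an assumption based on the 50.0 chance level.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on each class, as a percentage (binary labels 0/1)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(1 for p in pos if p == 1) / len(pos)  # recall on positives
    tnr = sum(1 for p in neg if p == 0) / len(neg)  # recall on negatives
    return 100.0 * (tpr + tnr) / 2


def auroc(y_true, scores):
    """Chance that a random positive is scored above a random negative.

    Ties count as half a win. Returned as a percentage.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return 100.0 * wins / (len(pos) * len(neg))


# Toy, imbalanced labels (hypothetical data, not taken from OneStop).
y_true = [1, 1, 0, 0, 0, 0]

# A majority-class predictor sits at chance on both metrics.
majority_ba = balanced_accuracy(y_true, [0] * 6)
majority_auc = auroc(y_true, [0.3] * 6)

# Hard 0/1 predictions, scored once as labels and once as "scores".
hard = [1, 0, 0, 0, 0, 1]
hard_ba = balanced_accuracy(y_true, hard)
hard_auc = auroc(y_true, hard)
```

In this sketch a constant predictor scores 50.0 on both metrics regardless of class imbalance, which is why the Majority Class / Chance row is a meaningful floor. When AUROC is computed from hard 0/1 predictions rather than continuous scores, it coincides with balanced accuracy, as `hard_ba` and `hard_auc` illustrate.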