PoTeC RC
Reading Comprehension (PoTeC)
Test
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 52.5 ± 1.2 | 50.7 ± 1.6 | 49.3 ± 3.6 | 51.3 ± 1.7 | 52.2 ± 1.5 | 51.1 ± 2.0 | 53.5 ± 6.1 | 51.9 ± 2.2 |
| Text-Only RoBERTa | 57.6 ± 0.9 | 48.8 ± 1.7 | 50.3 ± 1.7 | 51.7 ± 1.3 | 62.5 ± 1.5 | 48.2 ± 1.8 | 48.2 ± 2.4 | 56.3 ± 1.6 |
| Logistic Regression [meziere2023using] | 53.6 ± 0.9 | 53.9 ± 1.8 | 51.3 ± 2.1 | 52.9 ± 0.4 | 53.9 ± 1.8 | 55.9 ± 2.2 | 53.2 ± 0.6 | 54.1 ± 0.7 |
| SVM [hollenstein2023zuco] | 51.3 ± 0.9 | 50.1 ± 0.7 | 50.6 ± 0.9 | 50.6 ± 0.7 | 51.3 ± 0.9 | 50.1 ± 0.7 | 50.6 ± 0.9 | 50.6 ± 0.7 |
| Random Forest [makowski2024detection] | 55.8 ± 1.5 | 48.3 ± 1.4 | 48.2 ± 2.3 | 51.8 ± 1.2 | 59.2 ± 2.2 | 49.9 ± 1.7 | 46.1 ± 2.6 | 54.3 ± 1.2 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 50.0 ± 1.1 | 49.7 ± 1.2 | 49.5 ± 2.0 | 49.4 ± 0.9 | 51.4 ± 1.8 | 53.9 ± 2.6 | 47.4 ± 2.3 | 51.6 ± 2.0 |
| BEyeLSTM [reich_inferring_2022] | 58.5 ± 1.1 | 51.6 ± 0.9 | 51.0 ± 1.6 | 53.2 ± 0.8 | 61.1 ± 1.9 | 51.5 ± 2.7 | 51.7 ± 4.2 | 54.7 ± 1.4 |
| PLM-AS [Yang2023PLMASPL] | 54.6 ± 1.5 | 50.1 ± 0.6 | 50.0 ± 1.2 | 52.1 ± 0.8 | 58.3 ± 0.6 | 53.8 ± 0.6 | 53.7 ± 1.0 | 56.5 ± 0.3 |
| PLM-AS-RM [haller2022eye] | 58.1 ± 1.0 | 49.0 ± 1.0 | 46.1 ± 1.7 | 53.9 ± 1.4 | 61.8 ± 1.1 | 53.2 ± 2.3 | 51.5 ± 4.7 | 59.0 ± 1.2 |
| RoBERTEye-W [Shubi2024Finegrained] | 58.1 ± 1.0 | 49.7 ± 0.1 | 48.8 ± 0.7 | 52.6 ± 0.9 | 61.1 ± 0.5 | 51.7 ± 2.5 | 49.7 ± 3.2 | 56.8 ± 1.2 |
| RoBERTEye-F [Shubi2024Finegrained] | 50.2 ± 0.2 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.1 | 57.3 ± 2.7 | 52.7 ± 2.1 | 49.1 ± 3.7 | 54.7 ± 2.2 |
| MAG-Eye [Shubi2024Finegrained] | 59.3 ± 1.1 | 49.8 ± 0.2 | 49.1 ± 0.8 | 54.2 ± 1.1 | 63.7 ± 1.5 | 48.7 ± 2.4 | 48.6 ± 3.9 | 58.3 ± 1.3 |
| PostFusion-Eye [Shubi2024Finegrained] | 52.6 ± 1.6 | 50.1 ± 1.1 | 50.0 ± 1.6 | 51.3 ± 0.7 | 56.6 ± 2.0 | 51.2 ± 2.4 | 48.6 ± 2.6 | 53.0 ± 1.8 |

Validation
| Model | Unseen Reader Balanced Accuracy | Unseen Text Balanced Accuracy | Unseen Text and Reader Balanced Accuracy | Average Balanced Accuracy | Unseen Reader AUROC | Unseen Text AUROC | Unseen Text and Reader AUROC | Average AUROC |
|---|---|---|---|---|---|---|---|---|
| Majority Class / Chance | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| Reading Speed | 50.0 ± 1.5 | 51.1 ± 1.5 | 46.7 ± 3.9 | 49.7 ± 1.9 | 49.5 ± 2.1 | 52.7 ± 1.7 | 45.2 ± 5.9 | 50.0 ± 2.6 |
| Text-Only RoBERTa | 60.1 ± 1.9 | 50.0 ± 0.0 | 50.0 ± 0.0 | 55.3 ± 1.2 | 63.5 ± 2.4 | 56.8 ± 4.8 | 55.2 ± 5.3 | 59.8 ± 1.3 |
| Logistic Regression [meziere2023using] | 52.1 ± 2.4 | 53.8 ± 2.1 | 51.8 ± 1.2 | 52.9 ± 0.4 | 52.7 ± 3.0 | 55.2 ± 2.8 | 50.5 ± 2.0 | 53.7 ± 0.8 |
| SVM [hollenstein2023zuco] | 51.4 ± 1.2 | 54.8 ± 1.7 | 49.2 ± 2.0 | 52.1 ± 0.8 | 51.4 ± 1.2 | 54.8 ± 1.7 | 49.2 ± 2.0 | 52.1 ± 0.8 |
| Random Forest [makowski2024detection] | 58.5 ± 1.5 | 51.8 ± 0.9 | 52.1 ± 1.0 | 55.2 ± 1.0 | 57.9 ± 1.4 | 54.9 ± 2.5 | 51.5 ± 2.4 | 56.1 ± 1.5 |
| AhnRNN [ahn2020towards] | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 | 50.0 ± 0.0 |
| AhnCNN [ahn2020towards] | 52.1 ± 0.4 | 52.0 ± 1.9 | 49.9 ± 0.9 | 51.7 ± 0.9 | 54.3 ± 1.2 | 54.7 ± 2.1 | 52.1 ± 1.6 | 54.4 ± 1.4 |
| BEyeLSTM [reich_inferring_2022] | 58.6 ± 1.1 | 53.0 ± 1.7 | 52.8 ± 1.7 | 56.8 ± 0.5 | 62.1 ± 1.8 | 56.0 ± 2.4 | 60.1 ± 2.1 | 60.7 ± 1.2 |
| PLM-AS [Yang2023PLMASPL] | 54.9 ± 0.5 | 51.0 ± 1.6 | 51.8 ± 1.0 | 53.3 ± 0.5 | 59.8 ± 1.0 | 51.4 ± 1.4 | 55.2 ± 2.4 | 56.2 ± 0.4 |
| PLM-AS-RM [haller2022eye] | 59.4 ± 0.7 | 53.3 ± 1.5 | 51.6 ± 1.3 | 56.8 ± 0.7 | 63.0 ± 1.4 | 53.8 ± 2.2 | 50.4 ± 3.4 | 57.8 ± 0.6 |
| RoBERTEye-W [Shubi2024Finegrained] | 60.5 ± 0.2 | 52.4 ± 1.5 | 52.7 ± 2.5 | 56.7 ± 0.8 | 63.1 ± 1.8 | 54.0 ± 1.0 | 54.8 ± 2.9 | 59.1 ± 0.3 |
| RoBERTEye-F [Shubi2024Finegrained] | 51.0 ± 0.9 | 49.6 ± 0.4 | 50.1 ± 0.1 | 50.5 ± 0.5 | 56.5 ± 3.7 | 54.8 ± 2.5 | 54.6 ± 2.0 | 56.8 ± 1.0 |
| MAG-Eye [Shubi2024Finegrained] | 59.9 ± 1.9 | 53.3 ± 2.2 | 51.5 ± 1.5 | 56.7 ± 1.0 | 65.4 ± 2.1 | 57.7 ± 1.8 | 58.7 ± 1.6 | 61.7 ± 0.8 |
| PostFusion-Eye [Shubi2024Finegrained] | 54.8 ± 1.6 | 51.4 ± 1.2 | 51.8 ± 0.8 | 53.3 ± 1.0 | 56.9 ± 1.2 | 53.1 ± 2.2 | 57.5 ± 2.7 | 55.9 ± 1.6 |
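
For readers less familiar with the two metrics in the tables, the following is a minimal sketch of how balanced accuracy and AUROC can be computed for a binary task, and why a majority-class predictor lands at exactly 50.0 on both. It is not the benchmark's evaluation code: the function names and the toy labels are made up for illustration, and values are scaled by 100 only to match how the tables report them.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls for binary labels in {0, 1}."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"class {cls} is absent from y_true")
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy labels (not taken from PoTeC).
y_true = [1, 1, 1, 0]

# A majority-class predictor: always predicts 1 with a constant score.
majority_pred = [1, 1, 1, 1]
constant_scores = [0.75, 0.75, 0.75, 0.75]

print(100 * balanced_accuracy(y_true, majority_pred))  # 50.0
print(100 * auroc(y_true, constant_scores))            # 50.0
```

Plain accuracy on the same toy data would be 75%, which is why balanced accuracy is the more informative number when the classes are imbalanced. Rows that sit at 50.0 ± 0.0 on every column are consistent with a model that has collapsed to a constant prediction, though the tables alone do not confirm the cause.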