Alongside the evaluation task, we plan to conduct an experimental study to explore the feasibility of sociolinguistic profiling of AI models by comparing their outputs with human annotations collected on the Prolific platform.
The outputs of participants' zero-shot runs for GS detection (i.e., the main task) will be compared against crowdsourced annotations, with a focus on demographic groups divided by gender and by age range. This will allow us to analyze which group(s) each system is closest to in its judgments, providing insights into sociolinguistic patterns and potential biases.
Such a design enables a fine-grained analysis of how models reflect, or diverge from, the perspectives of distinct demographic groups.
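As a rough illustration of the per-group comparison described above, the following sketch computes how often a system's labels agree with the majority label of each demographic group. The label format, group names, and the simple majority-vote agreement measure are all illustrative assumptions; the actual analysis may rely on a different data format or a chance-corrected metric such as Cohen's kappa.

```python
# Illustrative sketch only: which demographic group's judgments a system is
# closest to. Data format, group names, and the agreement measure are
# hypothetical, not the task's actual setup.
from collections import defaultdict

def group_agreement(system_labels, annotations):
    """Mean per-item agreement between a system's binary labels and the
    majority label of each demographic group.

    system_labels: {item_id: 0 or 1} predicted by one participant system
    annotations:   list of (item_id, group, label) crowdsourced judgments
    """
    votes = defaultdict(lambda: defaultdict(list))  # group -> item -> labels
    for item, group, label in annotations:
        votes[group][item].append(label)

    scores = {}
    for group, items in votes.items():
        hits = total = 0
        for item, labels in items.items():
            if item not in system_labels:
                continue
            # Majority vote within the group (ties resolved toward 1)
            majority = 1 if sum(labels) * 2 >= len(labels) else 0
            hits += int(system_labels[item] == majority)
            total += 1
        scores[group] = hits / total if total else float("nan")
    return scores

# Toy example with two hypothetical age-range groups
system = {"s1": 1, "s2": 0, "s3": 1}
anns = [
    ("s1", "18-30", 1), ("s1", "18-30", 1), ("s1", "50+", 0),
    ("s2", "18-30", 0), ("s2", "50+", 1),
    ("s3", "18-30", 1), ("s3", "50+", 1),
]
print(group_agreement(system, anns))
```

In this toy run the system agrees with every majority judgment of the "18-30" group but with only one of three from the "50+" group, so it would be profiled as closer to the younger annotators.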
For this reason, we encourage participants to experiment with a variety of models and systems, so as to enrich the scope of this investigation.