GSI:detect (Detecting Gender Stereotypes in Italian) is one of the new tasks of EVALITA 2026, the evaluation campaign of NLP and Speech Tools for Italian.
A stereotype is defined as a preconceived, generalised and simplistic opinion about people, events or situations, i.e. one that is mechanically repeated rather than based on the evaluation of individual cases. Gender stereotypes, in particular, often appear in misogynistic hate speech, but they also occur in non-hateful communication, and unconscious stereotypes can even be used with a positive meaning.
The deconstruction of gender stereotypes is necessary for a more adequate description of communicative reality and, above all, to prevent discrimination. Investigating whether LLMs can detect and correctly classify gender stereotypes is a way to assess whether they could assist human experts in a wide range of practical tasks, such as avoiding gender stereotypes in teaching materials or curricula vitae, or studying their presence in journalism.
When LLMs are developed, various mitigation techniques are applied to avoid biases. Our research question goes one step further: are LLMs also able to discriminate between a stereotyped and a non-stereotyped sentence, and to what extent can they do so?
The purpose of the GSI:detect Task is to promote research in the detection of gender stereotypes in different typologies of short Italian texts. It focuses, in particular, on testing systems' ability to identify the extent to which a sentence contains or refers to a stereotype, represented as a number between 0 and 1. In addition, it also includes a sub-task on stereotype categorization, implemented as a multi-class classification task.
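To make the two sub-tasks concrete, the following is a minimal sketch in Python of what a per-sentence system prediction could look like: a stereotype score in [0, 1] for the main task and a single category label for the classification sub-task. The sentence identifier, the score clamping, and the category labels used here are illustrative assumptions only; the official output format and label inventory are defined in the task guidelines.

```python
from dataclasses import dataclass

# Hypothetical category labels for the classification sub-task;
# the actual label set is specified by the task organisers.
CATEGORIES = ("behaviour", "role", "appearance", "none")


@dataclass
class Prediction:
    """One system output per sentence, covering both sub-tasks."""
    sentence_id: str
    stereotype_score: float  # main task: degree of stereotype, in [0, 1]
    category: str            # sub-task: one label from CATEGORIES

    def __post_init__(self) -> None:
        # Clamp the score to the [0, 1] range required by the main task.
        self.stereotype_score = min(1.0, max(0.0, self.stereotype_score))
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


# Example: a single, purely illustrative prediction.
pred = Prediction(sentence_id="it_0042", stereotype_score=0.73, category="role")
print(pred)
```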