The GSI:detect task evaluates systems' ability to detect and classify gender stereotypes (GSs) in diverse types of short texts. It comprises a compulsory main task and an optional sub-task; participation in the sub-task is strongly encouraged, as it enables a more comprehensive evaluation of the phenomenon.
In the main task, systems are given a short text and must assign it a numerical score, the GS value, that quantifies the degree to which the text contains or refers to a gender stereotype.
This is formulated as a regression task: GS values are real numbers in the range [0,1], with 1 indicating the maximum degree of stereotypical content.
The annotation granularity is fixed at two decimal places.
‼️Example‼️
------------------------------------
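Le donne non sono portate per la matematica e la logica, sono meglio nelle materie umanistiche.
(English: Women are not cut out for mathematics and logic; they are better at humanities subjects.)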
➡️ GS value: 1
------------------------------------
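The value range and two-decimal granularity described above can be enforced on raw model outputs before submission. The following is a minimal sketch: the function name `to_gs_value` is hypothetical, and clipping out-of-range predictions is an assumption, not an official requirement of the task.

```python
def to_gs_value(raw: float) -> float:
    """Map a raw model output to a GS value.

    Assumption (not from the task description): out-of-range
    predictions are clipped to [0, 1] rather than rejected.
    """
    clipped = min(1.0, max(0.0, raw))  # keep the score inside [0, 1]
    return round(clipped, 2)           # match the two-decimal granularity
```

For instance, a raw output of 1.7 would be clipped to 1.0, and 0.8349 would be rounded to 0.83.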
Participant systems will be evaluated on GS detection using Mean Squared Error (MSE), the average of the squared differences between the gold (annotated) and predicted GS values.
To make system ranking more intuitive, the MSE will be normalized so that higher values correspond to better performance.
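As an illustration of the metric, the sketch below computes the MSE over paired gold and predicted GS values. The normalization used for the official ranking is not specified here, so `normalized_score` uses `1 - MSE` purely as an assumption: since GS values lie in [0,1], the MSE also lies in [0,1], and this choice makes higher values better. Both function names are hypothetical.

```python
from typing import Sequence


def mean_squared_error(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Average of the squared differences between gold and predicted values."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


def normalized_score(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Higher-is-better score.

    ASSUMPTION: 1 - MSE is one possible normalization for values in
    [0, 1]; the organizers' actual formula may differ.
    """
    return 1.0 - mean_squared_error(gold, pred)
```

With gold values `[1.0, 0.0]` and predictions `[0.5, 0.5]`, the MSE is 0.25 and the assumed normalized score is 0.75.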
In the optional sub-task, systems are given a short text and must assign it to one of the predefined gender stereotype (GS) categories listed below.
Each text must be classified into a single category, independently of the GS value assigned in the main task.
GS categories are:
Role stereotypes: social and cultural expectations about what women and men should do and how they should be;
Personality stereotypes: emotional and behavioral traits attributed to men and women on the basis of their gender;
Competence stereotypes: generalized judgments of a person's abilities based on their gender;
Physical stereotypes: expectations about the physical appearance of men and (especially) women, and about personal care in general;
Sexual stereotypes: the attitudes and behaviors that men and women are expected to have with respect to sexuality;
Relational stereotypes: the ways in which women and men are expected to behave in interpersonal and romantic relationships.
This is formulated as a multi-class classification task.
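Because each text receives exactly one of six categories, it can help to validate system output against a fixed label set. The sketch below assumes the short label names suggested by the example ("Competence"); the exact label strings expected in submissions, and the names `GS_CATEGORIES` and `validate_category`, are assumptions for illustration.

```python
# ASSUMPTION: short label names derived from the category list;
# the official submission format may use different strings.
GS_CATEGORIES = (
    "Role",
    "Personality",
    "Competence",
    "Physical",
    "Sexual",
    "Relational",
)


def validate_category(label: str) -> str:
    """Return the canonical category name or raise ValueError."""
    normalized = label.strip().capitalize()
    if normalized not in GS_CATEGORIES:
        raise ValueError(f"unknown GS category: {label!r}")
    return normalized
```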
Note: For a more accurate understanding of the task, participants may refer to the official guidelines for stereotype classification, which were used by human annotators.
They can be consulted here.
‼️Example‼️
------------------------------------
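Le donne non sono portate per la matematica e la logica, sono meglio nelle materie umanistiche.
(English: Women are not cut out for mathematics and logic; they are better at humanities subjects.)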
➡️ GS category: Competence
------------------------------------
Participant systems will be evaluated on GS classification using the F1 score, the harmonic mean of precision and recall.
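The averaging scheme for the multi-class F1 score is not stated here. As a hedged illustration, the sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1 scores) in plain Python; macro averaging and the function names are assumptions, and the official scorer may use a different variant.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 for one class: harmonic mean of its precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """ASSUMPTION: macro averaging over every label seen in gold or pred."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, lab) for lab in labels) / len(labels)
```

Macro averaging gives every category equal weight regardless of how often it occurs, which matters if some stereotype categories are rare in the data.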