Jain, Nishtha, Popović, Maja ORCID: 0000-0001-8234-8745, Groves, Declan and Specia, Lucia ORCID: 0000-0002-5495-3128 (2022) Leveraging pre-trained language models for gender debiasing. In: 13th Language Resources and Evaluation Conference, 20-25 June 2022, Marseille, France.
Abstract
Studying and mitigating gender and other biases in natural language have become important areas of research from both algorithmic and data perspectives. This paper explores the idea of reducing gender bias in a language generation context by generating gender variants of sentences. Previous work in this field has either been rule-based or required large amounts of gender-balanced training data. These approaches, however, are not scalable across multiple languages, as creating data or rules for each language is costly and time-consuming. This work explores a lightweight method to generate gender variants for a given text using pre-trained language models as the resource, without any task-specific labelled data. The approach is designed to work on multiple languages with minimal changes in the form of heuristics. To demonstrate this, we tested it on a high-resourced language, namely Spanish, and a low-resourced language from a different family, namely Serbian. The approach proved to work very well on Spanish, and while the results were less positive for Serbian, it showed potential even for languages where pre-trained models are less effective.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | gender debiasing; language generation; pre-trained language models |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine learning; Humanities > Language; Social Sciences > Gender |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Published in: | Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association (ELRA). |
Publisher: | European Language Resources Association (ELRA) |
Official URL: | https://aclanthology.org/2022.lrec-1.235 |
Copyright Information: | © European Language Resources Association (ELRA) |
ID Code: | 28365 |
Deposited On: | 25 May 2023 11:27 by Maja Popovic. Last Modified 25 May 2023 11:27 |
Documents
Full text available as: PDF (255kB), Creative Commons: Attribution-Noncommercial 4.0