DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Tiomsú corpais don taighde foclóireachta: corpas foclóireachta na Gaeilge (CFG2020) [Corpus creation for lexicographical research: corpas foclóireachta na Gaeilge (CFG2020)]

Ó Meachair, Mícheál J., Ó Raghallaigh, Brian (ORCID: 0000-0003-3813-1949), Bhreathnach, Úna (ORCID: 0000-0002-6427-2633), Ó Cleircín, Gearóid and Scannell, Kevin (ORCID: 0000-0003-4075-9524) (2021) Tiomsú corpais don taighde foclóireachta: corpas foclóireachta na Gaeilge (CFG2020) [Corpus creation for lexicographical research: corpas foclóireachta na Gaeilge (CFG2020)]. TEANGA: Iris Chumann na Teangeolaíochta Feidhmí in Éirinn/The Journal of the Irish Association for Applied Linguistics, 28. pp. 278-305. ISSN 2565-6325

Abstract
This paper sets out the steps followed in the compilation of Corpas Foclóireachta na Gaeilge 2020 (CFG2020), a monolingual 77.3 million word Irish-language corpus. The context and circumstances of the project are explained, along with the motivation for various decisions made. The compilation and processing stages are described in detail. The contents of the corpus are outlined and the resource created to query CFG2020 is presented, along with reference to the kinds of analysis and research which it enables. CFG2020 was created as a first step towards a proposed larger corpus project, and suggestions for its improvement and expansion are therefore made.
Metadata
Item Type: Article (Published)
Refereed: Yes
Uncontrolled Keywords: Corpas foclóireachta; Foclóireacht; Corpais; Gaeilge; Lexicographic corpus; Lexicography; Corpora; Irish language
Subjects: Computer Science > Computational linguistics
Computer Science > Machine translating
Humanities > Irish language
Humanities > Library science
Humanities > Linguistics
Humanities > Translating and interpreting
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Humanities and Social Science > Fiontar agus Scoil na Gaeilge
Official URL: https://doi.org/10.35903/teanga.v28i.726
Copyright Information: © 2021 Irish Association for Applied Linguistics (CC BY-NC-ND 4.0)
ID Code: 27824
Deposited On: 06 Oct 2022 11:20 by Úna Bhreathnach. Last Modified: 06 Oct 2022 11:20
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0
32kB