Lohar, Pintu ORCID: 0000-0002-5328-1585, Ganguly, Debasis ORCID: 0000-0003-0050-7138, Afli, Haithem ORCID: 0000-0002-7449-4707, Way, Andy ORCID: 0000-0001-5736-5930 and Jones, Gareth J.F. ORCID: 0000-0003-2923-8365 (2016) FaDA: fast document aligner using word embedding. Prague Bulletin of Mathematical Linguistics (106). pp. 169-179. ISSN 1804-0462
Abstract
FaDA is a free/open-source tool for aligning multilingual documents. It employs a novel
crosslingual information retrieval (CLIR)-based document-alignment algorithm involving the
distances between embedded word vectors in combination with the word overlap between the
source-language and the target-language documents. In this approach, we initially construct a
pseudo-query from a source-language document. We then represent the target-language documents and the pseudo-query as word vectors to find the average similarity measure between
them. This word vector-based similarity measure is then combined with the term overlap-based
similarity. Our initial experiments show that a standard Statistical Machine Translation (SMT)-based
approach is outperformed by our CLIR-based approach in finding the correct alignment
pairs. In addition to this, subsequent experiments with the word vector-based method show
further improvements in the performance of the system.
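The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the FaDA implementation: the frequency-based pseudo-query, the averaging of word vectors into one vector per text, the overlap ratio, the linear weight `alpha`, and the toy embedding table `EMB` are all assumptions made for illustration. It also assumes the pseudo-query terms and the target-language terms already share one vocabulary and one embedding space.

```python
import math
from collections import Counter

# Toy 2-dimensional embeddings (hypothetical values, for illustration only).
EMB = {
    "bank": [1.0, 0.0], "money": [0.9, 0.1], "loan": [0.8, 0.2],
    "river": [0.0, 1.0], "water": [0.1, 0.9], "fish": [0.2, 0.8],
}

def pseudo_query(tokens, k=3):
    """Take the k most frequent tokens as a pseudo-query (assumed heuristic)."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:k]

def mean_vector(tokens, emb):
    """Average the vectors of all tokens that have an embedding."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is missing or zero."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def overlap(query, doc_tokens):
    """Fraction of query terms that also occur in the document."""
    q = set(query)
    if not q:
        return 0.0
    return len(q & set(doc_tokens)) / len(q)

def combined_score(query, doc_tokens, emb, alpha=0.5):
    """Linear mix of vector similarity and term overlap (alpha is an assumption)."""
    vec_sim = cosine(mean_vector(query, emb), mean_vector(doc_tokens, emb))
    return alpha * vec_sim + (1 - alpha) * overlap(query, doc_tokens)

def align(source_tokens, targets, emb, alpha=0.5, k=3):
    """Return the name of the target document with the highest combined score."""
    query = pseudo_query(source_tokens, k)
    return max(targets, key=lambda name: combined_score(query, targets[name], emb, alpha))

SOURCE = ["bank", "loan", "bank", "money"]
TARGETS = {
    "doc_finance": ["money", "loan", "bank"],
    "doc_nature": ["river", "water", "fish"],
}

if __name__ == "__main__":
    print(align(SOURCE, TARGETS, EMB, alpha=0.5, k=2))
```

Setting `alpha` to 0 reduces the score to term overlap alone, and setting it to 1 uses only the word-vector similarity, which makes it easy to compare the two signals on the same data.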
Metadata
Item Type: | Article (Published) |
---|---|
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Initiatives and Centres > ADAPT |
Publisher: | PBML |
Official URL: | http://dx.doi.org/10.1515/pralin-2016-0016 |
Copyright Information: | © 2016 De Gruyter Open. Distributed under CC BY-NC-ND |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland in the ADAPT Centre (Grant 13/RC/2106) (www.adaptcentre.ie) at Dublin City University |
ID Code: | 23310 |
Deposited On: | 17 May 2019 13:16 by Thomas Murtagh. Last Modified 05 May 2023 16:27 |
Documents
Full text available as:
PDF (185kB)