Passban, Peyman, Liu, Qun ORCID: 0000-0002-7000-1792 and Way, Andy ORCID: 0000-0001-5736-5930 (2016) Enriching phrase tables for statistical machine translation using mixed embeddings. In: COLING, the 26th International Conference on Computational Linguistics, 13-16 Dec 2016, Osaka, Japan. ISBN 978-4-87974-702-0
Abstract
The phrase table is considered to be the main bilingual resource for the phrase-based statistical machine translation (PBSMT) model. During translation, a source sentence is decomposed
into several phrases. The best match of each source phrase is selected among several target-side
counterparts within the phrase table, and processed by the decoder to generate a sentence-level
translation. The best match is chosen according to several factors, including a set of bilingual
features. PBSMT engines by default provide four probability scores in phrase tables which are
considered as the main set of bilingual features. Our goal is to enrich that set of features, as a
better feature set should yield better translations. We propose new scores generated by a Convolutional Neural Network (CNN) which indicate the semantic relatedness of phrase pairs. We
evaluate our model in different experimental settings with different language pairs. We observe
significant improvements when the proposed features are incorporated into the PBSMT pipeline.
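As a rough illustration of what enriching the feature set means in practice, the sketch below appends one extra score to a phrase-table entry that already carries four scores. It is a minimal sketch under stated assumptions, not the authors' method: the `|||`-delimited entry layout, the example entry, the toy vectors, and the helper names `cosine` and `add_feature` are all hypothetical, and a plain cosine similarity stands in for the CNN-generated relatedness score described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_feature(line, score):
    """Append one extra score to the score field of a '|||'-delimited entry."""
    fields = [f.strip() for f in line.split("|||")]
    # Assumed layout: fields[0] = source phrase, fields[1] = target phrase,
    # fields[2] = space-separated scores.
    fields[2] = f"{fields[2]} {score:.4f}"
    return " ||| ".join(fields)


# Hypothetical entry with four scores, plus hypothetical phrase embeddings.
entry = "das haus ||| the house ||| 0.8 0.6 0.7 0.5"
src_vec = [3.0, 4.0]
tgt_vec = [4.0, 3.0]

# Stand-in relatedness score; the paper uses a CNN for this step.
enriched = add_feature(entry, cosine(src_vec, tgt_vec))
print(enriched)
```

The enriched entry carries five scores instead of four, so a decoder reading it would also need a weight for the additional feature.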
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Published in: | Proceedings of COLING, the 26th International Conference on Computational Linguistics: Technical Papers. Coling 2016 conference committee. ISBN 978-4-87974-702-0 |
Publisher: | Coling 2016 conference committee |
Official URL: | https://aclweb.org/anthology/C16-1243 |
Copyright Information: | © 2016 The Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland at ADAPT: Centre for Digital Content Platform Research (Grant 13/RC/2106). |
ID Code: | 23230 |
Deposited On: | 02 May 2019 11:35 by Thomas Murtagh. Last Modified 02 May 2019 11:35 |