Shterionov, Dimitar ORCID: 0000-0001-6300-797X, Du, Jinhua ORCID: 0000-0002-3267-4881, Palminteri, Marc Anthony, Casanellas, Laura, O'Dowd, Tony and Way, Andy ORCID: 0000-0001-5736-5930 (2016) Improving KantanMT training efficiency with fast align. In: Twelfth Conference of The Association for Machine Translation in the Americas, 28 Oct - 1 Nov 2016, Austin, TX, USA.
Abstract
In recent years, statistical machine translation (SMT) has been widely deployed in translators’
workflows, significantly improving productivity. However, before an SMT system can be invoked
to translate an unseen text, an SMT engine needs to be built. The building speed of the engine
is therefore essential for the translation workflow: the sooner an engine is built, the
sooner it can be exploited.
With the increase in the computational capabilities of recent technology, the building time for
an SMT engine has decreased substantially. For example, cloud-based SMT providers, such as
KantanMT, can build high-quality, ready-to-use, custom SMT engines in less than a couple of
days. To speed up this process further, we look into optimizing the word alignment process
that takes place while building the SMT engine. Namely, we substitute the word alignment
tool used by the KantanMT pipeline, Giza++, with a more efficient one, fast_align.
In this work we present the design and implementation of the KantanMT pipeline that uses
fast_align in place of Giza++. We also compare the two word alignment tools on industry
data and report our findings. To the best of our knowledge, such an
extensive empirical evaluation of the two tools has not been done before.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Initiatives and Centres > ADAPT |
Published in: | Beregovaya, Olga, (ed.) Proceedings of AMTA 2016: MT Users' Track. 2. AMTA. |
Publisher: | AMTA |
Copyright Information: | © 2016 the Authors. CC-BY-ND |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
ID Code: | 23348 |
Deposited On: | 22 May 2019 15:34 by Thomas Murtagh. Last Modified 05 May 2020 15:58 |