Hassan, Hany (2009) Lexical syntax for statistical machine translation. PhD thesis, Dublin City University.
Abstract
Statistical Machine Translation (SMT) is by far the dominant paradigm of Machine Translation. Its dominance can be attributed to many factors, such as accuracy, scalability, computational efficiency and fast adaptation to new languages and domains. However, current approaches to Phrase-based SMT lack the capability to produce more grammatical translations and to handle long-range reordering while maintaining the grammatical structure of the translation output. Recently, SMT researchers have started to focus on extending Phrase-based SMT systems with syntactic knowledge. However, previous techniques have limited capabilities because they introduce redundantly ambiguous syntactic structures, use decoders with limited language models, and incur a high computational cost.
In this thesis, we extend Phrase-based SMT with lexical syntactic descriptions that localize global syntactic information at the word level without introducing redundant syntactic ambiguity. We present a novel model of Phrase-based SMT which integrates linguistic lexical descriptions (supertags) into the target language model and the target side of the translation model. We conduct extensive experiments on two language pairs, Arabic–English and German–English, which show significant improvements over state-of-the-art Phrase-based SMT systems.
Moreover, we introduce a novel Incremental Dependency-based Syntactic Language Model (IDLM) based on wide-coverage incremental CCG parsing, which we integrate into a direct translation SMT system. Our proposed approach is the first to integrate full dependency parsing into SMT systems at an attractive computational cost, since it deploys the linear decoders widely used in Phrase-based SMT systems. The experimental results show a good improvement over a top-ranked state-of-the-art system.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | March 2009 |
Refereed: | No |
Supervisor(s): | Way, Andy and Sima'an, Khalil |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating; Computer Science > Machine learning |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
Funders: | Science Foundation Ireland |
ID Code: | 2320 |
Deposited On: | 02 Apr 2009 16:56 by Andrew Way. Last Modified 19 Jul 2018 14:43 |
Documents
Full text available as: PDF (1MB)