Sentence similarity-based source context modelling in PBSMT

Haque, Rejwanul ORCID: 0000-0003-1680-0099, Kumar Naskar, Sudip, Way, Andy ORCID: 0000-0001-5736-5930, Costa-jussà, Marta and Banchs, Rafael E. (2010) Sentence similarity-based source context modelling in PBSMT. In: International Conference on Asian Language Processing 2010, 28-30 Dec. 2010, Harbin, China.

Abstract

Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the lexical and syntactic features explored, supertags (lexical syntactic descriptions that preserve long-range word-to-word dependencies in a sentence) have proven to be effective. These rich contextual features can disambiguate a source phrase on the basis of its local syntactic behaviour. In addition to local contextual information, global contextual information, such as the grammatical structure of a sentence, its length and its n-gram word sequences, could provide further evidence for this phrase-sense disambiguation. In this work, we explore various sentence-similarity features, which measure the similarity between the source sentence to be translated and the source side of the bilingual training sentences, and integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when a sentence-similarity feature is added together with a supertag-based feature.
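The abstract does not define the individual similarity measures. Purely as an illustration, the following Python sketch shows one way a sentence-level similarity could be turned into a phrase-pair feature: each phrase pair is scored by the average similarity between the input sentence and the training sentences it was extracted from. The two measures (clipped n-gram overlap and a length ratio), their equal weighting, the `occurrences` index and all function names are assumptions made for this sketch, not the paper's definitions.

```python
from collections import Counter
from statistics import mean

# Hypothetical illustration only: these measures and names are NOT taken from the paper.


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(query, candidate, max_n=2):
    """Mean clipped n-gram precision of `query` against `candidate` for n = 1..max_n."""
    precisions = []
    for n in range(1, max_n + 1):
        q, c = ngram_counts(query, n), ngram_counts(candidate, n)
        total = sum(q.values())
        # Counter & Counter keeps the minimum count of each shared n-gram (clipping).
        precisions.append(sum((q & c).values()) / total if total else 0.0)
    return mean(precisions)


def length_ratio(query, candidate):
    """Ratio of the shorter to the longer sentence length, in [0, 1]."""
    longest = max(len(query), len(candidate))
    return min(len(query), len(candidate)) / longest if longest else 1.0


def phrase_pair_similarity(input_tokens, training_sources, occurrences, weight=0.5):
    """Score each phrase pair by how similar its training sentences are to the input.

    `occurrences` (an assumed index) maps a (source_phrase, target_phrase) pair to
    the indices of the training sentences it was extracted from.
    Returns a dict: pair -> score in [0, 1].
    """
    scores = {}
    for pair, sentence_ids in occurrences.items():
        sims = [
            weight * ngram_overlap(input_tokens, training_sources[i])
            + (1 - weight) * length_ratio(input_tokens, training_sources[i])
            for i in sentence_ids
        ]
        scores[pair] = mean(sims) if sims else 0.0
    return scores


# Toy usage: two senses of "bank" extracted from different training sentences.
training = [
    "the bank approved the loan".split(),
    "we sat on the river bank".split(),
]
occurrences = {
    ("bank", "银行"): [0],  # financial sense
    ("bank", "河岸"): [1],  # river-bank sense
}
query = "the bank approved my loan".split()
scores = phrase_pair_similarity(query, training, occurrences)
print(scores)  # ('bank', '银行') ≈ 0.825, ('bank', '河岸') ≈ 0.517
```

In a PBSMT decoder, a score of this kind would typically enter the log-linear model as one more feature function whose weight is tuned alongside the others; that integration step is not shown here.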

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Conference
Refereed:	Yes
Uncontrolled Keywords:	sentence similarity; source context information; statistical machine translation
Subjects:	Computer Science > Machine translating
DCU Faculties and Centres:	Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL) DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:	Asian Language Processing (IALP), 2010 International Conference on. IEEE.
Publisher:	IEEE
Official URL:	http://dx.doi.org/10.1109/IALP.2010.45
Copyright Information:	© 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code:	16158
Deposited On:	20 Jun 2011 13:21 by Shane Harper. Last Modified 18 Nov 2020 16:46

Documents

Full text available as:

Sentence_Similarity-Based_Source_Context_Modelling_in_PBSMT.pdf (PDF, 158kB)

DORAS | DCU Research Repository