Creating a strong statistical machine translation system by combining different decoders

Machine translation is a very important field in Natural Language Processing. The need for machine translation arises due to the increasing amount of data available online. Most of our data now is digital and this is expected to increase over time. Since human manual translation takes a lot of time...

Bibliographic Details
Main Author: ElMaghraby, Ayah
Format: Thesis
Published: AUC Knowledge Fountain 2017
_version_ 1867613409415004160
access_status_str Open Access
author ElMaghraby, Ayah
author_browse ElMaghraby, Ayah
author_facet ElMaghraby, Ayah
author_sort ElMaghraby, Ayah
collection Thesis
dc_rights_str_mv The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
description Machine translation is an important field in Natural Language Processing. The need for machine translation arises from the increasing amount of data available online. Most of our data is now digital, and this share is expected to increase over time. Since manual human translation takes a lot of time and effort, machine translation is needed to cover all of the available languages. Much research has been done to make machine translation faster and more reliable between different language pairs. Machine translation is now being coupled with deep learning and neural networks. New topics in machine translation are being studied and tested, such as applying neural machine translation as a replacement for classical statistical machine translation. In this thesis, we study the effect of data preprocessing and decoder type on translation output. We then demonstrate two ways to enhance translation from English to Arabic. The first approach uses a two-decoder system: the first decoder translates from English to Arabic, and the second is a post-processing decoder that retranslates the first Arabic output into Arabic again to fix some of the translation errors. We then study the results of different kinds of decoders and their contributions to the test set. The results of this study lead to the second approach, which combines different decoders to create a stronger one. The second approach uses a classifier to categorize the English sentences based on their structure. The output of the classifier is the decoder best suited to translate the English sentence. Both approaches increased the BLEU score, albeit by different amounts. The classifier showed an increase of ~0.1 BLEU points, while the post-processing decoder showed an increase of between ~0.3 and ~11 BLEU points on two different test sets. Finally, we compare our results to Google Translate to see how well we are doing in comparison to a well-known translator. Our best machine translation system scored 5 absolute points higher than Google Translate on the ISI corpus test set, and 9 absolute points lower on the UN corpus test set.
format Thesis
id oai:fount.aucegypt.edu:etds-1291
institution American University in Cairo (Egypt)
last_indexed 2026-06-10T12:35:41.195Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate 2017
publishDateRange 2017
publishDateSort 2017
publisher AUC Knowledge Fountain
publisherStr AUC Knowledge Fountain
record_format dspace
source_str AUC Knowledge Fountain — bepress
spelling oai:fount.aucegypt.edu:etds-1291 Creating a strong statistical machine translation system by combining different decoders ElMaghraby, Ayah Machine translation is an important field in Natural Language Processing. The need for machine translation arises from the increasing amount of data available online. Most of our data is now digital, and this share is expected to increase over time. Since manual human translation takes a lot of time and effort, machine translation is needed to cover all of the available languages. Much research has been done to make machine translation faster and more reliable between different language pairs. Machine translation is now being coupled with deep learning and neural networks. New topics in machine translation are being studied and tested, such as applying neural machine translation as a replacement for classical statistical machine translation. In this thesis, we study the effect of data preprocessing and decoder type on translation output. We then demonstrate two ways to enhance translation from English to Arabic. The first approach uses a two-decoder system: the first decoder translates from English to Arabic, and the second is a post-processing decoder that retranslates the first Arabic output into Arabic again to fix some of the translation errors. We then study the results of different kinds of decoders and their contributions to the test set. The results of this study lead to the second approach, which combines different decoders to create a stronger one. The second approach uses a classifier to categorize the English sentences based on their structure. The output of the classifier is the decoder best suited to translate the English sentence. Both approaches increased the BLEU score, albeit by different amounts. The classifier showed an increase of ~0.1 BLEU points, while the post-processing decoder showed an increase of between ~0.3 and ~11 BLEU points on two different test sets. Finally, we compare our results to Google Translate to see how well we are doing in comparison to a well-known translator. Our best machine translation system scored 5 absolute points higher than Google Translate on the ISI corpus test set, and 9 absolute points lower on the UN corpus test set. 2017-02-01T08:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/292 https://fount.aucegypt.edu/context/etds/article/1291/viewcontent/Thesis.pdf The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy. Theses and Dissertations AUC Knowledge Fountain Statistical Machine Translation English to Arabic Translation
spellingShingle Statistical Machine Translation
English to Arabic Translation
ElMaghraby, Ayah
Creating a strong statistical machine translation system by combining different decoders
title Creating a strong statistical machine translation system by combining different decoders
title_full Creating a strong statistical machine translation system by combining different decoders
title_fullStr Creating a strong statistical machine translation system by combining different decoders
title_full_unstemmed Creating a strong statistical machine translation system by combining different decoders
title_short Creating a strong statistical machine translation system by combining different decoders
title_sort creating a strong statistical machine translation system by combining different decoders
topic Statistical Machine Translation
English to Arabic Translation
url https://fount.aucegypt.edu/etds/292
https://fount.aucegypt.edu/context/etds/article/1291/viewcontent/Thesis.pdf
work_keys_str_mv AT elmaghrabyayah creatingastrongstatisticalmachinetranslationsystembycombiningdifferentdecoders