Low-resource neural machine translation for Southern African languages

Thesis (MSc)--Stellenbosch University, 2021.

Bibliographic Details
Main Author: Nyoni, Evander EL-Tabonah
Other Authors: Bassett, Bruce
Format: Thesis
Language: en_ZA
Published: 2021
Subjects: transfer learning; multilingual learning; zero-shot learning; BLEU
Access status: Open Access
Identifier: oai:scholar.sun.ac.za:10019.1/123667
Institution: Stellenbosch University (South Africa)
Last indexed: 2026-06-10T12:40:53.839Z
License: Not specified (see source repository)
Provenance: Harvested via OAI-PMH from SUNScholar (Stellenbosch University Repository)
Contributors: Nyoni, Evander EL-Tabonah; Bassett, Bruce; Brink, Willie

Abstract:
The majority of African languages have not fully benefited from recent advances in machine translation because of a lack of data. Motivated by this challenge, we leverage and compare transfer learning, multilingual learning and zero-shot learning on three Southern Bantu languages (isiZulu, isiXhosa and Shona) and English. We focus primarily on the English-to-isiZulu pair, since it has the smallest training set (30,000 sentence pairs), just 28% of the average size of the other corpora. We demonstrate the importance of language similarity for English-to-isiZulu translation by comparing transfer learning and multilingual learning using the English-to-isiXhosa (similar) and English-to-Shona (dissimilar) tasks. We further show that multilingual learning is the best training protocol when there is sufficient data, with BLEU gains on the English-to-isiZulu task of 3.8 over transfer learning and 7.9 over zero-shot learning. Our findings also show that zero-shot learning outperforms a baseline model trained from scratch when little English-to-isiZulu data is available. Our best model improves on the previous state-of-the-art English-to-isiZulu BLEU score by more than 10 points. Taken together, our findings highlight the potential of leveraging the inter-relations within and between South Eastern Bantu languages to improve translation in low-resource settings.

An Afrikaans summary (Afrikaanse opsomming) restating the abstract is also included in the record.

Repository timestamps: 2021-09-20T15:26:42Z, 2021-12-22T14:14:54Z
Date: 2021-12
File format: application/pdf
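The abstract reports its results as BLEU differences (3.8, 7.9 and more than 10 points). As a rough illustration of what that metric measures, here is a minimal corpus-level BLEU sketch, assuming one reference per sentence, whitespace tokenisation, uniform 4-gram weights and no smoothing. It is not the scorer used in the thesis; published scores are usually computed with a standard tool such as sacreBLEU, whose tokenisation and defaults differ, so numbers from this sketch are not comparable to the reported ones. The function names `ngrams` and `corpus_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Illustrative corpus BLEU (0-100 scale).

    Assumptions: one reference per hypothesis, whitespace tokenisation,
    uniform weights over 1..max_n-grams, brevity penalty, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams proposed by the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, a zero precision at any order gives BLEU = 0
    # (this also covers orders where the hypotheses have no n-grams at all).
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], refs))  # identical output
    print(corpus_bleu(["the cat sat on the"], refs))      # too short: brevity penalty
```

An exact match scores 100, while the truncated hypothesis keeps perfect n-gram precision but loses about 18 points to the brevity penalty. On this 0-100 scale, a "gain of 3.8" simply means one system's score is 3.8 points higher than another's on the same test set.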
URL: http://hdl.handle.net/10019.1/123667