Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu

Natural Language Generation (NLG) systems are used to generate text in order to reduce manual effort. Most existing systems are built to support European languages with simple and/or well-documented grammars. IsiZulu and isiXhosa, two of the largest South African languages by first language speakers...

Bibliographic Details
Main Author: Mahlaza, Zola
Other Authors: Keet, Catharina
Format: Thesis
Language: English
Published: Department of Computer Science 2023
Subjects: Computer Science
_version_ 1867613272855805952
access_status_str Open Access
author Mahlaza, Zola
author2 Keet, Catharina
author_browse Keet, Catharina
Mahlaza, Zola
author_facet Keet, Catharina
Mahlaza, Zola
author_sort Mahlaza, Zola
collection Thesis
description Natural Language Generation (NLG) systems are used to generate text in order to reduce manual effort. Most existing systems are built to support European languages with simple and/or well-documented grammars. IsiZulu and isiXhosa, two of the largest South African languages by number of first-language speakers, have received little attention in the field despite the potential impact of NLG systems for their speakers. The existing NLG systems created for these languages rely on ad hoc methods for surface realisation, the process of generating text from a system's abstract representations of sentences. These methods combine templates and grammar rules since the languages are low-resourced and grammatically rich. However, they do not use their scant linguistic resources efficiently, do not rely on a template specification that supports interoperability, and do not use an architecture that yields easy-to-maintain software, since none exists. The objectives of this thesis are to create the foundations for easy-to-maintain and reusable surface realisation tools for isiXhosa and isiZulu by establishing a principled way to pair templates and grammar rules, organising surface realisation modules such that the components are modular, analysable, and reusable, and creating template specifications that are interoperable. A further objective is to demonstrate that the aforementioned objectives can be achieved while generating good-quality isiXhosa and isiZulu text in the data-to-text and knowledge-to-text areas. We achieve these objectives by developing a model-based approach to pairing templates and Computational Grammar Rules (CGRs) to obtain linguistically well-founded templates that are suitable for low-resourced and grammatically rich languages.
To obtain interoperable template specifications, we created a task ontology using a bottom-up approach and evaluated it via the standard practice of using Competency Questions (CQs) and removing inconsistencies via an automated reasoner. We also created an architecture that satisfies the most maintainability features from the BS ISO/IEC 25010:2011 standard. In addition, we created proof-of-concept text generation tools that use the proposed approaches and artifacts to generate isiZulu and isiXhosa text, and surveyed speakers of the two languages to establish the quality of the text. We found that most (57%) of the generated isiXhosa texts are judged positively and there is no consensus on the remaining texts, possibly due to differences in dialect. In addition, most (83%) of the generated isiZulu texts are also judged positively, as they have at most one participant who considers them to be ungrammatical and unacceptable.
format Thesis
id oai:open.uct.ac.za:11427/37479
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:33:31.121Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/37479 Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu Mahlaza, Zola Keet, Catharina Computer Science 2023-03-17T10:20:15Z 2023-03-17T10:20:15Z 2022 2023-03-17T08:41:17Z Doctoral Thesis Doctoral PhD http://hdl.handle.net/11427/37479 eng application/pdf Department of Computer Science Faculty of Science
spellingShingle Computer Science
Mahlaza, Zola
Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
thesis_degree_str Doctoral
title Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
title_full Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
title_fullStr Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
title_full_unstemmed Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
title_short Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu
title_sort foundations for reusable and maintainable surface realisers for isixhosa and isizulu
topic Computer Science
url http://hdl.handle.net/11427/37479
work_keys_str_mv AT mahlazazola foundationsforreusableandmaintainablesurfacerealisersforisixhosaandisizulu