Foundations for reusable and maintainable surface realisers for isiXhosa and isiZulu

Bibliographic Details
Main Author: Mahlaza, Zola
Other Authors: Keet, Catharina
Format: Thesis
Language: English
Published: Department of Computer Science 2023
Description
Summary: Natural Language Generation (NLG) systems are used to generate text in order to reduce manual effort. Most existing systems are built to support European languages with simple and/or well-documented grammars. IsiZulu and isiXhosa, two of the largest South African languages by number of first-language speakers, have received little attention in the field despite the potential impact of NLG systems for their speakers. The existing NLG systems created for these languages rely on ad hoc methods for surface realisation, which is the process of generating text from a system's abstract representations of sentences. Because the languages are low-resourced and grammatically rich, these methods combine templates and grammar rules. However, they do not use the languages' scant linguistic resources efficiently, they do not rely on a template specification that supports interoperability, and they do not use an architecture that yields easy-to-maintain software, since no such architecture exists. The objectives of this thesis are to create the foundations for easy-to-maintain and reusable surface realisation tools for isiXhosa and isiZulu by establishing a principled way to pair templates and grammar rules, organising surface realisation modules so that the components are modular, analysable, and reusable, and creating template specifications that are interoperable. A further objective is to demonstrate that these objectives can be achieved while generating good-quality isiXhosa and isiZulu text in the data-to-text and knowledge-to-text areas. We achieve these objectives by developing a model-based approach to pairing templates and Computational Grammar Rules (CGRs), which yields linguistically well-founded templates that are suitable for low-resourced and grammatically rich languages.
To obtain interoperable template specifications, we created a task ontology using a bottom-up approach and evaluated it via the standard practice of using Competency Questions (CQs) and removing inconsistencies with an automated reasoner. We also created an architecture that satisfies the largest number of maintainability features from the BS ISO/IEC 25010:2011 standard. In addition, we created proof-of-concept text generation tools that use the proposed approaches and artifacts to generate isiZulu and isiXhosa text, and we surveyed speakers of the two languages to establish the quality of the text. We found that most (57%) of the generated isiXhosa texts are judged positively and that there is no consensus on the remaining texts, possibly due to dialectal differences. In addition, most (83%) of the generated isiZulu texts are also judged positively, in that at most one participant considers each of them to be ungrammatical and unacceptable.
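The summary describes pairing templates with Computational Grammar Rules (CGRs) for grammatically rich languages. As a minimal, hypothetical sketch (not the thesis's model-based approach or its tools), the Python below illustrates the general idea: a template fixes the sentence structure, while a small grammar rule realises the verb so that its subject concord agrees with the noun class of the subject. The concord table covers only a few isiZulu noun classes and only the present-tense "-ya-" form with consonant-initial verb roots; the names `SUBJECT_CONCORD`, `Noun`, `present_tense_verb`, `TEMPLATE` and `realise` are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical subject-concord table for a handful of isiZulu noun classes;
# a full CGR would cover every class and far more verb forms.
SUBJECT_CONCORD = {1: "u", 2: "ba", 7: "si", 8: "zi", 9: "i", 10: "zi"}


@dataclass(frozen=True)
class Noun:
    form: str        # surface form, e.g. "umfundi" (student)
    noun_class: int  # noun class number, e.g. 1


def present_tense_verb(subject: Noun, verb_root: str) -> str:
    """Sketch of a CGR: subject concord + present-tense '-ya-' + verb root.

    Only consonant-initial roots are handled; vowel-initial roots would
    need extra phonological rules that this sketch omits.
    """
    concord = SUBJECT_CONCORD[subject.noun_class]
    return f"{concord}ya{verb_root}"


# The template fixes word order; the verb slot is filled by the CGR above.
TEMPLATE = "{subject} {verb}."


def realise(template: str, subject: Noun, verb_root: str) -> str:
    sentence = template.format(
        subject=subject.form, verb=present_tense_verb(subject, verb_root)
    )
    # Capitalise the first letter of the sentence.
    return sentence[0].upper() + sentence[1:]


print(realise(TEMPLATE, Noun("umfundi", 1), "funda"))   # Umfundi uyafunda.
print(realise(TEMPLATE, Noun("abafundi", 2), "funda"))  # Abafundi bayafunda.
print(realise(TEMPLATE, Noun("izinja", 10), "gijima"))  # Izinja ziyagijima.
```

Keeping agreement in a separate rule rather than hard-coding one template per noun class is what lets a single template serve every class the rule covers; the thesis's own pairing model should be consulted for how this is actually specified.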