Grammars for generating isiXhosa and isiZulu weather bulletin verbs

The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast and large scale producer, automated or otherwise, of textual weather summarie...

Bibliographic Details
Main Author: Mahlaza, Zola
Other Authors: Keet, C Maria
Format: Thesis
Language: English
Published: Department of Computer Science 2018
Subjects: Natural Language Generation; Computational Linguistics
_version_ 1867613323647778816
access_status_str Open Access
author Mahlaza, Zola
author2 Keet, C Maria
author_browse Keet, C Maria
Mahlaza, Zola
author_facet Keet, C Maria
Mahlaza, Zola
author_sort Mahlaza, Zola
collection Thesis
description The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due to, among other things, the complexity of Nguni languages. The structure of these languages is very different from that of Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. To generate weather text in isiXhosa and isiZulu, we restricted our scope to verbs only in order to keep the work manageable. In particular, we developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the parse trees of the developed rules, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between the verbs' components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that require conditioning to the total number of strings that can be generated.
We found that the phonological conditioning process affects at least 45% of strings for isiXhosa and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor. However, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve the generated text, given the high ratio of verbs it affects.
format Thesis
id oai:open.uct.ac.za:11427/27997
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:34:17.944Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2018
publishDateRange 2018
publishDateSort 2018
publisher Department of Computer Science
publisherStr Department of Computer Science
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/27997 Grammars for generating isiXhosa and isiZulu weather bulletin verbs Mahlaza, Zola Keet, C Maria Natural Language Generation Computational Linguistics 2018-05-07T14:23:55Z 2018-05-07T14:23:55Z 2018 Master Thesis Masters MSc http://hdl.handle.net/11427/27997 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle Natural Language Generation
Computational Linguistics
Mahlaza, Zola
Grammars for generating isiXhosa and isiZulu weather bulletin verbs
thesis_degree_str Master's
title Grammars for generating isiXhosa and isiZulu weather bulletin verbs
title_full Grammars for generating isiXhosa and isiZulu weather bulletin verbs
title_fullStr Grammars for generating isiXhosa and isiZulu weather bulletin verbs
title_full_unstemmed Grammars for generating isiXhosa and isiZulu weather bulletin verbs
title_short Grammars for generating isiXhosa and isiZulu weather bulletin verbs
title_sort grammars for generating isixhosa and isizulu weather bulletin verbs
topic Natural Language Generation
Computational Linguistics
url http://hdl.handle.net/11427/27997
work_keys_str_mv AT mahlazazola grammarsforgeneratingisixhosaandisizuluweatherbulletinverbs