Full Text Available

Natural language interface to relational database: a simplified customization approach

Natural language interfaces to databases (NLIDB) allow end-users with no knowledge of a formal language like SQL to query databases. One of the main open problems currently investigated is the development of NLIDB systems that are easily portable across several domains. The present study focuses on...

Bibliographic Details
Main Author:	Mvumbi, Tresor
Other Authors:	Keet, Maria
Format:	Thesis
Language:	English
Published:	Department of Computer Science 2017
Subjects:	Computer Science

_version_	1867613224637038592
access_status_str	Open Access
author	Mvumbi, Tresor
author2	Keet, Maria
author_browse	Keet, Maria Mvumbi, Tresor
author_facet	Keet, Maria Mvumbi, Tresor
author_sort	Mvumbi, Tresor
collection	Thesis
description	Natural language interfaces to databases (NLIDB) allow end-users with no knowledge of a formal language like SQL to query databases. One of the main open problems currently investigated is the development of NLIDB systems that are easily portable across several domains. The present study focuses on the development and evaluation of methods that simplify the customization of NLIDBs targeting relational databases without sacrificing coverage and accuracy. This goal is approached by introducing two authoring frameworks that aim to reduce the workload required to port an NLIDB to a new domain. The first authoring approach is called top-down; it assumes the existence of a corpus of unannotated natural language sample questions, which is used to pre-harvest key lexical terms to simplify customization. The top-down approach further reduces the configuration workload by automatically including the semantics for the negative form of verbs and the comparative and superlative forms of adjectives in the configuration model. The second authoring approach is bottom-up; it explores the possibility of building a configuration model with no manual customization, using the information from the database schema and an off-the-shelf dictionary. The evaluation of the prototype system with geo-query, a benchmark query corpus, has shown that the top-down approach significantly reduces the customization workload: 93% of the entries defining the meaning of verbs and adjectives, which represent the hard work, were automatically generated by the system; only 26 straightforward mappings and 3 manual definitions of meaning were required for customization. The top-down approach correctly answered 74.5% of the questions. The bottom-up approach, however, correctly answered only 1/3 of the questions due to an insufficient lexicon and missing semantics. The use of an external lexicon did not improve the system's accuracy. The bottom-up model nevertheless correctly answered 3/4 of the 105 simple retrieval questions in the query corpus that do not require nesting. Therefore, the bottom-up approach can be useful for building an initial lightweight configuration model that can be incrementally refined, for example by using the failed queries to train a top-down model. The experimental results for top-down suggest that it is indeed possible to construct a portable NLIDB that reduces the configuration effort while maintaining decent coverage and accuracy.
format	Thesis
id	oai:open.uct.ac.za:11427/23058
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:32:44.899Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2017
publisher	Department of Computer Science
publisherStr	Department of Computer Science
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/23058 Natural language interface to relational database: a simplified customization approach Mvumbi, Tresor Keet, Maria Bagula, Antoine Computer Science 2017-01-25T14:11:07Z 2017-01-25T14:11:07Z 2016 Master Thesis Masters MSc http://hdl.handle.net/11427/23058 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town
spellingShingle	Computer Science Mvumbi, Tresor Natural language interface to relational database: a simplified customization approach
thesis_degree_str	Master's
title	Natural language interface to relational database: a simplified customization approach
topic	Computer Science
url	http://hdl.handle.net/11427/23058
work_keys_str_mv	AT mvumbitresor naturallanguageinterfacetorelationaldatabaseasimplifiedcustomizationapproach
