Semi-automatic matching of semi-structured data updates

Includes bibliographical references.

Bibliographic Details
Main Author: Forshaw, Gareth William
Other Authors: Berman, Sonia
Format: Thesis
Language: English
Published: Department of Computer Science 2015
Subjects: Information Technology
access_status_str Open Access
id oai:open.uct.ac.za:11427/12930
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:42:35.740Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
source_str UCTD — University of Cape Town Open Access Repository
Abstract: Data matching, also referred to as data linkage or field matching, is a technique used to combine multiple data sources into one data set. It is used for data integration in a number of sectors and industries, from politics and health care to scientific applications.

The motivation for this study was the observation of the day-to-day struggles of a large non-governmental organisation (NGO) in managing its membership database. With a membership base of close to 2.4 million, the challenges it faces in capturing and processing semi-structured membership updates are monumental. Updates arrive from the field in a multitude of formats, often incomplete and unstructured, and expert knowledge is geographically localised. These issues are compounded by an extremely complex organisational hierarchy and a general lack of data validation processes.

An online system was proposed for pre-processing input and then matching it against the membership database. Termed the Data Pre-Processing and Matching System (DPPMS), it allows for single or bulk updates. Based on the success of the DPPMS with the NGO’s membership database, it was subsequently used for pre-processing and data matching of semi-structured patient and financial customer data.

When the semi-automated DPPMS was used in place of a clerical data matching system, true positive matches increased by 21% while false negative matches decreased by 20%. The Recall, Precision and F-Measure values all improved and the risk of false positives diminished. The DPPMS was unable to match approximately 8% of provided records; this was largely due to human error during initial data capture. While the DPPMS greatly diminished the reliance on experts, their role remained pivotal during the final stage of the process.

Record details: 2015-05-27T04:11:15Z; 2014; Master Thesis; Masters; MSc; http://hdl.handle.net/11427/12930; eng; application/pdf; Department of Computer Science, Faculty of Science, University of Cape Town
thesis_degree_str Master's
topic Information Technology
url http://hdl.handle.net/11427/12930