Includes bibliographical references.
| Main Author: | Forshaw, Gareth William |
|---|---|
| Other Authors: | Berman, Sonia |
| Format: | Thesis |
| Language: | English |
| Published: | Department of Computer Science, 2015 |
| Subjects: | Information Technology |
| _version_ | 1867613843740426240 |
|---|---|
| access_status_str | Open Access |
| author | Forshaw,Gareth William |
| author2 | Berman, Sonia |
| author_browse | Berman, Sonia Forshaw,Gareth William |
| author_facet | Berman, Sonia Forshaw,Gareth William |
| author_sort | Forshaw,Gareth William |
| collection | Thesis |
| description | Includes bibliographical references. |
| format | Thesis |
| id | oai:open.uct.ac.za:11427/12930 |
| institution | University of Cape Town (South Africa) |
| language | eng |
| last_indexed | 2026-06-10T12:42:35.740Z |
| license_str | Not specified — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository |
| publishDate | 2015 |
| publishDateRange | 2015 |
| publishDateSort | 2015 |
| publisher | Department of Computer Science |
| publisherStr | Department of Computer Science |
| record_format | dspace |
| source_str | UCTD — University of Cape Town Open Access Repository |
| spelling | oai:open.uct.ac.za:11427/12930 Semi-automatic matching of semi-structured data updates Forshaw,Gareth William Berman, Sonia Information Technology Includes bibliographical references. Data matching, also referred to as data linkage or field matching, is a technique used to combine multiple data sources into one data set. Data matching is used for data integration in a number of sectors and industries, from politics and health care to scientific applications. The motivation for this study was the observation of the day-to-day struggles of a large non-governmental organisation (NGO) in managing its membership database. With a membership base of close to 2.4 million, the challenges it faces in capturing and processing semi-structured membership updates are monumental. Updates arrive from the field in a multitude of formats, often incomplete and unstructured, and expert knowledge is geographically localised. These issues are compounded by an extremely complex organisational hierarchy and a general lack of data validation processes. An online system was proposed for pre-processing input and then matching it against the membership database. Termed the Data Pre-Processing and Matching System (DPPMS), it allows for single or bulk updates. Based on the success of the DPPMS with the NGO’s membership database, it was subsequently used for pre-processing and data matching of semi-structured patient and financial customer data. Using the semi-automated DPPMS rather than a clerical data matching system, true positive matches increased by 21% while false negative matches decreased by 20%. The Recall, Precision and F-Measure values all improved and the risk of false positives diminished. The DPPMS was unable to match approximately 8% of provided records; this was largely due to human error during initial data capture. While the DPPMS greatly diminished the reliance on experts, their role remained pivotal during the final stage of the process. 2015-05-27T04:11:15Z 2015-05-27T04:11:15Z 2014 Master Thesis Masters MSc http://hdl.handle.net/11427/12930 eng application/pdf Department of Computer Science Faculty of Science University of Cape Town |
| spellingShingle | Information Technology Forshaw,Gareth William Semi-automatic matching of semi-structured data updates |
| thesis_degree_str | Master's |
| title | Semi-automatic matching of semi-structured data updates |
| title_full | Semi-automatic matching of semi-structured data updates |
| title_fullStr | Semi-automatic matching of semi-structured data updates |
| title_full_unstemmed | Semi-automatic matching of semi-structured data updates |
| title_short | Semi-automatic matching of semi-structured data updates |
| title_sort | semi automatic matching of semi structured data updates |
| topic | Information Technology |
| url | http://hdl.handle.net/11427/12930 |
| work_keys_str_mv | AT forshawgarethwilliam semiautomaticmatchingofsemistructureddataupdates |
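
The abstract in this record reports its results as Recall, Precision and F-Measure. For readers unfamiliar with these measures, the sketch below shows their standard definitions in terms of true positives, false positives and false negatives. The function name and the counts are hypothetical illustrations; they are not taken from the thesis, and the thesis may compute its measures differently.

```python
def precision_recall_f_measure(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard matching-quality measures from raw match counts.

    tp: record pairs correctly matched (true positives)
    fp: record pairs matched in error (false positives)
    fn: true matches that were missed (false negatives)
    """
    # Guard against division by zero when no matches were made or expected.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F-Measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f_measure = precision_recall_f_measure(tp=80, fp=10, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```

Under these definitions, raising true positives while lowering false negatives, as the abstract describes, increases recall directly; precision depends additionally on how many false positives are produced.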