Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2020.
| Other Authors: | Marivate, Vukosi |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | University of Pretoria, 2022 |
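| Subjects: | UCTD; Financial risk; Company annual reports; Natural language processing (NLP); Machine learning; Classification; Closed domain question answering |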
| _version_ | 1867613669894914048 |
|---|---|
| access_status_str | Open Access |
| author2 | Marivate, Vukosi |
| collection | Thesis |
| dc_rights_str_mv | © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |
| description | Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2020. |
| format | Thesis |
| id | oai:repository.up.ac.za:2263/83181 |
| institution | University of Pretoria (South Africa) |
| language | English |
| last_indexed | 2026-06-10T12:39:49.883Z |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository |
| publishDate | 2022 |
| publisher | University of Pretoria |
| publisherStr | University of Pretoria |
| record_format | dspace |
| source_str | UPSpace — University of Pretoria Institutional Repository |
| spelling | oai:repository.up.ac.za:2263/83181 Identifying financial risk through natural language processing of company annual reports Marivate, Vukosi u19333634@tuks.co.za Theron, Jacques Lamont UCTD Financial risk Company annual reports Natural language processing (NLP) Machine learning Classification Closed domain question answering Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2020. A pipeline was developed to source annual reports of South African banks and convert them into a novel corpus. Plain text was extracted from unstructured reports whilst maintaining lineage to its coordinates in the original Portable Document Format (PDF). Initial experiments with Natural Language Processing (NLP) and machine learning classification aim at exposing financial risk inherent in the text as opposed to analysing the numerical financial values. Failed financial or governance events related to banks in the public domain were used to label annual reports as high risk. The balance of the reports were annotated as low risk to formulate a binary classification problem for machine learning. Bag of words and word embedding techniques were applied and supplemented with linguistic features like tone, uncertainty and causality based on available wordlists. Classifiers were built using traditional logistic regression and Support Vector Machine (SVM), as well as modern Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) deep learning models. The corpus and initial findings provide a baseline for further research. Applications include an early warning system for regulators as well as question answering based on the content. Computer Science MIT (Big Data Science) Unrestricted 2022-01-12T06:00:02Z 2022-01-12T06:00:02Z 2021/04/13 2020 Mini Dissertation * A2021 http://hdl.handle.net/2263/83181 en © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria |
| spellingShingle | UCTD Financial risk Company annual reports Natural language processing (NLP) Machine learning Classification Closed domain question answering Identifying financial risk through natural language processing of company annual reports |
| title | Identifying financial risk through natural language processing of company annual reports |
| title_sort | identifying financial risk through natural language processing of company annual reports |
| topic | UCTD Financial risk Company annual reports Natural language processing (NLP) Machine learning Classification Closed domain question answering |
| url | http://hdl.handle.net/2263/83181 |