Exploring the application of Natural Language Processing to scientific medical cannabis publications

Cannabis has become recognised internationally as a powerful medicinal plant. The explosion of clinical research on cannabis has made it difficult for researchers and medical professionals to keep up to date with new findings. Analyzing the large quantities of available text data using natural langu...

Bibliographic Details
Main Author:	de Beer, James Charles
Other Authors:	Nyirenda, Juwa
Format:	Thesis
Language:	English
Published:	Department of Statistical Sciences 2023
Subjects:	Statistical Sciences

_version_	1867613212226093056
access_status_str	Open Access
author	de Beer, James Charles
author2	Nyirenda, Juwa
author_browse	Nyirenda, Juwa; de Beer, James Charles
author_facet	Nyirenda, Juwa; de Beer, James Charles
author_sort	de Beer, James Charles
collection	Thesis
description	Cannabis has become recognised internationally as a powerful medicinal plant. The explosion of clinical research on cannabis has made it difficult for researchers and medical professionals to keep up to date with new findings. Analyzing the large quantities of available text data using natural language processing and machine learning algorithms could improve the speed and accuracy at which cannabis research is processed, as well as expose hitherto unknown connections between cannabis compounds and the treatment of health conditions. In turn, this would help direct future research and clinical trials. This thesis aims to develop an appropriate method to extract the key connections between cannabis compounds, human physiology and disease from the existing medical literature. First, natural language processing techniques (such as document clustering and topic modelling, global vector word embeddings and supervised document classifiers) are used to group 500 journal articles from the general literature on cannabis according to broad research topics; analyse the interaction between cannabis compounds, human physiology and diseases; and train a classifier to classify unseen documents. Second, the connections generated through this quantitative process are assessed qualitatively against those in a manual dataset of research findings from more than 500 studies collated over a number of years and provided by a medical company specialising in cannabis research. The results indicate that the methods developed were able to effectively and accurately demonstrate connections between cannabis plant compounds and diseases. Hence, the working code accurately reproduced the results of manual analysis. This was shown by the close similarity of ranked keywords to diseases.
The unsupervised methods were able to effectively cluster and model topic distributions between the data to group documents by topic, while the supervised learning methods were able to accurately train models based on these suggestions, thereby solving a real-world practical problem in data management and analysis.
format	Thesis
id	oai:open.uct.ac.za:11427/37173
institution	University of Cape Town (South Africa)
language	eng
last_indexed	2026-06-10T12:32:33.381Z
license_str	Not specified — see source repository
provenance_str_mv	Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate	2023
publishDateRange	2023
publishDateSort	2023
publisher	Department of Statistical Sciences
publisherStr	Department of Statistical Sciences
record_format	dspace
source_str	UCTD — University of Cape Town Open Access Repository
spelling	oai:open.uct.ac.za:11427/37173 Exploring the application of Natural Language Processing to scientific medical cannabis publications de Beer, James Charles Nyirenda, Juwa Statistical Sciences Cannabis has become recognised internationally as a powerful medicinal plant. The explosion of clinical research on cannabis has made it difficult for researchers and medical professionals to keep up to date with new findings. Analyzing the large quantities of available text data using natural language processing and machine learning algorithms could improve the speed and accuracy at which cannabis research is processed, as well as expose hitherto unknown connections between cannabis compounds and the treatment of health conditions. In turn, this would help direct future research and clinical trials. This thesis aims to develop an appropriate method to extract the key connections between cannabis compounds, human physiology and disease from the existing medical literature. First, natural language processing techniques (such as document clustering and topic modelling, global vector word embeddings and supervised document classifiers) are used to group 500 journal articles from the general literature on cannabis according to broad research topics; analyse the interaction between cannabis compounds, human physiology and diseases; and train a classifier to classify unseen documents. Second, the connections generated through this quantitative process are assessed qualitatively against those in a manual dataset of research findings from more than 500 studies collated over a number of years and provided by a medical company specialising in cannabis research. The results indicate that the methods developed were able to effectively and accurately demonstrate connections between cannabis plant compounds and diseases. Hence, the working code accurately reproduced the results of manual analysis. This was shown by the close similarity of ranked keywords to diseases.
The unsupervised methods were able to effectively cluster and model topic distributions between the data to group documents by topic, while the supervised learning methods were able to accurately train models based on these suggestions, thereby solving a real-world practical problem in data management and analysis. 2023-03-03T08:54:09Z 2023-03-03T08:54:09Z 2022 2023-02-20T12:31:55Z Master Thesis Masters MSc http://hdl.handle.net/11427/37173 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle	Statistical Sciences de Beer, James Charles Exploring the application of Natural Language Processing to scientific medical cannabis publications
thesis_degree_str	Master's
title	Exploring the application of Natural Language Processing to scientific medical cannabis publications
title_full	Exploring the application of Natural Language Processing to scientific medical cannabis publications
title_fullStr	Exploring the application of Natural Language Processing to scientific medical cannabis publications
title_full_unstemmed	Exploring the application of Natural Language Processing to scientific medical cannabis publications
title_short	Exploring the application of Natural Language Processing to scientific medical cannabis publications
title_sort	exploring the application of natural language processing to scientific medical cannabis publications
topic	Statistical Sciences
url	http://hdl.handle.net/11427/37173
work_keys_str_mv	AT debeerjamescharles exploringtheapplicationofnaturallanguageprocessingtoscientificmedicalcannabispublications
