Transcriptomic profile based cancer disease prediction and patient survival time differentiation

ENGLISH ABSTRACT : Cancer disease is an abnormal growth of cells, which may be caused by mutations in genes which, as a result, alter the way cells function mainly in the way they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs which a...

Bibliographic Details
Main Author: Ofosu Mensah, Samuel
Other Authors: Mazandu, Gaston Kuzamunu
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2018
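Subjects: Cancer Genome Atlas (TCGA); Breast -- Cancer -- Research; RNA–sequencing; Survival analysis (Biometry); UCTD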
access_status_str Open Access
author Ofosu Mensah, Samuel
author2 Mazandu, Gaston Kuzamunu
collection Thesis
dc_rights_str_mv Stellenbosch University
description ENGLISH ABSTRACT: Cancer is an abnormal growth of cells that may be caused by gene mutations which alter how cells function, mainly in how they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs which are expressed and repressed. With the help of transcriptomic technologies such as RNA–sequencing (RNA–seq), it is now possible to profile thousands of genes at once to create a global picture of the functions of cells. Here, the study employs a statistical approach called Significance Analysis of Microarrays (SAM) to identify genes that are differentially expressed in breast cancer patients. Genes with scores greater than a threshold are deemed potentially significant. The genes identified as significantly different serve two purposes. First, the study uses them to predict breast cancer with three machine learning algorithms: random forests, artificial neural networks and support vector machines. Second, clinical details of patients are combined with these genes to build a survival model that predicts the probability of survival and the risk of the event in breast cancer patients. Using The Cancer Genome Atlas (TCGA) as the primary data for the study, SAM reported 23 genes as significantly different. Further investigation revealed that these 23 genes are involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 of the 23 genes contribute significantly to the model. Finally, the log–logistic distribution was found to best describe the survival time of breast cancer patients. Moreover, the survival model revealed that the expression levels of six genes influence the survival probability of a breast cancer patient.
format Thesis
id oai:scholar.sun.ac.za:10019.1/105059
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:44:45.702Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2018
publisher Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/105059 Transcriptomic profile based cancer disease prediction and patient survival time differentiation Ofosu Mensah, Samuel Mazandu, Gaston Kuzamunu Utete, Simukai Wanzira Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Applied Mathematics. Cancer Genome Atlas (TCGA) Breast -- Cancer -- Research RNA–sequencing Survival analysis (Biometry) UCTD ENGLISH ABSTRACT : Cancer disease is an abnormal growth of cells, which may be caused by mutations in genes which, as a result, alter the way cells function mainly in the way they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs which are expressed and repressed. With the help of transcriptomic technologies such as RNA–sequencing (RNA–seq), it is now possible to profile thousands of genes at once to create a global picture of the functions of cells. Here, the study employs a statistical approach, called Significance Analysis of Microarray (SAM), to identify genes that are differentially expressed in breast cancer patients. Genes with scores greater than a threshold are deemed potentially significant. Genes identified as significantly different are used for twofold reasons. First, the study uses these significantly identified genes to predict breast cancer using three machine learning algorithms. The machine learning algorithms used are random forests, artificial neural networks and support vector machines. Secondly, clinical details of patients and significantly identified genes are combined to build a survival model to predict the probability of survival and risk to the event in breast cancer patients. Using The Cancer Genome Atlas (TCGA) as the primary data for the study, SAM reported 23 genes as significantly different. 
Further investigations revealed that these 23 significant genes are involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 out of the 23 genes contribute significantly to the model. Finally, it was identified that log–logistic distribution best describes the survival time of breast cancer patients. Moreover, the survival model revealed that expression levels of six genes influence the survival probability of a breast cancer patient. AFRIKAANS SUMMARY (translated into English) : Cancer is an abnormal growth of cells that may be caused by mutations in genes, which in turn change the way cells function, mainly in the way they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs that are expressed and repressed. With the help of transcriptomic technologies such as RNA–sequencing (RNA–seq), it is now possible to profile thousands of genes simultaneously to create a global picture of the functions of cells. Here, the study uses a statistical approach, called Significance Analysis of Microarray (SAM), to identify significant genes that are differentially expressed in breast cancer patients. Genes with scores greater than a threshold are considered potentially significant. Next, the study uses these significant identified genes to predict breast cancer using three machine learning algorithms, including random forests, artificial neural networks and support vector machines. Lastly, clinical details of patients and the significant identified genes are combined to build a survival model to predict the probability of survival and the risk of the event in patients with breast cancer. The event of interest in this study is death.
Using The Cancer Genome Atlas (TCGA) as the primary data for the study, SAM reported 23 genes as significantly different. Further investigations showed that these 23 significant genes were involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 out of the 23 genes contribute significantly to the model. Finally, it was identified that the log–logistic distribution best describes the survival time of patients with breast cancer. In addition, the survival model revealed that expression levels of six genes influence the survival probability of a patient with breast cancer. The survival model further showed that breast cancer patients are likely to have a greater risk of the event, but that after 3243.38 days their risk of the event may gradually decrease. 2018-11-27T13:26:42Z 2018-12-07T06:57:24Z 2018-11-27T13:26:42Z 2018-12-07T06:57:24Z 2018-12 Thesis http://hdl.handle.net/10019.1/105059 en_ZA Stellenbosch University application/pdf Stellenbosch : Stellenbosch University
title Transcriptomic profile based cancer disease prediction and patient survival time differentiation
topic Cancer Genome Atlas (TCGA)
Breast -- Cancer -- Research
RNA–sequencing
Survival analysis (Biometry)
UCTD
url http://hdl.handle.net/10019.1/105059