Transcriptomic profile based cancer disease prediction and patient survival time differentiation

Bibliographic Details
Main Author: Ofosu Mensah, Samuel
Other Authors: Mazandu, Gaston Kuzamunu
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2018
Description
Summary: ENGLISH ABSTRACT: Cancer is an abnormal growth of cells that may be caused by gene mutations that alter how cells function, mainly how they grow and divide. Cancer cells are regulated by complex interactions mediated by a group of proteins and miRNAs that are expressed or repressed. Transcriptomic technologies such as RNA-sequencing (RNA-seq) make it possible to profile thousands of genes at once and so build a global picture of cell function. This study uses a statistical approach, Significance Analysis of Microarrays (SAM), to identify genes that are differentially expressed in breast cancer patients; genes whose scores exceed a threshold are deemed potentially significant. The genes identified as significant serve two purposes. First, the study uses them to predict breast cancer with three machine learning algorithms: random forests, artificial neural networks and support vector machines. Second, the significant genes are combined with patients' clinical details to build a survival model that predicts the probability of survival and the risk of the event in breast cancer patients. With The Cancer Genome Atlas (TCGA) as the primary data source, SAM reported 23 genes as significantly different. Further investigation revealed that these 23 genes are involved in tumour suppression, angiogenesis, cell growth factor, tumourigenesis, cell proliferation, tumour progression and tumour necrosis activities. In predicting breast cancer, 10 of the 23 genes contribute significantly to the model. Finally, the log-logistic distribution was found to best describe the survival time of breast cancer patients, and the survival model showed that the expression levels of six genes influence a patient's survival probability.
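
The two quantitative ideas in the abstract, a SAM-style score compared against a threshold and a log-logistic survival curve, can be illustrated with a minimal sketch. This is not the thesis code. The function names, the toy expression values, the fudge constant s0 and the cut-off of 3.0 are all assumptions made for illustration. The full SAM procedure chooses its threshold from permutations and a false discovery rate estimate, which this sketch omits.

```python
import math


def sam_statistic(group_a, group_b, s0=0.0):
    """SAM-style relative difference for one gene.

    d = (mean_a - mean_b) / (s + s0), where s is the pooled standard
    error of the difference and s0 is a small positive fudge constant
    that damps genes with very low variance.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - mean_a) ** 2 for x in group_a)
    ss += sum((x - mean_b) ** 2 for x in group_b)
    pooled_se = math.sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (pooled_se + s0)


def log_logistic_survival(t, scale, shape):
    """Survival function S(t) = 1 / (1 + (t / scale) ** shape).

    scale is the median survival time, so S(scale) = 0.5.
    """
    return 1.0 / (1.0 + (t / scale) ** shape)


# Hypothetical toy expression values for a single gene (not TCGA data).
tumour = [5.1, 4.9, 5.3, 5.0]
normal = [2.0, 2.2, 1.9, 2.1]

d = sam_statistic(tumour, normal, s0=0.1)
# Assumed fixed cut-off; real SAM derives it from permutations.
flagged = abs(d) > 3.0

# Survival probability at the median of an assumed log-logistic model.
s_median = log_logistic_survival(10.0, scale=10.0, shape=2.0)
```

In a survival model of the kind the abstract describes, gene expression levels and clinical covariates would typically enter through the scale parameter, which is how they could shift a patient's survival probability. How the thesis parameterises its model is not stated in this record.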