Statistical model selection techniques for the Cox proportional hazards model: a comparative study

Bibliographic Details
Main Author: Njati, Jolando
Other Authors: Gumedze, Freedom
Format: Thesis
Language: English
Published: Department of Statistical Sciences, 2022
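Subjects: survival analysis; simulation; Cox proportional hazard model selection; integrated area under the curve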
access_status_str Open Access
collection Thesis
description Advances in data acquisition technology continue to produce survival data sets with many covariates. This poses a new challenge for researchers: identifying the covariates that matter for inference and prediction when the response is a time-to-event variable. In this dissertation, common Cox proportional hazards model selection techniques and a random survival forest were compared using five performance measures: the concordance index, the integrated area under the curve, …, … and R². The comparison used a multicentre clinical trial data set and was complemented by a simulation study. To develop each Cox proportional hazards model, a training set containing 75% of the observations was used, and the model selection techniques were applied to select covariates. Full Cox PH models containing all covariates were also fitted for both the clinical trial data set and the simulations. On the clinical trial data set, the full model and forward selection performed best on the metrics employed, although they reduced model complexity less than the Lasso did. The simulation studies likewise showed the full model outperforming the other techniques, with the Lasso over-penalising the model in the simulation with the smaller data set and many covariates. AIC- and BIC-based selection were computationally less efficient than the other variable selection techniques, but reduced model complexity more effectively than their counterparts in the simulations. The integrated area under the curve was the metric used to choose the final model for the real data set; across all selection techniques it gave more efficient outcomes than the other metrics.
This dissertation therefore shows that the relative performance of variable selection techniques depends on the study design as well as on the performance measure used. To obtain a good model, no single model selection technique should be used in isolation. Further research is needed to identify and publish techniques that work well across different study designs, which would shorten the model-building process for most researchers.
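The description lists the concordance index among the performance measures. As an illustration only (this record does not state which estimator or software the dissertation used), the sketch below computes Harrell's concordance index for right-censored data in plain Python. It assumes that a higher risk score means earlier expected failure, as with a Cox linear predictor, and, as a simplification, skips every pair with tied observed times; the function name `harrell_c_index` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Sketch of Harrell's concordance index for right-censored data.

    times:       observed follow-up time for each subject
    events:      1 if the event was observed, 0 if right-censored
    risk_scores: model output where a HIGHER score means HIGHER risk
                 (assumption: e.g. the Cox linear predictor x'beta)
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Simplification: pairs with tied observed times are skipped entirely.
        if times[a] == times[b] or events[a] != 1:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier failure had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with times [2, 5, 3, 8], event indicators [1, 0, 1, 1] and risk scores [0.9, 0.1, 0.7, 0.8], five pairs are comparable and four are concordant, giving C = 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of subjects by risk.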
id oai:open.uct.ac.za:11427/36594
institution University of Cape Town (South Africa)
last_indexed 2026-06-10T12:33:07.122Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling 2022-07-01T15:26:47Z 2022-07-01T15:26:47Z 2022 2022-07-01T15:24:00Z Master Thesis Masters MSc http://hdl.handle.net/11427/36594 eng application/pdf Department of Statistical Sciences Faculty of Science
thesis_degree_str Master's
topic survival analysis
simulation
Cox proportional hazard model selection
integrated area under the curve
url http://hdl.handle.net/11427/36594