Maximum likelihood estimation for Cox regression under risk set sampling

Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025.

Bibliographic Details
Other Authors: Nakhaeirad, Najmeh
Format: Thesis
Language: English
Published: University of Pretoria 2025
Subjects:
_version_ 1867613595777368064
access_status_str Open Access
author2 Nakhaeirad, Najmeh
author_browse Nakhaeirad, Najmeh
author_facet Nakhaeirad, Najmeh
collection Thesis
dc_rights_str_mv © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025.
format Thesis
id oai:repository.up.ac.za:2263/100744
institution University of Pretoria (South Africa)
language English
last_indexed 2026-06-10T12:38:39.160Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate 2025
publishDateRange 2025
publishDateSort 2025
publisher University of Pretoria
publisherStr University of Pretoria
record_format dspace
source_str UPSpace — University of Pretoria Institutional Repository
spelling oai:repository.up.ac.za:2263/100744 Maximum likelihood estimation for Cox regression under risk set sampling Nakhaeirad, Najmeh u19044438@tuks.co.za Nasejje, Justine Mashinini, Nontokozo UCTD Sustainable Development Goals (SDGs) Cox proportional hazard model Newton Raphson Stochastic gradient descent Risk set sampling Nested case control sampling Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025. In certain epidemiological studies, researchers investigate specific events, such as disease outcomes, and their associated risk factors within a cohort. In the era of big data, however, analyzing the entire cohort can be time-consuming because of the volume of data. A nested case-control design addresses this challenge: it allows quicker and more efficient analysis by focusing on a sample of cases and matched controls drawn from the same population. In survival analysis, the cohort dataset defines the risk sets used in Cox proportional hazards (CPH) model optimization. These risk sets are integral to the Cox partial likelihood function, which is used to fit the model. This research applies the nested case-control design to these risk sets via a simulation study, exploring case-control structures of 1:1, 1:2, 1:4, and 1:8. The study investigates whether the size of the sampled risk sets affects the time efficiency of the model and the precision of the estimated parameters under two optimization methods: Newton-Raphson (NR) and Stochastic Gradient Descent (SGD). 
Results from optimizing the four different case-control structures using NR suggest that the CPH model's parameter estimates converge to the true values, with bias decreasing as the number of controls per case decreases, although there are minor fluctuations for some control ratios (for example, the positive bias values for $\beta_1$ obtained via the four different case-control structures are 0.041, 0.039, 0.080, 0.133, and 0.002). The CPH model fitted with NR performed well with a complete risk set on large datasets and continued to perform well on small datasets, though not as effectively as on the larger ones. When the CPH model is optimized using SGD across the four different case-control structures, it converges to the true parameter values, particularly when the sample size is large and a complete risk set is used. This study demonstrates how large datasets can be efficiently scaled in survival analysis studies, providing valuable insights relating to parameter precision. The estimates derived from the real datasets using both NR and SGD were generally similar, with slight differences across the various case-control structures. The full risk set estimates were used as a reference for comparison with those from the different case-control structures. This research finds that, in risk set sampling with a nested case-control design, using fewer controls per case leads to a case-control framework that more closely approximates the true values, providing valuable insights into the trade-offs between time efficiency and precision in parameter estimation. Statistics MSc (Advanced Data Analytics) Unrestricted Faculty of Natural and Agricultural Sciences SDG-03: Good health and well-being 2025-02-11T20:54:38Z 2025-02-11T20:54:38Z 2025-05 2025-02 Dissertation * A2025 http://hdl.handle.net/2263/100744 https://doi.org/10.25403/UPresearchdata.28395101 en © 2023 University of Pretoria. All rights reserved. 
The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle UCTD
Sustainable Development Goals (SDGs)
Cox proportional hazard model
Newton Raphson
Stochastic gradient descent
Risk set sampling
Nested case control sampling
Maximum likelihood estimation for Cox regression under risk set sampling
title Maximum likelihood estimation for Cox regression under risk set sampling
title_full Maximum likelihood estimation for Cox regression under risk set sampling
title_fullStr Maximum likelihood estimation for Cox regression under risk set sampling
title_full_unstemmed Maximum likelihood estimation for Cox regression under risk set sampling
title_short Maximum likelihood estimation for Cox regression under risk set sampling
title_sort maximum likelihood estimation for cox regression under risk set sampling
topic UCTD
Sustainable Development Goals (SDGs)
Cox proportional hazard model
Newton Raphson
Stochastic gradient descent
Risk set sampling
Nested case control sampling
url http://hdl.handle.net/2263/100744
https://doi.org/10.25403/UPresearchdata.28395101
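
The abstract in this record describes fitting a Cox model on sampled risk sets (1:m nested case-control) with Newton-Raphson. The sketch below illustrates that general idea only; it is not the dissertation's code. The function names (sample_risk_sets, neg_log_partial_likelihood, newton_raphson), the use of NumPy, the assumption of untied event times, and the unweighted conditional-logistic form of the sampled partial likelihood are all assumptions made for illustration.

```python
import numpy as np


def sample_risk_sets(time, event, m, rng):
    """For each case, draw up to m controls from its risk set (hypothetical helper)."""
    sets = []
    for i in np.flatnonzero(event == 1):
        # Risk set: everyone still under observation at the case's event time,
        # excluding the case itself.
        at_risk = np.flatnonzero(time >= time[i])
        at_risk = at_risk[at_risk != i]
        k = min(m, at_risk.size)
        if k > 0:
            controls = rng.choice(at_risk, size=k, replace=False)
        else:
            controls = at_risk[:0]  # empty index array
        sets.append((i, controls))
    return sets


def neg_log_partial_likelihood(beta, X, sets):
    """Negative log partial likelihood over the sampled risk sets."""
    total = 0.0
    for case, controls in sets:
        idx = np.concatenate(([case], controls))
        eta = X[idx] @ beta
        mx = eta.max()  # log-sum-exp for numerical stability
        total -= eta[0] - (mx + np.log(np.exp(eta - mx).sum()))
    return total


def newton_raphson(X, sets, n_iter=25, tol=1e-8):
    """Newton-Raphson on the sampled partial likelihood, starting from beta = 0."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score = np.zeros_like(beta)
        info = np.zeros((beta.size, beta.size))
        for case, controls in sets:
            Xs = X[np.concatenate(([case], controls))]
            eta = Xs @ beta
            w = np.exp(eta - eta.max())
            w /= w.sum()  # relative-risk weights within the sampled set
            xbar = w @ Xs
            score += Xs[0] - xbar  # case covariates minus weighted mean
            info += (Xs * w[:, None]).T @ Xs - np.outer(xbar, xbar)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Calling sample_risk_sets with m set to 1, 2, 4, or 8 would give the case-control structures named in the abstract. Under this formulation each iteration touches at most m + 1 rows per case rather than the full risk set, which is where a time saving would be expected to come from.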