Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025.
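| Other Authors: | Nakhaeirad, Najmeh |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | University of Pretoria, 2025 |
| Subjects: | UCTD; Sustainable Development Goals (SDGs); Cox proportional hazard model; Newton Raphson; Stochastic gradient descent; Risk set sampling; Nested case control sampling |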
| Field | Value |
|---|---|
| access_status_str | Open Access |
| author2 | Nakhaeirad, Najmeh |
| collection | Thesis |
| dc_rights_str_mv | © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |
| description | Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025. |
| format | Thesis |
| id | oai:repository.up.ac.za:2263/100744 |
| institution | University of Pretoria (South Africa) |
| language | English |
| license_str | Other — see source repository |
| provenance_str_mv | Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository |
| publishDate | 2025 |
| publisher | University of Pretoria |
| record_format | dspace |
| source_str | UPSpace — University of Pretoria Institutional Repository |
| spelling | oai:repository.up.ac.za:2263/100744 Maximum likelihood estimation for Cox regression under risk set sampling Nakhaeirad, Najmeh u19044438@tuks.co.za Nasejje, Justine Mashinini, Nontokozo UCTD Sustainable Development Goals (SDGs) Cox proportional hazard model Newton Raphson Stochastic gradient descent Risk set sampling Nested case control sampling Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2025.<br><br>In certain epidemiological studies, researchers aim to investigate specific events, such as disease outcomes, and their associated risk factors within a cohort. However, in the era of big data, analyzing the entire cohort can be time-consuming because of the large volume of data. To address this challenge, a nested case-control design can be employed, which speeds up the analysis by focusing on a sample of cases and matched controls within the same population. In survival analysis, the cohort dataset defines the risk sets used to fit the Cox proportional hazards (CPH) model, and these risk sets enter the Cox partial likelihood that is maximized during fitting. This research applies the nested case-control design to these risk sets in a simulation study, exploring four case-control structures: 1:1, 1:2, 1:4 and 1:8. The study investigates whether the size of the sampled risk sets affects the time efficiency of model fitting and the precision of the estimated parameters under two optimization methods: Newton-Raphson (NR) and stochastic gradient descent (SGD).<br><br>Results from fitting the four case-control structures with NR suggest that the CPH model's parameter estimates converge to the true values, with bias decreasing as the number of controls per case decreases, although with minor fluctuations for some structures (for example, the reported positive bias values for $\beta_1$ are 0.041, 0.039, 0.080, 0.133 and 0.002). The CPH model fitted with NR performed well with the complete risk set on large datasets and continued to perform well on small datasets, though less effectively than on the larger ones. When the CPH model is optimized with SGD across the four case-control structures, the estimates also converge to the true parameter values, particularly when the sample size is large and the complete risk set is used. This study demonstrates how survival analysis can be scaled efficiently to large datasets and what this implies for parameter precision. On the real datasets, the estimates obtained with NR and SGD were generally similar, with slight differences across the case-control structures; the full risk set estimates served as the reference against which the case-control estimates were compared. This research found that, in risk set sampling with a nested case-control design, using fewer controls per case produced estimates that more closely approximated the true values, which clarifies the trade-off between time efficiency and precision in parameter estimation.<br><br>Statistics MSc (Advanced Data Analytics) Unrestricted Faculty of Natural and Agricultural Sciences SDG-03: Good health and well-being 2025-02-11T20:54:38Z 2025-02-11T20:54:38Z 2025-05 2025-02 Dissertation * A2025 http://hdl.handle.net/2263/100744 https://doi.org/10.25403/UPresearchdata.28395101 en © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria |
| title | Maximum likelihood estimation for Cox regression under risk set sampling |
| topic | UCTD; Sustainable Development Goals (SDGs); Cox proportional hazard model; Newton Raphson; Stochastic gradient descent; Risk set sampling; Nested case control sampling |
| url | http://hdl.handle.net/2263/100744 https://doi.org/10.25403/UPresearchdata.28395101 |
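
The abstract states that the CPH model is fitted by maximizing the Cox partial likelihood over risk sets that are either complete or reduced to each case plus 1, 2, 4 or 8 sampled controls. For reference, here is the standard form of that likelihood under simple random sampling of controls, together with the NR and SGD updates. This assumes no tied event times, cases $i_1, \dots, i_D$ failing at $t_1 < \dots < t_D$, and $\tilde{\mathcal{R}}_j$ denoting case $i_j$ together with its sampled controls. The notation is introduced here and is not taken from the dissertation. Setting $\tilde{\mathcal{R}}_j$ equal to the full risk set $\mathcal{R}_j$ recovers the ordinary Cox partial likelihood.

$$
\begin{aligned}
\ell(\beta) &= \sum_{j=1}^{D}\Big[\beta^\top x_{i_j} - \log \sum_{k \in \tilde{\mathcal{R}}_j} \exp(\beta^\top x_k)\Big],
\qquad w_{jk}(\beta) = \frac{\exp(\beta^\top x_k)}{\sum_{l \in \tilde{\mathcal{R}}_j} \exp(\beta^\top x_l)},\\
U(\beta) &= \sum_{j=1}^{D}\big[x_{i_j} - \bar{x}_j(\beta)\big],
\qquad \bar{x}_j(\beta) = \sum_{k \in \tilde{\mathcal{R}}_j} w_{jk}(\beta)\, x_k,\\
\mathcal{I}(\beta) &= \sum_{j=1}^{D} \sum_{k \in \tilde{\mathcal{R}}_j} w_{jk}(\beta)\,\big(x_k - \bar{x}_j(\beta)\big)\big(x_k - \bar{x}_j(\beta)\big)^\top,\\
\text{NR:}\quad \beta^{(r+1)} &= \beta^{(r)} + \mathcal{I}\big(\beta^{(r)}\big)^{-1} U\big(\beta^{(r)}\big),\\
\text{SGD:}\quad \beta^{(r+1)} &= \beta^{(r)} + \frac{\eta_r}{|B_r|} \sum_{j \in B_r} \big[x_{i_j} - \bar{x}_j\big(\beta^{(r)}\big)\big],
\end{aligned}
$$

Here $B_r$ is a random mini-batch of cases and $\eta_r$ is a step size; both are assumptions for illustration. With $m$ controls per case, each denominator sums over $m+1$ terms rather than $|\mathcal{R}_j|$. That difference is presumably the source of the time savings the abstract describes.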
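
The abstract describes a simulation in which risk sets are subsampled to 1:1, 1:2, 1:4 or 1:8 case-control structures and the CPH model is then fitted with NR or SGD. Below is a minimal Python/NumPy sketch of such a pipeline, built on several assumptions. Controls are drawn by simple random sampling without replacement from each case's risk set. Event times are exponential with hazard $\exp(x^\top\beta)$, censoring is independent and exponential, and there are no tied times. The helper names (`simulate_cohort`, `sample_risk_sets`, `fit_newton_raphson`, `fit_sgd`) and the SGD settings (batch size, decaying step size) are hypothetical choices made here. They are not the dissertation's code or settings.

```python
from time import perf_counter

import numpy as np


def simulate_cohort(n, beta, rng, censor_rate=0.5):
    """Hypothetical cohort: exponential event times with hazard exp(x @ beta)
    and independent exponential censoring (ties have probability zero)."""
    x = rng.normal(size=(n, len(beta)))
    event_time = rng.exponential(scale=np.exp(-(x @ beta)))
    censor_time = rng.exponential(scale=1.0 / censor_rate, size=n)
    times = np.minimum(event_time, censor_time)
    events = (event_time <= censor_time).astype(int)
    return x, times, events


def sample_risk_sets(times, events, m, rng):
    """For each case, keep the case plus up to m controls drawn without
    replacement from subjects still at risk at the case's event time.
    m=None keeps the full risk set. The case is always element 0."""
    sets = []
    idx = np.arange(len(times))
    for i in np.flatnonzero(events):
        at_risk = idx[(times >= times[i]) & (idx != i)]
        if m is not None and len(at_risk) > m:
            at_risk = rng.choice(at_risk, size=m, replace=False)
        sets.append(np.concatenate(([i], at_risk)))
    return sets


def loglik_grad_hess(beta, x, sets):
    """Log partial likelihood over (sampled) risk sets, its gradient (score)
    and Hessian (minus the information matrix)."""
    p = len(beta)
    ll, grad, hess = 0.0, np.zeros(p), np.zeros((p, p))
    for s in sets:
        xs = x[s]
        eta = xs @ beta
        lse = eta.max() + np.log(np.exp(eta - eta.max()).sum())  # stable log-sum-exp
        w = np.exp(eta - lse)  # softmax weights over the set
        xbar = w @ xs          # weighted covariate mean
        ll += eta[0] - lse
        grad += xs[0] - xbar
        hess -= (xs * w[:, None]).T @ xs - np.outer(xbar, xbar)
    return ll, grad, hess


def fit_newton_raphson(x, sets, tol=1e-8, max_iter=50):
    """Newton-Raphson ascent on the log partial likelihood, starting at zero."""
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        _, grad, hess = loglik_grad_hess(beta, x, sets)
        step = np.linalg.solve(hess, grad)  # H^{-1} g with H negative definite
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_sgd(x, sets, lr=0.05, epochs=30, batch=16, rng=None):
    """Mini-batch stochastic gradient ascent over risk sets, with a
    decaying step size lr / (1 + 0.1 * epoch) (an assumed schedule)."""
    rng = rng if rng is not None else np.random.default_rng()
    beta = np.zeros(x.shape[1])
    for epoch in range(epochs):
        step = lr / (1.0 + 0.1 * epoch)
        order = rng.permutation(len(sets))
        for start in range(0, len(sets), batch):
            chunk = [sets[j] for j in order[start:start + batch]]
            _, grad, _ = loglik_grad_hess(beta, x, chunk)
            beta = beta + step * grad / len(chunk)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_beta = np.array([0.5, -0.5])
    x, times, events = simulate_cohort(1500, true_beta, rng)
    for m in (1, 2, 4, 8, None):
        sets = sample_risk_sets(times, events, m, rng)
        t0 = perf_counter()
        est = fit_newton_raphson(x, sets)
        label = "full" if m is None else f"1:{m}"
        print(label, "bias:", np.round(est - true_beta, 3), f"time: {perf_counter() - t0:.2f}s")
```

Every sampled set keeps its case in position 0, so one likelihood function serves both the full cohort (`m=None`) and the 1:m designs. Only the size of each set changes, and with it the cost of each NR iteration or SGD step.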