Full Text Available

Penalized feature selection in model-based clustering

Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2022.

Bibliographic Details
Other Authors:	Millard, Sollie M.
Format:	Thesis
Language:	English
Published:	University of Pretoria 2023
Subjects:	UCTD; Variable selection; Clustering; Expectation Maximisation; Penalized log-likelihood; Penalized feature selection

_version_	1867613531366490112
access_status_str	Open Access
author2	Millard, Sollie M.
author_browse	Millard, Sollie M.
author_facet	Millard, Sollie M.
collection	Thesis
dc_rights_str_mv	© 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description	Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2022.
format	Thesis
id	oai:repository.up.ac.za:2263/91035
institution	University of Pretoria (South Africa)
language	English
last_indexed	2026-06-10T12:37:37.672Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2023
publishDateRange	2023
publishDateSort	2023
publisher	University of Pretoria
publisherStr	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
spelling	oai:repository.up.ac.za:2263/91035 Penalized feature selection in model-based clustering Millard, Sollie M. luan3potgieter@gmail.com Kanfer, F.H.J. (Frans) Potgieter, Luandrie UCTD Variable selection Clustering Expectation Maximisation Penalized log-likelihood Penalized feature selection Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2022. Cluster analysis is a popular unsupervised statistical method used to group observations into clusters. Identifying latent segments and groupings in the data aids in the understanding of natural phenomena. The data-driven society we live in today has made high-dimensional data ubiquitous, and hence noise variables are unavoidable. Model-based clustering methods have had to adjust in order to identify these non-informative variables, since they unduly increase a model’s complexity. This mini dissertation reviews the effectiveness of different penalized likelihood approaches and how they aid in identifying and removing uninformative variables. An EM algorithm is used to fit a penalized Gaussian mixture model to the data. The penalized log-likelihood is maximized, and if a variable’s parameter estimates are reduced to the same value across all clusters, it is removed from the model and deemed uninformative. It was found that by penalizing the mean, uninformative variables were successfully identified and removed. CSIR Statistics MSc (Advanced Data Analytics) Unrestricted 2023-06-06T13:00:21Z 2023-06-06T13:00:21Z 2023-09-01 2022 Mini Dissertation * S2023 http://hdl.handle.net/2263/91035 10.25403/UPresearchdata.23219531 en © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
spellingShingle	UCTD Variable selection Clustering Expectation Maximisation Penalized log-likelihood Penalized feature selection Penalized feature selection in model-based clustering
title	Penalized feature selection in model-based clustering
title_full	Penalized feature selection in model-based clustering
title_fullStr	Penalized feature selection in model-based clustering
title_full_unstemmed	Penalized feature selection in model-based clustering
title_short	Penalized feature selection in model-based clustering
title_sort	penalized feature selection in model based clustering
topic	UCTD; Variable selection; Clustering; Expectation Maximisation; Penalized log-likelihood; Penalized feature selection
url	http://hdl.handle.net/2263/91035

Similar Items