Full Text Available

Determining the number of clusters using penalised k-means clustering

Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024.

Bibliographic Details
Other Authors:	Millard, Sollie M.
Format:	Thesis
Language:	English
Published:	University of Pretoria 2025
Subjects:	UCTD; K-means; Unsupervised k-means; Entropy; Pre-initialisation; Number of clusters

access_status_str	Open Access
author2	Millard, Sollie M.
collection	Thesis
dc_rights_str_mv	© 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
description	Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024.
format	Thesis
id	oai:repository.up.ac.za:2263/100627
institution	University of Pretoria (South Africa)
language	English
last_indexed	2026-06-10T12:37:09.154Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from UPSpace — University of Pretoria Institutional Repository
publishDate	2025
publisher	University of Pretoria
record_format	dspace
source_str	UPSpace — University of Pretoria Institutional Repository
spelling	oai:repository.up.ac.za:2263/100627 Determining the number of clusters using penalised k-means clustering Millard, Sollie M. robert.w.greyling@gmail.com Kanfer, F.H.J. (Frans) Greyling, Robert William UCTD K-means Unsupervised k-means Entropy Pre-initialisation Number of clusters Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2024.
	Clustering is an important part of statistics. However, the number of clusters must still be specified (pre-initialised) before clustering begins. In this minor dissertation we consider a procedure that removes the need to pre-initialise the number of clusters in the k-means algorithm, which reduces the manual effort required in clustering tasks: the procedure aims to determine the correct value of k automatically. Following the approach of Sinaga and Yang, we modify the traditional k-means objective function by adding two entropy terms as penalty terms. An additional step was added to the algorithm to ensure that the initial clusters are not empty. A simulation study was conducted using multiple datasets with varying true numbers of clusters k, data dimensionalities D, and sample sizes n. The results indicate that the proposed algorithm performs well in identifying distinct clusters, particularly in lower-dimensional data.
	Statistics MSc (Advanced Data Analytics) Unrestricted Faculty of Natural and Agricultural Sciences None 2025-02-10T07:15:08Z 2025-02-10T07:15:08Z 2025-04 2024-11 Dissertation * A2025 http://hdl.handle.net/2263/100627 https://doi.org/10.25403/UPresearchdata.28380005 en © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. application/pdf University of Pretoria
title	Determining the number of clusters using penalised k-means clustering
topic	UCTD; K-means; Unsupervised k-means; Entropy; Pre-initialisation; Number of clusters
url	http://hdl.handle.net/2263/100627 https://doi.org/10.25403/UPresearchdata.28380005
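The abstract describes the method only in outline. As a rough illustration, the Python/NumPy sketch below implements a simplified entropy-penalised k-means in the spirit of Sinaga and Yang's unsupervised k-means. It assumes an objective of the form J = Σ_i Σ_k z_ik ‖x_i − a_k‖² − β n Σ_k α_k ln α_k − γ Σ_i Σ_k z_ik ln α_k, where z_ik are hard assignments, a_k are cluster centres and α_k are mixing proportions; clusters whose proportion falls below 1/n are discarded, so the number of clusters shrinks from a starting value. Treat that exact form as an assumption. The function name `entropy_penalised_kmeans`, the starting count `c0` and the fixed values of `beta` and `gamma` are hypothetical choices for this sketch: the published algorithm adapts β and γ during the iterations, and this is not the dissertation's implementation or its non-empty-initialisation step.

```python
import numpy as np


def entropy_penalised_kmeans(X, c0=10, beta=1.0, gamma=1.0, max_iter=200, tol=1e-6, seed=0):
    """Simplified entropy-penalised k-means (illustrative sketch, not the dissertation's code).

    beta and gamma are fixed here (hypothetical defaults); the published
    unsupervised k-means adapts them at every iteration.
    Returns (centres, alpha, labels); len(centres) is the estimated number of clusters.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    # Seed c0 centres on randomly chosen data points: if the seeds are distinct,
    # each initial cluster contains at least its own seed point.
    centres = X[rng.choice(n, size=min(c0, n), replace=False)]
    alpha = np.full(len(centres), 1.0 / len(centres))

    for _ in range(max_iter):
        # Assignment: squared distance minus gamma * ln(alpha_k), which favours large clusters
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2 - gamma * np.log(alpha), axis=1)
        counts = np.bincount(labels, minlength=len(centres))

        # Proportion update: cluster share plus an entropy "competition" term that
        # grows proportions above exp(sum(alpha * ln alpha)) and shrinks those below it
        ent = np.sum(alpha * np.log(alpha))
        new_alpha = counts / n + (beta / gamma) * alpha * (np.log(alpha) - ent)

        # Centre update: mean of assigned points; an empty cluster keeps its old centre
        new_centres = centres.copy()
        for k in np.flatnonzero(counts):
            new_centres[k] = X[labels == k].mean(axis=0)

        # Discard clusters whose proportion fell below 1/n, then renormalise the rest
        keep = new_alpha >= 1.0 / n
        converged = keep.all() and np.max(np.abs(new_centres - centres)) < tol
        centres = new_centres[keep]
        alpha = new_alpha[keep] / new_alpha[keep].sum()
        if converged:
            break

    # Final hard assignment against the surviving centres
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(d2 - gamma * np.log(alpha), axis=1)
    return centres, alpha, labels


if __name__ == "__main__":
    # Toy data (hypothetical): three Gaussian blobs in two dimensions
    rng = np.random.default_rng(1)
    means = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in means])
    centres, alpha, labels = entropy_penalised_kmeans(X, c0=10)
    print("estimated number of clusters:", len(centres))
```

Because the penalty terms are compared directly with squared distances, the outcome depends on the scale of the data and on `beta` and `gamma`; standardising the features and tuning these values is advisable before reading the number of surviving centres as an estimate of k.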

Similar Items