Mining a large shopping database to predict where, when, and what consumers will buy next

Retailers with electronic point-of-sale systems continuously amass detailed data about the items each consumer buys (i.e. what item, how often, its package size, how many were bought, whether the item was on special, etc.). Where the retailer can also associate purchases with a particular individual...

Bibliographic Details
Main Author: Halam, Bantu
Other Authors: Durbach, Ian
Format: Thesis
Language: English
Published: Department of Statistical Sciences 2020
_version_ 1867613251646259200
access_status_str Open Access
author Halam, Bantu
author2 Durbach, Ian
author_browse Durbach, Ian
Halam, Bantu
author_facet Durbach, Ian
Halam, Bantu
author_sort Halam, Bantu
collection Thesis
description Retailers with electronic point-of-sale systems continuously amass detailed data about the items each consumer buys (e.g. what item, how often, its package size, how many were bought, and whether the item was on special). Where the retailer can also associate purchases with a particular individual, for example when an account or loyalty card is issued, the buying behaviour of the consumer can be tracked over time, providing the retailer with valuable information about a customer's changing preferences. This project is based on mining a large database, containing the purchase histories of some 300 000 customers of a retailer, for insights into the behaviour of those customers. Specifically, the aim is to build three predictive models, each forming a chapter of the dissertation: forecasting the number of daily customers to visit a store, detecting changes in consumers' inter-purchase times, and predicting repeat customers after being given a special offer. Having too many goods and not enough customers implies loss for a business; having too few goods implies a lost opportunity to turn a profit. The ideal situation is to stock the appropriate number of goods for the number of customers arriving, so as to minimize loss and maximize profit. To address this problem, in the first chapter we forecast the number of customers that will visit a store each day to buy any product (i.e. store daily visits). In the process we also carry out a comparison of time-series forecasting methods, with the main aim of comparing machine learning methods to classical statistical methods. The models are fitted to univariate time-series data and the best model for this particular dataset is selected using three accuracy measures. The results showed that there was not much difference between the methods, but some classical methods performed slightly better than the machine learning algorithms, and this was consistent with outcomes obtained by Makridakis et al. (2018) on similar comparisons. It is also vital for retailers to know when there has been a change in their consumers' purchase behaviour. Such a change can be in the time between purchases, in brand selection, or in market share. It is critical for such changes to be detected as early as possible, as speedy detection can help managers act before incurring losses. In the second chapter, we use change-point models to detect changes in consumers' inter-purchase times. Change-point models offer a flexible, general-purpose solution to the problem of detecting changes in customers' historical behaviour. The multiple change-point model used here assumes that there is a sequence of underlying parameters, and that this sequence is partitioned into contiguous blocks. The partition is such that the parameter values are equal within blocks and different between blocks, with the beginning of each block considered a change point. This change-point model is fitted to consumers' inter-purchase times (i.e. we model the time between purchases) to see whether there were any significant changes in consumers' buying behaviour over a one-year purchase period. The results showed that, depending on the length of the sequences, between a handful and a minority of customers experience changes in their purchasing behaviour, with the longer sequences having more changes than the shorter ones. The results appeared to differ from those obtained by Clark and Durbach (2014), but analysing a subset of sequences of the same lengths as those analysed in Clark and Durbach (2014) led to similar results. Increasing sales growth is also vital for retailers, and there are various possible ways in which this can be achieved. Two such strategies are up-selling (whereby a customer is persuaded to make an additional purchase of the same product or to purchase a more expensive version of the product) and cross-selling (whereby a retailer sells a different product or service to an existing customer). These involve campaigning to customers to sell certain products, sometimes with incentives included in the campaign, with the aim of exposing customers to these products in the hope that they will become repeat customers afterwards. In Chapter 3 we build a model to predict which customers are likely to become repeat customers after being given a special offer. The model is fitted to customers' inter-purchase times, so the input is time-series data and sequential in nature. Therefore, we build models that are well suited to sequential inputs (i.e. convolutional neural networks and recurrent neural networks), and compare them to models that do not take into account the sequence of the data (i.e. feedforward neural networks and decision trees). The results showed, first, that inter-purchase times are only useful when they relate to the same product, as models did no better than random when the inter-purchase times came from a different product in the same department. Secondly, it is useful to take the order of the sequence into account, as models that do this do better than those that do not, with the latter not doing any better than a null model. Lastly, while none of the models performed well, deep learning models performed better than standard classification models and produced some substantial lift.
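The block structure described in the abstract's second chapter can be illustrated with a minimal Python sketch. It is not the thesis's model: the purchase dates are made up, the change point is supplied by hand rather than estimated, and the function names `inter_purchase_times` and `split_into_blocks` are hypothetical. It only shows how a sequence of inter-purchase times is partitioned into contiguous blocks, where the first element of each new block marks a change point.

```python
from datetime import date
from statistics import mean


def inter_purchase_times(purchase_dates):
    """Return the number of days between consecutive purchases."""
    ordered = sorted(purchase_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def split_into_blocks(sequence, change_points):
    """Partition a sequence into contiguous blocks.

    Each index in change_points is the position of the first element
    of a new block (assumed here, not estimated from the data).
    """
    starts = [0] + sorted(change_points)
    ends = starts[1:] + [len(sequence)]
    return [sequence[start:end] for start, end in zip(starts, ends)]


# Hypothetical purchase history for one customer: weekly visits
# that slow down to roughly monthly visits.
dates = [
    date(2019, 1, 1), date(2019, 1, 8), date(2019, 1, 15), date(2019, 1, 22),
    date(2019, 2, 21), date(2019, 3, 23), date(2019, 4, 22),
]

gaps = inter_purchase_times(dates)

# Assumed change point: the fourth gap (index 3) starts a new block.
blocks = split_into_blocks(gaps, [3])

# Within a block the underlying parameter (here, the mean gap) is constant;
# it differs between blocks.
block_means = [mean(block) for block in blocks]
```

In a real analysis the change points would be inferred by the model rather than passed in, and the per-block parameter need not be a simple mean.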
format Thesis
id oai:open.uct.ac.za:11427/32224
institution University of Cape Town (South Africa)
language eng
last_indexed 2026-06-10T12:33:10.259Z
license_str Not specified — see source repository
provenance_str_mv Harvested via OAI-PMH from UCTD — University of Cape Town Open Access Repository
publishDate 2020
publishDateRange 2020
publishDateSort 2020
publisher Department of Statistical Sciences
publisherStr Department of Statistical Sciences
record_format dspace
source_str UCTD — University of Cape Town Open Access Repository
spelling oai:open.uct.ac.za:11427/32224 Mining a large shopping database to predict where, when, and what consumers will buy next Halam, Bantu Durbach, Ian Statistical Sciences 2020-09-11T12:50:09Z 2020-09-11T12:50:09Z 2020 2020-09-11T12:32:46Z Master Thesis Masters MSc http://hdl.handle.net/11427/32224 eng application/pdf Department of Statistical Sciences Faculty of Science
spellingShingle Statistical Sciences
Halam, Bantu
Mining a large shopping database to predict where, when, and what consumers will buy next
thesis_degree_str Master's
title Mining a large shopping database to predict where, when, and what consumers will buy next
title_full Mining a large shopping database to predict where, when, and what consumers will buy next
title_fullStr Mining a large shopping database to predict where, when, and what consumers will buy next
title_full_unstemmed Mining a large shopping database to predict where, when, and what consumers will buy next
title_short Mining a large shopping database to predict where, when, and what consumers will buy next
title_sort mining a large shopping database to predict where when and what consumers will buy next
topic Statistical Sciences
url http://hdl.handle.net/11427/32224
work_keys_str_mv AT halambantu miningalargeshoppingdatabasetopredictwherewhenandwhatconsumerswillbuynext