Full Text Available

Note: Clicking the button above will open the full text document at the original institutional repository in a new window.

Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming

Thesis (M.Ing.)--University of Stellenbosch, 1999.

Saved in:
Bibliographic Details
Main Author: Chemaly, Timothy Paul
Other Authors: Aldrich, C.
Format: Thesis
Language:en_ZA
Published: Stellenbosch : Stellenbosch University 2012
Subjects:
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1867614136885575680
access_status_str Open Access
author Chemaly, Timothy Paul
author2 Aldrich, C.
author_browse Aldrich, C.
Chemaly, Timothy Paul
author_facet Aldrich, C.
Chemaly, Timothy Paul
author_sort Chemaly, Timothy Paul
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (M.Ing.)--University of Stellenbosch, 1999.
format Thesis
id oai:scholar.sun.ac.za:10019.1/51267
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:47:15.075Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2012
publishDateRange 2012
publishDateSort 2012
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/51267 Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming Chemaly, Timothy Paul Aldrich, C. Stellenbosch University. Faculty of Engineering. Dept. of Process Engineering. Chemical processes -- Data processing Stationary processes Genetic programming (Computer science) Computer algorithms Neural networks (Computer science) Dissertations -- Chemical engineering Thesis (M.Ing.)--University of Stellenbosch, 1999. ENGLISH SUMMARY: Enhancing the performance of any process requires a detailed knowledge of the unknown system, with a mathematical model being the most common means of representing this knowledge. The most frequently used statistical techniques, assume that any relationships between input and output variables are linear and that the data itself is normally distributed. However, real world systems can be highly non-linear and linear approaches can therefore fail to predict the behaviour of the system accurately. Explicit specification of optimal structure in large non-linear models is often not practical and as a result, non-parametric methods (kernel regression, artificial neural networks, etc.) are usually employed. Although these models allow accurate representation of complex systems, they can be very difficult to interpret. This research project explores a novel approach to this problem of mathematical modelling which attempts to evolve optimal parametric models, based on the Darwinian mechanism of evolution. This approach, referred to as genetic programming (GP), facilitates development of explicit or implicit models, or any mix of these two extremes, as dictated by the problem and unlike other methods, it can handle a trade-off between accuracy and interpretability with great ease. During this research; a -commercial application (a-GP) was developed, since very few commercial systems are currently available. Some techniques were developed, which improved the performance ofthe original algorithm considerably. For instance, memory demands were decreased by a factor of 5 by utilizing a different implementation model. Improved convergence and robustness was obtained by using a correlation-based fitness function in conjunction with a correction filter which reduced the sum of the squared errors; at the expense of a more complex model. The evaluation process was expedited by evaluating each tree-like structure as a reverse polish expression; as opposed to a branch-node reduction technique. Additional execution speed was further obtained by implementing the algorithm in c++ (an object oriented compiled language) which is significantly faster than the original LISP (an interpreted language) implementation, . The newly improved algorithm, a-GP, was applied to four industrial data sets and the results were compared against other methods such as standard genetic programming, multilayer perceptron neural networks and linear regression. It was found that a-GP outperformed standard genetic programming on all four case studies, while improving on neural networks on half of the runs. The evolved models tended to be complex. This could be attributed to the lack of parameter estimation that the genetic programming algorithm tried to compensate for by evolving complex tree structures; which it used to approximate the parameters. As a data visualization tool, a-GP was applied to four bench marking data sets used extensively in the literature. The results acquired with a-GP compared favourably with those obtained by other methods with the additional benefit in that a-GP was able to evolve simple mapping functions, which clearly indicated how the variables related to the structure. Additionally, the algorithm was applied in the mapping of two industrial processes. The results showed distinct clustering tendencies within the data, indicating the different operating regimes of the processes under investigation. AFRIKAANSE OPSOMMING: Om die vermoe van 'n proses stelsel te verbeter vereis 'n gedetaileerde kennis of model van die onderliggende proses. Die statistiese tegnieke wat meestal gebruik word, neem aan dat enige verwantskap tussen die intree- en die uittree-veranderlikes lineer is en dat die data self normaal versprei is. Ongelukkig is realistiese probleme dikwels nie-linieer en linieere benaderings kan nie die gedrag van sulke stelsels akkuraat karakteriseer nie. Die eksplisiete spesifikasie van die optimale struktuur in groot nie-linieere modelle is nie altyd prakties moontlik nie, met die gevolg dat nieparametriese metodes (basisfunksie-regressie, kunsmatige neurale netwerke, ens.) gebruik word. Alhoewel hierdie modelle akkurate voorstellings toelaat van komplekse sisteeme is hulle baie moeilik om te interpreteer. Hierdie tesis beskryf 'n unieke benadering tot die probleem van wiskundige modellering deur optimale parametriese modelle te evolueer, wat op die basiese beginsels van Darwin se evolusionere model berus. Hierdie tegniek, wat genetiese programmering (GP) heet, kan eksplisiete, sowel as implisiete modelle ontwikkel of enige kombinasie daarvan, soos gedikteer word deur die probleem. Anders as ander metodes, kan dit ook maklik 'n balans tussen akkuraatheid en interpreteerbaarheid handhaaf. Gedurende hierdie navorsing is 'n kommersiele sagteware-pakket (a-GP) ontwikkel, aangesien baie min sulke pakkette tans beskikbaar is. Verskeie tegnieke is ontwikkel wat die standaard algoritme aansienlik verbeter het. By voorbeeld, die aanvraag na geheue is verminder met 'n faktor van 5 deur gebruik te maak van 'n alternatiewe implementeringsmodel. Versnelde konvergensie en robuustheid was ook verkry deur gebruik te maak van 'n korrelasie-gebaseerde fiksheidfunksie in samewerking met 'n korreksiefilter wat die som van die gekwadreerde foute geminimeer het. Die evaluasieproses was ook versnel deur elke boomstruktuur as 'n omgekeerde Pooise vergelyking op te los, in plaas van 'n tak-node vereenvoudigingstegniek. Die verwerking was verder versnel deur die algoritme te implementeer in C++ ('n objek georienteerde, gekompileerde taal) wat aansienlik vinniger is as die oorspronklike LISP (ge"interpreteerde taal) implementering. Die nuwe verbeterde algoritme, a-GP, is toegepas op vier realistiese probleme met die doel om regressiemodelle te genereer en die resultate is vergelyk met die verkry deur ander tegnieke, 5005 standaard genetiese programmering, kunsmatige neurale netwerke en linieere regressie. Daar was gevind dat a-GP op die standaard genetiese programmering verbeter in al vier gevalle, terwyl dit op kunsmatige neurale netwerke verbeter het op een van die toetsstelle. Die mode lie het geneig om kompleks te wees, wat interpretasie bemoeilik het. Dit kan toegeskryf word aan die tekortkoming van 'n parameterbenadering, waarvoor die genetiese programmering algoritme probeer kompenseer deur komplekse boomstukture te ontwikkel. Die algoritme gebruik die strukture om die parameters af te skat. a-GP is ook gebruik om data te visualiseer. Die resultate op vier datastelle het gewys dat a-GP baie goed vergelyk met ander metodes, terwyl dit die addisionele voordeel gehad het, dat dit eenvoudige projeksie-funksies kon evolueer wat duidelike verwantskappe tussen die veranderlikes en die struktuur uitgewys het. Die algoritme was ook toegepas op die projeksie van twee industrieele stelsels na twee dimensies vir visualisering. Die resultate het duidelike trosvorming in die data uitgewys, wat 'n indikasie was van die verskillende operasionele toestande van die-prosesse. Master 2012-08-27T11:34:25Z 2012-08-27T11:34:25Z 1999-12 Thesis http://hdl.handle.net/10019.1/51267 en_ZA Stellenbosch University 175 pages : ill. application/pdf Stellenbosch : Stellenbosch University
spellingShingle Chemical processes -- Data processing
Stationary processes
Genetic programming (Computer science)
Computer algorithms
Neural networks (Computer science)
Dissertations -- Chemical engineering
Chemaly, Timothy Paul
Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title_full Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title_fullStr Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title_full_unstemmed Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title_short Exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
title_sort exploratory data analysis and empirical modelling of stationary processes by use of genetic programming
topic Chemical processes -- Data processing
Stationary processes
Genetic programming (Computer science)
Computer algorithms
Neural networks (Computer science)
Dissertations -- Chemical engineering
url http://hdl.handle.net/10019.1/51267
work_keys_str_mv AT chemalytimothypaul exploratorydataanalysisandempiricalmodellingofstationaryprocessesbyuseofgeneticprogramming