Process monitoring and fault diagnosis using random forests

Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010.

Bibliographic Details
Main Author: Auret, Lidia
Other Authors: Aldrich, C.
Format: Thesis
Language: English
Published: Stellenbosch : University of Stellenbosch 2010
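Subjects: Process monitoring; Fault diagnosis; Feature extraction; Random forest model; Dissertations -- Process engineering; Theses -- Process engineering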
_version_ 1867613904699392000
access_status_str Open Access
author Auret, Lidia
author2 Aldrich, C.
author_browse Aldrich, C.
Auret, Lidia
author_facet Aldrich, C.
Auret, Lidia
author_sort Auret, Lidia
collection Thesis
dc_rights_str_mv University of Stellenbosch
description Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010.
format Thesis
id oai:scholar.sun.ac.za:10019.1/5360
institution Stellenbosch University (South Africa)
language English
last_indexed 2026-06-10T12:43:33.723Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2010
publishDateRange 2010
publishDateSort 2010
publisher Stellenbosch : University of Stellenbosch
publisherStr Stellenbosch : University of Stellenbosch
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/5360
Process monitoring and fault diagnosis using random forests
Auret, Lidia
Aldrich, C.
University of Stellenbosch. Faculty of Engineering. Dept. of Process Engineering.
Process monitoring
Dissertations -- Process engineering
Theses -- Process engineering
Fault diagnosis
Feature extraction
Random forest model
Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010.
Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch
ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost efficient processes. Data-driven unsupervised (or feature extractive) approaches to fault diagnosis exploit the many measurements available on modern plants. Certain current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring. Random forests are recently proposed statistical inference tools, deriving their predictive accuracy from the nonlinear nature of their constituent decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate the relation of important variables to said response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a potentially viable contender in the process monitoring statistical tool family.
The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. Random forest features were shown to be generally difficult to interpret in terms of geometry present in the original variable space. Random forest mapping and demapping models were shown to be very accurate on training data, and to extrapolate weakly to unseen data that do not fall within regions populated by training data. Random forest feature extraction was applied to unsupervised fault diagnosis for process data, and compared to linear and nonlinear methods. Random forest results were comparable to existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the residual detection success of random forests originates from the constrained responses and poor generalization artifacts of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization tool to allow for the interpretation of fault conditions. A dynamic change point detection application with random forests proved more successful than an existing principal component analysis-based approach, with the success of the random forest method again residing in reconstruction errors. The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets. 
AFRIKAANS SUMMARY: Fault diagnosis is an important component of process monitoring, and is relevant within the greater context of developing safer, cleaner and more cost-effective processes. Data-driven unsupervised, or feature extraction, approaches to fault diagnosis exploit the many measurements available on modern process plants. Some of the current unsupervised approaches are hampered by assumptions regarding linearity, which motivates the investigation of nonlinear methods. The diversity of data structures is further motivation for investigating new feature extraction methods in process monitoring. Random forests are a new statistical inference technique whose accuracy can be attributed to the nonlinear nature of the constituent decision trees and the capability of ensembles. Random forest committees provide more than just predictions; model information on data point proximity can be exploited to provide random forest features. Variable importance measures show which variables are closely related to a chosen response variable, while partial dependencies indicate the relation of an important variable to the chosen response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method for process monitoring based on random forests. The hypothesis investigated states: unsupervised process monitoring and fault diagnosis can be improved by using features extracted with random forests, with the further interpretation of fault conditions aided by additional random forest techniques. Experimental results presented in this work support this hypothesis. An initial study was done to assess the quality of random forest features. It was found that random forest features are difficult to interpret in terms of the geometry of the original variable space.
Furthermore, it was found that random forest mapping and demapping are very accurate for training data, but show weak extrapolation properties for unseen data that fall in regions outside those of the training data. Random forest feature extraction was applied in unsupervised fault diagnosis for steady-state processes, and compared with linear and nonlinear methods. Results with random forests are comparable to those of existing methods, and the majority of random forest detections are attributable to variable reconstruction errors. Further investigation showed that the success of residual detection rests on the constrained output values and poor generalization properties of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization technique that provides for the interpretation of fault conditions. A dynamic application of change point detection with random forests proved more successful than an existing method based on principal component analysis. The success of the random forest method is again attributable to reconstruction residuals. A recommendation made following this study is that the random forest change point and fault detection methods can be added to a similar set of methods. It was found in this work that the distance-to-model diagnostic based on random forest mapping and demapping is successful for fault detection. The theoretical understanding gained supports the application of these methods to further data sets.
2010-11-23T10:00:44Z
2010-12-15T10:37:44Z
2010-11-23T10:00:44Z
2010-12-15T10:37:44Z
2010-12
Thesis
http://hdl.handle.net/10019.1/5360
en
University of Stellenbosch
214 p. : ill.
application/pdf
Stellenbosch : University of Stellenbosch
spellingShingle Process monitoring
Dissertations -- Process engineering
Theses -- Process engineering
Fault diagnosis
Feature extraction
Random forest model
Auret, Lidia
Process monitoring and fault diagnosis using random forests
title Process monitoring and fault diagnosis using random forests
title_sort process monitoring and fault diagnosis using random forests
topic Process monitoring
Dissertations -- Process engineering
Theses -- Process engineering
Fault diagnosis
Feature extraction
Random forest model
url http://hdl.handle.net/10019.1/5360
work_keys_str_mv AT auretlidia processmonitoringandfaultdiagnosisusingrandomforests