Quality control for data-dependent and data-independent mass-spectrometry-based proteomics

Thesis (PhD)--Stellenbosch University, 2020.

Bibliographic Details
Main Author: Marina, Kriek
Other Authors: Tabb, David
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2020
Subjects: Bioinformatics; UCTD; Proteomics; Mass spectrometry
_version_ 1867614037407170560
access_status_str Open Access
author Marina, Kriek
author2 Tabb, David
author_browse Marina, Kriek
Tabb, David
author_facet Tabb, David
Marina, Kriek
author_sort Marina, Kriek
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (PhD)--Stellenbosch University, 2020.
format Thesis
id oai:scholar.sun.ac.za:10019.1/109383
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:45:40.057Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2020
publishDateRange 2020
publishDateSort 2020
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/109383
Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
Marina, Kriek
Tabb, David
Stoychev, Stoyan Hristov
Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences: Molecular Biology and Human Genetics.
Bioinformatics
UCTD
Proteomics
Mass spectrometry
Thesis (PhD)--Stellenbosch University, 2020.
ENGLISH ABSTRACT: Discovery proteomics is advancing at a rapid rate, and quality control of the technique must adapt accordingly. In 2012, a console application, QuaMeter, was created to produce quality control metrics for data-dependent proteomics, based on metrics first designed by the US National Institute of Standards and Technology (NIST). In 2014, the tool gained an identification-independent mode, which generates 44 quality metrics that still apply only to data-dependent acquisition. However, the development of new data-independent acquisition methods in recent years has created the need for a data-independent acquisition version of QuaMeter. The QuaMeter metrics must also still be analysed in a statistical framework such as R/Python to gain the full value of their multivariate nature. Biologists who are inexperienced at programming or at using a console may therefore find such software limiting, and there is a demand for a tool with a user interface for analysing the metrics. Here, I have created a console application for the analysis of data-independent acquisition results. The tool provides a platform for in-depth analysis of data quality. It is the first of its kind to allow the user to divide the retention time into segments and return quality metrics for each segment separately. This gives the researcher extra insight into the chromatography steps and, as I illustrate here, the tool reveals problems that would not have been visible if only one metric had been provided for the entire file.
In addition, the m/z axis is split according to the data’s underlying isolation window structure, and metrics are calculated for each window separately to give the researcher additional information for method development. A set of metrics that each produce one value for the entire file is also included, for easy outlier detection among files. This project also involves the creation of a desktop application with a user interface for running either of the two console applications. This tool can also perform some of the key downstream analyses regularly performed in quality control: outlier detection is enabled via PCA, classification of longitudinal data as good or bad quality is performed with random forest analysis, and individual metrics can be visualised against their distributions. In addition, many quality control principles, such as experimental design and identifying sources of variability in an experiment, are explained and demonstrated in the context of the quality control metrics, as are conventional quality control techniques such as outlier detection and classification of data quality.
AFRIKAANS SUMMARY (English translation): Protein mass spectrometry has progressed very rapidly over the past decade, and quality control of the technique must therefore be adapted accordingly. In 2012, a console application, QuaMeter, was created to produce quality metrics for data-dependent proteome analysis. This version of the application was based on an application by the American National Institute of Standards and Technology (NIST). In 2014, an identification-independent version of the software was added, which reports 44 quality metrics, but still only for data-dependent acquisition techniques. More recently, however, new data-independent acquisition methods have been designed that enjoy considerable support in the community. A need has therefore arisen for a data-independent version of QuaMeter.
The QuaMeter results must, however, still be analysed downstream in a statistical framework such as R/Python to make full use of the multivariate nature of QuaMeter. Biologists who are inexperienced in programming or in using a console may regard this as an insurmountable obstacle. I therefore built a console application, SwaMe, for the analysis of data-independent acquisition results. SwaMe provides a platform for a more in-depth analysis of data quality. The tool is the first of its kind to allow the user to divide the retention time into segments and calculate quality metrics for each segment separately. In this way the researcher can gain insight into the chromatography and, as I show here, instrumental problems are revealed that would not have been visible if only one value per sample had been reported. The m/z axis is subdivided according to the data’s underlying isolation window structure, and average metrics are provided for each window separately, which further eases method development. A set of metrics that calculate only one value per sample is also provided, which is especially useful for outlier detection. The project also includes the design of a graphical-user-interface application, Assurance, which offers a platform for running the two console applications. This tool can also assist with some of the most important downstream statistical analyses. These are performed regularly in quality control and include outlier identification by principal component analysis and the classification of longitudinal data as good or bad by machine learning; individual metrics can also be visualised against the data distribution. Numerous quality control principles, such as experimental design and the identification of sources of variability, are also explained and demonstrated in the context of the quality metrics.
In addition, traditional quality control techniques such as data classification are also demonstrated.
Doctoral 2020-11-30T17:34:16Z 2021-01-31T19:47:36Z 2020-11-30T17:34:16Z 2021-01-31T19:47:36Z 2020-12 Thesis http://hdl.handle.net/10019.1/109383 en_ZA Stellenbosch University 156 pages application/pdf Stellenbosch : Stellenbosch University
spellingShingle Bioinformatics
UCTD
Proteomics
Mass spectrometry
Marina, Kriek
Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title_full Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title_fullStr Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title_full_unstemmed Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title_short Quality control for data-dependent and data-independent mass-spectrometry-based proteomics
title_sort quality control for data dependent and data independent mass spectrometry based proteomics
topic Bioinformatics
UCTD
Proteomics
Mass spectrometry
url http://hdl.handle.net/10019.1/109383
work_keys_str_mv AT marinakriek qualitycontrolfordatadependentanddataindependentmassspectrometrybasedproteomics