Implementing a pipeline for analysing single-cell RNA sequencing data

Thesis (MSc)--Stellenbosch University, 2023.

Bibliographic Details
Main Author: Ahiavi, Kwame
Other Authors: Tromp, Gerard
Format: Thesis
Language: en_ZA
Published: Stellenbosch : Stellenbosch University 2023
_version_ 1867613915071905792
access_status_str Open Access
author Ahiavi, Kwame
author2 Tromp, Gerard
author_sort Ahiavi, Kwame
collection Thesis
dc_rights_str_mv Stellenbosch University
description Thesis (MSc)--Stellenbosch University, 2023.
format Thesis
id oai:scholar.sun.ac.za:10019.1/127144
institution Stellenbosch University (South Africa)
language en_ZA
last_indexed 2026-06-10T12:43:43.080Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from SUNScholar — Stellenbosch University Repository
publishDate 2023
publishDateRange 2023
publishDateSort 2023
publisher Stellenbosch : Stellenbosch University
publisherStr Stellenbosch : Stellenbosch University
record_format dspace
source_str SUNScholar — Stellenbosch University Repository
spelling oai:scholar.sun.ac.za:10019.1/127144
Implementing a pipeline for analysing single-cell RNA sequencing data
Ahiavi, Kwame; Tromp, Gerard; Van der Spuy, Gian; Maasdorp, Elizna
Stellenbosch University. Faculty of Medicine and Health Sciences. Dept. of Biomedical Sciences. Molecular Biology and Human Genetics.
Pipeline development, Single-cell RNA-seq, Reproducible science, Workflow management, Containerization; Nucleotide sequence; Pipelining (Electronics); Human reproduction -- Immunological aspects; Gene expression; Statistical methods
Thesis (MSc)--Stellenbosch University, 2023.
ENGLISH ABSTRACT: Single-cell RNA sequencing (scRNA-seq) permits the dissection of gene expression at single-cell resolution and provides novel insights into the composition of apparently homogeneous cell types and into transitions between cell states, thereby deepening our understanding of the cell as a functional unit. The data generated by scRNA-seq are characterised by sparsity, heterogeneity, high dimensionality, and large scale. As a result of biological and technical limitations, scRNA-seq data are "noisier" and more complex than their bulk RNA-seq counterparts. Analysing scRNA-seq data therefore demands new statistical and computational methods. Analytical algorithms employed in scRNA-seq pipelines are prone to producing different results depending on the state at the start of the analysis and the number of iterations of computation, which complicates reproducibility. I developed a highly robust, scalable, and reproducible analysis pipeline for scRNA-seq data, implemented in Nextflow, a workflow management system that complies with current best practices in bioinformatics. The pipeline implements pre-processing and comprehensive downstream analyses for scRNA-seq data. With the publicly available datasets used for testing, the pipeline identified cell types and differentially expressed genes that enabled the identification of cell subtypes. Trajectory inference also revealed the differentiation trajectory of cells and identified subclusters within them. In addition, the pipeline documents all steps and transformations, records software packages and versions, and incorporates ontological metadata annotation. Containerisation of pipeline processes ensures that software dependencies are satisfied, contributing to consistent, robust, and reproducible science.
AFRIKAANS OPSOMMING (translated): Single-cell RNA sequencing (scRNA-seq) makes it possible to study gene expression at single-cell resolution and gives new insight into the composition of apparently homogeneous cell types and the transition between cell phases, thereby deepening our understanding of the cell as a functional unit. The data that scRNA-seq produces are characterised by a sparse distribution of values, large variation, high dimensionality, and large scale. Owing to biological and technical limitations, scRNA-seq data are more prone to background noise than bulk RNA sequencing data. scRNA-seq data therefore require new statistical and computational methods. Reproducibility is challenging because the analytical algorithms in scRNA-seq pipelines tend to give different results depending on the starting point and the number of iterations in the computations. I developed a robust, reproducible pipeline that can be applied at any scale to analyse scRNA-seq data, and implemented it with Nextflow, a workflow management system that reflects current best practice in bioinformatics. The pipeline is the first to perform both pre-processing and comprehensive downstream analysis for scRNA-seq. The pipeline was tested with freely available datasets and identified cell types. Trajectory inference also showed the differentiation of cells and their subdivisions. In addition, the pipeline keeps a record of all steps, transformations, software packages and versions, and includes ontological metadata. The pipeline processes are isolated in virtual containers so that software dependencies can be managed. This contributes to sustainable and reproducible science.
Masters 2023-03-06T09:53:14Z 2023-05-18T07:06:37Z 2023-03-06T09:53:14Z 2023-05-18T07:06:37Z 2023-03 Thesis http://hdl.handle.net/10019.1/127144 en_ZA en_ZA Stellenbosch University xiii, 111 pages : illustrations application/pdf Stellenbosch : Stellenbosch University
title Implementing a pipeline for analysing single-cell RNA sequencing data
title_sort implementing a pipeline for analysing single cell rna sequencing data
topic Pipeline development, Single-cell RNA-seq, Reproducible science, Workflow management, Containerization
Nucleotide sequence
Pipelining (Electronics)
Human reproduction -- Immunological aspects
Gene expression
Statistical methods
url http://hdl.handle.net/10019.1/127144
work_keys_str_mv AT ahiavikwame implementingapipelineforanalysingsinglecellrnasequencingdata