Full Text Available

Improving machine learning techniques for influenza-A classification

Influenza-A's ability to mutate constantly has resulted in recurring seasonal epidemics and pandemics. Recently, the virus's spread has been enhanced by its ability to infect multiple hosts simultaneously. Fast identification of the subtype and hosts of the Influenza-A virus is thus crucial to quickly...

Bibliographic Details
Main Author:	Shaltout, Nermin Ashraf
Format:	Thesis
Published:	AUC Knowledge Fountain 2016
Subjects:	D Bioinformatics

_version_	1867613416471920640
access_status_str	Open Access
author	Shaltout, Nermin Ashraf
author_browse	Shaltout, Nermin Ashraf
author_facet	Shaltout, Nermin Ashraf
author_sort	Shaltout, Nermin Ashraf
collection	Thesis
dc_rights_str_mv	The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
description	Influenza-A's ability to mutate constantly has resulted in recurring seasonal epidemics and pandemics. Recently, the virus's spread has been enhanced by its ability to infect multiple hosts simultaneously. Fast identification of the subtype and hosts of the Influenza-A virus is thus crucial for quickly measuring its drug resistance and virulence. Research into data mining techniques for Influenza-A host and subtype classification is already underway. The main goal of earlier studies was to improve the accuracy, speed and safety of virus analyses. With newer infectious strains of Influenza-A appearing yearly, these techniques are still open to improvement. This research aims to improve existing machine learning techniques for classifying Influenza-A using the following methodologies: (a) Exploring the effectiveness of using RNA/cDNA data rather than protein data for virus classification. (b) Measuring the impact on classifier performance and speed of preprocessing the virus sequences by selecting the most informative positions in each sequence; both neural networks (NNs) and decision trees (DTs) were analyzed. (c) Testing the previous method on more than one classification problem; host identification experiments were conducted on subtypes H1 and H5, while antiviral resistance identification was conducted on the H1N1 strain. Accuracy, sensitivity, specificity, precision and time were used as performance measures. The final results showed that: (a) DNA data is more sensitive than protein data for both subtypes. (b) Using the 100 and 10 most informative positions with DTs yielded an overall speed improvement of 92-100% when identifying hosts for segments of subtype H1, with an insignificant decrease in performance. Using the 100 and 60 most informative positions with NNs yielded a speed improvement of 88% when identifying hosts of both subtypes H1 and H5, with no significant drop in overall performance.
Of the two classifiers, NNs had better performance, while DTs had better efficiency. (c) Testing the method on antiviral resistance identification of Influenza-A showed promising results: using the 100 most informative positions with DTs yielded an overall performance of not less than 95% in not more than 3 seconds for all 8 segments. The method has the potential to improve the efficiency of other Influenza-A classification problems, as well as other viral classification problems in the bioinformatics field. The thesis makes the following contributions: (a) A way to extract informative positions directly from DNA data, without converting it to protein data. This can aid in detecting silent mutations in the Influenza-A virus. (b) Identification of resistance to the antiviral Adamantane using all eight segments of the virus; previously, one viral segment was known to be mainly responsible for antiviral resistance. (c) Measuring the efficiency, in terms of speed, of using informative positions as a preprocessing step. (d) A clear comparison of the performance of two classifiers when using the information gain algorithm.
format	Thesis
id	oai:fount.aucegypt.edu:etds-2208
institution	American University in Cairo (Egypt)
last_indexed	2026-06-10T12:35:47.730Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate	2016
publishDateRange	2016
publishDateSort	2016
publisher	AUC Knowledge Fountain
publisherStr	AUC Knowledge Fountain
record_format	dspace
source_str	AUC Knowledge Fountain — bepress
spelling	oai:fount.aucegypt.edu:etds-2208 Improving machine learning techniques for influenza-A classification Shaltout, Nermin Ashraf 2016-06-01T07:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/1209 https://fount.aucegypt.edu/context/etds/article/2208/viewcontent/ImprovingInfluenzaAClassification.pdf Theses and Dissertations AUC Knowledge Fountain D Bioinformatics
spellingShingle	D Bioinformatics Shaltout, Nermin Ashraf Improving machine learning techniques for influenza-A classification
title	Improving machine learning techniques for influenza-A classification
title_full	Improving machine learning techniques for influenza-A classification
title_fullStr	Improving machine learning techniques for influenza-A classification
title_full_unstemmed	Improving machine learning techniques for influenza-A classification
title_short	Improving machine learning techniques for influenza-A classification
title_sort	improving machine learning techniques for influenza a classification
topic	D Bioinformatics
url	https://fount.aucegypt.edu/etds/1209 https://fount.aucegypt.edu/context/etds/article/2208/viewcontent/ImprovingInfluenzaAClassification.pdf
work_keys_str_mv	AT shaltoutnerminashraf improvingmachinelearningtechniquesforinfluenzaaclassification
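
The description field above mentions selecting the most informative sequence positions with the information gain algorithm before classification. The following is a minimal sketch of that general idea, not the thesis's actual implementation: it assumes the sequences are already aligned to equal length, and the function names, the toy sequences and the host labels are all hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(column, labels):
    """Information gain of one alignment column with respect to the labels."""
    # Group the class labels by the symbol observed at this position.
    groups = defaultdict(list)
    for symbol, label in zip(column, labels):
        groups[symbol].append(label)
    total = len(labels)
    # Expected entropy of the labels once the symbol at this position is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_informative_positions(sequences, labels, k):
    """Return the indices of the k positions with the highest information gain."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    gains = [
        information_gain([s[i] for s in sequences], labels)
        for i in range(length)
    ]
    # Sort by descending gain; ties are broken by the lower position index.
    ranked = sorted(range(length), key=lambda i: (-gains[i], i))
    return ranked[:k]


# Toy, made-up aligned nucleotide sequences with hypothetical host labels.
sequences = ["ACGTA", "ACGTC", "TCGAA", "TCGAC"]
labels = ["human", "human", "avian", "avian"]

# Keep only the 2 most informative positions of each sequence.
selected = top_informative_positions(sequences, labels, k=2)
reduced = ["".join(s[i] for i in selected) for s in sequences]
```

In this toy data only positions 0 and 3 separate the two labels, so the reduced sequences keep those two columns. A classifier such as a decision tree or neural network would then be trained on the reduced sequences, which is where a speed gain of the kind reported in the description would be expected to come from.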

Similar Items