Full Text Available


3D convolution with two-stream convNets for human action recognition

Human action recognition attempts to identify what kind of action a person is performing in a given video; it is considered one of the important topics in machine learning and computer vision. Its importance comes from its need in many applications such as security applications and hu...

Bibliographic Details
Main Author:	Hosny, Karim Mohamed
Format:	Thesis
Published:	AUC Knowledge Fountain 2020
Subjects:	human action recognition; machine learning; UCF-101; HMDB-51; convolutional networks; CNN; 3D convolution; two stream convolutional network; deep learning; artificial intelligence; computer vision; ResNet-50; video recognition

_version_	1867613420384157696
access_status_str	Open Access
author	Hosny, Karim Mohamed
author_browse	Hosny, Karim Mohamed
author_facet	Hosny, Karim Mohamed
author_sort	Hosny, Karim Mohamed
collection	Thesis
dc_rights_str_mv	The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy. The author has granted the American University in Cairo or its agents a non-exclusive license to archive this thesis, dissertation, paper, or record of study, and to make it accessible, in whole or in part, in all forms of media, now or hereafter known.
description	Human action recognition attempts to identify what kind of action a person is performing in a given video; it is considered one of the important topics in machine learning and computer vision. Its importance comes from its need in many applications, such as security applications and human-computer interaction. Many methods have been researched to solve the problem, ranging from handcrafted techniques to deep neural network techniques, and methods such as 3D convolution and recurrent neural networks have been used as well. Popular datasets have been curated to benchmark the methods researched to tackle this problem; UCF-101 and HMDB-51 are the most popular and are used to test current and past techniques in the area of human action recognition. Two-stream convolutional networks, a deep learning technique, have become a trend in recent years for solving the human action recognition problem. The most famous method for solving the problem is to pre-process the video to generate optical flow data or dense trajectories, then feed them to a deep neural network alongside static individual image frames of the video. We ask: can we classify human action without the need for pre-processing or handcrafted feature generation before using deep learning for classification? And how will 3D convolution affect the temporal stream and the overall classification accuracy? We contribute to solving the human action recognition problem by introducing a new end-to-end solution using a two-stream convolutional network that learns static features and temporal features without any pre-processing of the data to generate optical flow or dense trajectories for video temporal information. Our method has been tested on the UCF-101 and HMDB-51 datasets to compete with state-of-the-art techniques. 
The results show that we were able to achieve high accuracy without any pre-processing, unlike current popular methods. Our method ranked among the highest on UCF-101; the only method with a higher accuracy was a study that modified the original two-stream network by adding new fusion techniques. It ranked the highest on HMDB-51 in comparison with the other techniques.
format	Thesis
id	oai:fount.aucegypt.edu:etds-2752
institution	American University in Cairo (Egypt)
last_indexed	2026-06-10T12:35:51.500Z
license_str	Other — see source repository
provenance_str_mv	Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate	2020
publishDateRange	2020
publishDateSort	2020
publisher	AUC Knowledge Fountain
publisherStr	AUC Knowledge Fountain
record_format	dspace
source_str	AUC Knowledge Fountain — bepress
spelling	oai:fount.aucegypt.edu:etds-2752 3D convolution with two-stream convNets for human action recognition Hosny, Karim Mohamed Human action recognition attempts to identify what kind of action a person is performing in a given video; it is considered one of the important topics in machine learning and computer vision. Its importance comes from its need in many applications, such as security applications and human-computer interaction. Many methods have been researched to solve the problem, ranging from handcrafted techniques to deep neural network techniques, and methods such as 3D convolution and recurrent neural networks have been used as well. Popular datasets have been curated to benchmark the methods researched to tackle this problem; UCF-101 and HMDB-51 are the most popular and are used to test current and past techniques in the area of human action recognition. Two-stream convolutional networks, a deep learning technique, have become a trend in recent years for solving the human action recognition problem. The most famous method for solving the problem is to pre-process the video to generate optical flow data or dense trajectories, then feed them to a deep neural network alongside static individual image frames of the video. We ask: can we classify human action without the need for pre-processing or handcrafted feature generation before using deep learning for classification? And how will 3D convolution affect the temporal stream and the overall classification accuracy? We contribute to solving the human action recognition problem by introducing a new end-to-end solution using a two-stream convolutional network that learns static features and temporal features without any pre-processing of the data to generate optical flow or dense trajectories for video temporal information. 
Our method has been tested on the UCF-101 and HMDB-51 datasets to compete with state-of-the-art techniques. The results show that we were able to achieve high accuracy without any pre-processing, unlike current popular methods. Our method ranked among the highest on UCF-101; the only method with a higher accuracy was a study that modified the original two-stream network by adding new fusion techniques. It ranked the highest on HMDB-51 in comparison with the other techniques. 2020-02-09T08:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/1712 https://fount.aucegypt.edu/context/etds/article/2752/viewcontent/Thesis.pdf The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy. The author has granted the American University in Cairo or its agents a non-exclusive license to archive this thesis, dissertation, paper, or record of study, and to make it accessible, in whole or in part, in all forms of media, now or hereafter known. Theses and Dissertations AUC Knowledge Fountain human action recognition\|\|machine learning\|\|UCF-101\|\|HMDB-51\|\|convolutional networks\|\|CNN\|\|3D convolution\|\|two stream convolutional network\|\|deep learning\|\|artificial intelligence\|\|computer vision\|\|ResNet-50\|\|video recognition
spellingShingle	human action recognition\|\|machine learning\|\|UCF-101\|\|HMDB-51\|\|convolutional networks\|\|CNN\|\|3D convolution\|\|two stream convolutional network\|\|deep learning\|\|artificial intelligence\|\|computer vision\|\|ResNet-50\|\|video recognition Hosny, Karim Mohamed 3D convolution with two-stream convNets for human action recognition
title	3D convolution with two-stream convNets for human action recognition
title_full	3D convolution with two-stream convNets for human action recognition
title_fullStr	3D convolution with two-stream convNets for human action recognition
title_full_unstemmed	3D convolution with two-stream convNets for human action recognition
title_short	3D convolution with two-stream convNets for human action recognition
title_sort	3d convolution with two stream convnets for human action recognition
topic	human action recognition\|\|machine learning\|\|UCF-101\|\|HMDB-51\|\|convolutional networks\|\|CNN\|\|3D convolution\|\|two stream convolutional network\|\|deep learning\|\|artificial intelligence\|\|computer vision\|\|ResNet-50\|\|video recognition
url	https://fount.aucegypt.edu/etds/1712 https://fount.aucegypt.edu/context/etds/article/2752/viewcontent/Thesis.pdf
work_keys_str_mv	AT hosnykarimmohamed 3dconvolutionwithtwostreamconvnetsforhumanactionrecognition

