Trending topic extraction from social media

Bibliographic Details
Main Author: Mostafa, Nada Ayman A.
Format: Thesis
Published: AUC Knowledge Fountain 2016
Subjects: Data Mining; Social Media
_version_ 1867613409159151616
access_status_str Open Access
author Mostafa, Nada Ayman A.
author_browse Mostafa, Nada Ayman A.
author_facet Mostafa, Nada Ayman A.
author_sort Mostafa, Nada Ayman A.
collection Thesis
dc_rights_str_mv The author retains all rights with regard to copyright. The author certifies that written permission from the owner(s) of third-party copyrighted matter included in the thesis, dissertation, paper, or record of study has been obtained. The author further certifies that IRB approval has been obtained for this thesis, or that IRB approval is not necessary for this thesis. Insofar as this thesis, dissertation, paper, or record of study is an educational record as defined in the Family Educational Rights and Privacy Act (FERPA) (20 USC 1232g), the author has granted consent to disclosure of it to anyone who requests a copy.
description Social media has become the first source of information for many people. The volume of content posted daily has grown so large that it is difficult to track. Twitter is one of the most popular social media applications: users follow news accounts, public figures, and friends to stay up to date on the latest events around them. Because dialect and writing style differ from one region to another, the objective of this research is to extract trending topics for an Egyptian Twitter user, so that the user can see at a glance the trending topics discussed by the people he follows. To find the best approach, we investigate the document pivot and feature pivot approaches. Applying the document pivot approach to the baseline data, using a tf-itf (term frequency-inverse tweet frequency) representation, repeated bisecting k-means clustering, and extraction of the most frequent n-grams from each cluster, achieved a recall of 100% and an F1 measure of 0.8. Applying the feature pivot approach to the baseline data, using the content similarity algorithm to group related unigrams, achieved a recall of 100% and an F1 measure of 0.923. To validate these results, we collected 12 data sets of different sizes (200, 400, 600, and 1200) from three domains (sports, entertainment, and news) and applied both approaches to them. The average recall, precision, and F1 values obtained with the feature pivot approach are higher than those obtained with the document pivot approach. To check that this difference is statistically significant, we applied a two-sample, one-tailed, paired t-test, which showed that the feature pivot results are significantly better at the 90% confidence level. The document pivot approach extracted the trending topics for an Egyptian Twitter user with an average recall of 0.714, average precision of 0.521, and average F1 measure of 0.556, versus 0.981, 0.754, and 0.833, respectively, for the feature pivot approach.
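The abstract names the tf-itf (term frequency-inverse tweet frequency) representation but does not give its formula. The following is a minimal sketch, assuming tf-itf mirrors standard tf-idf with each tweet treated as a document: weight = (term count / tweet length) * log(N / number of tweets containing the term). The function name `tf_itf`, the whitespace tokenizer, and the exact weighting are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter


def tf_itf(tweets):
    """Return one {term: weight} dict per tweet.

    Assumed weighting (analogous to tf-idf, not taken from the thesis):
    weight = (count of term in tweet / tweet length) * log(N / tweet frequency)
    """
    # Naive whitespace tokenization; real Arabic-dialect tweets would need
    # proper normalization, which is out of scope for this sketch.
    tokenized = [tweet.lower().split() for tweet in tweets]
    n_tweets = len(tokenized)

    # Tweet frequency: number of tweets containing each term at least once.
    tweet_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_tweets / tweet_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example tweets, for illustration only.
tweets = [
    "ahly wins the derby",
    "zamalek loses the derby",
    "the new film opens today",
]
vectors = tf_itf(tweets)
```

Under this assumed weighting, a term that appears in every tweet (here "the") gets weight zero, while a term confined to a single tweet (here "ahly") outweighs one shared by several tweets (here "derby"). Vectors of this kind are what a clustering step such as bisecting k-means would consume.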
format Thesis
id oai:fount.aucegypt.edu:etds-1245
institution American University in Cairo (Egypt)
last_indexed 2026-06-10T12:35:41.195Z
license_str Other — see source repository
provenance_str_mv Harvested via OAI-PMH from AUC Knowledge Fountain — bepress
publishDate 2016
publishDateRange 2016
publishDateSort 2016
publisher AUC Knowledge Fountain
publisherStr AUC Knowledge Fountain
record_format dspace
source_str AUC Knowledge Fountain — bepress
spelling oai:fount.aucegypt.edu:etds-1245 Trending topic extraction from social media Mostafa, Nada Ayman A. 2016-06-01T07:00:00Z thesis application/pdf https://fount.aucegypt.edu/etds/246 https://fount.aucegypt.edu/context/etds/article/1245/viewcontent/Trending_Topic_Extraction_from_Social_Media_Nada_Ayman.pdf Theses and Dissertations AUC Knowledge Fountain Data Mining Social Media
spellingShingle Data Mining
Social Media
Mostafa, Nada Ayman A.
Trending topic extraction from social media
title Trending topic extraction from social media
title_full Trending topic extraction from social media
title_fullStr Trending topic extraction from social media
title_full_unstemmed Trending topic extraction from social media
title_short Trending topic extraction from social media
title_sort trending topic extraction from social media
topic Data Mining
Social Media
url https://fount.aucegypt.edu/etds/246
https://fount.aucegypt.edu/context/etds/article/1245/viewcontent/Trending_Topic_Extraction_from_Social_Media_Nada_Ayman.pdf
work_keys_str_mv AT mostafanadaaymana trendingtopicextractionfromsocialmedia