| Main Author: | |
|---|---|
| Format: | Thesis |
| Published: | AUC Knowledge Fountain, 2025 |
| Subjects: | |
| Summary: | Emoji prediction plays a vital role in digital communication by enriching text with personality, individual style, and emotional tone. However, existing frameworks rely primarily on user-generated text combined with statistical or language models, and often overlook individual user traits and contextual information. Recent research shows that integrating user-specific features, such as historical usage patterns and emotional context, can significantly improve emoji prediction accuracy. Building on these insights, we propose a comprehensive emoji prediction framework that incorporates user history, personality traits, and real-time emotional context. We use the Pan17 corpus, which contains enough posts per user to infer users' emotional states, historical emoji usage patterns, and personality characteristics. These inferred features are combined with text embeddings to build a personalized emoji prediction model. We first analyze the individual contribution of personality, emotion, and usage patterns to overall performance. By building a separate model for each feature and evaluating it across all datasets, we show that each feature independently improves prediction performance over the baseline, with emotion and usage patterns having the largest impact. We then evaluate our personalized model against a traditional text-only baseline across eight datasets extracted from the Pan17 corpus, each defined by a different threshold on the number of emojis (20, 50, 62, 100, 150, 200, 250, and 300 emojis). Our results show that the personalized model consistently outperforms the baseline, improving accuracy by 1.33%. Finally, we introduce a semantic evaluation framework that clusters emoji embeddings to group semantically similar emojis. Evaluation based on these clusters demonstrates that our personalized model also produces more semantically relevant predictions. |
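
The cluster-based semantic evaluation described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the emoji clusters have already been computed from emoji embeddings, and the mapping `CLUSTER_OF`, the function names, and the sample labels are all hypothetical. A prediction is counted as semantically correct when it falls in the same cluster as the gold emoji.

```python
from typing import Dict, List


def exact_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of predictions that match the gold emoji exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cluster_accuracy(gold: List[str], pred: List[str],
                     cluster_of: Dict[str, int]) -> float:
    """Fraction of predictions in the same cluster as the gold emoji.

    Exact matches always count; emojis missing from the mapping
    only count when they match exactly.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    hits = 0
    for g, p in zip(gold, pred):
        gold_cluster = cluster_of.get(g)
        pred_cluster = cluster_of.get(p)
        if g == p or (gold_cluster is not None and gold_cluster == pred_cluster):
            hits += 1
    return hits / len(gold)


# Hypothetical cluster assignment (assumed to come from clustering
# emoji embeddings; the grouping here is invented for illustration).
CLUSTER_OF = {"😂": 0, "🤣": 0, "😍": 1, "😘": 1, "😢": 2, "😭": 2}

gold = ["😂", "😍", "😢", "😂"]
pred = ["🤣", "😍", "😂", "😂"]

print("exact accuracy:  ", exact_accuracy(gold, pred))
print("cluster accuracy:", cluster_accuracy(gold, pred, CLUSTER_OF))
```

In this toy example the cluster-level score is higher than the exact score because predicting 🤣 for 😂 is treated as a near miss rather than an error, which is the intuition behind evaluating semantic relevance alongside exact accuracy.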