Similar Items:
- When Audio-Language Models Fail to Leverage Multimodal Context for Dysarthric Speech Recognition
- A Comprehensive Analysis of Tokenization and Self-Supervised Learning in End-to-End Automatic Speech Recognition applied on French Language
- PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
- When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models
- Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts
- Fuzzy Fingerprinting Encoder Pre-trained Language Models for Emotion Recognition in Conversations: Human Assessment and Validity Study
- Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation