Similar Items: Synthetic Users, Real Differences: An Evaluation Framework for User Simulation in Multi-Turn Conversations
- Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors
- Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
- Rose-SQL: Role-State Evolution Guided Structured Reasoning for Multi-Turn Text-to-SQL
- Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers
- FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
- Towards Emotion Consistency Analysis of Large Language Models in Emotional Conversational Contexts