Similar Items: One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
- Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
- Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation
- Unified Value Alignment for Generative Recommendation in Industrial Advertising
- One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
- CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation
- Multi-Axis Speech Similarity via Factor-Partitioned Embeddings