Channels - AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward :: FRELIP Discovery

Similar Items: AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
Verifier-Backed Hard Problem Generation for Mathematical Reasoning
Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring
Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring
Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
Bolek: A Multimodal Language Model for Molecular Reasoning
A decoupled diffusion planner that adapts to changing cost limits by using cost-conditioned generation for safety and reward gradients for performance
EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
BEACON: A Multimodal Dataset for Learning Behavioral Fingerprints from Gameplay Data
Adaptive Domain Decomposition Physics-Informed Neural Networks for Traffic State Estimation with Sparse Sensor Data
Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis
Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals
Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint
Compute Where it Counts: Self Optimizing Language Models
Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
UniSD: Towards a Unified Self-Distillation Framework for Large Language Models
Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback
Early Detection of Water Stress by Plant Electrophysiology: Machine Learning for Irrigation Management
Exponential families from a single KL identity
TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering
A Unified Framework of Hyperbolic Graph Representation Learning Methods
Assessing the Role of Intersection Proximity in Pedestrian Crashes: Insights from Data Mining Approach