Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications

• HRDL extends reward design to encode nuanced human preferences for long-horizon tasks. • L2HR translates natural language specifications into hierarchical reward signals for RL a

Research & Labs · February 24, 2026 (updated February 24, 2026) · 1 min · 178 words
Task-Aware Exploration via a Predictive Bisimulation Metric

• TEB introduces task-aware exploration for visual RL with sparse rewards. • Uses predictive bisimulation metric to learn behaviorally grounded task representations. • Adds predict

Research & Labs · February 24, 2026 (updated February 24, 2026) · 1 min · 164 words
TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models

• TPRU dataset addresses temporal and procedural gaps in multimodal LLMs, enabling richer embodied AI. • Comprised of robotic manipulation and GUI navigation scenes with 3 tasks: T

Research & Labs · February 24, 2026 (updated February 24, 2026) · 1 min · 187 words
EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

• Computer Science > Artificial Intelligence. Submitted on 18 Feb 2026.

Research & Labs · February 19, 2026 (updated February 24, 2026) · 2 min · 271 words
EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

• Computer Science > Artificial Intelligence. Submitted on 18 Feb 2026.

Research · February 19, 2026 (updated February 19, 2026) · 2 min · 236 words
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

• Agentic RL extends LLM training beyond single-turn responses to full decision-making via environment interaction. • It collects on‑policy data, optimizing policies across multi‑s

RL without TD learning

• In this post, I'll introduce a reinforcement learning (RL) algorithm based on an "alternative" paradigm: divide and conquer. • Unlike traditional methods, this algorithm is not b

Research · November 1, 2025 (updated February 19, 2026) · 1 min · 186 words
RL without TD learning

• In this post, I'll introduce a reinforcement learning (RL) algorithm based on an "alternative" paradigm: divide and conquer. • Unlike traditional methods, this algorithm is not b

Research & Labs · November 1, 2025 (updated February 24, 2026) · 1 min · 187 words