<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>RL on Tenu Tech Brief</title>
    <link>https://cluster-site.onrender.com/tags/rl/</link>
    <description>Recent content in RL on Tenu Tech Brief</description>
    <generator>Hugo -- 0.146.0</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 24 Feb 2026 06:06:00 +0000</lastBuildDate>
    <atom:link href="https://cluster-site.onrender.com/tags/rl/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Hierarchical Reward Design from Language: Enhancing Alignment of Agent Behavior with Human Specifications</title>
      <link>https://cluster-site.onrender.com/posts/hierarchical-reward-design-from-language-enhancing-alignment-of-agent-behavior-with-human-specifications/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/hierarchical-reward-design-from-language-enhancing-alignment-of-agent-behavior-with-human-specifications/</guid>
      <description>• HRDL extends reward design to encode nuanced human preferences for long-horizon tasks. • L2HR translates natural language specifications into hierarchical reward signals for RL a</description>
    </item>
    <item>
      <title>Task-Aware Exploration via a Predictive Bisimulation Metric</title>
      <link>https://cluster-site.onrender.com/posts/task-aware-exploration-via-a-predictive-bisimulation-metric/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/task-aware-exploration-via-a-predictive-bisimulation-metric/</guid>
      <description>• TEB introduces task-aware exploration for visual RL with sparse rewards. • Uses a predictive bisimulation metric to learn behaviorally grounded task representations. • Adds predict</description>
    </item>
    <item>
      <title>TPRU: Advancing Temporal and Procedural Understanding in Large Multimodal Models</title>
      <link>https://cluster-site.onrender.com/posts/tpru-advancing-temporal-and-procedural-understanding-in-large-multimodal-models/</link>
      <pubDate>Tue, 24 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/tpru-advancing-temporal-and-procedural-understanding-in-large-multimodal-models/</guid>
      <description>• The TPRU dataset addresses temporal and procedural gaps in multimodal LLMs, enabling richer embodied AI. • Composed of robotic manipulation and GUI navigation scenes with 3 tasks: T</description>
    </item>
    <item>
      <title>EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments</title>
      <link>https://cluster-site.onrender.com/posts/enterprisegym-corecraft-training-generalizable-agents-on-high-fidelity-rl-environments/</link>
      <pubDate>Thu, 19 Feb 2026 05:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/enterprisegym-corecraft-training-generalizable-agents-on-high-fidelity-rl-environments/</guid>
      <description>• Computer Science &amp;gt; Artificial Intelligence. Submitted on 18 Feb 2026.</description>
    </item>
    <item>
      <title>Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective</title>
      <link>https://cluster-site.onrender.com/posts/unlocking-agentic-rl-training-for-gpt-oss-a-practical-retrospective/</link>
      <pubDate>Tue, 27 Jan 2026 01:53:15 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/unlocking-agentic-rl-training-for-gpt-oss-a-practical-retrospective/</guid>
      <description>• Agentic RL extends LLM training beyond single-turn responses to full decision-making via environment interaction. • It collects on‑policy data, optimizing policies across multi‑s</description>
    </item>
    <item>
      <title>RL without TD learning</title>
      <link>https://cluster-site.onrender.com/posts/rl-without-td-learning/</link>
      <pubDate>Sat, 01 Nov 2025 09:00:00 +0000</pubDate>
      <guid>https://cluster-site.onrender.com/posts/rl-without-td-learning/</guid>
      <description>• In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. • Unlike traditional methods, this algorithm is not b</description>
    </item>
  </channel>
</rss>
