• TEB introduces task-aware exploration for visual RL with sparse rewards.
• Uses predictive bisimulation metric to learn behaviorally grounded task representations.
• Adds predicted reward differential to prevent representation collapse under sparse rewards.
• Constructs potential-based exploration bonuses measuring novelty in latent space.
• Experiments on MetaWorld and Maze2D show superior exploration over recent baselines.
• Provides robust, task-relevant exploration without requiring low-dimensional state access.
Article Summaries:
- Summary
A new reinforcement‑learning method, Task‑aware Exploration via a Predictive Bisimulation Metric (TEB), tackles the challenge of sparse rewards in visual domains. TEB couples task‑relevant state representations with exploration by employing a predictive bisimulation metric that measures behavioral similarity. To prevent representation collapse under sparse rewards, the authors introduce a predicted reward differential, strengthening the metric’s discriminative power. Using this robust metric, TEB generates potential‑based exploration bonuses that quantify novelty in the learned latent space. Experiments on MetaWorld and Maze2D show that TEB outperforms recent baselines, demonstrating superior exploration efficiency in complex visual environments.
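To make the idea of a potential-based exploration bonus concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the standard potential-based shaping form, bonus = γ·Φ(z') − Φ(z), and uses a hypothetical potential Φ defined as the mean distance from a latent state to its k nearest previously visited latents. Euclidean distance stands in for the learned bisimulation metric, and the names `latent_distance`, `novelty_potential`, and `exploration_bonus` are illustrative only.

```python
import math


def latent_distance(a, b):
    """Euclidean distance between two latent vectors.

    Assumption: this stands in for the learned bisimulation metric.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty_potential(z, memory, k=3):
    """Hypothetical potential: mean distance from z to its k nearest visited latents."""
    if not memory:
        return 0.0
    nearest = sorted(latent_distance(z, m) for m in memory)[:k]
    return sum(nearest) / len(nearest)


def exploration_bonus(z, z_next, memory, gamma=0.99, k=3):
    """Potential-based bonus: gamma * Phi(z_next) - Phi(z)."""
    return gamma * novelty_potential(z_next, memory, k) - novelty_potential(z, memory, k)


# Latents visited so far, clustered near the origin.
memory = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]

# Moving from a familiar latent toward an unvisited region.
toward_novel = exploration_bonus((0.0, 0.0), (1.0, 1.0), memory)

# Moving from an unvisited region back to a familiar latent.
toward_known = exploration_bonus((1.0, 1.0), (0.0, 0.0), memory)
```

In this toy setup, a transition toward an unvisited region of latent space receives a positive bonus and a transition back toward familiar latents receives a negative one. In the method summarized above, the distance would presumably come from the learned predictive bisimulation metric rather than from raw Euclidean distance.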
Sources: