DS1 spectrogram: Skill-Critic: Refining Learned Skills for Reinforcement Learning

Skill-Critic: Refining Learned Skills for Reinforcement Learning

2306.08388

Authors

Chen Tang,Kenta Kawamoto,Masayoshi Tomizuka,Wei Zhan,Ce Hao

Abstract

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e.

sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data.

However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection.

Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport.

The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.

Resources

Stay in the loop

Every AI paper that matters, free in your inbox daily.

Details

  • takara.ai
  • Custom AI and machine learning from the Frontier Research Team.
  • © 2026 takara.ai Ltd
  • Content is sourced from third-party publications.