Research Collection School Of Computing and Information Systems

Actor-critic for continuous action chunks: a reinforcement learning framework for long-horizon robotic manipulation with sparse reward

Jiarui YANG
Bin ZHU, Singapore Management University
Jingjing CHEN
Yu-Gang JIANG

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

1-2026

Abstract

Existing reinforcement learning (RL) methods struggle with long-horizon robotic manipulation tasks, particularly those involving sparse rewards. While action chunking is a promising paradigm for robotic manipulation, using RL to directly learn continuous action chunks in a stable and data-efficient manner remains a critical challenge. This paper introduces AC3 (Actor-Critic for Continuous Chunks), a novel RL framework that learns to generate high-dimensional, continuous action sequences. To make this learning process stable and data-efficient, AC3 incorporates targeted stabilization mechanisms for both the actor and the critic. First, to ensure reliable policy improvement, the actor is trained with an asymmetric update rule, learning exclusively from successful trajectories. Second, to enable effective value learning despite sparse rewards, the critic’s update is stabilized using intra-chunk n-step returns and further enriched by a self-supervised module providing intrinsic rewards at anchor points aligned with each action chunk. We conducted extensive experiments on 25 tasks from the BiGym and RLBench benchmarks. Results show that by using only a few demonstrations and a simple model architecture, AC3 achieves superior success rates on most tasks, validating its effective design.
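
The abstract does not give the exact form of the intra-chunk n-step returns, so the following is only a minimal sketch of a generic n-step return computed inside one action chunk. It is not the authors' implementation. The function name `intra_chunk_targets`, the discount `gamma`, and the assumption that the critic's value estimate at the end of the chunk is used as the bootstrap term are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def intra_chunk_targets(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Generic n-step return targets for every step inside one chunk.

    Assumption (not from the paper): the target at offset i is the
    discounted sum of the remaining rewards in the chunk, plus the
    discounted value estimate at the state reached after the chunk.
    """
    targets = [0.0] * len(rewards)
    running = bootstrap_value  # value estimate after the last action of the chunk
    # Walk backwards so each target reuses the one computed after it.
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        targets[i] = running
    return targets


# Sparse-reward example: only the final step of the chunk is rewarded.
example = intra_chunk_targets([0.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
```

In this sketch a single terminal reward reaches every earlier step of the chunk in one backward pass, which is the general reason n-step targets are used when rewards are sparse.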

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI‑26), Singapore, January 20-27

First Page

Last Page

Publisher

AAAI

City or Country

Singapore

Citation

YANG, Jiarui; ZHU, Bin; CHEN, Jingjing; and JIANG, Yu-Gang. Actor-critic for continuous action chunks: a reinforcement learning framework for long-horizon robotic manipulation with sparse reward. (2026). Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI‑26), Singapore, January 20-27. 1-9.
Available at: https://ink.library.smu.edu.sg/sis_research/10862

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons
