Wheel Loader Scooping Controller Using Deep Reinforcement Learning

Architecture

[Figure: system architecture of the proposed controller]

Abstract

Robotics may still seem like a far-fetched fantasy to some, but the use of robots across a variety of industries, especially in the construction and agricultural domains, yields many advantages, such as economic efficiency, safety, and availability. Some tasks are more complex than others and often require extensive engineering experience and tedious manual tuning beyond the control algorithm itself. Reinforcement learning (RL) is a machine learning technique that holds the promise of enabling robots to learn large repertoires of behavioral skills with minimal human intervention through trial and error. However, practical real-world applications of reinforcement learning are relatively rare, as they often require unrealistic learning times, suffer from high sample complexity, and face discrepancies between simulated and real physical system behavior (i.e., the reality gap). To create robust deep RL (DRL) controllers that can succeed in real-life scenarios, techniques such as learning from both domains and performing further system identification can be used.
In the proposed research we present a deep reinforcement learning-based controller for an unmanned ground vehicle with a custom-built scooping mechanism. The robot's aim is to autonomously perform earth-scooping cycles with three degrees of freedom: lift, tilt, and the robot's velocity. While the majority of previous research on automating scooping processes is based on data recorded by expert operators, we present a method to autonomously control a wheel loader through the scooping cycle using deep reinforcement learning without any user-provided demonstrations. The controller's learning approach is based on the actor-critic Deep Deterministic Policy Gradient (DDPG) algorithm, which we use to map online sensor data to continuously updated actuator commands. The scooping policy network is trained solely in a simplified simulation environment using a virtual physics engine, and it converged to an average fill factor of 65% of full bucket capacity with an average cycle time of 5 s.
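The DDPG setup described above can be sketched in miniature: a deterministic actor maps an observation vector to a bounded 3-dimensional action (lift, tilt, velocity), a critic scores state-action pairs, and slowly updated target networks stabilize the bootstrapped TD target. This is a minimal illustrative sketch, not the paper's implementation; the observation dimension, single-linear-layer "networks", and all hyperparameter values are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM = 8, 3   # hypothetical sensor vector; actions: lift, tilt, velocity
TAU, GAMMA = 0.005, 0.99  # soft-update rate and discount factor (illustrative values)

def layer_init(n_in, n_out):
    """Single linear layer as a stand-in for the actor/critic networks."""
    return {"W": rng.normal(0, 0.1, (n_in, n_out)), "b": np.zeros(n_out)}

def forward(p, x):
    return x @ p["W"] + p["b"]

actor = layer_init(OBS_DIM, ACT_DIM)
actor_target = {k: v.copy() for k, v in actor.items()}
critic = layer_init(OBS_DIM + ACT_DIM, 1)
critic_target = {k: v.copy() for k, v in critic.items()}

def act(obs, noise_std=0.1):
    """Deterministic policy output squashed to [-1, 1], plus Gaussian exploration noise."""
    a = np.tanh(forward(actor, obs))
    return np.clip(a + rng.normal(0, noise_std, ACT_DIM), -1.0, 1.0)

def td_target(reward, next_obs, done):
    """Bootstrapped critic target: r + gamma * Q'(s', mu'(s')), zeroed at episode end."""
    next_a = np.tanh(forward(actor_target, next_obs))
    q_next = forward(critic_target, np.concatenate([next_obs, next_a]))[0]
    return reward + GAMMA * (1.0 - done) * q_next

def soft_update(target, source):
    """Polyak averaging keeps target networks close to, but lagging, the learned ones."""
    for k in target:
        target[k] = (1 - TAU) * target[k] + TAU * source[k]

obs = rng.normal(size=OBS_DIM)   # placeholder for one online sensor reading
action = act(obs)                # bounded lift, tilt, and velocity commands
y = td_target(reward=1.0, next_obs=obs, done=0.0)
soft_update(actor_target, actor)
soft_update(critic_target, critic)
```

In a full training loop, transitions would be stored in a replay buffer and the critic regressed toward `td_target` while the actor ascends the critic's gradient; here only the control-flow skeleton is shown.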

Publication
IEEE Access