Half Cheetah

This environment is part of the MuJoCo environments, whose overview page contains general information about these environments.

Action Space: Box(-1.0, 1.0, (6,), float32)

Observation Space: Box(-inf, inf, (17,), float64)

Import: gymnasium.make("HalfCheetah-v5")

Description

This environment is based on the work of P. Wawrzyński in “A Cat-Like Robot Real-Time Learning to Run”. The HalfCheetah is a 2-dimensional robot consisting of 9 body parts and 8 joints connecting them (including two paws). The goal is to apply torque to the joints to make the cheetah run forward (right) as fast as possible, with a positive reward based on the distance moved forward and a negative reward for moving backward. The cheetah’s torso and head are fixed, and torque can be applied only to the other 6 joints: the front and back thighs (which connect to the torso), the shins (which connect to the thighs), and the feet (which connect to the shins).

Action Space

The action space is a Box(-1, 1, (6,), float32). An action represents the torques applied at the hinge joints.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Type (Unit) |
|---|---|---|---|---|---|---|
| 0 | Torque applied on the back thigh rotor | -1 | 1 | bthigh | hinge | torque (N m) |
| 1 | Torque applied on the back shin rotor | -1 | 1 | bshin | hinge | torque (N m) |
| 2 | Torque applied on the back foot rotor | -1 | 1 | bfoot | hinge | torque (N m) |
| 3 | Torque applied on the front thigh rotor | -1 | 1 | fthigh | hinge | torque (N m) |
| 4 | Torque applied on the front shin rotor | -1 | 1 | fshin | hinge | torque (N m) |
| 5 | Torque applied on the front foot rotor | -1 | 1 | ffoot | hinge | torque (N m) |
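
To illustrate the action layout described above, the following sketch builds a 6-element action from per-joint torques and clips each value to the control range [-1, 1]. The helper name make_action and its dictionary-based interface are hypothetical conveniences, not part of Gymnasium; only the joint order and the bounds come from the action space description.

```python
# Joint order taken from the action space description (index 0..5).
JOINT_ORDER = ["bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot"]
CTRL_MIN, CTRL_MAX = -1.0, 1.0


def make_action(torques):
    """Hypothetical helper: map {joint name: torque} to a 6-element action list.

    Joints that are not mentioned get zero torque; values outside the
    control range are clipped to [-1, 1].
    """
    unknown = set(torques) - set(JOINT_ORDER)
    if unknown:
        raise KeyError(f"unknown joints: {sorted(unknown)}")
    return [
        min(CTRL_MAX, max(CTRL_MIN, float(torques.get(name, 0.0))))
        for name in JOINT_ORDER
    ]


# Push the back thigh forward and request an out-of-range front foot torque.
action = make_action({"bthigh": 0.5, "ffoot": -2.0})
print(action)
```

The resulting list would typically be converted to a float32 array before being passed to env.step.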

Observation Space

The observation space consists of the following parts (in order):

  • qpos (8 elements by default): Position values of the robot’s body parts.

  • qvel (9 elements): The velocities of these individual body parts (their derivatives).

By default, the observation does not include the robot’s x-coordinate (rootx). This can be included by passing exclude_current_positions_from_observation=False during construction. In this case, the observation space will be a Box(-Inf, Inf, (18,), float64), where the first observation element is the x-coordinate of the robot. Regardless of whether exclude_current_positions_from_observation is set to True or False, the x- and y-coordinates are returned in info with the keys "x_position" and "y_position", respectively.

By default, however, the observation space is a Box(-Inf, Inf, (17,), float64) where the elements are as follows:

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Type (Unit) |
|---|---|---|---|---|---|---|
| 0 | z-coordinate of the front tip | -Inf | Inf | rootz | slide | position (m) |
| 1 | angle of the front tip | -Inf | Inf | rooty | hinge | angle (rad) |
| 2 | angle of the back thigh | -Inf | Inf | bthigh | hinge | angle (rad) |
| 3 | angle of the back shin | -Inf | Inf | bshin | hinge | angle (rad) |
| 4 | angle of the back foot | -Inf | Inf | bfoot | hinge | angle (rad) |
| 5 | angle of the front thigh | -Inf | Inf | fthigh | hinge | angle (rad) |
| 6 | angle of the front shin | -Inf | Inf | fshin | hinge | angle (rad) |
| 7 | angle of the front foot | -Inf | Inf | ffoot | hinge | angle (rad) |
| 8 | velocity of the x-coordinate of front tip | -Inf | Inf | rootx | slide | velocity (m/s) |
| 9 | velocity of the z-coordinate of front tip | -Inf | Inf | rootz | slide | velocity (m/s) |
| 10 | angular velocity of the front tip | -Inf | Inf | rooty | hinge | angular velocity (rad/s) |
| 11 | angular velocity of the back thigh | -Inf | Inf | bthigh | hinge | angular velocity (rad/s) |
| 12 | angular velocity of the back shin | -Inf | Inf | bshin | hinge | angular velocity (rad/s) |
| 13 | angular velocity of the back foot | -Inf | Inf | bfoot | hinge | angular velocity (rad/s) |
| 14 | angular velocity of the front thigh | -Inf | Inf | fthigh | hinge | angular velocity (rad/s) |
| 15 | angular velocity of the front shin | -Inf | Inf | fshin | hinge | angular velocity (rad/s) |
| 16 | angular velocity of the front foot | -Inf | Inf | ffoot | hinge | angular velocity (rad/s) |
| excluded | x-coordinate of the front tip | -Inf | Inf | rootx | slide | position (m) |
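
The observation layout described above can be made easier to work with by labelling each element. The sketch below assumes the ordering given in this section (positions first, then velocities, with rootx prepended only when exclude_current_positions_from_observation=False). The helper split_observation and the includes_x flag are hypothetical names introduced for illustration, and the sample observation is a placeholder rather than real simulator output.

```python
# Names follow the "Name (in corresponding XML file)" entries of this section.
QPOS_NAMES = ["rootz", "rooty", "bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot"]
QVEL_NAMES = ["rootx", "rootz", "rooty", "bthigh", "bshin", "bfoot", "fthigh", "fshin", "ffoot"]


def split_observation(obs, includes_x=False):
    """Hypothetical helper: label an observation as (positions, velocities) dicts.

    Set includes_x=True for the 18-element observation, whose first
    element is the x-coordinate (rootx).
    """
    qpos_names = (["rootx"] + QPOS_NAMES) if includes_x else QPOS_NAMES
    expected = len(qpos_names) + len(QVEL_NAMES)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} elements, got {len(obs)}")
    qpos = dict(zip(qpos_names, obs[: len(qpos_names)]))
    qvel = dict(zip(QVEL_NAMES, obs[len(qpos_names):]))
    return qpos, qvel


obs = [float(i) for i in range(17)]  # placeholder values, not simulator output
qpos, qvel = split_observation(obs)
print(qpos["rootz"], qvel["rootx"], qvel["ffoot"])
```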

Rewards

The total reward is: reward = forward_reward - ctrl_cost.

  • forward_reward: A reward for moving forward. It is positive when the Half Cheetah moves in the positive \(x\) direction (to the right). It is computed as \(w_{forward} \times \frac{dx}{dt}\), where \(dx\) is the displacement of the “tip” (\(x_{after-action} - x_{before-action}\)), \(dt\) is the time between actions, and \(w_{forward}\) is the forward_reward_weight (default is \(1\)). \(dt\) depends on the frame_skip parameter (default is \(5\)) and on the frametime, which is \(0.01\), so the default is \(dt = 5 \times 0.01 = 0.05\).

  • ctrl_cost: A negative reward that penalizes the Half Cheetah for taking actions that are too large. It is computed as \(w_{control} \times \|action\|_2^2\), where \(w_{control}\) is ctrl_cost_weight (default is \(0.1\)).

info contains the individual reward terms.
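
As a worked example of the formulas above, the following sketch computes both reward terms in plain Python. The function name half_cheetah_reward and its argument names are illustrative assumptions, and the environment's own implementation may be organized differently. With the default weights, a displacement of 0.1 over one step and all six torques at 0.5, the forward reward is about 2.0, the control cost about 0.15, and the total about 1.85.

```python
def half_cheetah_reward(x_before, x_after, action,
                        frame_skip=5, frametime=0.01,
                        forward_reward_weight=1.0, ctrl_cost_weight=0.1):
    """Sketch of reward = forward_reward - ctrl_cost from the Rewards section."""
    dt = frame_skip * frametime                      # default: 5 * 0.01 = 0.05
    forward_reward = forward_reward_weight * (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)  # squared L2 norm
    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


reward, forward_reward, ctrl_cost = half_cheetah_reward(
    x_before=0.0, x_after=0.1, action=[0.5] * 6
)
print(reward, forward_reward, ctrl_cost)
```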

Starting State

The initial position state is \(\mathcal{U}_{[-reset\_noise\_scale \times I_{9}, reset\_noise\_scale \times I_{9}]}\) and the initial velocity state is \(\mathcal{N}(0_{9}, reset\_noise\_scale^2 \times I_{9})\), where \(\mathcal{N}\) is the multivariate normal distribution and \(\mathcal{U}\) is the multivariate uniform continuous distribution.
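
The two distributions above can be sampled as in the sketch below. It is an illustration under stated assumptions: the noise is centred on an all-zero nominal state, each of the 9 elements is drawn independently (which is what the identity covariance implies), and Python's standard random module stands in for whatever generator the environment uses internally. The function name sample_initial_state is hypothetical.

```python
import random


def sample_initial_state(reset_noise_scale=0.1, n=9, seed=None):
    """Hypothetical sketch: draw initial positions and velocities.

    Positions: uniform on [-reset_noise_scale, reset_noise_scale].
    Velocities: normal with mean 0 and standard deviation reset_noise_scale.
    """
    rng = random.Random(seed)
    qpos = [rng.uniform(-reset_noise_scale, reset_noise_scale) for _ in range(n)]
    qvel = [rng.gauss(0.0, reset_noise_scale) for _ in range(n)]
    return qpos, qvel


qpos0, qvel0 = sample_initial_state(seed=0)
print(len(qpos0), len(qvel0))
```

Positions are bounded by reset_noise_scale, whereas velocities are unbounded because they follow a normal distribution.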

Episode End

Termination

The Half Cheetah never terminates.

Truncation

The default duration of an episode is 1000 timesteps.

Arguments

HalfCheetah provides a range of parameters to modify the observation space, reward function, initial state, and termination condition. These parameters can be passed to gymnasium.make in the following way:

import gymnasium as gym
# Any parameter from the table below can be passed as a keyword argument.
env = gym.make("HalfCheetah-v5", ctrl_cost_weight=0.1, reset_noise_scale=0.1)

| Parameter | Type | Default | Description |
|---|---|---|---|
| xml_file | str | "half_cheetah.xml" | Path to a MuJoCo model |
| forward_reward_weight | float | 1 | Weight for the forward_reward term (see Rewards section) |
| ctrl_cost_weight | float | 0.1 | Weight for the ctrl_cost term (see Rewards section) |
| reset_noise_scale | float | 0.1 | Scale of random perturbations of initial position and velocity (see Starting State section) |
| exclude_current_positions_from_observation | bool | True | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies (see Observation Space section) |

Version History

  • v5:

    • Minimum mujoco version is now 2.3.3.

    • Added support for fully custom/third party mujoco models using the xml_file argument (previously only a few changes could be made to the existing models).

    • Added default_camera_config argument, a dictionary for setting the mj_camera properties, mainly useful for custom environments.

    • Added env.observation_structure, a dictionary for specifying the observation space composition (e.g. qpos, qvel), useful for building tooling and wrappers for the MuJoCo environments.

    • reset() now returns a non-empty info; previously an empty dictionary was returned. The new keys carry the same state information as step().

    • Added frame_skip argument, used to configure the dt (duration of step()). The default varies by environment; check the environment documentation pages.

    • Restored the xml_file argument (was removed in v4).

    • Renamed info["reward_run"] to info["reward_forward"] to be consistent with the other environments.

  • v4: All MuJoCo environments now use the MuJoCo bindings in mujoco >= 2.1.3.

  • v3: Support for gymnasium.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale, etc. RGB rendering comes from a tracking camera (so the agent does not run away from the screen).

  • v2: All continuous control environments now use mujoco-py >= 1.50.

  • v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.

  • v0: Initial version release.