Note

Ray 2.10.0 introduces the alpha stage of RLlib’s “new API stack”. The Ray Team plans to transition algorithms, example scripts, and documentation to the new code base thereby incrementally replacing the “old API stack” (e.g., ModelV2, Policy, RolloutWorker) throughout the subsequent minor releases leading up to Ray 3.0.

See here for more details on how to use the new API stack.

VectorEnv API#

rllib.env.vector_env.VectorEnv#

class ray.rllib.env.vector_env.VectorEnv(observation_space: gymnasium.Space, action_space: gymnasium.Space, num_envs: int)[source]#

An environment that supports batch evaluation using clones of sub-envs.

__init__(observation_space: gymnasium.Space, action_space: gymnasium.Space, num_envs: int)[source]#

Initializes a VectorEnv instance.

Parameters:

observation_space – The observation Space of a single sub-env.
action_space – The action Space of a single sub-env.
num_envs – The number of clones to make of the given sub-env.

static vectorize_gym_envs(make_env: Callable[[int], Any | gymnasium.Env] | None = None, existing_envs: List[gymnasium.Env] | None = None, num_envs: int = 1, action_space: gymnasium.Space | None = None, observation_space: gymnasium.Space | None = None, restart_failed_sub_environments: bool = False, env_config=None, policy_config=None) → _VectorizedGymEnv[source]#

Translates any given gym.Env(s) into a VectorEnv object.

Parameters:

make_env – Factory that produces a new gym.Env taking the sub-env’s vector index as only arg. Must be defined if the number of existing_envs is less than num_envs.
existing_envs – Optional list of already instantiated sub environments.
num_envs – Total number of sub environments in this VectorEnv.
action_space – The action space. If None, use existing_envs[0]’s action space.
observation_space – The observation space. If None, use existing_envs[0]’s observation space.
restart_failed_sub_environments – If True and any sub-environment (within a vectorized env) throws any error during env stepping, the Sampler will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments and without the RolloutWorker crashing.

Returns:

The resulting _VectorizedGymEnv object (subclass of VectorEnv).
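
The interplay of make_env, existing_envs, and num_envs can be sketched in plain Python. The `ToyEnv` class and the `fill_sub_envs` helper below are hypothetical stand-ins, not RLlib code; they only illustrate the documented rule that the factory receives the sub-env's vector index and is required only when fewer than num_envs environments already exist.

```python
from typing import Any, Callable, List, Optional


class ToyEnv:
    """Hypothetical stand-in for a gym.Env; it only remembers its vector index."""

    def __init__(self, vector_index: int):
        self.vector_index = vector_index


def fill_sub_envs(
    make_env: Optional[Callable[[int], Any]] = None,
    existing_envs: Optional[List[Any]] = None,
    num_envs: int = 1,
) -> List[Any]:
    """Illustrative only: top up existing_envs until num_envs sub-envs exist."""
    envs = list(existing_envs or [])
    if len(envs) < num_envs and make_env is None:
        raise ValueError(
            "make_env is required when existing_envs has fewer than num_envs entries"
        )
    while len(envs) < num_envs:
        # The factory is called with the vector index of the sub-env it creates.
        envs.append(make_env(len(envs)))
    return envs


# One env already exists; the factory creates the remaining two.
envs = fill_sub_envs(make_env=ToyEnv, existing_envs=[ToyEnv(0)], num_envs=3)
print([e.vector_index for e in envs])
```

Passing the index to the factory lets each sub-env be configured differently, for example with a distinct seed or port. How RLlib itself fills the list may differ in detail.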

vector_reset(*, seeds: List[int] | None = None, options: List[dict] | None = None) → Tuple[List[Any], List[dict]][source]#

Resets all sub-environments.

Parameters:

seeds – The list of seeds to be passed to the sub-environments when resetting them. If None, will not reset any existing PRNGs. If you pass integers, the PRNGs will be reset even if they already exist.
options – The list of options dicts to be passed to the sub-environments’ when resetting them.

Returns:

Tuple consisting of a list of observations from each environment and a list of info dicts from each environment.
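
The seeding behavior described above can be illustrated with a small stdlib-only sketch. `ToySubEnv` and the free function `vector_reset` are hypothetical and only mimic the documented contract: one seed and one options dict per sub-env, no PRNG reset when the seed is None, and a tuple of two lists as the result.

```python
import random
from typing import Any, List, Optional, Tuple


class ToySubEnv:
    """Hypothetical sub-env with its own PRNG, mimicking reset(seed=..., options=...)."""

    def __init__(self) -> None:
        self.rng = random.Random()

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[dict] = None
    ) -> Tuple[float, dict]:
        if seed is not None:
            # An integer seed resets the PRNG, even if one already exists.
            self.rng.seed(seed)
        obs = self.rng.random()
        return obs, {"options": options or {}}


def vector_reset(
    sub_envs: List[ToySubEnv],
    *,
    seeds: Optional[List[int]] = None,
    options: Optional[List[dict]] = None,
) -> Tuple[List[Any], List[dict]]:
    """Illustrative only: pass one seed and one options dict to each sub-env."""
    seeds = seeds or [None] * len(sub_envs)
    options = options or [None] * len(sub_envs)
    obs_list, info_list = [], []
    for env, seed, opts in zip(sub_envs, seeds, options):
        obs, info = env.reset(seed=seed, options=opts)
        obs_list.append(obs)
        info_list.append(info)
    return obs_list, info_list


sub_envs = [ToySubEnv(), ToySubEnv()]
obs_a, infos = vector_reset(sub_envs, seeds=[1, 2])
obs_b, _ = vector_reset(sub_envs, seeds=[1, 2])
# Re-seeding with the same integers reproduces the same first observations.
print(obs_a == obs_b)
```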

reset_at(index: int | None = None, *, seed: int | None = None, options: dict | None = None) → Tuple[Any, dict] | Exception[source]#

Resets a single sub-environment.

Parameters:

index – An optional sub-env index to reset.
seed – The seed to be passed to the sub-environment at index index when resetting it. If None, will not reset any existing PRNG. If you pass an integer, the PRNG will be reset even if it already exists.
options – An options dict to be passed to the sub-environment at index index when resetting it.

Returns:

Tuple consisting of observations from the reset sub environment and an info dict of the reset sub environment. Alternatively an Exception can be returned, indicating that the reset operation on the sub environment has failed (and why it failed).
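
Because a failed reset is returned rather than raised, callers need to check the result type before unpacking it. The helper `handle_reset_result` below is a hypothetical sketch of that check and is not part of RLlib.

```python
from typing import Any, Tuple, Union


def handle_reset_result(result: Union[Tuple[Any, dict], Exception]) -> str:
    """Illustrative only: branch on a reset_at-style return value."""
    if isinstance(result, Exception):
        # The exception object carries the reason the reset failed.
        return f"reset failed: {result}"
    obs, info = result
    return f"reset ok: obs={obs!r}"


print(handle_reset_result((0.5, {})))
print(handle_reset_result(RuntimeError("sim crashed")))
```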

restart_at(index: int | None = None) → None[source]#

Restarts a single sub-environment.

Parameters:

index – An optional sub-env index to restart.

vector_step(actions: List[Any]) → Tuple[List[Any], List[float], List[bool], List[bool], List[dict]][source]#

Performs a vectorized step on all sub environments using actions.

Parameters:

actions – List of actions (one for each sub-env).

Returns:

A tuple consisting of 1) New observations for each sub-env. 2) Reward values for each sub-env. 3) Terminated values for each sub-env. 4) Truncated values for each sub-env. 5) Info values for each sub-env.
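
The five-list return value is a transposition of the per-sub-env step results. The sketch below shows this with a hypothetical `CounterEnv` and a free function `vector_step`; both are illustrative stand-ins, not the RLlib implementation.

```python
from typing import Any, List, Tuple


class CounterEnv:
    """Hypothetical sub-env: adds the action to a counter and terminates at >= 3."""

    def __init__(self) -> None:
        self.total = 0

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.total += action
        terminated = self.total >= 3
        # (observation, reward, terminated, truncated, info)
        return self.total, float(action), terminated, False, {}


def vector_step(
    sub_envs: List[CounterEnv], actions: List[Any]
) -> Tuple[List[Any], List[float], List[bool], List[bool], List[dict]]:
    """Illustrative only: step every sub-env and transpose the results."""
    results = [env.step(a) for env, a in zip(sub_envs, actions)]
    obs, rewards, terminateds, truncateds, infos = (
        list(column) for column in zip(*results)
    )
    return obs, rewards, terminateds, truncateds, infos


sub_envs = [CounterEnv(), CounterEnv()]
obs, rewards, terminateds, truncateds, infos = vector_step(sub_envs, [1, 3])
print(obs, rewards, terminateds, truncateds, infos)
```

Each returned list has one entry per sub-env, in the same order as the actions list.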

get_sub_environments() → List[Any | gymnasium.Env][source]#

Returns the underlying sub environments.

Returns:

List of all underlying sub environments.

try_render_at(index: int | None = None) → numpy.ndarray | None[source]#

Renders a single environment.

Parameters:

index – An optional sub-env index to render.

Returns:

Either a numpy RGB image (shape=(w x h x 3), dtype=uint8) or None in case rendering is handled directly by this method.

to_base_env(make_env: Callable[[int], Any | gymnasium.Env] | None = None, num_envs: int = 1, remote_envs: bool = False, remote_env_batch_wait_ms: int = 0, restart_failed_sub_environments: bool = False) → BaseEnv[source]#

Converts this VectorEnv into a BaseEnv object.

The resulting BaseEnv is always vectorized (contains n sub-environments) to support batched forward passes, where n may also be 1. BaseEnv also supports async execution via the poll and send_actions methods and thus supports external simulators.

Parameters:

make_env – A callable taking an int as input (the vector index of the individual sub-environment to create within the final vectorized BaseEnv) and returning one individual sub-environment.
num_envs – The number of sub-environments to create in the resulting (vectorized) BaseEnv. The already existing env will be one of the num_envs.
remote_envs – Whether each sub-env should be a @ray.remote actor. You can set this behavior in your config via the remote_worker_envs=True option.
remote_env_batch_wait_ms – The wait time (in ms) to poll remote sub-environments for, if applicable. Only used if remote_envs is True.
restart_failed_sub_environments – If True and any sub-environment (within a vectorized env) throws any error during env stepping, the Sampler will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments and without the RolloutWorker crashing.

Returns:

The resulting BaseEnv object.

Gym.Env to VectorEnv#

Internally, RLlib uses the following wrapper class to convert your provided gym.Env class into a VectorEnv first. After that, RLlib will convert the resulting objects into a BaseEnv.

class ray.rllib.env.vector_env._VectorizedGymEnv(make_env: Callable[[int], Any | gymnasium.Env] | None = None, existing_envs: List[gymnasium.Env] | None = None, num_envs: int = 1, *, observation_space: gymnasium.Space | None = None, action_space: gymnasium.Space | None = None, restart_failed_sub_environments: bool = False, env_config=None, policy_config=None)[source]#

Internal wrapper to translate any gym.Envs into a VectorEnv object.

__init__(make_env: Callable[[int], Any | gymnasium.Env] | None = None, existing_envs: List[gymnasium.Env] | None = None, num_envs: int = 1, *, observation_space: gymnasium.Space | None = None, action_space: gymnasium.Space | None = None, restart_failed_sub_environments: bool = False, env_config=None, policy_config=None)[source]#

Initializes a _VectorizedGymEnv object.

Parameters:

make_env – Factory that produces a new gym.Env taking the sub-env’s vector index as only arg. Must be defined if the number of existing_envs is less than num_envs.
existing_envs – Optional list of already instantiated sub environments.
num_envs – Total number of sub environments in this VectorEnv.
action_space – The action space. If None, use existing_envs[0]’s action space.
observation_space – The observation space. If None, use existing_envs[0]’s observation space.
restart_failed_sub_environments – If True and any sub-environment (within a vectorized env) throws any error during env stepping, we will try to restart the faulty sub-environment. This is done without disturbing the other (still intact) sub-environments.
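
The idea behind restart_failed_sub_environments can be sketched as a try/except around each sub-env's step. `FlakyEnv` and `step_with_restart` are hypothetical names used only for illustration; RLlib's actual recovery logic may differ in how it reports the failure and when it recreates the environment.

```python
from typing import Any, Callable, List


class FlakyEnv:
    """Hypothetical sub-env that raises when it receives a negative action."""

    def __init__(self, vector_index: int) -> None:
        self.vector_index = vector_index
        self.steps = 0

    def step(self, action: int) -> int:
        if action < 0:
            raise RuntimeError("sub-env crashed")
        self.steps += 1
        return self.steps


def step_with_restart(
    sub_envs: List[Any], actions: List[int], make_env: Callable[[int], Any]
) -> List[Any]:
    """Illustrative only: replace a failing sub-env and leave the others untouched."""
    results = []
    for i, (env, action) in enumerate(zip(sub_envs, actions)):
        try:
            results.append(env.step(action))
        except Exception as e:
            # Recreate only the faulty sub-env at its vector index.
            sub_envs[i] = make_env(i)
            results.append(e)
    return results


sub_envs = [FlakyEnv(0), FlakyEnv(1)]
intact = sub_envs[0]
results = step_with_restart(sub_envs, [1, -1], FlakyEnv)
print(results[0], type(results[1]).__name__, sub_envs[0] is intact)
```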

vector_reset(*, seeds: List[int] | None = None, options: List[dict] | None = None) → Tuple[List[Any], List[dict]][source]#

Resets all sub-environments.

Parameters:

seeds – The list of seeds to be passed to the sub-environments when resetting them. If None, will not reset any existing PRNGs. If you pass integers, the PRNGs will be reset even if they already exist.
options – The list of options dicts to be passed to the sub-environments’ when resetting them.

Returns:

Tuple consisting of a list of observations from each environment and a list of info dicts from each environment.

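reset_at(index: int | None = None, *, seed: int | None = None, options: dict | None = None) → Tuple[Any, dict] | Exception[source]#

Resets a single sub-environment.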

Parameters:

index – An optional sub-env index to reset.
seed – The seed to be passed to the sub-environment at index index when resetting it. If None, will not reset any existing PRNG. If you pass an integer, the PRNG will be reset even if it already exists.
options – An options dict to be passed to the sub-environment at index index when resetting it.

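Returns:

Tuple consisting of observations from the reset sub environment and an info dict of the reset sub environment. Alternatively an Exception can be returned, indicating that the reset operation on the sub environment has failed (and why it failed).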

restart_at(index: int | None = None) → None[source]#

Restarts a single sub-environment.

Parameters:

index – An optional sub-env index to restart.

vector_step(actions: List[Any]) → Tuple[List[Any], List[float], List[bool], List[bool], List[dict]][source]#

Performs a vectorized step on all sub environments using actions.

Parameters:

actions – List of actions (one for each sub-env).

Returns:

A tuple consisting of 1) New observations for each sub-env. 2) Reward values for each sub-env. 3) Terminated values for each sub-env. 4) Truncated values for each sub-env. 5) Info values for each sub-env.

get_sub_environments() → List[Any | gymnasium.Env][source]#

Returns the underlying sub environments.

Returns:

List of all underlying sub environments.

try_render_at(index: int | None = None)[source]#

Renders a single environment.

Parameters:

index – An optional sub-env index to render.

Returns:

Either a numpy RGB image (shape=(w x h x 3), dtype=uint8) or None in case rendering is handled directly by this method.