Added Action masking for Space.sample() #2906

Merged
merged 26 commits into from
Jun 26, 2022
c390a6b
Allows a new RNG to be generated with seed=-1 and updated env_checker…
pseudo-rnd-thoughts Jun 8, 2022
654476e
Revert "fixed `gym.vector.make` where the checker was being applied i…
pseudo-rnd-thoughts Jun 8, 2022
ea110ad
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 11, 2022
2e5dc9c
Remove bad pushed commits
pseudo-rnd-thoughts Jun 13, 2022
717bc1f
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 13, 2022
7e73d04
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 15, 2022
5743cd9
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 16, 2022
400b1e9
Fixed spelling in core.py
pseudo-rnd-thoughts Jun 17, 2022
4281c76
Pins pytest to the last py 3.6 version
pseudo-rnd-thoughts Jun 17, 2022
5dee690
Add support for action masking in Space.sample(mask=...)
pseudo-rnd-thoughts Jun 17, 2022
bc6ab4a
Fix action mask
pseudo-rnd-thoughts Jun 17, 2022
1700e9d
Fix action_mask
pseudo-rnd-thoughts Jun 17, 2022
7f46df2
Fix action_mask
pseudo-rnd-thoughts Jun 17, 2022
cd91007
Added docstrings, fixed bugs and added taxi examples
pseudo-rnd-thoughts Jun 19, 2022
be4063e
Fixed bugs
pseudo-rnd-thoughts Jun 19, 2022
2f14eb7
Add tests for sample
pseudo-rnd-thoughts Jun 20, 2022
f52d5d5
Add docstrings and test space sample mask Discrete and MultiBinary
pseudo-rnd-thoughts Jun 20, 2022
5e699e1
Add MultiDiscrete sampling and tests
pseudo-rnd-thoughts Jun 21, 2022
634da12
Remove sample mask from graph
pseudo-rnd-thoughts Jun 21, 2022
f85055c
Update gym/spaces/multi_discrete.py
pseudo-rnd-thoughts Jun 23, 2022
4a4b166
Updates based on Marcus28 and jjshoots for Graph.py
pseudo-rnd-thoughts Jun 23, 2022
eb63c62
Updates based on Marcus28 and jjshoots for Graph.py
pseudo-rnd-thoughts Jun 23, 2022
8918914
jjshoot review
pseudo-rnd-thoughts Jun 24, 2022
a53f0e7
jjshoot review
pseudo-rnd-thoughts Jun 25, 2022
8e71e46
Update assert check
pseudo-rnd-thoughts Jun 25, 2022
875ab44
Update type hints
pseudo-rnd-thoughts Jun 25, 2022
Added docstrings, fixed bugs and added taxi examples
pseudo-rnd-thoughts committed Jun 19, 2022
commit cd910072066d16744178d96197068ed4c009ea11
8 changes: 7 additions & 1 deletion gym/envs/toy_text/taxi.py
Expand Up @@ -56,6 +56,12 @@ class TaxiEnv(Env):
- 4: pickup passenger
- 5: drop off passenger

In some cases, taking these actions will have no effect on the state of the agent.
In v0.25.0, ``info["action_mask"]`` contains an np.ndarray with one entry per action, specifying
whether that action will change the state.
To sample a modifying action, use ``action = env.action_space.sample(info["action_mask"])``.
Or, with a Q-value based algorithm, ``action = np.argmax(q_values[obs, np.where(info["action_mask"] == 1)[0]])``.

### Observations
There are 500 discrete states since there are 25 taxi positions, 5 possible
locations of the passenger (including the case when the passenger is in the
Expand Down Expand Up @@ -99,7 +105,7 @@ class TaxiEnv(Env):
```

### Version History
* v3: Map Correction + Cleaner Domain Description
* v3: Map Correction + Cleaner Domain Description, v0.25.0 action masking added to the reset and step information
* v2: Disallow Taxi start location = goal location, Update Taxi observations in the rollout, Update Taxi reward threshold.
* v1: Remove (3,2) from locs, add passidx<4 check
* v0: Initial versions release
Expand Down
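The Q-value expression in the docstring above takes `np.argmax` over only the permitted columns, so the result is a position within that filtered array rather than an action id. Below is a minimal sketch of one way to map it back, assuming a 1-D row of Q-values for the current observation. The names `masked_greedy_action` and `q_row`, and the fallback for an all-zero mask, are illustrative and not part of Gym.

```python
import numpy as np


def masked_greedy_action(q_row: np.ndarray, action_mask: np.ndarray) -> int:
    """Return the highest-valued action among those the mask allows."""
    valid_actions = np.flatnonzero(action_mask == 1)
    if valid_actions.size == 0:
        # Assumption: with no allowed action, fall back to the unmasked greedy action.
        return int(np.argmax(q_row))
    # argmax over the filtered values is a position inside valid_actions,
    # so index back into valid_actions to recover the real action id.
    return int(valid_actions[np.argmax(q_row[valid_actions])])


# Six actions, as in Taxi; the values are made up for illustration.
q_row = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.0])
action_mask = np.array([1, 0, 1, 1, 0, 0], dtype=np.int8)
best_action = masked_greedy_action(q_row, action_mask)
```

Here action 1 has the largest Q-value but is masked out, so the sketch selects action 3, the best of the allowed actions 0, 2 and 3.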
10 changes: 8 additions & 2 deletions gym/spaces/box.py
Expand Up @@ -3,6 +3,7 @@

import numpy as np

import gym.error
from gym import logger
from gym.spaces.space import Space
from gym.utils import seeding
Expand Down Expand Up @@ -146,7 +147,7 @@ def is_bounded(self, manner: str = "both") -> bool:
else:
raise ValueError("manner is not in {'below', 'above', 'both'}")

def sample(self, mask: np.ndarray = None) -> np.ndarray:
def sample(self, mask: None = None) -> np.ndarray:
r"""Generates a single random sample inside the Box.

In creating a sample of the box, each coordinate is sampled (independently) from a distribution
Expand All @@ -157,11 +158,16 @@ def sample(self, mask: np.ndarray = None) -> np.ndarray:
* :math:`(-\infty, b]` : shifted negative exponential distribution
* :math:`(-\infty, \infty)` : normal distribution

Args:
mask: A mask for sampling values from the Box space, currently unsupported.

Returns:
A sampled value from the Box
"""
if mask is not None:
return np.zeros(self.shape, self.dtype)
raise gym.error.Error(
f"Box.sample cannot be provided a mask, actual value: {mask}"
)

high = self.high if self.dtype.kind == "f" else self.high.astype("int64") + 1
sample = np.empty(self.shape)
Expand Down
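The Box change above replaces a silent `np.zeros` return with an error when a mask is supplied. A rough sketch of that behaviour for a bounded box follows, using NumPy only. `MaskNotSupportedError` and `sample_bounded_box` are hypothetical stand-ins (the diff raises `gym.error.Error`), and only the fully bounded, uniform case is covered.

```python
import numpy as np


class MaskNotSupportedError(Exception):
    """Hypothetical stand-in for gym.error.Error, so the sketch needs no Gym install."""


def sample_bounded_box(low, high, rng, mask=None):
    """Sample uniformly from a bounded box; reject any mask instead of ignoring it."""
    if mask is not None:
        # Failing loudly avoids returning a value that only looks like a valid sample.
        raise MaskNotSupportedError(
            f"Box sampling cannot be provided a mask, actual value: {mask}"
        )
    return rng.uniform(low=low, high=high)


rng = np.random.default_rng(0)
low = np.array([-1.0, 0.0])
high = np.array([1.0, 2.0])
sample = sample_bounded_box(low, high, rng)

try:
    sample_bounded_box(low, high, rng, mask=np.ones(2, dtype=np.int8))
    mask_rejected = False
except MaskNotSupportedError:
    mask_rejected = True
```

Raising here means a caller who passes a mask by mistake finds out immediately, rather than training on an all-zero action.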
12 changes: 7 additions & 5 deletions gym/spaces/dict.py
@@ -1,7 +1,6 @@
"""Implementation of a space that represents the cartesian product of other spaces as a dictionary."""
from collections import OrderedDict
from collections.abc import Mapping, Sequence
from typing import Dict
from typing import Dict as TypingDict
from typing import Optional, Union

Expand Down Expand Up @@ -138,21 +137,24 @@ def seed(self, seed: Optional[Union[dict, int]] = None) -> list:

return seeds

def sample(self, mask: Dict[str, np.ndarray] = None) -> dict:
def sample(self, mask: Optional[TypingDict[str, np.ndarray]] = None) -> dict:
"""Generates a single random sample from this space.

The sample is an ordered dictionary of independent samples from the constituent spaces.

Args:
mask: An optional mask for each of the subspaces, expects the same keys as the space

Returns:
A dictionary with the same key and sampled values from :attr:`self.spaces`
"""
if mask is not None:
assert isinstance(
mask, dict
), f"Expects mask to be a dict, actual type: {type(dict)}"
), f"Expects mask to be a dict, actual type: {type(mask)}"
assert (
mask.keys == self.keys()
), f"Expect mask keys to be same as space keys, mask keys: {mask.keys()}, space keys: {self.keys()}"
mask.keys() == self.spaces.keys()
), f"Expect mask keys to be same as space keys, mask keys: {mask.keys()}, space keys: {self.spaces.keys()}"
return OrderedDict(
[(k, space.sample(mask[k])) for k, space in self.spaces.items()]
)
Expand Down
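The key check in the Dict change above went from `mask.keys == self.keys()` to `mask.keys() == self.spaces.keys()`. The short sketch below, using plain dicts with made-up keys, shows why the missing call parentheses mattered: a bound method never compares equal to a keys view, while two keys views compare as sets.

```python
# Made-up subspace names; the values stand in for subspaces and sub-masks.
space_keys = {"position": None, "velocity": None}.keys()
mask = {"position": [1, 0], "velocity": [0, 1]}

# Comparing the bound method itself (no call) to a keys view is never equal,
# so an assertion written this way always fails.
method_vs_view = mask.keys == space_keys

# Calling the method compares the two key sets.
view_vs_view = mask.keys() == space_keys

# Keys views compare like sets, so insertion order does not matter.
reordered = {"velocity": [0, 1], "position": [1, 0]}
order_insensitive = reordered.keys() == space_keys
```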
26 changes: 19 additions & 7 deletions gym/spaces/discrete.py
Expand Up @@ -40,20 +40,32 @@ def __init__(
self.start = int(start)
super().__init__((), np.int64, seed)

def sample(self, mask: np.ndarray = None) -> int:
def sample(self, mask: Optional[np.ndarray] = None) -> int:
"""Generates a single random sample from this space.

A sample will be chosen uniformly at random.
A sample will be chosen uniformly at random with the mask if provided

Args:
mask: An optional mask for if an action can be selected. Expected shape is (n,). If there are no possible actions, will default to `space.start`

Returns:
A sampled integer from the space
"""
if mask is not None:
assert isinstance(mask, np.ndarray)
assert mask.dtype == np.int8
assert mask.shape == (self.n,)
if np.any(mask):
return int(self.start + self.np_random.choice(np.where(mask)))
assert isinstance(
mask, np.ndarray
), f"The expected type of the mask is np.ndarray, actual type: {type(mask)}"
assert (
mask.dtype == np.int8
), f"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}"
assert mask.shape == (
self.n,
), f"The expected shape of the mask is {(self.n,)}, actual shape: {mask.shape}"
assert np.all(
np.logical_or(mask == 0, mask == 1)
), f"All values of a mask should be 0 or 1, actual values: {mask}"
if np.any(mask == 1):
return int(self.start + self.np_random.choice(np.where(mask)[0]))
else:
return self.start

Expand Down
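The masked branch of `Discrete.sample` above can be sketched as a standalone function using NumPy only. `sample_discrete` and its explicit `rng` argument are illustrative names, not Gym API; the checks and the fallback to `start` follow the diff.

```python
import numpy as np


def sample_discrete(n, start, rng, mask=None):
    """Sample an integer in [start, start + n), optionally restricted by a 0/1 mask."""
    if mask is None:
        return int(start + rng.integers(n))
    assert isinstance(mask, np.ndarray), f"expected np.ndarray, got {type(mask)}"
    assert mask.dtype == np.int8, f"expected dtype np.int8, got {mask.dtype}"
    assert mask.shape == (n,), f"expected shape {(n,)}, got {mask.shape}"
    assert np.all((mask == 0) | (mask == 1)), f"mask values must be 0 or 1: {mask}"

    # np.where returns a tuple with one index array per axis; [0] selects
    # the index array for the single axis of a 1-D mask.
    valid_offsets = np.where(mask == 1)[0]
    if valid_offsets.size > 0:
        return int(start + rng.choice(valid_offsets))
    # No action is allowed: fall back to the first value of the space.
    return start


rng = np.random.default_rng(0)
mask = np.array([0, 1, 0, 1, 0], dtype=np.int8)
samples = {sample_discrete(5, 10, rng, mask) for _ in range(200)}
fallback = sample_discrete(5, 10, rng, np.zeros(5, dtype=np.int8))
```

With `start=10`, only the values 11 and 13 can be drawn under this mask, and an all-zero mask yields 10.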
34 changes: 18 additions & 16 deletions gym/spaces/graph.py
@@ -1,10 +1,9 @@
"""Implementation of a space that represents graph information where nodes and edges can be represented with euclidean space."""
from collections import namedtuple
from typing import NamedTuple, Optional, Sequence, Union
from typing import NamedTuple, Optional, Sequence, Tuple, Union

import numpy as np

import gym
from gym.spaces.box import Box
from gym.spaces.discrete import Discrete
from gym.spaces.multi_discrete import MultiDiscrete
Expand Down Expand Up @@ -93,23 +92,18 @@ def _generate_sample_space(
f"Only Box and Discrete can be accepted as a base_space, got {type(base_space)}, you should not have gotten this error."
)

def _sample_sample_space(self, sample_space) -> Optional[np.ndarray]:
if sample_space is not None:
return sample_space.sample()
else:
return None

def sample(self, mask=None) -> NamedTuple:
def sample(
self, mask: Optional[Tuple[Optional[np.ndarray], Optional[np.ndarray]]] = None
) -> NamedTuple:
"""Generates a single sample graph with num_nodes between 1 and 10 sampled from the Graph.

Args:
mask: An optional tuple for the node space mask and the edge space mask (only valid for Discrete spaces)

Returns:
A NamedTuple representing a graph with attributes .nodes, .edges, and .edge_links.
"""
if mask is not None:
raise gym.error.Error(
"Action masking for graphs are not supported at this time, please raise an issue on github."
)

node_mask, edge_mask = mask if mask is not None else (None, None)
num_nodes = self.np_random.integers(low=1, high=10)

# we only have edges when we have at least 2 nodes
Expand All @@ -121,8 +115,16 @@ def sample(self, mask=None) -> NamedTuple:
node_sample_space = self._generate_sample_space(self.node_space, num_nodes)
edge_sample_space = self._generate_sample_space(self.edge_space, num_edges)

sampled_nodes = self._sample_sample_space(node_sample_space)
sampled_edges = self._sample_sample_space(edge_sample_space)
sampled_nodes = (
node_sample_space.sample(node_mask)
if node_sample_space is not None
else None
)
sampled_edges = (
edge_sample_space.sample(edge_mask)
if edge_sample_space is not None
else None
)

sampled_edge_links = None
if sampled_edges is not None and num_edges > 0:
Expand Down
21 changes: 17 additions & 4 deletions gym/spaces/multi_binary.py
Expand Up @@ -51,18 +51,31 @@ def shape(self) -> Tuple[int, ...]:
"""Has stricter type than gym.Space - never None."""
return self._shape # type: ignore

def sample(self, mask: np.ndarray = None) -> np.ndarray:
def sample(self, mask: Optional[np.ndarray] = None) -> np.ndarray:
"""Generates a single random sample from this space.

A sample is drawn by independent, fair coin tosses (one toss per binary variable of the space).

Args:
mask: An optional np.ndarray to mask samples, where mask == 0 will have samples == 0

Returns:
Sampled values from space
"""
if mask is not None:
assert isinstance(mask, np.ndarray)
assert mask.dtype == np.int8
assert mask.shape == self.shape
assert isinstance(
mask, np.ndarray
), f"The expected type of the mask is np.ndarray, actual type: {type(mask)}"
assert (
mask.dtype == np.int8
), f"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}"
assert (
mask.shape == self.shape
), f"The expected shape of the mask is {self.shape}, actual shape: {mask.shape}"
assert np.all(
np.logical_or(mask == 0, mask == 1)
), f"All values of a mask should be 0 or 1, actual values: {mask}"

return mask * self.np_random.integers(0, 2, self.n, self.dtype)

return self.np_random.integers(low=0, high=2, size=self.n, dtype=self.dtype)
Expand Down
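The MultiBinary masking above multiplies the mask by a vector of fair coin tosses, so every position where the mask is 0 is forced to 0 and every other position stays random. A minimal NumPy-only sketch, assuming a flat space of `n` bits; `sample_multi_binary` is an illustrative name:

```python
import numpy as np


def sample_multi_binary(n, rng, mask=None):
    """Toss n fair coins; positions where mask == 0 are forced to 0."""
    bits = rng.integers(low=0, high=2, size=n, dtype=np.int8)
    if mask is None:
        return bits
    assert isinstance(mask, np.ndarray), f"expected np.ndarray, got {type(mask)}"
    assert mask.dtype == np.int8, f"expected dtype np.int8, got {mask.dtype}"
    assert mask.shape == (n,), f"expected shape {(n,)}, got {mask.shape}"
    assert np.all((mask == 0) | (mask == 1)), f"mask values must be 0 or 1: {mask}"
    # Element-wise product: 0 * bit == 0, 1 * bit == bit.
    return mask * bits


rng = np.random.default_rng(0)
mask = np.array([1, 0, 1, 0], dtype=np.int8)
sample = sample_multi_binary(4, rng, mask)
```

Note that a mask value of 1 does not force the bit to 1; it only leaves the coin toss unchanged.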
51 changes: 33 additions & 18 deletions gym/spaces/multi_discrete.py
Expand Up @@ -23,8 +23,17 @@ class MultiDiscrete(Space[np.ndarray]):
2. Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
3. Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1

It can be initialized as ``MultiDiscrete([ 5, 2, 2 ])``
It can be initialized as ``MultiDiscrete([ 5, 2, 2 ])`` such that a sample is array([3, 1, 0])

Although this feature is rarely used, :class:`MultiDiscrete` spaces may also have several axes
if ``nvec`` has several axes:

Example::

>> d = MultiDiscrete(np.array([[1, 2], [3, 4]]))
>> d.sample()
array([[0, 0],
[2, 3]])
"""

def __init__(
Expand All @@ -37,16 +46,6 @@ def __init__(

The argument ``nvec`` will determine the number of values each categorical variable can take.

Although this feature is rarely used, :class:`MultiDiscrete` spaces may also have several axes
if ``nvec`` has several axes:

Example::

>> d = MultiDiscrete(np.array([[1, 2], [3, 4]]))
>> d.sample()
array([[0, 0],
[2, 3]])

Args:
nvec: vector of counts of each categorical variable. This will usually be a list of integers. However,
you may also pass a more complicated numpy array if you'd like the space to have several axes.
Expand All @@ -63,14 +62,30 @@ def shape(self) -> Tuple[int, ...]:
"""Has stricter type than :class:`gym.Space` - never None."""
return self._shape # type: ignore

def sample(self, mask: np.ndarray = None) -> np.ndarray:
"""Generates a single random sample this space."""
if mask is not None:
assert isinstance(mask, np.ndarray)
assert mask.dtype == np.int8
assert mask.shape == self.shape
def sample(self, mask: Optional[np.ndarray] = None) -> np.ndarray:
"""Generates a single random sample from this space.

Args:
mask: An optional mask for multi-discrete, expected shape is `space.nvec`. If there are no possible actions, defaults to 0

multi_mask = [np.where(row) for row in mask]
Returns:
An np.ndarray of shape `space.shape`
"""
if mask is not None:
assert isinstance(
mask, np.ndarray
), f"The expected type of the mask is np.ndarray, actual type: {type(mask)}"
assert (
mask.dtype == np.int8
), f"The expected dtype of the mask is np.int8, actual dtype: {mask.dtype}"
assert np.all(
mask.shape == self.nvec
), f"The expected shape of the mask is {self.nvec}, actual shape: {mask.shape}. We don't support multi-axis nvec currently."
assert np.all(
np.logical_or(mask == 0, mask == 1)
), f"All values of a mask should be 0 or 1, actual values: {mask}"

multi_mask = [np.where(row)[0] for row in mask]
return np.array(
[
self.np_random.choice(row_mask) if len(row_mask) > 0 else 0
Expand Down
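The per-row masking in the MultiDiscrete change above can be sketched as follows for a 1-D `nvec`. The mask layout used here, one 0/1 array of length `nvec[i]` per component, is an assumption made for the sketch and may not match the shape check in this commit; `sample_multi_discrete` is an illustrative name.

```python
import numpy as np


def sample_multi_discrete(nvec, rng, mask=None):
    """Sample one value per component; each mask row restricts one component."""
    if mask is None:
        return np.array([rng.integers(n) for n in nvec], dtype=np.int64)
    assert len(mask) == len(nvec), f"expected {len(nvec)} mask rows, got {len(mask)}"
    sampled = []
    for n, row in zip(nvec, mask):
        assert row.dtype == np.int8, f"expected dtype np.int8, got {row.dtype}"
        assert row.shape == (n,), f"expected shape {(n,)}, got {row.shape}"
        assert np.all((row == 0) | (row == 1)), f"mask values must be 0 or 1: {row}"
        # [0] unpacks the single index array that np.where returns for a 1-D row.
        valid = np.where(row == 1)[0]
        # A row with no allowed value defaults to 0, as in the diff.
        sampled.append(rng.choice(valid) if valid.size > 0 else 0)
    return np.array(sampled, dtype=np.int64)


rng = np.random.default_rng(0)
nvec = [3, 2]
mask = [np.array([0, 0, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8)]
sample = sample_multi_discrete(nvec, rng, mask)
```

The first component has a single allowed value, 2, and the second has none, so it falls back to 0.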
14 changes: 12 additions & 2 deletions gym/spaces/space.py
@@ -1,6 +1,7 @@
"""Implementation of the `Space` metaclass."""

from typing import (
Any,
Generic,
Iterable,
List,
Expand Down Expand Up @@ -81,8 +82,17 @@ def shape(self) -> Optional[Tuple[int, ...]]:
"""Return the shape of the space as an immutable property."""
return self._shape

def sample(self, mask=None) -> T_cov:
"""Randomly sample an element of this space. Can be uniform or non-uniform sampling based on boundedness of space."""
def sample(self, mask: Optional[Any] = None) -> T_cov:
"""Randomly sample an element of this space.

Can be uniform or non-uniform sampling based on boundedness of space.

Args:
mask: A mask used for sampling, see Space for implementation details.

Returns:
A sampled action from the space
"""
raise NotImplementedError

def seed(self, seed: Optional[int] = None) -> list:
Expand Down
14 changes: 11 additions & 3 deletions gym/spaces/tuple.py
Expand Up @@ -72,17 +72,25 @@ def seed(self, seed: Optional[Union[int, List[int]]] = None) -> list:

return seeds

def sample(self, mask: Tuple[np.ndarray] = None) -> tuple:
def sample(self, mask: Optional[Tuple[Optional[np.ndarray]]] = None) -> tuple:
"""Generates a single random sample inside this space.

This method draws independent samples from the subspaces.

Args:
mask: An optional tuple of optional masks for each of the subspace's samples, expects the same number of masks as spaces

Returns:
Tuple of the subspace's samples
"""
if mask is not None:
assert isinstance(mask, tuple)
assert len(mask) == len(self.spaces)
assert isinstance(
mask, tuple
), f"Expected type of mask is tuple, actual type: {type(mask)}"
assert len(mask) == len(
self.spaces
), f"Expected length of mask is {len(self.spaces)}, actual length: {len(mask)}"

return tuple(
space.sample(mask=sub_mask)
for space, sub_mask in zip(self.spaces, mask)
Expand Down
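The Tuple change above forwards one sub-mask to each subspace, and a `None` entry leaves that subspace unmasked. A small sketch of the dispatch, with plain callables standing in for subspaces; `sample_coin` and `sample_tuple` are hypothetical helpers, not Gym API.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_coin(mask=None):
    """Hypothetical sub-sampler: 0 or 1, restricted to entries where mask == 1."""
    choices = np.arange(2) if mask is None else np.where(mask == 1)[0]
    return int(rng.choice(choices)) if choices.size > 0 else 0


def sample_tuple(sub_samplers, mask=None):
    """Draw one independent sample per sub-sampler, forwarding each sub-mask."""
    if mask is None:
        return tuple(sampler(None) for sampler in sub_samplers)
    assert isinstance(mask, tuple), f"expected tuple, got {type(mask)}"
    assert len(mask) == len(
        sub_samplers
    ), f"expected {len(sub_samplers)} masks, got {len(mask)}"
    return tuple(
        sampler(sub_mask) for sampler, sub_mask in zip(sub_samplers, mask)
    )


# The first element may only be 1; the second element is left unmasked.
only_one = np.array([0, 1], dtype=np.int8)
sample = sample_tuple((sample_coin, sample_coin), mask=(only_one, None))
```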