Added Action masking for Space.sample() #2906

Merged
merged 26 commits into from
Jun 26, 2022
c390a6b
Allows a new RNG to be generated with seed=-1 and updated env_checker…
pseudo-rnd-thoughts Jun 8, 2022
654476e
Revert "fixed `gym.vector.make` where the checker was being applied i…
pseudo-rnd-thoughts Jun 8, 2022
ea110ad
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 11, 2022
2e5dc9c
Remove bad pushed commits
pseudo-rnd-thoughts Jun 13, 2022
717bc1f
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 13, 2022
7e73d04
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 15, 2022
5743cd9
Merge branch 'openai:master' into master
pseudo-rnd-thoughts Jun 16, 2022
400b1e9
Fixed spelling in core.py
pseudo-rnd-thoughts Jun 17, 2022
4281c76
Pins pytest to the last py 3.6 version
pseudo-rnd-thoughts Jun 17, 2022
5dee690
Add support for action masking in Space.sample(mask=...)
pseudo-rnd-thoughts Jun 17, 2022
bc6ab4a
Fix action mask
pseudo-rnd-thoughts Jun 17, 2022
1700e9d
Fix action_mask
pseudo-rnd-thoughts Jun 17, 2022
7f46df2
Fix action_mask
pseudo-rnd-thoughts Jun 17, 2022
cd91007
Added docstrings, fixed bugs and added taxi examples
pseudo-rnd-thoughts Jun 19, 2022
be4063e
Fixed bugs
pseudo-rnd-thoughts Jun 19, 2022
2f14eb7
Add tests for sample
pseudo-rnd-thoughts Jun 20, 2022
f52d5d5
Add docstrings and test space sample mask Discrete and MultiBinary
pseudo-rnd-thoughts Jun 20, 2022
5e699e1
Add MultiDiscrete sampling and tests
pseudo-rnd-thoughts Jun 21, 2022
634da12
Remove sample mask from graph
pseudo-rnd-thoughts Jun 21, 2022
f85055c
Update gym/spaces/multi_discrete.py
pseudo-rnd-thoughts Jun 23, 2022
4a4b166
Updates based on Marcus28 and jjshoots for Graph.py
pseudo-rnd-thoughts Jun 23, 2022
eb63c62
Updates based on Marcus28 and jjshoots for Graph.py
pseudo-rnd-thoughts Jun 23, 2022
8918914
jjshoot review
pseudo-rnd-thoughts Jun 24, 2022
a53f0e7
jjshoot review
pseudo-rnd-thoughts Jun 25, 2022
8e71e46
Update assert check
pseudo-rnd-thoughts Jun 25, 2022
875ab44
Update type hints
pseudo-rnd-thoughts Jun 25, 2022
Updates based on Marcus28 and jjshoots for Graph.py
pseudo-rnd-thoughts committed Jun 23, 2022
commit 4a4b166fd05bf7794f23d73a0f1953253b745989
12 changes: 7 additions & 5 deletions gym/envs/toy_text/taxi.py
@@ -89,13 +89,15 @@ class TaxiEnv(Env):

### Info

``step`` and ``reset(return_info=True)`` will return an info dictionary that contains "p" and "action_mask".
``step`` and ``reset(return_info=True)`` will return an info dictionary that contains "p" and "action_mask" containing
the probability that the state is taken and a mask of what actions will result in a change of state to speed up training.

As Taxi is a stochastic environment for transitions then the "p" key represents the probability of the
transition. However, this value is permanently 1.0 for an unknown reason.
As Taxi's initial state is a stochastic, the "p" key represents the probability of the
transition however this value is currently bugged being 1.0, this will be fixed soon.
As the steps are deterministic, "p" represents the probability of the transition which is always 1.0

For some cases, taking these actions will have no effect on the state of the agent.
In v0.25.0, ``info["action_mask"]`` contains a numpy.ndarray for each of the action specifying
For some cases, taking an action will have no effect on the state of the agent.
In v0.25.0, ``info["action_mask"]`` contains a np.ndarray for each of the action specifying
if the action will change the state.

To sample a modifying action, use ``action = env.action_space.sample(info["action_mask"])``
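
A minimal sketch of how such a mask could be built, assuming a toy 1-D corridor rather than Taxi's real transition table. The names `next_state` and `action_mask` and the three-action layout are hypothetical and introduced only for illustration. An action gets a `1` when it changes the state and a `0` when it leaves the state untouched.

```python
import numpy as np

# Hypothetical toy environment (NOT Taxi's transition table):
# positions 0..size-1, actions 0 = left, 1 = right, 2 = stay.
def next_state(state: int, action: int, size: int) -> int:
    if action == 0:
        return max(state - 1, 0)
    if action == 1:
        return min(state + 1, size - 1)
    return state


def action_mask(state: int, size: int, num_actions: int = 3) -> np.ndarray:
    """Return an int8 mask with 1 where the action changes the state, else 0."""
    mask = np.zeros(num_actions, dtype=np.int8)
    for action in range(num_actions):
        if next_state(state, action, size) != state:
            mask[action] = 1
    return mask


left_wall = action_mask(state=0, size=4)  # moving left is blocked by the wall
middle = action_mask(state=2, size=4)     # both moves change the position
```

With a mask of this shape, the `env.action_space.sample(info["action_mask"])` call from the docstring would only draw from the entries marked `1`.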
3 changes: 2 additions & 1 deletion gym/spaces/dict.py
@@ -1,6 +1,7 @@
"""Implementation of a space that represents the cartesian product of other spaces as a dictionary."""
from collections import OrderedDict
from collections.abc import Mapping, Sequence
from typing import Any
from typing import Dict as TypingDict
from typing import Optional, Union

@@ -137,7 +138,7 @@ def seed(self, seed: Optional[Union[dict, int]] = None) -> list:

return seeds

def sample(self, mask: Optional[TypingDict[str, np.ndarray]] = None) -> dict:
def sample(self, mask: Optional[TypingDict[str, Any]] = None) -> dict:
"""Generates a single random sample from this space.

The sample is an ordered dictionary of independent samples from the constituent spaces.
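
A rough sketch of what a per-key mask could look like for a dictionary space, assuming each sub-space is represented by a plain callable that accepts an optional mask. The helpers `masked_choice` and `sample_dict` are hypothetical stand-ins and not gym's API. The check that the mask keys match the sub-space keys is also an assumption of this sketch.

```python
from collections import OrderedDict

import numpy as np

rng = np.random.default_rng(0)


def masked_choice(n: int):
    """Hypothetical stand-in for a discrete sub-space with n actions."""
    def sampler(mask=None) -> int:
        if mask is None:
            valid = np.arange(n)
        else:
            valid = np.flatnonzero(np.asarray(mask) == 1)
        # Fall back to action 0 when nothing is valid (assumption of this sketch).
        return int(rng.choice(valid)) if valid.size else 0
    return sampler


def sample_dict(samplers: dict, mask=None) -> OrderedDict:
    """Sample every sub-space independently, handing each its own mask."""
    if mask is not None:
        assert mask.keys() == samplers.keys(), "mask keys must match the sub-space keys"
    return OrderedDict(
        (key, sampler(None if mask is None else mask[key]))
        for key, sampler in samplers.items()
    )


samplers = {"move": masked_choice(4), "fire": masked_choice(2)}
sample = sample_dict(
    samplers,
    {
        "move": np.array([0, 0, 1, 0], dtype=np.int8),
        "fire": np.array([1, 0], dtype=np.int8),
    },
)
```

Because each mask value is passed straight to its sub-space, the value type depends on that sub-space, which is consistent with the looser `Any` annotation in the changed signature.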
5 changes: 3 additions & 2 deletions gym/spaces/discrete.py
@@ -46,8 +46,9 @@ def sample(self, mask: Optional[np.ndarray] = None) -> int:
A sample will be chosen uniformly at random with the mask if provided

Args:
mask: An optional mask for if an action can be selected. Expected shape is (n,).
If there are no possible actions, will default to `space.start`.
mask: An optional mask for if an action can be selected.
Expected `np.ndarray` of shape `(n,)` and dtype `np.int8` where `1` represents valid actions and `0` invalid / infeasible actions.
If there are no possible actions (i.e. `np.all(mask == 0)`) then `space.start` will be returned.

Returns:
A sampled integer from the space
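
The masking rule described in this docstring can be sketched in plain NumPy. This is an illustration of the documented behaviour, not gym's implementation, and `sample_discrete` is a hypothetical helper name. It draws uniformly from the positions where the mask is `1` and falls back to `start` when every entry is `0`.

```python
import numpy as np


def sample_discrete(n: int, start: int = 0, mask=None, rng=None) -> int:
    """Sample from {start, ..., start + n - 1}, optionally restricted by a mask."""
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        return start + int(rng.integers(n))

    mask = np.asarray(mask)
    assert mask.shape == (n,), f"expected shape {(n,)}, got {mask.shape}"
    assert mask.dtype == np.int8, f"expected dtype int8, got {mask.dtype}"

    valid = np.flatnonzero(mask == 1)
    if valid.size == 0:
        # No valid action: return the first element of the space.
        return start
    return start + int(rng.choice(valid))


rng = np.random.default_rng(0)
mask = np.array([0, 1, 0, 1], dtype=np.int8)
draws = {sample_discrete(4, mask=mask, rng=rng) for _ in range(50)}
no_valid = sample_discrete(4, start=5, mask=np.zeros(4, dtype=np.int8), rng=rng)
```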
73 changes: 47 additions & 26 deletions gym/spaces/graph.py
@@ -1,12 +1,12 @@
"""Implementation of a space that represents graph information where nodes and edges can be represented with euclidean space."""
from collections import namedtuple
from typing import NamedTuple, Optional, Sequence, Union
from typing import NamedTuple, Optional, Sequence, Tuple, Union

import numpy as np

from gym.spaces.box import Box
from gym.spaces.discrete import Discrete
from gym.spaces.multi_discrete import MultiDiscrete
from gym.spaces.multi_discrete import SAMPLE_MASK_TYPE, MultiDiscrete
from gym.spaces.space import Space
from gym.utils import seeding

@@ -70,58 +70,79 @@ def __init__(

def _generate_sample_space(
self, base_space: Union[None, Box, Discrete], num: int
) -> Optional[Union[Box, Discrete]]:
# the possibility of this space , got {type(base_space)}aving nothing
if num == 0:
) -> Optional[Union[Box, MultiDiscrete]]:
if num == 0 or base_space is None:
return None

if isinstance(base_space, Box):
return Box(
low=np.array(max(1, num) * [base_space.low]),
high=np.array(max(1, num) * [base_space.high]),
shape=(num, *base_space.shape),
shape=(num,) + base_space.shape,
dtype=base_space.dtype,
seed=self._np_random,
seed=self.np_random,
)
elif isinstance(base_space, Discrete):
return MultiDiscrete(nvec=[base_space.n] * num, seed=self._np_random)
elif base_space is None:
return None
return MultiDiscrete(nvec=[base_space.n] * num, seed=self.np_random)
else:
raise AssertionError(
f"Only Box and Discrete can be accepted as a base_space, got {type(base_space)}, you should not have gotten this error."
f"Expects base space to be Box and Discrete, actual space: {type(base_space)}."
)

def sample(self, mask: None = None) -> NamedTuple:
def sample(
self,
num_nodes: int,
num_edges: Optional[int] = None,
mask: Optional[
Tuple[Optional[SAMPLE_MASK_TYPE], Optional[SAMPLE_MASK_TYPE]]
] = None,
) -> NamedTuple:
"""Generates a single sample graph with num_nodes between 1 and 10 sampled from the Graph.

Args:
mask: As the number of nodes to determined during sample, it is not possible to know the mask beforehand.
num_nodes: The number of nodes that will be sampled
num_edges: An optional number of edges, otherwise, a random number between 0 and `num_nodes`^2
mask: An optional tuple of optional node and edge mask that is only possible with Discrete spaces
(Box spaces don't support sample masks).
If no `num_edges` is provided then the `edge_mask` is multiplied by the number of edges

Returns:
A NamedTuple representing a graph with attributes .nodes, .edges, and .edge_links.
"""
if mask is not None:
raise NotImplementedError(
"Graph.sample(mask) is not implemented as the number of nodes is determined within the function."
)
assert (
num_nodes > 0
), f"The number of nodes is expected to be greater than 0, actual value: {num_nodes}"

num_nodes = self.np_random.integers(low=1, high=10)
if mask is not None:
node_space_mask, edge_space_mask = mask
else:
node_space_mask, edge_space_mask = None, None

# we only have edges when we have at least 2 nodes
num_edges = 0
if num_nodes > 1:
# maximal number of edges is (n*n) allowing self connections and two-way is allowed
num_edges = self.np_random.integers(num_nodes * num_nodes)
if num_edges is None:
if num_nodes > 1:
# maximal number of edges is (n*n) allowing self connections and two-way is allowed
num_edges = self.np_random.integers(num_nodes * num_nodes)
else:
num_edges = 0
edge_space_mask = tuple(edge_space_mask for _ in range(num_edges))
else:
assert (
num_edges >= 0
), f"The number of edges is expected to be greater than 0, actual mask: {num_edges}"

node_sample_space = self._generate_sample_space(self.node_space, num_nodes)
edge_sample_space = self._generate_sample_space(self.edge_space, num_edges)
sampled_node_space = self._generate_sample_space(self.node_space, num_nodes)
sampled_edge_space = self._generate_sample_space(self.edge_space, num_edges)

sampled_nodes = (
node_sample_space.sample() if node_sample_space is not None else None
sampled_node_space.sample(node_space_mask)
if sampled_node_space is not None
else None
)
sampled_edges = (
edge_sample_space.sample() if edge_sample_space is not None else None
sampled_edge_space.sample(edge_space_mask)
if sampled_edge_space is not None
else None
)

sampled_edge_links = None
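
The edge-count handling in this hunk can be isolated into a small sketch. `resolve_edges` is a hypothetical helper that does not exist in gym. It assumes the behaviour shown in the diff: when `num_edges` is omitted, a count in `[0, num_nodes**2)` is drawn and the single edge mask is repeated once per edge. Unlike the diff, this sketch keeps a missing edge mask as `None` instead of repeating it, which is a deliberate simplification.

```python
import numpy as np


def resolve_edges(num_nodes: int, num_edges=None, edge_mask=None, rng=None):
    """Return (num_edges, edge_mask) following the rules sketched above."""
    rng = np.random.default_rng() if rng is None else rng
    assert num_nodes > 0, f"expected at least one node, got {num_nodes}"

    if num_edges is None:
        # Edges only exist with at least 2 nodes; self connections and
        # two-way links allow up to n * n of them (upper bound exclusive here).
        if num_nodes > 1:
            num_edges = int(rng.integers(num_nodes * num_nodes))
        else:
            num_edges = 0
        if edge_mask is not None:
            # One copy of the single-edge mask per sampled edge.
            edge_mask = tuple(edge_mask for _ in range(num_edges))
    else:
        assert num_edges >= 0, f"expected a non-negative edge count, got {num_edges}"

    return num_edges, edge_mask


single_edge_mask = np.array([1, 0], dtype=np.int8)
count, repeated = resolve_edges(
    4, edge_mask=single_edge_mask, rng=np.random.default_rng(0)
)
```

When the caller passes `num_edges` explicitly, the mask is returned unchanged, so it is expected to already hold one entry per edge.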
2 changes: 1 addition & 1 deletion gym/spaces/multi_discrete.py
@@ -70,7 +70,7 @@ def sample(self, mask: Optional[SAMPLE_MASK_TYPE] = None) -> np.ndarray:
Args:
mask: An optional mask for multi-discrete, expects tuples with a `np.ndarray` mask in the position of each
action with shape `(n,)` where `n` is the number of actions and `dtype=np.int8`.
If there are no possible actions, the default action is 0
Only mask values == 1 are possible to sample unless all mask values for an action are 0 then the default action 0 is sampled.

Returns:
An `np.ndarray` of shape `space.shape`
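
As a hedged illustration of the rule in this docstring, the sketch below handles only a flat `nvec` and uses a hypothetical helper, `sample_multi_discrete`, rather than gym's own code. Each position gets its own mask, only entries equal to `1` can be drawn, and a position whose mask is all zeros yields the default action `0`.

```python
import numpy as np


def sample_multi_discrete(nvec, mask=None, rng=None) -> np.ndarray:
    """Sample one action per position of a flat nvec, optionally masked."""
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        return np.array([int(rng.integers(n)) for n in nvec], dtype=np.int64)

    assert len(mask) == len(nvec), "expected one mask per action position"
    actions = []
    for n, sub_mask in zip(nvec, mask):
        sub_mask = np.asarray(sub_mask)
        assert sub_mask.shape == (n,), f"expected shape {(n,)}, got {sub_mask.shape}"
        valid = np.flatnonzero(sub_mask == 1)
        # All-zero mask for this position: fall back to the default action 0.
        actions.append(int(rng.choice(valid)) if valid.size else 0)
    return np.array(actions, dtype=np.int64)


masks = (
    np.array([0, 0, 1], dtype=np.int8),     # only action 2 is valid
    np.array([0, 0], dtype=np.int8),        # nothing valid, default 0
    np.array([1, 0, 0, 1], dtype=np.int8),  # actions 0 and 3 are valid
)
action = sample_multi_discrete([3, 2, 4], mask=masks, rng=np.random.default_rng(0))
```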