Added Action masking for Space.sample() #2906

pseudo-rnd-thoughts · 2022-06-17T13:21:13Z

Adds action masking, as requested in #2823, so that spaces can mask out certain actions when sampling. Masks use the positive convention: 1 means the action can be taken and 0 means it cannot. For every gym space, this PR adds a mask parameter to sample(mask=...); the mask type required depends on the space. Box is a special case where masking is not implemented, since a neural network cannot provide mask values for a continuous distribution; however, this could be added if a good reason is found.
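To make the mask convention concrete, here is a minimal NumPy-only sketch of masked sampling for a discrete space. It is not the code in gym/spaces/discrete.py; sample_discrete is a hypothetical helper, and returning the first action for an all-zero mask is an assumption taken from the examples in this description.

```python
import numpy as np


def sample_discrete(n, mask=None, rng=None, start=0):
    """Sample from {start, ..., start + n - 1}, honouring a 0/1 mask.

    Hypothetical helper for illustration only, not gym's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        return start + int(rng.integers(n))
    mask = np.asarray(mask)
    assert mask.shape == (n,), "mask needs exactly one entry per action"
    valid = np.flatnonzero(mask == 1)  # indices of the allowed actions
    if valid.size == 0:
        # Assumption: mirror the all-zero examples, which return the first action.
        return start
    return start + int(rng.choice(valid))


rng = np.random.default_rng(0)
allowed = np.array([0, 1, 1, 1], dtype=np.int8)
samples = {sample_discrete(4, allowed, rng) for _ in range(100)}
blocked = sample_discrete(4, np.zeros(4, dtype=np.int8), rng)
```

Every value in samples is drawn from the actions whose mask entry is 1, and blocked falls back to the first action because no entry is allowed.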
In the gym Taxi environment, we add a new info key, "action_mask". Returning the mask through info is the recommended way to expose masking in custom environments.
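A rough sketch of the info-key pattern follows. CorridorEnv is a hypothetical stand-in environment (not Taxi and not part of gym) that only mimics the idea of returning "action_mask" in info; the step and reset return shapes are assumptions chosen for illustration.

```python
import numpy as np


class CorridorEnv:
    """Hypothetical 1-D corridor: action 0 moves left, action 1 moves right."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def _info(self):
        # 1 = the move is possible, 0 = it would walk into a wall.
        mask = np.array([self.pos > 0, self.pos < self.length - 1], dtype=np.int8)
        return {"action_mask": mask}

    def reset(self):
        self.pos = 0
        return self.pos, self._info()

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.pos = int(np.clip(self.pos, 0, self.length - 1))
        done = self.pos == self.length - 1
        return self.pos, float(done), done, self._info()


rng = np.random.default_rng(0)
env = CorridorEnv()
obs, info = env.reset()
visited = [obs]
for _ in range(200):
    # Only choose among the actions the environment reports as possible.
    valid = np.flatnonzero(info["action_mask"])
    obs, reward, done, info = env.step(int(rng.choice(valid)))
    visited.append(obs)
    if done:
        break
```

With a space from this PR, the random choice would presumably be replaced by env.action_space.sample(mask=info["action_mask"]).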

Example masks

>>> import numpy as np
>>> from gym import spaces

# Box space doesn't have masks

>>> space = spaces.Discrete(4)
>>> space.sample(mask=np.array([0, 1, 1, 1], dtype=np.int8)) 
2
>>> space.sample(mask=np.array([0, 0, 0, 0], dtype=np.int8)) 
0

>>> space = spaces.MultiDiscrete([4, 2])
>>> space.sample(mask=(np.array([0, 1, 0, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8))) 
[1 0]

>>> space = spaces.MultiDiscrete(np.array([[4, 2], [3, 4]]))
>>> space.sample(mask=((np.array([1, 1, 1, 1], dtype=np.int8), np.array([0, 1], dtype=np.int8)), (np.array([0, 0, 0], dtype=np.int8), np.array([1, 1, 0, 0], dtype=np.int8))))  
[[2 1]
 [0 1]]

>>> space = spaces.MultiBinary([2, 3])
>>> space.sample(mask=np.array([[0, 0, 1], [1, 1, 0]], dtype=np.int8))
[[0 0 0]
 [1 1 0]]

# Composite spaces (Dict, Tuple and Graph)
>>> space = spaces.Dict(a=spaces.Discrete(3), b=spaces.Box(0, 1, (1,)))
>>> space.sample(mask={"a": np.array([0, 1, 1], dtype=np.int8), "b": None})
OrderedDict([('a', 1), ('b', array([0.6812336], dtype=float32))])

>>> space = spaces.Tuple((spaces.Box(0, 1, (1,)), spaces.Discrete(3)))
>>> space.sample(mask=(None, np.array([0, 0, 0], dtype=np.int8)))  
(array([0.74909943], dtype=float32), 0)

>>> space = spaces.Graph(node_space=spaces.Box(0, 1, (1,)), edge_space=spaces.Discrete(3))
>>> space.sample(mask=(None, np.array([0, 1, 1], dtype=np.int8)), num_nodes=4)
GraphInstance(nodes=array([[0.5791068 ], [0.43347424], [0.6848027 ], [0.23124644]], dtype=float32), edges=array([2, 2]), edge_links=array([[1, 0], [2, 1]]))
>>> space.sample(mask=(None, (np.array([1, 1, 1], dtype=np.int8), np.array([1, 0, 0], dtype=np.int8), np.array([0, 1, 0], dtype=np.int8), np.array([0, 1, 1], dtype=np.int8))), num_nodes=4, num_edges=4)
GraphInstance(nodes=array([[0.21213211], [0.4872798 ], [0.69442934], [0.92085034]], dtype=float32), edges=array([2, 0, 1, 2]), edge_links=array([[0, 2], [0, 0],  [3, 2], [2, 0]]))
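The nested-tuple masks in the MultiDiscrete examples mirror the shape of nvec, with one 0/1 array per component. The sketch below shows one way such a structure could be walked recursively. sample_nested is a hypothetical helper written for illustration, not the code in gym/spaces/multi_discrete.py, and the fallback to 0 for an all-zero mask is an assumption based on the outputs shown above.

```python
import numpy as np


def sample_nested(nvec, mask, rng):
    """Sample an array shaped like nvec from a nested tuple of 0/1 masks.

    Hypothetical helper for illustration only.
    """
    nvec = np.asarray(nvec)
    if nvec.ndim == 0:
        # Leaf: a single component with int(nvec) choices and a flat mask.
        leaf_mask = np.asarray(mask)
        assert leaf_mask.shape == (int(nvec),), "one mask entry per choice"
        valid = np.flatnonzero(leaf_mask == 1)
        return int(rng.choice(valid)) if valid.size else 0
    # Inner node: recurse over the first axis of nvec and the matching sub-masks.
    return np.array([sample_nested(n, m, rng) for n, m in zip(nvec, mask)])


rng = np.random.default_rng(0)
nvec = np.array([[4, 2], [3, 4]])
mask = (
    (np.array([1, 1, 1, 1], dtype=np.int8), np.array([0, 1], dtype=np.int8)),
    (np.array([0, 0, 0], dtype=np.int8), np.array([1, 1, 0, 0], dtype=np.int8)),
)
out = sample_nested(nvec, mask, rng)
```

Here out has the same shape as nvec, and each entry respects the mask of its own component.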

Add tests for sample mask
Add tests for sample with discrete distributions (Discrete, MultiDiscrete, MultiBinary)
Add test for Box sample
Add docstrings
Add taxi docstrings
Fix taxi action mask

vwxyzjn

A great first pass! Thanks for adding this feature!

Markus28

I left some comments.

I have some additional questions regarding the new Graph space, not directly related to this PR:

Why do we only allow Box and Discrete spaces for node- and edge-features? I guess it's because of _generate_sample_space but we could easily do without that method.
I think we should change the sample method to allow more flexibility (e.g. sampling from a G(n,p) or G(n, M) model, etc.)
In _generate_sample_space, why would we expect the base_space to be None?
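As an illustration of the G(n, p) suggestion above, here is a minimal NumPy sketch that samples only the edge links. sample_gnp_edge_links is a hypothetical function, not part of gym's Graph space, and it assumes an undirected graph without self-loops, which differs from the edge_links shown in the PR description.

```python
import numpy as np


def sample_gnp_edge_links(num_nodes, p, rng):
    """Sample edge links from an Erdos-Renyi G(n, p) model.

    Hypothetical helper: each unordered pair (i, j) with i < j becomes an
    edge independently with probability p.
    """
    rows, cols = np.triu_indices(num_nodes, k=1)  # all pairs with i < j
    keep = rng.random(rows.shape[0]) < p  # independent coin flip per pair
    return np.stack([rows[keep], cols[keep]], axis=1)


rng = np.random.default_rng(0)
no_edges = sample_gnp_edge_links(4, 0.0, rng)
all_edges = sample_gnp_edge_links(4, 1.0, rng)
some_edges = sample_gnp_edge_links(10, 0.3, rng)
```

The result is an integer array of shape (num_edges, 2), the same layout as edge_links in the GraphInstance examples.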

vwxyzjn · 2022-06-26T21:23:50Z

Hi, @PseudoRnd#6426 thanks for this PR. LGTM to be merged as preliminary support for action masking.

That said, making it work with gym-microrts' action space is trickier than I thought. Here is roughly what the gym-microrts action space looks like:

import gym
import numpy as np

height = 16
width = 16
action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]
action_plane_space = gym.spaces.MultiDiscrete(action_space_dims)
action_space = np.ones((height, width, len(action_space_dims)))
action_space[:,:,:] = action_space_dims
action_space = gym.spaces.MultiDiscrete(action_space)
print("each unit's action_space has shape", action_plane_space.shape)
print("player's action_space (controlling 256 units at the same time) has shape", action_space.shape)
mask = np.ones((16, 16, sum(action_space_dims)))
print("player's action_space mask has shape", mask.shape)

# each unit's action_space has shape (7,)
# player's action_space (controlling 256 units at the same time) has shape (16, 16, 7)
# player's action_space mask has shape (16, 16, 78)

Notice that (16, 16) here is just a batch dimension, which is why I could implement the mask as a single flat array to make things more efficient. However, this does mean that integrating with gym's action masking implementation is more difficult. To use the current API, I think I would need to stack a lot of numpy arrays, such as

mask = np.array(
    [
        # one row of per-component masks for each unit
        np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
        np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
        np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
        ...
    ],
    dtype=object,  # the component masks have different lengths
)

The gym-microrts use case is definitely quite specialized, so I don't think it's necessary to support it, but it's good to keep it in mind when considering additional support.
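One possible bridge between the two layouts is to split the flat (16, 16, 78) mask along its last axis at the component boundaries. The sketch below assumes that the nested-tuple format from the MultiDiscrete examples in the PR description extends to a 3-D nvec; that assumption is not verified against gym, and the variable names are illustrative.

```python
import numpy as np

action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]
height, width = 16, 16

# Flat mask in the gym-microrts layout described above.
flat_mask = np.ones((height, width, sum(action_space_dims)), dtype=np.int8)

# Boundaries between consecutive components along the last axis.
split_points = np.cumsum(action_space_dims)[:-1]

# Assumed target layout: one tuple level per axis of nvec, with a 1-D
# mask array per component at the leaves.
nested_mask = tuple(
    tuple(tuple(np.split(flat_mask[i, j], split_points)) for j in range(width))
    for i in range(height)
)

component_lengths = [len(m) for m in nested_mask[0][0]]
```

This costs 16 * 16 * 7 small arrays in Python tuples, which is the overhead the comment above is concerned about.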

pseudo-rnd-thoughts added 11 commits June 8, 2022 17:19

Allows a new RNG to be generated with seed=-1 and updated env_checker…

c390a6b

… to fix bug if environment doesn't use np_random in reset

Revert "fixed gym.vector.make where the checker was being applied i…

654476e

…n the opposite case than was intended to (openai#2871)" This reverts commit 519dfd9.

Merge branch 'openai:master' into master

ea110ad

Remove bad pushed commits

2e5dc9c

Merge branch 'openai:master' into master

717bc1f

Merge branch 'openai:master' into master

7e73d04

Merge branch 'openai:master' into master

5743cd9

Fixed spelling in core.py

400b1e9

Pins pytest to the last py 3.6 version

4281c76

Add support for action masking in Space.sample(mask=...)

5dee690

Fix action mask

bc6ab4a

Markus28 reviewed Jun 17, 2022

gym/spaces/box.py (outdated)

vwxyzjn reviewed Jun 17, 2022

gym/core.py

gym/envs/toy_text/taxi.py (outdated)

gym/spaces/box.py

gym/spaces/discrete.py (outdated)

gym/spaces/multi_discrete.py (outdated)

pseudo-rnd-thoughts added 4 commits June 17, 2022 17:41

Fix action_mask

1700e9d

Fix action_mask

7f46df2

Added docstrings, fixed bugs and added taxi examples

cd91007

Fixed bugs

be4063e

Markus28 reviewed Jun 20, 2022

pseudo-rnd-thoughts and others added 7 commits June 20, 2022 14:26

Add tests for sample

2f14eb7

Add docstrings and test space sample mask Discrete and MultiBinary

f52d5d5

Add MultiDiscrete sampling and tests

5e699e1

Remove sample mask from graph

634da12

Update gym/spaces/multi_discrete.py

f85055c

Co-authored-by: Markus Krimmel <montcyril@gmail.com>

Updates based on Marcus28 and jjshoots for Graph.py

4a4b166

Updates based on Marcus28 and jjshoots for Graph.py

eb63c62

jjshoots reviewed Jun 24, 2022 (three review comments)

gym/spaces/graph.py (outdated)

pseudo-rnd-thoughts added 2 commits June 24, 2022 18:39

jjshoot review

8918914

jjshoot review

a53f0e7

pseudo-rnd-thoughts added 2 commits June 25, 2022 22:45

Update assert check

8e71e46

Update type hints

875ab44

jkterry1 merged commit 024b0f5 into openai:master Jun 26, 2022

pseudo-rnd-thoughts mentioned this pull request Jul 5, 2022

Version v0.25.0 release #2949

Merged
