Added Action masking for Space.sample() #2906
… to fix bug if environment doesn't use np_random in reset
…n the opposite case than was intended to (openai#2871)" This reverts commit 519dfd9.
A great first pass! Thanks for adding this feature!
I left some comments.
I have some additional questions regarding the new `Graph` space, not directly related to this PR:
- Why do we only allow `Box` and `Discrete` spaces for node- and edge-features? I guess it's because of `_generate_sample_space`, but we could easily do without that method.
- I think we should change the `sample` method to allow more flexibility (e.g. sampling from a G(n, p) or G(n, M) model, etc.).
- In `_generate_sample_space`, why would we expect the `base_space` to be None?
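To make the G(n, p) suggestion concrete, here is a minimal plain-Python sketch of that sampling model. It is not part of the `Graph` space; the function name `sample_gnp_edges` and its signature are hypothetical, chosen only for illustration.

```python
import itertools
import random


def sample_gnp_edges(n, p, rng=None):
    """Return the edge list of an undirected G(n, p) random graph (hypothetical helper)."""
    rng = rng or random.Random()
    # Each of the n * (n - 1) / 2 possible edges is kept independently with probability p.
    return [
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    ]


edges = sample_gnp_edges(5, 0.5, random.Random(0))
```

A G(n, M) variant would instead draw exactly M of the possible edges without replacement, for example with `random.sample`.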
Co-authored-by: Markus Krimmel <montcyril@gmail.com>
Hi, @PseudoRnd#6426 thanks for this PR. LGTM to be merged as preliminary support for action masking. That said, making it work with gym-microrts' action space is trickier than I thought. Here is what the gym-microrts' action space roughly looks like:

```python
import gym
import numpy as np

height = 16
width = 16
action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]
action_plane_space = gym.spaces.MultiDiscrete(action_space_dims)
action_space = np.ones((height, width, len(action_space_dims)))
action_space[:, :, :] = action_space_dims
action_space = gym.spaces.MultiDiscrete(action_space)
print("each unit's action_space has shape", action_plane_space.shape)
print("player's action_space (controlling 256 units at the same time) has shape", action_space.shape)
mask = np.ones((16, 16, sum(action_space_dims)))
print("player's action_space mask has shape", mask.shape)
# each unit's action_space has shape (7,)
# player's action_space (controlling 256 units at the same time) has shape (16, 16, 7)
# player's action_space mask has shape (16, 16, 78)
```

Notice here the (16, 16) is just a batch dimension, which is the reason I could implement the mask that way to make things more efficient. This does, however, mean that integrating with gym's action masking implementation is more difficult. To use the current API I think I need to stack a lot of numpy objects such as
The gym-microrts use case is definitely quite specialized, so I don't think it's necessary to support it, but it's good that we keep this in mind when considering additional support.
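As a rough illustration of the mismatch described above, the sketch below splits one unit's flat mask of length 78 into one sub-mask per action component. It assumes that a per-component layout is what a masking API would want, which the comment does not spell out; the helper `split_flat_mask` is hypothetical and uses plain Python lists rather than numpy arrays.

```python
action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]


def split_flat_mask(flat_mask, dims):
    """Split a flat 0/1 mask into one sub-mask per action component (hypothetical helper)."""
    if len(flat_mask) != sum(dims):
        raise ValueError("mask length must equal sum(dims)")
    parts, start = [], 0
    for n in dims:
        # Component with n choices owns the next n mask entries.
        parts.append(flat_mask[start:start + n])
        start += n
    return parts


unit_mask = [1] * sum(action_space_dims)  # one unit's flat mask, length 78
per_component = split_flat_mask(unit_mask, action_space_dims)
print([len(part) for part in per_component])
# prints [6, 4, 4, 4, 4, 7, 49]
```

Doing this for every one of the 16 x 16 units is the kind of per-unit bookkeeping that the flat (16, 16, 78) mask avoids.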
Adds action masking as requested in #2823 to allow spaces to mask certain actions. These masks are the positive case, where `1` means that the action can be taken and `0` means that it cannot. For all of the gym spaces, this PR adds a `mask` parameter to `sample(mask=...)`, with the particular type required being dependent on the space. `Box` is a special case where we don't implement masking, because a neural network cannot provide mask values for continuous distributions; however, if a good reason is found, this could be added.

To the gym Taxi environment, we add a new info key `"action_mask"`, which is the recommended method for using the masking for custom environments.
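The mask semantics described above can be sketched in plain Python as follows. This is an illustration of the `1` = allowed, `0` = not allowed convention for a discrete space, not gym's implementation; `sample_masked` and the `info` dictionary below are hypothetical stand-ins, and raising on an all-zero mask is a choice made for this sketch only.

```python
import random


def sample_masked(n, mask, rng=None):
    """Sample uniformly from the actions in range(n) whose mask entry is 1."""
    rng = rng or random.Random()
    if len(mask) != n:
        raise ValueError("mask must have one entry per action")
    valid = [action for action, allowed in enumerate(mask) if allowed == 1]
    if not valid:
        raise ValueError("mask allows no action")
    return rng.choice(valid)


# Hypothetical info dict, shaped like what an environment exposing
# an "action_mask" key might return alongside an observation.
info = {"action_mask": [0, 1, 0, 1, 0, 0]}
action = sample_masked(6, info["action_mask"], random.Random(0))
```

Here `action` can only ever be 1 or 3, since those are the only entries set to `1` in the mask.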
Example masks