Version v0.25.0 release #2949

Merged
merged 28 commits into from
Jul 13, 2022
Conversation

@pseudo-rnd-thoughts commented Jul 5, 2022

Release notes

API Changes

  • Step - A majority of deep reinforcement learning algorithm implementations are incorrect due to an important difference between theory and practice, as done is not equivalent to termination. As a result, we have modified the step function to return five values: obs, reward, termination, truncation, info. The full theoretical and practical reasons for these changes (along with example code changes) will be explained in a soon-to-be-released blog post. The aim is for the change to be backward compatible (for now); please report any issues on GitHub or the Discord. @arjun-kg
  • Render - The render API is changed such that the render mode has to be specified during gym.make with the keyword render_mode, after which the render mode is fixed. For further details see https://younis.dev/blog/2022/render-api/ and Render API #2671. This comes with the following additional changes:
    • with render_mode="human", you don't need to call .render(); rendering happens automatically on env.step()
    • with render_mode="rgb_array", .render() pops the list of frames rendered since the last .reset()
    • with render_mode="single_rgb_array", .render() returns a single frame, like before.
  • Space.sample(mask=...) allows a mask when sampling actions, to enable or disable certain actions from being randomly sampled. We recommend developers add the mask to the info dictionary returned by reset(return_info=True) and step. See Added Action masking for Space.sample() #2906 for example implementations of the masks for the individual spaces. We have added an example version of this in the taxi environment. @pseudo-rnd-thoughts
  • Add Graph for environments that use graph style observation or action spaces. Currently, the node and edge spaces can only be Box or Discrete spaces. @jjshoots
  • Add Text space for reinforcement learning that involves communication between agents with dynamic-length messages (otherwise MultiDiscrete can be used). @ryanrudes @pseudo-rnd-thoughts
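The step change above can be illustrated with a toy environment. This is a minimal sketch, not Gym code: ToyEnv is a hypothetical stand-in that returns the new five-value tuple, with a task-level end state producing termination and a time limit producing truncation.

```python
# ToyEnv is a hypothetical stand-in for a Gym environment, illustrating the
# new five-value step signature: obs, reward, termination, truncation, info.
class ToyEnv:
    def __init__(self, max_steps=5):
        self.max_steps = max_steps  # external time limit -> truncation
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation

    def step(self, action):
        self.t += 1
        terminated = action == 1              # task-level terminal state
        truncated = self.t >= self.max_steps  # time limit reached
        return self.t, 1.0, terminated, truncated, {}

env = ToyEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    # The rollout loop ends on either signal, but a learning algorithm
    # should only zero its bootstrap value target on true termination;
    # on truncation the episode ended for a reason external to the task.
    done = terminated or truncated
print(obs, terminated, truncated)  # 5 False True
```

The key point is that the old single done flag conflated these two cases, which is why many implementations bootstrapped incorrectly at time limits.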

Bug fixes

  • Fixed car racing termination, where, if the agent finished the final lap, the environment ended through truncation rather than termination. This required a version bump of Car racing to v2 and removed Car racing discrete in favour of gym.make("CarRacing-v2", continuous=False). @araffin
  • In v0.24.0, opencv-python was an accidental requirement for the project. This has been reverted. @KexianShen @pseudo-rnd-thoughts
  • Updated utils.play such that if the environment specifies keys_to_action, the function will automatically use that data. @Markus28
  • When rendering the blackjack environment, fixed a bug where rendering would change the dealer's top card. @balisujohn
  • Updated the mujoco docstring to reflect changes that were accidentally overwritten. @Markus28

Misc

  • The whole project is partially type hinted and checked using pyright (none of the project files are ignored by the type checker). @RedTachyon @pseudo-rnd-thoughts (Future work will add strict type hinting to the core API.)
  • Action masking added to the taxi environment (no version bump due to being backwards compatible) @pseudo-rnd-thoughts
  • The Box space shape inference now allows scalar high and low values to be automatically expanded to shape (1,). Minor changes to identifying scalars. @pseudo-rnd-thoughts
  • Added options support in the classic control environments to modify the bounds on the initial random state of the environment. @psc-g
  • The RecordVideo wrapper is being deprecated, with no support for TextEncoder under the new render API. The plan is to replace RecordVideo with a single function that receives a list of frames from an environment and automatically renders them as a video using MoviePy. @johnMinelli
  • The gym py.Dockerfile image is reduced from 2 GB to 1.5 GB through a number of optimisations. @TheDen
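The action-masking items above (Space.sample(mask=...) and the taxi environment) can be sketched as follows. This is a plain NumPy re-implementation of the idea for a discrete space, not Gym's actual code; the function name and the fallback behaviour for an all-zero mask are illustrative assumptions.

```python
import numpy as np

# Sketch of masked discrete sampling in the spirit of Space.sample(mask=...):
# mask is a binary array over the n actions, where mask[i] == 1 means
# action i may be sampled and mask[i] == 0 means it may not.
def masked_sample(n, mask, rng=None):
    rng = rng or np.random.default_rng()
    valid = np.flatnonzero(mask)  # indices of allowed actions
    if valid.size == 0:
        # One reasonable fallback when no action is valid: return a fixed
        # default action (assumption, not necessarily Gym's behaviour).
        return 0
    return int(rng.choice(valid))

mask = np.array([0, 1, 0, 1], dtype=np.int8)
action = masked_sample(4, mask)
print(action)  # always 1 or 3: masked-out actions are never sampled
```

Returning the mask through the info dictionary, as the notes recommend, lets an agent both restrict its own action selection and sample valid random actions during exploration.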

@jkterry1 jkterry1 merged commit aeda7eb into openai:master Jul 13, 2022