CS294-112 HW 2: Policy Gradient

Dependencies:

Python 3.5
Numpy version 1.14.5
TensorFlow version 1.10.5
MuJoCo version 1.50 and mujoco-py 1.50.1.56
OpenAI Gym version 0.10.5
seaborn
Box2D==2.3.2
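
To confirm that an environment matches these pins, a small check along the following lines may help. It is only a sketch: the helper name check_versions and the EXPECTED table are hypothetical, the distribution names are assumed to match what pip installed, and only a subset of the pins above is included.

```python
try:
    # Available on Python 3.8 and later.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Fallback for older interpreters such as Python 3.5 (needs setuptools).
    from pkg_resources import get_distribution
    from pkg_resources import DistributionNotFound as PackageNotFoundError

    def version(name):
        return get_distribution(name).version

# Subset of the pins listed above; distribution names are assumptions.
EXPECTED = {
    "numpy": "1.14.5",
    "gym": "0.10.5",
    "mujoco-py": "1.50.1.56",
    "Box2D": "2.3.2",
}


def check_versions(expected, lookup=version):
    """Return {name: (wanted, found)} for each mismatch; found is None if missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling check_versions(EXPECTED) returns an empty dict when every listed package is installed at the pinned version.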

Before doing anything, first replace gym/envs/box2d/lunar_lander.py with the provided lunar_lander.py file.
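
The replacement step can be scripted. The sketch below assumes gym is installed in the active Python environment and that the provided lunar_lander.py is in the current directory; the helper names find_gym_box2d_dir and replace_lunar_lander are hypothetical, and keeping a .orig backup is an extra precaution that the instructions do not ask for.

```python
import importlib.util
import shutil
from pathlib import Path


def find_gym_box2d_dir():
    """Return gym/envs/box2d inside the installed gym package, or None if gym is absent."""
    spec = importlib.util.find_spec("gym")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "envs" / "box2d"


def replace_lunar_lander(provided, box2d_dir):
    """Copy the provided file over gym's lunar_lander.py, backing up the original once."""
    target = Path(box2d_dir) / "lunar_lander.py"
    backup = target.with_name(target.name + ".orig")
    if target.exists() and not backup.exists():
        shutil.copy2(str(target), str(backup))
    shutil.copy2(str(provided), str(target))
    return target
```

A typical call would be replace_lunar_lander("lunar_lander.py", find_gym_box2d_dir()), after checking that find_gym_box2d_dir() did not return None.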

The only file that you need to look at is train_pg_f18.py, which you will implement.

See the HW2 PDF for further instructions.

Debugging

If you see the error AttributeError: module '_Box2D' has no attribute 'RAND_LIMIT_swigconstant', run this:

pip3 install box2d box2d-kengz