Deep Q-Learning

Read Approximate Value Function Methods and also have a look at The Cart Pole Environment project before you start.

Task: Implement deep Q-learning for the cart pole environment. Represent the value function estimates by a two-layer ANN with 30 neurons per layer. Train for about 100 episodes and print the return obtained after each episode.
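At the core of deep Q-learning, the network's estimate \(Q(s,a)\) for the action actually taken is pulled toward the target \(r + \gamma \max_{a'} Q(s',a')\), and the bootstrap term is dropped when \(s'\) is terminal. The following is a minimal NumPy sketch of the target computation for a batch of transitions. It assumes that the network outputs for the next states are already available as an array of shape (batch, 2), with one column per cart pole action. The function name `td_targets` and the discount value are illustrative choices, not part of the task.

```python
import numpy as np

def td_targets(rewards, next_q_values, terminated, gamma=0.99):
    """Compute Q-learning targets r + gamma * max_a' Q(s', a') for a batch.

    rewards:       shape (batch,), reward observed after each action
    next_q_values: shape (batch, n_actions), network outputs for the next states
    terminated:    shape (batch,), True where the next state is terminal
    gamma:         discount factor (illustrative default, not from the task)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_values = np.asarray(next_q_values, dtype=float)
    terminated = np.asarray(terminated, dtype=bool)
    # Greedy value of each next state under the current estimates.
    best_next = next_q_values.max(axis=1)
    # A terminal state has no future return, so its bootstrap term is zeroed.
    return rewards + gamma * best_next * (~terminated)

# Two hypothetical transitions: the second one ends the episode.
targets = td_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[2.0, 3.0], [5.0, 4.0]],
    terminated=[False, True],
    gamma=0.9,
)
print(targets)
```

For the first transition this gives \(1 + 0.9 \cdot 3 = 3.7\). For the terminal one it gives just the reward 0. Only the network output that belongs to the taken action is trained toward its target; the other output is left out of the loss.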

Modify the algorithm given in Approximate Value Function Methods as follows:

  • Use TensorFlow’s Adam optimizer instead of manually implementing the ANN weight update.

  • Start with \(\varepsilon=1\) and decrease \(\varepsilon\) slightly after each episode. Keep \(\varepsilon\) above a fixed positive lower bound so that it cannot become arbitrarily small.
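One way to implement such a schedule is multiplicative decay clipped at a floor, combined with \(\varepsilon\)-greedy action selection. The sketch below shows this approach. The decay factor 0.99 and the floor 0.05 are illustrative assumptions, as are the helper names `decay_epsilon` and `epsilon_greedy`.

```python
import numpy as np

def decay_epsilon(epsilon, decay=0.99, epsilon_min=0.05):
    """Shrink epsilon multiplicatively, but never below epsilon_min."""
    return max(epsilon_min, epsilon * decay)

def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
epsilon = 1.0
for episode in range(300):
    # ... run one episode, choosing actions with epsilon_greedy(...) ...
    epsilon = decay_epsilon(epsilon)
print(epsilon)
```

With these values, \(\varepsilon\) drops to about 0.37 after 100 episodes. After roughly 300 episodes it reaches the floor 0.05 and stays there, so the agent keeps exploring occasionally.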

Note that the cart pole environment always yields reward 1, even on the step in which the pole falls down. It may be better to use reward 0 for that step.
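The reward replacement is a one-line wrapper around the environment's reward. The sketch below assumes the Gymnasium-style step API, which reports separately whether the episode was `terminated` (the pole fell or the cart left the track) or `truncated` (a time limit was reached). Only termination should produce reward 0, since a time limit does not mean the agent failed. The helper name `shaped_reward` is hypothetical.

```python
def shaped_reward(reward, terminated):
    """Replace the environment's reward by 0 when the pole has fallen.

    terminated: True if the episode ended because the pole fell or the cart
    left the track (not merely because a time limit was reached).
    """
    return 0.0 if terminated else float(reward)

# Last two steps of a hypothetical episode: the pole falls on the second one.
print(shaped_reward(1.0, terminated=False))  # 1.0
print(shaped_reward(1.0, terminated=True))   # 0.0
```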

Solution:

# your solution

Task: After sufficiently long training, run an episode and render the agent’s behavior.

Solution:

# your solution