SARSA

Read Temporal Difference Learning (TD Learning) before you start.

In this project we solve the Frozen Lake task with SARSA (TD learning). We do not need to know the environment dynamics, and the implementation is much simpler than for Monte Carlo methods.

Random Starting Positions

By default, Frozen Lake episodes always start in the upper left corner. But we would like to start at a new random position in each episode. Looking at the source code, we see that the starting position is chosen randomly from all cells labeled S.

Task: Create a Frozen Lake environment with the standard 8-by-8 map, extract the map via Env.desc, replace every F (frozen cell) with S, and create a new environment object from the new map.

Solution:

# your solution
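
One possible approach to the map manipulation, sketched in plain Python on a small 4-by-4 map rather than the 8-by-8 one. The helper name `replace_frozen_with_start` is hypothetical. With Gymnasium, the rows would presumably come from decoding the byte array in `Env.desc`, and the new rows would be passed as the `desc` argument when creating the new environment; that part is an assumption and is only indicated in comments, not executed.

```python
def replace_frozen_with_start(rows):
    """Return a copy of the map in which every frozen cell F becomes a start cell S."""
    return [row.replace("F", "S") for row in rows]


# Illustrative 4-by-4 map (S start, F frozen, H hole, G goal).
small_map = ["SFFF", "FHFH", "FFFH", "HFFG"]
new_map = replace_frozen_with_start(small_map)

for row in new_map:
    print(row)

# Assumed Gymnasium usage (not executed here):
#   env = gymnasium.make("FrozenLake8x8-v1")
#   rows = ["".join(c.decode() for c in row) for row in env.unwrapped.desc]
#   new_env = gymnasium.make("FrozenLake8x8-v1", desc=replace_frozen_with_start(rows))
```

Holes and the goal are left untouched, so only cells the agent can safely stand on become possible starting positions.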

SARSA implementation

Task: Implement the SARSA algorithm with decreasing \(\varepsilon\). After each episode print O if the goal was reached and X otherwise (without line breaks, to get a visual impression of training progress).

Solution:

# your solution
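
A minimal sketch of tabular SARSA with a decreasing \(\varepsilon\), written against a hand-rolled toy grid instead of the real Frozen Lake environment so that it runs without extra packages. Everything here is an assumption made for illustration: the deterministic `step` function, the 4-by-4 map, the linear \(\varepsilon\) schedule, and the values of `alpha` and `gamma`. The update itself is the usual SARSA rule \(Q(s,a) \leftarrow Q(s,a) + \alpha\,\bigl(r + \gamma\,Q(s',a') - Q(s,a)\bigr)\), with the bootstrap term dropped at terminal states.

```python
import random

MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]      # illustrative toy map
N_ROWS, N_COLS = len(MAP), len(MAP[0])
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # 0 left, 1 down, 2 right, 3 up


def step(state, action):
    """Deterministic toy dynamics: returns (next_state, reward, done)."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)  # moves off the grid are clipped
    col = min(max(col + d_col, 0), N_COLS - 1)
    cell = MAP[row][col]
    return row * N_COLS + col, float(cell == "G"), cell in "GH"


def epsilon_greedy(Q, state, eps, rng):
    """Random action with probability eps, otherwise greedy with random tie-breaking."""
    if rng.random() < eps:
        return rng.randrange(len(MOVES))
    best = max(Q[state])
    return rng.choice([a for a, q in enumerate(Q[state]) if q == best])


def sarsa_update(Q, s, a, r, s2, a2, done, alpha, gamma):
    """One SARSA step: move Q[s][a] towards r + gamma * Q[s2][a2]."""
    target = r if done else r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


def train(n_episodes=500, alpha=0.1, gamma=0.95, max_steps=100, seed=0):
    rng = random.Random(seed)
    n_states = N_ROWS * N_COLS
    Q = [[0.0] * len(MOVES) for _ in range(n_states)]
    # Random starting positions: every non-terminal cell is a possible start.
    starts = [i for i in range(n_states) if MAP[i // N_COLS][i % N_COLS] in "SF"]
    marks = []
    for episode in range(n_episodes):
        eps = max(0.05, 1.0 - episode / n_episodes)  # linearly decreasing epsilon
        s = rng.choice(starts)
        a = epsilon_greedy(Q, s, eps, rng)
        reward = 0.0
        for _ in range(max_steps):
            s2, reward, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            sarsa_update(Q, s, a, reward, s2, a2, done, alpha, gamma)
            s, a = s2, a2
            if done:
                break
        mark = "O" if reward == 1.0 else "X"
        print(mark, end="")  # no line break, so progress reads as one long row
        marks.append(mark)
    print()
    return Q, "".join(marks)


Q, progress = train()
```

For the actual task, `step` would be replaced by calls to the environment's own step method, and the hyperparameters would need tuning for the larger, slippery map.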

Task: Visualize the resulting (greedy) policy.

Solution:

# your solution
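
One simple way to visualize a greedy policy is a text grid with one arrow per cell. The sketch below assumes the action values are stored as a list of per-state lists and that actions follow the Frozen Lake order (0 left, 1 down, 2 right, 3 up). The function name `render_policy` and the tiny demo map and values are hypothetical, chosen only to keep the example self-contained.

```python
ARROWS = "←↓→↑"  # index = action: 0 left, 1 down, 2 right, 3 up


def render_policy(Q, map_rows):
    """Return one string per map row, showing the greedy action in each cell."""
    n_cols = len(map_rows[0])
    lines = []
    for r, row in enumerate(map_rows):
        chars = []
        for c, cell in enumerate(row):
            if cell in "HG":
                chars.append(cell)  # terminal cells: show the cell type instead
            else:
                q = Q[r * n_cols + c]
                chars.append(ARROWS[q.index(max(q))])  # first maximizer wins ties
        lines.append("".join(chars))
    return lines


# Hand-made demo values, not the result of any training run.
demo_map = ["SF", "HG"]
demo_Q = [
    [0.0, 0.0, 0.5, 0.0],  # state 0: best action is right
    [0.0, 0.9, 0.0, 0.0],  # state 1: best action is down
    [0.0, 0.0, 0.0, 0.0],  # hole
    [0.0, 0.0, 0.0, 0.0],  # goal
]
policy_lines = render_policy(demo_Q, demo_map)
for line in policy_lines:
    print(line)
```

Cells whose action values are all equal show the first action (left), which usually indicates states that were never visited or updated during training.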