Monte Carlo Methods

Read Monte Carlo Methods before you start.

In this project we solve the Frozen Lake task with Monte Carlo methods. In contrast to the Dynamic Programming project, we therefore do not need to know the environment dynamics; instead, the agent explores the environment by interacting with it.

Task: Implement a function episode that runs one episode of Frozen Lake. Its arguments are the env object and a policy (a 2D NumPy array with one probability per state-action pair). The function shall return a list of state-action-reward tuples. Episodes always start in the environment's default initial state 0.

Solution:

# your solution
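
A minimal sketch of such a function, assuming the Gymnasium FrozenLake-v1 environment and its reset/step API (the names rng and uniform_policy are illustrative, not prescribed by the task):

```python
import gymnasium as gym
import numpy as np


def episode(env, policy, rng=None):
    """Run one episode and return a list of (state, action, reward) tuples."""
    rng = rng if rng is not None else np.random.default_rng()
    trajectory = []
    state, _ = env.reset()  # FrozenLake always resets to state 0
    done = False
    while not done:
        # sample an action from the policy's probability distribution for this state
        action = int(rng.choice(policy.shape[1], p=policy[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
        done = terminated or truncated
    return trajectory


# usage: run one episode under a uniformly random policy
env = gym.make("FrozenLake-v1")
n_states, n_actions = env.observation_space.n, env.action_space.n
uniform_policy = np.full((n_states, n_actions), 1.0 / n_actions)
print(episode(env, uniform_policy))
```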

Task: Implement the on-policy Monte Carlo method with an \(\varepsilon\)-soft policy. Use incremental updates to the action-value estimates. Visualize a greedy policy w.r.t. the action-value estimates.

Solution:

# your solution
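
One possible sketch of on-policy first-visit Monte Carlo control with an \(\varepsilon\)-soft policy, reusing the episode function from the previous task; the hyperparameters epsilon, gamma, and n_episodes are illustrative assumptions:

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1")
n_states, n_actions = env.observation_space.n, env.action_space.n

epsilon, gamma, n_episodes = 0.1, 0.99, 50_000  # illustrative hyperparameters
rng = np.random.default_rng(0)

Q = np.zeros((n_states, n_actions))                       # action-value estimates
N = np.zeros((n_states, n_actions))                       # visit counts per state-action pair
policy = np.full((n_states, n_actions), 1.0 / n_actions)  # initial ε-soft (uniform) policy

for _ in range(n_episodes):
    trajectory = episode(env, policy, rng)  # episode function from the previous task
    G = 0.0
    # walk the episode backwards and accumulate the return
    for t in reversed(range(len(trajectory))):
        s, a, r = trajectory[t]
        G = gamma * G + r
        # first-visit update: only if (s, a) does not occur earlier in the episode
        if all((s, a) != (s_, a_) for s_, a_, _ in trajectory[:t]):
            N[s, a] += 1
            Q[s, a] += (G - Q[s, a]) / N[s, a]  # incremental mean update
            # make the policy ε-greedy w.r.t. the updated action-value estimates
            best = np.argmax(Q[s])
            policy[s] = epsilon / n_actions
            policy[s, best] += 1.0 - epsilon

# visualize the greedy policy as arrows on the 4x4 grid
arrows = np.array(["←", "↓", "→", "↑"])  # FrozenLake action order: LEFT, DOWN, RIGHT, UP
print(arrows[np.argmax(Q, axis=1)].reshape(4, 4))
```

The improvement step keeps the policy \(\varepsilon\)-soft: every action retains probability at least \(\varepsilon / |\mathcal{A}|\), so exploration never stops during training.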

Task: Visualize the number of visits per state in a heatmap. Did the agent discover the whole map? What does the greedy policy look like for states in less-explored regions?

Solution:

# your solution
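
A minimal sketch of the heatmap, assuming the state-action visit counts N from the sketch above (per-state counts are then obtained by summing over actions):

```python
import matplotlib.pyplot as plt
import numpy as np

# per-state visit counts, assuming N holds state-action visit counts as above
state_visits = N.sum(axis=1)

plt.imshow(state_visits.reshape(4, 4), cmap="viridis")
plt.colorbar(label="number of visits")
plt.title("Visits per state (4x4 Frozen Lake)")
plt.show()
```

Typically, states near the start are visited far more often than states behind holes; in rarely visited states the action-value estimates rest on few (or no) observed returns, so the greedy policy there remains close to arbitrary.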