ADP & RL

What are Stochastic Approximation algorithms?

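Iterative algorithms for root-finding (or optimization) problems in which the function can only be observed through noisy samples. The function is written as an expected value, f(x) = E[F(x, w)] with random noise w, and each iteration updates the estimate using a single noisy sample and a diminishing step size (e.g. the Robbins-Monro algorithm).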
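
A minimal sketch of the Robbins-Monro iteration, assuming step sizes 1/(k+1) and a hypothetical noisy function whose expectation is f(x) = x - MU with Y ~ N(MU, 1). Each step uses one noisy sample of f instead of the expectation. The names `robbins_monro`, `noisy_f` and `MU` are illustrative only.

```python
import random


def robbins_monro(noisy_f, x0=0.0, iterations=10_000, seed=0):
    """Find a root of f(x) = E[noisy_f(x, rng)] using noisy evaluations only."""
    rng = random.Random(seed)
    x = x0
    for k in range(iterations):
        step = 1.0 / (k + 1)          # diminishing steps: sum diverges, sum of squares converges
        x -= step * noisy_f(x, rng)   # move against a single noisy sample of f(x)
    return x


MU = 2.5  # hypothetical true root


def noisy_f(x, rng):
    # One sample of x - Y with Y ~ N(MU, 1); its expectation x - MU vanishes at x = MU.
    return x - rng.gauss(MU, 1.0)


root = robbins_monro(noisy_f)
```

With this particular step size the iterate equals the running sample mean of the Y samples, which shows how stochastic approximation and Monte Carlo averaging are related.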

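Explain Monte Carlo PI (policy iteration).

In the policy evaluation step, estimate the Q-factors of the current policy by averaging sampled returns (updating the mean iteratively) instead of computing the expectation exactly from the model.

Perform policy improvement as usual, i.e. greedily with respect to the estimated Q-factors.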
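
A minimal sketch of Monte Carlo policy iteration on a hypothetical two-stage cost-minimization problem (state 0, then state 1, then terminal; two actions per state; noisy costs with the made-up means in `MEAN_COST`). Evaluation averages sampled returns with exploring starts, and improvement is greedy. All names and numbers are assumptions for illustration.

```python
import random

# Hypothetical toy problem: state 0 -> state 1 -> terminal, two actions per state.
N_STATES = 2
ACTIONS = (0, 1)
MEAN_COST = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}


def step(s, a, rng):
    """Sample a noisy stage cost and the next state (None means terminal)."""
    cost = MEAN_COST[(s, a)] + rng.gauss(0.0, 0.5)
    s_next = s + 1 if s + 1 < N_STATES else None
    return cost, s_next


def mc_evaluate(policy, episodes, rng):
    """Estimate the Q-factors of `policy` by averaging sampled returns."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    n = {sa: 0 for sa in q}
    for _ in range(episodes):
        # Exploring start: random first state-action pair, then follow `policy`.
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        start, ret = (s, a), 0.0
        while s is not None:
            cost, s = step(s, a, rng)
            ret += cost
            if s is not None:
                a = policy[s]
        n[start] += 1
        q[start] += (ret - q[start]) / n[start]  # iterative mean instead of the expectation
    return q


def improve(q):
    """Greedy improvement: choose the action with the lowest estimated cost."""
    return {s: min(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


def mc_policy_iteration(iterations=5, episodes=2000, seed=0):
    rng = random.Random(seed)
    policy = {s: 1 for s in range(N_STATES)}  # arbitrary initial policy
    for _ in range(iterations):
        q = mc_evaluate(policy, episodes, rng)
        policy = improve(q)
    return policy, q


policy, q = mc_policy_iteration()
```

Only the starting state-action pair of each episode is updated here, which keeps the sketch short. Updating every visited pair would reuse the samples more efficiently.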

What is the motivation for Value Function Approximation?

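Curse of dimensionality: there are too many states and actions to store a value for each of them in memory, and learning the value of every state individually would be too slow. An approximate value function with a small number of parameters can also generalize from visited states to unseen ones.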

What is the motivation for off-policy learning?

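To learn about one policy (the target policy) from experience generated by another (the behavior policy). This makes it possible, for example, to learn about the optimal (greedy) policy while following an exploratory policy, or to reuse experience generated by earlier policies or by other agents.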

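Does TD learning work with both VI and PI?

Only with PI: in the policy evaluation step, the Bellman equation of a fixed policy is an expectation, which TD learning can approximate from samples.

It does not work directly with VI, because the VI update takes a minimization over an expectation, and that minimum of an expectation cannot be approximated by sampling.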
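
A minimal sketch of TD(0) inside the policy evaluation step, assuming a hypothetical chain where the fixed policy always moves one state to the right with a noisy cost of mean 1, no discounting, and step size 1/(number of visits). Each update uses a sampled one-step target, so no expectation over next states (and no minimization) is required. `td0_evaluate` is an illustrative name.

```python
import random


def td0_evaluate(episodes=5000, n_states=3, seed=0):
    """TD(0) evaluation of a fixed policy on a hypothetical chain 0 -> 1 -> ... -> terminal.

    Each step moves one state to the right with a noisy cost of mean 1,
    so the true cost-to-go is J(s) = n_states - s (no discounting).
    """
    rng = random.Random(seed)
    J = [0.0] * n_states          # cost-to-go estimates
    visits = [0] * n_states
    for _ in range(episodes):
        s = 0
        while s is not None:
            cost = 1.0 + rng.gauss(0.0, 0.5)               # sampled stage cost
            s_next = s + 1 if s + 1 < n_states else None   # None = terminal
            j_next = J[s_next] if s_next is not None else 0.0
            visits[s] += 1
            step = 1.0 / visits[s]                         # diminishing step size
            # Move J(s) toward the sampled one-step target cost + J(s')
            J[s] += step * (cost + j_next - J[s])
            s = s_next
    return J


J = td0_evaluate()
```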

What is the policy improvement theorem?

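If the new policy is greedy with respect to the value function of the current policy, it is at least as good as the current policy in every state, and strictly better in at least one state unless the current policy is already optimal. Hence the policy improvement step returns either a strictly improved policy or an optimal one.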
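
A quick numerical check of the theorem, sketched on a randomly generated (hypothetical) discounted cost-minimization MDP. Here P[i, u, j] are transition probabilities, g[i, u] expected stage costs and alpha the discount factor; all of these are assumptions for illustration. The greedy policy's exact cost-to-go should be no larger than the old one in any state.

```python
import numpy as np


def evaluate(P, g, policy, alpha):
    """Exact policy evaluation: solve J = g_mu + alpha * P_mu @ J."""
    n = P.shape[0]
    P_mu = P[np.arange(n), policy]   # (n, n): transition rows for the chosen actions
    g_mu = g[np.arange(n), policy]   # (n,): expected stage costs for the chosen actions
    return np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)


def improve(P, g, J, alpha):
    """Greedy (cost-minimizing) policy with respect to J."""
    Q = g + alpha * (P @ J)          # (n, m) Q-factors
    return np.argmin(Q, axis=1)


rng = np.random.default_rng(0)
n, m, alpha = 5, 3, 0.9
P = rng.random((n, m, n))
P /= P.sum(axis=2, keepdims=True)    # each P[i, u, :] is a probability distribution
g = rng.random((n, m))

policy = np.zeros(n, dtype=int)      # arbitrary initial policy
J_old = evaluate(P, g, policy, alpha)
new_policy = improve(P, g, J_old, alpha)
J_new = evaluate(P, g, new_policy, alpha)
print(J_old - J_new)                 # should be >= 0 (up to rounding) in every state
```

Repeating evaluate and improve until the policy stops changing gives exact policy iteration. If the greedy policy equals the current one, the current policy is already optimal.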

Name two algorithms based on Monte Carlo estimation.

LSTD (least-squares temporal difference) and LSPE (least-squares policy evaluation)
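
A minimal LSTD(0) sketch, assuming a hypothetical chain (three states visited in order, noisy unit costs, no discounting) and a hypothetical feature map phi(s) = [1, s]. It accumulates A and b from simulated transitions and solves A r = b once. LSPE would instead approach the same solution through iterative updates. Because the true cost-to-go here is 3 - s, the weights should come out close to [3, -1].

```python
import numpy as np


def features(s):
    """Hypothetical feature map phi(s) = [1, s]; the terminal state (None) maps to zeros."""
    return np.zeros(2) if s is None else np.array([1.0, float(s)])


def lstd(episodes=5000, n_states=3, seed=0):
    """LSTD(0) on a hypothetical chain 0 -> 1 -> ... -> terminal with noisy unit costs."""
    rng = np.random.default_rng(seed)
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for _ in range(episodes):
        s = 0
        while s is not None:
            cost = 1.0 + rng.normal(0.0, 0.5)
            s_next = s + 1 if s + 1 < n_states else None
            phi, phi_next = features(s), features(s_next)
            A += np.outer(phi, phi - phi_next)   # no discounting (alpha = 1)
            b += phi * cost
            s = s_next
    return np.linalg.solve(A, b)   # weights r with J(s) ~ phi(s) @ r


r = lstd()
```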

How does VI with Linear Value Function Approximation work?

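It fits the weights of the linear approximation either directly, by minimizing the error between the approximate value function and the optimal one, or indirectly, by approximately solving the Bellman optimality equation within the space spanned by the features.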
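
One way to write the two approaches, assuming a linear architecture J~(i) = phi(i)'r with feature matrix Phi, discount factor alpha, Bellman optimality operator T and a least-squares projection Pi onto the span of the features. This notation is assumed here, not taken from the card.

```latex
\begin{aligned}
\text{direct:}\quad & \min_r \sum_i \big(\phi(i)^\top r - J^*(i)\big)^2 \\
\text{indirect (projected VI):}\quad & \Phi r_{k+1} = \Pi\, T(\Phi r_k), \qquad
(TJ)(i) = \min_u \sum_j p_{ij}(u)\big(g(i,u,j) + \alpha J(j)\big)
\end{aligned}
```

In the indirect form, each iteration applies one VI step to the current approximation and then fits r_{k+1} by least squares onto the features. This iteration is not guaranteed to converge for arbitrary features.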

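How do you estimate the value of the target policy from experience generated by the behavior policy?

By importance sampling: each sampled return is weighted by the ratio of the probabilities of its action sequence under the target and behavior policies, i.e. the product over time steps of π(a_t | s_t) / b(a_t | s_t).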
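
A minimal sketch of ordinary importance sampling on a hypothetical one-step problem with two actions and deterministic costs. Data is generated by a uniform behavior policy, and the made-up target policy picks action 0 with probability 0.9, so its true expected cost is 0.9 * 1.0 + 0.1 * 3.0 = 1.2. All names and numbers are assumptions for illustration.

```python
import random

# Hypothetical one-step problem: two actions with deterministic costs.
COST = {0: 1.0, 1: 3.0}
BEHAVIOR = {0: 0.5, 1: 0.5}   # b(a): policy that generates the data
TARGET = {0: 0.9, 1: 0.1}     # pi(a): policy we want to evaluate


def is_estimate(samples=20_000, seed=0):
    """Ordinary importance-sampling estimate of the target policy's expected cost."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = 0 if rng.random() < BEHAVIOR[0] else 1   # act with the behavior policy
        rho = TARGET[a] / BEHAVIOR[a]                # importance-sampling ratio
        total += rho * COST[a]
    return total / samples


estimate = is_estimate()
```

For multi-step episodes, rho becomes the product of these per-step ratios along the trajectory.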

What is expected SARSA?

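It replaces the sampled next-action value Q(s', a') in the SARSA target with its expectation under the target policy, Σ_a' π(a' | s') Q(s', a'). This removes the variance caused by sampling the next action.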
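
A minimal sketch of a single Expected SARSA update, assuming a tabular Q stored as a dict from state to a list of action values and the target policy's probabilities in the next state passed in as `pi_next`. The function name and parameters are hypothetical.

```python
def expected_sarsa_update(Q, s, a, r, s_next, pi_next, alpha=0.1, gamma=0.9, terminal=False):
    """One Expected SARSA update of Q[s][a] (Q: dict state -> list of action values).

    pi_next[a2] is the target policy's probability of action a2 in s_next.
    """
    if terminal:
        expected_next = 0.0
    else:
        # Expectation over next actions instead of a single sampled next action
        expected_next = sum(p * q for p, q in zip(pi_next, Q[s_next]))
    td_target = r + gamma * expected_next
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


Q = {0: [0.0, 0.0], 1: [1.0, 3.0]}
new_value = expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, pi_next=[0.5, 0.5], alpha=0.5)
```

Here the expected next value is 0.5 * 1.0 + 0.5 * 3.0 = 2.0, the target is 1.0 + 0.9 * 2.0 = 2.8, and the entry moves halfway toward it.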

By which law are Monte Carlo methods justified?

The law of large numbers: as the number of independent samples grows, their mean converges to the expected value.

What are the key components of Monte Carlo methods?

– Define a domain of possible inputs

– Generate inputs randomly from a probability distribution over the domain

– Perform a deterministic computation on the inputs

– Aggregate the results
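
The four components map onto the classic Monte Carlo estimate of π. This is a standard illustration, not taken from the card, and `estimate_pi` is an illustrative name.

```python
import random


def estimate_pi(samples=100_000, seed=0):
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        # 1. Domain: the unit square [0, 1] x [0, 1].
        # 2. Random inputs: points drawn uniformly from that domain.
        x, y = rng.random(), rng.random()
        # 3. Deterministic computation: is the point inside the quarter circle?
        if x * x + y * y <= 1.0:
            inside += 1
    # 4. Aggregate: the fraction of points inside approximates pi / 4.
    return 4.0 * inside / samples


pi_hat = estimate_pi()
```

By the law of large numbers the estimate improves as `samples` grows; its standard error shrinks roughly like 1/sqrt(samples).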

