Your peers in the course ADP & RL at the TU München create and share summaries, flashcards, study plans and other learning materials with the intelligent StudySmarter learning app.


ADP & RL

What are Stochastic Approximation algorithms?

Iterative methods for solving root-finding problems when only noisy observations of the function are available. The function whose root is sought is expressed as an expected value.
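As a minimal sketch of this idea, the following Robbins-Monro style iteration searches for the root of a function that can only be observed with noise. The function `noisy_g`, its root at 2, the Gaussian noise, and the step sizes 1/k are assumptions chosen for illustration; they are not taken from the course material.

```python
import random

def robbins_monro(noisy_g, theta0, n_steps=20000):
    """Iterate theta <- theta - alpha_k * noisy_g(theta) with alpha_k = 1/k."""
    theta = theta0
    for k in range(1, n_steps + 1):
        alpha = 1.0 / k  # step sizes whose sum diverges while the sum of squares converges
        theta -= alpha * noisy_g(theta)
    return theta

rng = random.Random(0)

def noisy_g(theta):
    # Hypothetical example: g(theta) = theta - 2, observed with zero-mean Gaussian noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)

root = robbins_monro(noisy_g, theta0=0.0)
print(root)  # close to the true root 2.0
```

The decreasing step sizes average out the noise over time, which is why the iterate settles near the root of the expected value rather than following individual noisy samples.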


Explain Monte Carlo policy iteration (PI).

In the policy evaluation step, estimate the value by the (iteratively computed) mean of sampled returns instead of computing the exact expectation

Perform policy improvement as usual

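The evaluation step can be sketched as first-visit Monte Carlo estimation with an incremental mean. The chain environment, its fixed policy (move right with probability 0.5, cost 1 per step), and the function names are hypothetical and serve only to make the example runnable.

```python
import random

def sample_episode(rng, start=0, goal=3):
    """Hypothetical chain under a fixed policy: move right w.p. 0.5, cost 1 per step."""
    state, trajectory = start, []
    while state != goal:
        trajectory.append((state, 1.0))  # (state, cost incurred in that step)
        if rng.random() < 0.5:
            state += 1
    return trajectory

def mc_policy_evaluation(n_episodes=20000, seed=0):
    rng = random.Random(seed)
    value, visits = {}, {}
    for _ in range(n_episodes):
        trajectory = sample_episode(rng)
        # Accumulate the undiscounted cost-to-go backwards through the episode.
        cost_to_go, returns = 0.0, []
        for state, cost in reversed(trajectory):
            cost_to_go += cost
            returns.append((state, cost_to_go))
        returns.reverse()
        seen = set()
        for state, g in returns:
            if state in seen:  # first-visit: use only the first occurrence per episode
                continue
            seen.add(state)
            visits[state] = visits.get(state, 0) + 1
            v = value.get(state, 0.0)
            value[state] = v + (g - v) / visits[state]  # incremental mean update
    return value

values = mc_policy_evaluation()
print(values)
```

In this toy chain each move to the right takes two steps on average, so the estimates should approach 6, 4 and 2 for states 0, 1 and 2.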


What is the motivation for Value Function Approximation?

Curse of dimensionality: There are too many states and actions to store in memory, and it would be too slow to learn the value for each state individually


What is the motivation for off-policy learning?

To learn about one policy (the target policy) from experience that was sampled by following a different one (the behavior policy)


Does TD learning work with both value iteration (VI) and policy iteration (PI)?

It works only with PI, where the expectation in the policy evaluation step is replaced by samples.

It does not work with VI, because the minimization over an expectation cannot be sampled.



What is the policy improvement theorem?

The policy improvement step returns a policy that is strictly better than the current one, unless the current policy is already optimal


Name two algorithms based on Monte Carlo Estimation.

LSTD (least-squares temporal difference) and LSPE (least-squares policy evaluation)


How does VI with Linear Value Function Approximation work?

It minimizes either the error between the estimated value function and the optimal one (direct approach) or the error in the Bellman optimality equation (indirect approach)


How do you estimate the target policy from the behavior policy?

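By importance sampling: samples generated under the behavior policy are reweighted by the ratio of the target policy's probability to the behavior policy's probability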
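A minimal sketch of ordinary importance sampling in a single-step setting follows. The two actions, their probabilities under each policy, and their costs are made-up values for illustration.

```python
import random

def importance_sampling_estimate(target, behavior, cost, n_samples=50000, seed=0):
    """Estimate the expected cost under `target` from actions sampled under `behavior`."""
    rng = random.Random(seed)
    actions = list(behavior)
    weights = [behavior[a] for a in actions]
    total = 0.0
    for _ in range(n_samples):
        a = rng.choices(actions, weights=weights)[0]  # act with the behavior policy
        rho = target[a] / behavior[a]                 # importance sampling ratio
        total += rho * cost[a]
    return total / n_samples

# Hypothetical policies and costs.
behavior = {"left": 0.5, "right": 0.5}
target = {"left": 0.9, "right": 0.1}
cost = {"left": 1.0, "right": 5.0}

estimate = importance_sampling_estimate(target, behavior, cost)
print(estimate)  # close to 0.9 * 1.0 + 0.1 * 5.0 = 1.4
```

The behavior policy must assign non-zero probability to every action the target policy can take, otherwise the ratio is undefined for those actions.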


What is expected SARSA?

A variant of SARSA that uses the expectation over the next action under the target policy instead of a single sampled next action
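One update step might look like the sketch below. The table layout (nested dictionaries), the state and action names, and the parameters `alpha` and `gamma` are assumptions for illustration; costs are used instead of rewards to match the minimization convention of these cards.

```python
def expected_sarsa_update(q, state, action, cost, next_state, policy,
                          alpha=0.1, gamma=0.9):
    """One Expected SARSA step on a tabular Q-function stored as nested dicts."""
    # Expectation of Q over the next action under the target policy.
    expected_next = sum(policy[next_state][a] * q[next_state][a]
                        for a in q[next_state])
    td_target = cost + gamma * expected_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Hypothetical two-state example.
q = {"s0": {"a": 0.0, "b": 0.0}, "s1": {"a": 1.0, "b": 3.0}}
policy = {"s1": {"a": 0.5, "b": 0.5}}

new_value = expected_sarsa_update(q, "s0", "a", cost=1.0,
                                  next_state="s1", policy=policy)
print(new_value)  # approximately 0.28
```

Here the expected next value is 0.5 * 1.0 + 0.5 * 3.0 = 2.0, the target is 1.0 + 0.9 * 2.0 = 2.8, and the updated entry moves a fraction 0.1 of the way from 0.0 toward that target.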


By which law are Monte Carlo methods justified?

The law of large numbers: the mean over a large number of independent samples converges to the expected value


What are the key components of Monte Carlo methods?

– Define a domain of possible inputs

– Generate inputs randomly from a probability distribution over the domain

– Perform a deterministic computation on the inputs

– Aggregate the results

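The four components can be illustrated with the classic Monte Carlo estimate of π, sketched below. The sample count and the seed are arbitrary choices.

```python
import random

def estimate_pi(n_samples=200000, seed=0):
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Domain: the unit square; inputs are drawn uniformly at random from it.
        x, y = rng.random(), rng.random()
        # Deterministic computation: does the point lie inside the quarter circle?
        if x * x + y * y <= 1.0:
            inside += 1
    # Aggregation: the fraction of hits approximates pi / 4.
    return 4.0 * inside / n_samples

pi_estimate = estimate_pi()
print(pi_estimate)  # close to 3.14
```

By the law of large numbers, the fraction of points inside the quarter circle approaches its probability π/4 as the number of samples grows.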
