ADP & RL at TU München

Flashcards and summaries for ADP & RL at the TU München

Example flashcards for ADP & RL at the TU München on StudySmarter:

ADP & RL

What is MDP and how is it defined?
A Markov decision process (MDP) is a tuple {S, A, p, g, T}: states S, actions A, transition probabilities p, reward function g, and finite horizon T.
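As a rough illustration, the tuple can be written down as a small data structure. This is only a sketch: the class name `FiniteHorizonMDP`, the `validate` helper, and all states, actions, and numbers below are made up for the example and do not come from the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiniteHorizonMDP:
    """Hypothetical container for the tuple {S, A, p, g, T}."""
    states: tuple   # S
    actions: tuple  # A
    p: dict         # p[(s, a)] maps next state -> transition probability
    g: dict         # g[(s, a)] is the stage reward for taking a in s
    T: int          # finite horizon (number of decision stages)

    def validate(self):
        """Check that every (state, action) pair has a proper distribution."""
        for s in self.states:
            for a in self.actions:
                dist = self.p[(s, a)]
                if not set(dist) <= set(self.states):
                    raise ValueError(f"unknown next state for {(s, a)}")
                if any(q < 0 for q in dist.values()):
                    raise ValueError(f"negative probability for {(s, a)}")
                if abs(sum(dist.values()) - 1.0) > 1e-9:
                    raise ValueError(f"probabilities for {(s, a)} do not sum to 1")
        return True


# Made-up two-state, two-action example.
mdp = FiniteHorizonMDP(
    states=("s0", "s1"),
    actions=("a0", "a1"),
    p={
        ("s0", "a0"): {"s0": 0.9, "s1": 0.1},
        ("s0", "a1"): {"s0": 0.2, "s1": 0.8},
        ("s1", "a0"): {"s0": 0.5, "s1": 0.5},
        ("s1", "a1"): {"s1": 1.0},
    },
    g={("s0", "a0"): 2.0, ("s0", "a1"): 0.5,
       ("s1", "a0"): 1.0, ("s1", "a1"): 3.0},
    T=3,
)
is_valid = mdp.validate()
```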

ADP & RL

What is the principle of optimality for finite horizon problems?
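A policy is optimal if and only if its tail policy is optimal for every tail problem, i.e. for every subproblem that starts at a later stage and runs to the end of the horizon.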
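The principle is what makes backward induction work: the optimal decisions for the last stages do not change when earlier stages are added in front. The sketch below assumes g is a stage cost to be minimized, a terminal value of zero, and no discounting; the two-state numbers and the function name `backward_induction` are invented for illustration.

```python
# Made-up MDP: P[s][a][t] is the probability of moving from s to t under a,
# g[s][a] is the stage cost (assumed to be minimized).
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.5, 0.5], [0.0, 1.0]],  # state 1: action 0, action 1
]
g = [[2.0, 0.5], [1.0, 3.0]]


def backward_induction(P, g, T):
    """Return the optimal cost-to-go at stage 0 and the policy per stage."""
    n = len(P)
    J = [0.0] * n  # terminal values J_T = 0 (assumption)
    policy = []
    for _ in range(T):
        # Q[s][a]: cost of taking a now and acting optimally afterwards.
        Q = [[g[s][a] + sum(P[s][a][t] * J[t] for t in range(n))
              for a in range(len(P[s]))] for s in range(n)]
        mu = [min(range(len(q)), key=q.__getitem__) for q in Q]
        J = [Q[s][mu[s]] for s in range(n)]
        policy.insert(0, mu)  # stages are filled in from last to first
    return J, policy


J1, policy1 = backward_induction(P, g, T=1)
J2, policy2 = backward_induction(P, g, T=2)

# The tail of the 2-stage solution is exactly the 1-stage solution.
tail_matches = policy2[1:] == policy1
```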

ADP & RL

How do we ensure the boundedness of the value function for infinite horizon problems?
Add a discount factor \gamma with 0 <= \gamma < 1 and require a bounded reward function, |g(..)| <= M. The geometric series then bounds the value function by M/(1-\gamma).
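Written out, the bound follows in one line. Here g_k is shorthand (an assumption of this sketch) for the reward collected at step k under some policy \mu, with its arguments left out as on the card:

```latex
\begin{align*}
|J_\mu(x_0)|
  = \left| \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k g_k \right] \right|
  \le \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k |g_k| \right]
  \le M \sum_{k=0}^{\infty} \gamma^k
  = \frac{M}{1-\gamma},
  \qquad 0 \le \gamma < 1 .
\end{align*}
```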

ADP & RL

What are the properties of the Bellman operator?
- Monotonicity
- Constant shift
- Contraction
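The three properties can be checked numerically on a small example. The sketch below assumes a cost-minimizing Bellman operator (TJ)(s) = min_a [ g(s,a) + \gamma \sum_t p(t|s,a) J(t) ] and the sup norm; the MDP numbers and the test vectors are made up. A numerical check like this illustrates the properties but does not prove them.

```python
gamma = 0.9
# Made-up MDP: P[s][a][t] transition probabilities, g[s][a] stage costs.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
g = [[2.0, 0.5], [1.0, 3.0]]


def bellman(J):
    """Apply the Bellman operator T once to the value vector J."""
    n = len(P)
    return [min(g[s][a] + gamma * sum(P[s][a][t] * J[t] for t in range(n))
                for a in range(len(P[s])))
            for s in range(n)]


def sup_norm(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


J_low = [0.0, 1.0]
J_high = [0.5, 4.0]  # J_low <= J_high componentwise
TJ_low, TJ_high = bellman(J_low), bellman(J_high)

# Monotonicity: J_low <= J_high implies T J_low <= T J_high.
monotone = all(a <= b for a, b in zip(TJ_low, TJ_high))

# Constant shift: T(J + c) = TJ + gamma * c for a constant c.
c = 3.0
T_shifted = bellman([j + c for j in J_low])
constant_shift = all(abs(s - (t + gamma * c)) < 1e-9
                     for s, t in zip(T_shifted, TJ_low))

# Contraction: ||T J_low - T J_high|| <= gamma * ||J_low - J_high||.
contraction = sup_norm(TJ_low, TJ_high) <= gamma * sup_norm(J_low, J_high) + 1e-12
```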

ADP & RL

When do VI and PI terminate?
VI usually requires an infinite number of iterations to converge exactly. PI terminates after a finite number of steps, because there is only a finite number of policies for a finite number of states and actions.
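The difference shows up even on a tiny problem. In the sketch below, value iteration is stopped by a tolerance (a practical stopping rule assumed here, since exact convergence would take infinitely many sweeps), while policy iteration stops as soon as the policy no longer changes. The MDP numbers, the cost-minimizing convention, and all function names are assumptions made for the example.

```python
gamma = 0.9
# Made-up MDP: P[s][a][t] transition probabilities, g[s][a] stage costs.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
g = [[2.0, 0.5], [1.0, 3.0]]
n = len(P)


def q_values(J, s):
    return [g[s][a] + gamma * sum(P[s][a][t] * J[t] for t in range(n))
            for a in range(len(P[s]))]


def value_iteration(tol=1e-8):
    """Repeat J <- TJ until successive iterates differ by less than tol."""
    J, iterations = [0.0] * n, 0
    while True:
        new_J = [min(q_values(J, s)) for s in range(n)]
        iterations += 1
        if max(abs(a - b) for a, b in zip(new_J, J)) < tol:
            return new_J, iterations
        J = new_J


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (M[i][m] - sum(M[i][j] * x[j] for j in range(i + 1, m))) / M[i][i]
    return x


def evaluate(mu):
    """Exact policy evaluation: solve (I - gamma * P_mu) J = g_mu."""
    A = [[(1.0 if s == t else 0.0) - gamma * P[s][mu[s]][t] for t in range(n)]
         for s in range(n)]
    return solve(A, [g[s][mu[s]] for s in range(n)])


def policy_iteration():
    mu, steps = [0] * n, 0
    while True:
        J = evaluate(mu)
        steps += 1
        new_mu = []
        for s in range(n):
            q = q_values(J, s)
            best = min(range(len(q)), key=q.__getitem__)
            # Switch only on a strict improvement, so ties cannot cause cycling.
            new_mu.append(best if q[best] < q[mu[s]] - 1e-12 else mu[s])
        if new_mu == mu:
            return mu, J, steps
        mu = new_mu


J_vi, vi_iterations = value_iteration()
mu_pi, J_pi, pi_steps = policy_iteration()
```

With two states and two actions there are only four stationary policies, which is the reason policy iteration cannot run for long here.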

ADP & RL

What is the optimality condition?
A stationary policy is optimal if and only if, for every state, it attains the minimum in Bellman's equation.

ADP & RL

What are characteristics of contraction mappings?
- They have a unique fixed point J* that satisfies J* = TJ*
- T^k J converges to J* for k -> inf, for any starting J

ADP & RL

What are the characteristics of the monotonicity property?
It implies the optimality of J*: J* = min_\mu J_\mu, i.e. J* <= J_\mu for every policy \mu.

ADP & RL

Why is the constant shift property important?
The contraction property of the Bellman operator follows from monotonicity combined with the constant shift property. The constant shift property is also relevant for error bounds.

ADP & RL

How does optimistic PI differ from regular PI?
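Only the policy evaluation step is different: the value function of the current policy is computed approximately, by applying the policy's Bellman operator T_\mu a finite number of times k (T_\mu^k) instead of solving for J_\mu exactly. The policy improvement step stays the same. It often converges to the optimal policy much faster.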
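A minimal sketch of the idea, assuming a cost-minimizing MDP with made-up numbers: each outer iteration picks the greedy policy for the current J (the unchanged improvement step) and then applies that policy's operator only m times instead of solving for its value exactly. The names `greedy`, `apply_T_mu`, and the parameters `m` and `outer_iterations` are illustrative choices, not part of the course material.

```python
gamma = 0.9
# Made-up MDP: P[s][a][t] transition probabilities, g[s][a] stage costs.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
g = [[2.0, 0.5], [1.0, 3.0]]
n = len(P)


def greedy(J):
    """Policy improvement: pick the action attaining the minimum for each state."""
    mu = []
    for s in range(n):
        q = [g[s][a] + gamma * sum(P[s][a][t] * J[t] for t in range(n))
             for a in range(len(P[s]))]
        mu.append(min(range(len(q)), key=q.__getitem__))
    return mu


def apply_T_mu(J, mu):
    """One application of the Bellman operator of the fixed policy mu."""
    return [g[s][mu[s]] + gamma * sum(P[s][mu[s]][t] * J[t] for t in range(n))
            for s in range(n)]


def optimistic_policy_iteration(m=5, outer_iterations=100):
    J = [0.0] * n
    mu = greedy(J)
    for _ in range(outer_iterations):
        mu = greedy(J)          # improvement step, same as in regular PI
        for _ in range(m):      # approximate evaluation: only m sweeps
            J = apply_T_mu(J, mu)
    return mu, J


mu_opt, J_opt = optimistic_policy_iteration()
```

Setting m = 1 reduces this scheme to value iteration, and letting m grow without bound recovers regular PI with exact evaluation.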

ADP & RL

What is one issue of simulation-based PI? And how do you solve it?
Inadequate exploration: generating cost samples using the policy might bias the simulations and underrepresent some states. Two possibilities:
- Break down the simulation into multiple short trajectories to have different initial states
- Artificially induce extra randomization
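The first remedy can be illustrated with a small simulation. The sketch below uses a made-up three-state chain that a fixed policy induces (so only `P_mu` and `g_mu` appear), in which state 2 can never be reached from states 0 or 1. A single long trajectory from state 0 therefore yields no cost samples for state 2, whereas many short trajectories with different initial states do. All numbers, the seed, and the function names are assumptions for the example.

```python
import random

gamma = 0.9
# Made-up chain induced by a fixed policy: P_mu[s][t] and stage costs g_mu[s].
P_mu = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],  # state 2 is never entered from states 0 or 1
]
g_mu = [1.0, 2.0, 5.0]


def rollout(start, length, rng):
    """Simulate `length` steps; return visited states and the discounted cost."""
    s, cost, discount, visited = start, 0.0, 1.0, []
    for _ in range(length):
        visited.append(s)
        cost += discount * g_mu[s]
        discount *= gamma
        s = rng.choices(range(len(P_mu)), weights=P_mu[s])[0]
    return visited, cost


rng = random.Random(0)  # fixed seed so the run is reproducible

# One long trajectory from state 0: state 2 is never sampled.
long_visited, _ = rollout(start=0, length=1000, rng=rng)
states_seen_single_run = set(long_visited)


def estimate_with_restarts(num_per_state=500, length=50):
    """Average truncated discounted costs over short runs from every state."""
    estimates = []
    for start in range(len(P_mu)):
        costs = [rollout(start, length, rng)[1] for _ in range(num_per_state)]
        estimates.append(sum(costs) / len(costs))
    return estimates


estimates = estimate_with_restarts()
```

Truncating each run at 50 steps introduces a small bias of order \gamma^50, which is the price paid for the short trajectories.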

ADP & RL

What are the advantages of Dynamic Programming (as opposed to optimization algorithms)?
DP divides a problem into subproblems and solves each one separately.
