Reinforcement Learning at University of Zurich | Flashcards & Summaries

# Study Materials for Reinforcement Learning at the University of Zurich

Access free flashcards, summaries, practice exercises, and past exams for your Reinforcement Learning course at the University of Zurich.

• 26,911 flashcards
• 534 students
• 19 study materials

## Example flashcards for your Reinforcement Learning course at the University of Zurich, created by fellow students on StudySmarter

Q:

value function

A:
• A value function estimates how good it is, in terms of expected return, to be in a given state or to perform a given action in a given state.
• Value functions are defined with respect to the policy the agent follows: the same state can have different values under different policies.
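As a sketch of the two kinds of value function described above, the state-value and action-value functions of a policy π can be written as follows. The symbols v_π, q_π, S_t, A_t and G_t (the return from time t onward) follow common textbook notation and are an assumption here, since the card itself gives no formula.

```latex
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right],
\qquad
q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s,\ A_t = a \right]
```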
Q:

Agent

A:
• Learner and decision maker (excluding sensations and any “internal states”)
• The agent’s goal is to maximize the total reward it receives from the environment.
• Selects an action, deterministically or stochastically, given the current state of its environment.
Q:

Environment

A:
• What the agent interacts with
• Includes everything “outside” the agent (including sensations and any “internal states”)
• The information about the environment accessible to the agent at time t is encoded in a state variable S_t.
• At each time step, the environment responds to the agent’s action by providing the agent with a reward.
Q:

Reward

A:
• Real-valued number; negative rewards are interpreted as punishments.
• For a given action a applied in a given state s, the reward can be deterministic or stochastic.
• A reward is a real-valued signal R_t ∈ ℛ ⊂ ℝ passed from the environment to the agent at each time step.
• The agent’s goal is to maximize (a monotonically increasing function of) the total reward it receives, not the immediate reward.
Q:

Task

A:
• A complete specification of an environment
• An instance of the RL problem
Q:

RL methods aim

A:

RL methods aim to maximize the expected return.
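To make "return" concrete, here is a minimal sketch that sums a finite sequence of rewards. The discount factor `gamma` and the function name `discounted_return` are assumptions for illustration; the card itself does not say whether the return is discounted, and `gamma=1.0` gives the plain sum of rewards.

```python
def discounted_return(rewards, gamma=1.0):
    """Compute G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    `rewards` is the finite list [R_{t+1}, R_{t+2}, ...];
    `gamma` is an assumed discount factor in [0, 1].
    """
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 0.0, -2.0, 5.0]  # made-up rewards for one short episode
print(discounted_return(rewards))             # undiscounted total reward
print(discounted_return(rewards, gamma=0.5))  # later rewards count for less
```

The expected return that RL methods maximize is the average of this quantity over the randomness in the policy and the environment.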

Q:

Markov property

A:

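A state signal has the Markov property if it encodes all relevant information from past interactions with the environment (including past states, actions and received rewards), so that the next state and reward depend only on the current state and action.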
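One common way to state this formally is sketched below, using textbook-style notation that is an assumption here (S_t for states, A_t for actions, R_t for rewards): conditioning on the full history gives the same distribution as conditioning on the current state and action alone.

```latex
\Pr\{ S_{t+1} = s',\ R_{t+1} = r \mid S_t, A_t \}
  = \Pr\{ S_{t+1} = s',\ R_{t+1} = r \mid S_0, A_0, R_1, \dots, S_{t-1}, A_{t-1}, R_t, S_t, A_t \}
```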

Q:

Markov Decision Process

A:

A reinforcement learning task that satisfies the Markov property

Q:

The Bellman Equations

A:

Express the recursive property of value functions: the value of a state is related to the values of its successor states.
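The recursion can be illustrated by repeatedly applying a Bellman expectation backup, v(s) ← R(s) + γ · Σ P(s, s') · v(s'), until the values stop changing. The sketch below is only an illustration: the two-state transition matrix `P`, the rewards `R`, the discount factor `gamma`, and the function name `evaluate_policy` are all hypothetical and assume a fixed policy has already been folded into `P` and `R`.

```python
# Hypothetical 2-state problem under a fixed policy:
# P[s][s2] is the probability of moving from state s to state s2,
# R[s] is the expected immediate reward received when leaving state s.
P = [[0.5, 0.5],
     [0.0, 1.0]]   # state 1 is absorbing
R = [1.0, 0.0]
gamma = 0.9        # assumed discount factor


def evaluate_policy(P, R, gamma, tol=1e-10):
    """Iterate the Bellman expectation backup until the largest change is below tol."""
    n = len(R)
    v = [0.0] * n
    while True:
        new_v = [R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


v = evaluate_policy(P, R, gamma)
print(v)
```

For this toy problem the fixed point can be checked by hand: v(1) = 0 and v(0) = 1 + 0.9 · 0.5 · v(0), so v(0) = 1 / 0.55.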

Q:

Bandit problems

A:

A special case of the reinforcement learning problem in which there is only a single state
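Because a bandit has only one state, the agent only needs one value estimate per action. A minimal sketch of how such an estimate might be maintained is the incremental sample average below; the function name `update_estimate` and the reward values are made up for illustration and are not taken from the course material.

```python
def update_estimate(q, n, reward):
    """Incremental sample average.

    q is the estimate after n - 1 rewards; the return value is the estimate
    after including the n-th reward, i.e. q + (reward - q) / n.
    """
    return q + (reward - q) / n


q, n = 0.0, 0
for reward in [1.0, 0.0, 2.0, 1.0]:  # made-up rewards observed for one action
    n += 1
    q = update_estimate(q, n, reward)

print(q)  # equals the plain mean of the rewards seen so far
```

The update avoids storing past rewards: only the current estimate and the count are kept.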

Q:

A:

Whether it is better to explore or exploit depends on:

• Values and uncertainty of the estimates
• Number of remaining steps.
Q:

Balancing exploitation and exploration

A:
• Optimistic initial values method
• Upper-confidence-bound action selection method
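The two methods named above can be sketched for a bandit as follows. This is a hedged illustration, not the course's own formulation: the function name `ucb_action`, the exploration constant `c`, and the optimistic starting value 5.0 are assumptions chosen for the example.

```python
import math


def ucb_action(q, counts, t, c=2.0):
    """Upper-confidence-bound selection.

    Picks the action maximizing q[a] + c * sqrt(ln(t) / counts[a]), where q[a]
    is the current value estimate, counts[a] is how often action a was taken,
    and t is the current time step. Actions never tried are selected first.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q[a] + c * math.sqrt(math.log(t) / counts[a])
              for a in range(len(q))]
    return max(range(len(q)), key=scores.__getitem__)


# Optimistic initial values: start every estimate well above any plausible
# reward, so that even a purely greedy agent is pushed to try each action.
q_optimistic = [5.0, 5.0, 5.0]  # 5.0 is an arbitrary illustrative value

# A rarely tried action gets a large uncertainty bonus and is chosen
# even though its current estimate is lower.
print(ucb_action([0.5, 0.4], [100, 1], t=101))
```

Both methods encourage exploration without picking actions uniformly at random: UCB adds a bonus that shrinks as an action is tried more often, while optimistic initialization relies on early disappointment to drive the agent toward untried actions.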
