# Study Materials for Stochastic Optimization Learning at the Universität Graz

Access free flashcards, summaries, practice problems, and past exams for your Stochastic Optimization Learning course at the Universität Graz.

Characteristics of ADP (approximate dynamic programming)
Replace true function with statistical approximation
Move forward in time
Supervised Learning, Unsupervised Learning, Reinforcement Learning
SL: training on correctly labeled data
UL: data-driven (clustering)
no solution provided, algorithm finds patterns
RL: decision process, algorithm learns to take actions to maximize reward
Problems involving many states and actions
For small number of states and actions: lookup table
But not realistic for many states and actions: use function approximations instead
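
The contrast between a lookup table and a function approximation can be sketched as follows. This is only an illustration: the sizes, the feature vector and the parameter values are hypothetical and not taken from the course.

```python
import numpy as np

# Lookup table: one stored value per (state, action) pair.
# Memory grows with n_states * n_actions, which is why it only
# works for small problems.
n_states, n_actions = 5, 2
q_table = np.zeros((n_states, n_actions))
q_table[3, 1] = 1.5  # updating one entry tells us nothing about the others


def features(state, action):
    """Hypothetical features: a bias term, the state, and the action."""
    return np.array([1.0, float(state), float(action)])


# Function approximation: all values come from a short parameter vector,
# so the memory cost does not depend on the number of states.
theta = np.array([0.5, 0.1, -0.2])  # assumed parameter values


def q_approx(state, action):
    """Linear approximation of the value of taking `action` in `state`."""
    return float(features(state, action) @ theta)
```

With the table, 10 numbers are stored for 5 states and 2 actions; with the linear model, 3 parameters cover every pair, including ones never visited.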
Learning dimensions
- Model free or model based (model of rewards or transition properties)
- real world or simulator
- active or passive learning (policy given?)
- on policy or off policy

Optimal Learning
Exploration vs Exploitation
Best long term strategy may involve sacrifice
Optimal: policy with least number of measurements or lowest sacrifice

Elements of a learning problem
1. how to make measurement?
2. effect of measurement?
3. evaluate result of measurement?
4. offline or online learning?
Nature of measurement decision
- 0/1: stopping problems
- Z: discrete set of alternatives (ranking and selection)
- R: continuous set (temperature, speed)
- 0/1/0/0/1/0/1: subset selection
Effect of measurement
- Frequentist point of view
- Bayesian point of view
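
The two points of view differ in how a measurement changes the estimate. A minimal sketch, assuming a normally distributed prior belief and measurement noise with known precision (precision = 1 / variance); the function names are hypothetical:

```python
def frequentist_update(mean, n, w):
    """Running sample mean after observing w as the (n + 1)-th sample.

    No prior is used: the estimate depends only on the data seen so far.
    """
    n_new = n + 1
    return mean + (w - mean) / n_new, n_new


def bayesian_update(mu, beta, w, beta_w):
    """Normal-normal update of a belief with mean mu and precision beta.

    w is the observation and beta_w its (assumed known) precision.
    The posterior mean is a precision-weighted average of prior and data.
    """
    beta_new = beta + beta_w
    mu_new = (beta * mu + beta_w * w) / beta_new
    return mu_new, beta_new
```

With a prior mean of 0 and equal precisions, observing 4 moves the Bayesian estimate to 2, while the frequentist estimate after one sample is the sample itself.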
Policies
- Deterministic
- Sequential optimal: dynamic programming
- Sequential: next measurement depends on knowledge state
  - exploration
  - exploitation
  - epsilon-greedy
  - interval estimation
  - Boltzmann exploration
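
Three of the sequential policies above can be sketched in a few lines. This is an illustration only: `mu` stands for the current estimated means, `sigma` for their standard deviations, and the parameter names `epsilon`, `z` and `temperature` are assumptions, not the course's notation.

```python
import numpy as np


def epsilon_greedy(mu, epsilon, rng):
    """Explore a random alternative with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(mu)))
    return int(np.argmax(mu))


def interval_estimation(mu, sigma, z=1.96):
    """Pick the alternative with the highest upper confidence limit."""
    return int(np.argmax(np.asarray(mu) + z * np.asarray(sigma)))


def boltzmann_probs(mu, temperature=1.0):
    """Selection probabilities proportional to exp(mu / temperature)."""
    scaled = np.asarray(mu, dtype=float) / temperature
    scaled -= scaled.max()  # shift for numerical stability
    weights = np.exp(scaled)
    return weights / weights.sum()


def boltzmann(mu, temperature, rng):
    """Sample an alternative from the Boltzmann distribution."""
    return int(rng.choice(len(mu), p=boltzmann_probs(mu, temperature)))
```

Pure exploitation is `epsilon_greedy` with `epsilon = 0`, and pure exploration is `epsilon = 1`. Interval estimation can prefer an alternative with a lower mean if its uncertainty is large enough.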
Knowledge gradient (KG): choose the measurement that would improve the best mean the most
Properties of KG
- optimal decision with one measurement remaining
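
A minimal sketch of the knowledge gradient for one measurement, assuming independent normal beliefs about each alternative and normally distributed measurement noise with known variance. The function and argument names are hypothetical; it needs at least two alternatives and strictly positive belief variances.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(mu, sigma2, noise_var):
    """Expected improvement of the best mean from measuring each alternative.

    mu: current belief means, sigma2: current belief variances,
    noise_var: variance of a single measurement (assumed known).
    """
    kg = []
    for x in range(len(mu)):
        # Standard deviation of the change in the mean of x after measuring it.
        sigma_tilde = sigma2[x] / math.sqrt(sigma2[x] + noise_var)
        # Best competing alternative.
        best_other = max(m for i, m in enumerate(mu) if i != x)
        zeta = -abs(mu[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return kg


def kg_policy(mu, sigma2, noise_var):
    """Measure the alternative with the largest knowledge gradient."""
    kg = knowledge_gradient(mu, sigma2, noise_var)
    return max(range(len(kg)), key=kg.__getitem__)
```

With equal means, the policy measures the alternative whose belief is more uncertain, since that measurement is more likely to change which alternative looks best.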

Exploration vs Exploitation
Value function approximation
Updating V_t^n
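
The notes do not spell out the update rule. A common form in approximate dynamic programming is to smooth the old estimate with a newly sampled value using a stepsize; the sketch below assumes that form and a lookup-table approximation. The names `v_bar`, `v_hat` and `alpha` are illustrative assumptions.

```python
def update_value(v_bar, state, v_hat, alpha):
    """Smooth the stored value of `state` toward the sampled value v_hat.

    v_bar: dict mapping states to current value estimates
    v_hat: new sampled estimate of the value of being in `state`
    alpha: stepsize in (0, 1]; alpha = 1 overwrites the old estimate
    """
    old = v_bar.get(state, 0.0)  # unseen states start at 0 (assumption)
    v_bar[state] = (1.0 - alpha) * old + alpha * v_hat
    return v_bar[state]
```

For example, starting from an empty table with `alpha = 0.5`, two observations of 10 for the same state give estimates of 5 and then 7.5.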
