Introduction To Reinforcement Learning Chapter 2 Solutions and Notes

Chapter 2

Week 1 Notes

A $k$-Armed Bandit Problem

Action-Value Methods

The 10-Armed Testbed

t    A    R
1    1   -1
2    2    1
3    2   -2
4    2    2
5    3    0
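
The table can be turned into running action-value estimates with a short script. What follows is a minimal sketch, assuming the columns are time step t, chosen action A, and received reward R, and assuming the estimate for an action is the plain sample average of the rewards seen for it so far. The names `history` and `sample_average_estimates` are illustrative and do not come from the notes.

```python
from collections import defaultdict

# (t, action, reward) rows copied from the table above.
# Assumption: the columns mean time step, action taken, reward received.
history = [(1, 1, -1), (2, 2, 1), (3, 2, -2), (4, 2, 2), (5, 3, 0)]


def sample_average_estimates(rows):
    """Return one (t, {action: estimate}) snapshot per time step.

    Each estimate is the mean of the rewards observed so far for that
    action. Actions that have not been tried yet are left out of the
    snapshot rather than given a default value.
    """
    totals = defaultdict(float)  # sum of rewards per action
    counts = defaultdict(int)    # number of times each action was taken
    snapshots = []
    for t, action, reward in rows:
        totals[action] += reward
        counts[action] += 1
        estimates = {a: totals[a] / counts[a] for a in counts}
        snapshots.append((t, estimates))
    return snapshots


for t, estimates in sample_average_estimates(history):
    print(t, estimates)
```

After the fifth step, the estimate for action 2 is (1 - 2 + 2) / 3 = 1/3, while actions 1 and 3 keep the single rewards they received, -1 and 0.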

Incremental Implementation

Tracking a Nonstationary Problem

Optimistic Initial Values

Upper-Confidence-Bound Action Selection