Introduction To Reinforcement Learning Chapter 3 Solutions and Notes

 

Chapter 3

Finite Markov Decision Processes

The Agent-Environment Interface

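| $s$ | $a$ | $s'$ | $p(s' \mid s, a)$ | $r(s, a, s')$ |
|---|---|---|---|---|
| high | search | high | $\alpha$ | $r_{search}$ |
| high | search | low | $1 - \alpha$ | $r_{search}$ |
| low | search | high | $1 - \beta$ | $-3$ |
| low | search | low | $\beta$ | $r_{search}$ |
| high | wait | high | $1$ | $r_{wait}$ |
| high | wait | low | $0$ | - |
| low | wait | high | $0$ | - |
| low | wait | low | $1$ | $r_{wait}$ |
| low | recharge | high | $1$ | $0$ |
| low | recharge | low | $0$ | - |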
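
As a rough sketch of how the table above can be used, the snippet below encodes its non-zero rows and computes the expected reward $r(s, a) = \sum_{s'} p(s' \mid s, a)\, r(s, a, s')$ for each state-action pair. The numeric values chosen for $\alpha$, $\beta$, $r_{search}$ and $r_{wait}$ are placeholders assumed purely for illustration, and the names `TRANSITIONS` and `expected_reward` are hypothetical.

```python
# Placeholder parameter values (assumptions for illustration only).
ALPHA, BETA = 0.9, 0.6
R_SEARCH, R_WAIT = 2.0, 1.0

# (s, a) -> list of (s', p(s'|s,a), r(s,a,s')).
# Rows with probability 0 are omitted because they contribute nothing.
TRANSITIONS = {
    ("high", "search"): [("high", ALPHA, R_SEARCH), ("low", 1 - ALPHA, R_SEARCH)],
    ("low", "search"): [("high", 1 - BETA, -3.0), ("low", BETA, R_SEARCH)],
    ("high", "wait"): [("high", 1.0, R_WAIT)],
    ("low", "wait"): [("low", 1.0, R_WAIT)],
    ("low", "recharge"): [("high", 1.0, 0.0)],
}


def expected_reward(s, a):
    """Return r(s, a), the reward averaged over next states s'."""
    return sum(p * r for _, p, r in TRANSITIONS[(s, a)])


if __name__ == "__main__":
    for (s, a), rows in TRANSITIONS.items():
        # Each (s, a) pair must define a probability distribution over s'.
        assert abs(sum(p for _, p, _ in rows) - 1.0) < 1e-12
        print(s, a, expected_reward(s, a))
```

With these placeholder values, searching from the low state has an expected reward of roughly zero, because the $-3$ rescue penalty offsets the search reward.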
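| $s$ | $a$ | $s'$ | $r$ | $p(s', r \mid s, a)$ |
|---|---|---|---|---|
| high | search | high | $r_{search}$ | $\alpha$ |
| high | search | low | $r_{search}$ | $1 - \alpha$ |
| low | search | high | $-3$ | $1 - \beta$ |
| low | search | low | $r_{search}$ | $\beta$ |
| high | wait | high | $r_{wait}$ | $1$ |
| high | wait | low | - | $0$ |
| low | wait | high | - | $0$ |
| low | wait | low | $r_{wait}$ | $1$ |
| low | recharge | high | $0$ | $1$ |
| low | recharge | low | - | $0$ |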
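
One way to sanity-check the four-argument dynamics above is to store $p(s', r \mid s, a)$ as a mapping, confirm that it sums to 1 for every state-action pair, and recover the state-transition probabilities by summing over rewards, $p(s' \mid s, a) = \sum_r p(s', r \mid s, a)$. This is a minimal sketch: the parameter values are assumed placeholders, and `DYNAMICS`, `state_transition_probs` and `is_normalized` are hypothetical names.

```python
from collections import defaultdict

# Placeholder parameter values (assumptions for illustration only).
ALPHA, BETA = 0.9, 0.6
R_SEARCH, R_WAIT = 2.0, 1.0

# (s, a) -> {(s', r): p(s', r | s, a)}; zero-probability rows are omitted.
DYNAMICS = {
    ("high", "search"): {("high", R_SEARCH): ALPHA, ("low", R_SEARCH): 1 - ALPHA},
    ("low", "search"): {("high", -3.0): 1 - BETA, ("low", R_SEARCH): BETA},
    ("high", "wait"): {("high", R_WAIT): 1.0},
    ("low", "wait"): {("low", R_WAIT): 1.0},
    ("low", "recharge"): {("high", 0.0): 1.0},
}


def state_transition_probs(s, a):
    """Marginalize the reward out of p(s', r | s, a) to get p(s' | s, a)."""
    probs = defaultdict(float)
    for (s_next, _reward), p in DYNAMICS[(s, a)].items():
        probs[s_next] += p
    return dict(probs)


def is_normalized(tol=1e-12):
    """Check that p(., . | s, a) sums to 1 for every (s, a) pair."""
    return all(abs(sum(d.values()) - 1.0) <= tol for d in DYNAMICS.values())


if __name__ == "__main__":
    print(is_normalized())
    print(state_transition_probs("low", "search"))
```

In this example each next state is paired with exactly one reward, so the marginalization returns the same probabilities as the joint entries; the summation only matters when one transition can yield several different rewards.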

Goals and Rewards

Returns and Episodes