Value Iteration Algorithm Detail
05 Dec 2017
Value Iteration Algorithm/Flow
Let me express the value iteration flow in this section.
Initialize \(V_0(S)=0\) for all S.
Loop until \(\left\|V_t-V_{t-1}\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\):
Loop for all S:
\[V_{t+1}(S)\leftarrow R(S)+\underset A{max}\left[\gamma\cdot\sum_{S'}P(S'\left|S,A\right.)\cdot V_{t}(S')\right]\]Such a flow comes with the features below:
➀\(V_t\) converges to some optimal value, say \(V^\ast\).
➁there is no need to keep both \(V_{t+1}\) and \(V_t\), since the iteration just converges.
➂it is asynchronous (states can be updated in random order).
➃we want \(\left\|V_t-V^\ast\right\|=\max_S\left|V_t(S)-V^\ast(S)\right|<\varepsilon\),
then the stopping threshold on successive iterates is \(\varepsilon\cdot\frac{1-\gamma}\gamma\). I'd like to show you the proof of ➃:
Begin from \(\left\|V_t-V_{t+1}\right\|=\gamma\cdot\left\|V_{t-1}-V_t\right\|\)
…this holds with respect to the above value iteration (each sweep contracts the gap by a factor of \(\gamma\)). Next, focus on \(\varepsilon\cdot(1-\gamma)\), where \(V_t=\gamma\cdot V_{t-1}\)
…the infinite-horizon expected discounted reward; therefore \(V_{t-1}-V_t=V_{t-1}\cdot(1-\gamma)\).
∴Next, express the unknown bound in terms of \(\varepsilon\) and \(1-\gamma\); this should hold, where \(\varepsilon\) is a small coefficient chosen by design.
Next, take \(unknown=\varepsilon\cdot(1-\gamma)\), because \(V_{t-1}-V_t\) can be expressed with a \((1-\gamma)\) term in it. Then, the whole inequality becomes:
\begin{array}{l}\gamma\cdot V_{t-1}\cdot(1-\gamma)<unknown\\\Rightarrow\gamma\cdot V_{t-1}\cdot(1-\gamma)<\varepsilon\cdot(1-\gamma)\\\Rightarrow\gamma\cdot\left\|V_{t-1}-V_t\right\|<\varepsilon\cdot(1-\gamma)\\\Rightarrow\left\|V_{t-1}-V_t\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\\\Rightarrow\left\|V_{t+1}-V_t\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\end{array}
As \(t\) increases, we will have \(V_{t-1}\approx V_t\approx V_{t+1}\).
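The flow above can be sketched in code as below. This is a minimal sketch, not a definitive implementation: the 2-state, 2-action MDP (`R`, `P`) is a made-up toy example, and `value_iteration` is a hypothetical helper name. The loop applies the update \(V_{t+1}(S)\leftarrow R(S)+\max_A\left[\gamma\cdot\sum_{S'}P(S'|S,A)\cdot V_t(S')\right]\) and stops when \(\left\|V_{t+1}-V_t\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\).

```python
import numpy as np

def value_iteration(R, P, gamma=0.9, eps=1e-6):
    """R: reward per state, shape (S,); P: transitions P(S'|S,A), shape (A, S, S)."""
    n_states = R.shape[0]
    V = np.zeros(n_states)                  # initialize V_0(S) = 0 for all S
    threshold = eps * (1 - gamma) / gamma   # the stopping bound derived above
    while True:
        # V_{t+1}(S) = R(S) + max_A [ gamma * sum_{S'} P(S'|S,A) * V_t(S') ]
        # (P @ V) has shape (A, S); the max over axis 0 picks the best action per state
        V_next = R + gamma * np.max(P @ V, axis=0)
        if np.max(np.abs(V_next - V)) < threshold:
            return V_next                   # V_{t+1} ~ V_t ~ V*, per the proof above
        V = V_next

# hypothetical toy MDP: 2 states, 2 actions
R = np.array([0.0, 1.0])
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # action 0: rows are P(.|S=0), P(.|S=1)
              [[0.5, 0.5], [0.3, 0.7]]])   # action 1
V_star = value_iteration(R, P)
```

Because each sweep is a \(\gamma\)-contraction, the loop is guaranteed to terminate, and the returned vector is (approximately) a fixed point of the update.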