Value Iteration Algorithm Detail
05 Dec 2017
Value Iteration Algorithm/Flow
Let me express the value iteration flow in this section.
Initialize $V_0(S)=0$ for all $S$.
Loop until $\left\|V_t-V_{t-1}\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma$:
Loop for all $S$:
\[V_{t+1}(S)\leftarrow R(S)+\underset A{max}\left[\gamma\cdot\sum_{S'}P(S'\left|S,A\right.)\cdot V_{t}(S')\right]\]
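To make the flow concrete, below is a minimal Python sketch of the loop above. It is an illustration under assumptions, not code from the original post: the MDP (`R`, `P`), the function name `value_iteration`, and all parameter values are hypothetical.

```python
import numpy as np

def value_iteration(R, P, gamma=0.9, eps=1e-6):
    """Minimal value iteration sketch (hypothetical API).

    R: reward vector of shape (n_states,), i.e. R(S)
    P: transition tensor of shape (n_actions, n_states, n_states),
       with P[A, S, S2] = P(S'=S2 | S, A)
    Stops when max_S |V_t(S) - V_{t-1}(S)| < eps * (1 - gamma) / gamma.
    """
    V = np.zeros(R.shape[0])                  # V_0(S) = 0 for all S
    threshold = eps * (1 - gamma) / gamma
    while True:
        # Bellman backup: V_{t+1}(S) = R(S) + max_A [gamma * sum_S' P(S'|S,A) * V_t(S')]
        V_next = R + gamma * np.max(P @ V, axis=0)
        if np.max(np.abs(V_next - V)) < threshold:
            return V_next
        V = V_next

# A made-up 2-state, 2-action MDP, purely for illustration.
R = np.array([0.0, 1.0])
P = np.array([[[0.9, 0.1],   # action 0: rows are current states S
               [0.2, 0.8]],
              [[0.5, 0.5],   # action 1
               [0.1, 0.9]]])
print(value_iteration(R, P))
```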
Such a flow comes with the following features:
➀ $V_{t}$ converges to the optimal value, denoted $V^{*}$.
➁ no need to keep separate copies of $V_{t+1}$ and $V_{t}$; the iteration converges either way, so updates can be done in place.
➂ asynchronous: states can be updated in random order (see the sketch after this list).
➃ we want $\left\|V_t-V^\ast\right\|=\underset S{max}\left|V_t(S)-V^\ast(S)\right|<\varepsilon$;
the stopping threshold is therefore set below $\varepsilon\cdot\frac{1-\gamma}\gamma$.
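Regarding features ➁ and ➂, here is a minimal asynchronous sketch under the same hypothetical setup as above: a single `V` array is kept and a randomly chosen state is backed up in place on each step. The fixed `n_updates` budget is an assumption for brevity; a real stopping test would track the update magnitudes.

```python
import numpy as np

def async_value_iteration(R, P, gamma=0.9, n_updates=20000, seed=0):
    """Asynchronous variant: random in-place single-state backups."""
    rng = np.random.default_rng(seed)
    V = np.zeros(R.shape[0])                 # V_0(S) = 0 for all S
    for _ in range(n_updates):
        s = rng.integers(R.shape[0])         # pick a random state S
        # in-place Bellman backup for state s only (no V_{t+1} copy kept)
        V[s] = R[s] + gamma * np.max(P[:, s, :] @ V)
    return V
```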
I'd like to show the proof of ➃. Begin from $\left\|V_t-V_{t+1}\right\|=\gamma\cdot\left\|V_{t-1}-V_t\right\|$
…this holds with respect to the above value iteration, since each Bellman backup shrinks the distance between successive value functions by a factor of $\gamma$ (a short derivation follows).
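For completeness, here is a sketch of why this holds (my own addition, using the standard max-norm contraction argument; strictly speaking the contraction gives $\leq$ rather than $=$, which is enough for the stopping rule). The $R(S)$ terms cancel between the two backups, leaving:
\[\begin{array}{l}\left\|V_t-V_{t+1}\right\|\\=\underset S{max}\left|\underset A{max}\left[\gamma\cdot\sum_{S'}P(S'\left|S,A\right.)\cdot V_{t-1}(S')\right]-\underset A{max}\left[\gamma\cdot\sum_{S'}P(S'\left|S,A\right.)\cdot V_t(S')\right]\right|\\\leq\gamma\cdot\underset S{max}\,\underset A{max}\sum_{S'}P(S'\left|S,A\right.)\cdot\left|V_{t-1}(S')-V_t(S')\right|\\\leq\gamma\cdot\underset{S'}{max}\left|V_{t-1}(S')-V_t(S')\right|=\gamma\cdot\left\|V_{t-1}-V_t\right\|\end{array}\]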
Next, focus on $\varepsilon\cdot(1-\gamma)$: take $V_t=\gamma\cdot V_{t-1}$
…since $V$ is the infinite-horizon expected discounted reward, each further step discounts it by $\gamma$; therefore $V_{t-1}-V_t=V_{t-1}\cdot(1-\gamma)$.
\(\begin{array}{l}\therefore\left\|V_t-V_{t+1}\right\|\\=\gamma\cdot\left\|V_{t-1}-V_t\right\|<unknown\\=\gamma\cdot V_{t-1}\cdot(1-\gamma)<unknown\end{array}\)
Next, express $unknown$ in terms of $\varepsilon$ and $(1-\gamma)$; this should hold, where $\varepsilon$ is a small coefficient chosen by design.
Next, take $unknown=\varepsilon\cdot(1-\gamma)$, because $V_{t-1}-V_t$ can be expressed with a $(1-\gamma)$ term in it. Then the whole inequality becomes:
\(\begin{array}{l}\gamma\cdot V_{t-1}\cdot(1-\gamma)<unknown\\\Rightarrow\gamma\cdot V_{t-1}\cdot(1-\gamma)<\varepsilon\cdot(1-\gamma)\\\Rightarrow\gamma\cdot\left\|V_{t-1}-V_t\right\|<\varepsilon\cdot(1-\gamma)\\\Rightarrow\left\|V_{t-1}-V_t\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\\\Rightarrow\left\|V_{t+1}-V_t\right\|<\varepsilon\cdot\frac{1-\gamma}\gamma\end{array}\)
As $t$ increases, we will have $V_{t-1}\approx V_t\approx V_{t+1}$, which justifies the final step above.
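As a numeric sanity check of the stopping rule, the following sketch reuses the hypothetical `R`, `P`, and `value_iteration` from the first snippet; the near-exact reference for $V^\ast$ is just a much tighter run of the same loop.

```python
import numpy as np

gamma, eps = 0.9, 1e-3
V_t = value_iteration(R, P, gamma=gamma, eps=eps)        # stops below eps*(1-gamma)/gamma
V_star = value_iteration(R, P, gamma=gamma, eps=1e-10)   # near-exact reference for V*

# The threshold is chosen so that max_S |V_t(S) - V*(S)| < eps at termination.
assert np.max(np.abs(V_t - V_star)) < eps
print(np.max(np.abs(V_t - V_star)), "<", eps)
```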