Approximate Policy Iteration
The general structure is the same as in Policy Iteration, with the following differences:

- We do not use $V^{\pi}$; instead we use $\tilde{V}$ (or $\tilde{Q}$), which is only an approximation of $V^{\pi}$. There are two reasons for using an approximation: the architecture may not be rich enough to represent $V^{\pi}$ exactly, and the estimates are noisy because they are computed from simulations.
- Let $\hat{\pi}$ be the greedy policy with respect to $\tilde{V}$. Instead of $\hat{\pi}$ we might take a policy $\tilde{\pi}$ that is only close to $\hat{\pi}$, since the maximization itself may be carried out only approximately.

These two differences introduce an error into the iteration; one standard way to quantify it is sketched below.
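A common formalization of these two error sources follows the analysis of approximate policy iteration by Bertsekas and Tsitsiklis (*Neuro-Dynamic Programming*, 1996); the symbols $\epsilon$, $\delta$, the discount factor $\gamma$, and the Bellman operators $T$ (optimal) and $T^{\pi}$ (fixed policy $\pi$) are this sketch's notation, not fixed by the section above. Assume that at every iteration $k$

$$\left\| \tilde{V}_k - V^{\pi_k} \right\|_{\infty} \le \epsilon
\qquad\text{and}\qquad
\left\| T^{\pi_{k+1}} \tilde{V}_k - T \tilde{V}_k \right\|_{\infty} \le \delta .$$

Under these assumptions the resulting policies satisfy

$$\limsup_{k \to \infty} \left\| V^{\pi_k} - V^{*} \right\|_{\infty} \le \frac{\delta + 2\gamma\epsilon}{(1-\gamma)^{2}},$$

so the error does not vanish, but it stays bounded in terms of $\epsilon$ and $\delta$.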
Figure: Regular Policy Iteration
![Regular Policy Iteration](img39.gif)

Figure: Approximate Policy Iteration
![Approximate Policy Iteration](img40.gif)
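To make the loop in the second figure concrete, here is a minimal sketch in Python (the notes themselves give no code, so everything below is illustrative): the tabular MDP arrays `P` and `R`, the feature matrix `phi` standing in for the approximation architecture, and the least-squares fit producing $\tilde{V}$ are all this sketch's assumptions. The evaluation step is solved exactly and then projected onto the features, which captures the architecture error but not the simulation noise.

```python
import numpy as np

def approximate_policy_iteration(P, R, phi, gamma=0.9, n_iters=50):
    """Sketch of approximate policy iteration on a small tabular MDP.

    P   : transitions, shape (A, S, S); P[a, s, t] = Pr(t | s, a)
    R   : rewards, shape (S, A)
    phi : feature matrix, shape (S, k); the approximation "architecture".
          If it cannot represent V^pi, V_tilde is only an approximation.
    """
    S, A = R.shape
    pi = np.zeros(S, dtype=int)              # arbitrary initial policy
    V_tilde = np.zeros(S)
    for _ in range(n_iters):
        # Approximate policy evaluation: solve (I - gamma P_pi) V = R_pi
        # exactly, then project onto the feature space. (Simulation noise
        # could be modeled by perturbing V_pi before the fit.)
        P_pi = P[pi, np.arange(S), :]        # (S, S) transitions under pi
        R_pi = R[np.arange(S), pi]           # (S,)  rewards under pi
        V_pi = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        w, *_ = np.linalg.lstsq(phi, V_pi, rcond=None)
        V_tilde = phi @ w                    # V_tilde ~ V^pi, with error
        # Greedy improvement with respect to V_tilde (an approximate
        # maximization here would add the second error source).
        Q_tilde = R + gamma * np.einsum('ast,t->sa', P, V_tilde)
        new_pi = Q_tilde.argmax(axis=1)
        if np.array_equal(new_pi, pi):       # may oscillate, not converge
            break
        pi = new_pi
    return pi, V_tilde
```

Because $\tilde{V}$ only approximates $V^{\pi}$, the greedy step can oscillate between policies instead of settling on a fixed point, which is why the loop is also capped at `n_iters` rather than relying on convergence alone.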