** by Milos Hauskrecht **

** Abstract **

Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that include both action outcome uncertainty and imperfect observability. A control problem within the POMDP framework is expressed as a dynamic optimization problem with a value function that combines costs or rewards from multiple steps. Although the POMDP framework is more expressive than other simpler frameworks, like Markov decision processes (MDP), its associated optimization methods are more demanding computationally and only very small problems can be solved exactly in practice. The thesis focuses on two possible approaches that can be used to solve larger problems: approximation methods and exploitation of additional problem structure.

First, a number of new efficient approximation methods and improvements of existing algorithms are proposed. These include (1) the fast informed bound method based on approximate dynamic programming updates that lead to piecewise linear and convex value functions with a constant number of linear vectors, (2) a grid-based point interpolation method that supports variable grids, (3) an incremental version of the linear vector method that updates value function derivatives, as well as (4) various heuristics for selecting grid-points. The new and existing methods are experimentally tested and compared on a set of three infinite discounted horizon problems of different complexity. The experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test.

Second, the thesis presents a number of techniques for exploiting additional structure in the model of complex control problems. These are studied as applied to a medical therapy planning problem - the management of patients with chronic ischemic heart disease. The new extensions proposed include factored and hierarchically structured models that combine the advantages of the POMDP and MDP frameworks and cut down the size and complexity of the information state space.

a technical report version of my PhD thesis

**
Milos Hauskrecht
Planning and control in stochastic domains with
imperfect information.
PhD thesis, MIT-LCS-TR-738, 1997
**

POMDP problems:

- Maze20: maze navigation problem from my AAAI-97 paper
- Maze20B: maze navigation problem with a zero cost absorbing goal state
- Lonnie Chrisman's shuttle problem (from Tony's page)

Supervisor: Peter Szolovits

Readers: