Planning and control in stochastic domains with imperfect information.

by Milos Hauskrecht

Abstract

Partially observable Markov decision processes (POMDPs) can be used to model complex control problems that include both action outcome uncertainty and imperfect observability. A control problem within the POMDP framework is expressed as a dynamic optimization problem with a value function that combines costs or rewards from multiple steps. Although the POMDP framework is more expressive than other simpler frameworks, like Markov decision processes (MDP), its associated optimization methods are more demanding computationally and only very small problems can be solved exactly in practice. The thesis focuses on two possible approaches that can be used to solve larger problems: approximation methods and exploitation of additional problem structure.

First, a number of new efficient approximation methods and improvements of existing algorithms are proposed. These include (1) the fast informed bound method based on approximate dynamic programming updates that lead to piecewise linear and convex value functions with a constant number of linear vectors, (2) a grid-based point interpolation method that supports variable grids, (3) an incremental version of the linear vector method that updates value function derivatives, as well as (4) various heuristics for selecting grid-points. The new and existing methods are experimentally tested and compared on a set of three infinite discounted horizon problems of different complexity. The experimental results show that methods that preserve the shape of the value function over updates, such as the newly designed incremental linear vector and fast informed bound methods, tend to outperform other methods on the control performance test.

Second, the thesis presents a number of techniques for exploiting additional structure in the model of complex control problems. These are studied as applied to a medical therapy planning problem - the management of patients with chronic ischemic heart disease. The new extensions proposed include factored and hierarchically structured models that combine the advantages of the POMDP and MDP frameworks and cut down the size and complexity of the information state space.


a technical report version of my PhD thesis

Milos Hauskrecht
Planning and control in stochastic domains with imperfect information.
PhD thesis, MIT-LCS-TR-738, 1997


POMDP problems used in the thesis

The following is a list of POMDP problems I used to test my algorithms. The problems are stored in the format proposed by Tony Cassandra.

POMDP problems:

Thesis facts

Thesis defended on August 5, 1997. Submitted to the Department of Electrical Engineering and Computer Science, MIT in August 1997 in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Supervisor: Peter Szolovits

Readers:



milos 12/21/97