Markov decision processes (MDPs) offer an elegant mathematical framework for representing planning and decision problems under uncertainty. However, the simple textbook MDP assumes discrete states, discrete time, and an unstructured (flat) model of the process dynamics. Such a representation is poorly suited to many real-world domains, which are often factored, involve continuous quantities (such as temperature, speed, or position), and/or rely on imperfect observations. The aim of our research is (1) to devise MDP models that represent complex real-world decision problems more naturally, and (2) to develop algorithmic solutions that let us solve these problems more efficiently.
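To make the "textbook" baseline concrete, the following sketch solves a small discrete-state, discrete-time MDP with value iteration. The states, actions, transition probabilities, and rewards here are made up purely for illustration; they do not come from any of the papers below.

```python
import numpy as np

# A tiny illustrative MDP: 3 discrete states, 2 actions.
# P[a, s, s'] is the probability of moving from s to s' under action a;
# R[s, a] is the immediate reward. All numbers are hypothetical.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]],
    [[0.5, 0.5, 0.0],   # action 1
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
])
R = np.array([[0.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
gamma = 0.95  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality backup until the value function converges."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

V, policy = value_iteration(P, R, gamma)
```

Note that the value function and policy are stored as dense arrays over all states, which is exactly what stops scaling once states are factored or continuous; the work below replaces these tables with structured and approximate representations.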
This web page is maintained by Milos.
Continuous and hybrid-state MDPs
Partitioned Linear Programming Approximations for MDPs. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, July 2008.
Solving Factored MDPs with Hybrid State and Action Variables. Journal of Artificial Intelligence Research, accepted for publication, 2006.
Learning Basis Functions in Hybrid Domains. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, July 2006.
Solving Factored MDPs with Exponential-Family Transition Models. In Proceedings of the 16th International Conference on Automated Planning and Scheduling, UK, June 2006.
Approximate Linear Programming for Solving Hybrid Factored MDPs. In Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, Florida, January 2006.
An MCMC Approach to Solving Hybrid Factored MDPs. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, August 2005.
Solving Factored MDPs with Continuous and Discrete Variables. In Proceedings of the AAAI Workshop on Learning and Planning in Markov Processes - Advances and Challenges, pp. 19-24, August 2004.
Solving Factored MDPs with Continuous and Discrete Variables. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 235-242, July 2004.
Heuristic Refinements of Approximate Linear Programming for Factored Continuous-State Markov Decision Processes. In Proceedings of the 14th International Conference on Automated Planning and Scheduling, pp. 306-314, June 2004.
Linear program approximations for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems 16, pp. 895-902, December 2003.
Partially observable MDPs
Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, vol. 13, pp. 33-94, 2000.
Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, vol. 18, pp. 221-244, 2000.
Planning and control in stochastic domains with imperfect information. PhD dissertation, MIT-LCS-TR-738, 1997.
Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of the 14th National Conference on Artificial Intelligence, Providence, RI, pp. 734-739, 1997.
Hierarchical MDPs and decomposition methods
Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 220-229, 1998.
Solving very large weakly-coupled Markov decision processes. In Proceedings of the 15th National Conference on Artificial Intelligence, Madison, WI, pp. 165-172, 1998.
Planning with macro-actions: Effect of initial value function estimate on the convergence rate of value iteration. Working paper, 1998.
Applications to medicine and investments
Efficient methods for computing investment strategies for multi-market commodity trading. Applied Artificial Intelligence, vol. 15, pp. 429-452, 2001.
Evaluation and optimization of management plans in stochastic domains with imperfect information. In Proceedings of the Twelfth International Workshop on Principles of Diagnosis, pp. 71-78, 2001.
Computing near-optimal strategies for stochastic investment planning problems. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 1310-1315, 1999.
Modeling Treatment of Ischemic Heart Disease with Partially Observable Markov Decision Processes. In Proceedings of the American Medical Informatics Association Annual Symposium on Computer Applications in Health Care, Orlando, Florida, pp. 538-542, 1998.
Semi-Markov models for vehicle routing optimization
Approximation strategies for routing in stochastic dynamic networks. In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics, Ft. Lauderdale, FL, January 2008.