Game Playing (Chapter 5)

Game playing is an AI favorite. There are many kinds of games we could consider; the first three are most amenable to formal treatment.

In this class, we will focus on perfect information games, without an element of chance (though the text contains some discussion of games of chance). 

The basic game search formulation is made up of three things:
Complete game tree for a simple problem.

Max is the computer; he is trying to maximize the score.  Min is the adversary; he is trying to minimize the score.

If this were non-adversarial, all Max would have to do is search for a sequence of moves that leads to a winning terminal state, and then go ahead and make the first move. But since there is an adversary, Max cannot choose Min's moves for him. Instead, we have to find a strategy that gives Max the best outcome he can guarantee, whatever Min replies.

Partial Search Tree for Tic-Tac-Toe (with the computer=Max=X).

Basic Minimax Algorithm: determine the optimal strategy for Max, and from it his first move. The backed-up value of the root node determines the "minimax decision": the move that leads to the successor with that value.

This is the best possible outcome that the player at the top node can expect. The intermediate backed-up values show the expected play that this is based on.
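The backing-up of values can be sketched in a few lines of Python. This is only an illustration, not the course's reference code: the tree is assumed to be a nested list whose leaves are payoffs for Max, and the names minimax, minimax_decision and the sample tree are made up for this example.

```python
def minimax(node, maximizing):
    """Return the backed-up value of node (a leaf payoff or a list of subtrees)."""
    if not isinstance(node, list):
        return node  # leaf: payoff from Max's point of view
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root):
    """Return (index of Max's best first move, backed-up value of the root)."""
    # Each successor of the root is a Min node.
    values = [minimax(child, False) for child in root]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best]


# Hypothetical two-ply tree: Max picks a branch, then Min picks a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree))  # (0, 3)
```

The three Min nodes back up 3, 2 and 2, so Max's best first move is branch 0, and the best outcome he can expect is 3.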

Example tree.

Minimax assumes that its opponent is playing by minimax as well; that is, it does not take into account "traps", or positions in which an imperfect player might make a mistake. Thus minimax might not feel like a very natural player.

Imperfect decisions: In real games, we can't do the full minimax calculation, because the tree is too big to search all the way to the terminal states. So we need to come up with a good estimate of the backed-up value.

Evaluation Functions: If we can't go to the bottom of the tree, we can evaluate a heuristic (the "evaluation function") on intermediate positions. A good evaluation function can be worth several levels of look-ahead.

For chess:

Piece values: Q = 9, R = 5, B = 3, Kn = 3, P = 1 (queen, rook, bishop, knight, pawn)


Evaluation function - corners are very good, next to corner is bad. Next to corner becomes good, once the corner is taken. Etc.

An evaluation function of this sort can make a mere 3-ply lookahead into a very strong player.

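The performance of a game-playing program is extremely dependent on the quality of its evaluation function. There is a tradeoff between the time it takes to compute the evaluation function and its accuracy: time spent evaluating a position is time not spent searching deeper.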

Many game-playing programs use weighted linear evaluation functions:

w1*f1 + w2*f2 + ... + wn*fn

E.g., for chess:

"material" evaluation function:

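9*(#WQ - #BQ) + 5*(#WR - #BR) + 3*(#WB - #BB) + 3*(#WKn - #BKn) + 1*(#WP - #BP)

(#WQ is the number of white queens, #BQ the number of black queens, and so on; Kn is the knight, as in the piece values above.)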
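As a rough sketch, the weighted linear form and the material function might be coded as below. The dictionary layout (piece letter to count) and the names weighted_linear, material and WEIGHTS are assumptions made for this example; a real program would read the counts off its board representation.

```python
# Piece weights from the notes: queen, rook, bishop, knight, pawn.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "Kn": 3, "P": 1}


def weighted_linear(weights, features):
    """Compute w1*f1 + w2*f2 + ... + wn*fn."""
    return sum(w * f for w, f in zip(weights, features))


def material(white_counts, black_counts):
    """Material score from White's point of view (positive favors White)."""
    pieces = list(WEIGHTS)
    # Each feature is a count difference, e.g. #WQ - #BQ.
    features = [white_counts.get(p, 0) - black_counts.get(p, 0) for p in pieces]
    return weighted_linear([WEIGHTS[p] for p in pieces], features)


# Hypothetical position: Black has lost one bishop, otherwise even.
white = {"Q": 1, "R": 2, "B": 2, "Kn": 2, "P": 8}
black = {"Q": 1, "R": 2, "B": 1, "Kn": 2, "P": 8}
print(material(white, black))  # 3
```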

To construct this type of function, first pick the features, and then adjust the weights until the program plays well. You can adjust the weights automatically by having the program play lots of games against itself ("reinforcement learning").

Basic Limited Minimax:

Perform minimax as above, but instead of evaluating payoff when the game is done, apply the evaluation function when a set depth has been reached.

Backup as usual. 
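Here is a minimal sketch of limited minimax, assuming a toy game chosen only for illustration (it is not from the notes): players alternately take 1, 2 or 3 stones, and whoever takes the last stone wins. A state is a pair (stones, max_to_move). The evaluation function is a deliberately uninformative placeholder that returns 0 for "unknown".

```python
def successors(state):
    stones, max_to_move = state
    return [(stones - k, not max_to_move) for k in (1, 2, 3) if k <= stones]


def is_terminal(state):
    return state[0] == 0


def payoff(state):
    # At 0 stones the player to move did NOT take the last stone, so he lost.
    return -1 if state[1] else 1


def evaluate(state):
    # Placeholder evaluation function (an assumption): 0 means "don't know".
    return 0


def limited_minimax(state, depth, limit):
    """Minimax that applies evaluate() once the depth limit is reached."""
    if is_terminal(state):
        return payoff(state)
    if depth >= limit:
        return evaluate(state)
    values = [limited_minimax(s, depth + 1, limit) for s in successors(state)]
    return max(values) if state[1] else min(values)


print(limited_minimax((5, True), 0, 10))  # 1: deep enough to see the win
print(limited_minimax((5, True), 0, 2))   # 0: cut off before the win is visible
```

With a limit of 2 the search bottoms out before Max's winning line is resolved, so the backed-up value is only as good as the evaluation function.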

Cutting off the search

There are some problems with cutting off the search at a certain depth.

Quiescence - if the values of positions are volatile, then a cutoff at a particular point could be misleading. One solution: examine the tree just below the best apparent move to make sure nothing bad happens. If something bad does happen, we can do partial secondary searches beneath our other alternatives. It is too expensive to extend the whole search another ply, but we can do it selectively.

Horizon effect - there may be a terrible threat a few ply further down, just beyond the cutoff depth, which the search cannot see.

Alpha-beta Pruning

The alpha-beta algorithm will find the same backed-up value as basic minimax, only faster.

Alpha-beta minimax algorithm:

inputs:
    node, the current node being explored
    alpha, the best score for Max along the path to node
    beta, the best score for Min along the path to node
    depth, the depth of node in the search tree

return: minimax value of node

function maxValue(node, alpha, beta, depth)
    if cutoffTest(node.state, depth) then
        return h(node.state)
    for each child in successors(node) do
        val := minValue(child, alpha, beta, depth+1)
        alpha := max(alpha, val)
        if alpha >= beta then return beta
    return alpha

function minValue(node, alpha, beta, depth)
    if cutoffTest(node.state, depth) then
        return h(node.state)
    for each child in successors(node) do
        val := maxValue(child, alpha, beta, depth+1)
        beta := min(beta, val)
        if beta <= alpha then return alpha
    return beta

Initial call, from a Max node: maxValue(root, -infinity, +infinity, 0)

maxValue is called for Max nodes.
minValue is called for Min nodes.

maxValue(node, alpha, beta, depth):
alpha is the best Max can do, considering the successors of node that have been explored so far.
beta is the beta value of node's parent.

minValue(node, alpha, beta, depth):
beta is the best Min can do, considering the successors of node that have been explored so far.
alpha is the alpha value of node's parent.
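For a runnable illustration, the sketch below applies the same scheme to a small explicit tree. It assumes the tree is a nested list with payoffs for Max at the leaves, and it folds the Max and Min functions into one function with a maximizing flag. The visited list is an extra added here only to show which leaves get evaluated. None of these names come from the notes.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Alpha-beta value of node; appends every evaluated leaf to visited."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False, visited))
            if alpha >= beta:
                return beta  # Min above will never allow this line
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, alpha, beta, True, visited))
        if beta <= alpha:
            return alpha  # Max above already has something at least as good
    return beta


# Hypothetical two-ply tree: Max picks a branch, then Min picks a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(value)    # 3
print(visited)  # [3, 12, 8, 2, 14, 5, 2]
```

The root value is 3, the same as plain minimax gives on this tree, but only 7 of the 9 leaves are evaluated: once the second Min node sees the leaf 2, it cannot beat the 3 that Max already has, so the leaves 4 and 6 are skipped.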