In this class, we will focus on perfect information games, without an element of chance (though the text contains some discussion of games of chance).
Max is the computer; he is trying to maximize the score. Min is the adversary; he is trying to minimize the score.
If this were non-adversarial, all Max would have to do is search for a sequence of moves leading to a winning terminal state, and then make the first move of that sequence. But since there is an adversary, we must instead find a strategy (a reply to every move the adversary might make) that optimizes the value Max can guarantee.
Partial search tree for Tic-Tac-Toe (with the computer = Max = X).
The value backed up to the top node is the best outcome the player to move there can guarantee. The intermediate backed-up values show the expected line of play on which this is based.
Minimax assumes that its opponent is playing by minimax as well; that is, it does not take into account "traps", or positions in which an imperfect player might make a mistake. Thus minimax might not feel like a very natural player.
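To make the backed-up values concrete, here is a minimal Python sketch of plain minimax over an explicit game tree. The nested-list representation (numbers are terminal payoffs from Max's point of view) is only an illustration, not something from the notes.

def minimax(node, maximizing):
    # A node is either a number (terminal payoff for Max) or a list of child nodes.
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Max moves at the root, Min replies one level down.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, maximizing=True))   # -> 3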
Evaluation Functions: If we can't go to the bottom of the tree, we can evaluate a heuristic (the "evaluation function") on intermediate positions. A good evaluation function can be worth several levels of look-ahead.
E.g., chess piece values: Q = 9, R = 5, B = 3, Kn = 3, P = 1
Another example of an evaluation function: corners are very good, and a square next to a corner is bad; a square next to a corner becomes good once that corner has been taken; and so on.
An evaluation function of this sort can make a mere 3-ply lookahead into a very strong player.
The performance of a game-playing program depends heavily on the quality of its evaluation function. There is a tradeoff between the time cost of the evaluation function and its accuracy.
Many game-playing programs use weighted linear evaluation functions:
w1*f1 + w2*f2 + ... + wn*fn
E.g., for chess, the "material" evaluation function:
9*(#WQ - #BQ) + 5*(#WR - #BR) + 3*(#WB - #BB) + 3*(#WKn - #BKn) + (#WP - #BP)
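As a sketch, this material function can be written directly from the weights above. The board representation (a flat list of piece codes such as "WQ" or "BP") is an assumption made only for this illustration.

# Piece values from the notes; the king is omitted because it is never captured.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "Kn": 3, "P": 1}

def material(board):
    # Weighted linear "material" evaluation: positive favors White (Max).
    score = 0
    for piece in board:
        color, kind = piece[0], piece[1:]
        if kind in PIECE_VALUES:
            score += PIECE_VALUES[kind] if color == "W" else -PIECE_VALUES[kind]
    return score

print(material(["WQ", "WR", "WP", "BQ", "BKn", "BP", "BP"]))   # (9+5+1) - (9+3+1+1) = 1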
To construct this type of function, first pick the features, and then
adjust the weights until the program plays well. You can adjust the weights
automatically by having the program play lots of games against itself ("reinforcement
learning").
Perform minimax as above, but instead of evaluating the payoff when the game is done, apply the evaluation function when a set depth has been reached. Back up the values as usual.
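A minimal sketch of this depth-limited minimax; the game-specific pieces (successors, terminal test, payoff, and the evaluation function h) are passed in as parameters, since the notes do not fix a representation.

def minimax_value(state, depth, maximizing, limit, h, successors, is_terminal, payoff):
    # Depth-limited minimax: true payoff at terminal states, h at the depth cutoff.
    if is_terminal(state):
        return payoff(state)
    if depth >= limit:
        return h(state)
    values = [minimax_value(s, depth + 1, not maximizing, limit, h, successors, is_terminal, payoff)
              for s in successors(state)]
    return max(values) if maximizing else min(values)

# Tiny illustration on a nested-list tree; the limit is deep enough to reach the payoffs,
# so h (here a constant 0) is never actually needed.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, 0, True, limit=10,
                    h=lambda s: 0, successors=lambda s: s,
                    is_terminal=lambda s: isinstance(s, (int, float)),
                    payoff=lambda s: s))   # -> 3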
Quiescence - if the values of positions are volatile, then a cutoff at a particular depth can be misleading. One solution: examine the tree just below the best apparent move to make sure nothing bad happens. If something bad does happen, we can do partial secondary searches beneath the other alternatives. It is too expensive to extend the whole search another ply, but we can do it selectively.
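A rough sketch of the quiescence idea, in Python; is_quiet and volatile_successors (for example, positions reachable by capture moves) are assumed helpers, not defined in the notes. This would replace a bare call to the evaluation function at a cutoff node.

def quiescent_value(state, maximizing, h, is_quiet, volatile_successors):
    # Trust h only on quiet positions; on volatile ones, also look at the
    # volatile continuations before settling on a value.
    if is_quiet(state):
        return h(state)
    values = [quiescent_value(s, not maximizing, h, is_quiet, volatile_successors)
              for s in volatile_successors(state)]
    values.append(h(state))   # the side to move may also decline the volatile moves
    return max(values) if maximizing else min(values)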
Horizon effect - there may be a terrible threat a few ply beyond the cutoff depth; the search cannot see past its horizon, and the program may even make delaying moves that push the threat just out of sight without actually escaping it.
Alpha-beta minimax algorithm:
inputs:
-- node, current node being explored
-- alpha, the best score for max along the path to node
-- beta, the best score for min along the path to node
return: minimax value of node
function maxValue (node,alpha,beta,depth)
-- if cutoffTest(node.state,depth) then
-- -- return h(node.state)
-- for each s in successors(node.state) do
-- -- val = minValue(s,alpha,beta,depth+1)
-- -- alpha := max (alpha,val)
-- -- if alpha >= beta then return beta
-- return alpha
function minValue (node,alpha,beta,depth)
-- if cutoffTest(node.state,depth) then
-- -- return h(node.state)
-- for each s in successors(node.state) do
-- -- val = maxValue(s,alpha,beta,depth+1)
-- -- beta := min(beta,val)
-- -- if beta <= alpha then return alpha
-- return beta
Initial call, from a Max node:
maxValue(state,alpha=-infinity,beta=+infinity,0)
maxValue is called for Max nodes; minValue is called for Min nodes.
maxValue(node, alpha, beta, depth):
alpha is the best Max can do, considering the successors of node explored so far.
beta is the beta value of node's parent.
minValue(node, alpha, beta, depth):
beta is the best Min can do, considering the successors of node explored so far.
alpha is the alpha value of node's parent.
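A direct Python transcription of the pseudocode above, as a sketch: cutoff_test, h, and successors are game-specific, so they are passed in as parameters here; the tiny nested-list game tree at the bottom is only for illustration.

import math

def max_value(state, alpha, beta, depth, cutoff_test, h, successors):
    # Alpha-beta value of a Max node; prune as soon as alpha >= beta.
    if cutoff_test(state, depth):
        return h(state)
    for s in successors(state):
        alpha = max(alpha, min_value(s, alpha, beta, depth + 1, cutoff_test, h, successors))
        if alpha >= beta:
            return beta                 # cutoff: Min will never let play reach this line
    return alpha

def min_value(state, alpha, beta, depth, cutoff_test, h, successors):
    # Alpha-beta value of a Min node; prune as soon as beta <= alpha.
    if cutoff_test(state, depth):
        return h(state)
    for s in successors(state):
        beta = min(beta, max_value(s, alpha, beta, depth + 1, cutoff_test, h, successors))
        if beta <= alpha:
            return alpha                # cutoff: Max will never let play reach this line
    return beta

# Initial call from a Max node, as in the notes; leaves are terminal payoffs.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = max_value(tree, -math.inf, math.inf, 0,
                  cutoff_test=lambda s, d: isinstance(s, (int, float)),
                  h=lambda s: s,
                  successors=lambda s: s)
print(value)   # -> 3, the same answer plain minimax gives, with fewer nodes examined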