# Game Playing (Chapter 5)

Game playing is an AI favorite:
• it was not initially thought to require large amounts of knowledge
There are many kinds of games we could consider:
• games of imperfect information (e.g., bridge, poker; we don't know everything about the position)
• games of chance (e.g., backgammon, craps; we can see the whole board, but we don't know what moves will be legal, because that depends on the dice)
• perfect information games (e.g., checkers, chess, and Go; we know everything, and there is no element of chance)
• physical games (football, baseball; decisions are based on capability of performance)
The first three are the most amenable to formal treatment.

In this class, we will focus on perfect information games, without an element of chance (though the text contains some discussion of games of chance).

A basic game search formulation is made up of four things:
• initial state (initial board and whose move it is)
• successor function
• terminal test (when is the game over, i.e., recognizes "terminal states")
• payoff function or utility function: a numeric value for the outcome of a game. We concentrate on games with three-valued payoffs, i.e., win, lose, or draw: {1, -1, 0}. But a hand of hearts or bridge, for example, has a wider range of values.

(Figure: complete game tree for a simple problem.)

Max is the computer; he is trying to maximize the score.  Min is the adversary; he is trying to minimize the score.

If this were non-adversarial, all Max would have to do is search for a sequence of moves that leads to a terminal state that is a winner, and then go ahead and make the first move. But, since there is an adversary, we have to find a strategy that guarantees the best achievable value against optimal opposition.

Basic Minimax Algorithm: determine the optimal strategy for Max, to determine his first move
• Expand the entire tree below the root.
• Evaluate the terminal nodes as wins for the minimizer or maximizer.
• Select an unlabeled node, n, all of whose children have been assigned values. If there is no such node, we're done: return the value assigned to the root.
• If n is a minimizer move, assign it a value that is the minimum of the values of its children. If n is a maximizer move, assign it a value that is the maximum of the values of its children. Return to previous step.
The backed-up value of the root node is the "minimax decision".
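The steps above can be sketched in a few lines of Python. The tree below (nested lists, with payoffs at the leaves) is a made-up example for illustration, not one from the text:

```python
# Minimal sketch of basic minimax on a hand-built game tree.
# A leaf is a payoff from Max's point of view ({1, -1, 0} for
# win/lose/draw); an internal node is a list of its children.

def minimax(node, maximizing):
    if not isinstance(node, list):      # terminal node: return its payoff
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Root is a Max node with two moves; Min replies beneath each.
tree = [[1, 0], [-1, 1]]
print(minimax(tree, True))              # -> 0, the minimax decision value
```

The recursion mirrors the backup rule: a Max node takes the maximum of its children's values, a Min node the minimum, and the value returned at the root is the minimax decision.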

This is the best possible outcome that the player at the top node can expect. The intermediate backed-up values show the expected play that this is based on.

Minimax assumes that its opponent is playing by minimax as well; that is, it does not take into account "traps", or positions in which an imperfect player might make a mistake. Thus minimax might not feel like a very natural player.

Imperfect decisions: In real games, we can't do the Perfect Minimax calculation, because the tree is too big. So we need to come up with a good estimate of the backed-up value.

Evaluation Functions: If we can't go to the bottom of the tree, we can evaluate a heuristic (the "evaluation function") on intermediate positions. A good evaluation function can be worth several levels of look-ahead.

For chess:

Q = 9, R = 5, B = 3, Kn = 3, P = 1

Reversi/Othello

Evaluation function: corners are very good; squares next to a corner are bad. A square next to a corner becomes good once that corner is taken. Etc.

An evaluation function of this sort can make a mere 3-ply lookahead into a very strong player.

The performance of a game is extremely dependent on the quality of the evaluation function. There is a tradeoff between time costs and accuracy of the evaluation function.

Many game-playing programs use weighted linear evaluation functions:

w1*f1 + w2*f2 + ... + wn*fn

E.g., for chess:

"material" evaluation function:

9*(#WQ - #BQ) + 5*(#WR - #BR) + 3*(#WB - #BB) + 3*(#WK - #BK) + (#WP - #BP)

To construct this type of function, first pick the features, and then adjust the weights until the program plays well. You can adjust the weights automatically by having the program play lots of games against itself ("reinforcement learning").
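As a hedged sketch of such a weighted linear function, the material evaluation might be coded as follows; the dict-of-piece-counts encoding is an assumption for illustration, not from the text:

```python
# Sketch of the "material" weighted linear evaluation for chess,
# scored from White's point of view. Boards are represented (as an
# illustrative assumption) by dicts mapping piece letters to counts.

WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material(white_counts, black_counts):
    # Weighted sum of piece-count differences: w1*f1 + ... + wn*fn
    return sum(w * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# Starting position: equal material, so the evaluation is 0.
start = {"Q": 1, "R": 2, "B": 2, "N": 2, "P": 8}
print(material(start, start))   # -> 0
```

Tuning then amounts to adjusting the entries of WEIGHTS, by hand or automatically via self-play.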

### Basic Limited Minimax

Perform minimax as above, but instead of evaluating payoff when the game is done, apply the evaluation function when a set depth has been reached.

Backup as usual.
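A minimal sketch of limited minimax, under the illustrative assumption that each node is a pair of (evaluation-function estimate, successor list):

```python
# Limited minimax: identical to basic minimax, except the recursion
# stops at a fixed depth and applies the evaluation function's
# estimate instead of playing out to a terminal state.
# Node encoding (estimate, children) is an assumption for illustration.

def limited_minimax(node, depth, maximizing):
    estimate, children = node
    if depth == 0 or not children:      # cutoff or terminal: evaluate here
        return estimate
    vals = [limited_minimax(c, depth - 1, not maximizing)
            for c in children]
    return max(vals) if maximizing else min(vals)

# Depth-1 search from a Max node: back up the children's estimates.
tree = (0, [(3, []), (-2, [])])
print(limited_minimax(tree, 1, True))   # -> 3
```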

### Cutting off the search

There are some problems with cutting off the search at a certain depth.

Quiescence - if the values of positions are volatile, then a cutoff at a particular point could be misleading. One solution: examine the tree just below the best apparent move to make sure nothing bad happens. If something bad does happen, we can do partial secondary searches beneath our other alternatives. It is too expensive to extend the whole search another ply, but we can do it selectively.

Horizon effect - there may be a terrible threat a few ply further down, just beyond the cutoff depth; a program may even prefer delaying moves that push the threat over the "horizon" rather than dealing with it.

### Alpha-beta Pruning

The alpha-beta algorithm will find the same backed-up value as basic minimax, only faster.

Alpha-beta minimax algorithm:

inputs:
-- node, current node being explored
-- alpha, the best score for max along the path to node
-- beta, the best score for min along the path to node

return: minimax value of node

function maxValue (node,alpha,beta,depth)
-- if cutoffTest(node.state,depth) then
-- -- return h(node.state)
-- for each s in successors(node.state) do
-- -- val = minValue(s,alpha,beta,depth+1)
-- -- alpha := max (alpha,val)
-- -- if alpha >= beta then return beta
-- return alpha

function minValue (node,alpha,beta,depth)
-- if cutoffTest(node.state,depth) then
-- -- return h(node.state)
-- for each s in successors(node.state) do
-- -- val = maxValue(s,alpha,beta,depth+1)
-- -- beta := min(beta,val)
-- -- if beta <= alpha then return alpha
-- return beta

Initial call, from a Max node:
maxValue(root, alpha=-infinity, beta=+infinity, depth=0)

maxValue is called for Max nodes.
minValue is called for Min nodes.

maxValue(state, alpha, beta):
alpha is the best Max can do, considering the successors of state that have been explored so far.
beta is the beta value of state's parent.

minValue(state, alpha, beta):
beta is the best Min can do, considering the successors of state that have been explored so far.
alpha is the alpha value of state's parent.