Nodes are ordered on the priority queue in increasing order by the h values of their states.
What's wrong with the name BEST-first-search?
In contrast to breadth-first-search and depth-first-search, best-first-search may switch its strategy mid-search. For example, best-first-search may go depth-first for a while, but return to the shallow parts of the tree if those states start to look better.
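A minimal sketch of this ordering in Python (assumptions ours: states are hashable, and h, successors, and is_goal come from the caller; the function name is ours too):

import heapq
from itertools import count

def greedy_best_first(start, h, successors, is_goal):
    # Priority queue ordered by h alone; the counter breaks ties so the
    # heap never tries to compare states directly.
    tie = count()
    frontier = [(h(start), next(tie), start, [start])]
    seen = {start}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for s in successors(state):
            if s not in seen:
                seen.add(s)
                heapq.heappush(frontier, (h(s), next(tie), s, path + [s]))
    return None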
Nodes are ordered on the priority queue in increasing order by their f values.
f(n) = h(n) + g(n)
g(n): actual cost from start to node n
h(n): estimated distance from n.state to a goal
Using the g value gives the search a breadth-first flavor.
Even if h continuously returns good values for states along a path, if no goal is reached, g will eventually dominate h and force backtracking to shallower nodes.
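A minimal A* sketch under the same kinds of assumptions (ours: states are hashable, and successors(state) yields (next_state, step_cost) pairs; all names are ours):

import heapq
from itertools import count

def a_star(start, h, successors, is_goal):
    # Priority queue ordered by f(n) = g(n) + h(n).
    tie = count()
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                    # stale entry; a cheaper path was found later
        if is_goal(state):
            return path, g
        for s, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(s, float("inf")):
                best_g[s] = g2
                heapq.heappush(frontier,
                               (g2 + h(s), next(tie), g2, s, path + [s]))
    return None, float("inf")

Note the role of best_g: a state is re-queued only when a strictly cheaper path to it is found, which is what lets g eventually dominate as described above.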
A* with an admissible heuristic is optimal.
That is, the first goal node found must be optimal.
Important terminology note: In some work, a search process cannot be called A* unless h is admissible. That is, h being admissible is part of the definition of what A* is.
Assume by way of contradiction that the first goal node found is not optimal. We will show this is not possible.
Let fg be the first goal node found. By our assumption, fg is non-optimal. Let os be an optimal goal node.
A. g(fg) > g(os) (by our assumptions).
There are two cases to consider:
Case 1: Both os and fg have been generated and placed on the queue.
By our assumption that fg is found before os, fg is before os on the queue, and thus
B. f(fg) <= f(os).
Because os and fg are goals, their h values are 0. Thus:
C. f(os) = g(os), and f(fg) = g(fg).
From (B.) and (C.):
D. g(fg) <= g(os).
But (D.) is a contradiction with (A.)
Case 2: fg is generated and placed on the front of the queue before os is even generated.
To see this isn't possible, consider the step in which fg is generated (when a parent of fg is the first element on the queue).
fg could not possibly be placed first on the queue:
f(fg) = g(fg), because h(fg) = 0. Thus, if fg is first on the queue:
E. g(fg) <= f(q) for all nodes q on the queue
There is some ancestor of os on the queue, say Aos. (Clearly, if there is a least-cost path to os, there is a path to os from the root! So some descendant of the root that is an ancestor of os must still be on the queue.)
Because h is admissible, h(Aos) is at most the true remaining cost from Aos to os along the optimal path, so:
F. f(Aos) = g(Aos) + h(Aos) <= g(os).
By (E.) and (F.): g(fg) <= f(Aos) <= g(os).
Thus, we again reach a contradiction with (A.): g(fg) > g(os).
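Both cases compress into one chain of inequalities (a LaTeX restatement of the argument above; c(Aos, os) is our notation for the remaining cost from Aos to os along the optimal path):

\begin{align*}
\text{Case 1:}\quad & g(fg) = f(fg) \le f(os) = g(os) \\
\text{Case 2:}\quad & g(fg) = f(fg) \le f(A_{os}) = g(A_{os}) + h(A_{os}) \le g(A_{os}) + c(A_{os}, os) = g(os)
\end{align*}

Either way g(fg) <= g(os), contradicting (A): g(fg) > g(os).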
A* with an admissible heuristic is complete
Suppose that there exists a goal, and there are cycles in the state space. Could A* get caught in an infinite loop?
Nope: assuming all step costs are positive, the g-vals of the nodes in the cycle keep growing, so eventually they exceed the g-val of the goal node and those nodes are placed after the goal node on the queue.
The big problem is space. Just as breadth-first search adds levels, A* adds "f-contours" of nodes. A* will expand all nodes with f-value i before expanding any nodes with f-value i+1. Space complexity is exponential.
Therefore: A* is workable only if your heuristic is good or the problem is simple enough to be solved with blind search. But in the latter case, don't bother with heuristic search.
How are A*, breadth-first-search, and uniform-cost-search related to each other? (Uniform-cost search is A* with h(n) = 0 for all n, and uniform-cost search with equal step costs expands nodes in breadth-first order.)
Candidate heuristics for a sliding-tile puzzle (e.g., the 8-puzzle):
A. The number of tiles out of place
B. The sum of all the distances by which the tiles are out of place.
C. N * the number of direct tile reversals (for some constant N)
D. B + C
Which of A-D are admissible?
--> (A, B)
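In code, heuristics A and B might look as follows (a sketch under an encoding of ours: a state is a tuple of 9 ints in row-major order, 0 for the blank, and the goal places tile t at index t; function names are ours):

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def tiles_out_of_place(state):
    # Heuristic A: count the tiles (not the blank) that are not at home.
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def manhattan_sum(state):
    # Heuristic B: under this goal, tile t belongs at row t // 3, column
    # t % 3, so sum each tile's vertical plus horizontal distance from home.
    return sum(abs(i // 3 - t // 3) + abs(i % 3 - t % 3)
               for i, t in enumerate(state) if t != 0)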
Consider A* with h(n) = zero(n) = 0 for all n. What is this algorithm?
Just uniform-cost search, which of course is guaranteed to find an optimal solution.
Clearly, A and B are better than zero(n).
For two admissible heuristic functions h1 and h2, if h1(n) <= h2(n) for all n, h2 is more informed than h1 (h2 dominates h1) and ...
fewer nodes will be expanded with h2 than with h1.
Why?
Let f1 = h1 + g and f2 = h2 + g.
A* expands every node with f(n) < C*, where C* is the cost of the optimal solution. Since f1(n) <= f2(n) for all nodes n, any node with f2(n) < C* also has f1(n) < C*; so (up to tie-breaking among nodes with f = C*) any node expanded using h2 would also be expanded using h1.
But additional nodes may be expanded using h1.
For intuition: f1(n) is closer to g(n) than f2(n) is to g(n). Thus, shallower nodes will look better (have lower f values) when using h1 instead of h2. The g-value is a more significant component of the f-value when using h1 instead of h2.
The larger h-values make the search venture deeper into the search space.
The larger the values the better (as long as still admissible).
From worst to best:
--0 for all nodes
--number of tiles out of place
--sum of distances of tiles out of place
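A small empirical check of this ordering (a standalone sketch, repeating the encoding and heuristics above so it runs on its own; the start state comes from a random walk off the goal, which keeps it solvable; exact counts vary with the walk, but they should shrink as the heuristic becomes more informed):

import heapq
import random
from itertools import count

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def successors(state):
    # Slide one of the tiles next to the blank into the blank.
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            s = list(state)
            j = nr * 3 + nc
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def h_zero(state):
    return 0

def h_misplaced(state):
    return sum(1 for i, t in enumerate(state) if t and t != i)

def h_manhattan(state):
    return sum(abs(i // 3 - t // 3) + abs(i % 3 - t % 3)
               for i, t in enumerate(state) if t)

def expansions(start, h):
    # Count nodes A* (unit step costs) expands before expanding the goal.
    tie = count()
    frontier = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue                    # stale queue entry
        if state == GOAL:
            return expanded
        expanded += 1
        for s in successors(state):
            if g + 1 < best_g.get(s, float("inf")):
                best_g[s] = g + 1
                heapq.heappush(frontier, (g + 1 + h(s), next(tie), g + 1, s))

rng = random.Random(0)
start = GOAL
for _ in range(30):                     # random walk from the goal: solvable
    start = rng.choice(list(successors(start)))
for name, h in (("zero", h_zero), ("misplaced", h_misplaced),
                ("manhattan", h_manhattan)):
    print(name, expansions(start, h))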
IDA* (Iterative-Deepening A*) is essentially iterative deepening. The difference is that f values are used rather than depth values. Each iteration expands all nodes within a particular f-value range. In the worst case, IDA* requires O(bd) storage, where b is the branching factor, and d is the length of the optimal solution path (assuming unit-cost operators).
The number of iterations grows as the number of possible f values grows.
The next f-limit will be the minimum one found that is greater than the current one.
IDA* (start)
f-limit <-- f(start)
loop
-- solution, f-limit <-- DFS-Contour (start, f-limit)
-- if solution is non-nil, return solution
-- if f-limit = infinity, return failure
end loop
DFS-Contour(node, f-limit)
static: next-f, the f-cost limit for the next contour,
initially infinity
-- if f(node) > f-limit then return nil, f(node)
-- if goalp(node) then return node, f-limit
-- for each s in successors(node):
---- solution, new-f <-- DFS-Contour(s, f-limit)
---- if solution is non-nil, then return solution, f-limit
---- if new-f > f-limit:
------ next-f <-- min(next-f, new-f)
-- return nil, next-f
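A runnable Python version of the pseudocode (a sketch under assumptions of ours: successors(state) yields (state, cost) pairs; next-f is threaded through return values rather than kept in a static variable, and the current path is checked to avoid trivial cycles):

import math

def ida_star(start, h, successors, is_goal):
    def contour(path, g, f_limit):
        # DFS over nodes with f <= f_limit; returns (solution, next_f).
        state = path[-1]
        f = g + h(state)
        if f > f_limit:
            return None, f              # too expensive for this contour
        if is_goal(state):
            return path, f_limit
        next_f = math.inf               # smallest f seen beyond the limit
        for s, cost in successors(state):
            if s in path:               # skip cycles on the current path
                continue
            solution, new_f = contour(path + [s], g + cost, f_limit)
            if solution is not None:
                return solution, f_limit
            next_f = min(next_f, new_f)
        return None, next_f

    f_limit = h(start)
    while True:
        solution, f_limit = contour([start], 0, f_limit)
        if solution is not None:
            return solution
        if f_limit == math.inf:
            return None                 # no goal reachable

Threading next-f through the return value avoids the pitfall of a static variable that silently keeps its old value across contours.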
SMA* (Simplified Memory-Bounded A*): uses whatever memory is available to it, so avoids repeated states as far as its memory allows. Often better than IDA*.
Hill-climbing: where does it get its name?
Consider the states laid out on a landscape; the height of any point is the h value of the state at that point (here h is treated as a quality score to maximize, rather than an estimated distance to a goal).
Idea: move around the landscape trying to find the highest peaks (the optimal solutions).
[See picture in lecture]: If you could see, you would go behind A, in front of B, climb D, traverse the plateau, and keep going.
But it's foggy...you step in all four directions and take the step that increases altitude the most. Very *local*, but very cheap; sometimes, the best kind of solution to use.
In all of A, B, and C, you get stuck:
A. You will go up A and get stuck (no neighboring state is an improvement). Foothill; local maximum.
B. If you get to D you are stuck (all neighbors have the same h-values). Plateau.
C. Not in picture: Ridge. There is a direction in which we would like to move, because it would get us closer to the goal, but none of the operators takes us in that direction. You could oscillate from side to side, making little progress.
``Random-restart hill-climbing:'' a series of hill-climbing searches from randomly generated initial states. Could use a fixed number of iterations, or continue until the results from the searches have not improved for a certain number of iterations.
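A sketch of basic hill-climbing and the random-restart wrapper in Python (assumptions ours: value(state) is the height to maximize, neighbors(state) yields the states one step away, random_state() draws a fresh starting point, and a fixed number of restarts is used):

def hill_climb(start, value, neighbors):
    # Climb until no neighbor is strictly higher. Where this stops may be
    # a true peak, a foothill, or the edge of a plateau, as described above.
    current = start
    while True:
        best = max(neighbors(current), key=value, default=current)
        if value(best) <= value(current):
            return current
        current = best

def random_restart(value, neighbors, random_state, n_restarts):
    # Run several climbs from random starting points; keep the best peak.
    climbs = (hill_climb(random_state(), value, neighbors)
              for _ in range(n_restarts))
    return max(climbs, key=value)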
``Simulated Annealing:'' when you get stuck on a local max, allow some downhill steps to escape the local max.
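A minimal simulated-annealing sketch (assumptions ours: the same value/neighbors interface as above, plus a cooling schedule mapping the step number to a temperature):

import math
import random
from itertools import count

def simulated_annealing(start, value, neighbors, schedule):
    current = start
    for t in count():
        T = schedule(t)
        if T <= 0:
            return current              # fully cooled: stop where we are
        candidate = random.choice(list(neighbors(current)))
        delta = value(candidate) - value(current)
        # Always take uphill moves; take downhill moves with probability
        # e^(delta/T), which shrinks as the temperature drops.
        if delta > 0 or random.random() < math.exp(delta / T):
            current = candidate

# For example: schedule = lambda t: max(0.0, 1.0 - t / 10_000)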
For problems with many solutions, it may be worthwhile to discard unpromising paths.
Beam search: best-first-search that keeps only a fixed number of states on open.
Search as shining a light on the search space. A* -- light spreads as we go deeper. Beam search: a fixed-width beam.
Add a parameter -- the beam-width -- to best-first search.
Truncate the sorted list to the beam-width.
With a beam-width of infinity: best-first-search
With a beam-width of 1: hill-climbing
Yet another search technique: iterative widening (keep running the search with wider and wider beams)
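A sketch of one common beam-search variant (assumptions ours: f scores states with lower being better, and the whole beam is expanded in lockstep before truncating back to beam_width):

def beam_search(start, f, successors, is_goal, beam_width):
    beam = [start]
    seen = {start}
    while beam:
        pool = []
        for state in beam:
            if is_goal(state):
                return state
            for s in successors(state):
                if s not in seen:
                    seen.add(s)
                    pool.append(s)
        # Sort the candidates by f and keep only the beam_width best.
        beam = sorted(pool, key=f)[:beam_width]
    return None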