== Searching ==

Searching (of various kinds) is exceedingly common.

So we care about making it fast.
We've written and tuned a string search function.

Let's examine general search: Return the index of an occurrence of a
given value in a list.  Return some special value if the value does
not occur.

searching.py

def search(L, value):
    '''Return the index of value in L, or len(L) if value doesn't
    exist in L.'''
    i = 0
    while i < len(L) and L[i] != value:
        i += 1
    
    return i

 - example:

L:   [4,3,7,13]

value: 7

len(L) is 4

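i = 0: L[0] is 4, not 7, so i becomes 1
i = 1: L[1] is 3, not 7, so i becomes 2
i = 2: L[2] is 7, so the loop stops and we return 2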

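 - QU: Why is the first comparison necessary?
		Otherwise we would fall off the end of the list if value didn't
		occur in L: L[len(L)] raises an IndexError.  Because Python's
		and operator short-circuits, L[i] is only evaluated when i < len(L).

 - QU: How does the return value get set to len(L) if value not found?
		i is incremented all the way to len(L), which also stops the loop.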

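Trace for value 5: 5 is never found, so i goes 0, 1, 2, 3, 4; at i = 4 the
test i < len(L) fails, and we return 4, which is len(L).

A couple of other special cases to test:

Trace for value 4 (the first item): the loop stops immediately; return 0.

Trace for value 13 (the last item): i goes 0, 1, 2, 3; return 3.

Trace for L = [] - does it work?

Yup!  len(L) is 0, so the loop never runs and we return 0, which is len(L).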
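The hand traces above make good test cases.  A minimal sketch, repeating search from searching.py so that it runs on its own, turns each trace into an assert:

```python
def search(L, value):
    '''Return the index of value in L, or len(L) if value doesn't
    exist in L.'''
    i = 0
    while i < len(L) and L[i] != value:
        i += 1
    return i


L = [4, 3, 7, 13]
assert search(L, 7) == 2      # in the middle
assert search(L, 5) == 4      # not there: len(L)
assert search(L, 4) == 0      # first item
assert search(L, 13) == 3     # last item
assert search([], 7) == 0     # empty list: len([]) is 0
print('all search tests passed')
```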

		
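 - QU: Why not use a for-loop with a break?
       Bad style.
		
       It seems easier to write, but it is harder to reason about (and
       to get right): the loop can stop in two different ways, and the
       not-found case needs separate handling.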
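For comparison, here is a sketch of what the for-loop-with-break version might look like (search_for_break is a hypothetical name, used only for illustration).  Notice the extra work for the not-found case: Python's for-else runs the else clause only when the loop finishes without a break.

```python
def search_for_break(L, value):
    '''Hypothetical for/break version of search, for comparison only.
    Return the index of value in L, or len(L) if value isn't in L.'''
    for i in range(len(L)):
        if L[i] == value:
            break
    else:
        # No break happened, so value isn't in L.  Without this clause,
        # i would be len(L) - 1 here, or undefined when L is empty.
        i = len(L)
    return i


print(search_for_break([4, 3, 7, 13], 7))   # 2
print(search_for_break([4, 3, 7, 13], 5))   # 4
print(search_for_break([], 5))              # 0
```

It gives the same answers as search, but you have to remember the else clause to get there.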
		
 - QU: How will the run-time of search grow?
		Linearly.

		That means if the size of the list doubles, the worst-case
		run-time doubles.  In the worst case, you need to read through
		the entire list.

 - Note:  for an unsorted list, you can't do better than linear time
   in the worst case: you may need to look at every list element.
   No way around that.

 - But can we tune this one?

 - Here's a trick that tunes it:

   We can eliminate the first comparison (i < len(L)) in the
   while-loop condition by adding a /sentinel/:

def sentinel_search(L, value):
    '''Return the index of value in L, or len(L) if value doesn't
    exist in L.'''
    L.append(value)
    i = 0
    while L[i] != value:
        i += 1
    L.pop()
    return i

In the shell - look at L.pop(): it removes and returns the last item.

We put the value at the end of the list, just temporarily.

Then we don't need to check whether we have reached the end of the list:
the while loop will stop whenever it sees value, and it is guaranteed to
see it at index len(L) at the latest.

If value isn't in the list, i will be incremented to the original length
of the list, where the sentinel sits, and we will return the right value!

L:   [4,3,7,13]

value: 7

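len(L) is 4, but after the append L is [4,3,7,13,7]

i goes 0, 1, 2 and stops at L[2], which is 7.  Pop the sentinel; return 2.

Trace for value 5: L becomes [4,3,7,13,5]; i goes 0, 1, 2, 3, 4 and stops
at the sentinel.  Pop it; return 4, which is len(L) again.

A couple of other special cases to test:

Trace for value 4: the loop stops at i = 0; return 0.

Trace for value 13: the loop stops at i = 3, before the sentinel; return 3.

Trace for L = [] - does it work?

Yup!  L becomes [value], the loop stops at i = 0, and after the pop we
return 0, which is len(L) for the empty list.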

IMPORTANT:  why do we need to do the pop?
Because lists are mutable, and the append changed the caller's list!
Wouldn't you be surprised if you called a search function and it
changed your list?
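To see why the pop matters, here's a sketch comparing sentinel_search with a hypothetical variant, leaky_sentinel_search, that forgets it.  The caller's variable and the parameter L refer to the same list object, so the append is visible to the caller.

```python
def sentinel_search(L, value):
    '''Return the index of value in L, or len(L) if value doesn't
    exist in L.'''
    L.append(value)     # temporary sentinel guarantees the loop stops
    i = 0
    while L[i] != value:
        i += 1
    L.pop()             # remove the sentinel: the caller's list is unchanged
    return i


def leaky_sentinel_search(L, value):
    '''Hypothetical broken variant that forgets to pop the sentinel.'''
    L.append(value)
    i = 0
    while L[i] != value:
        i += 1
    return i


L = [4, 3, 7, 13]
print(sentinel_search(L, 5), L)        # 4 [4, 3, 7, 13]
print(leaky_sentinel_search(L, 5), L)  # 4 [4, 3, 7, 13, 5]
```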

 - Just how much of a difference does this make time-wise?

Often about twice as fast.  That makes sense, since each iteration goes from
	2 comparisons + 1 assignment  =>  1 comparison + 1 assignment.
	Sometimes it is 10 times as fast.
	Sometimes it's slower!
		
 - QU: How could it ever be slower?

We have the extra append and pop operations.  If the value is at the
beginning of the list, we exit the loop right away.

Compare:
    original - we do one extra comparison, i < len(L)
    new one  - we do an append and a pop instead

In that case, the new one will be slower.
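One way to check the comparison counting above without a stopwatch is to count comparisons directly.  A sketch, assuming Python 3 (it uses nonlocal); count_search and count_sentinel_search are hypothetical instrumented copies of the two functions.  They count only comparisons, not the cost of the append and pop.

```python
def count_search(L, value):
    '''search, instrumented: return (index, number of comparisons).'''
    count = 0

    def less(a, b):
        nonlocal count
        count += 1
        return a < b

    def differ(a, b):
        nonlocal count
        count += 1
        return a != b

    i = 0
    while less(i, len(L)) and differ(L[i], value):
        i += 1
    return i, count


def count_sentinel_search(L, value):
    '''sentinel_search, instrumented: return (index, number of comparisons).'''
    count = 0

    def differ(a, b):
        nonlocal count
        count += 1
        return a != b

    L.append(value)
    i = 0
    while differ(L[i], value):
        i += 1
    L.pop()
    return i, count


L = list(range(1000))
print(count_search(L, -1))           # (1000, 2001): worst case, 2 per item
print(count_sentinel_search(L, -1))  # (1000, 1001): about half as many
print(count_search(L, 0))            # (0, 2): value at the front
print(count_sentinel_search(L, 0))   # (0, 1), but it also paid for append and pop
```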
		
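 - Let's compare to the corresponding built-in function in Python:
   list.index

		list.index wins (in CPython it is implemented in C, not
		as a Python-level loop).

Remember how it works - in the shell: L.index(value) returns the index
of the first occurrence of value, and raises ValueError if value isn't
in L (rather than returning len(L)).

SHOW [time_search.py]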
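time_search.py isn't reproduced in these notes.  A hypothetical stand-in using the standard timeit module might look like this; index_search is a made-up wrapper that converts list.index's ValueError into the len(L) convention used above.  Absolute times depend on the machine, so no output is shown.

```python
import timeit


def search(L, value):
    i = 0
    while i < len(L) and L[i] != value:
        i += 1
    return i


def sentinel_search(L, value):
    L.append(value)
    i = 0
    while L[i] != value:
        i += 1
    L.pop()
    return i


def index_search(L, value):
    '''Hypothetical wrapper: list.index raises ValueError when value is
    absent, so translate that into the len(L) convention.'''
    try:
        return L.index(value)
    except ValueError:
        return len(L)


if __name__ == '__main__':
    L = list(range(10000))
    for f in (search, sentinel_search, index_search):
        for value in (0, 5000, -1):        # front, middle, absent
            seconds = timeit.timeit(lambda: f(L, value), number=100)
            print(f.__name__, value, round(seconds, 4))
```

Timing the value at the front (0) separately from the middle and absent cases is what lets you see the "sometimes slower" effect.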

=== Even faster searching ===

 - QU: How do you search in real life?
   Do you always look through every item, in order, from first to last?

		No, we often use a divide-and-conquer approach!
		
 - IF THE LIST IS SORTED

 - [phone book demo]

We'll end the new material with the most interesting algorithm we
will look at:

   binary search

Most of our code has been pretty straightforward and utilitarian.

Here is an example where being clever really reduces the time (and cost).

Next time, we'll go over the exam and talk about the final.

		
 - This is called /binary/ search:

		Divide the remaining range in half each time.

		"bi" means two, as in bicycle.

 - QU: How much faster is this in the worst case?
   Much faster: on a list of n items you perform about log_2(n)
   comparisons, because each time you halve the amount of data you
   are looking at.
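A quick way to see the log_2(n) growth is to count halvings.  In binary_search, a range of s items is split around m into pieces of (s - 1) // 2 and s // 2 items, so in the worst case the range shrinks to s // 2 on each iteration.  The sketch below (worst_case_iterations is a hypothetical helper) counts how many such steps it takes to reach an empty range; the result equals n.bit_length().

```python
def worst_case_iterations(n):
    '''Count how many times a range of n items can be cut down to
    n // 2 items before it is empty: the most loop iterations
    binary_search can need on a list of length n.'''
    count = 0
    while n > 0:
        n //= 2
        count += 1
    return count


for n in (14, 1000, 1000000):
    # Compare with linear search, which may look at all n items.
    print(n, worst_case_iterations(n))
```

For 14 items this gives 4 iterations; for a million items it gives 20, where linear search may need a million comparisons.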

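def binary_search(L, v):
    """Return the index of an occurrence of v in sorted list L, or -1 if
    v is not in L.  If v occurs more than once, the index returned is not
    necessarily the leftmost one."""

    i = 0
    j = len(L) - 1
    # Invariant: every item in L[:i] is < v and every item in L[j + 1:] is > v.
    while i != j + 1:
        m = (i + j) // 2    # integer division, so m is a valid index
        if L[m] == v:
            return m
        elif L[m] < v:
            i = m + 1
        else:
            j = m - 1

    # The loop ends with i == j + 1: every item is either left of i (< v)
    # or right of j (> v), so v is not in L.
    return -1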
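The traces that follow show m, i and j on every pass through the loop, which binary_search itself doesn't print.  A hypothetical instrumented copy, binary_search_traced, records them instead (and uses // so it also runs under Python 3):

```python
def binary_search_traced(L, v):
    '''Hypothetical instrumented copy of binary_search: return
    (index, steps), where steps lists (m, i, j) for every iteration.'''
    steps = []
    i = 0
    j = len(L) - 1
    while i != j + 1:
        m = (i + j) // 2
        steps.append((m, i, j))
        if L[m] == v:
            return m, steps
        elif L[m] < v:
            i = m + 1
        else:
            j = m - 1
    return -1, steps


L = list(range(14))
for v in (3, 10, 13, 0, 111):
    print('the list is', L)
    print('the value is', v)
    index, steps = binary_search_traced(L, v)
    for m, i, j in steps:
        print('m=%d i=%d j=%d' % (m, i, j))
    print(index)
```

Returning the steps, rather than printing inside the search, keeps the search itself free of output and makes the traces easy to check with asserts.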


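Traces of binary_search on a sorted list: each trace shows m, i and j on
every pass through the loop, and its last line is the value returned.

the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]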
the value is 3
m=6 i=0 j=13
m=2 i=0 j=5
m=4 i=3 j=5
m=3 i=3 j=3
3
the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
the value is 10
m=6 i=0 j=13
m=10 i=7 j=13
10
the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
the value is 13
m=6 i=0 j=13
m=10 i=7 j=13
m=12 i=11 j=13
m=13 i=13 j=13
13
the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
the value is 0
m=6 i=0 j=13
m=2 i=0 j=5
m=0 i=0 j=1
0
the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
the value is 111
m=6 i=0 j=13
m=10 i=7 j=13
m=12 i=11 j=13
m=13 i=13 j=13
-1
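binary_search returns as soon as it probes a matching item, so with duplicates it need not return the leftmost one: on [1, 2, 2, 2, 3] with v = 2 its first probe is index 2, and that is what it returns.  If the leftmost occurrence is what you need, one option is the standard library's bisect.bisect_left, which returns the first index whose item is >= v.  leftmost_index is a hypothetical wrapper name.

```python
import bisect


def leftmost_index(L, v):
    '''Hypothetical helper: return the index of the leftmost occurrence
    of v in sorted list L, or -1 if v is not in L.'''
    i = bisect.bisect_left(L, v)    # first index with L[i] >= v
    if i < len(L) and L[i] == v:
        return i
    return -1


print(leftmost_index([1, 2, 2, 2, 3], 2))   # 1
print(leftmost_index([1, 2, 2, 2, 3], 4))   # -1
print(leftmost_index([], 4))                # -1
```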

== Sorting ==

[Props used in this lecture: baby's stacking cups, with post-it notes showing
appropriate numbers on the front in large, dark letters -- the cups have
similar enough sizes that the numbers really help.  Plus a plastic fence to
show the index we're currently at.]

 - We've seen that searching can be significantly faster if the list
   is sorted.

 - We may also want a sorted list for other reasons, e.g., the user
   may want to see things in sorted order.

 - So how do we sort a list?

=== Insertion sort ===

 - here is one way

 - Idea: take the next item and put it in the right spot among the 
   already-sorted items.

 - at all times, everything to the left of the fence is sorted.

[67, 3, 56, 54, 3, 8, 2, 33]
 ^ 
i= 0 L[i] = 67
it's in the right place, considering just one item

[67, 3, 56, 54, 3, 8, 2, 33]
    ^ everything to the left of the arrow is sorted [say this over
      and over]

now, put the 3 where it goes, to the left

[3, 67, 56, 54, 3, 8, 2, 33]
        ^ put the 56 where it goes to the left of 67

[3, 56, 67, 54, 3, 8, 2, 33]
            ^

[3, 54, 56, 67, 3, 8, 2, 33]
                ^

[3, 3, 54, 56, 67, 8, 2, 33]
                   ^

[3, 3, 8, 54, 56, 67, 2, 33]
                      ^

[2, 3, 3, 8, 54, 56, 67, 33]
                         ^

[2, 3, 3, 8, 33, 54, 56, 67]
                             ^ when arrow gets here, stop

At every point, everything to the left of the ^ is sorted.

 - This is an "invariant" (it never stops being true).

 - an invariant has two roles:
	a) it is something we can count on,  
	   and it makes the next iteration of the loop make sense.
	b) it is something we must ensure is true, 
	   by doing something sensible in each iteration.
	
 - How do we count on it here?

It is the reason why we can stop shifting items over as soon as we hit
one that is less than or equal to the one we're trying to insert:
everything to the left of THAT item is less than or equal to it, too.

 - It is also why we know we are done when the arrow points
   past the last item: then the whole list is to its left.
	
 - And does the iteration ensure it will still be true after?
Yes.  It takes a sorted list and adds a new item in the right spot.
	
 - Our reasoning is a bit loose here, but you get the idea.
 For true rigour, we would do a proof!
 (And it would be worth the trouble for a tricky and new algorithm.)
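Short of a proof, the invariant can at least be checked while testing.  A sketch (insertion_sort_checked is a hypothetical name; it inlines the insert step) that asserts "L[:i] is sorted" at the top of every iteration.  Passing on a few lists doesn't prove the algorithm correct, but a bug that broke the invariant would make an assert fail.

```python
def insertion_sort_checked(L):
    '''Insertion sort that asserts the invariant "L[:i] is sorted"
    at the top of every iteration, and once more at the end.'''
    i = 0
    while i != len(L):
        # Invariant: everything to the left of i is sorted.
        assert all(L[k] <= L[k + 1] for k in range(i - 1)), L
        v = L[i]
        j = i
        while j > 0 and L[j - 1] > v:
            L[j] = L[j - 1]          # shift bigger items one place right
            j -= 1
        L[j] = v                     # put v into the gap
        i += 1
    # Here i == len(L), so the invariant says the whole list is sorted.
    assert all(L[k] <= L[k + 1] for k in range(len(L) - 1)), L
    return L


print(insertion_sort_checked([67, 3, 56, 54, 3, 8, 2, 33]))
```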

OK: so to code this, we need to:
   find where the current item goes in the list to the left, and
   move the items over to make room for it.

Here's what we will do:  save the number we are inserting in v.

Its spot is now free to be overwritten, so we can shift the bigger
items to its left one place to the right, starting with the vacated spot.

[67, 3, 56, 54, 3, 8, 2, 33]
    ^ 
v = 3

move the 67 over one

[67, 67, 56, 54, 3, 8, 2, 33]

then, put the 3 in the vacated spot

[3, 67, 56, 54, 3, 8, 2, 33]
==========================================
[3, 67, 56, 54, 3, 8, 2, 33]
        ^ 
v = 56
[3, 67, 67, 54, 3, 8, 2, 33]

[3, 56, 67, 54, 3, 8, 2, 33]

=====================================
[3, 56, 67, 54, 3, 8, 2, 33]
            ^
v = 54
[3, 56, 67, 67, 3, 8, 2, 33]

[3, 56, 56, 67, 3, 8, 2, 33]

[3, 54, 56, 67, 3, 8, 2, 33]

How do you know you've found the right spot?
When the item to the left is less than or equal to v, or there is no
item to the left (you've reached the front of the list).

[3, 54, 56, 67, 3, 8, 2, 33]
                ^
v = 3

[3, 54, 56, 67, 67, 8, 2, 33]

[3, 54, 56, 56, 67, 8, 2, 33]

[3, 54, 54, 56, 67, 8, 2, 33]

[3, 3, 54, 56, 67, 8, 2, 33]

and so on.

[3, 3, 54, 56, 67, 8, 2, 33]
                   ^

[3, 3, 8, 54, 56, 67, 2, 33]
                      ^

[2, 3, 3, 8, 54, 56, 67, 33]
                         ^

[2, 3, 3, 8, 33, 54, 56, 67]
                             ^ when arrow gets here, stop
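The hand traces above (save v, shift, then drop v into the gap) can be reproduced mechanically.  A sketch with a hypothetical insert_steps, which does the same work as the insert function below but records a copy of the list after each shift and after the final placement:

```python
def insert_steps(L, i):
    '''Move L[i] to where it belongs in L[:i], as insert does, and
    return a copy of the list after every step.'''
    snapshots = []
    v = L[i]                         # save the item being inserted
    while i > 0 and L[i - 1] > v:
        L[i] = L[i - 1]              # shift the bigger item one place right
        i -= 1
        snapshots.append(L[:])
    L[i] = v                         # drop v into the vacated spot
    snapshots.append(L[:])
    return snapshots


for snapshot in insert_steps([3, 54, 56, 67, 3, 8, 2, 33], 4):
    print(snapshot)
```

For the v = 3 step this prints the same four lists as the hand trace, ending with [3, 3, 54, 56, 67, 8, 2, 33].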

def insert(L, i):
    '''Move L[i] to where it belongs in L[:i], that is,
       the portion of the list to the left of i.'''
    v = L[i]
    while i > 0 and L[i - 1] > v:
        L[i] = L[i - 1]
        i -= 1
    L[i] = v

def insertion_sort(L):
    '''Sort the items in L in non-descending order.'''
    i = 0
    # L[:i] is sorted.
    while i != len(L):
        insert(L, i)
        i += 1
    return L
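A sketch of how insertion_sort might be tested: the two functions are repeated so the block runs on its own, the worked example is checked directly, and many small random lists are compared against Python's built-in sorted.

```python
import random


def insert(L, i):
    '''Move L[i] to where it belongs in L[:i], that is,
       the portion of the list to the left of i.'''
    v = L[i]
    while i > 0 and L[i - 1] > v:
        L[i] = L[i - 1]
        i -= 1
    L[i] = v


def insertion_sort(L):
    '''Sort the items in L in non-descending order.'''
    i = 0
    # L[:i] is sorted.
    while i != len(L):
        insert(L, i)
        i += 1
    return L


example = [67, 3, 56, 54, 3, 8, 2, 33]
assert insertion_sort(example[:]) == [2, 3, 3, 8, 33, 54, 56, 67]
assert insertion_sort([]) == []
assert insertion_sort([5]) == [5]

# Compare against the built-in sorted on many small random lists.
for n in range(30):
    data = [random.randint(0, 9) for _ in range(n)]
    assert insertion_sort(data[:]) == sorted(data)
print('all insertion_sort tests passed')
```

Since insertion_sort rearranges the list in place, the tests pass copies (example[:], data[:]) so the originals stay available for comparison.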