== Searching ==

Searching (of various kinds) is exceedingly common, so we care about making it fast.
We've written and tuned a string search function. Let's examine general search:

    Return the index of an occurrence of a given value in a list.
    Return some special value if the value does not occur.

searching.py

    def search(L, value):
        '''Return the index of value in L, or len(L) if value doesn't exist
        in L.'''

        i = 0
        while i < len(L) and L[i] != value:
            i += 1
        return i

- example:
    L: [4,3,7,13]    value: 7    len(L) is 4
    i
    0

- QU: Why is the first comparison necessary?
    Otherwise we would fall off the end of the list if value didn't occur in L.

- QU: How does the return value get set to len(L) if value is not found?
    i is incremented all the way to len(L), which also stops the loop.

    trace for value 5
    couple of other special cases - testing!
        trace for value 4
        trace for value 13
        trace for L [] - does it work?  Yup! length is 0!

- QU: Why not use a for-loop with a break?
    Bad style. Seems easier to write, but harder to reason about (and get right).

- QU: How will the run-time of search grow?
    Linearly. That means: if the size of the list doubles, the run-time doubles.
    In the worst case, you need to read through the entire list.

- Note: you can't get better than linear time for search in an arbitrary
  (unsorted) list, in the worst case.
    You need to look at each list element, potentially. No way around that.

- But can we tune this one?

- Here's a trick that tunes it:
  We can eliminate that first comparison in the while condition by adding a /sentinel/:

    def sentinel_search(L, value):
        '''Return the index of value in L, or len(L) if value doesn't exist
        in L.'''

        L.append(value)
        i = 0
        while L[i] != value:
            i += 1
        L.pop()
        return i

    In the shell - look at L.pop()

    We'll put the value at the end of the list, just temporarily.
    Then we don't need to check whether we are done with the list:
    the while loop will stop whenever it sees value.
    If value isn't in the original list, i will be incremented to the original
    length of the list, and we will return the right value!
    L: [4,3,7,13]    value: 7    len(L) is 4
    i
    0

    trace for value 5
    couple of other special cases - testing!
        trace for value 4
        trace for value 13
        trace for L [] - does it work?  Yup! length is 0!

    IMPORTANT: why do we need to do the pop?
        Because lists are mutable! Wouldn't you be surprised if you called
        a search function and it changed your list?

- Just how much of a difference does this make time-wise?
    Often about twice as fast. Makes sense, since each iteration goes from
    2 compares + 1 assignment to 1 compare + 1 assignment.
    Sometimes 10 times. Sometimes it's slower!

- QU: How could it ever be slower?
    We have the extra append and pop operations.
    If the value is at the beginning of the list, we exit the loop right away.
    compare:
        original - one extra comparison: i < len(L)
        new one  - an append and a pop
    In that case, the new one will be slower.

- Let's compare to the corresponding built-in function in Python: list.index
    list.index wins
    remember how it works - in the shell

SHOW [time_search.py]

Even faster searching

- QU: how do you search in real life? Do you always look through
  every item, in order, from first to last?
    No, we often use a divide and conquer approach!
    - IF THE LIST IS SORTED -
    [phone book demo]

    We'll end the new material with the most interesting algorithm we will
    look at: binary search.
    Most of our code has been pretty straightforward/utilitarian. Here is an
    example where being clever really reduces time/cost.

    next time - we'll go over the exam and talk about the final

- This is called /binary/ search:
    Divide in half each time.
    "bi" means two, as in bicycle.

- This is much faster. Worst case: you perform about log_2(n) comparisons on
  a list of n items, because each step halves the amount of data you are
  still looking at.
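To get a feel for how small log_2(n) is, here is a rough sketch that just counts how many times a range of n items can be cut in half before nothing is left. The name halvings is made up for this illustration (it is not part of searching.py), and the count assumes the worst case, where the value is never found early.

```python
def halvings(n):
    '''Return the number of times a range of n items can be halved
    (rounding down) before it becomes empty.'''

    steps = 0
    while n > 0:
        n //= 2      # keep only (at most) half of what is left
        steps += 1
    return steps


# A 14-item list needs at most 4 looks; a million items need about 20.
for size in [14, 1000, 1000000]:
    print(size, halvings(size))
```

Compare that with linear search, where a million items can mean a million looks.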
    def binary_search(L, v):
        """Return the index of an occurrence of v in sorted list L (not
        necessarily the leftmost one, if v occurs more than once), or -1
        if v is not in L."""

        i = 0
        j = len(L) - 1

        # Everything in L[:i] is < v and everything in L[j+1:] is > v.
        while i != j + 1:
            m = (i + j) // 2     # integer division: m must be an index
            if L[m] == v:
                return m
            elif L[m] < v:
                i = m + 1
            else:
                j = m - 1

        # Defensive check: for a sorted L, the loop has already returned
        # if v is present, so this falls through to -1.
        if 0 <= i < len(L) and L[i] == v:
            return i
        else:
            return -1

Traces (m, i and j at each iteration, then the value returned):

    the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    the value is 3
    m=6 i=0 j=13
    m=2 i=0 j=5
    m=4 i=3 j=5
    m=3 i=3 j=3
    3

    the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    the value is 10
    m=6 i=0 j=13
    m=10 i=7 j=13
    10

    the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    the value is 13
    m=6 i=0 j=13
    m=10 i=7 j=13
    m=12 i=11 j=13
    m=13 i=13 j=13
    13

    the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    the value is 0
    m=6 i=0 j=13
    m=2 i=0 j=5
    m=0 i=0 j=1
    0

    the list is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
    the value is 111
    m=6 i=0 j=13
    m=10 i=7 j=13
    m=12 i=11 j=13
    m=13 i=13 j=13
    -1

== Sorting ==

[Props used in this lecture: baby's stacking cups, with post-it notes
showing appropriate numbers on the front in large, dark letters -- the
cups have similar enough sizes that the numbers really help. Plus a
plastic fence to show the index we're currently at.]

- We've seen that searching can be significantly faster if the list is sorted.

- We may also want a sorted list for other reasons, e.g.,
    - the user may want to see things in sorted order

- So how do we sort a list?

=== Insertion sort ===

- here is one way

- Idea: take the next item and put it in the right spot among the
  already-sorted items.

- at all times, everything to the left of the fence is sorted.
    [67, 3, 56, 54, 3, 8, 2, 33]
     ^
    i = 0, L[i] = 67
    it's in the right place, considering just one item

    [67, 3, 56, 54, 3, 8, 2, 33]
         ^
    everything to the left of the arrow is sorted  (say this over and over)
    now, put the 3 where it goes, to the left

    [3, 67, 56, 54, 3, 8, 2, 33]
            ^
    put the 56 where it goes, to the left of 67

    [3, 56, 67, 54, 3, 8, 2, 33]
                ^
    [3, 54, 56, 67, 3, 8, 2, 33]
                    ^
    [3, 3, 54, 56, 67, 8, 2, 33]
                       ^
    [3, 3, 8, 54, 56, 67, 2, 33]
                          ^
    [2, 3, 3, 8, 54, 56, 67, 33]
                             ^
    [2, 3, 3, 8, 33, 54, 56, 67]
                                ^
    when the arrow gets here, stop

    At every point, everything to the left of the ^ is sorted.

- this is an "invariant" (it never varies from being true).

- an invariant has two roles:
    a) it is something we can count on, and it makes the next iteration
       of the loop make sense.
    b) it is something we must ensure is true, by doing something sensible
       in each iteration.

- How do we count on it here?
    It is the reason why we can stop shifting items over as soon as we hit
    one that is smaller than or equal to the one we're trying to insert.
    Everything to the left of THAT is smaller than or equal to it.

- It is also why we know we are done when the arrow points past the last item.

- And does the iteration ensure it will still be true after?
    Yes. It takes a sorted list and adds a new item in the right spot.

- Our reasoning is a bit loose here, but you get the idea.
  For true rigour, we would do a proof!
  (And it would be worth the trouble for a tricky and new algorithm.)

OK: so to code this, we need to:
    find where the current item goes in the list to the left
    move the items over to make room for it.

Here's what we will do:
    save the number we are inserting.
    use the vacated spot to move things up.
    [67, 3, 56, 54, 3, 8, 2, 33]
         ^  v = 3
    move the 67 over one
    [67, 67, 56, 54, 3, 8, 2, 33]
    then, put the 3 in the vacated spot
    [3, 67, 56, 54, 3, 8, 2, 33]
    ==========================================
    [3, 67, 56, 54, 3, 8, 2, 33]
            ^  v = 56
    [3, 67, 67, 54, 3, 8, 2, 33]
    [3, 56, 67, 54, 3, 8, 2, 33]
    ==========================================
    [3, 56, 67, 54, 3, 8, 2, 33]
                ^  v = 54
    [3, 56, 67, 67, 3, 8, 2, 33]
    [3, 56, 56, 67, 3, 8, 2, 33]
    [3, 54, 56, 67, 3, 8, 2, 33]

    you know you found the right spot, when?
    the item to the left is no larger than you (or there is no item to the left)!
    ==========================================
    [3, 54, 56, 67, 3, 8, 2, 33]
                    ^  v = 3
    [3, 54, 56, 67, 67, 8, 2, 33]
    [3, 54, 56, 56, 67, 8, 2, 33]
    [3, 54, 54, 56, 67, 8, 2, 33]
    [3, 3, 54, 56, 67, 8, 2, 33]

    and so on.

    [3, 3, 54, 56, 67, 8, 2, 33]
                       ^
    [3, 3, 8, 54, 56, 67, 2, 33]
                          ^
    [2, 3, 3, 8, 54, 56, 67, 33]
                             ^
    [2, 3, 3, 8, 33, 54, 56, 67]
                                ^
    when the arrow gets here, stop

    def insert(L, i):
        '''Move L[i] to where it belongs in L[:i], that is, the portion of
        the list to the left of i.'''

        v = L[i]                  # save the item we are inserting
        while i > 0 and L[i - 1] > v:
            L[i] = L[i - 1]       # shift the larger item into the vacated spot
            i -= 1
        L[i] = v

    def insertion_sort(L):
        '''Sort the items in L in non-descending order.'''

        i = 0

        # L[:i] is sorted.
        while i != len(L):
            insert(L, i)
            i += 1

        return L
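Short of a proof, we can at least have Python check the invariant for us while testing. The sketch below is one possible way to do that, not a replacement for the code above: is_sorted and insertion_sort_checked are names invented for this illustration, the shifting loop is the same logic as insert, and the assert statements make each check cost extra time, so this is only meant for testing.

```python
def is_sorted(L):
    '''Return True if the items in L are in non-descending order.'''

    return all(L[k] <= L[k + 1] for k in range(len(L) - 1))


def insertion_sort_checked(L):
    '''Sort the items in L in non-descending order, asserting the
    invariant "L[:i] is sorted" before each insertion and at the end.'''

    i = 0
    while i != len(L):
        assert is_sorted(L[:i])    # role (a): what we count on

        # Same shifting logic as insert(L, i).
        v = L[i]
        k = i
        while k > 0 and L[k - 1] > v:
            L[k] = L[k - 1]
            k -= 1
        L[k] = v

        i += 1

    assert is_sorted(L)            # i == len(L), so L[:i] is all of L
    return L


example = [67, 3, 56, 54, 3, 8, 2, 33]
result = insertion_sort_checked(example[:])   # sort a copy
print(result)
```

If an iteration ever broke the invariant, the assert at the top of the next iteration would fail right away, which points at the bug much more directly than a wrong final answer would.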