PROFESSOR: NOW, BASICALLY WHAT WE DID LAST TIME WAS TO GO THROUGH ONE OF THE TWO VERY USEFUL DYNAMIC STRUCTURES, NAMELY THE B TREE. AS I MENTIONED, THE IMPORTANT THING ABOUT THESE DYNAMIC DATA STRUCTURES IS THAT THEY ARE ADAPTIVE. WE CAN ADJUST THEM TO THE NUMBER OF RECORDS IN THE FILE STRUCTURE. SO WE ARE NOT CONDEMNED TO A FIXED STRUCTURE WHICH WE CANNOT CHANGE, WHERE WE THEN HAVE TO ADD MORE RECORDS INTO THE OVERFLOW AREA. AS WE EXPLAINED, THAT IS CERTAINLY A VIABLE ALTERNATIVE BUT NOT A VERY GOOD ONE, BECAUSE YOUR STRUCTURE WILL DEGENERATE IN TIME. WHEREAS A DYNAMIC STRUCTURE, BECAUSE OF ITS NATURE, WILL TRY TO MAINTAIN ITSELF. SO IT'S SELF-MAINTAINING. THE ALGORITHMS TO DO THE UPDATES, NAMELY INSERTION AND DELETION, WILL BE A LITTLE MORE COMPLEX, BUT LUCKILY NOT THAT MUCH SO. SO THE B TREE IS A WAY OF GROWING THE TREE BASICALLY BY SPLITTING THE NODES. NODE SPLITTING. THAT'S ONE TECHNIQUE. THAT'S VERY USEFUL. TODAY WE'RE GOING TO TALK ABOUT THE OTHER ONE, DYNAMIC HASHING. LIKE I SAID, WHAT I WOULD LIKE YOU TO BE ABLE TO DO IS THIS: YOU ARE GIVEN A PROBLEM LIKE THE ONE I GIVE YOU AS AN ASSIGNMENT, WHICH IS NORMALLY A SEQUENCE OF RECORDS. WELL, OBVIOUSLY I WON'T GIVE YOU THE WHOLE RECORD. I WILL JUST GIVE YOU THE KEY OF THE RECORD. SO GIVEN THIS KEY SEQUENCE, SHOW ME BY DRAWING SNAPSHOTS HOW THIS STRUCTURE EVOLVES IN TIME. NORMALLY I WON'T ASK YOU TO DRAW EVERY SINGLE ONE OF THEM, SO IF I GIVE YOU A SEQUENCE OF EIGHT RECORDS, NORMALLY I ASK YOU TO DRAW THE FOURTH ONE, OR THE EIGHTH ONE, SOMETHING LIKE THAT. IF YOU HAVE ALREADY LOOKED AT THE EXERCISE, YOU KNOW WHAT I MEAN. OKAY. SO, THE HASHING TECHNIQUE, THAT'S WELL KNOWN. WE ALL KNOW HOW TO DO THAT, NAMELY WE TAKE THE KEY, WE TRY TO SCRAMBLE IT AND RANDOMIZE IT SO WE CAN COME UP WITH, HOPEFULLY, A UNIQUE ADDRESS. OF COURSE, OFTENTIMES WE CANNOT MANAGE TO HAVE A UNIQUE ADDRESS. IF YOU CAN DO SO, THEN IT BECOMES A ONE-TO-ONE MAPPING, AND THAT'S TOO GOOD TO EXPECT. WE CANNOT ALWAYS HOPE FOR THAT.
AT LEAST WE CAN STRIVE FOR THAT BY TRYING TO USE BETTER RANDOMIZING ALGORITHMS. SO SOME OF YOU CAME TO ASK ME AFTER CLASS LAST TIME WHY WE DO THE FOLDING, STUFF LIKE THAT, AND I KNOW SOME OF YOU WERE NOT PRESENT. BY FOLDING I MEAN WE TAKE A LONG STRING. I THINK I ASKED YOU TO CONTROL THE LIGHT SO YOU CAN SWITCH IT ON WHEN I USE THE BLACKBOARD. THANKS. SAY I HAVE A STRING LIKE THIS. THE FOLDING TECHNIQUE IS THAT I MOVE THE STRING SO THAT I CAN ADD UP THE CORRESPONDING BYTES, BUT I DO IT IN SUCH A WAY THAT THE PREFIX OF THE STRING WILL NOT ALWAYS STAY AS THE PREFIX AND THE SUFFIX WILL NOT ALWAYS STAY AS THE SUFFIX. LET'S SAY I'M DEALING WITH NAMES. I COULD HAVE A LOT OF ADAMS. RIGHT? NOW IF I DON'T RANDOMIZE, WHAT YOU WILL SEE IS THAT ALL THESE ADAMS HIT THE SAME BUCKET. THAT'S NOT GOOD. IF WE DO THIS KIND OF FOLDING AND THEN ADD THEM UP AND USE SOME PRIME NUMBER, WE HAVE A BETTER CHANCE OF RANDOMIZING. OKAY. NOW, THIS IS WHAT YOU SHOULD HAVE ALREADY LEARNED IN OTHER CLASSES. SO WHAT WE CAN ASSUME HERE IS THAT YOU ARE GIVEN THIS HASHING FUNCTION, SO YOU CAN MAP FROM THE KEY TO THE ADDRESS. OKAY? THEN NORMALLY WHAT WE DO IS THAT WE TAKE THE KEY, RANDOMIZE IT, AND GO TO THE BUCKET. IF THE BUCKET IS FULL WE PUT THE RECORD INTO THE OVERFLOW AREA. THERE ARE A NUMBER OF TECHNIQUES: WE CAN PUT THE RECORDS INTO THE OVERFLOW AREA, OR WE CAN STILL SAVE AN OVERFLOW RECORD IN THE PRIMARY AREA BY USING SOME OVERFLOW HANDLING TECHNIQUES. OKAY. NOW, ALL OF THESE ARE RATHER MESSY. IF YOU THINK ABOUT IT, YOU KNOW WHAT I MEAN, BECAUSE PRETTY SOON YOU GET THE WHOLE THING KIND OF MIXED TOGETHER. SO THE EXTENDIBLE HASHING TECHNIQUE TRIES TO DO AWAY WITH THAT BY BORROWING THE TECHNIQUE FROM THE B TREE. THE IDEA IS THE FOLLOWING: WE KNOW THAT WE ALWAYS NEED MORE SPACE AS WE GROW THE FILE. IN THE B TREE WE DO THAT BY ALLOCATING MORE NODES AND THEN WE SPLIT THE NODES. IN EXTENDIBLE HASHING WE DO SOMETHING QUITE SIMILAR.
NAMELY, WE START WITH A SMALL HASH TABLE AND WE ONLY ENLARGE THE TABLE WHEN IT BECOMES NECESSARY. NOW, AS A RULE OF THUMB, NORMALLY WHAT WE TRY TO DO IS THIS: IF YOU HAVE N RECORDS, YOUR FILE SIZE SHOULD BE 2N. IT SHOULD BE DOUBLE THE SIZE. LET'S SAY YOU HAVE A HUNDRED RECORDS, YOU WANT TO HAVE A HASH TABLE OF TWO HUNDRED ENTRIES. WHY? BECAUSE THIS HELPS ENSURE THAT YOU DON'T HAVE TOO MANY COLLISIONS. BUT BY THE SAME TOKEN YOU CAN SEE YOU WASTE HALF THE SPACE. YOU MAY WANT TO START SMALL AND GRADUALLY GROW THE TABLE. THE QUESTION IS HOW WE GROW THIS TABLE. THE IDEA IS THE FOLLOWING: LET'S TAKE OUR HASH KEY, WHICH IS USUALLY A NUMBER, 13, 12, 14, STUFF LIKE THAT. LET'S CONVERT THAT INTO A BINARY NUMBER. WE CAN ALL DO THAT, RIGHT? BECAUSE IF WE HAVE A BINARY NUMBER WE CAN TRUNCATE THE NUMBER ANYWHERE WE WANT. WE CAN LOOK AT THE FIRST BIT AND THE SECOND BIT. IF I HAVE A LONG BINARY STRING AND I TRUNCATE IT AND TAKE ONLY THE FIRST BIT, THEN I HAVE TWO ALTERNATIVES, ZERO AND ONE. AND USING THIS ONE BIT I CAN ADDRESS A TABLE OF SIZE TWO. IF I TRUNCATE MY BINARY NUMBER TO TWO BITS, THEN I CAN HAVE A TABLE OF FOUR. THREE BITS, EIGHT. YOU GET THE IDEA. OKAY? IN OTHER WORDS, IF I USE BINARY NUMBERS AS ADDRESSES INTO MY HASH TABLE, THEN BY TAKING FEWER OR MORE BITS, I CAN HAVE A SMALLER TABLE OR A LARGER TABLE. OKAY? SO THAT'S THE BASIC IDEA. AND THEN THE RECORDS WILL ACTUALLY BE STORED IN THE PAGES, OR BLOCKS, POINTED AT BY THE HASH TABLE. SO LET ME SAY THAT ONCE MORE. BASICALLY, WE TAKE THE HASH TABLE TO BEGIN SMALL. WE CAN START WITH, SAY, A TABLE OF JUST TWO ENTRIES, AND YOU NEED ONLY ONE BIT TO TELL THE TWO ENTRIES APART. AND THEN YOU KEEP ON GROWING YOUR TABLE. THE WAY YOU GROW YOUR TABLE IS TO INCREASE THE LENGTH OF YOUR BINARY HASH KEY. SO EVERY TIME YOU INCREASE IT BY ONE, YOUR HASH TABLE DOUBLES IN SIZE. YOU INCREASE BY ONE, THE SIZE DOUBLES. THAT'S HOW WE DO IT. OKAY?
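The truncation idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash_bits` is a hypothetical stand-in for whatever hash function you are given (here it simply writes the key as a fixed-width binary string), and the 4-bit width is chosen only to keep the strings short.

```python
def hash_bits(key: int, width: int = 4) -> str:
    """Hypothetical hash key: the key's low `width` bits as a binary string."""
    return format(key % (1 << width), f"0{width}b")


def table_entry(bits: str, depth: int) -> int:
    """Keep only the first `depth` bits and read them as a table address."""
    return int(bits[:depth], 2)


bits = hash_bits(13)  # 13 written in binary with 4 bits
for depth in (1, 2, 3):
    # One more bit of the hash key doubles the number of addressable entries.
    print(f"depth={depth} table_size={2 ** depth} "
          f"prefix={bits[:depth]} entry={table_entry(bits, depth)}")
```

With one bit the key lands in a table of two entries, with two bits in a table of four, and with three bits in a table of eight, which is the doubling described above.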
SO NOW WE'RE GOING TO GIVE YOU AN EXAMPLE AND WE'LL COME BACK TO REVISIT THIS PAGE TO EXPLAIN IT. SO, JUST A COUPLE OF DEFINITIONS, AND YOU WILL SEE THE MEANING MORE CLEARLY LATER ON. IF I HAVE A TABLE OF EIGHT, WE CAN ADDRESS EVERY ENTRY IN THE TABLE USING THREE BITS. THEN WE CALL THAT THE DEPTH OF THIS TABLE, SO THE DEPTH IS THREE. SO THIS IS THE NUMBER, THE D. THIS NUMBER 3 MEANS THAT THE CODE LENGTH OF THIS BINARY STRING IS THREE. THERE IS ANOTHER NUMBER WHICH FOR THE PRESENT TIME DOES NOT SEEM TO BE TOO MEANINGFUL, BUT LET ME EXPLAIN IT ANYWAY, WHICH IS CALLED THE LOCAL DEPTH. THIS IS THE NUMBER ASSOCIATED WITH EACH PAGE. OKAY? THE GLOBAL DEPTH REFLECTS THE NUMBER OF BITS WE NEED TO ADDRESS EVERY ENTRY IN THE HASH TABLE. AND THEN THERE IS THE LOCAL DEPTH, AND ITS MEANING AND USEFULNESS WILL BECOME CLEAR LATER ON. BUT JUST BY LOOKING AT THIS EXAMPLE YOU CAN SEE THAT THE LOCAL DEPTH IS ALWAYS LESS THAN OR EQUAL TO THE GLOBAL DEPTH. IF I HAVE A GLOBAL DEPTH OF THREE, THE NUMBER ASSOCIATED WITH THE PAGES CAN BE NO GREATER THAN THREE. IT COULD BE THREE IN SOME CASES BUT IT CANNOT BE BIGGER THAN THREE. WHY THAT IS, WE'LL SEE LATER ON. THAT'S THE PRIMARY STRUCTURE. NOTICE THAT IN THIS KIND OF HASHING TECHNIQUE WE HAVE NO OVERFLOW AREA. THERE IS NO NEED TO HAVE AN OVERFLOW AREA. WHY? BECAUSE WE ALREADY ACCOMMODATE THAT BY THIS KIND OF SPLITTING TECHNIQUE. OKAY. NOW, I'LL TRY TO EXPLAIN THIS JUST USING THIS ONE SLIDE, AND LATER ON YOU CAN LOOK AT MORE COMPLICATED EXAMPLES. WHEN WE RECEIVE RECORDS, FOR EVERY RECORD WE TAKE THE KEY AND THEN COMPUTE ITS EQUIVALENT HASH KEY, WHICH IS THE BINARY STRING. AND GIVEN THAT BINARY STRING, WE CAN THEN FIND THE CORRESPONDING ENTRY IN THE HASH TABLE. SO, LIKE IN THIS CASE, IF THE KEY IS A, ITS HASH STRING, YOU HAVE IT IN YOUR NOTES, IS SOMETHING LIKE 0010. AND IF I TAKE THE FIRST TWO BITS, 00, IT GOES TO THE FIRST ENTRY. AND THE OTHERS GO THE SAME WAY.
IN OTHER WORDS, YOU TAKE THE KEY, CONVERT IT INTO THE HASH ADDRESS, AND THEN YOU CAN STORE IT IN THE CORRESPONDING PAGE. THIS IS A SITUATION WHERE I STORE THESE TWO RECORDS HERE, THESE TWO RECORDS HERE, AND THIS RECORD HERE. SO THE LOCAL DEPTH IS ACTUALLY THE NUMBER OF BITS I NEED TO TELL THESE TWO RECORDS APART. SO CHECK YOUR NOTES: ONE OF THEM MUST CORRESPOND TO 00-SOMETHING AND THE OTHER TO 001, SO THE FIRST BIT AND THE SECOND BIT ARE IDENTICAL AND YOU CAN TELL THEM APART BY THE THIRD BIT. SO THAT'S WHY. AND E AND F HIT THE SECOND PAGE, AND THEN THESE TWO ENTRIES ARE NOT YET USED. NOW SUPPOSE I WANT TO ADD A NEW RECORD INTO THE HASH FILE, AND THE NEW RECORD, I THINK, HAS KEY F. IF I TRUNCATE IT TO THE FIRST TWO BITS IT'S 01, SO IT SHOULD HIT THIS BUCKET AND I SHOULD PUT IT IN THE CORRESPONDING PAGE, BUT THERE IS NO ROOM. AS YOU RECALL, IF IT IS A B TREE, WHAT I DO IS SPLIT THAT NODE INTO TWO. HERE IN EXTENDIBLE HASHING WHAT YOU DO IS NOT ONLY SPLIT THE PAGE BUT ACTUALLY DOUBLE THE SIZE OF THIS TABLE. SO WE DOUBLE THE SIZE: AT FIRST WE HAVE 00, 01, 10, 11, THEN I INCREASE BY ONE BIT AND HAVE 000, 001, 010, ET CETERA. SO WE CAN ACCOMMODATE THE ADDITION OF A BLOCK. NOW NOTICE THAT THE LOCAL DEPTH OF ALL THE OTHER BLOCKS REMAINS THE SAME, EXCEPT FOR THE ONE THAT'S SPLIT, BECAUSE THIS TIME I NEED ONE MORE BIT TO TELL THE RECORDS APART. SO IT BECOMES THREE. AND THE GLOBAL DEPTH IS ALSO INCREASED BECAUSE THE TABLE HAS DOUBLED IN SIZE. SO I STILL MAINTAIN THE CORRECT RELATIONSHIP, NAMELY, LIKE I SAID, THE GLOBAL DEPTH MUST BE GREATER THAN OR EQUAL TO THE LOCAL DEPTH. OKAY. SO, TO RECAP, THE ALGORITHM IS FAIRLY SIMPLE. WE MAKE AN INSERTION. WHENEVER WE HAVE A COLLISION ON A FULL PAGE, WE NEED MORE ROOM, AND THE WAY WE GET IT IS TO SPLIT THIS PAGE. WHEN WE SPLIT THIS PAGE, WE INCREASE ITS LOCAL DEPTH AND WE COMPARE THE LOCAL DEPTH TO THE GLOBAL DEPTH. IF AFTER THE INCREMENT IT BECOMES LARGER THAN THE GLOBAL DEPTH, WE HAVE TO INCREASE THE SIZE OF THIS TABLE.
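The recap above can be sketched as a small Python class. This is a minimal sketch, not the method from the notes: the class and method names, the page capacity of two records, the 4-bit hash key, and the stand-in hash function (the key itself written in binary) are all assumptions made for illustration. It also uses the common textbook convention that a page's local depth is the number of leading bits shared by every key stored in that page.

```python
class Page:
    """A data page (bucket) carrying its own local depth (small d)."""

    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth
        self.capacity = capacity
        self.keys = []


class ExtendibleHashFile:
    def __init__(self, capacity=2, width=4):
        self.capacity = capacity   # records per page (assumed)
        self.width = width         # length of the binary hash key (assumed)
        self.global_depth = 1      # big D: start with a table of two entries
        self.directory = [Page(1, capacity), Page(1, capacity)]

    def _bits(self, key):
        # Hypothetical hash function: the key itself as a fixed-width bit string.
        return format(key % (1 << self.width), f"0{self.width}b")

    def _entry(self, key):
        # Use only the first `global_depth` bits of the hash key.
        return int(self._bits(key)[: self.global_depth], 2)

    def insert(self, key):
        page = self.directory[self._entry(key)]
        if len(page.keys) < page.capacity:
            page.keys.append(key)
            return
        if page.local_depth == self.width:
            raise OverflowError("no bits left to split on")
        # The page is full: split it and increment its local depth.
        page.local_depth += 1
        if page.local_depth > self.global_depth:
            # Small d exceeded big D: double the table, duplicating each pointer.
            self.directory = [p for p in self.directory for _ in range(2)]
            self.global_depth += 1
        sibling = Page(page.local_depth, self.capacity)
        shift = self.global_depth - page.local_depth
        for i, p in enumerate(self.directory):
            # Entries whose newly significant bit is 1 now point at the new page.
            if p is page and (i >> shift) & 1:
                self.directory[i] = sibling
        pending, page.keys = page.keys, []
        for k in pending:
            self.directory[self._entry(k)].keys.append(k)
        self.insert(key)  # retry; this may trigger a further split


table = ExtendibleHashFile(capacity=2, width=4)
for key in (2, 3, 5, 6, 7):
    table.insert(key)
print(table.global_depth, [p.keys for p in table.directory])
```

In this run the table grows from two entries to eight. Entries that were duplicated by a doubling but whose page has not split still share one page, so the global depth stays greater than or equal to every local depth.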
NOTICE WHEN YOU DUPLICATE AN ENTRY, SAY 00, IT BECOMES 000 AND 001. AND THE POINTER WILL ALSO BE DUPLICATED BECAUSE THAT PAGE HASN'T SPLIT YET. SO HERE, IF I SPLIT THIS ENTRY, I DUPLICATE THE POINTER. BOTH STILL POINT AT THE SAME PAGE. ONLY FOR THIS ENTRY, WHERE THE PAGE IS SPLIT, DO THEY POINT AT DIFFERENT PAGES. OKAY. CAN I HAVE THE LIGHT? SO THE IMPORTANT CONCEPT IS THIS NUMBER, THE BIG D, AND THIS NUMBER, WHICH IS THE SMALL D. AT ANY GIVEN POINT YOU WANT TO MAKE SURE THE BIG D IS GREATER THAN OR EQUAL TO THE SMALL D. WHEN A RECORD COMES IN, YOU USE YOUR HASH FUNCTION TO COMPUTE THIS HASH KEY, AND YOU USE ONLY AS MANY BITS AS THE BIG D TELLS YOU. YOU USE THAT. YOU GO TO THIS BUCKET AND THEN TO THE PAGE, AND IF IT IS FULL, THEN YOU NEED TO SPLIT IT. WHEN YOU SPLIT IT, YOU INCREMENT THE SMALL D BY ONE. AS YOU INCREMENT, YOU COMPARE THE NEW SMALL D WITH THE BIG D. IF IT IS NOT BIGGER, YOU ARE OKAY, YOU DON'T HAVE TO SPLIT THE PRIMARY AREA. IF IT BECOMES BIGGER, THEN YOU HAVE TO DOUBLE THE SIZE. NOTICE THE CONSEQUENCE IS THAT THIS PRIMARY HASH TABLE CAN BE LESS THAN HALF FULL, BECAUSE IF YOU DOUBLE THE SIZE, OF COURSE, THE NEW ENTRIES ARE USUALLY EMPTY. SO THE RULE OF THUMB FOR USING THE HASH FILE IN A VERY EFFICIENT WAY IS TO KEEP IT APPROXIMATELY HALF FULL. OKAY. SO THIS IS A VERY NICE STRUCTURE, AND THAT'S WHY I THINK EVERYBODY SHOULD KNOW IT, BECAUSE IT'S USEFUL AND IN PRACTICE IT'S ALSO NOT THAT HARD TO PROGRAM. TURN IT OFF. THANK YOU. SO IF YOU LOOK AT YOUR EXERCISE, THE NEXT ASSIGNMENT IS FOR YOU TO TAKE, I THINK, THE SAME PROBLEM AND CONSTRUCT AT FIRST A DYNAMIC HASH FILE AND THEN A REGULAR -- SORRY. THE OTHER WAY. FIRST THE REGULAR HASH FILE AND THEN THE EXTENDIBLE HASH FILE STRUCTURE. THE REGULAR HASH FILE WILL GO LIKE THIS. IF YOU ARE USING THE REGULAR HASH FILE THEN YOU WILL HAVE OVERFLOW RECORDS. CAN I HAVE THE LIGHT PLEASE? OKAY. LET ME JUST PICK A SMALLER PROBLEM, AND LET'S SAY I HAVE RECORDS WITH KEYS 10, 4, 8, 2.
SO I HAVE 10, 4, 8, 2, AND IF I TAKE MOD 4 THEY BECOME 2, 0, 0, 2. BY THE WAY, THIS IS NOT THE BEST WAY TO DO THE HASHING. I REALLY SHOULD NOT BE TAKING MOD AGAINST A POWER OF 2. I SHOULD BE CHOOSING A PRIME NUMBER. THIS IS JUST FOR SIMPLICITY. 4 MOD 4 IS 0, 8 IS 0, 2 IS 2. SO THE FIRST RECORD, TEN, WILL GO HERE. OH, LET'S SAY WE CAN HAVE TWO KEYS PER BUCKET. THE SECOND ONE WILL GO HERE. THE THIRD ONE WILL GO HERE. THE FOURTH ONE WILL GO HERE. SO AT THIS POINT THERE IS STILL NO OVERFLOW AFTER FOUR RECORDS. THE NEXT ONE, LET'S SAY, JUST FOR THE SAKE OF ARGUMENT, HAS KEY 12. THEN ITS ADDRESS SHOULD ALSO BE ZERO. SO IT SHOULD GO HERE. NO ROOM. SO I NEED TO ADD THIS TO MY OVERFLOW AREA. OKAY. SO THIS IS THE KIND OF SNAPSHOT I'M LOOKING FOR. YOU TAKE THE KEY SEQUENCE, YOU INSERT IT INTO YOUR REGULAR HASH FILE, AND YOU END UP WITH SOME OVERFLOW. AND THEN THE EXERCISE ASKS YOU TO COMPUTE THE AVERAGE LENGTH OF THE OVERFLOW CHAIN. OKAY? THEN YOU TAKE THE SAME SEQUENCE AND TRY TO CONSTRUCT THE DYNAMIC HASH FILE STRUCTURE. QUESTIONS? YES? STUDENT: SO YOU'RE SAYING WITH THE HASH FILE RIGHT THERE, SAY ANOTHER ONE WAS ADDED TO SECTION ZERO. PROFESSOR: OKAY. GIVE ME A KEY. STUDENT: 16, I GUESS. PROFESSOR: OKAY. ITS MOD FOUR IS ZERO. STUDENT: OKAY. PROFESSOR: SO IT HITS THIS PLACE. IT'S FULL. GO HERE. STUDENT: THEN YOU ADD ANOTHER ONE? PROFESSOR: OH YEAH. STUDENT: OKAY. PROFESSOR: YOU ARE WITH ME? THEN TAKE THE SAME KEY SEQUENCE AND TRY TO DO IT USING THE EXTENDIBLE HASHING TECHNIQUE. IN THAT CASE YOU NEED TO CONVERT THE KEY INTO A BINARY STRING AND USE THAT BINARY STRING TO DO THE HASH FILE COMPUTATION. OKAY. NOW, THERE ARE A COUPLE OF DETAILS IN THE PHYSICAL STRUCTURE. AGAIN, THESE THINGS I PRESUME YOU ALREADY KNOW, BUT JUST IN CASE YOU DON'T, IT'S IN THE NOTES. THESE THINGS ARE HELPFUL FOR THE INFORMATION WE STORE IN THE DATABASE.
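The regular hash file snapshot worked on the board (keys 10, 4, 8, 2, 12, then 16, address = key mod 4, two keys per bucket) can be reproduced with a short Python sketch. The function names are hypothetical, and two details are assumptions the lecture does not pin down: each bucket gets its own overflow chain, and the average chain length is taken over the buckets that actually overflowed.

```python
def build_regular_hash_file(keys, table_size=4, bucket_capacity=2):
    """Insert keys in order; a key that finds its bucket full goes to overflow."""
    primary = [[] for _ in range(table_size)]
    overflow = [[] for _ in range(table_size)]  # one chain per bucket (assumed)
    for key in keys:
        address = key % table_size  # mod a power of 2, for simplicity only
        if len(primary[address]) < bucket_capacity:
            primary[address].append(key)
        else:
            overflow[address].append(key)
    return primary, overflow


def average_overflow_chain_length(overflow):
    """Average over non-empty chains only (an assumed definition)."""
    chains = [chain for chain in overflow if chain]
    return sum(len(chain) for chain in chains) / len(chains) if chains else 0.0


primary, overflow = build_regular_hash_file([10, 4, 8, 2, 12, 16])
print("primary :", primary)
print("overflow:", overflow)
print("average overflow chain length:", average_overflow_chain_length(overflow))
```

Keys 12 and 16 both hash to address 0, which is already full with 4 and 8, so they end up on the overflow chain of that bucket.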
THE IDEA IS THAT YOU CAN TAKE YOUR KEY SPACE AND, BY TRUNCATING THE KEYS, YOU CAN STILL END UP WITH UNIQUE KEYS, BUT THE KEYS WILL BE MUCH SHORTER. OKAY? SO THE MAIN THING IS THAT YOU WANT TO MAINTAIN UNIQUENESS, BECAUSE THAT'S WHAT KEYS ARE FOR, THEY HAVE TO BE UNIQUE. BUT YOU WANT TO MAKE THEM AS SHORT AS POSSIBLE SO THAT YOU CAN STORE THEM IN LESS SPACE. SO HERE ARE SOME WELL KNOWN TECHNIQUES, FRONT COMPRESSION AND REAR COMPRESSION. BASICALLY WHAT YOU DO IS ELIMINATE THE PART THAT'S IDENTICAL WITH THE PREVIOUS STRING IN THE STORED SEQUENCE. FOR EXAMPLE, IF YOU HAVE ROBERTON, ROBERTSON, ROBERTSTONE, ROBINSON, FOR THE FIRST ONE YOU HAVE NO CHOICE, YOU HAVE TO STORE THE WHOLE THING. FOR THE SECOND ONE, AS COMPARED TO THE FIRST ONE, YOU NOTICE THAT THE FIRST SIX LETTERS ARE THE SAME. SO INSTEAD OF STORING THOSE SIX LETTERS I NEED ONLY STORE THE NUMBER SIX AND THEN S-O-N, WHICH IS THE REMAINING PART. FOR THE NEXT ONE WE HAVE SEVEN LETTERS IN COMMON, SO WE STORE SEVEN PLUS THE REMAINING ONES. AND I CAN DO THE SAME TRICK BY COMPRESSING THE REAR END SO THAT I END UP WITH AN EVEN SHORTER STRING. ANOTHER WELL KNOWN TECHNIQUE WORKS SIMILARLY. NOW, ALL THESE TECHNIQUES, I SHOULD COMMENT, WORK WHEN YOU HAVE ALREADY SORTED THE KEYS AND THEY ARE STORED IN SORTED SEQUENCE IN YOUR TABLE. THEN YOU TRY TO COMPRESS THE LENGTH OF THE KEYS SO THAT YOU CAN STORE A SHORTER KEY. OKAY? SO HERE IS THE ORIGINAL SET OF KEYS. WE HAVE ALREADY SORTED THEM AND MAINTAINED THEM IN SORTED ORDER. THEN WHAT WE NEED TO DO IS TRY TO FIND THE LEAST NUMBER OF LETTERS SO THAT WE CAN TELL THEM APART. NOW, HERE IS A TECHNIQUE WHERE WE JUST TRUNCATE THEM TO THE FIRST FOUR LETTERS. NOTICE IT DOESN'T WORK TOO WELL, RIGHT? BECAUSE IF YOU COME TO BELL AND BELLSON, THEY ARE ACTUALLY IDENTICAL WHEN YOU TRUNCATE THEM TO FOUR LETTERS. THIS MARKER THEN INDICATES THAT THE SECOND OCCURRENCE OF B-E-L-L, WHICH IN TRUNCATED FORM IS THE SAME AS THIS ONE, HAS MORE LETTERS, SO WE HAVEN'T FINISHED HERE.
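The front compression of ROBERTON, ROBERTSON, ROBERTSTONE, ROBINSON described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the choice to represent each compressed key as a (shared-prefix length, remaining letters) pair are mine, not taken from the notes.

```python
def front_compress(sorted_keys):
    """Store each key as (letters shared with the previous key, the rest)."""
    compressed = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        compressed.append((shared, key[shared:]))
        previous = key
    return compressed


def front_decompress(compressed):
    """Rebuild the keys by reusing the prefix of the previously rebuilt key."""
    keys = []
    previous = ""
    for shared, rest in compressed:
        previous = previous[:shared] + rest
        keys.append(previous)
    return keys


names = ["ROBERTON", "ROBERTSON", "ROBERTSTONE", "ROBINSON"]
pairs = front_compress(names)
print(pairs)
print(front_decompress(pairs))
```

The first key is stored whole, the second as 6 plus SON, and the third as 7 plus TONE. Because each entry depends on the one before it, the keys have to be read in stored order to be reconstructed.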
NOW, A BETTER WAY IS TO TRY TO FIND THE LEAST NUMBER OF LETTERS NEEDED TO TELL A KEY APART FROM THE PRIOR STRING. LET'S JUST LOOK HERE. OKAY. SO THIS IS THE GIVEN KEY SEQUENCE. I HAVE ALREADY SORTED THE KEYS IN ASCENDING SEQUENCE, SO I LOOK AT THE FIRST ONE AND ASK MYSELF, HOW MANY LETTERS DO I NEED TO TELL IT APART FROM THE PRIOR STRING? SO WHAT IS THE STRING BEFORE THAT? NOTHING. THE ONE PRIOR TO BABBET IN THE SEQUENCE IS THE EMPTY STRING, SO I NEED ONLY ONE LETTER. THAT'S THE MINIMUM. SO B. THE SECOND ONE IS B-A-B-S-O-N. TO TELL BABSON APART FROM BABBET, I NEED B-A-B-S. OKAY? AND NOW TO TELL BAILEY APART FROM BABSON I NEED BAI. YOU STOP AS SOON AS YOU HAVE ONE LETTER THAT IS DIFFERENT. THE NEXT ONE IS BAKER, I NEED BAK, THEN BE. NOTICE WHEN YOU CHANGE THE FIRST LETTER, GOING TO A DIFFERENT SET OF NAMES, YOU START FROM THE MINIMUM NUMBER OF CHARACTERS, C. OKAY. AND YOU CAN RUN THIS ALGORITHM FORWARD AND BACKWARD. FORWARD MEANS YOU RUN IT FROM TOP TO BOTTOM, AND THE NEXT COLUMN IS FROM RUNNING THE ALGORITHM STARTING FROM THE BOTTOM AND GOING TO THE TOP. SO YOU START FROM HERE. THEN YOU GO UP. AND THEN YOU TAKE THE LONGER OF THE TWO STRINGS TO FORM THE FINAL SET OF TRUNCATED KEYS. THIS IS BY RUNNING THIS ALGORITHM FORWARD AND BACKWARD. AND THESE ARE GUARANTEED TO BE UNIQUE AND MINIMAL IN LENGTH. OKAY. AND OF COURSE THERE ARE MANY VARIATIONS OF THESE -- YES? STUDENT: WHY WOULD YOU WANT TO TAKE THE LONGER OF THE STRINGS? PROFESSOR: BECAUSE THIS ONE WORKS IF I DO IT FROM TOP TO BOTTOM. THIS ONE WORKS IF I DO IT FROM BOTTOM TO TOP. STUDENT: SO THAT'S SO YOU CAN DO IT EITHER WAY? PROFESSOR: THAT'S RIGHT. I WANT TO GO EITHER WAY, SO YOU CAN SEE IT AS JOINING THE TWO: I TAKE THE LONGER ONE TO MAKE THE FINAL ONE. THIS WAY YOU CAN SEARCH TOP TO BOTTOM AND BOTTOM TO TOP AND YOU'RE STILL ABLE TO TELL THEM APART. IN FACT, IF YOU ONLY WANT TO SEARCH FROM TOP TO BOTTOM, USE THE SECOND COLUMN, AND IF YOU WANT TO SEARCH FROM BOTTOM TO TOP, USE THIS ONE, AND THIS ONE WILL ENABLE YOU TO GO EITHER WAY. OKAY.
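The forward pass, the backward pass, and the "take the longer one" step can be sketched in Python using the four names spelled out above. The function names are hypothetical, and the sketch assumes the keys are already sorted and distinct.

```python
def shared_prefix_length(a, b):
    """Number of leading letters that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def forward_prefixes(sorted_keys):
    """Top to bottom: one letter more than is shared with the prior key."""
    result = []
    previous = ""  # the key before the first one is the empty string
    for key in sorted_keys:
        result.append(key[: shared_prefix_length(previous, key) + 1])
        previous = key
    return result


def backward_prefixes(sorted_keys):
    """Bottom to top: the same rule, compared with the key that follows."""
    return forward_prefixes(sorted_keys[::-1])[::-1]


def two_way_prefixes(sorted_keys):
    """Keep the longer of the forward and backward truncations per key."""
    pairs = zip(forward_prefixes(sorted_keys), backward_prefixes(sorted_keys))
    return [f if len(f) >= len(b) else b for f, b in pairs]


keys = ["BABBET", "BABSON", "BAILEY", "BAKER"]
print(forward_prefixes(keys))
print(backward_prefixes(keys))
print(two_way_prefixes(keys))
```

The forward column matches the B, BABS, BAI, BAK worked out in the lecture. Since both truncations of a key are prefixes of that key, keeping the longer one preserves whatever either pass needed to distinguish it from its neighbor.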
THESE KINDS OF TECHNIQUES CAN BE COMBINED WITH TRIE TECHNIQUES. I'M NOT SAYING T-R-E-E, I'M SAYING T-R-I-E. WITH TRIE TECHNIQUES WE CAN GENERATE SMALLER TABLES. AGAIN THE IDEAS ARE FAIRLY SIMPLE. YOU TRY TO TAKE THE COMMON PART AND ENCODE THAT INTO A HIGHER LEVEL INDEX AND THEN GO TO THE LOWER PART. NOW IN THE NOTES I GIVE YOU QUITE A FEW VARIATIONS OF THESE TRIES, WHICH I ASK YOU TO LOOK AT, AND LET ME MENTION THE LAST ONE, WHICH IS GUARANTEED TO BE OPTIMAL IN TERMS OF ENCODING, WHICH IS THE HUFFMAN CODE. MANY OF YOU MAY ALREADY KNOW THAT. THIS IS BASED ON PROBABILITY. IF I KNOW THE FREQUENCY OF OCCURRENCE OF THE VARIOUS WORDS AND I WANT TO ENCODE THEM INTO BINARY NUMBERS, WHAT I DO, OBVIOUSLY, IS THAT THOSE THAT OCCUR MOST FREQUENTLY I TRY TO ENCODE WITH THE FEWEST BITS. AND THOSE THAT OCCUR VERY INFREQUENTLY I CAN AFFORD TO ENCODE WITH A LONGER STRING OF BITS, AND THIS IS HOW THE HUFFMAN CODE WORKS. SO YOU FIRST SORT YOUR WORDS IN TERMS OF FREQUENCY OF OCCURRENCE. LET'S SAY YOU HAVE WORDS A, B, C, D AND E, SO THE FIRST ONE OCCURS WITH PROBABILITY 0.5 AND THIS ONE 0.15 AND SO ON. YOU FIRST SORT THEM IN DESCENDING ORDER. AND THEN YOU TRY TO COMBINE THEM STARTING FROM THE LEAST FREQUENT ONES. SO I START FROM HERE. THESE TWO, D AND E, BOTH OCCUR WITH FREQUENCY 0.1. I COMBINE THEM, THAT GIVES ME 0.2. COMBINING THESE TWO GIVES ME 0.3, COMBINING THESE TWO GIVES 0.5, AND FINALLY 1.0. NOW WHEN I ASSIGN MY CODE WORDS I START FROM HERE, AND I ASSIGN 1 OR 0 WHEREVER I HAVE A BRANCH. SO I START FROM THE TOP. IT WORKS THE OTHER WAY, TOO, IT DOESN'T MATTER. GO HERE: 1, 0, 1, 0. SO I END UP WITH THE FOLLOWING ENCODING. A IS ENCODED AS 1, B AS 011, C AS 010, D AS 001, E AS 000. SO IF YOU COMPUTE THE AVERAGE LENGTH OF YOUR CODE, THIS HUFFMAN CODE IS GUARANTEED TO BE MINIMAL IN LENGTH. OKAY. SO THESE ARE JUST A LOT OF DETAILS, AND WHAT I WILL ASK YOU TO DO IS TO LOOK AT MY NOTES SO YOU KNOW THE NOTION OF ENCODING. OKAY. CAN I HAVE THE LIGHT? OKAY.
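The combining procedure above can be sketched with Python's standard `heapq` module. This is a minimal sketch under stated assumptions: the function name is hypothetical, the weights are written as integer percentages to avoid floating-point ties, and C's weight of 15 is inferred from the lecture's arithmetic (the remaining probabilities must sum to 1, and B and C combine to 0.3). The choice of 0 or 1 at each branch is arbitrary, so the individual bit patterns may differ from the ones on the board while the code lengths agree.

```python
import heapq


def huffman_codes(weights):
    """Repeatedly combine the two least frequent groups, as in the lecture."""
    # Each heap item: (weight, tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        # Prepend one bit to every code in each of the two merged groups.
        merged = {sym: "0" + code for sym, code in low.items()}
        merged.update({sym: "1" + code for sym, code in high.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]


# Percentages for A..E; the 15 for C is an inferred value, not stated outright.
weights = {"A": 50, "B": 15, "C": 15, "D": 10, "E": 10}
codes = huffman_codes(weights)
average_length = sum(weights[s] * len(codes[s]) for s in weights) / 100
print(codes)
print("average code length:", average_length)
```

A gets a one-bit code and the other four words get three bits each, the same lengths as the lecture's encoding, for an average of 2 bits per word.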
SO MUCH FOR PHYSICAL STRUCTURES. NOW, IN THE NEXT EXERCISE BASICALLY YOU ARE DOING THE B TREE AND EXTENDIBLE HASHING. BUT LATER ON, IF YOU CHOOSE TO DO THE LIBRARY INFORMATION SYSTEM PROJECT, WHAT I HAVE IN MIND, SO THAT YOU KNOW WHAT I'M DRIVING AT, IS THE FOLLOWING. YOU WILL START FROM THE SO-CALLED MARC RECORD. I THINK IT'S INTERNATIONALLY USED, BUT IT IS UNIVERSAL IN THIS COUNTRY AT LEAST, SO ALL OF THE LIBRARY RECORDS IN EXTERNAL FORMAT ARE ENCODED AS MARC RECORDS. THESE ARE IN THE EXTERNAL FORMAT. SO YOU WILL NEED TO WRITE A PROGRAM THAT STARTS FROM THIS ENCODED MARC RECORD AND IS ABLE TO DECODE IT. SO THIS PROGRAM IS THE DECODER, TO DECODE THE MARC RECORD, WHICH IS ENCODED. THEN A DATABASE LOADER TO LOAD IT INTO THE RELATIONAL DATABASE. NOW, SOME OF YOU MAY WANT TO COMBINE THESE TWO PROGRAMS INTO ONE. THAT'S FINE WITH ME. IF YOU WANT TO WORK THAT WAY, THAT'S FINE. BUT THE PURPOSE OF THIS PART OF THE EXERCISE IS ACTUALLY DEALING WITH PHYSICAL DATA, SO WE CAN DEAL WITH EXTERNALLY FORMATTED DATA AND BE ABLE TO DECODE THAT AND EXTRACT THE LOGICAL INFORMATION. BECAUSE LATER ON, WHEN I GIVE YOU THE DATA, YOU WILL NOTICE THAT IF YOU JUST DO A DUMP IT'S MEANINGLESS. IT'S JUST GARBAGE. BUT ONCE YOU HAVE THE DECODER, THEN YOU CAN EXTRACT LOGICAL INFORMATION LIKE BOOK TITLE AND AUTHOR'S NAME, AND IT BECOMES MEANINGFUL. AND THEN THE LOADER WILL LOAD THAT INTO THE CORRESPONDING RELATIONAL TABLES. THAT'S THE PART WHERE YOU GO TO THE DATABASE. THEN FROM HERE WE NEED TO DEVELOP THE WEB BASED PROGRAM AND THE USER INTERFACE, THE FORMS, SO THE USER SITTING ANYWHERE IN THE WORLD BEFORE HIS OR HER PERSONAL COMPUTER CAN USE THE FORM TO ACCESS THE DATABASE AND POST QUERIES. SO THAT GIVES YOU THE IDEA. SO IN THIS ONE, YOU NEED TO DEAL WITH PHYSICAL DATA AND DO SOME DECODING, AND YOU WILL NOTICE SOME OF THE FIELDS ARE ACTUALLY ENCODED, ALTHOUGH THE ONES I GIVE YOU ARE NOT ENCODED. OKAY.
SO TO SUMMARIZE, WHAT WE TALKED ABOUT IN THE THREE LECTURES ARE THE PHYSICAL DATA STRUCTURES, NAMELY -- GO BACK -- HOW WE CAN TAKE EXTERNAL DATA, ENCODED DATA, AND CONSTRUCT THE FILES AND PUT THEM INTO THE STORED DATABASE, AND THEN THROUGH THE FILE MANAGER AND THE INPUT/OUTPUT ACCESS ROUTINES WE CAN EFFICIENTLY MANAGE AND ACCESS INFORMATION. ONCE WE HAVE THAT, WHEN WE HAVE LOW LEVEL DATA STRUCTURES AND INDEXES, THE DBMS, WHICH IS THE ORACLE SYSTEM, WILL BE ABLE TO USE THOSE ACCESS ROUTINES SO IT CAN SERVE THE NEEDS OF THE END USER. OKAY. SO THAT'S ALL WE'LL COVER. SO, WHEN YOU GO HOME, YOU SHOULD STUDY THOSE COMPRESSION TECHNIQUES SOMEWHAT, BECAUSE AT LEAST ONE PROBLEM WILL BE ABOUT THAT. OKAY. THAT WILL BE ALL.