CS 1501 Data Structures and Algorithms
Programming Project 5
Handed out: Week of March 20, 2000
Due: All assignment materials (including hard copy of all source files, disk containing source AND executable code files, writeup and completed Assignment Information Sheet) by the BEGINNING of class on:
· Wednesday, March 29 for the MW Section
· Thursday, March 30 for the TH Section
Purpose: The purpose of this assignment is for you to fully understand the LZW compression algorithm, its performance and its implementation.
Procedure:
Thoroughly read the description/explanation of the LZW compression algorithm as discussed in http://www.cs.sfu.ca/cs/CC/365/li/squeeze/. Be sure to read not only the top level LZW document, but also the more detailed paper in http://www.dogma.net/markn/articles/lzw/lzw.htm . Also run the applet to see how the algorithm works.
Download the implementation of the algorithm provided and get it to work. The code is old-style C code, so you should save it as a .c node rather than a .cpp node. However, the code should work as is (possibly with some warnings). If you prefer working in C++, you may modify it, but be very careful if you choose to do this, as your changes may introduce some errors.
Examine the C code very carefully, convincing yourself exactly what is accomplished by each function and by each statement within each function. Once you have the original version of the LZW algorithm working correctly, copy it and save the original. Now modify the copy in the following ways:
1) Instead of just demonstrating LZW by compressing and then decompressing the same file, your program will actually be useful, allowing a file to be compressed or decompressed by the user. Your program should accept flags that will indicate whether it should compress or decompress a file. For example, the following command would compress a text file:
lzw -C myfile.txt
Your program should automatically append a .lzw to the end of any file name as well, so the result of your compression will be the file myfile.txt.lzw . To decompress the file you would then type
lzw -D myfile.txt.lzw
and the original file would be restored.
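As a starting point for item 1, the flag handling and output naming might be sketched as follows. This is only an illustration, not part of the provided code: the names lzw_mode, parse_flag and make_output_name are hypothetical, and stripping the final .lzw to recover the original name on -D is an assumption about how "restored" is meant.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers; none of these names appear in the provided LZW code. */
enum lzw_mode { MODE_NONE, MODE_COMPRESS, MODE_DECOMPRESS };

/* Map the command-line flag to a mode; anything else is MODE_NONE. */
enum lzw_mode parse_flag(const char *flag)
{
    if (strcmp(flag, "-C") == 0) return MODE_COMPRESS;
    if (strcmp(flag, "-D") == 0) return MODE_DECOMPRESS;
    return MODE_NONE;
}

/* Build the output file name in out (capacity outsize bytes).
 * Compressing appends ".lzw"; decompressing strips the final ".lzw"
 * (an assumption about the intended output name).
 * Returns 0 on success, -1 if the name is unusable or does not fit. */
int make_output_name(enum lzw_mode mode, const char *in,
                     char *out, size_t outsize)
{
    size_t len = strlen(in);

    if (mode == MODE_COMPRESS) {
        if (len + 4 + 1 > outsize) return -1;   /* ".lzw" plus '\0' */
        sprintf(out, "%s.lzw", in);
        return 0;
    }
    if (mode == MODE_DECOMPRESS) {
        if (len <= 4 || strcmp(in + len - 4, ".lzw") != 0) return -1;
        if (len - 4 + 1 > outsize) return -1;
        memcpy(out, in, len - 4);
        out[len - 4] = '\0';
        return 0;
    }
    return -1;
}
```

In main, parse_flag(argv[1]) would select the mode and make_output_name(mode, argv[2], ...) would supply the name of the file to open for writing, after checking that argc is 3.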
2) Your program should check the extension of the file argument, and do the actions listed below based on that extension.
a) If the (final) extension is .lzw and the flag is -C, output to the display that the file is already compressed and exit.
b) If the (final) extension is NOT .lzw and the flag is -D, output to the display that the file extension is not consistent with a compressed file and exit.
c) If the (final) extension is .cpp and the flag is -C, before compressing the file insert all of the C++ keywords into the dictionary (immediately after all of the individual ASCII characters). To allow your program to be used without any extra files, the keywords should be hard coded into your program. Make sure your keywords are inserted in all lower case.
d) If the (final) extension is .txt and the flag is -C, before compressing the file insert some common English words and letter combinations into the dictionary (immediately after all of the individual ASCII characters). As with the C++ keywords, these should be hard coded into your program, so you cannot have too many. The exact words/combinations are up to you, but you MUST have AT LEAST 50, and they should be logical choices. For example, some logical choices would be AND, THE, NOT, ING, STR. Some bad choices would be DICHOTOMY, EXISTENTIALISM, QQRXT, ORGANO-METALLIC. Furthermore, you should enter each word/letter combination TWO TIMES (with two separate codewords), once with the first letter capitalized and once in all lower case.
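The extension checks in a) through d) and the two spellings required in d) could be organized along these lines. This is a sketch under assumptions: the enum action values and the functions final_extension, choose_action and make_variants are hypothetical names, and the comparison is case sensitive, so a file ending in .CPP would be treated as an ordinary file.

```c
#include <ctype.h>
#include <string.h>

/* All names below are hypothetical; none come from the provided LZW code. */
enum action {
    ACT_ALREADY_COMPRESSED,  /* case a: -C on a .lzw file             */
    ACT_BAD_EXTENSION,       /* case b: -D on a file that is not .lzw */
    ACT_COMPRESS_CPP,        /* case c: preload the C++ keywords      */
    ACT_COMPRESS_TXT,        /* case d: preload common English words  */
    ACT_COMPRESS_PLAIN,      /* -C with any other extension           */
    ACT_DECOMPRESS           /* -D on a .lzw file                     */
};

/* Return the final extension including the dot, or "" if there is none. */
const char *final_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    return dot != NULL ? dot : "";
}

/* Decide what to do from the mode (nonzero = compress) and the file name. */
enum action choose_action(int compressing, const char *name)
{
    const char *ext = final_extension(name);

    if (compressing) {
        if (strcmp(ext, ".lzw") == 0) return ACT_ALREADY_COMPRESSED;
        if (strcmp(ext, ".cpp") == 0) return ACT_COMPRESS_CPP;
        if (strcmp(ext, ".txt") == 0) return ACT_COMPRESS_TXT;
        return ACT_COMPRESS_PLAIN;
    }
    return strcmp(ext, ".lzw") == 0 ? ACT_DECOMPRESS : ACT_BAD_EXTENSION;
}

/* Produce the two spellings required in d): all lower case, and first
 * letter capitalized. Each buffer must hold strlen(word) + 1 bytes. */
void make_variants(const char *word, char *lower, char *capital)
{
    size_t i;

    for (i = 0; word[i] != '\0'; i++) {
        lower[i] = (char)tolower((unsigned char)word[i]);
        capital[i] = lower[i];
    }
    lower[i] = '\0';
    capital[i] = '\0';
    if (i > 0)
        capital[0] = (char)toupper((unsigned char)lower[0]);
}
```

Keep in mind that decompression only works if the decompressor starts from exactly the same dictionary the compressor used. One possible way to arrange this is to look at the extension that precedes .lzw (for example the .txt in myfile.txt.lzw) and preload the same words in the same order.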
Once you have your modified program working, test it with some example files. See how well your program compresses the various files by calculating its COMPRESSION RATIO. For each file you compress, record the original file size (in bytes), the compressed file size (in bytes), and the ratio of the (compressed file size)/(original file size). Clearly, the smaller the ratio, the better the compression. For the .txt and .cpp examples, calculate the compression ratio for both the original program and for your modified program. We will provide you with a number of files to use for testing. See the announcements for when/where they are available.
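If you would rather compute the ratios in code than by hand, a small helper along the following lines would do. It is a sketch only: file_size and compression_ratio are hypothetical names, and file_size relies on fseek/ftell on a stream opened in binary mode, which is adequate for files of the size used in this project.

```c
#include <stdio.h>

/* Hypothetical helpers, not part of the provided LZW code. */

/* Return the size of the file in bytes, or -1 if it cannot be read. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL) return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1L;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}

/* Ratio of (compressed file size)/(original file size).
 * Returns -1.0 when the original size is not positive. */
double compression_ratio(long original, long compressed)
{
    if (original <= 0) return -1.0;
    return (double)compressed / (double)original;
}
```

For instance, a hypothetical 1000-byte file that compresses to 400 bytes has a ratio of 400/1000 = 0.40. A ratio above 1.0 means the "compressed" file is actually larger than the original.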
Write a short paper (~2 pages) that discusses each of the following:
· What does the original program use for a hash function? Is this effective?
· For the compression algorithm, how is the data actually stored in the hash table? State all variables used for the hash table and explain what they are for. Cite specific lines of code when appropriate.
· Explain (in detail) how the program handles the decompress case of the next codeword not yet being in the dictionary (for example with the string AAA as discussed in lecture). Cite specific lines of code and explain them.
· How well did the algorithm compress the various files? Cite your calculated compression ratios for the files and speculate as to why the compression may have been better with some files than with others.
· How well did your modified program compress the .cpp and .txt files compared to the original program? Again cite the compression ratios. Was the difference more or less than you expected?
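For the question about the decompress special case, it may help to compare the provided code against a stripped-down decoder. The sketch below is NOT the provided program: it reads codes from an int array instead of a bit stream, uses hypothetical names (prefix_of, append_of, expand, lzw_decode), and assumes the input is a valid code sequence. It shows the general idea that a code not yet in the table must stand for the previous string followed by that string's own first character.

```c
#include <stddef.h>

#define FIRST_CODE 256    /* codes 0..255 are the single characters */
#define MAX_CODES  4096   /* assumed table size for this sketch     */

/* Entry c (c >= 256) means: string(prefix_of[c]) followed by append_of[c]. */
static int prefix_of[MAX_CODES];
static unsigned char append_of[MAX_CODES];

/* Write the string for code into stack in REVERSE order; return its
 * length n. The first character of the string ends up in stack[n-1]. */
static size_t expand(int code, unsigned char *stack)
{
    size_t n = 0;

    while (code >= FIRST_CODE) {
        stack[n++] = append_of[code];
        code = prefix_of[code];
    }
    stack[n++] = (unsigned char)code;
    return n;
}

/* Decode ncodes codes into out; return the number of bytes written.
 * The caller must make out large enough; input is assumed valid. */
size_t lzw_decode(const int *codes, size_t ncodes, unsigned char *out)
{
    unsigned char stack[MAX_CODES];
    size_t outlen = 0, n, i;
    int next_code = FIRST_CODE;
    int old_code, new_code;
    unsigned char first_char;   /* first character of string(old_code) */

    if (ncodes == 0) return 0;
    old_code = codes[0];
    first_char = (unsigned char)old_code;
    out[outlen++] = first_char;

    for (i = 1; i < ncodes; i++) {
        new_code = codes[i];
        if (new_code >= next_code) {
            /* Special case: the code is not in the table yet, so it must
             * be string(old_code) + first character of string(old_code). */
            stack[0] = first_char;
            n = expand(old_code, stack + 1) + 1;
        } else {
            n = expand(new_code, stack);
        }
        first_char = stack[n - 1];
        while (n > 0)
            out[outlen++] = stack[--n];   /* un-reverse while copying */

        /* New entry: previous string plus first char of the current one. */
        if (next_code < MAX_CODES) {
            prefix_of[next_code] = old_code;
            append_of[next_code] = first_char;
            next_code++;
        }
        old_code = new_code;
    }
    return outlen;
}
```

With the string AAA, the compressor emits the codes 65 and 256. When the decoder reads 256 its table does not contain that code yet, so it takes the special-case branch and outputs AA, giving AAA in total. Look for the corresponding test in the provided code and cite those lines in your paper.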
Hand in a printout of your modified source code, your paper, and a disk containing all of your source and executable code (including the executable code of the unmodified algorithm), along with your Assignment Information Sheet.