Strings                   Amount of Information   Appropriate Concept
terabit of 0's            low                     Kolmogorov Complexity
terabit of random bits    high                    Entropy
terabit of bits of pi     low                     Kolmogorov Complexity
terabit of www pages      medium                  Kolmogorov Complexity

ENTROPY

Examples:
1) A fair coin flip: entropy 1 bit.
2) prob(A) = prob(C) = prob(G) = prob(T) = 1/4: entropy 2 bits.

Lemma: If X and Y are independent random variables, then entropy H should satisfy H((X, Y)) = H(X) + H(Y), where (X, Y) is the joint outcome.

Another Example: prob(A) = 1/2, prob(C) = 1/4, prob(G) = prob(T) = 1/8: entropy 1/2 * 1 + 1/4 * 2 + 1/8 * 3 + 1/8 * 3 = 1.75 bits.

Definition: The entropy of a probability distribution X over a set U is

    H(X) = Sum_{x in U} prob(x) lg(1/prob(x))

Here H(x) = lg(1/prob(x)) is the entropy in bits of the individual outcome x. Note that prob(x) = 1/2^H(x). So H(X) is the expected number of bits of surprise per outcome.

Source Coding Theorem (Shannon 1948): Assume the input is a string S where each element of S is drawn independently according to a distribution X. Then there is a scheme that can transmit S using only about |S| H(X) bits, and every possible scheme uses at least about |S| H(X) bits.

Proof: Divide S into blocks of size n for some reasonably large n. The probability that a block takes the value B is Pi_{x in B} prob(x). The expected probability of the block you actually see is Sum_B (Pi_{x in B} prob(x))^2.

Key Insight: It is very likely that you will see a block B whose probability is about 2^{-n H(X)}. So the distribution over the block you see is very close to a uniform distribution over a set of about 2^{n H(X)} "typical" blocks.

Achievability: Mostly you see one of the roughly 2^{n H(X)} equally probable typical blocks, so use n H(X) bits for each such block. Use any reasonable encoding for the unlikely blocks.

Optimality: Forget about the cost of encoding unlikely strings. You can't do better than using an equal number of bits for equally probable outcomes.

KOLMOGOROV COMPLEXITY

Intuition: Measures the information in a fixed string, rather than in a distribution over strings/objects as entropy does.

Definition: The Kolmogorov complexity K(x) of a string x is defined to be

    K(x) = min over strings y and decoders D with D(y) = x of (length of y + length of D)
         = min over programs P that write x on an empty input of (length of P)

Theorem: The choice of programming language only affects K(x) by an additive constant.

Theorem: For every length n there is an incompressible string, i.e. a string x of length n with K(x) >= n.

Proof: Pigeonhole principle. There are 2^n strings of length n, but only 2^0 + 2^1 + ... + 2^{n-1} = 2^n - 1 programs of length less than n, so some string of length n is not the output of any program shorter than itself.

Theorem: There is no algorithm M to compute K(x).

Proof: To reach a contradiction, assume M exists. Consider the program P_n:

    for i = 1 to 2^n do
        if M(i-th string of length n) >= n then
            output this string and halt

Note that by the previous theorem, P_n always halts and outputs a string with Kolmogorov complexity at least n. But P_n has length only O(log n), since the only part of P_n that depends on n is the binary representation of n. So for some sufficiently large n we get a contradiction: P_n is a program of length O(log n) that outputs a string of Kolmogorov complexity at least n.
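
To make the entropy definition and the examples above concrete, here is a minimal Python sketch (the function name entropy is mine) that evaluates H(X) for the three example distributions and also checks the additivity lemma on a pair of independent variables:

    import math

    def entropy(probs):
        # H(X) = Sum_x prob(x) * lg(1/prob(x)); skip zero-probability outcomes
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    print(entropy([1/2, 1/2]))              # fair coin flip: 1.0 bit
    print(entropy([1/4, 1/4, 1/4, 1/4]))    # uniform A,C,G,T: 2.0 bits
    print(entropy([1/2, 1/4, 1/8, 1/8]))    # skewed A,C,G,T: 1.75 bits

    # Lemma check: for independent X and Y, the entropy of the joint
    # distribution is the sum of the individual entropies.
    px = [1/2, 1/4, 1/8, 1/8]
    py = [1/2, 1/2]
    joint = [p * q for p in px for q in py]
    print(entropy(joint), entropy(px) + entropy(py))   # both 2.75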
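
The Key Insight in the source coding proof can also be seen empirically. The following sketch (all names are mine) samples blocks of length n from the skewed distribution and prints -lg(prob(block))/n; the values cluster tightly around H(X) = 1.75, i.e. the block you actually see has probability about 2^{-n H(X)}:

    import math, random

    probs = {'A': 1/2, 'C': 1/4, 'G': 1/8, 'T': 1/8}
    H = sum(p * math.log2(1 / p) for p in probs.values())   # 1.75 bits

    random.seed(0)
    n = 1000                       # block length
    symbols = list(probs)
    weights = list(probs.values())
    for _ in range(20):            # sample 20 blocks
        block = random.choices(symbols, weights=weights, k=n)
        # prob(block) = Pi_{x in block} prob(x); compute in log space
        log2_prob = sum(math.log2(probs[x]) for x in block)
        print(-log2_prob / n)      # each value is close to H = 1.75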
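
For the skewed example, achievability can be demonstrated even without block coding: since each probability is a power of 1/2, the per-symbol prefix-free code A=0, C=10, G=110, T=111 (a standard Huffman-style code, not a scheme stated in the notes) spends exactly lg(1/prob(x)) bits on each symbol, for an expected H(X) = 1.75 bits per symbol. A sketch (names mine) that confirms the rate and that decoding is lossless:

    import random

    code = {'A': '0', 'C': '10', 'G': '110', 'T': '111'}   # prefix-free
    probs = {'A': 1/2, 'C': 1/4, 'G': 1/8, 'T': 1/8}

    # Expected bits per symbol = Sum_x prob(x) * len(code[x]) = 1.75 = H(X)
    print(sum(p * len(code[s]) for s, p in probs.items()))

    # Encode a sampled string and check the empirical rate.
    random.seed(1)
    s = random.choices(list(probs), weights=list(probs.values()), k=10000)
    encoded = ''.join(code[c] for c in s)
    print(len(encoded) / len(s))   # close to 1.75 bits per symbol

    # Decode bit by bit; prefix-freeness makes codeword boundaries unambiguous.
    decode = {v: k for k, v in code.items()}
    out, buf = [], ''
    for bit in encoded:
        buf += bit
        if buf in decode:
            out.append(decode[buf])
            buf = ''
    assert out == s   # lossless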
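
The program P_n in the last proof transcribes almost directly into Python. Here M is the assumed algorithm for computing K(x), which the theorem shows cannot exist, so this sketch only exhibits the structure of the construction:

    def P(n, M):
        # M is the assumed (impossible) algorithm computing K(x).
        # Scan strings of length n in order; output the first one whose
        # Kolmogorov complexity is at least n.
        for i in range(2 ** n):
            x = format(i, '0{}b'.format(n))   # i-th binary string of length n
            if M(x) >= n:
                # By the pigeonhole theorem such an x exists, so P always
                # halts.  Its description is O(log n) bits (only the
                # constant n varies), yet it outputs x with K(x) >= n.
                return x

    # Illustration of the control flow only: len(x) is merely an upper
    # bound on K(x) up to an additive constant, not K itself.
    print(P(3, lambda x: len(x)))   # prints '000'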