CS0441 lecture notes for Nov 25, 2002 Prof. Kurt VanLehn

Random variables

A random variable is neither a variable nor random.  Given a sample space S, it is a function from outcomes to real numbers.  That is, for all s ∈ S, f(s) is a real number.  However, random variables are traditionally denoted by X(s) rather than f(s), just to confuse us.  Any real-valued function of the outcomes is acceptable.  For instance, suppose our sample space is generated by rolling three dice.  Here are some possible random variables:

·        The sum of the faces.  Thus, X([3,4,2]) = 9

·        The product of the faces:  For instance, X([1,3,6])=18

·        The log of the sum of the squares of the faces:  e.g., X([3,4,2]) = log(3*3+4*4+2*2).

·        The value of the face of the first die.  Thus, X([3,4,1])=3=X([3,6,6]).
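As a minimal sketch, the four random variables listed above can be written as ordinary Python functions that take an outcome (a list of three faces) and return a number.  The function names are illustrative, and the natural log is an assumption, since the notes do not say which base is meant.

```python
import math

# Each function maps one outcome (a list of three faces) to a real number.
def face_sum(s):
    return sum(s)

def face_product(s):
    return math.prod(s)

def log_sum_of_squares(s):
    # Assumption: natural logarithm; the notes do not specify the base.
    return math.log(sum(face * face for face in s))

def first_face(s):
    return s[0]

print(face_sum([3, 4, 2]))       # 9
print(face_product([1, 3, 6]))   # 18
print(first_face([3, 4, 1]))     # 3
```

Nothing about these functions is random; the randomness lives entirely in which outcome s gets rolled.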

 

In order to understand a random variable, it is helpful to know how frequently it takes on each of its possible values.  That is, we want to graph its possible values on the x-axis and the probability that it takes on each value on the y-axis.  For instance, suppose our sample space is generated by rolling two fair 4-sided dice, and our random variable is the sum of their faces.   The random variable outputs values 2, 3, 4, 5, 6, 7 or 8.  There is only one way to roll 2, so its probability is (1/4)*(1/4)=1/16.  However, we can roll either [1,2] or [2,1] to generate a 3 as output, and each roll has probability 1/16, so the probability of outputting 3 is 1/16 + 1/16 = 1/8.  Continuing in this fashion and graphing the result, we get a hill-shaped graph: the probabilities of the sums 2 through 8 are 1/16, 2/16, 3/16, 4/16, 3/16, 2/16 and 1/16.
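The same counting can be done by brute force.  The sketch below enumerates all 16 rolls of two fair 4-sided dice, adds up the probability of each sum, and prints a crude text version of the graph.  The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 5)                         # a fair 4-sided die
outcomes = list(product(faces, repeat=2))   # all 16 equally likely rolls
p = Fraction(1, len(outcomes))              # probability of each roll

# Add up the probability of every roll that produces each sum.
distribution = Counter()
for roll in outcomes:
    distribution[sum(roll)] += p

# One row per value of the random variable; the bar length is the
# number of rolls (out of 16) that produce that value.
for value in sorted(distribution):
    bar = "#" * (distribution[value] * len(outcomes)).numerator
    print(value, distribution[value], bar)
```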

Expected values

Although graphing a random variable helps us understand it intuitively, we need more compact ways to summarize its graph.  In practice, most random variables are like the one shown above in that they rise to a peak then fall.   That is, they are a hill.  To summarize such functions, the two most important attributes are the position of the hill along the x-axis and how steep/flat it is.   To characterize its position, we can take the average of its output values, weighted by their probability of occurring.  This is called the expected value of the random variable:  E(X) = sum over s ∈ S of p(s)*X(s).

 

 

Practice

 

Suppose our sample space is generated by rolling two fair 4-sided dice, and our random variable is taking the sum of their faces.   What is the expected value of the random variable? E(X) = (1/16)*2 + (2/16)*3 + (3/16)*4 + (4/16)*5 + (3/16)*6 + (2/16)*7 + (1/16)*8 = 80/16 = 5.
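The definition E(X) = sum over outcomes of p(s)*X(s) translates almost word for word into code.  This is a sketch under the assumption that every roll is equally likely; `expected_value` and `uniform` are names made up for the illustration.

```python
from fractions import Fraction
from itertools import product

def expected_value(outcomes, p, X):
    """E(X) = sum over s in S of p(s) * X(s)."""
    return sum(p(s) * X(s) for s in outcomes)

S = list(product(range(1, 5), repeat=2))   # two fair 4-sided dice

def uniform(s):
    # Assumption: fair dice, so every roll has probability 1/16.
    return Fraction(1, len(S))

print(expected_value(S, uniform, sum))     # 5
```

Note that this sums over the 16 outcomes, whereas the hand calculation above groups outcomes by the value of X first.  Both give the same answer.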

 

Tricks for calculating expected values

 

Since a random variable is just a real-valued function, we can define other functions in terms of it.  For instance, we can define X(s) as the sum of two other random variables: X(s)=X1(s)+X2(s).  Suppose we want to know the expected value of such a sum.  As shown in the book, E(X)=E(X1)+E(X2).  Similarly, if X(s) = a*X1(s) + b then E(X) = a*E(X1)+b.  These theorems can be used to calculate E(X) when it can be decomposed into a weighted sum of simpler random variables.

 

Note that in general, E(XY) ≠ E(X)*E(Y).   Equality holds only in special cases (see below), such as when X and Y are independent.
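A small counterexample makes this concrete.  In the sketch below (names are illustrative), X is the face of one fair 4-sided die.  Taking Y = X gives a product whose expected value differs from E(X)*E(Y), while taking X1 and X2 to be two separate dice gives equality.

```python
from fractions import Fraction
from itertools import product

die = range(1, 5)   # one fair 4-sided die

def mean(values):
    """Average of equally likely values, as an exact fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Dependent case: Y is the same die as X, so XY = X*X.
e_x = mean(die)                             # 5/2
e_xx = mean(x * x for x in die)             # 15/2

# Independent case: X1 and X2 are two separate dice.
rolls = list(product(die, repeat=2))
e_x1x2 = mean(a * b for a, b in rolls)      # 25/4

print(e_xx, e_x * e_x)      # 15/2 versus 25/4: not equal
print(e_x1x2, e_x * e_x)    # 25/4 versus 25/4: equal
```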

 

Suppose the sample space is generated by n Bernoulli trials with a probability p of success.  Suppose our random variable is the number of successes.   Then the book shows that E(X) = np.

 

Practice

 

Suppose our sample space is generated by tossing a 4-sided die and a 6-sided die.  Suppose our random variable X is the sum of the values on the faces.  What is E(X)?  Let X1 be the value on the 4-sided die and let X2 be the value on the 6-sided die.  Then X(s)=X1(s)+X2(s) so E(X) = E(X1)+E(X2) = [(1/4)*1 + (1/4)*2 + (1/4)*3 + (1/4)*4] + [(1/6)*1 + (1/6)*2…] = (1/4)*10 + (1/6)*21 = 2.5+3.5 = 6.
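The shortcut can be checked against the definition by summing over all 24 outcomes.  This is only a sanity-check sketch; the constants `a_coef` and `b_const` are arbitrary values chosen to exercise the rule E(a*X1 + b) = a*E(X1) + b.

```python
from fractions import Fraction
from itertools import product

d4, d6 = range(1, 5), range(1, 7)
S = list(product(d4, d6))        # 24 equally likely outcomes (x1, x2)
p = Fraction(1, len(S))

E_X1 = sum(p * x1 for x1, x2 in S)           # 5/2
E_X2 = sum(p * x2 for x1, x2 in S)           # 7/2
E_X = sum(p * (x1 + x2) for x1, x2 in S)     # 6

# Arbitrary constants for the second rule.
a_coef, b_const = 3, 2
E_scaled = sum(p * (a_coef * x1 + b_const) for x1, x2 in S)

print(E_X == E_X1 + E_X2)                    # True
print(E_scaled == a_coef * E_X1 + b_const)   # True
```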

 

What is the expected number of heads if a fair coin is flipped 10 times?  This is a sequence of Bernoulli trials with n=10 and p=0.5, so E(X)=np=10*0.5=5.

 

What is the expected number of times that a 6 will appear when a fair die is rolled 10 times?  This is a sequence of Bernoulli trials with n=10 and p=1/6, so E(X) = 10/6 ≈ 1.67.
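As a cross-check on E(X) = np, the expected number of successes can also be computed the long way, from the probability of getting exactly k successes in n trials.  The sketch below assumes the standard binomial formula C(n,k) * p^k * (1-p)^(n-k) for that probability, which these notes do not derive; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def expected_successes(n, p):
    """E(X) computed from the definition rather than from np."""
    return sum(k * binomial_pmf(n, p, k) for k in range(n + 1))

print(expected_successes(10, Fraction(1, 2)))   # 5
print(expected_successes(10, Fraction(1, 6)))   # 5/3
```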

Variance

As I said above, we want to characterize our hill-shaped random variables both by their position along the x-axis (= expected value) and their steepness.   For steepness, we use the variance of the random variable, which is denoted V(X) and defined as V(X) = sum over s ∈ S of p(s)*[X(s)-E(X)]^2.

 

That is, it is the weighted sum of the squares of the distances of the values of X from the expected value of X.  We use the difference from the expected value because that gets bigger when the hill gets flatter.  We take the square of the difference to make it positive. 

 

However, taking the square means that the variance is in different units, so to speak, than the expected value.  To put it back into the same units, we can take the square root of the variance.  This is called the standard deviation and is denoted by the Greek letter sigma: σ(X) = sqrt(V(X)).

 

Practice

 

Suppose our sample space is generated by rolling two fair 4-sided dice, and our random variable is taking the sum of their faces.   What are the variance and standard deviation of the random variable?  We already found out that E(X)=5, so  V(X) = (1/16)*[2-5]^2 + (2/16)*[3-5]^2 + (3/16)*[4-5]^2 + (4/16)*[5-5]^2 + (3/16)*[6-5]^2 + (2/16)*[7-5]^2 + (1/16)*[8-5]^2 = 40/16 = 2.5.  σ(X) = sqrt(V(X)) = sqrt(2.5) ≈ 1.58.
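The same numbers fall out of a direct computation over the 16 outcomes.  This is a minimal sketch with illustrative variable names; exact fractions are used so that the variance comes out as 5/2 rather than a rounded float.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

S = list(product(range(1, 5), repeat=2))   # two fair 4-sided dice
p = Fraction(1, len(S))                    # each roll has probability 1/16

E = sum(p * sum(s) for s in S)               # expected value
V = sum(p * (sum(s) - E) ** 2 for s in S)    # variance, from the definition
sigma = sqrt(V)                              # standard deviation

print(E, V, round(sigma, 2))   # 5 5/2 1.58
```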

 

Tricks for calculating Variance

As shown in the book, V(X) = E(X^2) - E(X)^2.  This can be useful for calculating the variance, although one must still do a long sum for E(X^2) so it doesn’t save all that much work.
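For reference, the identity follows from expanding the square in the definition of variance and using the facts that the sum of p(s)*X(s) is E(X) and the sum of p(s) is 1.  The derivation below is a standard one and may differ in presentation from the book's:

```latex
\begin{align*}
V(X) &= \sum_{s \in S} p(s)\,[X(s) - E(X)]^2 \\
     &= \sum_{s \in S} p(s)\,X(s)^2 \;-\; 2E(X)\sum_{s \in S} p(s)\,X(s) \;+\; E(X)^2 \sum_{s \in S} p(s) \\
     &= E(X^2) - 2E(X)^2 + E(X)^2 \\
     &= E(X^2) - E(X)^2
\end{align*}
```

For the sum of two fair 4-sided dice, E(X^2) = (1*4 + 2*9 + 3*16 + 4*25 + 3*36 + 2*49 + 1*64)/16 = 440/16 = 27.5, so V(X) = 27.5 - 5^2 = 2.5, which matches the value found from the definition.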

 

Suppose the sample space is generated by n Bernoulli trials with a probability p of success.  Suppose our random variable is the number of successes.   Then the book shows that  V(X) = np(1-p).

 

Practice

 

Suppose our sample space is generated by flipping a weighted coin 20 times.  The coin is weighted so that the probability of heads is 0.6.  Suppose our random variable is the number of heads.  What is its variance?  This is a sequence of Bernoulli trials with n=20 and p=0.6, so V(X) = np(1-p) = 20*0.6*0.4 = 4.8.
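The formula V(X) = np(1-p) can be checked against the definition of variance.  The sketch below assumes the standard binomial formula for the probability of exactly k heads, which these notes do not derive; `binomial_variance` is a name made up for the illustration.

```python
from fractions import Fraction
from math import comb

def binomial_variance(n, p):
    """Variance of the number of successes, computed from the definition."""
    # pmf[k] is the probability of exactly k successes in n trials.
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    return sum(q * (k - mean) ** 2 for k, q in enumerate(pmf))

n, p = 20, Fraction(3, 5)          # 20 flips, probability of heads 0.6
print(binomial_variance(n, p))     # 24/5, i.e. 4.8
print(n * p * (1 - p))             # 24/5
```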

 

Average-case computational complexity

Suppose a computation’s running time depends on the input to the program.  Suppose different inputs occur with different probabilities.  Then the inputs are a sample space, and the running time is a random variable defined on it.  We can characterize the running time by graphing it or by calculating its expected value and variance.  The latter are more compact and hence preferred.
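As a hypothetical illustration (not an example from the lecture), take linear search over a list of 10 items, measure running time as the number of comparisons, and assume the target is equally likely to be any of the items.  The inputs are then the sample space and `comparisons` is the random variable.

```python
from fractions import Fraction

def comparisons(items, target):
    """Running time of a linear search, measured in comparisons."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)   # target absent: every item was compared

items = list(range(10))
p = Fraction(1, len(items))   # assumption: each target is equally likely

E = sum(p * comparisons(items, t) for t in items)
V = sum(p * (comparisons(items, t) - E) ** 2 for t in items)

print(E, V)   # 11/2 33/4
```

Under a different input distribution, for instance one where early items are requested more often, the same code would give a smaller expected running time.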

Chebyshev’s inequality

Knowing the expected value and variance of a random variable provides a lot of information about it.  In fact, given just those two numbers, we can bound the probability that the random variable lands far from its expected value by using the Chebyshev inequality:  for any positive real number r, p(|X(s) - E(X)| ≥ r) ≤ V(X)/r^2.

 

Practice

 

Suppose the number of tin cans recycled in a day at a recycling center is a random variable with an expected value of 50,000 and a variance of 2,500.   Provide an upper bound on the probability that the center will process more than 60,000 cans on a certain day or fewer than 40,000.   That is, we want an upper bound on p(X(s) ≥ 60,000 or X(s) ≤ 40,000) = p(|X(s) - E(X)| ≥ 10,000) ≤ V(X)/(10,000)^2 = 2,500/10^8 = 0.000025.
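The bound is a one-line computation.  The helper below is a sketch with a made-up name, `chebyshev_bound`; it also caps the result at 1, since the inequality says nothing useful when V(X)/r^2 exceeds 1.

```python
from fractions import Fraction

def chebyshev_bound(variance, r):
    """Upper bound on p(|X - E(X)| >= r), capped at 1."""
    return min(Fraction(1), Fraction(variance) / Fraction(r) ** 2)

# Recycling example: V(X) = 2,500 and r = 10,000.
bound = chebyshev_bound(2500, 10_000)
print(bound, float(bound))   # 1/40000 2.5e-05

# Sum of two fair 4-sided dice: V(X) = 5/2 and r = 2.
print(chebyshev_bound(Fraction(5, 2), 2))   # 5/8
```

For the two-dice case the true probability of landing at least 2 away from 5 is 6/16 = 3/8, comfortably under the bound of 5/8.  Chebyshev's inequality is safe but often far from tight.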