Introduction V – Discrete Random Variables

Step by Step

Our journey into the world of data, towards mastering data and probability, predicting outcomes, finding correlations and writing algorithms that learn, isn’t an easy one, but taking it step by step makes it easier than skipping steps. Our next step is discrete random variables. “Discrete” means the same here as in discrete sample spaces: the set of possible values is either finite or countably infinite.

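A discrete random variable is a function X:\Omega \rightarrow \mathbb{R} that assigns a number to every outcome in the sample space \Omega and takes only a discrete set of values.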

If X is a discrete random variable, why is X random? X is random because its value, the number assigned to it, depends on a random outcome \omega.

If X is a random variable and a is a value, then X=a denotes the event consisting of all outcomes \omega with X(\omega )=a, i.e. the set \{ \omega \in \Omega : X(\omega )=a \}.
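To make the definition concrete, here is a minimal sketch in Python (standard library only) that models X as an ordinary function on the sample space of two die rolls, using the minimum-of-two-rolls variable from the example in the next section. The names omega, X and event_equals are illustrative choices, not part of any library.

```python
from itertools import product

# Sample space for two rolls of a six-sided die: all ordered pairs (i, j).
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: the minimum of the two rolls."""
    i, j = outcome
    return min(i, j)

def event_equals(rv, a, sample_space):
    """Return the event {w in sample_space : rv(w) == a} as a list of outcomes."""
    return [w for w in sample_space if rv(w) == a]

# The event X = 2 collects every outcome whose minimum is 2, e.g. (5, 2).
event_x_is_2 = event_equals(X, 2, omega)
print(len(event_x_is_2))   # 9
```

The random part lives entirely in which outcome \omega occurs; X itself is a fixed, deterministic function.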

Probability Mass Function (PMF)

The Probability Mass Function assigns a probability to every value a. If a is a value, then:

  • p(a) = P(X=a)
  • 0 \le p(a) \le 1
  • if a is a value the discrete random variable X never takes, then p(a)=0

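Example: Suppose we roll a six-sided die twice. Let X(i, j) be the minimum of the two outcomes, X(i, j) = \min(i, j); so X = 2 if the two outcomes are (5, 2). With a fair die, each of the 36 ordered pairs (i, j) has probability \frac { 1 }{ 36 }, so p(a) is the number of pairs whose minimum is a, divided by 36: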

  • X=1; p(1)=\frac { 11 }{ 36 }
  • X=2; p(2)=\frac { 9 }{ 36 }
  • X=3; p(3)=\frac { 7 }{ 36 }
  • X=4; p(4)=\frac { 5 }{ 36 }
  • X=5; p(5)=\frac { 3 }{ 36 }
  • X=6; p(6)=\frac { 1 }{ 36 }
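The table can be checked by brute force. The sketch below, assuming a fair die and using only Python's standard library, enumerates all 36 outcomes, counts how many map to each value of X, and divides by 36. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair six-sided die twice.
omega = list(product(range(1, 7), repeat=2))

# Count how many outcomes map to each value of X = min(i, j).
counts = Counter(min(i, j) for i, j in omega)

# p(a) = (number of outcomes with X = a) / 36; Fraction keeps the result exact.
pmf = {a: Fraction(counts[a], len(omega)) for a in sorted(counts)}

for a, p in pmf.items():
    print(a, p)
```

Fraction reduces automatically, so 9/36 is printed as 1/4 and 3/36 as 1/12. A value such as 7 never appears in pmf, matching p(7)=0.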


The event X\le a is the set of all outcomes \omega such that X(\omega )\le a.

In the example above, P(X\le 3) = p(1)+p(2)+p(3) = \frac { 27 }{ 36 }.
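The same probability can be found directly from the definition of the event, without the PMF: collect every outcome with X(\omega )\le 3 and divide by the number of outcomes. A minimal sketch, assuming a fair die so that all 36 outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))

# The event X <= 3: every outcome (i, j) whose minimum is at most 3.
event = [(i, j) for i, j in omega if min(i, j) <= 3]

# Outcomes are equally likely, so P(X <= 3) = |event| / |omega|.
prob = Fraction(len(event), len(omega))
print(len(event), prob)   # 27 3/4
```

Equivalently, only the 9 pairs with both rolls in {4, 5, 6} are excluded, and 36 - 9 = 27.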

Cumulative Distribution Function (CDF)

The cumulative distribution function gives the probability that X\le a:

F(a) = P(X\le a)

To calculate the CDF, sum the probabilities of all values up to a:

F(a) = \sum _{ x_{ j }\le a }{ p(x_{ j }) }

The CDF satisfies four properties:

  1. 0 \le F(a) \le 1
  2. \lim _{ a\rightarrow \infty  }{ F(a) }=1
  3. \lim _{ a\rightarrow -\infty  }{ F(a) }=0
  4. F is non-decreasing: if a\le b, then F(a)\le F(b)
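The summation and the four properties can be spot-checked numerically. This sketch hard-codes the PMF of X = min(i, j) from the dice example (numerators 11, 9, 7, 5, 3, 1 over 36); the function name F simply mirrors the notation above, and checking on a finite grid only illustrates the limits, it does not prove them.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of X = min(i, j) for two rolls of a fair die, as in the example table.
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(n, 36) for n in (11, 9, 7, 5, 3, 1)]

def F(a):
    """CDF of X at any real a: the sum of p(x_j) over all values x_j <= a."""
    return sum((p for x, p in zip(values, pmf) if x <= a), Fraction(0))

# At the values themselves, F is just the running sum of the PMF.
assert list(accumulate(pmf)) == [F(x) for x in values]

# Spot-check the four properties on a grid that includes non-integer points.
grid = [k / 2 for k in range(-4, 17)]                   # -2.0, -1.5, ..., 8.0
F_grid = [F(a) for a in grid]
assert all(0 <= v <= 1 for v in F_grid)                 # property 1
assert F(1000) == 1 and F(-1000) == 0                   # properties 2 and 3, far out in the tails
assert all(u <= v for u, v in zip(F_grid, F_grid[1:]))  # property 4

print(F(3), F(2.5))   # 3/4 5/9
```

Because X only takes integer values, F is a step function: F(2.5) equals F(2).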

Expected Value \mu

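The expected value, also called the mean or average and written \mu (“mu”), is the probability-weighted average of the values of X: the average one expects over many repetitions of the experiment. It need not be a value X can actually take. It is the point where the distribution balances and a common way to summarise a distribution in a single number. For a random variable with values x_{ 1 },\dots ,x_{ n }: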

E(X) = \sum _{ j=1 }^{ n }{ p(x_{ j } )x_{ j }}
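Applying the formula to the dice example gives a concrete number. A minimal sketch, assuming the PMF of X = min(i, j) from the table (numerators 11, 9, 7, 5, 3, 1 over 36); expected_value is a hypothetical helper name, not a library function.

```python
from fractions import Fraction

# Values x_j of X = min of two dice and their probabilities p(x_j).
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(n, 36) for n in (11, 9, 7, 5, 3, 1)]

def expected_value(xs, ps):
    """E(X) = sum_j p(x_j) * x_j for a finite discrete distribution."""
    return sum((p * x for x, p in zip(xs, ps)), Fraction(0))

mu = expected_value(values, probs)
print(mu)   # 91/36
```

The result, about 2.53, is not one of the values 1 to 6, which illustrates that the mean is a balance point rather than a possible outcome.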


  • If X and Y are random variables on \Omega : E(X+Y) = E(X) + E(Y)
  • If a and b are constants: E(aX + b) = aE(X) + b
  • E(h(X)) =\sum _{ j=1 }^{ n }{ h(x_{ j })p(x_{ j }) }
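The three rules can be verified exactly on the two-dice sample space. In this sketch X is the first roll and Y the second, both defined on the same \Omega; the constants a = 2, b = 3 and the function h(x) = x^{ 2 } are arbitrary illustrative choices, as are the helper names E, X and Y.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: each of the 36 outcomes (i, j) has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

def E(rv):
    """Expected value of a random variable rv defined on omega."""
    return sum((P * rv(w) for w in omega), Fraction(0))

def X(w):
    return w[0]   # first roll

def Y(w):
    return w[1]   # second roll

# Linearity: E(X + Y) = E(X) + E(Y)
assert E(lambda w: X(w) + Y(w)) == E(X) + E(Y) == 7

# Constants a and b: E(aX + b) = aE(X) + b
a, b = 2, 3
assert E(lambda w: a * X(w) + b) == a * E(X) + b

# E(h(X)) from the PMF of X, compared with applying h to X on omega directly.
def h(x):
    return x ** 2

pmf_X = {x: Fraction(1, 6) for x in range(1, 7)}
lotus = sum((h(x) * p for x, p in pmf_X.items()), Fraction(0))
assert lotus == E(lambda w: h(X(w))) == Fraction(91, 6)

print(E(X), E(lambda w: X(w) + Y(w)), lotus)   # 7/2 7 91/6
```

The last rule is convenient in practice: E(h(X)) needs only the PMF of X, not the distribution of h(X).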

Variance \sigma^{ 2 }

The expected value might be a good way to summarise a distribution in one number, but it doesn’t tell us how spread out the distribution is around the mean. The variance tells us exactly that. In practice we often prefer the standard deviation \sigma, because it is in the same units as the random variable X, whereas the variance is in the units of X squared.

Var(X)=\sigma^{ 2 } = E((X-\mu)^{ 2 }) = \sum _{ j=1 }^{ n }{ p(x_{ j })(x_{ j }-\mu)^{ 2 } }

Std(X)=\sigma=\sqrt { Var(X) }
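Continuing the dice example, the variance and standard deviation of X = min(i, j) follow directly from the two formulas. A minimal sketch, assuming the PMF from the table; the variance is kept exact with Fraction, and only the square root switches to floating point.

```python
import math
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(n, 36) for n in (11, 9, 7, 5, 3, 1)]

# mu = E(X) = sum_j p(x_j) * x_j
mu = sum((p * x for x, p in zip(values, probs)), Fraction(0))

# Var(X) = sum_j p(x_j) * (x_j - mu)^2, computed exactly.
var = sum((p * (x - mu) ** 2 for x, p in zip(values, probs)), Fraction(0))

# Std(X) = sqrt(Var(X)); the root is generally irrational, so use a float here.
std = math.sqrt(var)

print(mu, var)   # 91/36 2555/1296
print(round(std, 3))
```

The standard deviation comes out at roughly 1.4, measured in the same units as the die rolls.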


  • If X and Y are independent: Var(X+Y)=Var(X)+Var(Y)
  • If a and b are constants: Var(aX+b)=a^{ 2 }Var(X)
  • Var(X) = E(X^{ 2 }) - E(X)^{ 2 }
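These rules can also be checked exactly on two fair, independent dice, with X the first roll and Y the second. The constants a = -3, b = 10 are arbitrary, and E, Var, X and Y are illustrative helper names. The final comparison shows why independence matters: X is not independent of itself, and Var(X+X) is not Var(X)+Var(X).

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)   # each outcome of two fair, independent dice

def E(rv):
    """Expected value of a random variable rv defined on omega."""
    return sum((P * rv(w) for w in omega), Fraction(0))

def Var(rv):
    """Var(X) = E((X - mu)^2), using the definition directly."""
    mu = E(rv)
    return E(lambda w: (rv(w) - mu) ** 2)

def X(w):
    return w[0]

def Y(w):
    return w[1]

# Independent X and Y: Var(X + Y) = Var(X) + Var(Y)
assert Var(lambda w: X(w) + Y(w)) == Var(X) + Var(Y)

# Constants a, b: Var(aX + b) = a^2 Var(X); the shift b has no effect.
a, b = -3, 10
assert Var(lambda w: a * X(w) + b) == a ** 2 * Var(X)

# Shortcut formula: Var(X) = E(X^2) - E(X)^2
assert Var(X) == E(lambda w: X(w) ** 2) - E(X) ** 2

# Without independence the sum rule fails: Var(X + X) = 4 Var(X), not 2 Var(X).
assert Var(lambda w: X(w) + X(w)) != Var(X) + Var(X)

print(Var(X), Var(lambda w: X(w) + Y(w)))   # 35/12 35/6
```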
