# Bayesian Inference I – Maximum Likelihood Estimate

There are times when we don’t know the values of parameters. The maximum likelihood estimate and the methods that follow enable us to estimate these values from data.

## Maximum Likelihood Estimate (MLE)

### What is the maximum likelihood estimate?

The maximum likelihood estimate is the value of the parameter for which the observed outcome of an experiment has the highest probability. It is a point estimate: it gives us a single value for the parameter and not an interval.

Definition: Given data, the maximum likelihood estimate for the parameter p is the value of p that maximizes the likelihood P(data|p).
We denote the maximum likelihood estimate of p by $\hat { p }$.

Example: Suppose we flip a coin 100 times and get 55 heads. What is the MLE for p?

Answer: A coin flip has just two outcomes, so the number of heads in 100 flips is binomially distributed. We therefore get:

$P(55\;heads|p)=\binom{ 100 }{ 55 }p^{ 55 }(1-p)^{ 45 }$

This likelihood is a function of p, and we can find its maximum by taking the derivative with respect to p and setting it equal to 0.

$\frac{ d }{ dp }P(55\;heads|p)=\binom{ 100 }{ 55 }\left(55p^{ 54 }(1-p)^{ 45 }-45p^{ 55 }(1-p)^{ 44 }\right)=0$

$\Rightarrow 55p^{ 54 }(1-p)^{ 45 }=45p^{ 55 }(1-p)^{ 44 }\Rightarrow 55(1-p)=45p\Rightarrow 55=100p\Rightarrow \hat { p }=0.55$

$\hat { p }=0.55$ makes intuitive sense, as we had 55 heads in 100 trials.
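The result of the derivation can be cross-checked numerically. The following is a minimal sketch, not part of the original calculation: it evaluates the binomial likelihood on a grid of candidate values and picks the largest one. The function name `likelihood` and the grid step of 0.001 are assumptions made for illustration.

```python
from math import comb

def likelihood(p, heads=55, flips=100):
    """Binomial probability of `heads` heads in `flips` flips for a given p."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Candidate values 0.001, 0.002, ..., 0.999 (the grid step is an arbitrary choice)
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest likelihood approximates the MLE
p_hat = max(grid, key=likelihood)

print('MLE for p (grid search): {}'.format(p_hat))
```

A grid search only finds the maximum up to the grid resolution, so it is a sanity check for the calculus and not a replacement for it.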

### Log likelihood

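As you probably noticed, the maximum likelihood calculation above is quite laborious. We can reduce the amount of work by taking the natural logarithm of the likelihood. The result is called, unsurprisingly, the log likelihood. Because the logarithm is strictly increasing, the likelihood and the log likelihood attain their maximum at the same value of p.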

$\ln(P(55\;heads|p))=\ln\binom{ 100 }{ 55 }+55\ln(p)+45\ln(1-p)$

$\frac{ d }{ dp }\ln(P(55\;heads|p))=\frac{ 55 }{ p }-\frac{ 45 }{ 1-p }=0\Rightarrow 55(1-p)=45p\Rightarrow\hat { p } = 0.55$

Taking the natural logarithm can therefore make the computation less difficult.
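When the derivative of the log likelihood cannot be solved by hand, its root can be found numerically. The sketch below does this for the coin example with a simple bisection. The helper names `score` and `bisect_root`, the search interval and the tolerance are assumptions chosen for illustration.

```python
def score(p, heads=55, flips=100):
    """Derivative of the log likelihood 55 ln(p) + 45 ln(1 - p) with respect to p."""
    return heads / p - (flips - heads) / (1 - p)

def bisect_root(f, lo, hi, tol=1e-10):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid   # f is still positive, so the root lies to the right
        else:
            hi = mid   # f is zero or negative, so the root lies to the left
    return (lo + hi) / 2

# The derivative is positive near 0 and negative near 1, so a root lies in between
p_hat = bisect_root(score, 0.01, 0.99)

print('MLE for p (root of the derivative): {:.4f}'.format(p_hat))
```

Bisection works here because the derivative decreases as p grows, so it crosses zero exactly once on the interval.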

### Continuous Case

Let us now turn to the continuous case. We basically do the same as before, but use the probability density function (PDF) instead of the probability mass function (PMF).

Example: Suppose the lifetime of banana brand phones is exponentially distributed. We don’t know the parameter $\lambda$, but we have data from five friends who own a banana brand phone. The lifetimes of their phones were 3, 3, 1, 2 and 4 years, respectively. What is the MLE for $\lambda$?

Answer: We assume that the lifetimes of the phones are independent, so their joint PDF is just the product of the individual PDFs $\lambda e^{ -\lambda x }$. We then get:

$f(x_{ 1 }, x_{ 2 }, x_{ 3 }, x_{ 4 }, x_{ 5 })=\lambda^{ 5 } e^{ -\lambda (x_{ 1 } + x_{ 2 } + x_{ 3 } + x_{ 4 } + x_{ 5 }) }$

As $x_{ 1 }=3,\;x_{ 2 }=3,\;x_{ 3 }=1,\;x_{ 4 }=2,\;x_{ 5 }=4$, we get:

$f(3, 3, 1, 2, 4)=\lambda^{ 5 } e^{ -13\lambda }$ or $\ln(f(3, 3, 1, 2, 4))=5\ln(\lambda)-13\lambda$

We can then calculate the MLE for $\lambda$ by setting the derivative of the log likelihood equal to 0:

$\frac{ d }{ d\lambda }\ln(f(3, 3, 1, 2, 4))=\frac{ 5 }{ \lambda }-13=0 \Rightarrow \hat { \lambda }=\frac{ 5 }{ 13 }$
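The same steps work for any number of observations: with n lifetimes, the derivative becomes n divided by $\lambda$ minus the sum of the lifetimes, so the estimate is n divided by that sum. The sketch below applies this to the five phones and checks that nearby rates give a lower log likelihood. The names `lifetimes`, `log_likelihood` and `lambda_hat`, and the offset of 0.05, are assumptions made for illustration.

```python
from math import log

lifetimes = [3, 3, 1, 2, 4]  # observed phone lifetimes in years

def log_likelihood(lam, data=lifetimes):
    """Log likelihood of independent exponential lifetimes with rate lam."""
    return len(data) * log(lam) - lam * sum(data)

# Setting the derivative n / lam - sum(data) to zero gives lam = n / sum(data)
lambda_hat = len(lifetimes) / sum(lifetimes)

# Sanity check: rates slightly below and above should have a lower log likelihood
for lam in (lambda_hat - 0.05, lambda_hat + 0.05):
    assert log_likelihood(lam) < log_likelihood(lambda_hat)

print('MLE for lambda: {:.4f}'.format(lambda_hat))
```

For the data above this prints 0.3846, the decimal value of 5/13.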

### Capture / Recapture Method

One method that uses the MLE is the capture / recapture method, which allows us to estimate the size of a population, assuming that each animal in the population is equally likely to be captured.

Example: Suppose we catch 10 birds of the same kind on one day and tag them. After a few months we capture 20 birds of the same kind as before. 4 of them were already tagged. What is the estimated population size?

Answer: If the population consists of n birds, the number of tagged birds among the 20 captured ones follows a hypergeometric distribution. We can find the MLE for n with a tiny Python script:

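    from math import comb  # scipy.misc.comb is no longer available in current SciPy

    def likelihood(n):
        # Probability of finding 4 tagged birds among 20 captured ones when
        # 10 of the n birds in the population carry a tag (hypergeometric)
        return comb(10, 4) * comb(n - 10, 16) / comb(n, 20)

    # At least 10 tagged and 16 untagged birds exist, so start the search at 26
    n = 26
    while likelihood(n + 1) >= likelihood(n):
        n += 1

    print('MLE for n: {}'.format(n))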



Our Python script tells us that the MLE for n is 50, which makes intuitive sense, as $\frac{ 10 }{ 50 }$ is exactly the ratio of tagged animals among the captured animals, $\frac{ 4 }{ 20 }$.
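The result of the search can also be checked by hand. As a sketch, write $L(n)$ (a name introduced here for illustration) for the probability of finding 4 tagged birds among the 20 captured ones when the population has size n, and compare neighbouring population sizes for $n\geq 27$:

$L(n)=\frac{ \binom{ 10 }{ 4 }\binom{ n-10 }{ 16 } }{ \binom{ n }{ 20 } }$

$\frac{ L(n) }{ L(n-1) }=\frac{ n-10 }{ n-26 }\cdot\frac{ n-20 }{ n }\geq 1\Leftrightarrow (n-10)(n-20)\geq n(n-26)\Leftrightarrow 200\geq 4n\Leftrightarrow n\leq 50$

The likelihood therefore does not decrease up to n = 50 and shrinks afterwards. The ratio is exactly 1 at n = 50, so n = 49 attains the same likelihood as n = 50, and both values maximize it.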

Even though the MLE allows us to estimate parameters, we have to be cautious: the MLE is a point estimate, and point estimates never tell the whole story. We will soon see situations where we shouldn’t use the MLE, or should use it only very carefully.