Frequentist Inference I – Null Hypothesis Significance Testing

Besides Bayesian inference there is frequentist inference, a well-established way of drawing conclusions from an experiment. Frequentist inference is especially common in medical research and similar fields, while Bayesian inference is mostly used in modern technology and made a comeback when computers became more powerful.

Null Hypothesis Significance Test (NHST)

Suppose a friend tosses a coin 100 times and asks you afterwards whether you think the coin is fair. What would you answer when 95 heads show up? When 70 heads show up? When 53 heads show up? You might say that the first two outcomes are certainly not from a fair coin, but if you think about it more carefully you will see that there is a chance of getting 70 or even 95 heads in 100 tosses even when the coin is fair. The question should therefore rather be: how confident can I be that the coin is not fair? The Null Hypothesis Significance Test helps us in this case. It tells us whether the data lie well outside the region where we would expect them if the coin were fair.

Before we can start with NHST we have to define a few things:

• $H_{ 0 }$: Null Hypothesis; the default assumption for the model generating the data.
• $H_{ A }$: Alternative Hypothesis; the hypothesis we adopt when the null hypothesis is rejected.
• X: Test statistic; it is computed from the data.
• Rejection Region: If X is in the Rejection Region, we reject $H_{ 0 }$ in favour of $H_{ A }$.
• “Acceptance Region”: If X is in the “Acceptance Region”, we do not reject $H_{ 0 }$.

Example: Let's simplify our experiment. Suppose we toss a coin 10 times and have to decide afterwards whether the coin is fair. We then have the following setup:

• $\theta$ is the probability that the coin lands heads.
• $H_{ 0 }$: “The coin is fair”; $\theta = 0.5$
• $H_{ A }$: “The coin is not fair”; $\theta \neq 0.5$
• X: “Number of heads in 10 tosses”
• Null Distribution: under $H_{ 0 }$, $X \sim Bin(10,0.5)$, so $p(x|\theta=0.5)$ is the pmf of Bin(10, 0.5)

And the following table:

X               0      1      2      3      4      5      6      7      8      9      10
p(x|$H_{ 0 }$)  0.001  0.010  0.044  0.117  0.205  0.246  0.205  0.117  0.044  0.010  0.001

Now let our rejection region be R = {0, 1, 2, 8, 9, 10}. So if we see fewer than three or more than seven heads, we reject our null hypothesis and conclude that the coin is not fair.
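The table and the rejection region can be checked with a few lines of Python. This is only a sketch: the helper name `binom_pmf` is made up for illustration, and it uses nothing beyond the standard library. It recomputes the null distribution and adds up the probability of landing in R when the coin is actually fair.

```python
from math import comb

def binom_pmf(k, n, theta):
    """Probability of exactly k heads in n tosses when P(heads) = theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

n = 10
# Null distribution: X ~ Bin(10, 0.5)
null_pmf = [binom_pmf(k, n, 0.5) for k in range(n + 1)]

rejection_region = {0, 1, 2, 8, 9, 10}
# Probability that X falls into the rejection region although H0 is true
prob_reject_under_h0 = sum(null_pmf[k] for k in rejection_region)

for k, p in enumerate(null_pmf):
    print(k, round(p, 3))
print("P(X in R | H0) =", prob_reject_under_h0)
```

With this region a fair coin is rejected with probability 112/1024, roughly 0.11. A smaller region such as {0, 1, 9, 10} would lower that probability at the price of rejecting unfair coins less often.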

Simple Hypothesis: The null hypothesis from above is a simple hypothesis because it specifies the distribution of the data completely. In our case this is because $\theta = 0.5$.

Composite Hypothesis: The alternative hypothesis from above is a composite hypothesis because it is not fully specified. That is because we just know that $\theta \neq 0.5$ but we don’t really know which value $\theta$ has.

Types of error

There are two types of error for an NHST.

• Type I error: False rejection of $H_{ 0 }$; we reject our null hypothesis although it is true.
• Type II error: False non-rejection of $H_{ 0 }$; we do not reject our null hypothesis although it is false.

Significance Level and Power

From the types of errors above we can derive the significance level and the power.

• Significance Level: The probability that we incorrectly reject $H_{ 0 }$. We therefore have in more mathematical terms: P(reject $H_{ 0 }$|$H_{ 0 }$)=P(type I error)
• Significance Level is the probability of thinking there is something interesting, something unusual, when there is not.
• Power: The probability that we correctly reject $H_{ 0 }$: P(reject $H_{ 0 }$|$H_{ A }$). For a composite $H_{ A }$ this probability depends on the actual value of the parameter.
• Power is the probability of detecting something interesting or unusual when it is really there.

We always aim for a small significance level and a large power.
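To make the two quantities concrete, the following sketch evaluates them for the 10-toss coin example with rejection region {0, 1, 2, 8, 9, 10}. The alternative values of $\theta$ (0.6, 0.7, 0.9) are assumptions chosen purely for illustration, and `prob_in_region` is a hypothetical helper name.

```python
from math import comb

def prob_in_region(region, n, theta):
    """P(X in region) for X ~ Bin(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in region)

n = 10
rejection_region = {0, 1, 2, 8, 9, 10}

# Significance level: probability of rejecting when H0 (theta = 0.5) is true
significance = prob_in_region(rejection_region, n, 0.5)

# Power: probability of rejecting when the coin really is biased.
# The theta values are illustrative assumptions, not part of the example.
power_by_theta = {
    theta: prob_in_region(rejection_region, n, theta) for theta in (0.6, 0.7, 0.9)
}

print("significance:", significance)
for theta, power in power_by_theta.items():
    print("power at theta =", theta, ":", power)
```

The further the true $\theta$ lies from 0.5, the larger the power. With only 10 tosses, a mildly biased coin is rejected only slightly more often than a fair one.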

P-Values

Working with p-values often makes significance testing easier.

The p-value is the probability, assuming $H_{ 0 }$, of seeing data at least as extreme as our experimental data. If the p-value is less than our significance level $\alpha$, we reject $H_{ 0 }$.
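As a sketch of this definition, the p-values for the 100-toss question from the beginning (95, 70 or 53 heads) can be computed directly. The code assumes a fair coin under $H_{ 0 }$ and reads "at least as extreme" as "at least as far from 50 heads in either direction". The function name `two_sided_p_value` is made up for this illustration.

```python
from math import comb

def two_sided_p_value(heads, n=100):
    """P(|X - n/2| >= |heads - n/2|) for X ~ Bin(n, 0.5), i.e. a fair coin."""
    observed_distance = abs(heads - n / 2)
    # Count the outcomes that are at least as far from n/2 as the observed one
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= observed_distance
    )
    return favourable / 2**n

for heads in (53, 70, 95):
    print(heads, "heads: p-value =", two_sided_p_value(heads))
```

53 heads gives a large p-value, so there is no reason to reject fairness. 70 and 95 heads give p-values far below the usual $\alpha = 0.05$.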

Z-Test for normal hypothesis

One example of a test that uses p-values is the z-test. To do the z-test we first have to define $H_{ 0 }$ and $H_{ A }$. The data are assumed to be normally distributed with known standard deviation $\sigma$, and $H_{ 0 }$ fixes the mean at a value $\mu_{ 0 }$. We then take the sample mean of our experimental data and standardise it under the assumption that $H_{ 0 }$ is true.

$z_{ n }=\frac{ \overline{ x }_{ n }-\mu_{ 0 } }{ \frac{ \sigma }{ \sqrt{ n } } }$

We then calculate the p-value, here for a right-sided alternative (a mean larger than under $H_{ 0 }$):

$P(\overline{ x }>\overline{ x }_{ n }|H_{ 0 })=P(z>z_{ n })=1-P(z\le z_{ n })$
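A minimal sketch of the right-sided z-test using Python's standard library follows. The numbers in the usage example (sample mean 103, $\mu_{ 0 } = 100$, $\sigma = 15$, n = 64) are hypothetical and only illustrate the call. The standardisation divides by the standard error $\sigma/\sqrt{ n }$.

```python
from math import sqrt
from statistics import NormalDist

def z_test_right_sided(sample_mean, mu0, sigma, n):
    """Return (z, p-value) for H0: mu = mu0 against HA: mu > mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # P(Z > z) = 1 - P(Z <= z) under the standard normal distribution
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical data summary, chosen only for illustration
z, p = z_test_right_sided(sample_mean=103.0, mu0=100.0, sigma=15.0, n=64)
print("z =", z, "p-value =", p)
```

For a left-sided alternative the p-value would be `NormalDist().cdf(z)` instead. For a two-sided alternative it would be twice the smaller of the two tails.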

t-Tests

Another kind of NHST is the t-test. Like many other tests, the t-test assumes that the data come from a normal distribution. We should always check whether that assumption holds, either by plotting the data or by applying a normality test.

The t-test is based on Student's t-distribution: T~t(df), where df denotes the degrees of freedom.

One Sample t-Test

We use the z-test from above when $\mu$ is unknown but $\sigma$ is known. But what do we do when we don't know $\sigma$ either? Then we use the one-sample t-test and estimate $\sigma$ from the data.

We assume $x_{ 1 },...,x_{ n }\sim N(\mu,\sigma^{ 2 })$ where both the mean and the variance are unknown.

Our null hypothesis is then that $\mu=\mu_{ 0 }$ for some specific value $\mu_{ 0 }$.

We calculate the test-statistic with: $t=\frac{ \overline{ x }-\mu_{ 0 } }{ \frac{ s }{ \sqrt{ n } } }$, where $s^{ 2 }=\frac{ 1 }{ n-1 }\sum_{ i=1 }^{ n }{ (x_{ i }-\overline{ x })^{ 2 } }$

t is the standardised mean and $s^{ 2 }$ is the sample variance.

Our null distribution $f(t|H_{ 0 })$ is then the pdf of t(n-1) (t-distribution with n-1 degrees of freedom).

Using this information we can calculate the p-value and either reject or not reject the null hypothesis.

Example:

• Data: $x_{ 1 },...,x_{ 5 } \sim N(\mu,\sigma^{ 2 })$ with observed values 1, 2, 3, 6, -1, so n = 5
• $\mu$ and $\sigma$ are unknown but we know that the data follows a normal distribution.
• $H_{ 0 }: \mu=0 \; H_{ A }: \mu>0 \; \alpha = 0.05$
• $\overline { x }=2.2$
• We know neither the mean nor the variance, so we use the one-sample t-test:
• $s^{ 2 }=\frac{ 1 }{ 4 }((1-2.2)^{ 2 }+(2-2.2)^{ 2 }+(3-2.2)^{ 2 }+(6-2.2)^{ 2 }+(-1-2.2)^{ 2 })=6.7$
• $t=\frac{ 2.2 -0 }{ \frac{ \sqrt{ 6.7 } }{ \sqrt{ 5 } } }\approx 1.90$
• With 4 degrees of freedom the p-value is $P(T>1.90|H_{ 0 })\approx 0.065 > \alpha$. So we do not reject our null hypothesis.
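The example can be reproduced with the standard library alone. The sketch below takes the five observations 1, 2, 3, 6, -1 from the variance calculation, computes the t statistic, and obtains the right-tail probability by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are made up for this illustration, and the numerical integration is just one simple way to avoid an external statistics package.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, t] and the symmetry of the density."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

data = [1, 2, 3, 6, -1]
mu0 = 0
n = len(data)

sample_mean = statistics.mean(data)   # 2.2
s2 = statistics.variance(data)        # sample variance with n-1 in the denominator
t_stat = (sample_mean - mu0) / math.sqrt(s2 / n)
p_value = t_upper_tail(t_stat, df=n - 1)

print("t =", round(t_stat, 3), "p-value =", round(p_value, 3))
```

The test statistic is about 1.90 and the one-sided p-value about 0.065, which is above $\alpha = 0.05$.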

Two Sample t-Test

The two-sample t-test allows us to compare the means of two samples that share the same, unknown variance.

• We assume $x_{ 1 },...,x_{ n } \sim N(\mu_{ x },\sigma^{ 2 })$ and $y_{ 1 },...,y_{ m } \sim N(\mu_{ y },\sigma^{ 2 })$
• $H_{ 0 }: \mu_{ x }=\mu_{ y }$
• test statistic: $t=\frac{ \overline{ x }-\overline{ y } }{ s_{ p } }$ where $s_{ p }^{ 2 }$ is the pooled variance.
• $s_{ p }^{ 2 }=\frac{ (n-1)s_{ x }^{ 2 }+(m-1)s_{y}^{ 2 } }{ n+m-2 }(\frac{ 1 }{ n }+\frac{ 1 }{ m })$ where $s_{ x }^{2}, s_{ y }^{ 2 }$ are the sample variances.
• Our null distribution $f(t|H_{ 0 })$ is then the pdf of T~t(n+m-2)
• We can calculate the p-value like always.
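The steps above can be sketched as follows. The function `pooled_t_test` and its helpers are hypothetical names, the two samples in the usage example are invented, and the sketch assumes a two-sided alternative $\mu_{ x } \neq \mu_{ y }$, which the text does not specify. The tail probability is again obtained by numerically integrating the t density.

```python
import math
import statistics

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def pooled_t_test(x, y):
    """Return (t, df, two-sided p-value) assuming equal, unknown variances."""
    n, m = len(x), len(y)
    df = n + m - 2
    pooled_var = ((n - 1) * statistics.variance(x)
                  + (m - 1) * statistics.variance(y)) / df
    # s_p as defined in the text: pooled variance times (1/n + 1/m), then the root
    s_p = math.sqrt(pooled_var * (1 / n + 1 / m))
    t_stat = (statistics.mean(x) - statistics.mean(y)) / s_p
    p_two_sided = 2 * t_upper_tail(abs(t_stat), df)
    return t_stat, df, p_two_sided

# Invented samples, for illustration only
t_stat, df, p_value = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print("t =", t_stat, "df =", df, "p-value =", p_value)
```

For a one-sided alternative the factor 2 would be dropped and the tail chosen according to the direction of $H_{ A }$.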