# Frequentist Inference II – NHST II

This is a summary of the most common significance tests.

## Designing NHST

1. specify $H_{ 0 }$ and $H_{ 1 }$
2. choose a test statistic whose null distribution (and, ideally, alternative distribution) is known
3. choose a significance level and specify the rejection region, deciding whether it is one-sided or two-sided
4. compute the power using the alternative distribution(s)
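
The design steps above can be sketched for a one-sided z-test using `scipy.stats.norm`; the values of $\mu_{ 0 }$, the alternative mean, $\sigma$, $n$, and $\alpha$ below are illustrative assumptions, not taken from these notes:

```python
# Steps 3-4: rejection region and power for a right-tailed z-test.
# All numeric values are illustrative assumptions.
from scipy.stats import norm

mu0, mu_a = 0.0, 0.5   # null mean and one alternative mean (assumed)
sigma, n = 1.0, 25     # known standard deviation, sample size (assumed)
alpha = 0.05           # significance level (assumed)

se = sigma / n ** 0.5
# Step 3: reject H0 when xbar > c, where P(xbar > c | H0) = alpha
c = norm.ppf(1 - alpha, loc=mu0, scale=se)
# Step 4: power = P(reject H0 | H_A) = P(xbar > c) computed under mu_a
power = 1 - norm.cdf(c, loc=mu_a, scale=se)
```

Note that the power depends on which alternative mean is plugged in; a different $\mu$ under $H_{ A }$ gives a different power.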

## Running a NHST

1. collect data and compute the test statistic $x$
2. check whether $x$ lies in the rejection region (equivalently, whether $p<\alpha$)

## Common significance tests

### z-test

• Use: Compare the data mean to a hypothesized mean
• Data: $x_{ 1 },\dots,x_{ n }$
• Assumptions: $x_{ i } \sim N(\mu, \sigma^{ 2 })$ where the mean is unknown but the variance is known
• $H_{ 0 }$: $\mu=\mu_{ 0 }$ for a specified value $\mu_{ 0 }$.
• $H_{ A }$: $\mu\neq\mu_{ 0 }$, $\mu < \mu_{ 0 }$, or $\mu>\mu_{ 0 }$
• test statistic: $z=\frac{ \overline{ x }-\mu_{ 0 } }{ \frac{ \sigma }{ \sqrt{ n } } }$
• null distribution: $f(z|H_{ 0 })$ is the pdf of $Z\sim N(0,1)$
• p-value: the p-value is computed from the null distribution as usual
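
As a minimal sketch, the z-test can be run with `numpy` and `scipy.stats.norm`; the sample, $\mu_{ 0 }$, and $\sigma$ below are illustrative assumptions:

```python
# Two-sided z-test on an assumed sample with known sigma.
import numpy as np
from scipy.stats import norm

x = np.array([5.1, 4.8, 5.4, 5.0, 4.9, 5.3, 5.2, 4.7])  # assumed data
mu0, sigma = 5.0, 0.25   # hypothesized mean, known sd (assumed)

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
p = 2 * norm.sf(abs(z))  # two-sided p-value from the N(0,1) null
```

For a one-sided test, drop the factor of 2 and use the appropriate tail of the null distribution.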

### One-Sample t-Test for the mean

• Use: Compare the data mean to the mean of our hypothesis
• Data: $x_{ 1 },...,x_{ n }$
• Assumptions: $x_{ i }\sim N(\mu,\sigma^{ 2 })$ where both the mean and the variance are unknown.
• $H_{ 0 }$: $\mu=\mu_{ 0 }$ for a specified value $\mu_{ 0 }$
• $H_{ A }$: $\mu\neq\mu_{ 0 }$, $\mu < \mu_{ 0 }$, or $\mu>\mu_{ 0 }$
• test statistic: $t=\frac{ \overline{ x }-\mu_{ 0 } }{ \frac{ s }{ \sqrt{ n } } }$ where $s^{ 2 }=\frac{ 1 }{ n-1 }\sum_{ i=1 }^{ n }{ (x_{ i }-\overline{ x })^{ 2 } }$
• null distribution: $f(t|H_{ 0 })$ is the pdf of T~t(n-1)
• p-value: the p-value is computed from the null distribution as usual
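
A quick way to sanity-check the formula is to compare the manual statistic against `scipy.stats.ttest_1samp`; the sample and $\mu_{ 0 }$ are illustrative assumptions:

```python
# One-sample t-test: manual formula vs scipy's built-in test.
import numpy as np
from scipy.stats import ttest_1samp

x = np.array([2.1, 1.9, 2.4, 2.0, 2.6, 2.2])  # assumed data
mu0 = 2.0                                     # hypothesized mean (assumed)

# s uses the n-1 denominator, matching the sample-variance formula above
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
result = ttest_1samp(x, popmean=mu0)  # two-sided by default
```

`result.pvalue` gives the two-sided p-value from the $t(n-1)$ null distribution.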

### Two-Sample t-Test for comparing means (assuming equal variance)

• Use: Compare the sample means of two groups
• Data: $x_{ 1 },...,x_{ n }$ and $y_{ 1 },...,y_{ m }$
• Assumptions: $x_{ i }\sim N(\mu_{ x },\sigma^{ 2 })$ and $y_{ j }\sim N(\mu_{ y },\sigma^{ 2 })$. The means and the variance are unknown, but the variance is assumed to be the same for both groups.
• $H_{ 0 }$: $\mu_{ x }=\mu_{ y }$
• $H_{ A }$: $\mu_{ x }\neq\mu_{ y } \;or\; \mu_{ x } < \mu_{ y } \;or\; \mu_{ x }>\mu_{ y }$
• test statistic: $t=\frac{ \overline{ x }-\overline{ y } }{ s_{ p } }$ where $s_{ p }^{ 2 }=\frac{ (n-1)s_{ x }^{ 2 }+(m-1)s_{ y }^{ 2 } }{ n+m-2 }(\frac{ 1 }{ n }+\frac{ 1 }{ m })$
• null distribution: $f(t|H_{ 0 })$ is the pdf of $T\sim t(n+m-2)$
• p-value: the p-value is computed from the null distribution as usual
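
The pooled statistic can likewise be checked against `scipy.stats.ttest_ind` with `equal_var=True`; the two samples below are illustrative assumptions:

```python
# Two-sample t-test with pooled variance: manual vs scipy.
import numpy as np
from scipy.stats import ttest_ind

x = np.array([6.2, 5.8, 6.5, 6.0, 6.3])       # assumed group 1
y = np.array([5.5, 5.9, 5.4, 5.8, 5.6, 5.7])  # assumed group 2

n, m = len(x), len(y)
# s_p^2 as defined above, including the (1/n + 1/m) factor
sp2 = (((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1))
       / (n + m - 2)) * (1 / n + 1 / m)
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2)
result = ttest_ind(x, y, equal_var=True)
```

With `equal_var=False`, scipy would instead run Welch's t-test, which drops the equal-variance assumption.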

### One-way ANOVA (F-test for equal means)

• Use: Compare the means of n groups each with m data points
• Data:
• $x_{ 1,1 },...,x_{ 1,m }$
• $x_{ 2,1 },...,x_{ 2,m }$
• $\quad\vdots$
• $x_{ n,1 },...,x_{ n,m }$
• Assumptions: each group $i$ follows a normal distribution $N(\mu_{ i },\sigma^{ 2 })$, where the means are possibly different but the variance is the same for all groups.
• $H_{ 0 }$: $\mu_{ 1 }=\mu_{ 2 }=...=\mu_{ n }$ all means are the same.
• $H_{ A }$: Not all means are the same.
• test statistic: $w=\frac{ MS_{ B } }{ MS_{ W } }$ where:
• $\overline{ x }_{ i }$ is the group mean; $\overline{ x }_{ i }=\frac{ 1 }{ m }\sum_{ j=1 }^{ m }{ x_{ i,j } }$
• $\overline{ x }$ is the grand mean of all groups; $\overline{ x }=\frac{ 1 }{ n }\sum_{ i=1 }^{ n }{ \overline{ x }_{ i } }$
• $s_{ i }^{ 2 }$ is the sample variance of group $i$
• $MS_{ B }$ is the variance between the group means; $MS_{ B } = \frac{ m }{ n-1 }\sum_{ i=1 }^{ n }{ (\overline{ x }_{ i }-\overline{ x })^{ 2 } }$
• $MS_{ W }$ is the mean of the sample variances within the groups; $MS_{ W }=\frac{ 1 }{ n }\sum_{ i=1 }^{ n }{ s_{ i }^{ 2 } }$
• null distribution: $f(w|H_{ 0 })$ is the pdf of $W\sim F(n-1,\, n(m-1))$, with $n-1$ and $n(m-1)$ degrees of freedom
• p-value: the p-value is computed from the null distribution as usual
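
The $MS_{ B }/MS_{ W }$ construction above can be checked against `scipy.stats.f_oneway`; the three equal-sized groups below are illustrative assumptions:

```python
# One-way ANOVA: manual F statistic vs scipy.stats.f_oneway.
import numpy as np
from scipy.stats import f_oneway

groups = np.array([[3.1, 2.9, 3.4, 3.0],   # assumed data:
                   [3.6, 3.8, 3.5, 3.9],   # n = 3 groups,
                   [2.8, 3.0, 2.7, 2.9]])  # m = 4 points each
n, m = groups.shape

group_means = groups.mean(axis=1)
grand_mean = group_means.mean()
ms_b = m / (n - 1) * ((group_means - grand_mean) ** 2).sum()
ms_w = groups.var(axis=1, ddof=1).mean()   # mean of within-group variances
w_manual = ms_b / ms_w
result = f_oneway(*groups)                 # one array per group
```

`f_oneway` also handles unequal group sizes, in which case the formulas above generalize with per-group weights.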

### Chi-Square test for goodness of fit

• Use: Test if discrete data fits a finite probability mass function.
• Data: for each outcome $\omega_{ i }$ an observed count $O_{ i }$
• Assumptions: none (though the $\chi^{ 2 }$ approximation to the null distribution works well only when the expected counts are reasonably large)
• $H_{ 0 }$: The data was drawn from a specific discrete distribution
• $H_{ A }$: The data was drawn from a different distribution
• test statistic:
• Likelihood ratio statistic: $G=2\sum{ O_{ i }\ln(\frac{ O_{ i } }{ E_{ i } }) }$ where $E_{ i }$ is the expected count for outcome $\omega_{ i }$ under $H_{ 0 }$
• Pearson’s Chi-Square statistic: $X^{ 2 }=\sum{ \frac{ (O_{ i }-E_{ i })^{ 2 } }{ E_{ i } } }$
• null distribution: $f(G|H_{ 0 })$ and $f(X^{ 2 }|H_{ 0 })$ have approximately the pdf of $Y\sim \chi^{ 2 }(df)$, where the degrees of freedom are the number of outcomes minus one, minus the number of parameters estimated from the data (e.g. an estimated mean)
• p-value: the p-value is computed from the null distribution as usual
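
Both statistics can be computed directly and compared against `scipy.stats.chisquare` (which implements Pearson's statistic); the die-roll counts below are illustrative assumptions:

```python
# Goodness-of-fit test of assumed die-roll counts against a fair-die pmf.
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([8, 10, 15, 9, 7, 11])     # assumed counts, 60 rolls
expected = np.full(6, observed.sum() / 6)      # fair die: 10 per face

G = 2 * (observed * np.log(observed / expected)).sum()
X2 = ((observed - expected) ** 2 / expected).sum()
df = 6 - 1           # fully specified pmf: outcomes minus one
p_G = chi2.sf(G, df)
p_X2 = chi2.sf(X2, df)
result = chisquare(observed, expected)          # reproduces X2 and p_X2
```

If a parameter (say, a mean) had been estimated from the data before computing the expected counts, `df` would drop by one more.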