8 Chi-Square
Two primary uses: Tests of independence of observations Goodness of fit Confusion matrix/contingency table: Create a table of observed frequencies Calculate column and row totals Create expected frequencies1 Calculate ∆2 b/w observed & expected Sum squared differences/expected2 Compare observed χ² to critical3
https://www.statsdirect.com/help/chi_square_tests/22.htm
https://www.youtube.com/watch?v=mSNoAODXD5c https://rforhr.com/disparateimpact.html
Expected is the (observed row sum * observed column sum)/grand total χ² formula = sum(Observed – Expected)^2/Expected) Can use observed matrix directing in R with {chisq.test(observed, correct=FALSE)} or calculate in Excel manually using {=CHISQ.DIST.RT(observed chi-square,1)} Males were not more likely than females to have passed the test, χ2(1) = 1.03, p = .31. Yates continuity correction accounts for the fact that 2x2 tables, as is the case here, tends to be upwardly biased, but it’s debatable if it needs to be applied. Fisher’s exact test is another test required when sample size is small (N < 30) but not often necessary as we typically work with larger sample sizes in IO. The critical value for a χ² with 1 degree of freedom is 3.84, so observed χ² values greater than this are statistically significant.
8.1 Chi-square (\(\chi^2\)) Distribution
Outliers: values outside the typical range of scores Skewness: measure of the asymmetry of a distribution Skewed distributions are common and violate statistical assumptions Lopsided distribution of values or scores influenced by outliers Negatively (left) skewed (e.g., job satisfaction ratings) Positively (right) skewed (e.g., income) Kurtosis: measures flatness/peakedness of the distribution
8.2 An Example
Pearson's Chi-squared test
data: observed
X-squared = 4.7826, df = 1, p-value = 0.02875
8.3 Calculating p-value
The p-value is calculated from the \(\chi^2\), where the values at 3.84 or greater are staistically significant. In this example, the obtained value is greater than the critical value.
8.4 Beyond the 2x2 Matrix
= matrix(data = c(58, 30, 50, 50, 24, 58), nrow = 2, ncol = 3, byrow = T)
observed observed
[,1] [,2] [,3]
[1,] 58 30 50
[2,] 50 24 58
chisq.test(observed, correct=FALSE)
Pearson's Chi-squared test
data: observed
X-squared = 1.7194, df = 2, p-value = 0.4233
<- 9
dfs = seq(0,20,.01)
chi2 <- matrix(ncol = dfs, nrow = length(chi2))
df_p
for(i in 1:dfs){
<- pchisq(chi2, df = i, lower.tail = F)
df_p[,i]
}
<- data.frame(df_p)
df_p colnames(df_p) <- 1:dfs
<- cbind(chi2, df_p)
df <- gather(data = df, key = dfs, value = p, 2:10)
df
<- ggplot(df, aes(x = chi2, y = p, color = dfs))
p <- p + geom_line()
p <- p + scale_x_continuous(name = expression(chi^2), expand = c(0,0), breaks = c(0:20))
p <- p + scale_y_continuous(name = "p-value", expand = c(0,0), limits = c(0,1), breaks = seq(0,1,.25))
p <- p + theme_minimal()
p <- p + theme(panel.grid.minor = element_blank()
p
) p