
8 Chi-square
The chi-square (
Two primary uses:
- Tests of independence of observations
- Goodness of fit
Confusion matrix/contingency table:
- Create a table of observed frequencies
- Calculate column and row totals
- Create expected frequencies
- Calculate the squared difference between observed and expected
- Sum the squared differences, each divided by its expected frequency
- Compare the observed χ² to the critical value
https://www.statsdirect.com/help/chi_square_tests/22.htm
https://www.youtube.com/watch?v=mSNoAODXD5c
https://rforhr.com/disparateimpact.html
The expected frequency for a cell is (observed row sum × observed column sum) / grand total.
The χ² formula is χ² = Σ (Observed − Expected)² / Expected.
You can pass the observed matrix directly to R with {chisq.test(observed, correct=FALSE)}, or get the p-value manually in Excel with {=CHISQ.DIST.RT(observed chi-square,1)}.
Example write-up: Males were not more likely than females to have passed the test, χ²(1) = 1.03, p = .31.
Yates' continuity correction accounts for the fact that the χ² statistic from a 2×2 table, as is the case here, tends to be upwardly biased, but it's debatable whether it needs to be applied. Fisher's exact test is another test, required when the sample size is small (N < 30), but it is not often necessary because we typically work with larger sample sizes in IO.
The critical value for a χ² with 1 degree of freedom is 3.84 (at α = .05), so observed χ² values greater than this are statistically significant.
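The steps above can be sketched in R for a 2×2 table. The counts here are hypothetical and chosen only for illustration; they are not from any dataset in this chapter, and the variable names are assumptions.

```r
# Hypothetical observed frequencies (illustrative only, not real data)
observed <- matrix(c(30, 20,
                     25, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(sex = c("female", "male"),
                                   result = c("pass", "fail")))

# Expected frequency per cell: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of squared differences, each divided by its expected frequency
chisq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p  <- pchisq(chisq, df = df, lower.tail = FALSE)

# Critical value at alpha = .05 for comparison with the observed statistic
critical <- qchisq(0.95, df = df)

# Cross-check against the built-in test without Yates' correction
builtin <- chisq.test(observed, correct = FALSE)
c(manual = chisq, builtin = unname(builtin$statistic), critical = critical, p = p)
```

If the manual and built-in statistics agree, the hand calculation follows the same Pearson formula that chisq.test() uses when correct = FALSE.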
8.1 Chi-square Distribution
- Outliers: values outside the typical range of scores
- Skewness: measure of the asymmetry of a distribution
- Skewed distributions are common and violate statistical assumptions
- Lopsided distribution of values or scores influenced by outliers
- Negatively (left) skewed (e.g., job satisfaction ratings)
- Positively (right) skewed (e.g., income)
- Kurtosis: measures flatness/peakedness of the distribution
8.2 Calculating p-values
The p-value is calculated from the

8.3 Goodness of Fit Example
This example tests a single variable the easy way, with chisq.test().
The data are imported from my validation dataset project: ajthurston.com/validation
dat <- read.csv("../data/validation.csv", header = TRUE)
table(dat$male)
0 1
109 214
observed <- c(females = 109, males = 214)
chisq.test(observed)
Chi-squared test for given probabilities
data: observed
X-squared = 34.133, df = 1, p-value = 5.147e-09
And here’s doing so manually.
observed <- c(females = 109, males = 214)
expected <- c(females = 161.5, males = 161.5)
gof <- list(
chisq = sum((observed-expected)^2/expected),
df = length(observed)-1
)
gof$p <- pchisq(gof$chisq, df = gof$df, lower.tail = FALSE)
gof
$chisq
[1] 34.13313
$df
[1] 1
$p
[1] 5.146758e-09
But expected frequencies are not always evenly split, so we can also test our observed frequencies against a known proportion. Here, I'm testing whether the proportion of veterans in the validation dataset is representative of the national veteran rate of 6%.
summarytools::freq(dat$vet)
Frequencies
dat$vet
Type: Integer

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------ --------- -------------- --------- --------------
          0    290     89.78          89.78     89.78          89.78
          1     33     10.22         100.00     10.22         100.00
       <NA>      0                               0.00         100.00
      Total    323    100.00         100.00    100.00         100.00
observed <- c(civ = 290, vet = 33)
chisq.test(observed, p = c(.94,.06))
Chi-squared test for given probabilities
data: observed
X-squared = 10.183, df = 1, p-value = 0.001417
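As with the evenly split case, this test can be reproduced by hand. The sketch below assumes only the observed counts and the 94%/6% split from the text; p_null is a name introduced here for illustration.

```r
observed <- c(civ = 290, vet = 33)
p_null   <- c(civ = 0.94, vet = 0.06)  # hypothesized proportions from the text

# Expected counts are the total N times each hypothesized proportion
expected <- sum(observed) * p_null     # 303.62 and 19.38

chisq   <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chisq, df = length(observed) - 1, lower.tail = FALSE)

round(c(chisq = chisq, p = p_value), 4)
```

The only change from the evenly split example is how the expected vector is built.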
8.4 Yates’ Continuity Correction
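The Pearson chi-square test compares observed and expected frequencies using a continuous distribution (χ²), even though the observed counts are discrete. With small samples, particularly in 2×2 tables, this approximation tends to inflate the test statistic.
To address this, Yates (1934) introduced a continuity correction. The adjustment subtracts 0.5 from the absolute difference between observed (O) and expected (E) frequencies before squaring: χ² = Σ (|O − E| − 0.5)² / E.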
This correction reduces the chi-square value, producing a more conservative test. In R, chisq.test() has correct = TRUE as its default, so Yates' correction is applied whenever the input is a 2×2 table. For larger tables, or when correct = FALSE, the standard Pearson statistic is used.
- Only applies to 2×2 contingency tables
- Reduces Type I error risk with small samples
- Often overly conservative when cell counts are moderate to large
- Fisher’s exact test is generally preferred when expected counts fall below 5
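To make the size of the adjustment concrete, here is a minimal sketch that computes the Pearson and Yates-corrected statistics by hand. It borrows the 2×2 counts from the test of independence example in section 8.5; the object names are assumptions for illustration.

```r
# 2x2 counts from section 8.5 (rows: vet = 0/1, columns: male = 0/1)
observed <- matrix(c(93, 16, 197, 17), nrow = 2, ncol = 2)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Standard Pearson statistic
pearson <- sum((observed - expected)^2 / expected)

# Yates: shrink each absolute difference by 0.5 before squaring
yates <- sum((abs(observed - expected) - 0.5)^2 / expected)

round(c(pearson = pearson, yates = yates), 4)
```

The corrected statistic is the smaller of the two, which is why the corrected test is the more conservative one.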
8.5 Test of Independence Example
This is an example using an observed contingency table.
dat |> count(male, vet)
  male vet   n
1    0   0  93
2    0   1  16
3    1   0 197
4    1   1  17
Based on these counts, we can build a contingency table (or confusion matrix) to test for independence, that is, whether the veteran rate is independent of sex in this sample.
observed <- matrix(data = c(93,16,197,17), nrow = 2, ncol = 2)
chisq.test(observed)
Pearson's Chi-squared test with Yates' continuity correction
data: observed
X-squared = 2.8746, df = 1, p-value = 0.08999
You can also run the test directly on the raw variables, here without Yates' correction:
chisq.test(x = dat$male, y = dat$vet, correct = FALSE)
Pearson's Chi-squared test
data: dat$male and dat$vet
X-squared = 3.5711, df = 1, p-value = 0.05879
So here we have failed to reject the null hypothesis (H₀) that veteran status is independent of sex, χ²(1) = 3.57, p = .059.
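Section 8.4 notes that Fisher's exact test is generally preferred when expected counts fall below 5. A minimal sketch of that check on the same table follows; it is offered as a sensitivity check under that rule of thumb, not as part of the original analysis.

```r
observed <- matrix(c(93, 16, 197, 17), nrow = 2, ncol = 2)

# Expected counts under independence; the smallest is about 11.1 here,
# so the rule of thumb (all expected counts >= 5) is met
expected <- chisq.test(observed, correct = FALSE)$expected
min(expected)

# Fisher's exact test on the same table, for comparison
fisher_p <- fisher.test(observed)$p.value
fisher_p
```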
8.6 Using the Generalized Linear Model
model <- glm(vet ~ male, data = dat)
summary(model)
Call:
glm(formula = vet ~ male, data = dat)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.14679    0.02894   5.072 6.66e-07 ***
male        -0.06735    0.03555  -1.894   0.0591 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.09128009)
Null deviance: 29.628 on 322 degrees of freedom
Residual deviance: 29.301 on 321 degrees of freedom
AIC: 147.42
Number of Fisher Scoring iterations: 2
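Since the squared t statistic for a single binary predictor is approximately a χ² with 1 degree of freedom, squaring the t value for male gives a result close to the Pearson χ² from the test of independence (3.5711):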
-1.894*-1.894
[1] 3.587236
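The comparison can be run end to end without the original file. The sketch below rebuilds case-level data from the four cell counts reported in section 8.5, which is an assumption made so the example is self-contained; with dat loaded, the columns dat$male and dat$vet could be used instead.

```r
# Rebuild case-level data from the cell counts (dat itself is not loaded here)
counts <- c(93, 16, 197, 17)          # male/vet = 0/0, 0/1, 1/0, 1/1
male <- rep(c(0, 0, 1, 1), times = counts)
vet  <- rep(c(0, 1, 0, 1), times = counts)

# Same model as above: glm() defaults to the Gaussian family
model <- glm(vet ~ male)
t_val <- summary(model)$coefficients["male", "t value"]

# Uncorrected Pearson chi-square on the same data
chisq <- unname(chisq.test(table(male, vet), correct = FALSE)$statistic)

c(t = t_val, t_squared = t_val^2, chisq = chisq)
```

The squared t value and the Pearson statistic should be close but not identical.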
8.7
observed <- matrix(data = c(58, 30, 50, 50, 24, 58), nrow = 2, ncol = 3, byrow = TRUE)
observed
     [,1] [,2] [,3]
[1,]   58   30   50
[2,]   50   24   58
chisq.test(observed, correct=FALSE)
Pearson's Chi-squared test
data: observed
X-squared = 1.7194, df = 2, p-value = 0.4233
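For tables larger than 2×2 the calculation is the same; only the degrees of freedom change, to (rows − 1) × (columns − 1). A minimal sketch reproducing the result above by hand:

```r
observed <- matrix(c(58, 30, 50,
                     50, 24, 58), nrow = 2, ncol = 3, byrow = TRUE)

# Expected counts from the row and column totals
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chisq <- sum((observed - expected)^2 / expected)
df    <- (nrow(observed) - 1) * (ncol(observed) - 1)  # (2 - 1) * (3 - 1) = 2
p     <- pchisq(chisq, df = df, lower.tail = FALSE)

round(c(chisq = chisq, df = df, p = p), 4)
```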
library(tidyr)    # gather()
library(ggplot2)  # plotting

# Upper-tail p-values for chi-square values from 0 to 20, at 1 to 9 degrees of freedom
dfs <- 9
chi2 <- seq(0, 20, .01)
df_p <- matrix(ncol = dfs, nrow = length(chi2))
for (i in 1:dfs) {
  df_p[, i] <- pchisq(chi2, df = i, lower.tail = FALSE)
}
df_p <- data.frame(df_p)
colnames(df_p) <- 1:dfs
df <- cbind(chi2, df_p)

# Reshape to long format: one row per chi-square value and df
df <- gather(data = df, key = dfs, value = p, 2:10)

# One p-value curve per df
p <- ggplot(df, aes(x = chi2, y = p, color = dfs))
p <- p + geom_line()
p <- p + scale_x_continuous(name = expression(chi^2), expand = c(0, 0), breaks = c(0:20))
p <- p + scale_y_continuous(name = "p-value", expand = c(0, 0), limits = c(0, 1), breaks = seq(0, 1, .25))
p <- p + theme_minimal()
p <- p + theme(panel.grid.minor = element_blank())
p
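The plot shows where each curve crosses a given p-value; the same cut-offs can be read off numerically with qchisq(). A small sketch, with column names chosen here for illustration:

```r
# Critical chi-square values for 1 to 9 degrees of freedom
dfs <- 1:9
critical <- data.frame(
  df       = dfs,
  alpha_05 = round(qchisq(0.95, df = dfs), 2),  # p = .05 cut-off
  alpha_01 = round(qchisq(0.99, df = dfs), 2)   # p = .01 cut-off
)
critical
```

The first row matches the 3.84 critical value quoted earlier for 1 degree of freedom.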