8  Chi-square

The chi-square (χ²) test compares observed categorical frequencies to the frequencies expected due to random chance alone.

Two primary uses:

  • Tests of independence of observations
  • Goodness of fit

Steps for a confusion matrix/contingency table:

  1. Create a table of observed frequencies
  2. Calculate column and row totals
  3. Create expected frequencies
  4. Calculate the squared differences between observed and expected
  5. Sum the squared differences divided by expected
  6. Compare the observed χ² to the critical value

  • https://www.statsdirect.com/help/chi_square_tests/22.htm
  • https://www.youtube.com/watch?v=mSNoAODXD5c
  • https://rforhr.com/disparateimpact.html

The expected frequency for a cell is (observed row sum × observed column sum)/grand total, and the χ² formula is sum((Observed − Expected)^2/Expected). You can use the observed matrix directly in R with {chisq.test(observed, correct = FALSE)} or calculate it in Excel manually using {=CHISQ.DIST.RT(observed chi-square,1)}. An example write-up: Males were not more likely than females to have passed the test, χ²(1) = 1.03, p = .31. Yates' continuity correction accounts for the fact that 2×2 tables, as is the case here, tend to be upwardly biased, but it is debatable whether it needs to be applied. Fisher's exact test is another test required when the sample size is small (N < 30), but it is not often necessary because we typically work with larger sample sizes in IO. The critical value for a χ² with 1 degree of freedom is 3.84, so observed χ² values greater than this are statistically significant.
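As a sketch of the manual steps above, here is the expected-frequency and χ² computation in R for a hypothetical 2×2 pass/fail-by-sex table (these counts are invented for illustration; they are not the data behind the χ²(1) = 1.03 write-up):

```r
# Hypothetical 2x2 table of test result (rows) by sex (columns)
observed <- matrix(c(30, 20, 25, 25), nrow = 2,
                   dimnames = list(result = c("pass", "fail"),
                                   sex = c("female", "male")))

# Expected frequency per cell: (row sum * column sum) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chisq <- sum((observed - expected)^2 / expected)
chisq

# The built-in test (without Yates' correction) gives the same statistic
chisq.test(observed, correct = FALSE)$statistic
```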

8.1 Chi-square Distribution

  • Outliers: values outside the typical range of scores
  • Skewness: a measure of the asymmetry of a distribution
    • Skewed distributions are common and violate statistical assumptions
    • A lopsided distribution of values or scores, often influenced by outliers
    • Negatively (left) skewed (e.g., job satisfaction ratings)
    • Positively (right) skewed (e.g., income)
  • Kurtosis: a measure of the flatness/peakedness of the distribution
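As a quick illustration, skewness and excess kurtosis can be computed by hand in base R with simple moment formulas (the data below are made up; packages such as e1071 or moments offer slightly different bias-corrected versions):

```r
# Hand-rolled moment-based skewness and excess kurtosis
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
kurtosis <- function(x) mean((x - mean(x))^4) / sd(x)^4 - 3  # excess kurtosis

# Toy right-skewed "income" data (in thousands), with a couple of outliers
income <- c(28, 31, 33, 35, 38, 40, 44, 52, 95, 250)

skewness(income)  # positive: long right tail
kurtosis(income)  # positive: heavier tails than a normal distribution
```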

8.2 Calculating p-values

The p-value is calculated from the χ² statistic; with 1 degree of freedom, values of 3.84 or greater are statistically significant at α = .05. In this example, the obtained value is greater than the critical value.
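In R, the critical value and the p-value come from qchisq() and pchisq():

```r
# Critical value: the chi-square value at the 95th percentile, 1 df
qchisq(.95, df = 1)  # 3.841459

# p-value for a given observed chi-square, e.g., 5.2 with 1 df
pchisq(5.2, df = 1, lower.tail = FALSE)
```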

8.3 Goodness of Fit Example

This is an example using a single variable, done the easy way.

Data importing from my validation dataset project: ajthurston.com/validation

dat <- read.csv("../data/validation.csv", header = T)
table(dat$male)

  0   1 
109 214 
observed <- c(females = 109, males = 214)
chisq.test(observed)

    Chi-squared test for given probabilities

data:  observed
X-squared = 34.133, df = 1, p-value = 5.147e-09

And here’s the same test done manually.

observed <- c(females = 109, males = 214)
expected <- c(females = 161.5, males = 161.5)

gof <- list(
  chisq = sum((observed-expected)^2/expected),
  df = length(observed)-1
)
gof$p <- pchisq(gof$chisq, df = gof$df, lower.tail = FALSE)
gof
$chisq
[1] 34.13313

$df
[1] 1

$p
[1] 5.146758e-09

But expected frequencies are not always evenly split, so we can also test our observed frequencies against a known proportion. Here, I’m testing whether the number of veterans in the validation dataset is representative of the national veteran rate of 6%.

summarytools::freq(dat$vet)
Frequencies  
dat$vet  
Type: Integer  

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
----------- ------ --------- -------------- --------- --------------
          0    290     89.78          89.78     89.78          89.78
          1     33     10.22         100.00     10.22         100.00
       <NA>      0                               0.00         100.00
      Total    323    100.00         100.00    100.00         100.00
observed <- c(civ = 290, vet = 33)
chisq.test(observed, p = c(.94,.06))

    Chi-squared test for given probabilities

data:  observed
X-squared = 10.183, df = 1, p-value = 0.001417

8.4 Yates’ Continuity Correction

The Pearson chi-square test compares observed and expected frequencies using a continuous distribution (χ2), but the actual data are discrete counts. For small sample sizes in a 2×2 contingency table, this mismatch can inflate the chi-square statistic, making results appear more significant than they should.

To address this, Yates (1934) introduced a continuity correction. The adjustment subtracts 0.5 from the absolute difference between observed (O) and expected (E) counts before squaring:

χ²_Yates = Σ (|O − E| − 0.5)² / E

This correction reduces the chi-square value, producing a more conservative test. In R, chisq.test() applies Yates’ correction to 2×2 tables by default (correct = TRUE); for larger tables, or when correct = FALSE, the standard Pearson statistic is used.

  • Only applies to 2×2 contingency tables
  • Reduces Type I error risk with small samples
  • Often overly conservative when cell counts are moderate to large
  • Fisher’s exact test is generally preferred when expected counts fall below 5
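To see the effect of the correction, we can run the male-by-veteran table from later in this chapter both ways; the corrected statistic is smaller, and therefore more conservative:

```r
# 2x2 table of sex by veteran status from the validation dataset
observed <- matrix(c(93, 16, 197, 17), nrow = 2)

uncorrected <- chisq.test(observed, correct = FALSE)  # Pearson
corrected   <- chisq.test(observed, correct = TRUE)   # Yates

uncorrected$statistic  # 3.5711
corrected$statistic    # 2.8746 (smaller, more conservative)
```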

8.5 Test of Independence Example

This is an example using an observed contingency table.

dat |> count(male,vet)
  male vet   n
1    0   0  93
2    0   1  16
3    1   0 197
4    1   1  17

Based on these counts, we can craft a contingency table or confusion matrix to test for independence, that is, whether the veteran rate is independent of sex in this sample.

observed <- matrix(data = c(93,16,197,17), nrow = 2, ncol = 2)
chisq.test(observed)

    Pearson's Chi-squared test with Yates' continuity correction

data:  observed
X-squared = 2.8746, df = 1, p-value = 0.08999

You can also test directly from the raw variables, here without the continuity correction:

chisq.test(x = dat$male, y = dat$vet, correct = FALSE)

    Pearson's Chi-squared test

data:  dat$male and dat$vet
X-squared = 3.5711, df = 1, p-value = 0.05879

So here we have failed to reject the null hypothesis (H0) that veteran status and sex are independent.
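As a quick check on whether Fisher’s exact test would be needed here, we can pull the expected cell counts out of the test object (chisq.test() stores them in the $expected component), rebuilding the 2×2 table from the counts above:

```r
# Same sex-by-veteran table as above
observed <- matrix(c(93, 16, 197, 17), nrow = 2)

test <- chisq.test(observed, correct = FALSE)
test$expected  # all cells well above 5, so Fisher's exact test is not needed
```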

8.6 Using the Generalized Linear Model

model <- glm(vet ~ male, data = dat)
summary(model)

Call:
glm(formula = vet ~ male, data = dat)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.14679    0.02894   5.072 6.66e-07 ***
male        -0.06735    0.03555  -1.894   0.0591 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.09128009)

    Null deviance: 29.628  on 322  degrees of freedom
Residual deviance: 29.301  on 321  degrees of freedom
AIC: 147.42

Number of Fisher Scoring iterations: 2

Since the square of a t statistic with many residual degrees of freedom approximates a χ² with 1 degree of freedom, we can square the t value from the GLM result and compare:

-1.894*-1.894
[1] 3.587236
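Converting that squared t value to a p-value with pchisq() lands near the Pearson test’s p = .059 from the previous section:

```r
# Square of the GLM t value, treated as an approximate chi-square with 1 df
t_sq <- (-1.894)^2  # 3.587236

pchisq(t_sq, df = 1, lower.tail = FALSE)  # close to the Pearson p = .0588
```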

8.7 Chi-square with Larger Tables

This example applies the test of independence to a 2×3 contingency table, then visualizes how the p-value for a given χ² value depends on its degrees of freedom.

observed <- matrix(data = c(58, 30, 50, 50, 24, 58), nrow = 2, ncol = 3, byrow = TRUE)
observed
     [,1] [,2] [,3]
[1,]   58   30   50
[2,]   50   24   58
chisq.test(observed, correct=FALSE)

    Pearson's Chi-squared test

data:  observed
X-squared = 1.7194, df = 2, p-value = 0.4233
library(tidyr)    # gather()
library(ggplot2)

dfs <- 9
chi2 <- seq(0, 20, .01)
df_p <- matrix(ncol = dfs, nrow = length(chi2))

for(i in 1:dfs){
  df_p[,i] <- pchisq(chi2, df = i, lower.tail = F)
}

df_p <- data.frame(df_p)
colnames(df_p) <- 1:dfs
df <- cbind(chi2, df_p)
df <- gather(data = df, key = dfs, value = p, 2:10)

p <- ggplot(df, aes(x = chi2, y = p, color = dfs))
p <- p + geom_line()
p <- p + scale_x_continuous(name = expression(chi^2), expand = c(0,0), breaks = c(0:20))
p <- p + scale_y_continuous(name = "p-value", expand = c(0,0), limits = c(0,1), breaks = seq(0,1,.25))
p <- p + theme_minimal()
p <- p + theme(panel.grid.minor = element_blank()
)
p