Chapter 5 2×2 contingency tables
5.1 Readings and Resources
- Field (2017), Chapter 19
5.2 Example data set
Hraba & Grant (1970) describe a replication of Clark & Clark (1947) in which black and white children from Lincoln, Nebraska were shown dolls that were either black or white. They were then asked a series of questions, including “Give me the doll that is a nice doll.” This data set contains the frequencies of children giving the same-race or different-race doll in response to this question.
Frequencies of choosing given dolls, by race (data from Hraba and Grant, 1970)

|  | White child | Black child |
|---|---|---|
| Same-race doll | 50 | 48 |
| Different-race doll | 21 | 41 |
Hraba & Grant (1970) were interested in how race was related to the probability that children would select the doll that looks more like them (a doll that represented either a black or white child). Clark & Clark (1947) had previously shown that black children selected white dolls more often when asked to select the “nice” doll.
- Download the Excel/R data set (frequencies)
- Download the Excel/R data set (long format)
We will use the notation below to refer to the cells of the table.
|  | Column 1 | Column 2 |
|---|---|---|
| Row 1 | a | b |
| Row 2 | c | d |
5.3 Visualization
The most typical visualization for count data such as that in a contingency table is a plot of the proportions of the key response, broken down by group. We plot the observed proportion and the standard errors of the proportions.
In the case of the Hraba and Grant data, we would break down the data by race of the child, and, say, compute the proportion of same-race choices. We make this choice because we are interested in how choices vary by the race of the child. Figure 5.1 shows a common way of depicting the observed proportions.
Figure 5.1: Proportions of same-race dolls selected as ‘nice’ by black and white children. Error bars are standard errors.
The left axis shows odds for reference. Showing the odds is not typical; this is done so that you can compare the values to the ones you’ll compute later in the section on odds ratios.
5.3.0.1 Computing standard errors
To create a plot like the one above, we need to compute the proportions and the standard errors. The proportions are straightforward and I won’t review them. The standard error of a proportion is a bit more complicated. It is:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$$

where $\hat{p}$ is the estimate of the proportion and $N$ is the total number of observations that the proportion is based on.
❗ Try computing the standard error of the proportion of white children who choose the same-race doll.
For the white children in the Hraba and Grant data, the total number of white children is $N = 71$. The proportion choosing the same-race doll is $\hat{p} = 50/71 = 0.704$.

$$SE(\hat{p}) = \sqrt{\frac{0.704 \times (1 - 0.704)}{71}} = \sqrt{0.003} = 0.054$$
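As a check on the arithmetic, here is a minimal Python sketch of the same calculation (the helper name `prop_se` is my own, not from the text):

```python
import math

def prop_se(successes, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/N)."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

# White children: 50 of 71 chose the same-race doll
p_white, se_white = prop_se(50, 71)
print(f"p = {p_white:.3f}, SE = {se_white:.3f}")  # p = 0.704, SE = 0.054
```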
5.4 Assessing the effect size
When choosing a measure of effect size to summarize the data in a contingency table, you must first decide how to conceptualize your design. One important question is whether everyone in the table is part of a single “group,” or whether they are best thought of as belonging to separate groups. Or, put another way: could a single person have ended up in any of the four cells, or only in two?
For many applications, the separate-groups point of view is more appropriate: when we study relatively stable attributes such as race, sex, gender, nationality, or age, we tend to think of our goal as comparing groups. Think of Hraba and Grant’s design and the data in Figure 5.1; it would not have made any sense if I had put “Doll selection” on the x axis instead of the children.
In other cases, the single-group point of view is more appropriate: for instance, where every person might choose to do one, or both, of two things. In this case we think of the goal as assessing the association between those two attributes.
This choice does not affect the test we do, but it does affect how we present our effect. For single-group presentations, measures of association like Yule’s Q are more appropriate.
For separate-group presentations, we directly compare the groups by computing the difference in the proportions or the odds ratio.
5.4.1 Yule’s Q coefficient
Yule’s Q is a kind of correlation or association coefficient for contingency tables. It can be between -1 and 1, with -1 indicating a perfect negative association, 1 indicating a perfect positive association, and 0 indicating no association.
Consider the 2×2 contingency table below, with the diagonal cells a and d shaded.
|  | Column 1 | Column 2 |
|---|---|---|
| Row 1 | a | b |
| Row 2 | c | d |
Under the hypothesis of no association, we would expect roughly the same proportion of observations in a (relative to b) as in c (relative to d). If instead of finding observations in c, we find them in d, we have evidence for an association. When there are large values in the shaded cells compared to the non-shaded cells, this indicates a “positive” association between the two factors.
Yule’s Q coefficient explicitly makes use of this expectation.
$$Q = \frac{ad - cb}{ad + cb}$$
The numerator measures how large the shaded cells are relative to the unshaded cells. The denominator scales Q by how large it can possibly be, so Q reaches -1 or 1 (a perfect association) when either $cb$ or $ad$ is very small relative to the other. You can interpret it like a correlation.
❗ Try computing the Yule’s Q coefficient for Hraba and Grant’s results.
For Hraba and Grant’s data, the Yule’s Q coefficient is:
$$Q = \frac{50 \times 41 - 21 \times 48}{50 \times 41 + 21 \times 48} = \frac{1042}{3058} = 0.341$$
You can (roughly) interpret this like a correlation; it is above 0, indicating a positive relationship. In this case, a “positive” relationship means that cells a and d, the cells in which children chose white dolls, are over-represented compared to what we’d expect if children picked a same-race doll equally often.
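The same calculation in a short Python sketch, with cell labels following the notation table above:

```python
# Cells of the 2x2 table, following the a/b/c/d notation
a, b = 50, 48   # same-race doll:      white child, black child
c, d = 21, 41   # different-race doll: white child, black child

# Yule's Q: (ad - cb) / (ad + cb)
Q = (a * d - c * b) / (a * d + c * b)
print(round(Q, 3))  # 0.341
```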
5.4.2 Difference between proportions
As Figure 5.1 shows, the proportion of white children choosing the same-race doll as “nice” was $\hat{p}_{\text{white}} = 50/71 = 0.704$; among black children, $\hat{p}_{\text{black}} = 48/89 = 0.539$ selected the same-race doll as “nice”. The difference between these two proportions is
$$p_{\text{diff}} = \hat{p}_1 - \hat{p}_2$$
which serves as a measure of how big the difference is. For Hraba and Grant’s data,
$$p_{\text{diff}} = \hat{p}_{\text{white}} - \hat{p}_{\text{black}} = 0.704 - 0.539 = 0.165$$
Put in terms of percentages, white children tended to select the same-race doll as “nice” at a rate about 16.5 percentage points higher than black children did.
5.4.2.1 Confidence interval
We will also include a measure of uncertainty with our point estimate above. We can compute a confidence interval¹ using the formula

$$(p_{\text{diff}})_{CI_{95\%}} = p_{\text{diff}} \pm 1.96 \times SE(p_{\text{diff}}).$$
The standard error of $p_{\text{diff}}$, needed for the above calculation, is just

$$SE(p_{\text{diff}}) = \sqrt{SE(\hat{p}_1)^2 + SE(\hat{p}_2)^2}$$

where $SE(\hat{p})$ is computed as in Section 5.3.0.1 for each group.
❗ Try computing the 95% confidence interval on the difference in probabilities for Hraba and Grant’s data.
We first compute the standard error of the difference. We already computed the standard error of the white children’s proportion in 5.3.0.1. For black children, the corresponding standard error is 0.053.
The standard error of the difference is thus

$$SE(p_{\text{diff}}) = \sqrt{0.054^2 + 0.053^2} = \sqrt{0.003 + 0.003} = \sqrt{0.006} = 0.076$$
We now have everything we need to compute the 95% confidence interval.
$$(p_{\text{diff}})_{CI_{95\%}} = p_{\text{diff}} \pm 1.96 \times SE(p_{\text{diff}}) = 0.165 \pm 1.96 \times 0.076 = 0.165 \pm 0.148 = [0.017, 0.313]$$
The 95% confidence interval runs from very small differences (1 or 2 percentage points) to very large differences (31 percentage points).
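The difference, its standard error, and the confidence interval can be computed together in Python; this sketch carries full precision rather than the rounded intermediate values, and lands on the same interval:

```python
import math

# Hraba and Grant counts
p1, n1 = 50 / 71, 71   # white children choosing the same-race doll
p2, n2 = 48 / 89, 89   # black children choosing the same-race doll

diff = p1 - p2
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower, upper = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"diff = {diff:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
# diff = 0.165, 95% CI = [0.017, 0.313]
```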
5.4.3 Odds ratio
One of the most common summaries of a 2×2 table is the odds ratio. The odds ratio makes sense when either the rows or the columns specify “groups” of interest. In the case of the Hraba and Grant data, we are interested in the columns (race of the children) as groups.
Instead of thinking in proportions, we think in odds. We’re interested in the odds of each group ending up in a particular category, and whether this is different across groups.
See also StatPearls for an explanation of the material below.
5.4.3.1 Point estimate
Each of our separate groups gets its own odds. If the columns in our data table specify our groups (as they do in the Hraba and Grant data above), then the odds for group one (column 1) are just $a/c$. For group two (column 2) they are $b/d$.
❗ Try computing the observed odds that a white child selected the same-race doll in Hraba and Grant’s data.
For Hraba and Grant’s data, the observed odds of a white child choosing the same-race doll is 50/21=2.381.
When asked to select the “nice” doll, 2.381 times as many white children selected the same-race (white) doll as selected the different-race (black) doll.
The odds ratio is just the ratio of the odds. The main question is which of our two odds we put into the numerator, and which we put in the denominator. It is easier to interpret odds greater than 1, so we usually put the larger of the two odds in the numerator. If, on the other hand, we had a strong reason to report one of the numbers as a kind of reference group, we might put the reference group in the denominator.
If our columns are our groups, and a/c is the larger of the two odds, then the odds ratio would be
$$OR = \frac{a/c}{b/d}$$
❗ Try computing the odds ratio for Hraba and Grant’s results.
For Hraba and Grant’s data, the observed odds of a white child choosing the same-race doll is 50/21=2.381. The observed odds of a black child choosing the same-race doll is 48/41=1.171.
The odds ratio is

$$OR = \frac{2.381}{1.171} = 2.034$$
The observed odds of a white child choosing a same-race doll were 2.034 times the odds of a black child choosing a same-race doll. If the groups were the same, we’d expect this number to be about 1.
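In Python, the two odds and their ratio are one-liners:

```python
# Cells of the 2x2 table, following the a/b/c/d notation
a, b = 50, 48   # same-race doll:      white child, black child
c, d = 21, 41   # different-race doll: white child, black child

odds_white = a / c            # odds of a white child choosing the same-race doll
odds_black = b / d            # odds of a black child choosing the same-race doll
OR = odds_white / odds_black
print(round(OR, 3))  # 2.034
```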
5.4.3.2 Confidence interval
We can compute a confidence interval for the odds ratio. To do this, we have to note that the odds ratio cannot go below 0 (odds are always positive!) so the sampling distribution of the odds ratio will be skewed. When computing standard errors and confidence intervals, we want to start with a nice, symmetric variable. In this case, we will work with the natural logarithm of the odds ratio, the log-odds-ratio (LOR).
When the OR is 1 (even odds, or both groups show the same odds) the LOR is log(1)=0.
Our point estimate for the log-odds-ratio is just the logarithm of the odds ratio, log(OR):
$$LOR = \log(OR) = \log(2.034) = 0.71$$
We now need to find the standard error of the log-odds-ratio. This is:

$$SE(LOR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
❗ Try computing the standard error of the log-odds-ratio for Hraba and Grant’s results.
For Hraba and Grant’s data, this is:

$$SE(LOR) = \sqrt{\frac{1}{50} + \frac{1}{48} + \frac{1}{21} + \frac{1}{41}} = \sqrt{0.02 + 0.021 + 0.048 + 0.024} = \sqrt{0.113} = 0.336$$
The sampling distribution of the log-odds-ratio is approximately normal, so we can use our 1.96 standard error rule to build a 95% confidence interval.
$$(LOR)_{CI_{95\%}} = LOR \pm 1.96 \times SE(LOR)$$
❗ Try computing the 95% confidence interval for the log-odds ratio for Hraba and Grant’s results.
$$(LOR)_{CI_{95\%}} = LOR \pm 1.96 \times SE(LOR) = 0.71 \pm 1.96 \times 0.336 = 0.71 \pm 0.658 = [0.051, 1.368]$$
This is the 95% confidence interval for the log-odds-ratio. We want the confidence interval for the odds ratio. All we have to do now is the reverse of the logarithm, exponentiation:

$$(OR)_{CI_{95\%}} = [e^{L_{LOR}}, e^{U_{LOR}}]$$

where $L_{LOR}$ and $U_{LOR}$ are the lower and upper bounds of the LOR confidence interval, respectively.
❗ Try computing the confidence interval on the odds ratio for Hraba and Grant’s results.
$$(OR)_{CI_{95\%}} = [e^{0.051}, e^{1.368}] = [1.053, 3.929]$$
Notice that the lower bound of the 95% confidence interval is very close to 1 (the null-hypothesis value), implying that we cannot rule out very small effects (at the 5% level). Also note that the interval does exclude 1, implying that the effect will be just barely significant at the 5% level.
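The whole log-then-exponentiate procedure can be sketched in Python as follows (carried at full precision; the endpoints match the rounded hand calculation above):

```python
import math

a, b, c, d = 50, 48, 21, 41   # cells of the Hraba and Grant table

OR = (a / c) / (b / d)
lor = math.log(OR)                         # log-odds-ratio
se_lor = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of the log-odds-ratio

# 95% CI on the log scale, then exponentiate back to the odds-ratio scale
or_lo = math.exp(lor - 1.96 * se_lor)
or_hi = math.exp(lor + 1.96 * se_lor)
print(f"OR = {OR:.3f}, 95% CI = [{or_lo:.3f}, {or_hi:.3f}]")
# OR = 2.034, 95% CI = [1.053, 3.929]
```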
5.5 Assessing the null hypothesis of no relationship
The most common test of association or “no relationship” between the two factors is the χ2 (chi-square) test of independence.
I will not summarize the test here; see Chapter 19 of Field (2017) for details.
❗ Try computing the χ2 test for the Hraba and Grant data.
You should see a table that looks something like the one below (but may have additional rows).
Chi-square Tests

|  | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-square | 4.525 | 1 | 0.033 |
| Continuity correction | 3.857 | 1 | 0.050 |
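If you want to verify the values in the table, the statistic is easy to compute by hand. Here is a Python sketch (the function name `chi_square` is my own; statistical software will report the same values):

```python
import math

def chi_square(obs, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates' correction."""
    row = [sum(r) for r in obs]
    col = [sum(x) for x in zip(*obs)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            dev = abs(obs[i][j] - expected)
            if yates:                      # continuity correction
                dev = max(dev - 0.5, 0)
            stat += dev ** 2 / expected
    return stat

obs = [[50, 48],   # same-race doll (white, black children)
       [21, 41]]   # different-race doll

x2 = chi_square(obs)                    # 4.525
x2_yates = chi_square(obs, yates=True)  # 3.857
# p value for 1 df: a chi-square(1) variable is a squared standard normal
p = math.erfc(math.sqrt(x2_yates / 2))  # about 0.050
print(round(x2, 3), round(x2_yates, 3), round(p, 3))
```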
Hraba & Grant (1970) concluded that “[The data] shows that black and white children preferred the doll of their own race. The white children were significantly more ethnocentric [when asked to choose the "nice" doll]…”
5.6 A full report
Your full report should contain a graph with error bars, a measure of association or effect (e.g., Yule’s Q or the odds ratio), a test statistic (e.g., $X^2$), and a p value for the test.
Because a 2×2 contingency table is so compact, you should also include the frequencies themselves in your report as a table.
How you format your report depends somewhat on what statistics you’ve chosen to report.
5.6.1 Yule’s Q
If you’ve chosen Yule’s Q (you’re more interested in a measure of the strength of association of the variables) you can format your report as below.
White children selected same-race dolls more often than black children, suggesting that same-race doll preference is associated with race. This association was statistically significant at $\alpha = 0.05$ (Yule’s $Q = 0.341$, $X^2 = 3.857$, $p = 0.05$, with Yates’ continuity correction).
5.6.2 Difference between proportions
If you’ve chosen to report the difference in proportions, you can format your report as below.
White children selected same-race dolls more often than black children, suggesting that black and white children differ in same-race doll preference. The difference between black and white children was statistically significant at $\alpha = 0.05$ ($p_{\text{diff}} = 0.165$, $CI_{95\%}: [0.017, 0.313]$, $X^2 = 3.857$, $p = 0.05$, with Yates’ continuity correction).
5.6.3 Odds ratio
If you’ve chosen to report an odds ratio, you can format your report as below.
White children selected same-race dolls more often than black children, suggesting that black and white children differ in same-race doll preference. The difference between black and white children was statistically significant at $\alpha = 0.05$ ($OR = 2.034$, $CI_{95\%}: [1.053, 3.929]$, $X^2 = 3.857$, $p = 0.05$, with Yates’ continuity correction).
Regardless of which statistic is chosen for the report, the size of the effect should be discussed in context. For the Hraba and Grant data, looking at the confidence interval reveals that the data appear to be consistent with odds ratios ranging from very small (about 1; essentially even odds) to moderately large (about 4) (for a 95% CI; this would be different if we adjusted our confidence coefficient).
5.7 Things to watch out for
5.7.1 Probability vs odds
If your design is two groups, you’ll have to choose between reporting a difference in proportions and an odds ratio (there’s even a third option: relative risk). One issue with odds ratios is that they can be very large even when the absolute probabilities are small.
For instance, consider the chances of dying from rare events. The US-based National Center for Health Statistics reported that, in their database, 1 in 161,856 people died from a lightning strike, while 1 in 1,498 died from fire. The odds ratio for fire vs. lightning is 108.12, indicating that you are much more likely to die from fire than from lightning. But you are not likely to die from either: the difference between the two probabilities is only 0.00066.
When choosing to report either odds or probability, make sure that your report is not misleading, and be sure to include tables and graphs so that the reader doesn’t just see the odds ratio or difference, but can also see overall how rare the events are.
5.7.2 Small cell counts
When cell counts are small, it can be the case that not enough is known to produce good estimates of how much the table will vary (that is, we don’t know enough about the statistical uncertainty). Be cautious interpreting the results when cell counts are small.
5.7.3 Dependence between observations
Analyses of contingency tables assume that each count (observation) is independent of the others. This assumption can be violated, for instance, if observations from the same person appear in more than one cell. Lack of independence invalidates the analysis.
References
Clark, K. B., & Clark, M. K. (1947). Racial identification and preference in negro children. In T. M. Newcomb & E. L. Hartley (Eds.), Readings in social psychology (2nd ed.). New York: Holt.
Field, A. (2017). Discovering statistics using IBM SPSS (5th ed.). SAGE Publications Ltd.
Hraba, J., & Grant, G. (1970). Black is beautiful: A reexamination of racial preference and identification. Journal of Personality and Social Psychology, 16(3), 398–402.
¹ Note that this calculation does not have a continuity correction added, so it will be slightly inconsistent with the usual $\chi^2$ test, which does. This is not something to worry too much about; it is unwise to be overly concerned about whether the result is just over, or under, the arbitrary criterion for significance.