Chapter 5 2×2 contingency tables

5.1 Readings and Resources

  • Field (2017), Chapter 19

5.2 Example data set

Hraba & Grant (1970) describe a replication of Clark & Clark (1947) in which black and white children from Lincoln, Nebraska were shown dolls that were either black or white. They were then asked a series of questions, including “Give me the doll that is a nice doll.” This data set contains the frequency of children giving the same-race or different-race doll in response to this question.

Frequencies of choosing given dolls, by race. Data from Hraba and Grant (1970).

                      White child   Black child
Same-race doll                 50            48
Different-race doll            21            41

Hraba & Grant (1970) were interested in how race was related to the probability that children would select the doll that looks more like them (a doll that represented either a black or white child). Clark & Clark (1947) had previously shown that black children selected white dolls more often when asked to select the “nice” doll.


The above data set contains the total frequencies as summaries, rather than raw data in which each row would be a separate observation. To analyze these data in SPSS, you’ll need to use “Weight cases” (under the Data menu) to weight each row by its frequency.

We will use the notation below to refer to the cells of the table.

            Column 1   Column 2
Row 1           a          b
Row 2           c          d

5.3 Visualization

The most typical visualization for count data such as that in a contingency table is a plot of the proportions of the key response, broken down by group. We plot the observed proportion and the standard errors of the proportions.

In the case of the Hraba and Grant data, we would break down the data by race of the child, and, say, compute the proportion of same-race choices. We make this choice because we are interested in how choices vary by the race of the child. Figure 5.1 shows a common way of depicting the observed proportions.


Figure 5.1: Proportions of same-race dolls selected as ‘nice’ by black and white children. Error bars are standard errors.

The left axis shows odds for reference. Showing the odds is not typical; this is done so that you can compare the values to the ones you’ll compute later in the section on odds ratios.

SPSS will not easily create a graph like the one above. You can, however, easily create a similar one in Excel. You’ll need to use the spreadsheet to create columns for the \(x\)-axis labels, the proportions, and the standard errors. Create the chart, then add custom error bars based on the column containing the standard errors.
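If you have Python available, a similar chart takes only a few lines with matplotlib. Below is a minimal sketch using the counts from the table above; the variable names are mine, and the standard-error formula is the one introduced in Section 5.3.0.1.

```python
import math
import matplotlib.pyplot as plt

# Same-race doll choices out of the total for each group
groups = ["White child", "Black child"]
counts = [50, 48]
totals = [71, 89]

props = [k / n for k, n in zip(counts, totals)]
# Standard error of each proportion (see Section 5.3.0.1)
ses = [math.sqrt(p * (1 - p) / n) for p, n in zip(props, totals)]

fig, ax = plt.subplots()
ax.bar(groups, props, yerr=ses, capsize=5)
ax.set_ylabel("Proportion choosing same-race doll")
ax.set_ylim(0, 1)
plt.show()
```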

5.3.0.1 Computing standard errors

To create a plot like the one above, we need to compute the proportions and the standard errors. The proportions are straightforward, so I won’t review them here. The standard error of a proportion is a bit more complicated. It is:

\[ SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}} \]

where \(\hat{p}\) is the estimate of the proportion and \(N\) is the total number that the proportion is based on.

❗ Try computing the standard error of the proportion of white children who choose the same-race doll.

For the white children in the Hraba and Grant data, the total number of white children is \(N=71\). The proportion choosing the same race dolls is \(\hat{p}=50/71=0.704\).

\[ \begin{eqnarray} SE(\hat{p}) &=& \sqrt{\frac{0.704\times(1-0.704)}{71}}\\ &=& \sqrt{0.0029}\\ &=& 0.054 \end{eqnarray} \]
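As a check on the hand calculation, here is a minimal Python version of the same formula (se_proportion is a hypothetical helper of mine, not a library function):

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

print(se_proportion(50 / 71, 71))  # white children: ~0.054
print(se_proportion(48 / 89, 89))  # black children: ~0.053
```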

5.4 Assessing the effect size

When choosing a measure of effect size to summarize the data in a contingency table, you must first decide how to conceptualize your design. One important question is whether everyone in the table is part of a single “group,” or whether they are best thought of as belonging to separate groups. Put another way: could a single person have ended up in any of the four cells, or only in two?

For many applications, the separate-groups point of view is more appropriate: when we study relatively stable attributes such as race, sex, gender, nationality, or age, we tend to think of our goal as comparing groups. Think of Hraba and Grant’s design, and the data in Figure 5.1; it would not have made any sense if I had put “Doll selection” on the \(x\) axis instead of the children.

In other cases, the single-group point of view is more appropriate: for instance, where every person might choose to do one, or both, of two things. In this case we think of the goal as assessing the association between those two attributes.

This choice does not affect the test we do, but it does affect how we present our effect. For single-group presentations, measures of association like Yule’s \(Q\) are more appropriate.

For separate-group presentations, we directly compare the groups by computing the difference in the proportions or the odds ratio.

5.4.1 Yule’s \(Q\) coefficient

Yule’s \(Q\) is a kind of correlation or association coefficient for contingency tables. It can be between -1 and 1, with -1 indicating a perfect negative association, 1 indicating a perfect positive association, and 0 indicating no association.

Consider the \(2\times 2\) contingency table below. In the discussion that follows, cells \(a\) and \(d\) (the main diagonal) are the “shaded” cells.

            Column 1   Column 2
Row 1           a          b
Row 2           c          d

Under the hypothesis of no association, we would expect roughly the same proportion of observations in \(a\) (relative to \(b\)) as in \(c\) (relative to \(d\)). If, instead of finding observations in \(c\), we find them in \(d\), we have evidence for an association. When there are large values in the shaded cells compared to the non-shaded cells, this indicates a “positive” association between the two factors.

Yule’s \(Q\) coefficient explicitly makes use of this expectation.

\[ Q = \frac{ad - cb}{ad + cb} \]

The numerator measures how much larger the shaded cells are than the unshaded cells. The denominator scales \(Q\) by how large the numerator can possibly be, so \(Q\) values of -1 and 1 indicate perfect associations; these occur when either \(ad\) or \(cb\) is 0, with values approaching -1 and 1 when one product is very small relative to the other. You can interpret it like a correlation.

❗ Try computing the Yule’s \(Q\) coefficient for Hraba and Grant’s results.

For Hraba and Grant’s data, the Yule’s \(Q\) coefficient is:

\[ \begin{eqnarray} Q &=& \frac{50\times41 - 21\times48}{50\times41 + 21\times48}\\ &=& \frac{1042}{3058}\\ &=& 0.341 \end{eqnarray} \]

You can (roughly) interpret this like a correlation; it is above 0, indicating a positive relationship. In this case, a “positive” relationship means that cells \(a\) and \(d\), the cells in which children chose white dolls, are over-represented compared to what we’d expect if black and white children picked same-race dolls equally often.

SPSS will not compute Yule’s \(Q\) coefficient, but it is easy to compute in Excel or by hand.
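For instance, a minimal Python version looks like this (yules_q is a hypothetical helper of mine, not part of any library):

```python
def yules_q(a: int, b: int, c: int, d: int) -> float:
    """Yule's Q for a 2x2 table with cells a, b (row 1) and c, d (row 2)."""
    return (a * d - c * b) / (a * d + c * b)

# Hraba and Grant (1970): a = 50, b = 48, c = 21, d = 41
print(round(yules_q(50, 48, 21, 41), 3))  # 0.341
```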

5.4.2 Difference between proportions

As Figure 5.1 shows, the proportion of white children choosing the same-race doll as “nice” was \(\hat{p}_{white}=50/71=0.704\); among black children, \(\hat{p}_{black}=48/89=0.539\) selected the same-race doll as “nice”. The difference between these two proportions is

\[ p_{diff} = \hat{p}_1 - \hat{p}_2 \]

which serves as a measure of how big the difference is. For Hraba and Grant’s data,

\[ \begin{eqnarray} p_{diff} &=& \hat{p}_{white} - \hat{p}_{black}\\ &=& 0.704 - 0.539\\ &=&0.165 \end{eqnarray} \]

Put in terms of percentages, white children tended to select the same-race doll as “nice” at a rate about 16.5 percentage points higher than black children did.

5.4.2.1 Confidence interval

We will also include a measure of uncertainty with our point estimate above. We can compute a confidence interval¹ using the formula

\[ (p_{diff}) CI_{95\%} = p_{diff} \pm 1.96\times SE(p_{diff}). \]

The standard error of \(p_{diff}\), needed for the above calculation, is just

\[ SE(p_{diff}) = \sqrt{ SE(\hat{p}_1)^2 + SE(\hat{p}_2)^2} \]

where \(SE(\hat{p})\) is computed as in Section 5.3.0.1 for each group.

❗ Try computing the 95% confidence interval on the difference in probabilities for Hraba and Grant’s data.

We first compute the standard error of the difference. We already computed the standard error of the white children’s proportion in Section 5.3.0.1. For black children, the corresponding standard error is 0.053.

The standard error of the difference is thus \[ \begin{eqnarray} SE(p_{diff}) &=& \sqrt{0.054^2 + 0.053^2}\\ &=& \sqrt{0.0029 + 0.0028}\\ &=& \sqrt{0.0057}\\ &=& 0.076 \end{eqnarray} \]

We now have everything we need to compute the 95% confidence interval.

\[ \begin{eqnarray} (p_{diff}) CI_{95\%} &=& p_{diff} \pm 1.96\times SE(p_{diff})\\ &=& 0.165 \pm 1.96 \times 0.076\\ &=& 0.165 \pm 0.148\\ &=& [0.017,0.313] \end{eqnarray} \]

The 95% confidence interval runs from very small differences (1 or 2 percentage points) to very large differences (31 percentage points).
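The whole calculation is easy to script. Here is a sketch in Python under the same normal-approximation assumptions as above (prop_and_se is a hypothetical helper of mine):

```python
import math

def prop_and_se(k: int, n: int) -> tuple[float, float]:
    """Sample proportion and its standard error."""
    p = k / n
    return p, math.sqrt(p * (1 - p) / n)

p1, se1 = prop_and_se(50, 71)  # white children
p2, se2 = prop_and_se(48, 89)  # black children

p_diff = p1 - p2
se_diff = math.sqrt(se1**2 + se2**2)

print(f"p_diff = {p_diff:.3f}, "
      f"95% CI [{p_diff - 1.96 * se_diff:.3f}, {p_diff + 1.96 * se_diff:.3f}]")
# p_diff = 0.165, 95% CI [0.017, 0.313]
```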

5.4.3 Odds ratio

One of the most common summaries of a \(2\times2\) table is the odds ratio. The odds ratio makes sense when either the rows or the columns specify “groups” of interest. In the case of the Hraba and Grant data, we are interested in the columns (race of the children) as groups.

Instead of thinking in proportions, we think in odds. We’re interested in the odds of each group ending up in a particular category, and whether this is different across groups.

See also StatPearls for an explanation of the material below.

SPSS will compute the estimate of the odds ratio and the confidence interval. After you weight your cases (see Section 5.2), choose “Analyze/Descriptive Statistics/Crosstabs…” and under “Statistics…”, choose “Risk”.

5.4.3.1 Point estimate

Each of our separate groups gets its own odds. If the columns in our data table specify our groups (as they do in the Hraba and Grant data above), then the odds for group one (column 1) are just \(a/c\). For group two (column 2) they are \(b/d\).

❗ Try computing the observed odds that a white child selected the same-race doll in Hraba and Grant’s data.

For Hraba and Grant’s data, the observed odds of a white child choosing the same-race doll is \(50/21=2.381\).

When asked to select the “nice” doll, 2.381 times as many white children selected the same-race (white) doll as selected the different-race (black) doll.

The odds ratio is just the ratio of the odds. The main question is which of our two odds we put into the numerator, and which we put in the denominator. It is easier to interpret odds greater than 1, so we usually put the larger of the two odds in the numerator. If, on the other hand, we had a strong reason to report one of the numbers as a kind of reference group, we might put the reference group in the denominator.

If our columns are our groups, and \(a/c\) is the larger of the two odds, then the odds ratio would be

\[ OR = \frac{a/c}{b/d} \]

❗ Try computing the odds ratio for Hraba and Grant’s results.

For Hraba and Grant’s data, the observed odds of a white child choosing the same-race doll is \(50/21=2.381\). The observed odds of a black child choosing the same-race doll is \(48/41=1.171\).

The odds ratio is \[ \begin{eqnarray} OR &=& \frac{2.381}{1.171}\\ &=& 2.034 \end{eqnarray} \]

The observed odds of a white child choosing a same-race doll were 2.034 times the odds for a black child. If the groups were the same, we’d expect this ratio to be about 1.
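The point estimates are simple arithmetic. In Python, using the cell notation from the table in Section 5.2:

```python
a, b, c, d = 50, 48, 21, 41  # Hraba and Grant (1970)

odds_white = a / c  # ~2.381
odds_black = b / d  # ~1.171
print(round(odds_white / odds_black, 3))  # odds ratio: 2.034
```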

5.4.3.2 Confidence interval

We can compute a confidence interval for the odds ratio. To do this, we have to note that the odds ratio cannot go below 0 (odds are always positive!) so the sampling distribution of the odds ratio will be skewed. When computing standard errors and confidence intervals, we want to start with a nice, symmetric variable. In this case, we will work with the natural logarithm of the odds ratio, the log-odds-ratio (\(LOR\)).

When the \(OR\) is 1 (even odds, or both groups show the same odds) the \(LOR\) is \(\log(1)=0\).

Our point estimate for the log-odds-ratio is just the logarithm of the odds ratio, \(\log(OR)\):

\[ \begin{eqnarray} LOR &=& \log(OR) \\ &=& \log(2.034) \\ &=& 0.71 \end{eqnarray} \]

We now need to find the standard error of the log-odds-ratio. This is: \[ SE(LOR) = \sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} \]

❗ Try computing the standard error of the log-odds-ratio for Hraba and Grant’s results.

For Hraba and Grant’s data, this is:

\[ \begin{eqnarray} SE(LOR) &=& \sqrt{\frac{1}{50}+\frac{1}{48}+\frac{1}{21}+\frac{1}{41}}\\ &=& \sqrt{0.02+0.021+0.048+0.024}\\ &=& \sqrt{0.113}\\ &=& 0.336 \end{eqnarray} \]

The sampling distribution of the log-odds-ratio is approximately normal, so we can use our \(1.96\) standard error rule to build a 95% confidence interval.

\[ \begin{eqnarray} (LOR) CI_{95\%} &=& LOR \pm 1.96\times SE(LOR) \end{eqnarray} \]

❗ Try computing the 95% confidence interval for the log-odds ratio for Hraba and Grant’s results.

\[ \begin{eqnarray} (LOR) CI_{95\%} &=& LOR \pm 1.96\times SE(LOR)\\ &=& 0.71 \pm 1.96 \times 0.336\\ &=& 0.71 \pm 0.658\\ &=& [0.051, 1.368] \end{eqnarray} \]

This is 95% confidence interval for the log-odds-ratio. We want the confidence interval for the odds ratio. All we have to do now is do the reverse of the logarithm, exponentiation:

\[ \begin{eqnarray} (OR) CI_{95\%}&=& [e^{L_{LOR}}, e^{U_{LOR}}] \end{eqnarray} \] where \(L_{LOR}\) and \(U_{LOR}\) are the lower and upper bound for the \(LOR\) confidence interval, respectively.

❗ Try computing the confidence interval on the odds ratio for Hraba and Grant’s results.

\[ \begin{eqnarray} (OR) CI_{95\%}&=& [e^{0.051}, e^{1.368}]\\ &=& [1.053, 3.929] \end{eqnarray} \]
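Here is the whole log-odds-ratio pipeline as a Python sketch; the small differences from the hand calculation above are just rounding.

```python
import math

a, b, c, d = 50, 48, 21, 41  # Hraba and Grant (1970)

log_or = math.log((a / c) / (b / d))          # ~0.710
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # ~0.336

lo = log_or - 1.96 * se_log_or
hi = log_or + 1.96 * se_log_or

# Exponentiate to get back to the odds-ratio scale
print(f"OR = {math.exp(log_or):.3f}, "
      f"95% CI [{math.exp(lo):.3f}, {math.exp(hi):.3f}]")
# OR = 2.034, 95% CI [1.053, 3.930]
```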

Notice that the lower bound of the 95% confidence interval is very close to 1 (the value expected under the null hypothesis), implying that we cannot rule out very small effects. Also note that the interval does exclude 1, implying that the effect will be just barely significant at the 5% level.

5.5 Assessing the null hypothesis of no relationship

The most common test of association or “no relationship” between the two factors is the \(\chi^2\) (chi-square) test of independence.

I will not summarize the test here; see Chapter 19 of Field (2017) for details.

SPSS will compute the \(X^2\) statistic. After you weight your cases (see Section 5.2), choose “Analyze/Descriptive Statistics/Crosstabs…” and under “Statistics…”, choose “Chi-square”. Use the row in the output that specifies “Continuity correction”. The \(p\) value is given in the “Asymp. Sig.” column.

❗ Try computing the \(\chi^2\) test for the Hraba and Grant data.

After you weight cases, you should go to “Analyze/Descriptive Statistics/Crosstabs…”. Put the groups (“Child”) in the rows and the response (“Doll”) in the columns. Ensure that under the “Statistics…” menu, “Chi-square” is selected at the top.

You should see a table that looks something like the one below (but may have additional rows).

Chi-square Tests

                        Value   df   Asymp. Sig. (2-sided)
Pearson Chi-square      4.525    1   0.033
Continuity correction   3.857    1   0.050
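If you have Python available, scipy’s chi2_contingency reproduces both rows of this table; correction=True (the default) applies Yates’ continuity correction:

```python
from scipy.stats import chi2_contingency

table = [[50, 48],  # same-race doll: white, black children
         [21, 41]]  # different-race doll

chi2, p, df, expected = chi2_contingency(table, correction=True)
print(f"X^2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
# X^2 = 3.857, df = 1, p = 0.050
```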

Hraba & Grant (1970) concluded that “[The data] shows that black and white children preferred the doll of their own race. The white children were significantly more ethnocentric [when asked to choose the "nice" doll]…”

5.6 A full report

Your full report should contain a graph with error bars, a measure of association or effect (e.g., Yule’s \(Q\) or the odds ratio), a test statistic (e.g., \(X^2\)), and a \(p\) value for the test.

Because a \(2\times2\) contingency table is so compact, you should also include the frequencies themselves in your report as a table.

How you format your report depends somewhat on what statistics you’ve chosen to report.

5.6.1 Yule’s \(Q\)

If you’ve chosen Yule’s \(Q\) (you’re more interested in a measure of the strength of association of the variables) you can format your report as below.

White children selected same-race dolls more often than black children, suggesting that same-race doll preference is associated with race. This association was statistically significant at \(\alpha=0.05\) (Yule’s \(Q=0.341\), \(X^2=3.857\), \(p=0.05\), with Yates’ continuity correction).

5.6.2 Difference between proportions

If you’ve chosen to report the difference in proportions, you can format your report as below.

White children selected same-race dolls more often than black children, suggesting that black and white children differ in same-race doll preference. The difference between black and white children was statistically significant at \(\alpha=0.05\) (\(p_{diff}=0.165\), \(CI_{95\%}: [0.017,0.313]\), \(X^2=3.857\), \(p=0.05\), with Yates’ continuity correction).

5.6.3 Odds ratio

If you’ve chosen to report an odds ratio, you can format your report as below.

White children selected same-race dolls more often than black children, suggesting that black and white children differ in same-race doll preference. The difference between black and white children was statistically significant at \(\alpha=0.05\) (\(OR=2.034\), \(CI_{95\%}: [1.053,3.929]\), \(X^2=3.857\), \(p=0.05\), with Yates’ continuity correction).

Regardless of which statistic is chosen for the report, the size of the effect should be discussed in context. For the Hraba and Grant data, the confidence interval reveals that the data appear to be consistent with anything from very small (about 1, or essentially even odds) to moderately large (about 4) odds ratios (for a 95% CI; this would be different if we adjusted our confidence coefficient).

5.7 Things to watch out for

5.7.1 Probability vs odds

If your design is two groups, you’ll have to choose between reporting a difference in proportions and an odds ratio (there’s even a third option: relative risk). One issue with odds ratios is that they can be very large even when the absolute probabilities are small.

For instance, consider the chances of dying from rare events. The US-based National Center for Health Statistics reported that in their database, 1 in 161,856 people died from lightning strike, while 1 in 1,498 died from fire. The odds ratio for fire vs. lightning is 108.12, indicating that you are much more likely to die from fire than from lightning. But you are not likely to die from either: the difference between the two probabilities is only 0.00066.
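A quick Python sketch of that arithmetic makes the contrast concrete:

```python
p_fire = 1 / 1_498
p_lightning = 1 / 161_856

# The odds ratio is enormous...
odds_ratio = (p_fire / (1 - p_fire)) / (p_lightning / (1 - p_lightning))
print(round(odds_ratio, 2))  # 108.12

# ...but the absolute difference in probabilities is tiny
print(round(p_fire - p_lightning, 5))  # 0.00066
```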

When choosing to report either odds or probability, make sure that your report is not misleading, and be sure to include tables and graphs so that the reader doesn’t just see the odds ratio or difference, but can also see overall how rare the events are.

5.7.2 Small cell counts

When cell counts are small, we may not know enough to produce good estimates of how much the table would vary from sample to sample (that is, we don’t know enough about the statistical uncertainty). Be cautious when interpreting results based on tables with small cell counts.

5.7.3 Dependence between observations

Analyses of contingency tables assume that each count (observation) is independent of the others. This assumption can be violated if, for instance, observations from the same person appear in more than one cell. Lack of independence invalidates the analysis.

References

Clark, K. B., & Clark, M. K. (1947). Racial identification and preference in negro children. In T. M. Newcomb & E. L. Hartley (Eds.), Readings in social psychology (2nd ed.). New York: Holt.

Field, A. (2017). Discovering statistics using IBM SPSS (5th ed.). SAGE Publications Ltd.

Hraba, J., & Grant, G. (1970). Black is beautiful: A reexamination of racial preference and identification. Journal of Personality and Social Psychology, 16(3), 398–402.


  1. Note that this calculation does not have a continuity correction added, so it will be slightly inconsistent with the usual \(\chi^2\) test, which does. This is not something to worry too much about; it is unwise to be overly concerned about whether the result is just over, or under, the arbitrary criterion for significance.