Chapter 10 Multi-way designs
10.1 Readings and Resources
10.1.1 ANOVA
- (Field, 2017)
- Chapter 14
- Foltz, B.: “Two-way ANOVA w/o Replication, A Visual Guide” (and later videos)
- Grande, T.: “Conducting a Two-Way ANOVA in SPSS”
- Buchanan, E.: “JASP/Excel - Two-Way Between Subjects ANOVA Example”
- datalabcc: “ANOVA — jamovi”
10.1.2 Ordered logistic model
- National Centre for Research Methods “Module 5: Ordinal Regression” (Focuses on SPSS, but the content is not exclusive to SPSS)
- Ordinal Logistic Regression | R Data Analysis Examples
- Greenwell, McCarthy, Boehmke, & Dungang Liu (2018): “Residuals and Diagnostics for Binary and Ordinal Regression Models: An Introduction to the
sure
Package”
10.2 Example data set
## Parsed with column specification:
## cols(
## DIRECTION = col_character(),
## DISTANCE = col_double(),
## STN_NUMBER = col_double(),
## STN_NAME = col_character()
## )
When we think about what it means for something to be far away or close, we couple an objective concept of distance (say, 1000 meters) with a more subjective feeling about that distance. Two people may agree about the former (e.g., “the restaurant is 2km away”), but disagree about the latter (e.g., “That is so far”; “Are you kidding? It’s just around the corner!”).
Maglio & Polman (2014) were interested in what affects that psychological, subjective sense of distance in the context of traveling within a large city (Toronto, in this case). In particular, they were interested in how one’s orientation (traveling toward or away from somewhere) might affect one’s sense of distance.
Below you can see a section of the Bloor-Danforth line (green, horizontal) centered on Bay station. On this line routes travel west, through St. George and Spadina stations, and east, through Bloor-Younge and Sherbourne stations.
(Toronto subway map below obtained from the Toronto Transit Commission)
Maglio and Polman asked 202 travelers at Bay station how far away they “felt” one of the four stations was (Spadina, St. George, Bloor-Younge, or Sherbourne; randomly selected for each participant). An example of the survey is shown below. The objective walking distance from Bay station to each of the four stations is 1100m west, 760m, west, 410m east, and 1100m east, respectively.
(Other materials from the study can be obtained from Magio and Polman’s Open Science Framework page)
Additionally, researchers asked people on both the east (travelers headed toward Sherbourne) and west (travelers headed toward Spadina) sides of the platform.
Maglio and Polman’s design has two factors: the station about which the participant was asked (four levels, one for each station) and the direction the participants were traveling (two levels, east or west). Maglio and Polman wanted to test the hypothesis that orientation—in this case, traveling toward a station—affects the subjective distance to that point, as operationalized by the participant’s response to the survey on a 7-point scale.
In statistical terms, Maglio and Polman were interested in an interaction: does direction of travel (factor) change one’s assessment of the subjective distance (dependent variable) to the surrounding stations (factor)?
10.3 Descriptives
We begin by describing the data. We have eight “design cells” (two directions of travel, east/west crossed with four stations). Additionally, we have seven possible responses, which leaves \(8\times7=56\) total response frequencies, shown in the table below.
## `summarise()` regrouping output by 'DIRECTION', 'STN_NAME' (override with `.groups` argument)
Summaries and response proportions | |||||||||
---|---|---|---|---|---|---|---|---|---|
Maglio & Polman (2014), Experiment 1 | |||||||||
Station / Direction | N | M | SD | Response1 | |||||
1 | 2 | 3 | 4 | 5 | 6 | ||||
Spadina | |||||||||
West | 25 | 2.640 | 0.995 | 8.0% | 44.0% | 28.0% | 16.0% | 4.0% | |
East | 26 | 3.654 | 1.093 | 3.8% | 11.5% | 23.1% | 38.5% | 23.1% | |
St. George | |||||||||
West | 25 | 1.640 | 0.810 | 56.0% | 24.0% | 20.0% | |||
East | 26 | 2.769 | 1.032 | 15.4% | 15.4% | 50.0% | 15.4% | 3.8% | |
Bloor-Younge | |||||||||
West | 26 | 2.192 | 1.234 | 34.6% | 34.6% | 15.4% | 7.7% | 7.7% | |
East | 23 | 1.609 | 0.722 | 52.2% | 34.8% | 13.0% | |||
Sherbourne | |||||||||
West | 25 | 4.000 | 1.118 | 12.0% | 20.0% | 28.0% | 36.0% | 4.0% | |
East | 26 | 2.769 | 1.142 | 15.4% | 26.9% | 26.9% | 26.9% | 3.8% | |
1
Empty cells denote no responses. Response category 7 is omitted because no participant used it.
|
The table also shows the number of participants in each of the eight station/direction combinations, and the corresponding mean and standard deviation of the responses.
The table, though large, allows us to see all the results at a glance. West-traveling participants responded with smaller distances than east-traveling participants when asked to rate the distance to the western stations, Spadina and St. George. The opposite is true of the eastern stations, Bloor-Younge and Sherbourne. This is consistent with distances being “shrunk” in the direction of travel.
10.4 Visualizations
Although the table is an effect way of showing the data, we should also consider graphical representations of the data. Maglio and Polman showed means and standard errors5, and we will follow them before making a more detailed plot.
10.4.1 Means / standard errors
Means and standard errors are the typical way that data is represented in many research settings. We have already discussed the downsides of this in previous sections (e.g. Section 8.4). The same discussion applies here: means and standard errors hide much of the data.
## `summarise()` regrouping output by 'STN_NAME' (override with `.groups` argument)
However, our solutions previously (violin plots, box plots, dot plots) are somewhat inappropriate for these particular data. Why? Because the data are discrete: they can only take on one of 7 values. This means that observations will be particularly clumped together. Box plots and violin plots are good for continuous data, but less good for discrete data like this, where they will have a lumpy look.
The best way of obtaining a plot similar to Figure 10.1 in SPSS is to create it along with your ANOVA. In the Analyze→General Linear Model→Univariate… enter the “Plots…” box and add a plot with the direction factor as separate lines and the station number on the horizontal axis. Under Error Bars (below), select “Standard error” and set the multiplier to “1”. You’ll get the graph along with your ANOVA output.
The easiest way of getting a graph like 10.1 in jamovi is to create it along with your ANOVA. Before you run your ANOVA, make sure you’ve set the station name variable to be ordinal in the data setup, and order them from west to east (Spadina, St. George, Bloor-Younge, and Sherbourne).
Then start your ANOVA. When you’ve set up your ANOVA, scroll down to the “Estimated Marginal Means” settings, and indicate that you’d like “Marginal means plots” in the output (right), with standard errors as error bars (left). Then, highlight both the station variable and the direction variable and drag them to the right, to under “Term 1”.
The plot should appear to the right.
The easiest way of getting a graph like 10.1 in JASP is to create it along with your ANOVA. Before you run your ANOVA, make sure you’ve set the ordering of the station name variable the data setup: order them from west to east (Spadina, St. George, Bloor-Younge, and Sherbourne).
Then start your ANOVA. When you’ve set up your ANOVA, scroll down to the “Descriptive plots” settings and indicate you want standard errors as error bars (below). Then add station name on the horizontal axis and direction as separate lines.
The plot should appear to the right.
10.4.2 Dot plot, with dodge/jitter
We could take a similar approach to that in Section 9.2 by creating a dotplot as in Figure 10.2.
We could also add proportions for more detail (see below). Although the dotplot is better than means and standard errors because it shows all the data, it relies on our ability to count the number of dots in a category, which can be difficult with large numbers of dots. The number of dots in some of these cells is bordering on too many already.
In jamovi, you can get something similar to the above by ordering your station variable, then under the Descriptives module put “Distance” under “Variable”, then the direction and station name variables under “Split by”. Under “Box plots” below, choose “Data” and set it to “Jittered”.
The plot should appear to the right.
Side-by-side histograms, as below, are probably a better option.
10.4.3 Frequencies
Figure 10.3 shows the relative frequency histograms for each of the conditions and response categories. The two directions (east and west) for each station diverge in opposite directions allowing one to see how the direction of travel matters for each station. The same patterns as in Figure 10.1 are evident: participants traveling west give smaller distance estimates to the western stations (Spadina and St. George), and vice versa for the eastern stations.
## `summarise()` regrouping output by 'STN_NAME', 'DIRECTION' (override with `.groups` argument)
## Warning: Using alpha for a discrete variable is not advised.
## Warning: Removed 40 rows containing missing values (geom_text).
What is lacking in this graph is measure of uncertainty (e.g., standard errors). It isn’t immediately clear that a measure of uncertainty would improve this particular graph, however; it might be better to rely on an an additional graph and/or the inferential statistics presented elsewhere in order to get a sense of the uncertainty.
10.5 ANOVA
The most common method for that designs like this one is a factorial ANOVA with two factors and the interaction. If we think about the design and look at Figure 10.3, it is immediately obvious that this approach is likely to be…suboptimal. These responses are…
- ordinal (they have no interval or ratio interpretation),
- discrete (they can only take on one of 7 values), and
- skewed (thanks to ceiling/floor effects).
In order to get practice with ANOVA, and to ensure we can replicate the results Maglio and Polman report, we will start with an two-way factorial ANOVA before turning to an ordered logistic analysis.
Model statistics | |
---|---|
Multiple \(R^2\) | 0.381 |
Multiple \(R^2_{adj}\) | 0.359 |
\(F_{7,194}\) | 17.080 |
Sig. (\(p\)) | 0.000 |
Resid. SD | 1.036 |
Resid. df | 194 |
Analysis of Variance table | |||||||
---|---|---|---|---|---|---|---|
Type II sums of squares | |||||||
Term | SS | d.f. | MS | \(F\) | Sig. (\(p\)) | \(\eta^2\) | \(\eta^2_p\) |
DIRECTION | 0.713 | 1 | 0.713 | 0.664 | 0.416 | 0.001 | 0.002 |
STN_NAME | 75.158 | 3 | 25.053 | 23.349 | 0.000 | 0.223 | 0.265 |
DIRECTION:STN_NAME | 52.413 | 3 | 17.471 | 16.283 | 0.000 | 0.156 | 0.201 |
Residuals | 208.152 | 194 | 1.073 |
If you performed the ANOVA above in SPSS, JASP, jamovi, or statscloud without changing any of the default settings, it is likely your results are slightly different from the ones above (take a look at the \(F\) values, for instance). In many software packages, Type III sums of squares by default, but the above table shows Type II sums of squares (as is the default in R).
You can find an explanation of the difference in Langsrud (2003). I recommend using Type II by default (you can set this in most software packages under the Model settings), but the differences are unlikely to be large in most applications.
At the time of this writing, statscloud does not allow changing the from Type III.
10.5.1 Model checking
##
## Fligner-Killeen test of homogeneity of variances
##
## data: DISTANCE by direction_and_station
## Fligner-Killeen:med chi-squared = 3, df = 7, p-value = 0.8
10.6 Ordinal analysis
## formula: DISTANCE ~ STN_NUMBER * DIRECTION
## data: for_clm0
##
## link threshold nobs logLik AIC niter max.grad cond.H
## logit flexible 202 -273.75 571.51 7(0) 2.31e-10 1.7e+02
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## STN_NUMBER2 -1.838 0.535 -3.44 0.00059 ***
## STN_NUMBER3 -0.904 0.512 -1.76 0.07767 .
## STN_NUMBER4 2.311 0.544 4.25 2.2e-05 ***
## DIRECTIONEAST 1.720 0.515 3.34 0.00083 ***
## STN_NUMBER2:DIRECTIONEAST 0.415 0.728 0.57 0.56850
## STN_NUMBER3:DIRECTIONEAST -2.642 0.750 -3.52 0.00043 ***
## STN_NUMBER4:DIRECTIONEAST -3.768 0.775 -4.86 1.2e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Threshold coefficients:
## Estimate Std. Error z value
## 1|2 -1.663 0.383 -4.34
## 2|3 -0.123 0.354 -0.35
## 3|4 1.361 0.379 3.59
## 4|5 2.863 0.439 6.52
## 5|6 6.213 1.078 5.76
## Likelihood ratio tests of cumulative link models:
##
## formula: link: threshold:
## intercept_only DISTANCE ~ 1 logit flexible
## station DISTANCE ~ STN_NUMBER logit flexible
## main_effects DISTANCE ~ STN_NUMBER + DIRECTION logit flexible
## full_interaction DISTANCE ~ STN_NUMBER * DIRECTION logit flexible
##
## no.par AIC logLik LR.stat df Pr(>Chisq)
## intercept_only 5 650 -320
## station 8 607 -295 49.55 3 1.0e-10 ***
## main_effects 9 608 -295 0.88 1 0.35
## full_interaction 12 572 -274 42.49 3 3.2e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In SPSS, you can find ordered logistic analysis under Analyze→Regression→Ordinal… . In order build the model above, you’ll need to manually build the model under “Location…” (location because you’ll build a model on the mean—the location parameter—of the logistic distribution).
In the Location settings, select “Custom” at the top, and then add the main effects of direction and station name. Then add the interaction of the two to complete the full factorial model.
In jamovi, ordered logistic analysis is under the Regression module, “Ordinal outcomes”. Before you start an ordinal regression, though, make sure your variables are all set up properly in the data settings (distance and station name should be ordinal, with the proper ordering).
Once your variables are set up properly, head over to put ordinal regression settings and put your variables in the appropriate boxes. Then you’ll want to use the Model Builder to create “blocks”.
Neither JASP nor statscloud include ordinal logistic analyses at this time.
10.6.1 Model checking
## Tests of nominal effects
##
## formula: DISTANCE ~ STN_NUMBER * DIRECTION
## Df logLik AIC LRT Pr(>Chi)
## <none> -274 572
## STN_NUMBER 12 -268 583 12.22 0.43
## DIRECTION 4 -272 576 3.63 0.46
## STN_NUMBER:DIRECTION
## `summarise()` regrouping output by 'DIRECTION', 'STN_NUMBER' (override with `.groups` argument)
## `summarise()` regrouping output by 'DIRECTION', 'STN_NUMBER', 'DISTANCE' (override with `.groups` argument)
## Warning: Using alpha for a discrete variable is not advised.
## Warning: Removed 12336 rows containing missing values (geom_text).
See Greenwell et al. (2018) for details.
## `summarise()` regrouping output by 'STN_NAME' (override with `.groups` argument)
Observations' contributions to ordinal model fit | |||||||||
---|---|---|---|---|---|---|---|---|---|
Experiment 1, Maglio & Polman (2014) | |||||||||
Station | Dir. | N in group | Resp. | Contr. to \(\chi^2_5\) | Why | Obs. | Exp. | Obs. p | Exp. p |
Bl-Y | W | 26 | 5 | 3.626 | ↑ | 2 | 0.567 | 0.077 | 0.022 |
St.G | E | 26 | 3 | 1.756 | ↑ | 13 | 9.021 | 0.500 | 0.347 |
St.G | W | 25 | 3 | 1.657 | ↑ | 5 | 2.833 | 0.200 | 0.113 |
Spdn | W | 25 | 2 | 1.367 | ↑ | 11 | 7.746 | 0.440 | 0.310 |
St.G | E | 26 | 2 | 1.350 | ↓ | 4 | 7.095 | 0.154 | 0.273 |
Shrb | W | 25 | 2 | 1.347 | ↑ | 3 | 1.553 | 0.120 | 0.062 |
Shrb | E | 26 | 4 | 1.114 | ↑ | 7 | 4.709 | 0.269 | 0.181 |
Spdn | W | 25 | 1 | 0.989 | ↓ | 2 | 3.985 | 0.080 | 0.159 |
St.G | W | 25 | 4 | 0.755 | ↓ | 0 | 0.755 | 0.000 | 0.030 |
Bl-Y | E | 23 | 4 | 0.703 | ↓ | 0 | 0.703 | 0.000 | 0.031 |
References
Field, A. (2017). Discovering statistics using IBM SPSS (5th ed.). SAGE Publications Ltd.
Greenwell, B. M., McCarthy, A. J., Boehmke, B. C., & Dungang Liu. (2018). Residuals and diagnostics for binary and ordinal regression models: An introduction to the sure package. The R Journal, 10(1), 381–394. Retrieved from https://journal.r-project.org/archive/2018/RJ-2018-004/RJ-2018-004.pdf
Langsrud, Ø. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing, 13(2), 163–167. Retrieved from https://link.springer.com/article/10.1023/A:1023260610025
Maglio, S. J., & Polman, E. (2014). Spatial orientation shrinks and expands psychological distance. Psychological Science, 25(7), 1345–1352. Retrieved from https://doi.org/10.1177/0956797614530571
The standard errors reported by Maglio and Polman are twice as large as they would be. Be aware that some software (notably SPSS) defaults to producing error bars that are two standard errors from the mean instead of one. You must change the default setting to get the proper error bars (or report that your error bars are \(\pm2~SE\), not \(\pm1~SE\)).↩︎