Chapter 10 Multi-way designs

10.1 Readings and Resources

10.1.2 Ordered logistic model

10.2 Example data set

## Parsed with column specification:
## cols(
##   DIRECTION = col_character(),
##   DISTANCE = col_double(),
##   STN_NUMBER = col_double(),
##   STN_NAME = col_character()
## )

When we think about what it means for something to be far away or close, we couple an objective concept of distance (say, 1000 meters) with a more subjective feeling about that distance. Two people may agree about the former (e.g., “the restaurant is 2km away”), but disagree about the latter (e.g., “That is so far”; “Are you kidding? It’s just around the corner!”).

Maglio & Polman (2014) were interested in what affects that psychological, subjective sense of distance in the context of traveling within a large city (Toronto, in this case). In particular, they were interested in how one’s orientation (traveling toward or away from somewhere) might affect one’s sense of distance.

Below you can see a section of the Bloor-Danforth line (green, horizontal) centered on Bay station. On this line routes travel west, through St. George and Spadina stations, and east, through Bloor-Younge and Sherbourne stations.

Toronto subway map centered on Bay station.

(Toronto subway map below obtained from the Toronto Transit Commission)

Maglio and Polman asked 202 travelers at Bay station how far away they “felt” one of the four stations was (Spadina, St. George, Bloor-Younge, or Sherbourne; randomly selected for each participant). An example of the survey is shown below. The objective walking distance from Bay station to each of the four stations is 1100m west, 760m, west, 410m east, and 1100m east, respectively.

Example of item from Maglio and Polman (2014) asking about subjective distance to Bloor-Younge station.

(Other materials from the study can be obtained from Magio and Polman’s Open Science Framework page)

Additionally, researchers asked people on both the east (travelers headed toward Sherbourne) and west (travelers headed toward Spadina) sides of the platform.

Maglio and Polman’s design has two factors: the station about which the participant was asked (four levels, one for each station) and the direction the participants were traveling (two levels, east or west). Maglio and Polman wanted to test the hypothesis that orientation—in this case, traveling toward a station—affects the subjective distance to that point, as operationalized by the participant’s response to the survey on a 7-point scale.

In statistical terms, Maglio and Polman were interested in an interaction: does direction of travel (factor) change one’s assessment of the subjective distance (dependent variable) to the surrounding stations (factor)?

10.3 Descriptives

We begin by describing the data. We have eight “design cells” (two directions of travel, east/west crossed with four stations). Additionally, we have seven possible responses, which leaves \(8\times7=56\) total response frequencies, shown in the table below.

## `summarise()` regrouping output by 'DIRECTION', 'STN_NAME' (override with `.groups` argument)
Summaries and response proportions
Maglio & Polman (2014), Experiment 1
Station / Direction N M SD Response1
1 2 3 4 5 6
Spadina
West 25 2.640 0.995 8.0% 44.0% 28.0% 16.0% 4.0%
East 26 3.654 1.093 3.8% 11.5% 23.1% 38.5% 23.1%
St. George
West 25 1.640 0.810 56.0% 24.0% 20.0%
East 26 2.769 1.032 15.4% 15.4% 50.0% 15.4% 3.8%
Bloor-Younge
West 26 2.192 1.234 34.6% 34.6% 15.4% 7.7% 7.7%
East 23 1.609 0.722 52.2% 34.8% 13.0%
Sherbourne
West 25 4.000 1.118 12.0% 20.0% 28.0% 36.0% 4.0%
East 26 2.769 1.142 15.4% 26.9% 26.9% 26.9% 3.8%

1 Empty cells denote no responses. Response category 7 is omitted because no participant used it.

The table also shows the number of participants in each of the eight station/direction combinations, and the corresponding mean and standard deviation of the responses.

The table, though large, allows us to see all the results at a glance. West-traveling participants responded with smaller distances than east-traveling participants when asked to rate the distance to the western stations, Spadina and St. George. The opposite is true of the eastern stations, Bloor-Younge and Sherbourne. This is consistent with distances being “shrunk” in the direction of travel.

10.4 Visualizations

Although the table is an effect way of showing the data, we should also consider graphical representations of the data. Maglio and Polman showed means and standard errors5, and we will follow them before making a more detailed plot.

10.4.1 Means / standard errors

Means and standard errors are the typical way that data is represented in many research settings. We have already discussed the downsides of this in previous sections (e.g. Section 8.4). The same discussion applies here: means and standard errors hide much of the data.

## `summarise()` regrouping output by 'STN_NAME' (override with `.groups` argument)
Average response (subjective distance) by station and direction of travel from Bay station. Error bars show standard errors of the mean. Compare to Figure 1 of Maglio and Polman (2014), p. 1347

Figure 10.1: Average response (subjective distance) by station and direction of travel from Bay station. Error bars show standard errors of the mean. Compare to Figure 1 of Maglio and Polman (2014), p. 1347

However, our solutions previously (violin plots, box plots, dot plots) are somewhat inappropriate for these particular data. Why? Because the data are discrete: they can only take on one of 7 values. This means that observations will be particularly clumped together. Box plots and violin plots are good for continuous data, but less good for discrete data like this, where they will have a lumpy look.

The best way of obtaining a plot similar to Figure 10.1 in SPSS is to create it along with your ANOVA. In the Analyze→General Linear Model→Univariate… enter the “Plots…” box and add a plot with the direction factor as separate lines and the station number on the horizontal axis. Under Error Bars (below), select “Standard error” and set the multiplier to “1”. You’ll get the graph along with your ANOVA output.

The easiest way of getting a graph like 10.1 in jamovi is to create it along with your ANOVA. Before you run your ANOVA, make sure you’ve set the station name variable to be ordinal in the data setup, and order them from west to east (Spadina, St. George, Bloor-Younge, and Sherbourne).

Then start your ANOVA. When you’ve set up your ANOVA, scroll down to the “Estimated Marginal Means” settings, and indicate that you’d like “Marginal means plots” in the output (right), with standard errors as error bars (left). Then, highlight both the station variable and the direction variable and drag them to the right, to under “Term 1”.

The plot should appear to the right.

The easiest way of getting a graph like 10.1 in JASP is to create it along with your ANOVA. Before you run your ANOVA, make sure you’ve set the ordering of the station name variable the data setup: order them from west to east (Spadina, St. George, Bloor-Younge, and Sherbourne).

Then start your ANOVA. When you’ve set up your ANOVA, scroll down to the “Descriptive plots” settings and indicate you want standard errors as error bars (below). Then add station name on the horizontal axis and direction as separate lines.

The plot should appear to the right.

10.4.2 Dot plot, with dodge/jitter

We could take a similar approach to that in Section 9.2 by creating a dotplot as in Figure 10.2.

Dot plot of the responses (subjective distance) by station and direction of travel from Bay station. Note that scale value '7' was never used by any participant.

Figure 10.2: Dot plot of the responses (subjective distance) by station and direction of travel from Bay station. Note that scale value ‘7’ was never used by any participant.

We could also add proportions for more detail (see below). Although the dotplot is better than means and standard errors because it shows all the data, it relies on our ability to count the number of dots in a category, which can be difficult with large numbers of dots. The number of dots in some of these cells is bordering on too many already.

In jamovi, you can get something similar to the above by ordering your station variable, then under the Descriptives module put “Distance” under “Variable”, then the direction and station name variables under “Split by”. Under “Box plots” below, choose “Data” and set it to “Jittered”.

The plot should appear to the right.

Side-by-side histograms, as below, are probably a better option.

10.4.3 Frequencies

Figure 10.3 shows the relative frequency histograms for each of the conditions and response categories. The two directions (east and west) for each station diverge in opposite directions allowing one to see how the direction of travel matters for each station. The same patterns as in Figure 10.1 are evident: participants traveling west give smaller distance estimates to the western stations (Spadina and St. George), and vice versa for the eastern stations.

## `summarise()` regrouping output by 'STN_NAME', 'DIRECTION' (override with `.groups` argument)
## Warning: Using alpha for a discrete variable is not advised.
## Warning: Removed 40 rows containing missing values (geom_text).
Observed response frequencies (subjective distance) by station and direction of travel from Bay station. Note that scale value '7' was never used by any participant.

Figure 10.3: Observed response frequencies (subjective distance) by station and direction of travel from Bay station. Note that scale value ‘7’ was never used by any participant.

What is lacking in this graph is measure of uncertainty (e.g., standard errors). It isn’t immediately clear that a measure of uncertainty would improve this particular graph, however; it might be better to rely on an an additional graph and/or the inferential statistics presented elsewhere in order to get a sense of the uncertainty.

Figure 10.3 requires a lot of customization. It would be difficult to create in a GUI statistical software such as SPSS. I created in R with ggplot2. To see the code and create one like it, please see the RStudio project for this chapter, linked above.

10.5 ANOVA

The most common method for that designs like this one is a factorial ANOVA with two factors and the interaction. If we think about the design and look at Figure 10.3, it is immediately obvious that this approach is likely to be…suboptimal. These responses are…

  • ordinal (they have no interval or ratio interpretation),
  • discrete (they can only take on one of 7 values), and
  • skewed (thanks to ceiling/floor effects).

In order to get practice with ANOVA, and to ensure we can replicate the results Maglio and Polman report, we will start with an two-way factorial ANOVA before turning to an ordered logistic analysis.

Model statistics
Multiple \(R^2\) 0.381
Multiple \(R^2_{adj}\) 0.359
\(F_{7,194}\) 17.080
Sig. (\(p\)) 0.000
Resid. SD 1.036
Resid. df 194
Analysis of Variance table
Type II sums of squares
Term SS d.f. MS \(F\) Sig. (\(p\)) \(\eta^2\) \(\eta^2_p\)
DIRECTION 0.713 1 0.713 0.664 0.416 0.001 0.002
STN_NAME 75.158 3 25.053 23.349 0.000 0.223 0.265
DIRECTION:STN_NAME 52.413 3 17.471 16.283 0.000 0.156 0.201
Residuals 208.152 194 1.073

If you performed the ANOVA above in SPSS, JASP, jamovi, or statscloud without changing any of the default settings, it is likely your results are slightly different from the ones above (take a look at the \(F\) values, for instance). In many software packages, Type III sums of squares by default, but the above table shows Type II sums of squares (as is the default in R).

You can find an explanation of the difference in Langsrud (2003). I recommend using Type II by default (you can set this in most software packages under the Model settings), but the differences are unlikely to be large in most applications.

At the time of this writing, statscloud does not allow changing the from Type III.

10.5.1 Model checking

## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  DISTANCE by direction_and_station
## Fligner-Killeen:med chi-squared = 3, df = 7, p-value = 0.8
QQ plot of residuals from the normal error model (ANOVA) against theoretical normal quantiles.

Figure 10.4: QQ plot of residuals from the normal error model (ANOVA) against theoretical normal quantiles.

10.6 Ordinal analysis

## formula: DISTANCE ~ STN_NUMBER * DIRECTION
## data:    for_clm0
## 
##  link  threshold nobs logLik  AIC    niter max.grad cond.H 
##  logit flexible  202  -273.75 571.51 7(0)  2.31e-10 1.7e+02
## 
## Coefficients:
##                           Estimate Std. Error z value Pr(>|z|)    
## STN_NUMBER2                 -1.838      0.535   -3.44  0.00059 ***
## STN_NUMBER3                 -0.904      0.512   -1.76  0.07767 .  
## STN_NUMBER4                  2.311      0.544    4.25  2.2e-05 ***
## DIRECTIONEAST                1.720      0.515    3.34  0.00083 ***
## STN_NUMBER2:DIRECTIONEAST    0.415      0.728    0.57  0.56850    
## STN_NUMBER3:DIRECTIONEAST   -2.642      0.750   -3.52  0.00043 ***
## STN_NUMBER4:DIRECTIONEAST   -3.768      0.775   -4.86  1.2e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2   -1.663      0.383   -4.34
## 2|3   -0.123      0.354   -0.35
## 3|4    1.361      0.379    3.59
## 4|5    2.863      0.439    6.52
## 5|6    6.213      1.078    5.76
## Likelihood ratio tests of cumulative link models:
##  
##                  formula:                          link: threshold:
## intercept_only   DISTANCE ~ 1                      logit flexible  
## station          DISTANCE ~ STN_NUMBER             logit flexible  
## main_effects     DISTANCE ~ STN_NUMBER + DIRECTION logit flexible  
## full_interaction DISTANCE ~ STN_NUMBER * DIRECTION logit flexible  
## 
##                  no.par AIC logLik LR.stat df Pr(>Chisq)    
## intercept_only        5 650   -320                          
## station               8 607   -295   49.55  3    1.0e-10 ***
## main_effects          9 608   -295    0.88  1       0.35    
## full_interaction     12 572   -274   42.49  3    3.2e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In SPSS, you can find ordered logistic analysis under Analyze→Regression→Ordinal… . In order build the model above, you’ll need to manually build the model under “Location…” (location because you’ll build a model on the mean—the location parameter—of the logistic distribution).

In the Location settings, select “Custom” at the top, and then add the main effects of direction and station name. Then add the interaction of the two to complete the full factorial model.

In jamovi, ordered logistic analysis is under the Regression module, “Ordinal outcomes”. Before you start an ordinal regression, though, make sure your variables are all set up properly in the data settings (distance and station name should be ordinal, with the proper ordering).

Once your variables are set up properly, head over to put ordinal regression settings and put your variables in the appropriate boxes. Then you’ll want to use the Model Builder to create “blocks”.

Neither JASP nor statscloud include ordinal logistic analyses at this time.

10.6.1 Model checking

## Tests of nominal effects
## 
## formula: DISTANCE ~ STN_NUMBER * DIRECTION
##                      Df logLik AIC   LRT Pr(>Chi)
## <none>                    -274 572               
## STN_NUMBER           12   -268 583 12.22     0.43
## DIRECTION             4   -272 576  3.63     0.46
## STN_NUMBER:DIRECTION
## `summarise()` regrouping output by 'DIRECTION', 'STN_NUMBER' (override with `.groups` argument)
## `summarise()` regrouping output by 'DIRECTION', 'STN_NUMBER', 'DISTANCE' (override with `.groups` argument)
## Warning: Using alpha for a discrete variable is not advised.
## Warning: Removed 12336 rows containing missing values (geom_text).
Illustration of the ordinal logistic regression model, with predicted response probabilities.

Figure 10.5: Illustration of the ordinal logistic regression model, with predicted response probabilities.

See Greenwell et al. (2018) for details.

Boxplot of (random) surrogate residuals by group.

Figure 10.6: Boxplot of (random) surrogate residuals by group.

QQ plot of surrogate residuals against the theoretical logistic quantiles.

Figure 10.7: QQ plot of surrogate residuals against the theoretical logistic quantiles.

## `summarise()` regrouping output by 'STN_NAME' (override with `.groups` argument)
Chi-square test statistics for each group testing the hypothesis that the true are equal to the predicted probability from the ordered logistic model.

Figure 10.8: Chi-square test statistics for each group testing the hypothesis that the true are equal to the predicted probability from the ordered logistic model.

Observations' contributions to ordinal model fit
Experiment 1, Maglio & Polman (2014)
Station Dir. N in group Resp. Contr. to \(\chi^2_5\) Why Obs. Exp. Obs. p Exp. p
Bl-Y W 26 5 3.626 2 0.567 0.077 0.022
St.G E 26 3 1.756 13 9.021 0.500 0.347
St.G W 25 3 1.657 5 2.833 0.200 0.113
Spdn W 25 2 1.367 11 7.746 0.440 0.310
St.G E 26 2 1.350 4 7.095 0.154 0.273
Shrb W 25 2 1.347 3 1.553 0.120 0.062
Shrb E 26 4 1.114 7 4.709 0.269 0.181
Spdn W 25 1 0.989 2 3.985 0.080 0.159
St.G W 25 4 0.755 0 0.755 0.000 0.030
Bl-Y E 23 4 0.703 0 0.703 0.000 0.031
Contributions to $\chi^2_5$ statistics for individual groups, to show potential misfit.

Figure 10.9: Contributions to \(\chi^2_5\) statistics for individual groups, to show potential misfit.

References

Field, A. (2017). Discovering statistics using IBM SPSS (5th ed.). SAGE Publications Ltd.

Greenwell, B. M., McCarthy, A. J., Boehmke, B. C., & Dungang Liu. (2018). Residuals and diagnostics for binary and ordinal regression models: An introduction to the sure package. The R Journal, 10(1), 381–394. Retrieved from https://journal.r-project.org/archive/2018/RJ-2018-004/RJ-2018-004.pdf

Langsrud, Ø. (2003). ANOVA for unbalanced data: Use Type II instead of Type III sums of squares. Statistics and Computing, 13(2), 163–167. Retrieved from https://link.springer.com/article/10.1023/A:1023260610025

Maglio, S. J., & Polman, E. (2014). Spatial orientation shrinks and expands psychological distance. Psychological Science, 25(7), 1345–1352. Retrieved from https://doi.org/10.1177/0956797614530571


  1. The standard errors reported by Maglio and Polman are twice as large as they would be. Be aware that some software (notably SPSS) defaults to producing error bars that are two standard errors from the mean instead of one. You must change the default setting to get the proper error bars (or report that your error bars are \(\pm2~SE\), not \(\pm1~SE\)).↩︎