Statistical Analysis and Reporting in R

Jacob O. Wobbrock
The Information School
University of Washington

Download

Latest Update: 26-Oct-2021
Download just R code recipes: PDF
Download ZIP file (everything): ZIP

About

Have you ever needed to do a statistical analysis but not been quite sure which one to use? Or perhaps you've known the proper analysis, but not known how to translate it into R code? Statistical Analysis and Reporting in R provides an organized set of R code recipes for numerous analyses. These analyses are organized by whether they involve single or multiple factors, are between- or within-subjects, are for main effects, interactions, or post hoc contrasts, or are parametric or nonparametric. Also included are tests of proportion and association, tests of ANOVA assumptions, data distribution tests, generalized linear models, linear mixed models, and generalized linear mixed models. All code recipes are accompanied by generic data sets and runnable R code files. You can therefore look up your desired analysis, take the provided R code as a starting point, and change the generic variable names as needed. Finally, for each R analysis, an English-language statistical result is given of the kind that might appear in a scientific report, often including a graph made with ggplot2.

Accompanying Data Sets

All data sets for use with the R code recipes are given in *.csv file format in the ZIP download. These data files are contained in a folder called data and divided into subfolders therein, which correspond to the *.R code files included. Provided your R code files are in the same directory as the data folder, running the R code recipes should "just work."
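As a minimal sketch of what "just work" relies on, the snippet below builds a path relative to the working directory and reads a data file only if it is found there. The subfolder name "some_subfolder" is a placeholder assumed for illustration, not an actual folder name from the ZIP; the file name is the example used in this section.

```r
# Assumption: "some_subfolder" is a placeholder, not a real folder name from the ZIP.
# Paths are relative to the working directory, which should be the directory
# holding both the *.R files and the data folder.
csv_path <- file.path("data", "some_subfolder", "1F3LBs_multinomial.csv")

if (file.exists(csv_path)) {
  # Read the data set and show its structure
  df <- read.csv(csv_path)
  str(df)
} else {
  # Report where R looked, which usually reveals a working-directory mismatch
  message("Not found: ", csv_path, " (working directory: ", getwd(), ")")
}
```

If the file is not found, checking getwd() against the location of the data folder is usually the quickest way to see why.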

The data files follow a naming convention, as in this example:

1F3LBs_multinomial.csv

The "1F" part refers to one factor, i.e., a single independent variable 'X'. The "3L" part refers to three levels, i.e., the factor has levels 'a', 'b', and 'c'. The "Bs" part refers to between-subjects, meaning each subject was exposed to only one of the factor's levels. Finally, the "multinomial" part refers to the dependent variable, which, in this case, is a polytomous response, i.e., it can take on 'x', 'y', or 'z' as values.
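The naming convention can be decoded mechanically. The following is a rough sketch, not part of the provided recipes: parse_data_filename() is a hypothetical helper, it only handles names shaped like the example above, and the "Ws" code for within-subjects is an assumption, since only "Bs" is shown here.

```r
# Hypothetical helper (not part of the original recipes): decode a data file
# name of the form <n>F<n>L<Bs|Ws>_<response>.csv.
parse_data_filename <- function(filename) {
  # Drop any directory part and the .csv extension
  stem <- sub("\\.csv$", "", basename(filename))

  # Capture groups: factor count, level count, design code, response type
  # Assumption: "Ws" marks within-subjects; only "Bs" appears in the text above.
  pattern <- "^([0-9]+)F([0-9]+)L(Bs|Ws)_(.+)$"
  parts <- regmatches(stem, regexec(pattern, stem))[[1]]
  if (length(parts) == 0) {
    stop("File name does not match the assumed pattern: ", filename)
  }

  # parts[1] is the full match; parts[2] to parts[5] are the capture groups
  list(
    factors  = as.integer(parts[2]),
    levels   = as.integer(parts[3]),
    design   = if (parts[4] == "Bs") "between-subjects" else "within-subjects",
    response = parts[5]
  )
}

info <- parse_data_filename("1F3LBs_multinomial.csv")
str(info)
```

Names that do not fit this one pattern make the helper stop with an error rather than guess.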

All data files are in long format, with a subject identifier "S" in the leftmost column; independent variables (factors) "X", "X1", or "X2" etc. in the next columns; and the dependent variable (response) "Y" in the rightmost column.
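To make the long-format layout concrete, here is a small sketch that builds a data frame with the same column arrangement and checks it. The values are made up for illustration and are not taken from any of the provided data files.

```r
# Illustrative data only: these values are invented, not from the provided files.
# One row per observation: subject S, factor X, response Y.
df <- data.frame(
  S = 1:6,
  X = c("a", "a", "b", "b", "c", "c"),
  Y = c("x", "y", "z", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# Treat the subject identifier and the categorical columns as factors
df$S <- factor(df$S)
df$X <- factor(df$X)
df$Y <- factor(df$Y)

# Layout check: subject identifier leftmost, response rightmost
stopifnot(names(df)[1] == "S", names(df)[ncol(df)] == "Y")

# Between-subjects check: each subject appears under exactly one level of X
levels_per_subject <- tapply(as.character(df$X), df$S,
                             function(x) length(unique(x)))
stopifnot(all(levels_per_subject == 1))

# Cross-tabulate factor levels against responses
print(table(df$X, df$Y))
```

In a within-subjects design, the same subject identifier would instead repeat across rows, once per level of the factor.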

Required Software Tools

Two software tools are required for running the code snippets given in Statistical Analysis and Reporting in R: R and RStudio. Install R first, then RStudio.

Related Coursera Course

I have also created a Coursera course that covers much of the material herein. It is taught in the R statistical programming language using the RStudio environment. The course is called Designing, Running & Analyzing Experiments.

Related Independent Study

Previously, I created an independent study called Practical Statistics for HCI, which covers inferential statistics using the SAS JMP and IBM SPSS statistics tools. The independent study is similar to, and largely subsumed by, my Coursera course.

Author's Statistics Papers

  1. Elkin, L.A., Kay, M., Higgins, J. and Wobbrock, J.O. (2021). An aligned rank transform procedure for multifactor contrast tests. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST '21). Virtual Event (October 10-14, 2021). New York: ACM Press, pp. 754-768.
  2. Wobbrock, J.O. (2017). The relevance of nonparametric and semi-parametric statistics to HCI. Workshop on "Moving Transparent Statistics Forward." ACM Conference on Human Factors in Computing Systems (CHI '17). Denver, Colorado (May 6-11, 2017). Paper No. 2.
  3. Wobbrock, J.O. and Kay, M. (2016). Nonparametric statistics in human-computer interaction. Chapter 7 in J. Robertson & M.C. Kaptein (eds.), Modern Statistical Methods for HCI. Switzerland: Springer, pp. 135-170.
  4. Wobbrock, J.O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Annual Workshop of the Human-Computer Interaction Consortium (HCIC '11). Pacific Grove, California (June 14-18, 2011).
  5. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins, J.J. (2011). The aligned rank transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '11). Vancouver, British Columbia (May 7-12, 2011). New York: ACM Press, pp. 143-146.

Copyright © 2018-2021 Jacob O. Wobbrock. All rights reserved.