Statistical Analysis and Reporting in R

Jacob O. Wobbrock
The Information School
University of Washington

Current Version: 18-Dec-2018

Have you ever needed to do a statistical analysis but not quite known which one to use? Or perhaps you've known the proper analysis but not how to translate it into R code? Statistical Analysis and Reporting in R provides an organized set of R code snippets for numerous parametric and nonparametric analyses. The analyses are organized by whether they involve a single factor or multiple factors, and by whether those factors are between- or within-subjects. Tests of proportion, tests of ANOVA assumptions, and distribution tests are provided, as are generalized linear models, linear mixed models, and generalized linear mixed models. Code snippets for post hoc pairwise comparisons show how to follow up statistically significant main effects and interactions. All code snippets are accompanied by actual data sets, so the user can look up the desired analysis, take the provided R code as a starting point, and change the generic variable names as appropriate. Finally, for each R analysis, an English-language statistical result is given of the kind that might appear in a scientific report.

Accompanying Data Sets

All data sets for use with the R code snippets are given in both *.csv and *.jmp file formats. If you're using R and RStudio, you only need the *.csv data files. These data files are contained in a folder called data and divided into subfolders therein. Provided your R code file is in the same directory as the data folder, running the R code snippets should "just work."
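As a quick check that the data folder is where the snippets expect it, one option is to list the CSV files beneath it before running anything else. This is a minimal sketch that assumes the R working directory is the directory containing data; the variable names csv_files and df are illustrative, and no particular subfolder names are assumed.

```r
# List every CSV under the data folder, including its subfolders.
# Assumes the working directory is the one that contains "data".
csv_files <- list.files("data", pattern = "\\.csv$",
                        recursive = TRUE, full.names = TRUE)
print(csv_files)

# Read the first file found, if any; read.csv returns a data frame.
if (length(csv_files) > 0) {
  df <- read.csv(csv_files[1])
  str(df)
}
```

If the printed vector is empty, the working directory is probably not the one holding the data folder; in RStudio it can be changed with setwd() or through the Session menu.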

The data files follow a naming convention, as in this example:

```
1F3LBs_multinomial.csv
```

The "1F" part refers to one factor, i.e., a single independent variable 'X'. The "3L" part refers to three levels, i.e., the factor has values 'a', 'b', and 'c'. The "Bs" part refers to between-subjects, meaning each subject was exposed to only one of the factor's levels. Finally, the "multinomial" part refers to the dependent variable, which in this case is a polychotomous outcome, i.e., it can take on 'x', 'y', or 'z' as values.
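The convention can also be read mechanically. The sketch below splits a file name into its parts with a regular expression. The helper parse_data_name is hypothetical and not part of the provided snippets, and it assumes every name follows the same shape as the single example above, which may not hold for files with several factors.

```r
# Hypothetical helper: split a data file name into its design components.
# Assumes the shape <n>F<m>L<design>_<response>.csv seen in the one example;
# only the "Bs" design code is documented here, so the code is returned as-is.
parse_data_name <- function(filename) {
  stem <- sub("\\.csv$", "", basename(filename))
  m <- regmatches(stem, regexec("^([0-9]+)F([0-9]+)L([A-Za-z]+)_(.+)$", stem))[[1]]
  if (length(m) == 0) {
    stop("file name does not match the assumed pattern: ", filename)
  }
  list(
    factors  = as.integer(m[2]),  # number of independent variables
    levels   = as.integer(m[3]),  # number of levels of the factor
    design   = m[4],              # design code, e.g. "Bs"
    response = m[5]               # description of the dependent variable
  )
}

info <- parse_data_name("1F3LBs_multinomial.csv")
str(info)
```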

All data files are in long format, with a subject identifier "S" in the leftmost column; independent variables (factors) "X", "X1", or "X2" etc. in the next columns; and the dependent variable (response) "Y" in the last column.
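To make the layout concrete, here is a small long-format data frame built by hand. The values are made up for illustration and are not taken from any of the provided data sets; the design mirrors one between-subjects factor with three levels and a three-valued categorical response.

```r
# Made-up illustration data, not one of the provided data sets.
df <- data.frame(
  S = 1:6,                              # subject identifier, leftmost column
  X = rep(c("a", "b", "c"), each = 2),  # single between-subjects factor
  Y = c("x", "y", "z", "x", "y", "y")   # categorical response, last column
)

# Treat the identifier and categorical columns as factors before analysis.
df$S <- factor(df$S)
df$X <- factor(df$X)
df$Y <- factor(df$Y)

str(df)
table(df$X, df$Y)  # cross-tabulate factor level by response
```

Each row holds one observation, so a between-subjects design like this one has exactly one row per subject.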

Required Software Tools

Two software tools are required for running the code snippets given in Statistical Analysis and Reporting in R: R and RStudio. The data sets are also provided as *.jmp files for use with SAS JMP, but that tool is optional.

Related Coursera Course

The author has also created a Coursera MOOC that covers much of the material in Statistical Analysis and Reporting in R. It is taught in the R statistical programming language using the RStudio environment. The course is called Designing, Running & Analyzing Experiments.

Related Independent Study

Previously, the author created an independent study called Practical Statistics for HCI, which covers inferential statistics using the SAS JMP and/or IBM SPSS statistics tools. The independent study is similar to, and largely subsumed by, the Coursera MOOC.

Author's Statistics Papers

1. Wobbrock, J.O. (2017). The relevance of nonparametric and semi-parametric statistics to HCI. Workshop on "Moving Transparent Statistics Forward." ACM Conference on Human Factors in Computing Systems (CHI '17). Denver, Colorado (May 6-11, 2017). Paper No. 2.
2. Wobbrock, J.O. and Kay, M. (2016). Nonparametric statistics in human-computer interaction. Chapter 7 in J. Robertson & M.C. Kaptein (eds.), Modern Statistical Methods for HCI. Switzerland: Springer, pp. 135-170.
3. Wobbrock, J.O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Annual Workshop of the Human-Computer Interaction Consortium (HCIC '11). Pacific Grove, California (June 14-18, 2011).
4. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins, J.J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '11). Vancouver, British Columbia (May 7-12, 2011). New York: ACM Press, pp. 143-146.