Statistical Analysis and Reporting in R

Table of Statistical Analyses

Jacob O. Wobbrock [contact]
The Information School
University of Washington

Download

Current Version: 13-March-2019
Download ZIP file: Rstats.zip

About

Have you ever needed to do a statistical analysis but not quite known which one to use? Or perhaps you have known the proper analysis but not how to translate it into R code? Statistical Analysis and Reporting in R provides an organized set of R code snippets for numerous parametric and nonparametric analyses. The analyses are organized by whether they involve a single factor or multiple factors, and by whether those factors are between- or within-subjects. Tests of proportion, tests of ANOVA assumptions, and distribution tests are provided, as are generalized linear models, linear mixed models, and generalized linear mixed models. Code snippets for post hoc pairwise comparisons show how to follow up statistically significant main effects and interactions. All code snippets are accompanied by actual data sets, so the user can look up the desired analysis, take the provided R code as a starting point, and change the generic variable names as appropriate. Finally, for each R analysis, an English-language statistical result is given of the kind that might appear in a scientific report.

Accompanying Data Sets

All data sets for use with the R code snippets are given in *.csv file format. These data files are contained in a folder called data and divided into subfolders therein, which correspond to the *.R code files included. Provided your R code files are in the same directory as the data folder, running the R code snippets should "just work."
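
As a quick check that the folder layout described above is in place, the following minimal sketch lists every *.csv file under the data folder and loads the first one it finds. The helper name find_data_files is hypothetical and not part of the distributed code; the sketch also assumes R's working directory is the directory that contains the data folder.

```r
# Hypothetical helper (not part of the distributed snippets): collect the
# paths of all CSV files under a root folder, including its subfolders.
find_data_files <- function(root = "data") {
  list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
}

csv_files <- find_data_files()

if (length(csv_files) > 0) {
  # Load the first data file found and show its first few rows.
  df <- read.csv(csv_files[1])
  print(head(df))
} else {
  # Nothing found: the working directory is probably not where 'data' lives.
  message("No CSV files found under 'data'; working directory is ", getwd())
}
```

If no files are found, setting the working directory to the folder that holds the *.R files (for example with setwd(), or through RStudio's Session menu) is usually the first thing to try.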

The data files follow a naming convention, as in this example:

1F3LBs_multinomial.csv

The "1F" part refers to one factor, i.e., a single independent variable 'X'. The "3L" part refers to three levels, i.e., the factor has values 'a', 'b', and 'c'. The "Bs" part refers to between-subjects, meaning each subject was exposed to only one of the factor's levels. Finally, the "multinomial" part refers to the dependent variable, which, in this case, is a polychotomous outcome, i.e., it can take on 'x', 'y', or 'z' as values.
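
The convention can be decoded mechanically. The sketch below is illustrative only: parse_data_name is a hypothetical helper, and the pattern covers just the shape of the example name above (a factor count, a level count, a design code, an underscore, and a response label). Names that do not fit that shape return NULL, and design codes other than "Bs" are passed through without interpretation.

```r
# Hypothetical helper: split a name like "1F3LBs_multinomial.csv" into parts.
parse_data_name <- function(filename) {
  # Drop any directory part and the .csv extension.
  stem <- sub("\\.csv$", "", basename(filename))

  # <factors>F<levels>L<design>_<response>
  pattern <- "^([0-9]+)F([0-9]+)L([A-Za-z]+)_(.+)$"
  m <- regmatches(stem, regexec(pattern, stem))[[1]]

  # No match: the name does not follow the shape assumed here.
  if (length(m) == 0) return(NULL)

  list(
    factors  = as.integer(m[2]),  # "1F" -> 1 factor
    levels   = as.integer(m[3]),  # "3L" -> 3 levels
    design   = m[4],              # "Bs" -> between-subjects
    response = m[5]               # "multinomial" -> type of Y
  )
}

info <- parse_data_name("1F3LBs_multinomial.csv")
str(info)
```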

All data files are in long format, with a subject identifier "S" in the leftmost column; independent variables (factors) "X", "X1", or "X2" etc. in the next columns; and the dependent variable (response) "Y" in the rightmost column.
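
To make the long-format layout concrete, here is a small sketch with made-up rows (they are not taken from any of the provided data files) arranged the way a one-factor, three-level, between-subjects file with a multinomial response would be. Converting the columns to factors is a common first step before analysis and is shown here as an assumption about typical usage, not as the author's exact code.

```r
# Made-up rows for illustration only: one row per subject, with the
# subject identifier S first, the factor X next, and the response Y last.
csv_text <- "S,X,Y
1,a,x
2,a,y
3,b,z
4,b,x
5,c,y
6,c,z"

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Treat the subject identifier, the factor, and the categorical response
# as factors rather than as numbers or plain strings.
df$S <- factor(df$S)
df$X <- factor(df$X)
df$Y <- factor(df$Y)

str(df)

# Cross-tabulate the factor levels against the response categories.
print(xtabs(~ X + Y, data = df))
```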

Required Software Tools

Two software tools are required to run the code snippets in Statistical Analysis and Reporting in R: R and RStudio.

Related Coursera Course

The author has also created a Coursera MOOC that covers much of the material in Statistical Analysis and Reporting in R. It is taught in the R statistical programming language using the RStudio environment. The course is called Designing, Running & Analyzing Experiments.

Related Independent Study

Previously, the author created an independent study called Practical Statistics for HCI, which covers inferential statistics using the SAS JMP and/or IBM SPSS statistics tools. The independent study is similar to, and largely subsumed by, the Coursera MOOC.

Author's Statistics Papers

  1. Wobbrock, J.O. (2017). The relevance of nonparametric and semi-parametric statistics to HCI. Workshop on "Moving Transparent Statistics Forward." ACM Conference on Human Factors in Computing Systems (CHI '17). Denver, Colorado (May 6-11, 2017). Paper No. 2.
  2. Wobbrock, J.O. and Kay, M. (2016). Nonparametric statistics in human-computer interaction. Chapter 7 in J. Robertson & M.C. Kaptein (eds.), Modern Statistical Methods for HCI. Switzerland: Springer, pp. 135-170.
  3. Wobbrock, J.O. (2011). Practical statistics for human-computer interaction: An independent study combining statistics theory and tool know-how. Annual Workshop of the Human-Computer Interaction Consortium (HCIC '11). Pacific Grove, California (June 14-18, 2011).
  4. Wobbrock, J.O., Findlater, L., Gergle, D. and Higgins, J.J. (2011). The Aligned Rank Transform for nonparametric factorial analyses using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI '11). Vancouver, British Columbia (May 7-12, 2011). New York: ACM Press, pp. 143-146.

Copyright © 2018-2019 Jacob O. Wobbrock. All rights reserved.
Last updated March 13, 2019.