Statistical Analysis and Reporting in R
Jacob O. Wobbrock
The Information School
University of Washington
Download
Latest Update: 14-Mar-2022
Download just R code recipes: PDF
Download ZIP file (data+code): ZIP
About
Have you ever needed to do a statistical analysis but not been quite sure which one to use? Or perhaps you've known the proper analysis but not how to translate it into R code? Statistical Analysis and Reporting in R provides an organized set of R code recipes for numerous analyses. These analyses are organized by whether they involve single or multiple factors; are between- or within-subjects; are for main effects, interactions, or post hoc contrasts; and are parametric or nonparametric. Tests of proportion and association, tests of ANOVA assumptions, data distribution tests, generalized linear models, linear mixed models, and generalized linear mixed models are also included. All code recipes are accompanied by generic data sets and runnable R code files. You can therefore look up your desired analysis, take the provided R code as a starting point, and change the generic variable names as needed. Finally, for each analysis, an English-language statistical result is given of the kind that might appear in a scientific report, often including a graph made with ggplot2.
Accompanying Data Sets
All data sets for use with the R code recipes are given in *.csv file format in the ZIP download. These data files are contained in a folder called data and divided into subfolders therein, which correspond to the *.R code files included. Provided your R code files are in the same directory as the data folder, running the R code recipes should "just work."
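As a quick check that this layout is in place, the sketch below lists every CSV file found under the data folder. The helper name list_data_files is hypothetical and not part of the provided recipes; it assumes only that the data folder sits in the current working directory.

```r
# Hypothetical helper (not part of the provided recipes): list all *.csv files
# found under a root folder, searching its subfolders as well.
list_data_files <- function(root = "data") {
  if (!dir.exists(root)) {
    stop("Folder not found: ", root, " (is the working directory correct?)")
  }
  list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
}

# Example usage, run from the directory that contains the data folder:
# list_data_files("data")
```

If the call stops with "Folder not found," the working directory is probably not the one holding the R code files; getwd() and setwd() can be used to inspect and change it.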
The data files follow a naming convention, as in this example:
1F3LBs_multinomial.csv
The "1F" part refers to one factor, i.e., a single independent variable 'X'. The "3L" part refers to three levels, i.e., the factor has levels 'a', 'b', and 'c'. The "Bs" part refers to between-subjects, meaning each subject was exposed to only one of the factor's levels. Finally, the "multinomial" part refers to the dependent variable, which, in this case, is a polytomous response, i.e., it can take on 'x', 'y', or 'z' as values.
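To make the convention concrete, here is a minimal sketch that splits such a file name into its parts. The function parse_data_name and its regular expression are assumptions made for illustration; they cover names shaped like the example above and may not match every file in the ZIP download.

```r
# Hypothetical helper: split a data file name such as "1F3LBs_multinomial.csv"
# into the parts described above. The pattern assumes a single "<n>F<m>L" pair
# followed by a design code and an underscore; other names return NULL.
parse_data_name <- function(filename) {
  stem <- sub("\\.csv$", "", basename(filename))
  m <- regmatches(stem, regexec("^([0-9]+)F([0-9]+)L([A-Za-z]+)_(.+)$", stem))[[1]]
  if (length(m) == 0) return(NULL)
  list(
    factors  = as.integer(m[2]),  # number of factors, e.g., 1
    levels   = as.integer(m[3]),  # number of levels, e.g., 3
    design   = m[4],              # e.g., "Bs" for between-subjects
    response = m[5]               # e.g., "multinomial"
  )
}

parts <- parse_data_name("1F3LBs_multinomial.csv")
str(parts)
```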
All data files are in long format, with a subject identifier "S" in the leftmost column; independent variables (factors) "X", "X1", or "X2" etc. in the next columns; and the dependent variable (response) "Y" in the rightmost column.
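For readers unfamiliar with long format, the following sketch builds a small table with the same column layout. The values are made up for illustration and are not taken from the provided data files.

```r
# Made-up illustration data, not taken from the provided files: six subjects,
# one between-subjects factor X with levels a, b, c, and a polytomous response Y.
df <- data.frame(
  S = factor(1:6),
  X = factor(c("a", "a", "b", "b", "c", "c")),
  Y = factor(c("x", "y", "y", "z", "z", "x"))
)

# Each row is one observation; in a between-subjects design like this one,
# each subject appears in exactly one row.
print(df)
table(df$X, df$Y)  # cross-tabulate factor level by response
```

In a within-subjects design, by contrast, the same subject identifier would appear on several rows, one per level of the factor.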
Required Software Tools
Two software tools are required for running the code snippets given in Statistical Analysis and Reporting in R: R and RStudio. You should install R first and then RStudio.
Related Coursera Course
I have also created a Coursera course that covers much of the material herein. It is taught in the R statistical programming language using the RStudio environment. The course is called Designing, Running & Analyzing Experiments.
Related Independent Study
Previously, I created an independent study called Practical Statistics for HCI, which covers inferential statistics using the SAS JMP and IBM SPSS statistics tools. The independent study is similar to, and largely subsumed by, my Coursera course.
Author's Statistics Papers
Copyright © 2018-2022 Jacob O. Wobbrock. All rights reserved.
Last updated November 15, 2022.