Statistical Analysis and Reporting in R
Jacob O. Wobbrock
The Information School
University of Washington
Download
Current Version: 18-Dec-2018
Download ZIP file: Rstats.zip
About
Have you ever needed to do a statistical analysis but not quite known which one to use? Or perhaps you've known the proper analysis, but not how to translate it into R code? Statistical Analysis and Reporting in R provides an organized set of R code snippets for numerous parametric and nonparametric analyses. The analyses are organized by whether they have a single factor or multiple factors, and by whether they are between- or within-subjects. Tests of proportion, tests of ANOVA assumptions, and distribution tests are provided, as are generalized linear models, linear mixed models, and generalized linear mixed models. Code snippets for post hoc pairwise comparisons show how to follow up statistically significant main effects and interactions. All code snippets are accompanied by actual data sets. The user can therefore look up the desired analysis, take the provided R code as a starting point, and change the generic variable names as appropriate. Finally, for each R analysis, an English-language statistical result is given of the kind that might appear in a scientific report.
Accompanying Data Sets
All data sets for use with the R code snippets are given in both *.csv and *.jmp file formats. If you're using R and RStudio, you only need the *.csv data files. These data files are contained in a folder called data and divided into subfolders therein. Provided your R code file is in the same directory as the data folder, running the R code snippets should "just work."
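As a rough illustration of loading one of these files, the sketch below searches the data folder recursively for a file by name, since the subfolder names are not listed here. The helper find_data_file is hypothetical and is not part of the provided snippets; it assumes only that the working directory contains the data folder.

```r
# Hypothetical helper (not part of the provided snippets): find a *.csv file
# by name anywhere under the data folder, whatever the subfolder layout is.
find_data_file <- function(name, root = "data") {
  # List every *.csv file under root, including those in subfolders.
  hits <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)
  # Keep only the files whose base name matches exactly.
  hits <- hits[basename(hits) == name]
  if (length(hits) == 0) {
    stop("No file named ", name, " found under ", root)
  }
  hits[1]
}

# Example use, assuming the working directory contains the data folder:
# df <- read.csv(find_data_file("1F3LBs_multinomial.csv"))
# head(df)
```

If the working directory is somewhere else, the relative paths will not resolve; calling setwd() on the folder that holds both the R code file and the data folder is one way to fix that.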
The data files follow a naming convention, as in this example:
1F3LBs_multinomial.csv
The "1F" part refers to one factor, i.e., a single independent variable 'X'. The "3L" part refers to three levels, i.e., the factor has values 'a', 'b', and 'c'. The "Bs" part refers to between-subjects, meaning each subject was exposed to only one of the factor's levels. Finally, the "multinomial" part refers to the dependent variable, which, in this case, is a polychotomous outcome, i.e., it can take on 'x', 'y', or 'z' as values.
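The naming convention can be decoded mechanically. The following is a minimal sketch that handles only names shaped like the example above; the function parse_data_name and the field names it returns are hypothetical, and other data files may follow variations of the pattern that this regular expression does not cover.

```r
# Hypothetical helper: split a file name shaped like "1F3LBs_multinomial.csv"
# into its parts. Only the pattern of the single example above is assumed.
parse_data_name <- function(filename) {
  # Drop any directory part and the .csv extension.
  stem <- sub("\\.csv$", "", basename(filename))
  # Groups: factor count, level count, design code, response name.
  m <- regmatches(stem, regexec("^([0-9]+)F([0-9]+)L([A-Za-z]+)_(.+)$", stem))[[1]]
  if (length(m) == 0) {
    stop("Name does not match the pattern of the example: ", filename)
  }
  list(
    factors  = as.integer(m[2]),  # "1F" -> one factor
    levels   = as.integer(m[3]),  # "3L" -> three levels
    design   = m[4],              # "Bs" -> between-subjects
    response = m[5]               # "multinomial" -> type of dependent variable
  )
}

info <- parse_data_name("1F3LBs_multinomial.csv")
str(info)
```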
All data files are in long format, with a subject identifier "S" in the leftmost column; independent variables (factors) "X", "X1", or "X2" etc. in the next columns; and the dependent variable (response) "Y" in the last column.
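To make the long-format layout concrete, here is a small sketch that builds a toy table in the same shape as a one-factor, three-level, between-subjects file. The values are made up for illustration and are not taken from the provided data sets.

```r
# Toy long-format table with made-up values (not from the provided data sets):
# one row per subject, S leftmost, the factor X next, the response Y last.
df <- data.frame(
  S = 1:6,
  X = c("a", "a", "b", "b", "c", "c"),
  Y = c("x", "y", "z", "x", "y", "z")
)

# Treat the subject identifier, the factor, and the nominal response as categorical.
df$S <- factor(df$S)
df$X <- factor(df$X)
df$Y <- factor(df$Y)

# Check the column layout: S first, Y last, factor columns in between.
stopifnot(names(df)[1] == "S", names(df)[ncol(df)] == "Y")
factor_cols <- setdiff(names(df), c("S", "Y"))
print(factor_cols)

# Cross-tabulate the factor levels against the response categories.
print(table(df$X, df$Y))
```

For a within-subjects design, the same subject identifier would appear on more than one row in long format, once for each level that subject was exposed to.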
Required Software Tools
Two software tools are required for running the code snippets given in Statistical Analysis and Reporting in R: R and RStudio. The data sets are also provided as *.jmp files for use with SAS JMP, but JMP is not required.
Related Coursera Course
The author has also created a Coursera MOOC that covers much of the material in Statistical Analysis and Reporting in R. It is taught in the R statistical programming language using the RStudio environment. The course is called Designing, Running & Analyzing Experiments.
Related Independent Study
Previously, the author created an independent study called Practical Statistics for HCI, which covers inferential statistics using the SAS JMP and/or IBM SPSS statistics tools. The independent study is similar to, and largely subsumed by, the Coursera MOOC.
Author's Statistics Papers
Copyright © 2018-2019 Jacob O. Wobbrock. All rights reserved.
Last updated January 6, 2019.