Wednesday 14 August 2019

Replication report: Multilevel-Linear-Models using SAS and SPSS for repeated measures designs


The following is a report on the reproduction of the statistical work in the paper “Differences of Type I error rates for ANOVA and Multilevel-Linear-Models using SAS and SPSS for repeated measures designs” by Nicolas Haverkamp and André Beauducel at the University of Bonn.


The original paper was accepted for publication by Meta-Psychology, https://open.lnu.se/index.php/metapsychology , a journal focused on methodology and reproductions of existing work. This report is part of my continued attempts at establishing a standard template for future such reports according to the Psych Data Standards found here https://github.com/psych-ds/psych-DS


Haverkamp and Beauducel's paper is an exploration of the robustness of hypothesis tests of different Hierarchical Linear Models (HLMs), which the authors call Multilevel-Linear-Models (MLMs), under some assumption violations, specifically violation of the sphericity assumption.

Synthetic data was generated under two situations: an ideal situation and a violation situation.

For the ideal situation, the authors used equal factor loadings of .50 for all 9 or 12 dimensions, depending on the test being done. This represents a correlation of .50 between one measure and each of the other repeated measures. They did this according to equation (1) in the paper.

For the violation situation, they instead changed the loadings, and correlation between measures, to .80, but only for odd-numbered measurements (5 such measurements for m=9, and 6 such measurements for m=12). This is outlined in equation (5).
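To illustrate why unequal loadings amount to a sphericity violation, here is a minimal Python sketch. It is not the authors' SAS or SPSS code, and it does not reproduce their equations (1) and (5). It assumes a standardized one-factor model, in which the implied correlation between two measures is the product of their loadings, and it computes the Greenhouse-Geisser epsilon, which equals 1 when sphericity holds and drops below 1 when it does not. The function names `one_factor_corr` and `gg_epsilon` are hypothetical.

```python
def one_factor_corr(loadings):
    """Correlation matrix implied by a standardized one-factor model.

    Assumption: unit variances, so corr(i, j) = loading_i * loading_j for i != j.
    """
    k = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(k)]
        for i in range(k)
    ]


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon of a symmetric covariance matrix.

    Equals 1 under sphericity; its lower bound is 1 / (k - 1).
    """
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-center the matrix (column means equal row means by symmetry).
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    # For a symmetric matrix, the trace of its square is the sum of squared entries.
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


m = 9
# Ideal situation: the same loading on every measure.
ideal = [0.50] * m
# Violation situation: .80 on odd-numbered measures (1st, 3rd, ...), .50 elsewhere.
violation = [0.80 if (i + 1) % 2 == 1 else 0.50 for i in range(m)]

eps_ideal = gg_epsilon(one_factor_corr(ideal))
eps_violation = gg_epsilon(one_factor_corr(violation))
print("epsilon, equal loadings:  ", eps_ideal)
print("epsilon, unequal loadings:", eps_violation)
```

Any set of equal loadings gives a compound-symmetric matrix, so epsilon is 1 in the ideal situation whatever the common value is. Mixing two loading values breaks that pattern and pushes epsilon below 1.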

A collection of analyses of the synthetic data was performed in up-to-date versions of both SAS and SPSS. The SAS code and SPSS syntax were included as additional files with the paper submission, which makes replication straightforward for anyone who has access to recent versions of both SAS and SPSS. Note that although SPSS is primarily a GUI-based software package, using SPSS syntax effectively allows it to be treated like a code-based software package. This is because syntax is a record of the buttons pressed and settings chosen, so it can be saved, loaded, and run like any other program.

Running the provided code and syntax produced results that were identical up to rounding error to those presented in Figures 1 through 4, as well as Table 3. Therefore, this work by Haverkamp and Beauducel is replicable and is ready for publication in Meta-Psychology.

However, a few concerns about accessibility should be noted as guidance for future work.

While the code can be run by anyone with access to both SAS and SPSS, that demographic is mainly limited to those working in large universities with extensive software licenses. Full-version, single-user licenses of each of these software packages can run in the thousands of dollars. As such, while the work can be replicated, it can be a major effort to do so for some researchers.

The authors justify this by citing the lack of support for the desired analyses in R. So a take-home message might be that there is either an unmet demand for Hierarchical Linear Models - Repeated Measures support in R, or that the support is hard to find, or that the authors missed something. However, the authors also documented their search for HLM support across not just SAS and SPSS, but also R and Stata. That suggests they have done some due diligence in this respect.

For future work in R, I would direct the reader to the R package "scdhlm: Estimating Hierarchical Linear Models for Single-Case Designs" https://cran.r-project.org/web/packages/scdhlm/index.html

Additionally, the values from Figures 1 to 4 needed to be collected with a plot digitizer, such as DigitizeIt, as these values didn't show up in numerical form in the paper. While a single-user license for a plot digitizer is not nearly as expensive, the entire step could have been avoided if the information from each figure had been presented as a table.

Previous replication reports: 


