Statistics et al.: Open Reviews 1 - Two Meta-Psychology Papers

This is the first post of several in which I publish the peer reviews I have previously given to journals.

There are a few reasons for doing this, but the main one is purely mercenary: I want to get more return for the effort I put into carefully reading and critiquing these articles.

This is a new space for me, so I'm going to start with the reviews that I am most comfortable publishing (those already published with my name attached in an open review system) and move gradually to those which are the most potentially problematic (negative reviews for manuscripts that were never published by the client journal or publisher). As such, the first few reviews will be disproportionately positive.

I am actively asking for a social licence from researchers to publish these reviews, so please leave your opinions in the comments.

Context:

These first two reviews were for the Journal of Meta-Psychology, found at https://open.lnu.se/index.php/metapsychology, an open-review, open-access journal devoted to replications and reproduction of psychological research.

This is a very new journal, so the bar for acceptance is implicitly low. Furthermore, its review criteria are explicitly centered around scientific soundness, not novelty or scientific importance. That doesn't mean the two following papers are bad, just that they didn't have to be excellent in order to receive my assent.

"Computational Reproducibility via Containers in Psychology", by Clyburne-Sherin & Green

Don't be afraid to tell us more about the actual Code Ocean system and how it solves the problems posed in the first part of this manuscript. It's okay to make this more of a whitepaper and less of a meta-study.

I appreciate that you have included extensive screenshots. This will help anyone reading this manuscript on paper or some other environment that needs information to be self-contained, as well as preserve the information for any readers that may encounter this paper after the link has gone stale.

However, I'm not clear on how Figure 10 demonstrates how an interactive widget might work. Is the link to Figure S6 in the discussion section a link to the widget?

One technical concern: What happens if Code Ocean's servers go down some day, or cannot be accessed from some part of the world? Do the papers that rely on this system for computational reproducibility become non-reproducible? Can a compute capsule be downloaded to a hard drive and used offline?

Some other minor fixes I would recommend are below.

Fix redundancy: "requires overcoming a number of technical hurdles must be overcome"
Fix typesetting error: “assessments in the “reproducible category"
Add either a citation for Jupyter or a comment such as "see Jupyter.org".
Missing right bracket for references to Figures 5, 6, and 10.
Missing space in "Figure7".
Sections are typically referred to with capitals just as figures are. (e.g. Section 5.2)
Grammatical change: In THE context of discussing Docker

"Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance", by Jerry Brunner and Ulrich Schimmack (University of Toronto Mississauga)

First, thank you for including your R code, including the random seed you used.

Also, I commend your consideration of non-centrality parameters. That's something that is rare even among statisticians these days, although NCPs are clearly still relevant.

There is one major problem that needs to be addressed before this manuscript is ready for publication.
------------

What is the justification for Principle 2? How do we know that the publication probability is proportional to the power in this way? How does this work for publications that report multiple effects and hypothesis tests, each with a different amount of power and possibly testing very different things?

Is it that significant results get published and non-significant ones do not, such that the chance of a given result being published is equal to its power?

Does this method work for meta-studies by only considering significant effects?

This is especially problematic because all the other principles are derived from Principle 2, so the justification for it needs to be rock solid.

I will be satisfied if you explicitly clarify that either...
...your estimation method is predicated on the assumption that significant estimates are published and non-significant ones are in the 'file-drawer', or that...
...the formula does not consider non-significant results (which might still be published if they appear alongside other significant results, or might be published in a journal like PLoS One, which has a policy of applying no preference to significant results).
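
A minimal sketch of the first assumption, in which every significant result is published and every non-significant one is filed away. Under that assumption the share of studies that get published should match the power of the test. The two-sided z-test, the non-centrality value of 2.0, and the function names are illustrative choices, not taken from the manuscript.

```python
import random
from statistics import NormalDist


def analytic_power(ncp, alpha=0.05):
    """Power of a two-sided z-test whose statistic is N(ncp, 1)."""
    std = NormalDist()
    crit = std.inv_cdf(1 - alpha / 2)
    return (1 - std.cdf(crit - ncp)) + std.cdf(-crit - ncp)


def simulate_published_share(ncp, n_studies=100_000, alpha=0.05, seed=1):
    """Share of simulated studies that are significant, and so 'published'."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = sum(abs(rng.gauss(ncp, 1.0)) > crit for _ in range(n_studies))
    return significant / n_studies


# Hypothetical non-centrality parameter, chosen only for illustration.
power = analytic_power(ncp=2.0)
share_published = simulate_published_share(ncp=2.0)
print(round(power, 3), round(share_published, 3))
```

If non-significant results are also published at some rate, the published share no longer equals the power, which is why the assumption needs to be stated.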

-----------
Other mathematical issues:
------------
Regarding the cut-off at Z = 6, how is this to be applied in non-normal situations, in which extremely high test statistics may cause similar numerical issues? May I recommend just replacing 6 with some arbitrarily high M that the practitioner or the underlying software can determine based on what can produce reliable convergence to a solution?
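
To illustrate why a software-determined M may be more natural than a fixed 6, here is a small sketch of how upper-tail probabilities lose precision as z grows. It assumes the numerical trouble is floating-point underflow in tail probabilities, which may not be the exact issue in the manuscript; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail_naive(z):
    """P(Z > z) computed as 1 - CDF; loses all precision once the CDF rounds to 1."""
    return 1.0 - NormalDist().cdf(z)


def upper_tail_erfc(z):
    """P(Z > z) computed with the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Compare the two computations at a moderate, a large, and an extreme z.
for z in (6.0, 10.0, 40.0):
    print(z, upper_tail_naive(z), upper_tail_erfc(z))
```

In double precision the naive version returns exactly zero well before the erfc version does, so a usable M depends on how the tail is computed.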

"t statistics may be squared to yield F statistics" ... with numerator degrees of freedom df1 = 1, only.

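For reference, the standard derivation behind that restriction, written with Z standard normal, V chi-square on ν degrees of freedom, and Z independent of V (the symbols are notation chosen here, not the manuscript's):

```latex
T = \frac{Z}{\sqrt{V/\nu}}, \qquad Z \sim N(0,1), \quad V \sim \chi^2_{\nu}
\quad\Longrightarrow\quad
T^2 = \frac{Z^2/1}{V/\nu} \sim F(1, \nu),
\qquad \text{since } Z^2 \sim \chi^2_{1}.
```

Only one of the two degrees-of-freedom parameters is free after squaring. The one attached to Z² is fixed at 1, and the other is inherited from the t statistic as ν.
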
"Bonferroni ... joint 0.001 significance." Shouldn't this be 0.05/120 significance? Do you mean 0.001 family-wise?

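The arithmetic behind this question, as a short sketch. The figure of 120 tests is taken from the comment above; which family-wise level the authors intended is an assumption, so both readings are shown.

```python
def bonferroni_per_test(alpha_familywise, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha_familywise / n_tests


n_tests = 120

# Reading 1: family-wise 0.05. Reading 2: family-wise 0.001.
for alpha_fw in (0.05, 0.001):
    print(alpha_fw, bonferroni_per_test(alpha_fw, n_tests))

# Reading 3: each test run at 0.001, giving only this union bound family-wise.
familywise_bound = min(1.0, n_tests * 0.001)
print(familywise_bound)
```
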
--------------------
Minor writing recommendations:
-------------------------

"We are well aware of the powerful arguments against the null hypothesis"

The word 'powerful' is already being used in the statistical context here. Is there another word like 'compelling' that could be used to prevent confusion?

"more false". The pedant in me wants to replace this with "farther from true".

"of the Principles by simulation" lower case 'principles'. (appears more than once)

Typo: typicl

Typo: 'and chose the effect size' --> 'and CHOOSE the effect size'

Typo: Forumula 6

Missing Section reference: "We carried out a simulation experiment like the one in Section , which"

Typo: Missing correlation between sample size and effect size (-0.6)

Typo: approximated fairly well BE a gamma distribution. (BY?)

Missing % sign: roughly 84% when it should be 95%.

Typo: simulation is is easiest

---------------
Comments that are not things to fix:
-------------
"A great deal of computation was saved..." this could also have been done with importance sampling using something close like a truncated exponential.

Your method works just fine, but you may want to look into this computation method as you explore the ideas in this manuscript further.
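
A generic sketch of what is meant by importance sampling here, not a reconstruction of the manuscript's computation. It estimates a small normal tail probability by drawing from an exponential shifted to start at the cut-off (a simple stand-in for a truncated exponential) and reweighting. The target quantity, the cut-off of 4, and the rate are all assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def tail_prob_importance(c, n_draws=20_000, seed=1):
    """Estimate P(Z > c), c > 0, for standard normal Z by importance sampling."""
    rng = random.Random(seed)
    rate = c  # proposal rate; matching c is convenient, not claimed optimal
    total = 0.0
    for _ in range(n_draws):
        x = c + rng.expovariate(rate)  # draw from the shifted exponential
        proposal_pdf = rate * math.exp(-rate * (x - c))
        total += normal_pdf(x) / proposal_pdf  # importance weight
    return total / n_draws


estimate = tail_prob_importance(4.0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
print(estimate, exact)
```

Every draw lands in the tail of interest, so far fewer draws are needed than with plain simulation from the normal distribution.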

"The strange fractional degrees of freedom.." non-integral degrees of freedom are not a problem.

Have a look at how R does a two-sample t-test with unequal variances. DF is just another parameter, and mathematically, there's usually no reason why it needs to be an integer.
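
A short sketch of the Welch-Satterthwaite approximation, which is what R's t.test() uses when variances are not assumed equal. The two samples are made-up numbers for illustration only.

```python
from statistics import variance


def welch_df(sample_a, sample_b):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    numerator = (se2_a + se2_b) ** 2
    denominator = se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    return numerator / denominator


# Made-up data with clearly unequal spread.
group_a = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3]
group_b = [7.9, 3.2, 9.4, 5.5, 2.1, 8.8, 6.0, 4.4]
df = welch_df(group_a, group_b)
print(df)
```

The result generally falls between the smaller sample's n - 1 and the pooled n_a + n_b - 2, and is rarely a whole number.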

As for uniroot(), this employs Brent's method, a bracketing algorithm for finding a univariate root of a function. It's a perfectly fine function for this case, but for further exploration, have a look at optim() for a generalized multivariate optimizer.
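
As a sketch of univariate root-finding in this setting, the following solves for the non-centrality parameter that gives 80% power in a two-sided z-test. It uses plain bisection as a simple stand-in for uniroot(); the test, the 80% target, and the function names are assumptions for illustration, not the manuscript's code.

```python
from statistics import NormalDist

STD = NormalDist()


def power_two_sided_z(ncp, alpha=0.05):
    """Power of a two-sided z-test whose statistic is N(ncp, 1)."""
    crit = STD.inv_cdf(1 - alpha / 2)
    return (1 - STD.cdf(crit - ncp)) + STD.cdf(-crit - ncp)


def bisect_root(f, lower, upper, tol=1e-10, max_iter=200):
    """Find a root of f in [lower, upper]; f must change sign on the interval."""
    f_lower, f_upper = f(lower), f(upper)
    if f_lower * f_upper > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        f_mid = f(mid)
        if f_mid == 0 or (upper - lower) < tol:
            return mid
        if f_lower * f_mid < 0:
            upper = mid  # sign change is in the left half
        else:
            lower, f_lower = mid, f_mid  # sign change is in the right half
    return 0.5 * (lower + upper)


ncp_for_80 = bisect_root(lambda ncp: power_two_sided_z(ncp) - 0.80, 0.0, 10.0)
print(ncp_for_80)
```

Like uniroot(), this needs an interval on which the function changes sign, which is easy to supply here because power increases with the non-centrality parameter.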


Tuesday, 18 September 2018
