The following is a report on the reproduction of the statistical work in the paper “Insights into Criteria for Statistical Significance from Signal Detection Analysis” by Jessica K. Witt at Colorado State University.

The original paper was accepted for publication by the Journal of Meta-Psychology, a journal focused on methodology and reproductions of existing work. This report on is my first attempt at establishing a standard template for future such reports according to the Psych Data Standards found here https://github.com/psych-ds/psych-DS

Witt’s “Insights” is a comparison of different methods to make the statistical claims by researchers in psychology and related fields more reproducible. That is, if an effect is reported as statistically significant in one article, what could be done in that original article to better ensure that the effect is real.

Witt uses language from signal detection theory rather than type I and type II errors. Specifically, a hit is a true positive, a miss is a false negative (type II error), a correct rejection is a true negative, and a false alarm is a false positive (type I error). Note that ‘rejection’ does not refer to rejection of a null hypothesis, but a rejection of a signal as meaningful.

Witt’s claims about these rates under simulation are shown clearly in the paper’s 14 figures. The methodology of the simulations was outlined in the text of the paper clearly enough to recreate the simulation code as described and produce the results illustrated.

Below are code snippets made from the simulation methods described in the paper for selected figures. Most of the remaining figures are coded similarly but with different parameters. The results from these reproductions are very similar to those shown in the figures and described in the text.

### 10 studies with cohenD = 0, 10 with cohenD = 0.5

### Hit: 0 effect, fail to detect

### Miss: 0.5 effect, fail to detect

### FA: 0 effect, detect

### CorrRej: 0.5 effect, detect

## Thanks to https://www.r-bloggers.com/bayes-factor-t-tests-part-2-two-sample-tests/

## for the Bayes Factor code

library(BayesFactor)

set.seed(12345)

Nprocess = 100

Nstudies1 = 10

Nstudies2 = 10

pvals = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))

BFs = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))

for(process in 1:Nprocess)

{

for(study in 1:Nstudies1)

{

### For no effect

cohenD = 0

groupA = rnorm(n=64,mean=50,sd=10)

groupB = rnorm(n=64,mean=50-10*cohenD,sd=10)

tt = t.test(groupA,groupB,var.equal=TRUE)

pvals[process,study] = tt$p.value

BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)

}

for(study in Nstudies1+(1:Nstudies2))

{

### For effect cohen's D

cohenD = 0.5

groupA = rnorm(n=64,mean=50,sd=10)

groupB = rnorm(n=64,mean=50-10*cohenD,sd=10)

tt = t.test(groupA,groupB,var.equal=TRUE)

pvals[process,study] = tt$p.value

BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)

}

if(process %% 10 == 0){print(process)} ## progress indicator

}

### get the data into long form

pvals_H0 = c(pvals[,1:Nstudies1])

BFs_H0 = c(BFs[,1:Nstudies1])

pvals_H1 = c(pvals[,Nstudies1+(1:Nstudies2)])

BFs_H1 = c(BFs[,Nstudies1+(1:Nstudies2)])

##### Figure 1, Hits, etc. at different p-values and Bayes Factors

### CorrRej: 0 effect, fail to detect

### FA: 0 effect, detect

### Miss: 0.5 effect, fail to detect

### Hit: 0.5 effect, detect

alpha = c(.10,.05,.005,.001)

BF_cutoff = c(1,2,3,10)

FA = rep(NA,8)

hits = rep(NA,8)

misses = rep(NA,8)

corrRej = rep(NA,8)

inconclusive = rep(NA,8)

for(k in 1:4)

{

corrRej[k] = length(which(pvals_H0 >= alpha[k]))

FA[k] = length(which(pvals_H0 < alpha[k]))

misses[k] = length(which(pvals_H1 >= alpha[k]))

hits[k] = length(which(pvals_H1 < alpha[k]))

inconclusive[k] = 0

}

for(k in 5:8)

{

corrRej[k] = length(which(BFs_H0 < 1/BF_cutoff[k-4]))

FA[k] = length(which(BFs_H0 >= BF_cutoff[k-4]))

misses[k] = length(which(BFs_H1 < 1/BF_cutoff[k-4]))

hits[k] = length(which(BFs_H1 >= BF_cutoff[k-4]))

inconclusive[k] = (length(BFs_H0) - corrRej[k] - FA[k]) +

(length(BFs_H1) - misses[k] - hits[k])

}

### These results are VERY close to what is shown in Figure 1

hits

FA

misses

corrRej

inconclusive

#### Figure 2 uses the same data and plots Pr(hit|H1) vs Pr(FA|H0)

hit_rate = hits / length(BFs_H0)

FA_rate = FA / length(BFs_H1)

### As expected from Figure 1's results, Figure 2 matches

plot(hit_rate ~ FA_rate, ylim=c(0,1), xlim=c(0,1))

#### AUC calcs ### I am intentionally putting all the processes together instead of treating them

### in batches of 20 like in the paper. The confidence intervals would be better as a bootstrap.

AUC_x = FA_rate[order(FA_rate)]

AUC_y = c(hit_rate[order(FA_rate)],1)[-1]

AUC = sum(AUC_y * diff(c(AUC_x,1))) ### 0.9807 across all processes, agreeing with claimed .97-.98

#### Distance to Perfection ### See AUC calcs

type = c("alpha10","alpha05","alpha005","alpha001","BF1","BF2","BF3","BF10")

dist = (hit_rate - 1)^2 + (FA_rate)^2

cbind(type,dist)

### My results are actually better than the paper's.

#### Regression of BF vs pvalues

set.seed(12345)

Nprocess = 100

Nstudies1 = 10

Nstudies2 = 10

sampSizes = round(seq(from=32,to=2000,length=30))

for(k in 1:length(sampSizes))

{

thisSize = sampSizes[k]

pvals = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))

BFs = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))

for(process in 1:Nprocess)

{

for(study in 1:Nstudies1)

{

### For no effect

cohenD = 0

groupA = rnorm(n=thisSize,mean=50,sd=10)

groupB = rnorm(n=thisSize,mean=50-10*cohenD,sd=10)

tt = t.test(groupA,groupB,var.equal=TRUE)

pvals[process,study] = tt$p.value

BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)

}

for(study in Nstudies1+(1:Nstudies2))

{

### For effect cohen's D

cohenD = 0.5

groupA = rnorm(n=thisSize,mean=50,sd=10)

groupB = rnorm(n=thisSize,mean=50-10*cohenD,sd=10)

tt = t.test(groupA,groupB,var.equal=TRUE)

pvals[process,study] = tt$p.value

BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)

}

}

### get the data into long form

pvals_H0 = c(pvals[,1:Nstudies1])

BFs_H0 = c(BFs[,1:Nstudies1])

pvals_H1 = c(pvals[,Nstudies1+(1:Nstudies2)])

BFs_H1 = c(BFs[,Nstudies1+(1:Nstudies2)])

mod_H0 = lm( log(BFs_H0) ~ log(pvals_H0))

print(mod_H0) ## Figure 4, panels c and d.

### The slopes and intercepts produced are very close to the of the values in the plots.

}

## No comments:

## Post a comment