Friday, 8 February 2019

Replication Report - Signal Detection Analysis

The following is a report on the reproduction of the statistical work in the paper “Insights into Criteria for Statistical Significance from Signal Detection Analysis” by Jessica K. Witt at Colorado State University.

The original paper was accepted for publication by the Journal of Meta-Psychology, a journal focused on methodology and reproductions of existing work. This report on is my first attempt at establishing a standard template for future such reports according to the Psych Data Standards found here https://github.com/psych-ds/psych-DS



Witt’s “Insights” is a comparison of different methods to make the statistical claims by researchers in psychology and related fields more reproducible. That is, if an effect is reported as statistically significant in one article, what could be done in that original article to better ensure that the effect is real.

Witt uses language from signal detection theory rather than type I and type II errors. Specifically, a hit is a true positive, a miss is a false negative (type II error), a correct rejection is a true negative, and a false alarm is a false positive (type I error). Note that ‘rejection’ does not refer to rejection of a null hypothesis, but a rejection of a signal as meaningful.

Witt’s claims about these rates under simulation are shown clearly in the paper’s 14 figures. The methodology of the simulations was outlined in the text of the paper clearly enough to recreate the simulation code as described and produce the results illustrated.

Below are code snippets made from the simulation methods described in the paper for selected figures. Most of the remaining figures are coded similarly but with different parameters. The results from these reproductions are very similar to those shown in the figures and described in the text.



### 10 studies with cohenD = 0, 10 with cohenD = 0.5
### Hit: 0 effect, fail to detect
### Miss: 0.5 effect, fail to detect
### FA: 0 effect, detect
### CorrRej: 0.5 effect, detect


## Thanks to https://www.r-bloggers.com/bayes-factor-t-tests-part-2-two-sample-tests/
## for the Bayes Factor code
library(BayesFactor)


set.seed(12345)
Nprocess = 100
Nstudies1 = 10
Nstudies2 = 10
pvals = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))
BFs = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))



for(process in 1:Nprocess)
{


    for(study in 1:Nstudies1)
    {
        ### For no effect
        cohenD = 0
        groupA = rnorm(n=64,mean=50,sd=10)
        groupB = rnorm(n=64,mean=50-10*cohenD,sd=10)
       
        tt = t.test(groupA,groupB,var.equal=TRUE)
        pvals[process,study] = tt$p.value
        BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)
       
    }
   
    for(study in Nstudies1+(1:Nstudies2))
    {
        ### For effect cohen's D
        cohenD = 0.5
        groupA = rnorm(n=64,mean=50,sd=10)
        groupB = rnorm(n=64,mean=50-10*cohenD,sd=10)
                   
        tt = t.test(groupA,groupB,var.equal=TRUE)
        pvals[process,study] = tt$p.value
        BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)
   
    }

if(process %% 10 == 0){print(process)} ## progress indicator
}



### get the data into long form
pvals_H0 = c(pvals[,1:Nstudies1])
BFs_H0 = c(BFs[,1:Nstudies1])
pvals_H1 = c(pvals[,Nstudies1+(1:Nstudies2)])
BFs_H1 = c(BFs[,Nstudies1+(1:Nstudies2)])




##### Figure 1, Hits, etc. at different p-values and Bayes Factors
### CorrRej: 0 effect, fail to detect
### FA: 0 effect, detect
### Miss: 0.5 effect, fail to detect
### Hit: 0.5 effect, detect

alpha = c(.10,.05,.005,.001)
BF_cutoff = c(1,2,3,10)
FA = rep(NA,8)
hits = rep(NA,8)
misses = rep(NA,8)
corrRej = rep(NA,8)
inconclusive = rep(NA,8)

for(k in 1:4)
{
    corrRej[k] =    length(which(pvals_H0 >= alpha[k]))
    FA[k] =         length(which(pvals_H0 <  alpha[k]))
    misses[k] =     length(which(pvals_H1 >= alpha[k]))
    hits[k] =       length(which(pvals_H1 <  alpha[k]))
    inconclusive[k] = 0
}


for(k in 5:8)
{
    corrRej[k] =    length(which(BFs_H0 < 1/BF_cutoff[k-4]))
    FA[k] =         length(which(BFs_H0 >=  BF_cutoff[k-4]))
    misses[k] =     length(which(BFs_H1 < 1/BF_cutoff[k-4]))
    hits[k] =       length(which(BFs_H1 >=  BF_cutoff[k-4]))
    inconclusive[k] = (length(BFs_H0) - corrRej[k] - FA[k]) +
                      (length(BFs_H1) - misses[k] - hits[k])
}



### These results are VERY close to what is shown in Figure 1
hits
FA
misses
corrRej
inconclusive



#### Figure 2 uses the same data and plots Pr(hit|H1) vs Pr(FA|H0)

hit_rate = hits / length(BFs_H0)
FA_rate = FA / length(BFs_H1)

### As expected from Figure 1's results, Figure 2 matches
plot(hit_rate ~ FA_rate, ylim=c(0,1), xlim=c(0,1))




#### AUC calcs  ### I am intentionally putting all the processes together instead of treating them
### in batches of 20 like in the paper. The confidence intervals would be better as a bootstrap.

AUC_x =  FA_rate[order(FA_rate)]
AUC_y =  c(hit_rate[order(FA_rate)],1)[-1]

AUC = sum(AUC_y * diff(c(AUC_x,1))) ### 0.9807 across all processes, agreeing with claimed .97-.98


#### Distance to Perfection ### See AUC calcs

type = c("alpha10","alpha05","alpha005","alpha001","BF1","BF2","BF3","BF10")
dist = (hit_rate - 1)^2 + (FA_rate)^2
cbind(type,dist)
### My results are actually better than the paper's.





#### Regression of BF vs pvalues
set.seed(12345)
Nprocess = 100
Nstudies1 = 10
Nstudies2 = 10

sampSizes = round(seq(from=32,to=2000,length=30))

for(k in 1:length(sampSizes))
{

    thisSize = sampSizes[k]
    pvals = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))
    BFs = matrix(NA,nrow=Nprocess,ncol=(Nstudies1+Nstudies2))



    for(process in 1:Nprocess)
    {


        for(study in 1:Nstudies1)
        {
            ### For no effect
            cohenD = 0
            groupA = rnorm(n=thisSize,mean=50,sd=10)
            groupB = rnorm(n=thisSize,mean=50-10*cohenD,sd=10)
           
            tt = t.test(groupA,groupB,var.equal=TRUE)
            pvals[process,study] = tt$p.value
            BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)
           
        }
       
        for(study in Nstudies1+(1:Nstudies2))
        {
            ### For effect cohen's D
            cohenD = 0.5
            groupA = rnorm(n=thisSize,mean=50,sd=10)
            groupB = rnorm(n=thisSize,mean=50-10*cohenD,sd=10)
                       
            tt = t.test(groupA,groupB,var.equal=TRUE)
            pvals[process,study] = tt$p.value
            BFs[process,study] = exp(ttest.tstat(tt$statistic, length(groupA), length(groupB))$bf)
       
        }

    }   
   
    ### get the data into long form
    pvals_H0 = c(pvals[,1:Nstudies1])
    BFs_H0 = c(BFs[,1:Nstudies1])
    pvals_H1 = c(pvals[,Nstudies1+(1:Nstudies2)])
    BFs_H1 = c(BFs[,Nstudies1+(1:Nstudies2)])

    mod_H0 = lm( log(BFs_H0) ~ log(pvals_H0)) 
    print(mod_H0)  ## Figure 4, panels c and d.
    ### The slopes and intercepts produced are very close to the of the values in the plots.

}

No comments:

Post a Comment