Saturday, 23 March 2019

What makes a good data dictionary?


A data dictionary is a guide, external to the dataset in question, that explains what each variable is in a human-readable format. In R, the programming equivalent of a data dictionary is the result of the str() function, which will show the first few values of each variable, the format (e.g. numerical, string, factor), and other important information (e.g. the first few levels of the factor).
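For readers outside R, here is a minimal Python sketch of the kind of summary str() gives: the type of each variable, its first few values, and the first few levels of any text column. The function name `describe`, the `n_preview` parameter, and the fish records are all made up for illustration; this is a rough analogue, not a port of str().

```python
def describe(rows, n_preview=3):
    """Summarise each variable in a list of records (dicts with the same keys)."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        entry = {
            # Type is taken from the first value only (assumes consistent columns).
            "type": type(values[0]).__name__,
            # First few values, like the preview str() prints.
            "preview": values[:n_preview],
        }
        if isinstance(values[0], str):
            # Treat text columns like factors and list the first few levels.
            entry["levels"] = sorted(set(values))[:n_preview]
        summary[name] = entry
    return summary


# Made-up example records, purely illustrative.
fish = [
    {"species": "trout", "length_cm": 31.5, "tagged": True},
    {"species": "salmon", "length_cm": 62.0, "tagged": False},
    {"species": "trout", "length_cm": 28.1, "tagged": True},
    {"species": "perch", "length_cm": 19.4, "tagged": False},
]

fish_summary = describe(fish)
for name, entry in fish_summary.items():
    print(name, entry)
```

A summary like this says nothing about units, collection method, or what a variable means, and that is the gap a written data dictionary fills.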

A data dictionary may include software-specific features, but it should still be of value to anyone using the dataset, regardless of the software they are using.

Other than that, what makes a good data dictionary?


Thursday, 21 February 2019

Reading Assignments - Split-Plot Design, Magnitude-Based Inference

This semester, I'm teaching a new (to me) course that's heavy on design of experiments and biostatistics, which means I needed some new reading assignments. First, a survey of applications of split-plot designs for fisheries. Next, a seminal paper on magnitude-based inference, written for physiologists. Non-paywalled links to the papers are included.


Sunday, 17 February 2019

Alternatives to the P value



Here are some things you can find and report as alternatives to the P value: confidence intervals, Bayes factors, and magnitude-based inference. These are used in mostly the same situations, and all of them are more informative, especially in combination with the P value.
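As a small illustration of the first alternative, here is a sketch that reports a confidence interval for a mean next to the two-sided P value. It uses a normal approximation from Python's standard library, which is an assumption made for brevity (a small sample like this one would normally call for a t distribution). The function name `mean_ci_and_p` and the data are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci_and_p(sample, null_mean=0.0, level=0.95):
    """Return the sample mean, a normal-approximation CI, and a two-sided P value."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the mean
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    ci = (m - z_crit * se, m + z_crit * se)
    z = (m - null_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return m, ci, p


# Made-up paired differences, purely illustrative.
differences = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2, 1.1, 0.7]

m, ci, p = mean_ci_and_p(differences)
print(f"mean = {m:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), P = {p:.4f}")
```

The P value only says whether zero is a plausible mean; the interval also shows how large the effect plausibly is, which is why reporting the two together is more informative.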

Tuesday, 12 February 2019

Lingering questions from the 2018 MLB season

Here are a few more comments and ongoing questions about Major League Baseball that I wanted to post but that didn't fit anywhere else. Just a few more weeks until spring training!

Inside: Pitch count superstitions, base coach evaluation, WAR in blowouts, and anecdotes of SafeCo.


The Vestigial Reference Letter

The graduate school reference letter is a holdover from a time when academia was much more of a closed-off clique than the relatively open network of today. Back then, references served to keep the world of higher education reserved for insiders. Nowadays, these letters are vestigial and purposeless; they serve only to waste the time of applicants, referees, and admissions people.


Friday, 8 February 2019

Replication Report - Signal Detection Analysis

The following is a report on the reproduction of the statistical work in the paper “Insights into Criteria for Statistical Significance from Signal Detection Analysis” by Jessica K. Witt at Colorado State University.

The original paper was accepted for publication by the Journal of Meta-Psychology, a journal focused on methodology and reproductions of existing work. This report is my first attempt at establishing a standard template for future such reports according to the Psych Data Standards found here: https://github.com/psych-ds/psych-DS

Saturday, 26 January 2019

I read this: Smart Baseball

Smart Baseball, by Keith Law, exposes many common baseball statistics like batting average for the garbage they are. He does this in an approachable, narrative style that will appeal to long-time baseball fans. It's very light on statistics, but Law obviously knows his stuff; he just isn't interested in showing off the details.