
Textbook: Writing for Statistics and Data Science

If you are looking for my textbook Writing for Statistics and Data Science, here it is for free in the Open Educational Resource Commons. Wri...

Tuesday, 8 March 2022

Reading Assignment – Collecting Carefully

This is a reading assignment for “Episode 28: Collect Carefully” of “The Data Science Ethics Podcast”, available at https://datascienceethics.com/podcast/collect-carefully/ . My eventual hope is to incorporate it into a course on data ethics and (AI) safety, but that's still a long way from being anything solid. In recent interviews with post-secondary institutions, I've heard that many schools are looking to incorporate ethics into their stats and data science courses, so I hope this and some of my future posts can contribute to those efforts.


Thursday, 17 February 2022

Peer review of "Algorithmically deconstructing shot locations as a method for shot quality in hockey"

This is an open peer review I did for the manuscript "Algorithmically deconstructing shot locations as a method for shot quality in hockey" by Devan G. Becker, Douglas G. Woolford and Charmaine B. Dean, as submitted to the Journal of Quantitative Analysis in Sports in 2020.

You can find the manuscript behind a $42 paywall put up by De Gruyter at https://www.degruyter.com/document/doi/10.1515/jqas-2020-0012/html


Tuesday, 19 October 2021

Sampling, conditional probability, and random number generation

Part of the motivation behind making the course Statistics and Gambling is to bring new applications into introductory and intermediate probability courses. This post looks at how the course will cover familiar probability topics using examples from games of chance and a simulation-based (rather than theory-based) approach.

This post covers basic methods of random number generation (RNG) in R, and how to apply RNG to demonstrate core concepts in sampling, conditional probability, and conditional distributions. It is meant to be a very surface-level primer on these topics, just enough to give context for the deeper dives into specific games of chance.
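As a rough illustration of the simulation-based approach, here is a minimal sketch that estimates a conditional probability by simulating rolls of two fair six-sided dice and keeping only the rolls that satisfy the condition. The post itself works in R; this sketch uses Python's standard `random` module instead, and the dice example and the function name `simulate_conditional` are hypothetical choices for illustration, not taken from the post.

```python
import random


def simulate_conditional(n_trials: int = 200_000, seed: int = 2021) -> float:
    """Estimate P(first die is 6 | sum of two dice >= 10) by simulation.

    Hypothetical example: conditioning is done by keeping only the simulated
    rolls where the condition holds, then taking the proportion of those
    rolls in which the event also occurs.
    """
    rng = random.Random(seed)  # seeded generator so the result is reproducible
    eligible = 0  # rolls where the condition (sum >= 10) holds
    hits = 0      # rolls where the condition AND the event (first die is 6) hold
    for _ in range(n_trials):
        die1 = rng.randint(1, 6)  # randint includes both endpoints
        die2 = rng.randint(1, 6)
        if die1 + die2 >= 10:
            eligible += 1
            if die1 == 6:
                hits += 1
    return hits / eligible


estimate = simulate_conditional()
print(f"Simulated: {estimate:.3f}  Exact: {3 / 6:.3f}")
```

The exact answer is 3/6 = 0.5: of the six equally likely outcomes with a sum of at least 10, three have a 6 on the first die. Increasing `n_trials` shrinks the typical gap between the simulated and exact values.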

Saturday, 26 June 2021

The Bottleneck Retirement Plan

I do not have a voluntary retirement plan or a pension. I have the means to put money away specifically for retirement, but I choose not to. Instead, I use an investment strategy that has been described as "the most and least insane thing I've ever heard". Here is that strategy:


Sunday, 11 April 2021

Wow, what are the odds? (Part 1: American Odds, Decimal Odds, and Implied Probability)

The term "odds" is slippery because it's used to mean different things in different contexts. In everyday speech, "odds" is often used as a synonym for probability. In proper statistical terms, "odds" is a function of probability, but it's not the same as probability. Gambling contexts add still other uses of the term "odds", which are functions of a parallel concept called "implied probability". In these notes, we're going to look at some common types of odds in statistics and gambling contexts, and some of the calculations to convert between them.
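The conversions mentioned above can be sketched in a few lines of Python. This assumes the usual conventions: positive American odds give the profit on a 100-unit stake, negative American odds give the stake needed to profit 100 units, decimal odds include the returned stake, and implied probability is the reciprocal of the decimal odds. The function names are hypothetical, and the sketch ignores the bookmaker's margin, which in practice makes the implied probabilities of all outcomes in a market sum to more than 1.

```python
def american_to_decimal(american: float) -> float:
    """Convert American (moneyline) odds to decimal odds.

    +150: a 100-unit stake profits 150, so the total return is 2.5 per unit.
    -200: a 200-unit stake profits 100, so the total return is 1.5 per unit.
    """
    if american >= 100:
        return 1 + american / 100
    if american <= -100:
        return 1 + 100 / abs(american)
    raise ValueError("American odds must be >= +100 or <= -100")


def implied_probability(decimal_odds: float) -> float:
    """Implied probability is the reciprocal of decimal odds (no margin removed)."""
    return 1 / decimal_odds


def odds_in_favour(p: float) -> float:
    """Statistical odds: probability of the event over probability of its complement."""
    return p / (1 - p)


for line in (150, -200):
    d = american_to_decimal(line)
    p = implied_probability(d)
    print(f"{line:+d}: decimal {d:.2f}, implied p {p:.3f}, odds in favour {odds_in_favour(p):.3f}")
```

Keeping each conversion as its own small function makes it easy to chain them, for example American odds to decimal odds to implied probability to statistical odds, as the loop does.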


Sunday, 14 February 2021

How does Polychoric Correlation Work? (aka Ordinal-to-Ordinal correlation)

Let's say you've got data on many paired cases of two ordinal variables, like you might get when you ask a large number of people the same two Likert-scale questions (e.g., each with the responses "poor", "fair", "good", "very good", "excellent").


What could you learn from the data from those two questions?

Here are a few common approaches:


1) Compare the means of each variable by abusing a t-test.

2) Compare the distribution of each variable with a chi-squared goodness-of-fit test.

3) Check for a relationship between the responses to the two variables with a chi-squared independence test.

4) Estimate the strength of such a relationship with a Spearman correlation.
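For approach 4, here is a minimal sketch of Spearman's correlation in plain Python: each response is coded as its position on the ordinal scale, the codes are converted to ranks (tied responses, which are very common with Likert items, share the average of their ranks), and the Pearson correlation of those ranks is computed. The responses below are hypothetical, made up only to show the mechanics, and the helper names are illustrative.

```python
from statistics import mean

LEVELS = ["poor", "fair", "good", "very good", "excellent"]


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired answers to two Likert questions
q1 = ["poor", "fair", "good", "good", "very good", "excellent", "fair", "good"]
q2 = ["fair", "fair", "good", "very good", "very good", "excellent", "poor", "good"]
codes1 = [LEVELS.index(r) for r in q1]
codes2 = [LEVELS.index(r) for r in q2]
rho = spearman(codes1, codes2)
print(f"Spearman's rho: {rho:.3f}")
```

Because Spearman's rho only uses the ordering of the responses, it avoids pretending that the gaps between "poor", "fair", and so on are equal, which is part of what makes the t-test in approach 1 an abuse.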

Saturday, 30 January 2021

Lottery tickets, baseball cards, and the coupon collector's problem

A man, a woman, an enby, and 27 elephants walk into a bar. The bartender looks at the group of 30 and asks, "What is the probability that 2 or more of you have the same birthday?"

The group proceeds to disregard twins, leap years, and building codes. They talk amongst themselves, leave a little space for the reader to calculate or guess for themselves,