Saturday 9 January 2016

Statistical Analysis: An Essay with Numbers

This is an attempt to make statistics more accessible to people from the humanities and social sciences by describing a statistical analysis as an essay, instead of something more arcane.

A complete essay includes:

  1. A topic that the essay is about.
  2. A thesis, which is a claim you make at the beginning and back up later with…
  3. Descriptions and extra information to clarify your point, which motivates your…
  4. Supporting evidence, which draws on work from external sources, lending weight to your…
  5. Conclusion and summary of the essay.

A statistical analysis includes:
  1. Some data upon which the analysis is performed.
  2. A hypothesis, which is a claim you are checking to see if you can make.
  3. Descriptive statistics, which support your ability to make assumptions and do a…
  4. Hypothesis test, which draws on math from external sources and lets you provide a…
  5. Conclusion and explanation of the results.

In greater detail:
  1. The purpose of essays and analyses is to improve understanding of their topic or data respectively. They both should save the reader the trouble of having to dig through the raw information to find the important parts.
  2. Hypo-thesis means, loosely, ‘less than a thesis’. A hypothesis is a claim we could be making; a thesis is one we are making. It’s less than a full thesis until we test it.

    In an essay, we want the thesis statement to be packed into a single sentence or phrase. That statement should be explicit and clear, but short. A thesis statement has formality to it. It should be easy to spot. A hypothesis is also very short and exact; it has rules. A hypothesis of no difference, or null hypothesis, is almost always in the form “this equals that” or “all of these are equal”, but written with Greek letters standing in for the quantities being compared.

  3. The term ‘descriptive stats’ is being stretched here to include all key information about the data. For example: “Is it in categories? If so, do those categories have a specific order, like disagree-neutral-agree? Are there any pieces of data that are much different from the rest?” You need facts like these to understand what the data is, and to know what to do with the data to test your hypothesis.

    Descriptive statistics is an ideal starting point for learning stats, because the information is self-contained and people usually have some exposure to it already. We can say “the average height of a Canadian adult male is 175 cm” and be understood even without training in stats.

  4. Hypothesis tests (or statistical tests) are used to determine if the data matches the hypothesis. In a statistical analysis we might say “according to Fisher’s Exact Test these two things are not independent”. Fisher’s Exact Test is the name of the test, and we’re referring to it much like we would refer to an external source, as in “According to Fisher, Romeo ‘doesn’t really like Juliet’ (page 213)”. In both cases, the external reference is used to make the claim more legitimate.

    If the data doesn’t match the hypothesis, we reject that hypothesis. However, as in most sciences, we can almost never prove a hypothesis correct; we can only disprove it or fail to disprove it. For example, if your hypothesis was that unicorns don’t exist, it could be disproved by finding one, but it could not be proven by failing to find one.

  5. An essay’s conclusion should reword the thesis statement and summarize the essay body around it. The conclusion of a statistical analysis should state whether the hypothesis is to be rejected and include any concerns or observations relating to that decision.
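
To make the five parts concrete, here is a minimal sketch of a complete analysis in Python. Everything specific in it is an assumption made for illustration: the counts are invented, the group labels are hypothetical, and the 0.05 cutoff is a common convention rather than something this post prescribes. The test is a plain two-sided version of Fisher’s Exact Test written with only the standard library.

```python
from math import comb

# Step 1: the data. These counts are invented purely for illustration.
# Rows are two hypothetical groups; columns are "agree" and "disagree".
table = [[8, 2],   # group A: 8 agree, 2 disagree
         [1, 5]]   # group B: 1 agrees, 5 disagree

# Step 2: the hypothesis. Null hypothesis: group and answer are independent.
ALPHA = 0.05  # assumed cutoff for rejecting the null hypothesis


def describe(table):
    """Step 3: descriptive statistics, here the share of each group that agrees."""
    return [row[0] / sum(row) for row in table]


def fisher_exact_two_sided(table):
    """Step 4: two-sided Fisher's Exact Test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of seeing x in the top-left cell when the row and
        # column totals are held fixed (hypergeometric distribution).
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    # Add up every possible table that is at least as unlikely as ours.
    # The small tolerance guards against floating-point ties.
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(total, 1.0)


proportions = describe(table)
p_value = fisher_exact_two_sided(table)

# Step 5: the conclusion, stated in terms of the hypothesis.
reject = p_value < ALPHA
print(f"Share agreeing: group A {proportions[0]:.2f}, group B {proportions[1]:.2f}")
print(f"p-value: {p_value:.3f}")
if reject:
    print("Reject the hypothesis that group and answer are independent.")
else:
    print("Fail to reject the hypothesis that group and answer are independent.")
```

With these invented counts the p-value comes out to about 0.035, which is below the assumed cutoff, so the hypothesis of independence is rejected. In practice you would more likely cite a library routine such as `scipy.stats.fisher_exact` than write the test yourself, in the same way an essay cites a source instead of reproducing it.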