Statistics et al.: June 2017

Saturday 24 June 2017

R Seminar on programming with vectors

This is the first of five seminars that were given at Simon Fraser University. I've included this one in the blog because it should, hopefully, make an excellent tutorial for intermediate users of R.

Inside:

- Vector operations
- Building a matrix
- Matrix operations
- Arrays, data frames, and data tables
- Indices
- For loops
- For loops on indices
- Apply functions
- Binding
- The permutation test – two sample t
- User-defined functions
- The replicate() function
- The by() function
- The combn() function - Permutation Test - All Combinations
- Synthesis – permutation test for ANOVA

Thursday 22 June 2017

Assorted Sports Questions

These are some problems I'd like to pose to the sports analytics community, as food for thought and as starters for future research.

The scheduling of 31 teams (NHL Ice Hockey)

What scheduling problems will arise in the National Hockey League from having a prime number of teams? At the very least, it must restrict the options for the number of games in a season.

Are there even any other major leagues with odd numbers of teams? Leagues without a divisional structure, such as the Barclays Premier League in soccer, could get away with it, but they have the complications of Champions League promotion and of relegation to deal with.

It's not just about creating a set of pairings so that every team has the same number of home and away games as well as games against same-division and same-conference opponents. There is also scheduling to consider: the limited number of weekend nights are preferable for business purposes, and both teams must be able to physically be in the same location at the same time, presumably with a rest buffer. There were already issues with the 30-team, 82-game schedule because of injury risk (and greater injury consequence in terms of games missed). Adding a team will compound this, and the forced asymmetry of a prime number of teams reduces the flexibility of the schedule to account for weekends and rest.

As an aside, look to Balanced Incomplete Block Designs as a basis for designing team schedules.
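To make the odd-number problem concrete, here is a minimal sketch of a single round robin built with the standard circle method. It is not how any league actually builds its schedule; the function name `round_robin` and the numbering of teams 1 to 31 are just placeholders for illustration. With an odd number of teams, a dummy opponent is added, and whoever draws it sits out that round.

```python
def round_robin(teams):
    """Single round robin via the circle method; None marks a bye."""
    slots = list(teams)
    if len(slots) % 2 == 1:
        slots.append(None)  # dummy opponent: whoever draws it sits out
    n = len(slots)
    rounds = []
    for _ in range(n - 1):
        # pair the first slot with the last, the second with the second-last, ...
        rounds.append([(slots[i], slots[n - 1 - i]) for i in range(n // 2)])
        # keep the first slot fixed and rotate everyone else one step
        slots = [slots[0], slots[-1]] + slots[1:-1]
    return rounds


schedule = round_robin(range(1, 32))  # 31 placeholder teams

# Real games are the pairings that do not involve the dummy opponent.
games = [pair for rnd in schedule for pair in rnd if None not in pair]
# The team paired with the dummy opponent in each round has the bye.
byes = [a if b is None else b for rnd in schedule for a, b in rnd if None in (a, b)]

print(len(schedule), "rounds,", len(games), "games")
print("teams idle per round:", len(byes) // len(schedule))
```

Every round leaves exactly one team idle, so a 31-team round robin needs 31 rounds where a 32-team one would also need 31. That idle slot is one place the lost flexibility shows up before weekends, travel, and rest are even considered.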

Ratings for Sports Officials (All Sports)

In baseball, the location of the pitch as it crosses home plate is recorded for every throw, so people can tell where each umpire considers the strike zone to be, and the level of consistency of that zone.

What ways are there to evaluate the accuracy and consistency of officials in other sports? Tracking live decisions vs after-the-fact decisions from recordings? Could you do it as a 'deviation from superhuman AI' metric, like how professional chess players are sometimes rated? For example, we could probably use existing trackers (which are 2D for players and 3D for the ball) in NBA basketball to check the level of enforcement of traveling.

What are the ethical and behavioural implications of tracking and reporting officials' performance statistics like we do with players? Will officials lose focus on the game if they are also concerned with their stats? Would there be sufficient value to sports to do this?

Playoff Match Draft (All Sports, NHL Hockey example)

Instead of having first-round playoff match-ups determined solely by seed, matches should be determined by draft. Take the National Hockey League (please). The NHL is split into two conferences of 15 and 16 teams respectively. In each conference, the 1st seed team plays the 8th seed team (ignoring wild card complications), the 2nd seed team plays the 7th, 3rd plays 6th, and 4th plays 5th. If a team's skill were one-dimensional, this setup would make sense: the team that does best in the regular season is rewarded with the best chance to advance by playing against the weakest qualifying team.

Reality is messier. This setup occasionally punishes teams for playing well in the regular season by matching them, in the first round of the playoffs, against a team that they do particularly poorly against.

Imagine a playoff draft instead. The eight teams in each conference qualify for the playoffs as before, but instead the top seeded team CHOOSES their opponent from the other seven qualifying teams. Then, the top seeded team among the remaining 6 chooses their opponent from the remaining 5 and so on. By default, teams could always select the lowest seeded available opponent, which leads to the same 1-8, 2-7, 3-6, and 4-5 pairings as the current setup. However, doing very well (3rd or better) in the regular season earns you some discretion if there's an opponent that counters you that you would like to avoid.
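The draft procedure above can be sketched in a few lines. This is only an illustration: `playoff_draft`, the `choose` callback, and the `avoid_eighth` preference are hypothetical names, and seeds are represented by the integers 1 to 8.

```python
def playoff_draft(seeds, choose=None):
    """Pair teams by draft.

    `seeds` is ordered from best to worst. `choose(picker, available)`
    returns the opponent the picking team wants; by default the picker
    takes the lowest seed still available.
    """
    if choose is None:
        choose = lambda picker, available: available[-1]
    remaining = list(seeds)
    pairings = []
    while remaining:
        picker = remaining.pop(0)       # best seed not yet matched
        opponent = choose(picker, remaining)
        remaining.remove(opponent)
        pairings.append((picker, opponent))
    return pairings


# Default behaviour reproduces the current bracket.
print(playoff_draft(range(1, 9)))


def avoid_eighth(picker, available):
    """Hypothetical preference: the top seed would rather face 7 than 8."""
    if picker == 1 and 7 in available:
        return 7
    return available[-1]


print(playoff_draft(range(1, 9), avoid_eighth))
```

The default call gives the usual 1-8, 2-7, 3-6, 4-5 pairings. When the top seed dodges the 8th seed, the 8th seed simply falls to the next team to pick, so the cost of that discretion is passed down the order.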

There would be absolutely no reason for a team to covet a lower position over a higher one. There would be no implication of strategically losing games at the end of the season, because there would be no conceivable reward in it.

A playoff draft also adds the potential for dramatic scenarios. If a team chooses anyone other than the default lowest-seed opponent, that implies a lot of confidence in being able to beat that opponent specifically. A lot of bragging rights come from a draftee beating their drafter. Would a team draft a stronger but less physical team to reduce their injury risk for the 2nd round, if it happens?

Would a team deliberately develop a reputation for being physical in the hopes of being drafted later against a weaker team?

Home Team Advantage (MLB Baseball)

Is batting last in baseball really an advantage? Sure, the team that bats last has more information when they do bat, which can provide a strategic advantage. However, that information advantage is nothing like it is in T20 or One-Day cricket, in which each team only bats once, and teams still opt to bat first sometimes. Also, in baseball, batting last means always pitching for 9 innings in non-tied games; that's on average 6% more pitching than the other team needs to do. Is the information in the current game worth the extra pitching fatigue carried into the next game?

In Major League Baseball, games are usually played in 3 or 4 game series between opponents. The home team is given the supposed advantage of batting last for all of those games. Would it be a greater advantage to bat first for the first 1 or 2 games? Even if it were, would it be 'better' to increase the home team advantage?
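A rough check of the 6% figure, as a back-of-envelope sketch rather than a result from real game data. The assumption here (not from any dataset) is that the home team is ahead after the top of the ninth in about half of games, so the visitors pitch only 8 innings in those games. Walk-off partial innings and extra innings are ignored, and `extra_pitching` is just an illustrative name.

```python
def extra_pitching(share_no_bottom_ninth, innings=9):
    """Fractional extra pitching for the team that bats last.

    share_no_bottom_ninth: assumed share of games in which the bottom of
    the last inning is not played, so the team batting first pitches one
    inning fewer.
    """
    bats_last = innings                          # always pitches every inning
    bats_first = innings - share_no_bottom_ninth  # average innings pitched
    return bats_last / bats_first - 1


# Assumed share of 0.5: 9 innings versus an average of 8.5.
print(f"{extra_pitching(0.5):.1%}")
```

Under that assumption the ratio is 9 / 8.5, or roughly 6% more pitching, which matches the figure above. A different share changes the answer, so the share itself is the thing worth estimating from data.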

Designated Hitters (MLB Baseball)

What would happen if the catcher didn't have to bat, and was also replaced with a designated hitter like the American League pitcher? Would that speed up the game, or is the time to change gear minimal? Would it lead to more batters hit by pitches because of the reduced opportunity for retaliation?

Do there need to be 9 players in a batting lineup, or can the designated hitter simply be removed in favour of an 8-player lineup? What second-order effects on the typical roster would there be from eliminating the designated hitter?

Thursday 8 June 2017

Reflections on teaching two similar 200-level service courses.

This is about two courses at Simon Fraser University that appear very similar: Stat 201 and Stat 203.

These courses are very similar in that they are 200-level service courses (meaning they are for non-majors). They are introductory courses that cover the fundamentals of descriptive statistics, sampling, probability, hypothesis testing, and t-tests. Both courses are equivalent as prerequisites for the 300-level service courses or for fulfilling graduation requirements.

Both classes were offered as a combination of 2 hours/week of lecture one day, and 1 hour/week of lecture another day, with drop-in workshop support for assignments and studying.

One could be forgiven for treating them as different sections of the same course, which is exactly what I did.

However, one class is titled “Stat 203: Statistics for Social Sciences”, uses SPSS, and is a service course for the sociology and anthropology departments. The other is titled “Stat 201: Statistics for Life Sciences”, uses R, and is a service course for the biology and environmental sciences. This schism not in content but in audiences is what makes these courses different in ways I didn't expect.

The 201 class had much higher classroom engagement, higher attendance, and even a better reaction to my awful jokes. More measurably, the 201 class also averaged 0.75 grade points higher than their 203 counterparts; the 201s received a B+ on average, and the 203s received an average between a C+ and a B-. Unsurprisingly in this context, the 201 students rated me much higher (4.5/5 vs 3.5/5) in their teacher evaluations.

The themes in the written answers were essentially the same, although my weaknesses were mentioned more by the 203 students. The first word cloud here is for Stat 201 and the second for Stat 203. I've removed a few common but uninformative words like “Jack”, “Davis”, “course”, and the usual grammar stop words.

Word cloud for evaluations from Stat 201: Stats for Life Sciences

Word cloud for evaluations from Stat 203: Stats for Social Sciences
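The counting step behind clouds like these can be sketched as follows. The two comments and the stop word list are made up for illustration; they are not the actual evaluation text or the list that was actually used.

```python
import re
from collections import Counter

# Illustrative stop words only: a few grammar words plus the
# uninformative names mentioned above.
STOP_WORDS = {"the", "a", "and", "was", "is", "to", "of", "i",
              "jack", "davis", "course"}


def word_counts(comments, stop_words=STOP_WORDS):
    """Count lower-cased words across comments, skipping stop words."""
    words = []
    for comment in comments:
        words.extend(re.findall(r"[a-z']+", comment.lower()))
    return Counter(w for w in words if w not in stop_words)


# Made-up comments standing in for written evaluation answers.
comments = [
    "Jack was clear and the examples helped.",
    "The course examples were clear.",
]
counts = word_counts(comments)
print(counts.most_common(3))
```

Running the same function separately on each class's comments gives two frequency tables that can be compared directly, or passed to whatever tool draws the cloud.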

There are a few lessons to be learned about the mistake of treating two different courses like these as if they were the same, but they're hard to articulate, so forgive me if I stumble.

First, teach (or present, or write) for the audience you have, as opposed to generically. There's a quote that floats around in B.Ed. programs, “I taught, but the students didn't learn” (see Alfie Kohn's article, http://www.alfiekohn.org/article/teach-learn/ ), offered as an example of a poor attitude for an educator: the focus should be on the result, not the process. In other words, material MUST be suited to the audience to be effective. For me, it would be best to draw from some new sources or sacrifice some depth for more fundamental examples before I deliver Stat 203 again.

Another possibility is to hold practice sessions for exams, or offer more hints for assignment questions. Since I delivered this course, my exam practice material has gotten much more extensive. For example, the Midterm 1 practice material is now more than 15 pages long, and includes a partial key.

Another key moral: Teaching is a service first, and a means of research and personal growth after that. In Mastery: The Keys to Success and Long-Term Fulfillment, by George Leonard, there's a story about the author's time as a trainer of fighter pilots. In the story, the author spends extra time further developing two already-talented pilots at the expense of the other, less apt, pilots under his charge. From a value-added perspective, the trainer had only done half his job, because the novice pilots could have benefited far more per hour from the trainer's attention than the ace pilots. It's possible that I committed the same fault without being aware of it, giving more attention to the Stat 201 students than they needed and leaving the Stat 203 students behind as a result.

Also, as an artifact of the timing of the exams in each class, the 203 students got harder exams than the 201 students, which is the opposite of what should have happened. I wrote my exams in the order that they would be administered, and it happened that the Stat 201 midterms and final all came before the Stat 203 equivalents. I wanted to make the exams different but equivalent, and the easiest way to do this was to create a question for one exam, then change the numbers and/or scenario of that question for the second exam, and add a twist. Most of the time, adding a twist meant increasing the complexity of the exam. I justified this with the assumption that the later 203s would have additional information about the exam from the 201s that had taken a very similar exam a few days prior; this assumption was wrong.

Another resource I should be using is live student feedback. I've been using a learning management system called TopHat, and it's taken me a while to make good use of it. TopHat allows students to answer questions live in lecture (or after the lecture, if the prof doesn't want to deal with excuses for absences) through their mobile devices. I've rarely used it for student opinion polls, but doing so would be a good way to adapt material effectively, or at a minimum give students a chance to voice concerns anonymously.

I don't want to dismiss the 203s as simply weaker in statistics; that shuts the door to finer optimizations. Instead, it would be better to think of there being some barrier I haven't broken through yet, and to try to identify that.

On the flip side, what I'm doing with the 201 students seems to work well on the surface, but it's not optimal either. I'm wasting an opportunity to challenge them or push them to work towards greater learning. We'll see, though; it's possible that the 201 course being in the mid-afternoon played a role, as well as its location on a secondary campus. My coordinator hypothesizes that, because the course was on a secondary campus, more dedicated students selected it while others were deterred by the extra commute.

For all the gloom that this reflection may present, I would call this semester and the teaching of these two classes a success. It was a substantial improvement, both in outcomes and in workload, over the semesters before, and over the Stat 203 class that I delivered in Summer 2012.

One particularly bright spot was Crowdmark, a grading platform we started using for assignments and exams. The assignments had some technical growing pains, but for exams, Crowdmark was fantastic. Each exam is given a QR code at the top of each page, which allows that page to be separated from the rest of the exam digitally. The questions could be distributed out to markers just by having them log in and grade, and the platform is equipped with hotkeys to allow them to put the… same… comment… on… hundreds… of… incorrect… answers… by writing the comment once and using a couple of keystrokes on each question. The students then receive a PDF of their exam with the marker's annotations.

Also, it keeps a record of how each student did on each question, rather than only the exam scores in aggregate. This means I can look back and see which questions are doing the best for appropriate difficulty and discriminatory power. I can apply item response theory to the results. I can even use the data for my future research on improving the exam experience.
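As a first pass at "difficulty and discriminatory power", here is a minimal sketch using classical item analysis rather than a full item response theory model. Difficulty is the share of students who got the question right, and discrimination is the correlation between a question's score and the score on the rest of the exam. The 0/1 marks below are invented toy data, not real exam results, and `item_stats` is a hypothetical helper name.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_stats(scores):
    """scores: one list of 0/1 question marks per student."""
    stats = []
    for j in range(len(scores[0])):
        item = [s[j] for s in scores]
        rest = [sum(s) - s[j] for s in scores]  # total excluding this item
        stats.append({
            "difficulty": mean(item),              # share answering correctly
            "discrimination": pearson(item, rest),
        })
    return stats


# Toy data: 6 students by 3 questions (invented for illustration).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for j, s in enumerate(item_stats(scores), start=1):
    print(f"Q{j}: difficulty={s['difficulty']:.2f}, "
          f"discrimination={s['discrimination']:.2f}")
```

Questions with a difficulty near 0 or 1, or with a discrimination near zero or negative, are the ones to revisit first. An item response theory fit would replace these summaries with per-question parameters estimated jointly with student ability.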

About the Author

Jack Davis is a teaching professor in Statistics at the University of Waterloo, Canada.

Their research spans statistics in sport, data mining, adult education and Bayesian computation.

They have a course called "Statistics, Gambling, and Games of Chance" on Udemy.

They also pretend to be an expert in writing and game design, which is why they wrote a textbook called "Writing for Statisticians" and why they programmed a video game called "Doc Logic" for Xbox 360.

They enjoy chess and chess variants, but they are so, so, very bad at them. They want to try living on a houseboat sometime.
