Thursday, 8 June 2017

Reflections on teaching two similar 200-level service courses.

This is about two courses at Simon Fraser University that appear very similar: Stat 201 and Stat 203.

These courses are very similar in that they are both 200-level service courses (meaning they are for non-majors). They are introductory courses that cover the fundamentals of descriptive statistics, sampling, probability, hypothesis testing, and t-tests. The two courses are equivalent as prerequisites for the 300-level service courses and for fulfilling graduation requirements.

Both classes were scheduled as a 2-hour lecture on one day and a 1-hour lecture on another, with drop-in workshop support for assignments and studying.

One could be forgiven for treating them as different sections of the same course, which is exactly what I did.

However, one class is titled “Stat 203: Statistics for Social Sciences”, uses SPSS, and is a service course for the sociology and anthropology departments. The other is titled “Stat 201: Statistics for Life Sciences”, uses R, and is a service course for the biology and environmental science departments. This split, not in content but in audience, is what made these courses differ in ways I didn't expect.

The 201 class had much higher classroom engagement, higher attendance, and even a better reaction to my awful jokes. More measurably, the 201 class's average grade was 0.75 grade points higher than that of their 203 counterparts: the 201s received a B+ on average, and the 203s averaged between a C+ and a B-. Unsurprisingly in this context, the 201 students also rated me much higher in their teacher evaluations (4.5/5 vs 3.5/5).
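As a quick check of those figures, assuming SFU's 4.33-point grade scale (B+ = 3.33, B- = 2.67, C+ = 2.33) and treating the Stat 201 average as exactly a B+, a gap of 0.75 grade points does put the Stat 203 average between a C+ and a B-:

$$
\underbrace{3.33}_{\text{B+}} - 0.75 = 2.58,
\qquad
\underbrace{2.33}_{\text{C+}} < 2.58 < \underbrace{2.67}_{\text{B-}}
$$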

The themes in the written comments were essentially the same, although the 203 students mentioned my weaknesses more often. The first word cloud below is for Stat 201 and the second is for Stat 203. I've removed a few common but uninformative words like “Jack”, “Davis”, and “course”, as well as the usual grammatical stop words.


Word cloud for evaluations from Stat 201: Stats for Life Sciences


Word cloud for evaluations from Stat 203: Stats for Social Sciences
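Word clouds like these are drawn from word frequencies. The Python below is a minimal sketch of that counting step, assuming the evaluation comments are available as plain strings. It lowercases each comment, splits it into words, drops stop words plus the names and “course” mentioned above, and tallies the rest. The short stop-word list is a hypothetical stand-in, not the list actually used for these clouds; a real one would be much longer.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list (illustration only).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "it",
    "i", "he", "his", "very", "for", "with", "that", "this", "be", "are", "on",
}
# Uninformative words named in the post.
EXTRA_EXCLUSIONS = {"jack", "davis", "course"}


def word_counts(comments):
    """Count words across evaluation comments, skipping stop words and exclusions."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOP_WORDS and word not in EXTRA_EXCLUSIONS:
                counts[word] += 1
    return counts
```

The resulting `Counter` maps each word to its frequency, which is the input most word-cloud tools expect; `counts.most_common(n)` gives the top words directly.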



There are a few lessons to be learned from the mistake of treating two courses like these as if they were the same, but they are hard to articulate, so forgive me if I stumble.

First, teach (or present, or write) for the audience you have, rather than generically. A quote that floats around in B.Ed. programs, “I taught, but the students didn't learn” (see Alfie Kohn's article, http://www.alfiekohn.org/article/teach-learn/ ), is held up as an example of a poor attitude for an educator: the focus should be on the result, not the process. In other words, material MUST be suited to the audience to be effective. Before I deliver Stat 203 again, it would be best to draw from some new sources, or to sacrifice some depth for more fundamental examples.

Another possibility is to hold practice sessions for exams, or to offer more hints on assignment questions. Since I delivered this course, my exam practice material has become much more extensive. For example, the Midterm 1 practice material is now more than 15 pages long and includes a partial key.


Another key moral: teaching is a service first, and a means of research and personal growth after that. In Mastery: The Keys to Success and Long-Term Fulfillment by George Leonard, there's a story from the author's time as a trainer of fighter pilots. In it, the author spends extra time further developing two already-talented pilots at the expense of the other, less apt pilots under his charge. From a value-added perspective, the trainer had only done half his job, because the novice pilots could have benefited far more per hour of the trainer's attention than the ace pilots. It's possible that I committed the same fault without being aware of it, giving the Stat 201 students more attention than they needed and leaving the Stat 203 students behind as a result.

Also, as an artifact of the exam timing in each class, the 203 students got harder exams than the 201 students, which is the opposite of what should have happened. I wrote my exams in the order they would be administered, and it happened that the Stat 201 midterms and final all came before their Stat 203 equivalents. I wanted the exams to be different but equivalent, and the easiest way to do this was to write a question for one exam, then change the numbers and/or scenario for the second exam and add a twist. Most of the time, adding a twist meant making the question more complex. I justified this by assuming that the later 203s would have extra information about the exam from the 201s, who had taken a very similar exam a few days prior; this assumption was wrong.

Another resource I should be using is live student feedback. I've been using a learning management system called TopHat, and it's taken me a while to make good use of it. TopHat lets students answer questions live in lecture (or after the lecture, if the prof doesn't want to deal with excuses for absences) through their mobile devices. I've rarely used it for student opinion polls, but doing so would be a good way to adapt material, or at a minimum to give students a chance to voice concerns anonymously.

I don't want to dismiss the 203s as simply weaker in statistics; that shuts the door to finer optimizations. Instead, it would be better to think of there being some barrier I haven't broken through yet, and to try to identify it.

On the flip side, what I'm doing with the 201 students seems to work well on the surface, but it's not optimal either. I'm wasting an opportunity to challenge them and push them towards greater learning. Then again, it's possible that the 201 course's mid-afternoon time slot played a role, as well as its location on a secondary campus. My coordinator hypothesizes that, because the course was on a secondary campus, more dedicated students selected it, while others would have been deterred by the extra commute.

For all the gloom that this reflection may present, I would call this semester and the teaching of these two classes a success. It was a substantial improvement, in both outcomes and workload, over the semesters before it, and over the Stat 203 class that I delivered in Summer 2012.

One particularly bright spot was Crowdmark, a grading platform we started using for assignments and exams. The assignments had some technical growing pains, but for exams, Crowdmark was fantastic. Every page of each exam has a QR code at the top, which allows that page to be separated digitally from the rest of the exam. Questions can be distributed to markers simply by having them log in and grade, and the platform has hotkeys that let them put the… same… comment… on… hundreds… of… incorrect… answers… by writing the comment once and using a couple of keystrokes on each answer. The students then receive a PDF of their exam with the markers' annotations.

Crowdmark also keeps a record of how each student did on each question, rather than only the aggregate exam scores. This means I can look back and see which questions perform best in terms of appropriate difficulty and discriminatory power. I can apply item response theory to the results. I can even use the data for my future research on improving the exam experience.
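Fitting a full item response theory model is too much for a short example, so as a simpler stand-in, here is a minimal Python sketch of classical item analysis on per-question scores. It assumes the scores can be arranged as one list of points per student (this layout is an assumption for illustration, not Crowdmark's actual export format). For each question it reports difficulty, the average fraction of the available points earned, and discrimination, the correlation between the question score and each student's total on the remaining questions.

```python
from statistics import mean, pstdev


def _pearson(x, y):
    """Pearson correlation using population moments; NaN if either input is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when every student has the same score
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_statistics(scores, max_points):
    """Classical item analysis (a stand-in for a full IRT fit).

    scores[s][q] is the points student s earned on question q (assumed layout);
    max_points[q] is the points available on question q.
    Returns one (difficulty, discrimination) pair per question:
      difficulty     = mean fraction of available points earned (higher = easier)
      discrimination = correlation between the question score and the
                       student's total on all *other* questions
    """
    results = []
    for q, available in enumerate(max_points):
        item = [student[q] / available for student in scores]
        rest = [sum(student) - student[q] for student in scores]
        results.append((mean(item), _pearson(item, rest)))
    return results
```

Leaving each question out of the total it is correlated against keeps a question from inflating its own discrimination; questions with low or negative values are candidates for rewording or replacement.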