Statistics et al.: Reflections / Postmortem on teaching Stat 302 1601

This was the second course I have been the lecturer for, although I’ve had the bulk of the responsibility for several online courses as well. Every other course I’ve been responsible for had between 30 and 140 students. This one had 300.

Stat 302 is a course aimed at senior-undergrad life and health science majors who have completed a couple of quantitative courses before, including a similarly directed 200-level statistics course. It involved 3 hours of lectures per week for 14 weeks, a drop-in tutoring center in lieu of a directed lab section, 4 assignments, 2 midterms and a final exam. The topics to be covered were largely centered on ANOVA, regression, modeling in general, and an introduction to some practical concerns like causality.

The standard textbook for this course was Applied Regression Analysis and Other Multivariate Methods (5th ed), but I opted not to use it to allow for more focus on practical aspects (at the cost of mathematical depth), as well as to save my students a collective $60,000.

I delivered about 75% of the lectures as fill-in-the-blank notes: I wrote PDF slides and sent them out to the students, with key words removed from the pre-lecture version. After each lecture, the filled slides were made available. The rest of the lectures were in a practice problem / case study format, where I sent out problems to solve before class, and solved them on paper under a video camera, with written and verbal commentary, during class. Most of these were made available too.

Everything can be found at http://www.sfu.ca/~jackd/Stat302.html for now.

What worked:

1. Focusing on the practical aspects of the material. This was a big gamble because it was a break from previous offerings of the course, and meant I had a lot less external material to work from. It was worth the risk, and I’m proud of the course that was delivered.

I was able to introduce the theory of an ambitious range of topics, including logistic regression, with time to spare. The extra time was used for in-depth examples. This example time added a lot more value than an equal amount of time on formulae would have. It more closely reflected how these students will encounter data in future courses and projects, and the skills they will need to analyze that data.

The teaching assistants who talked to me about it had good things to say about the shift. The keener students asked questions of a startling amount of depth and insight. I feel that there were only a few cases where understanding was less than what it would have been if I had given a more solid, proof-based explanation of some method or phenomenon, rather than the data-based demonstrations I relied upon.

Although making the notes for the class was doubly hard because it was my first time and because I was breaking from the textbook, those notes are going to stand on their own in future offerings of Stat 302 and of similar courses. As a long-term investment, it will probably pay off. For this class, it probably hurt the attendance rate because students knew the filled notes would be available to them without attending. My assumption about these non-attendees is that they would gain little more from showing up than they would from reading the notes and doing the assignments.

2. Using R. At the beginning of the semester, I polled the students about their experience with different statistical software, and the answers were diverse. A handful of students had done statistics with each of SPSS, JMP, SAS, Excel, and R, without much overlap. That meant that any software I chose would be new to most of the students. As such, I fell back on my personal default of R.

Using R meant that I could essentially do the computation for the students by providing them the necessary code with their assignments. It saved me some lecture time that would have otherwise been spent providing a step-by-step of how to manage in a point-and-click environment. It also saved me the lecture time and personal time spent dealing with inevitable license and compatibility issues that would have arisen from using anything not open source.

Also, now the students have experience with an analysis tool that they can access after the class is over. Even though many students had no programming experience, I feel like they got over the barrier of programming easily enough. There were some major hiccups which can hopefully be avoided in the future.

3. Announcing and keeping a hard line on late assignments. In my class, hard copies of assignments were to be handed in to a drop box for a small army of teaching assistants to grade and return. Any late assignments would have added a new layer of logistics to this distribution, so I announced on the first day that late assignments would not be graded at all. This also saved me a lot of grief with regard to judging which late excuses were ‘worthy’ of mercy, or trying to verify them.

4. Using a photo-to-PDF app on my phone. It’s faster and more convenient than using a scanner. Once I started using one, posting keys and those case study style lecture notes became a lot easier.

5. Including additional readings in the assignments. The readings provided the secondary voice to the material that would have otherwise been provided by the textbook. Since I've posted answers to the questions I wrote, I will need to make new questions in order to reuse the articles, but the discovery part is already done.

6. The Teaching Assistants. By breaking from the typical material, I was putting an extra burden on the teaching assistants to have knowledge beyond the typical Stat 302 offering. They kept this whole thing together, and they deserve some credit.

What I learned by making mistakes:

1. USE THE MICROPHONE. I have good lungs and a very strong voice, so even when a microphone is available, my preference has been to deliver lectures unaided. This approach worked up until one morning in Week 3 when I woke up mute and choking on my own uvula. Two hours of lectures had to be cancelled.

2. Use an online message board. For a large class, having a message board goes a long way. It allows you to answer a student question once or twice, rather than several times over e-mail. I had underestimated the number of times I would get the same question, and answering the question in class didn’t seem to help because of the 45-60% attendance rate. Other than the classroom, my only other option was to send out a mass email, which, aside from sending out lecture notes, was done sparingly.

A message board also serves the same purpose as a webpage: a repository of materials like course notes, datasets, and answer keys.

3. Do whatever you can in advance. Had I simply spent more time writing more rough drafts of lectures, or made or found some datasets to use, before the start of class in January, that time spent would have paid off more than one-to-one. How? Because I still had to do that work AND deal with the effects of lost sleep afterwards. There were a few weeks where my life was a cycle of making material for the class at the last minute, and recovering from working until dawn. Thank goodness I was only responsible for one course.

4. Distrust your own code. I have a lot of experience with writing R code on the fly, so I thought I could get away with minimal testing of the example code I wrote and gave out with assignments. Never again.

One of my assignments was a logistical disaster. First, a package dependency had recently changed, so even though on my system I could get away with a single library() call to load every function needed for the assignment, many students needed two. For others, the package couldn't even be installed.

Also, when testing the code for a question, I had removed all the cases with missing data before running an analysis. I didn’t think it would make any difference because the regression function, lm(), removes these cases automatically anyway. It turns out that missing data can seriously wreck the stepAIC() function, even if the individual lm() calls within the function handle it fine.
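A minimal sketch of one plausible mechanism behind this kind of failure, and of a defensive fix. The data frame `dat` and its columns are hypothetical, made up for illustration; they are not the assignment data. Because lm() drops incomplete rows model by model, candidate models with and without a partly missing variable end up fitted to different rows, which stepwise selection cannot compare fairly. Restricting the data to complete cases first avoids the mismatch.

```r
library(MASS)  # stepAIC() lives in MASS, which ships with standard R installations

# Hypothetical data: 100 rows, with x2 missing in the first 10.
set.seed(302)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 + rnorm(100)
dat$x2[1:10] <- NA

# lm() drops incomplete rows per model (the default na.action is na.omit),
# so models with and without x2 are fitted to different numbers of rows.
n_with_x2    <- nobs(lm(y ~ x1 + x2, data = dat))  # 90 rows used
n_without_x2 <- nobs(lm(y ~ x1, data = dat))       # 100 rows used

# Defensive fix: keep only rows complete in every candidate variable,
# then run the selection on that fixed set of rows.
dat_complete <- na.omit(dat[, c("y", "x1", "x2")])
full_model   <- lm(y ~ x1 + x2, data = dat_complete)
selected     <- stepAIC(full_model, direction = "backward", trace = FALSE)
```

Running the handout code on the raw data, exactly as students will receive it, is what would have exposed the problem during testing.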

In the future, I will either try to take any necessary functions from packages and put them into a code file that can be called with source(), or I will provide the desired output with the example code. This also ties back into working until dawn: quality suffers.
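A complementary option, not one of the two listed above, is to open the handout code with a short check that reports missing packages up front rather than failing partway through. This is only a sketch: the vector `needed` is a hypothetical list, and "MASS" is an example entry rather than the package that caused the trouble.

```r
# Packages the assignment code needs; "MASS" is only an example entry.
needed <- c("MASS")

# requireNamespace() returns FALSE instead of stopping when a package is absent.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Please install before starting: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

A check like this does not help students for whom a package cannot be installed at all, which is where a source() file or printed expected output would still be needed.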

5. Give zero weight to assignments. The average score on my assignments was about 90%, with little variation. As a means of separating students by ability, the assignments failed completely. As a means of providing a low-stakes venue for learning without my supervision, I can’t really tell. The low variation and other factors in the class suggest to me a lot of copying or collusion. Identifying which students are copying each other, or are merely writing things verbatim from my notes, is infeasible, even with teaching assistants. The number of pairwise comparisons grows with the SQUARE of the number of students, and comparisons are hard to judge fairly in statistics already.
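To put rough numbers on that growth: n students form n(n - 1)/2 distinct pairs. A quick check in R, using the class sizes mentioned at the top of this post (the variable names are just for illustration):

```r
# Class sizes from this post: the smallest and largest earlier classes, and this one.
class_sizes <- c(30, 140, 300)

# choose(n, 2) is the number of distinct pairs, n * (n - 1) / 2.
pairs <- choose(class_sizes, 2)

data.frame(students = class_sizes, pairs = pairs)
#   students pairs
# 1       30   435
# 2      140  9730
# 3      300 44850
```

Ten times the students means roughly a hundred times the pairs to compare, per assignment.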

One professor here with a similar focus on practicality, Carl Schwarz, gives 0% grade weight to assignments in some classes. The assignments are still marked for feedback, I assume, but only exams are used for summative evaluation. This would be ideal for the next time I teach this course.

I would expect the honest and interested students to hand in work for practice and feedback, and they would not be penalized grade-wise for failing to hand in a better, but copied, answer. I would expect the rest of the students to simply not hand anything in, which isn’t much worse for them than copying, and would save my teaching assistants time and effort.

Monday 11 April 2016