
Wednesday, 26 September 2018

Two career failure stories


This semester, I am teaching a course in career planning in statistics. No such course existed when I was an undergrad, so a lot of the planned course material was learned from experience. I opened the semester with some 'failure stories' of trying to start a career with a BSc in Math. Here are some of my failures:

Friday, 27 July 2018

Stat Writing Exercise - Pre-Baked Regression Analysis

In this Statistical Communication exercise, the learners take an already completed regression analysis and write a report of 250-400 words describing the analysis. This exercise consists of a 40-50 minute example that the teacher goes through to demonstrate and establish expectations, followed by a 50-70 minute period for the learners to emulate that writing process on a new analysis.

Thursday, 19 July 2018

Stat Writing Exercise - Improving Graphs

This is an in-class exercise that I gave to the 3rd year undergrads in a Statistical Communication class. It was designed to take 20 minutes to explain and 40-50 minutes to execute, including instant feedback. It went well enough that I felt it was worth sharing.

Wednesday, 6 December 2017

Reflection on teaching a 400-level course on Big Data for statisticians.

This was the first course I have taught that I would consider a 'main track' course, in that the students were to learn more about what they were already competent in at the start. Most of the courses I have previously taught were 'service courses', in that they were designed and delivered by the Statistics Department in service to other departments that wanted their own students to have a stronger quantitative and experimental background (e.g. Stat 201, 203, 302, and 305). The exception, Stat 342, was designed for statistics majors, but is built as an introduction to SAS programming. Since most other courses in the program are taught using R or Python, teaching SAS feels like teaching a service course as well, in that I am teaching something away from the students' main competency, and enrollment is mainly driven by requirement rather than interest.

In my usual courses, I am frequently grilled by anxious students about what exactly is going to be on exams. Frequent complaints I receive in student responses are about how I spend too much time on 'for interest' material that is not explicitly tested on the exams. I've also found that I needed to adhere to a rigid structure in course planning and in grading policy. Moving an assignment's due date, teaching something out of the stated syllabus order, changing the scope or schedule of a midterm, or even dropping an assignment and moving the grade weight have all caused a cascade of problems in previous classes.

Stat 440, Learning from Big Data, was a major shift.

I don't know which I prefer, and I don't know which is easier in the long term, but it is absolutely a different skill set. The bulk of the effort changed from managing people to managing content. I did not struggle to keep the classroom full, but I did struggle to meaningfully fill the classroom's time. I had planned to cover the principles of modern methods (cross validation, model selection, LASSO, dimension reduction, regression trees, neural nets), some data cleaning (missing data, imputation, image processing), some technology (SQL, parallelization, Hadoop), and some text analysis (regular expressions, edit distance, XML processing), but I still had a couple of weeks to fill at the end because of the lightning speed at which I was able to burn through these topics without protest.


In 'big data', the limits of my own knowledge became a major factor. Most of what I covered in class wouldn't have been considered undergrad material ten years ago when I was a senior (imputation, LASSO, neural nets); some of it didn't exist (Hadoop). There are plenty of textbook and online resources for learning regression or ANOVA, but the information for many of the topics of this course was cobbled together from blog posts, technical reports, and research papers. A lot of resources were either extremely narrow in scope or vague to the point of uselessness. I needed materials that were high-level enough for someone not already a specialist to understand, yet technical enough that someone well-versed in data science would get something of value from them, and I didn't find enough.

The flip side of this was that motivation was easy. Two of the three case studies assigned had an active competition component to them. The first such study was a US-based challenge to use police data from three US cities. In this one, presentation was a major basis on which the police departments would judge the results. As such, I had requests for help with plotting and geographic methods that were completely new to me. A similar thing happened with the 'iceberg' case study, based on this Kaggle competition. I taught the basics of neural nets, and at least three groups asked me about generalizations and modifications to neural nets that I didn't know about. (The other case study was a 'warm-up' in which I adapted material from a case study competition held by the Statistical Society of Canada. The students were not in active competition.) At least 20% of the class has more statistical talent than I do.

In order to adapt to this challenge and advantage, I switched about mid-semester from my usual delivery method of a PDF slideshow to one of commentary while running through computer code. This worked well for material that would be directly useful for the three case study projects, such as all the image processing work I showed for the case study on determining the difference between icebergs and ships. It wasn't as good for material that would be studied for the final exam. I went through some sample programs on web scraping, the feedback wasn't as positive for that, and the answers I got on the final exam for the web scraping question were too specific to the examples I had given.

A side challenge was the ethical dilemma of limiting my advice to students looking to improve their projects. I had to avoid using insights that other students had shared with me because of the competitive nature of the class. Normally if someone had difficulty with a homework problem, I could use their learning and share it with others, but this time, that wasn't automatically the case.

There was also a substantial size difference: I had 20-30 students, which is by far the smallest class I've ever lectured to. Previously, Stat 342 was my 'small' class, with enrollment between 50 and 80, compared to service classes of 100-300 students. This allowed me to actually communicate with students on a much more one-on-one level. Furthermore, since most of the work was done in small team settings, I got to know what each group of students was working on for their projects.

I worry that what I delivered wasn't exactly big data, and was really more of a mixed bag of data science. However, there was a lot of feedback from the students that they found it valuable, and added value was the goal all along.

Thursday, 8 June 2017

Reflections on teaching two similar 200-level service courses.

This is about two courses at Simon Fraser University that appear very similar: Stat 201 and Stat 203.

These courses are very similar in that they are 200-level service courses (meaning they are for non-majors). They are introductory courses that cover the fundamentals of descriptive statistics, sampling, probability, hypothesis testing, and t-tests. Both courses are equivalent as prerequisites for the 300-level service courses or for fulfilling graduation requirements.

Both classes were offered as a combination of 2 hours/week of lecture one day, and 1 hour/week of lecture another day, with drop-in workshop support for assignments and studying.

One could be forgiven for treating them as different sections of the same course, which is exactly what I did.

However, one class is titled “Stat 203: Statistics for Social Sciences”, uses SPSS, and is a service course for the sociology and anthropology departments. The other is titled “Stat 201: Statistics for Life Sciences”, uses R, and is a service course for the biology and environmental science departments. This schism, not in content but in audience, is what made these courses different in ways I didn't expect.

The 201 class had much higher classroom engagement, higher attendance, and even a better reaction to my awful jokes. More measurably, the 201 class also had an average of .75 grade points higher than their 203 counterparts; the 201s received a B+ on average, and the 203s received an average of between a C+ and a B-. Unsurprisingly in this context, the 201 students rated me much higher (4.5/5 vs 3.5/5) in their teacher evaluations.

The themes in the written answers were essentially the same, although my weaknesses were mentioned more by the 203 students. The first word cloud below is for Stat 201 and the second for Stat 203. I've removed a few common but uninformative words like “Jack”, “Davis”, and “course”, and the usual grammar stop words.


 Word cloud for evaluations from Stat 201: Stats for Life Sciences


Word cloud for evaluations from Stat 203: Stats for Social Sciences
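
For anyone curious about the mechanics, here is a rough sketch of how word clouds like these can be produced in R using the tm and wordcloud packages. The comment text below is made up for illustration, and the removed words are examples rather than the exact list I used.

library(tm)
library(wordcloud)

# 'comments' stands in for the written evaluation responses (made-up examples).
comments <- c("Clear lectures and helpful examples",
              "Assignments were hard but the examples helped",
              "More practice material would help")

corpus <- VCorpus(VectorSource(comments))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords,
                 c(stopwords("english"), "jack", "davis", "course"))

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

wordcloud(names(freq), freq, min.freq = 1, max.words = 100)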



There are a few lessons to be learned about the mistake of treating two different courses like these as if they were the same, but they're hard to articulate, so forgive me if I stumble.

First, teach (or present, or write) for the audience you have, as opposed to generically. There's a quote that floats around in B.Ed. programs, “I taught, but the students didn't learn” (see Alfie Kohn's article, http://www.alfiekohn.org/article/teach-learn/ ), about how this is a poor attitude for an educator, and how the focus should be on the result, not the process. In other terms, material MUST be suited to the audience to be effective. For me, it would be best to draw from some new sources or sacrifice some depth for more fundamental examples before I deliver Stat 203 again.

Another possibility is to hold practice sessions for exams, or offer more hints for assignment questions. Since I delivered this course, my exam practice material has gotten much more extensive. For example, the Midterm 1 practice material is now more than 15 pages long, and includes a partial key.


Another key moral: teaching is a service first, and a means of research and personal growth after that. In Mastery: The Keys to Success and Long-Term Fulfillment, by George Leonard, there's a story about the author's time as a trainer of fighter pilots. In the story, the author spends extra time further developing two already-talented pilots at the expense of the other, less apt, pilots under his charge. From a value-added perspective, the trainer had only done half his job, because the novice pilots could have benefited far more per hour of the trainer's attention than the ace pilots. It's possible that I committed the same fault without being aware of it, giving the Stat 201 students more attention than they needed and leaving the Stat 203 students behind as a result.

Also, as an artifact of the timing of the exams in each class, the 203 students got harder exams than the 201 students, which is the opposite of what should have happened. I wrote my exams in the order that they would be administered, and it happened that the Stat 201 midterms and final all came before the Stat 203 equivalents. I wanted to make the exams different but equivalent, and the easiest way to do this was to create a question for one exam, and then change the numbers and/or scenario for the question on the second exam, and add a twist. Most of the time, adding a twist meant increasing the complexity of the exam. I justified this with the assumption that the later 203s would have additional information about the exam from the 201s that had taken a very similar exam a few days prior; this assumption was wrong.

Another resource I should be using is live student feedback. I've been using a learning management system called TopHat, and it's taken me a while to make good use of it. TopHat allows students to answer questions live in lecture (or after the lecture, if the prof doesn't want to deal with excuses for absences) through their mobile devices. I've rarely used it for student opinion polls, but doing so would be a good way to adapt material effectively, or at a minimum give students a chance to anonymously voice concerns.

I don't want to dismiss the 203s as simply weaker in statistics; that shuts the door to finer optimizations. Instead, it would be better to think of there being some barrier I haven't broken through yet, and to try to identify it.

On the flip side, what I'm doing with the 201 students seems to work well on the surface, but it's not optimal either. I'm wasting an opportunity to challenge them or push them to work towards greater learning. We'll see, though; it's possible that the 201 course being in the mid-afternoon played a role, as well as its location on a secondary campus. My coordinator hypothesizes that more dedicated students selected that course because others would have been deterred by the extra commute to the secondary campus.

For all the gloom that this reflection may present, I would call this semester and the teaching of these two classes a success. It was a substantial improvement, both in outcomes and in workload, over the semesters before, and over the Stat 203 class that I delivered in Summer 2012.

One particularly bright spot was Crowdmark, a grading platform we started using for assignments and exams. The assignments had some technical growing pains, but for exams, Crowdmark was fantastic. Each exam is given a QR code at the top of each page, which allows that page to be separated from the rest of the exam digitally. The questions could be distributed out to markers just by having them log in and grade, and the platform is equipped with hotkeys to allow them to put the… same… comment… on… hundreds… of… incorrect… answers… by writing the comment once and using a couple of keystrokes on each question. The students then receive a pdf of their exam with the marker's annotations.

Also, it keeps a record of how each student did on each question, rather than only recording the exam scores in aggregate. This means I can look back and see which questions perform best in terms of appropriate difficulty and discriminatory power. I can apply item response theory to the results. I can even use the data for my future research on improving the exam experience.
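
As a rough illustration of that last point, here is a minimal sketch of fitting an item response theory model to a per-question record in R. The ltm package and the simulated 0/1 response matrix are my own assumptions for the example; this is not anything Crowdmark itself produces.

library(ltm)

# Simulated stand-in for the per-question record: rows are students,
# columns are exam questions, 1 = correct and 0 = incorrect.
set.seed(1)
items <- as.data.frame(matrix(rbinom(200 * 8, 1, 0.6), nrow = 200))

fit_2pl <- ltm(items ~ z1)   # two-parameter logistic IRT model
coef(fit_2pl)                # difficulty and discrimination per question
plot(fit_2pl, type = "ICC")  # item characteristic curves

With real exam data, the discrimination estimates flag which questions separate strong students from weak ones, and the difficulty estimates flag which questions nearly everyone gets right or wrong.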

Saturday, 31 December 2016

Reflections on teaching Stat 342 1609

Stat 342 is a course about programming in the SAS language and environment. It is aimed at statistics major undergrads. This course was delivered as a single two-hour lecture each week for all 60-70 students together, and a one-hour lab with the students in smaller groups with a lab instructor.

This was only the second time the course has been offered at SFU, which introduced some challenges and opportunities that were new to me. It was also the first course I have delivered to an audience of statistics majors.
 
My biggest regret is not putting more content into the course, especially as assignments. I should have given four assignments instead of two, and allowed for much more depth and unsupervised coding practice. This is especially true with more open topics like the SQL and IML procedures, and data steps. SAS is an enormous topic, but I feel like I could have done more with the two credits this course entails.

My biggest triumph was the inclusion of SQL in the course. I covered what SQL is and its basic uses of inspecting, subsetting, and aggregating data. This meant a commitment of two weeks of the course that hadn't been included before, and wasn't in the text. I heard afterwards from two separate sources, as well as from students, that learning SQL was a priority but wasn't covered elsewhere in the curriculum.
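
For a sense of what those three basic uses look like, here is a small sketch written in R with the sqldf package rather than SAS's PROC SQL, which is what the course actually used; the 'grades' table and its columns are made up for illustration.

library(sqldf)

# A made-up example table standing in for a SAS dataset.
grades <- data.frame(student = c("a", "b", "c", "d"),
                     section = c(1, 1, 2, 2),
                     score   = c(71, 88, 64, 92))

sqldf("SELECT * FROM grades LIMIT 3")                    # inspecting
sqldf("SELECT * FROM grades WHERE score >= 80")          # subsetting
sqldf("SELECT section, AVG(score) AS mean_score
         FROM grades GROUP BY section")                  # aggregating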

In short, my personal conviction that SQL should be taught to stats students was validated.


The textbook, "SAS and R: Data Management, Statistical Analysis, and Graphics" by Ken Kleinman and Nicholas J. Horton, was half of the perfect textbook.

Very little theory or explanation is given for any of the programs in the textbook. It reads more like the old Schaum's Outline reference books than a modern text. There are simply hundreds of tasks, arranged into chapters and sections, and example code to perform these tasks in R and in SAS. Since most of these students were already familiar with R, this meant they had an example they were already familiar with as a translation, in case the short exposition wasn't enough. It was a superb reference; it was the first book I have declared required in any course I've taught.

Having said that, the book "SAS and R" still left a lot of explanation for me to provide on my own or from elsewhere. It also lacked any practice problems to assign or use as a basis. I relied on excerpts from a book on database systems [1], a book on data step programming [2], as well as several digital publications from the SAS Institute. You can find links to all of these on my course webpage at https://www.sfu.ca/~jackd/Stat342.html



SAS University Edition made this course a lot smoother than I was expecting. Installing the full version of SAS has a lot of technical difficulties due to legacy code and intellectual property rights.

Simon Fraser University has a license for some of its students, but it's still a 9 GB download, and it only works on certain versions of Windows. By comparison, SAS U Edition is 2 GB, and the actual processing happens in its own virtual machine, independent of the operating system on the computer. The virtual machine can be hosted by one's own computer or remotely through Amazon Web Services.

Actually using this version of SAS just requires a web browser. I had a virtual machine set up on my home computer, and a copy running through Amazon. That way, I could try more computationally demanding tasks at home, and demonstrate everything else live in class from a projector. Also, students' installation issues were rare (and exclusively the fault of a recent patch from VMWare), and could be dealt with in the short term by giving access to my Amazon instance.


Exam questions were of one of three types: give the output of this code, write a program to these specifications, and describe what each line of this program does. Only the third type has room for interpretation. This made marking faster and the judgements clearer to students.


It's hard to make comparisons of the support burden of this course to others I have taught because it was much smaller. I taught two classes this term and the other was more than 5 times as large. Naturally, the other had more than 5 times as many questions and problems from individual students.

The nature of the tasks in the two assignments and on the two exams gave less opportunity for arguing for marks as well. The assignment questions had computer output that had clear indicators of correctness.


Compared to the audiences of 'service' courses (courses offered by the stats department in service to other departments), there are some differences that call for a change in style. Majors seem to be more stoic in class. That is, it's harder for me to tell how well the class is understanding the material from the reactions of the students. Often, there is no reaction. In some cases, I think I covered some ideas to the point of obviousness because I misjudged the audience (too many examples of the same procedure). At least once, I rushed through a part that the students didn't get at all (ANOVA theory). Also, my jokes never get a reaction in class. Ever.

On the flip side, these students seem more willing to give me written feedback, or verbal feedback outside of class. None of this should surprise me; as a student I have tried to blend into a class of three people.

[1] "Database Systems: Design, Implementation, and Management" by Carlos Coronel, Steven Morris, and Peter Rob.
[2] "Handbook of SAS DATA Step Programming" by Arthur Li.

Wednesday, 28 December 2016

Reflections / Postmortem on teaching Stat 305 1609


Stat 305, Introduction to Statistics for the Life Sciences, is an intermediate-level service course, mainly for the health sciences. It is a close relative of Stat 302, which I had taught previously, in its requirements, audience, and level of difficulty. Compared to Stat 302, Stat 305 spends less time on regression and analysis of variance, and more time on contingency tables and survival analysis.


Changes that worked from last time: 
Using the microphone. Even though the microphone was set to almost zero (I have a theatre voice), using it saved my voice enough to keep going through the semester. Drawing from other sources also worked. Not everything has to be written originally and specifically for a given lecture. Between 40 and 50 percent of my notes were reused from Stat 302. Also, many of the assignment questions were textbook questions with additional parts rather than made from scratch.


   
Changes that didn't work from last time: 
De-emphasizing assignments. Only 4% of the course grade was on assignments, and even that was 'only graded on completion'. This was originally because copying had gotten out of control when 20% of the grade was assignments. The change didn't have the desired effect of giving people a reason to actually do the assignments and learn, rather than copy to protect their grades.
 


Changes I should have done but didn't:

Keeping ahead of the course. I did it for a few weeks, but it got away from me, and I spent much of the semester doing things at the last feasible minute. This includes giving out practice materials. On multiple occasions I watched f.lux turn my screen from red to blue, which it does to match the colour profile of my screen to the rising sun.




What surprised me:

The amount of per-student effort this course took. There were fewer typos than in previous classes, and therefore fewer student questions about inconsistencies in the notes. However, there was an unusually large number of grade change requests. Maybe there was a demographic difference I didn't notice before, like more pre-med students, or maybe the questions I gave on midterms were more open to interpretation, or both.




What I need to change:

My assignment structure. There should have been more, smaller assignments, and ideally they should include practice questions not to be handed in. Having more questions available in total is good because finding relevant practice material is hard for me, let alone for students. Having more, smaller assignments mitigates the spikes in student workload, and means that the tutors at the stats workshop have to be aware of less of my material concurrently.




Tophat:

Tophat is a platform that lets instructors present slides and ask questions of an audience using the laptops and mobile devices that students already have. My original plan was to use iClickers as a means to poll the audience, but Tophat's platform turned out to be a better alternative for almost the same cost. It also syncs the slides and other material I was presenting to these devices. My concerns about spectrum crunch (data issues from slides being sent to 200-400 devices) didn't turn out to be a problem.



Scaling was my biggest concern for this course, given that there were more students in the class than in my last two elementary schools combined. I turned to Tophat as a means of gathering student responses from the masses and not just from the vocal few. It also provided a lot of the microbreaks that I like to put in every 10-15 minutes to reset the attention span clock.



However, Tophat isn't just a polling system that uses people's devices. It's also a store of lecture notes, grades, and a forum for students. This is problematic because the students already have a learning management system called Canvas that is used across all classes. This means two sets of grades, two forums (fora? forae?), and two places to look for notes, on top of emails and the webpage.



To compound this, I was also trying to introduce a digital marking system called Crowdmark. That failed, partly because I wasn't prepared and partly because students' data would be stored in the United States, and that introduces a whole new layer of opt-in consent. Next term, Crowdmark will have Canadian storage and this won't be a problem.



I intend to use Tophat for my next two classes in the spring, and hopefully I can use it better in the future.





The sheep in the room:

During the first two midterms, there was blatant, out-of-control cheating. Invigilators (and even some students) reported seeing students copying from each other, writing past the allotted time, and consulting entire notebooks. There was no space to move people to the front for anything suspicious, and there was too much of it to properly identify and punish people with any sort of consistency. Students are protected, as they should be, from accusations of academic dishonesty by a process similar to that which handles criminal charges, so an argument that 'you punished me but not xxxx' for the same thing is a reasonable defense.
 

The final exam was less bad, in part because of the space between students and attempts to preemptively separate groups of friends. Also, I had two bonus questions about practices that constitute cheating and the possible consequences. For all I know, these questions did nothing, but some of the students told me they appreciated them nonetheless. Others were determined to try to copy off of each other, and were moved to the front.
 

What else can be done? Even if I take the dozens of hours to meet with these students and go through the paperwork and arguing and tears to hand out zeros on exams, will it dissuade future cheaters? Will it improve the integrity of my courses? Will I be confronted with some retaliatory accusation?
 

Perhaps it's possible to create an environment where there are less obvious incentives to cheat. Excessive time pressure, for example, could push people to write past their time limit. Poor conditions are not an excuse for cheating, but if better conditions can reduce cheating, then my goal is met. But why a notebook? The students were allowed a double sided aid sheet; that should have been enough for everything.
 

This is something I don't yet have an answer for.
 


Priming:

The midterm exam was very difficult for people, and I anticipated a lot of exam anxiety on the final. On the final exam, I had two other bonus questions on the first page.



One of them asked the student to copy every word that was HIGHLIGHTED LIKE THIS, which was five key words that had been overlooked on many students' midterms in similar questions.



The other question employed priming, which is a method of evoking a certain mindset by having someone process information that covertly requires that mindset. The question was



“What would you like to be the world's leading expert in?”



and was worth a bonus of 1% on the final for any non-blank answer. The point of the question was to have the students imagine themselves as being highly competent at something, anything, before doing a test that required acting competently. Most of them wrote 'statistics'. In literature on test taking, a similar question involving winning a Nobel Prize was found to have a positive effect on test scores in a randomized trial. It's impossible to tell if my question had any effect because it was given to everyone. However, several students told me after the exam that they enjoyed the bonus questions.



Priming is one of the exam conditions I want to test in a formal, randomly assigned experiment in the near future. It will need to pass the university's ethics board first, which it obviously will, but it's still required. It's funny how one can include something like this in an exam for everyone without ethical problems, but need approval to test its effect because that counts as experimenting on human subjects.

Facebook wound up in a similar situation where they ran into ethical trouble for manipulating people's emotions by adjusting post order, but the trouble came from doing it for the purpose of published research and not something strictly commercial like advertising.




Reading assignments:

In the past, I have included reading assignments of relevant snippets of research papers that use the methods being taught. Worrying about overwhelming the students, I had dropped this. However, I think I'll return to it, perhaps as a bonus assignment. A couple of students even told me after the class that the material was too simple, and hopefully some well-selected articles will satisfy them without scaring everyone else.

 

Using R in the classroom:

In the past, I also had students use R, often for the first time. I had mentioned in a previous reflection the need to test my own code more carefully before putting it in an assignment. Doing so was easily worth the effort.
 

Another improvement was to include the data as part of the code, rather than as separate csv files that had to be loaded in. Every assignment with R included code that defined each variable of each dataset as a vector and then combined the variables with the data.frame() function. The largest dataset I used had 6 columns and 100 rows; anything much larger would have to be pseudo-randomly generated. I received almost no questions about missing data or R errors; those that I did receive involved installing a package or the use of one dataset in two separate questions.
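
A minimal sketch of that 'data inside the code' pattern, with made-up variable names and values:

# Each variable is defined as a vector, then combined with data.frame(),
# so nothing needs to be downloaded or read in from a csv file.
dose     <- c(1, 1, 2, 2, 3, 3)
response <- c(4.1, 3.8, 5.0, 5.4, 6.2, 6.0)
group    <- c("a", "b", "a", "b", "a", "b")

dat <- data.frame(dose, response, group)
summary(lm(response ~ dose + group, data = dat))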


Monday, 11 April 2016

Reflections / Postmortem on teaching Stat 302 1601

This was the second course I have been the lecturer for, although I’ve had the bulk of the responsibility for several online courses as well. Every other course I’ve been responsible for had between 30 and 140 students. This one had 300.

Stat 302 is a course aimed towards senior-undergrad life and health science majors who have completed a couple of quantitative courses before, including a similarly directed 200-level statistics course. It involved 3 hours of lectures per week for 14 weeks, a drop-in tutoring center in lieu of a directed lab section, 4 assignments, 2 midterms, and a final exam. The topics covered largely surrounded ANOVA, regression, modeling in general, and an introduction to some practical concerns like causality.

The standard textbook for this course was Applied Regression Analysis and Other Multivariate Methods (5th ed), but I opted not to use it to allow for more focus on practical aspects (at the cost of mathematical depth), as well as to save my students a collective $60,000.

I delivered about 75% of the lectures as fill-in-the-blank notes, where I had written pdf slides and sent them out to the students, but removed key words in the pre-lecture version of the slides. After each lecture the filled slides were made available. The rest of the lectures were in a practice problem / case studies format, where I sent out problems to solve before class, and solved them on paper under a video camera, with written and verbal commentary, during class. Most of these were made available too.

Everything can be found at http://www.sfu.ca/~jackd/Stat302.html for now.




What worked:

1. Focusing on the practical aspects of the material. This was a big gamble because it was a break from previous offerings of the course, and meant I had a lot less external material to work from. It was worth the risk, and I’m proud of the course that was delivered.

I was able to introduce the theory of an ambitious range of topics, including logistic regression, with time to spare. The extra time was used for in-depth examples. This example time added a lot more value than an equal amount of time on formulae would have. It more closely reflected how these students will encounter data in future courses and projects, and the skills they will need to analyze that data.


The teaching assistants that talked to me about it had good things to say about the shift. The more keen students asked questions of a startling amount of depth and insight. I feel that there were only a few cases where understanding was less than what it would have been if I had given a more solid, proof-based explanation of some method or phenomenon, rather than the data-based demonstrations I relied upon.

Although making the notes for the class was doubly hard, because it was my first time and because I was breaking from the textbook, those notes are going to stand on their own in future offerings of Stat 302 and of similar courses. As a long-term investment, it will probably pay off. For this class, it probably hurt the attendance rate because students knew the filled notes would be available to them without attending. My assumption about these non-attendees is that they would gain little more from showing up than they would from reading the notes and doing the assignments.




2. Using R. At the beginning of the semester, I polled the students about their experience with different statistical software, and the answers were diverse. A handful of students had done statistics with SPSS, JMP, SAS, Excel, and R, without much overlap. That meant that any software I chose would be new to most of the students. As such, I fell back to my personal default of R.

Using R meant that I could essentially do the computation for the students by providing them the necessary code with their assignments. It saved me some lecture time that would have otherwise been spent providing a step-by-step of how to manage in a point-and-click environment. It also saved me the lecture time and personal time spent dealing with inevitable license and compatibility issues that would have arisen from using anything not open source.

Also, now the students have experience with an analysis tool that they can access after the class is over. Even though many students had no programming experience, I feel like they got over the barrier of programming easily enough. There were some major hiccups which can hopefully be avoided in the future.




3. Announcing and keeping a hard line on late assignments. In my class, hard copies of assignments were to be handed in to a drop box for a small army of teaching assistants to grade and return. Any late assignments would have added a new layer of logistics to this distribution, so I announced on the first day that late assignments would not be graded at all. This also saved me a lot of grief with regards to judging which late excuses were ‘worthy’ of mercy or not, or trying to verify them.


4. Using a photo-to-PDF app on my phone. It’s faster and more convenient than using a scanner. Once I started using one, posting keys and those case study style lecture notes became a lot easier.


5. Including additional readings in the assignments.  The readings provided the secondary voice to the material that would have otherwise been provided by the textbook. Since I've posted answers to the questions I wrote, I will need to make new questions in order to reuse the articles, but the discovery part is already done.

6. The Teaching Assistants. By breaking from the typical material, I was putting an extra burden on the teaching assistants to have knowledge beyond the typical Stat 302 offering. They kept this whole thing together, and they deserve some credit.



What I learned by making mistakes:


1. USE THE MICROPHONE. I have good lungs and a very strong voice, so even when a microphone is available, my preference has been to deliver lectures unaided. This approach worked up until one morning in Week 3 when I woke up mute and choking on my own uvula. Two hours of lectures had to be cancelled.


2. Use an online message board. For a large class, having a message board goes a long way. It allows you to answer a student question once or twice, rather than several times over e-mail. I had underestimated the number of times I would get the same question, and answering the question in class didn’t seem to help because of the 45-60% attendance rate. Other than the classroom, my only other option was to send out a mass email, which, aside from sending out lecture notes, I did sparingly.

A message board also serves the same purpose as a webpage: a repository of materials like course notes, datasets, and answer keys.


3. Do whatever you can in advance. Had I simply spent more time writing rough drafts of lectures, or made or found some datasets to use, before the start of class in January, that time would have paid off more than one-to-one. How? Because I still had to do that work AND deal with the effects of lost sleep afterwards. There were a few weeks where my life was a cycle of making material for the class at the last minute, and recovering from working until dawn. Thank goodness I was only responsible for one course.


4. Distrust your own code. I have a lot of experience with writing R code on the fly, so I thought I could get away with minimal testing of the example code I wrote and gave out with assignments. Never again.

One of my assignments was a logistical disaster. First, a package dependency had recently changed, so even though on my system I could get away with a single library() call to load every function needed for the assignment, many students needed two. For others, the package couldn't even be installed.

Also, when testing the code for a question, I had removed all the cases with missing data before running an analysis. I didn’t think it would make any difference because the regression function, lm() removes these cases automatically anyways. It turns out that missing data can seriously wreck the stepAIC() function, even if the individual lm() calls within the function handle it fine.
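
A minimal sketch of the workaround, assuming a data frame with a response and a couple of predictors that contain missing values (the names and numbers below are made up):

library(MASS)   # provides stepAIC()

dat <- data.frame(y  = c(3.1, 4.0, 5.2, 4.8, 6.1, 5.5, NA, 4.2, 5.9, 6.3),
                  x1 = c(1, 2, 3, 4, 5, 6, 7, NA, 9, 10),
                  x2 = c(2.0, 1.5, 3.2, 2.8, 3.9, 3.1, 2.2, 2.7, 3.5, 4.0))

# lm() silently drops incomplete rows, but when different predictors have NAs
# in different rows, each candidate model in stepAIC() can end up fit on a
# different subset of the data, so remove the incomplete cases up front.
dat_complete <- na.omit(dat)

full    <- lm(y ~ x1 + x2, data = dat_complete)
reduced <- stepAIC(full, direction = "backward")
summary(reduced)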

In the future, I will either try to take any necessary functions from packages and put them into a code file that can be called with source(), or I will provide the desired output with the example code. This also ties back into working until dawn: quality suffers.



5. Give zero weight to assignments. The average score on my assignments was about 90%, with little variation. As a means of separating students by ability, the assignments failed completely. As a means of providing a low-stakes venue for learning without my supervision, I can’t really tell. The low variation and other factors in the class suggest to me a lot of copying or collusion. Identifying which students are copying each other, or are merely writing things verbatim from my notes, is infeasible, even with teaching assistants. The number of comparisons grows with the SQUARE of the number of students (with n students there are n(n-1)/2 pairs to check), and comparisons are hard to judge fairly in statistics already.

One professor here with a similar focus on practicality, Carl Schwarz, gives 0% grade weight to assignments in some classes. The assignments are still marked for feedback, I assume, but only exams are used for summative evaluation. This would be ideal for the next time I teach this course.

I would expect the honest and interested students to hand in work for practice and feedback and they would not be penalized grade-wise for not handing in a better, but copied, answer. I would expect the rest of the students to simply not hand anything in, which isn’t much worse for them than copying, and would save my teaching assistants time and effort.