Featured post

Textbook: Writing for Statistics and Data Science

If you are looking for my textbook Writing for Statistics and Data Science here it is for free in the Open Educational Resource Commons. Wri...

Saturday 31 December 2016

Reflections on teaching Stat 342 1609

Stat 342 is a course about programming in the SAS language and environment. It is aimed at statistics major undergrads. This course was delivered as a single two-hour lecture each week for all 60-70 students together, and a one-hour lab with the students in smaller groups with a lab instructor.

This was only the second time the course has been offered at SFU, which introduced some challenges and opportunities that were new to me. It was also the first course I have delivered to an audience of statistics majors.
My biggest regret is not putting more content into the course, especially as assignments. I should have given four assignments instead of two, and allowed for much more depth and unsupervised coding practice. This is especially true with more open topics like the SQL and IML procedures, and data steps. SAS is an enormous topic, but I feel like I could have done more with the two credits this course entails.

My biggest triumph was the inclusion of SQL into the course. I covered what SQL was and its basic uses of inspecting, subsetting, and aggregating data. This meant a commitment of two weeks of the course that hadn't been included before, and wasn't in the text. I heard from two separate sources afterwards, as well as students, that learning SQL was a priority but it wasn't found elsewhere in the curriculum.

In short, my personal conviction that SQL should be taught to stats students was validated.

The textbook, titled "SAS and R - Data Management, Statistical Analysis, and Graphics." by Ken Kleinman and Nicholas J. Horton., was half of the prefect textbook.

Very little theory or explanation is given of any the programs in the textbook. It read more like the old Schaum's Outline reference books than a modern text. There were simply hundreds of tasks, arranged into chapters and sections, and example code to perform these tasks in R and in SAS. Since most of these students were already familiar with R, this meant they had an example that they already familiar with as a translation, in the short exposition wasn't enough. It was a superb reference; it was the first book I have declared required in any course I've taught.

Having said that, the book "SAS snd R" still left a lot of explanation for me to provide on my own or from elsewhere. It also lacked any practice problems to assign or use as a basis. I relied on excerpts from a book on database systems [1], a book on data step programming [2], as well as from several digital publications from the SAS Institute. You can find links to all of these on my course webpage at https://www.sfu.ca/~jackd/Stat342.html

SAS University Edition made this course a lot smoother than I was expecting. Installing the full version of SAS has a lot of technical difficulties due to legacy code and intellectual property rights.

Simon Fraser University has a license for some of its students, but it's still a 9 GB download, and it only works on certain versions of Windows. By comparison, SAS U Edition is 2 GB, and the actual processing happens in its own virtual machine, independent of the operating system on the computer. The virtual machine can be hosted by one's own computer or remotely through Amazon Web Services.

Actually using this version SAS just requires a web browser. I has a virtual machine set up on my home computer, and a copy running through Amazon. That way, I could try more computationally demanding tasks at home, and demonstrate everything else live in class from a projector. Also, students' installation issues were rare (and exclusively the fault of a recent patch from VMWare), but could be dealt with in the short term by giving access to my Amazon instance.

Exam questions were of one of three types; give the output of this code, write a program to these specifications, and describe what each line of this program does. Only the third type has room for interpretation. This made marking faster and the judgements clearer to students.

It's hard to make comparisons of the support burden of this course to others I have taught because it was much smaller. I taught two classes this term and the other was more than 5 times as large. Naturally, the other had more than 5 times as many questions and problems from individual students.

The nature of the tasks in the two assignments and on the two exams gave less opportunity for arguing for marks as well. The assignment questions had computer output that had clear indicators of correctness.

Compared to the audiences of 'service' courses (courses offered by the stats department in service to other departments), there are some differences that call for a change in style. Majors seem to be more stoic in class. That is, it's harder for me to tell how well the class is understanding the material by the reactions of the students. Often, there is no reaction. In some cases, I think I covered some ideas to the point of obviousness because i misjudged the audience (too many examples of the same procedure). At least once, I rushed through a part that the students didn't get at all (ANOVA theory). Also, my jokes never get a reaction in class. Ever.

On the flip side, these students seem more willing to give me written feedback, or verbal feedback outside of class. None of this should surprise me; as a student I have tried to blend into a class of three people.

[1] "Database Systems: Design, Implementation, and Management" by Carlos Coronel, Steven Morris, and Peter Rob.
[2] "Handbook of SAS DATA Step Programming" by Arthur Li.

No comments:

Post a Comment