With the PhD wrapping up this summer, I can't default to 'do a higher degree' and have to go find a real job.
One option I've been considering is work in scientific and academic publishing. As a job, or just as a source of supplemental income, it seems ideal to me. It's the kind of job where I could actually add value to research by
making it more ready to disseminate. Also, I have research experience in
statistics, health science, molecular biology, and education. I write
habitually. I'm a native English speaker who can also check the mathematical, and especially the statistical, assertions in an academic paper for correctness before it goes to an editor or to the public.
Copy editing work can be done without leaving Vancouver; in fact, it can be done from a houseboat, or a houseboat city. Reading technical reports and academic papers would keep me actively learning and discovering. The work can be done at any time of day, and the amount of work can be adjusted to fit other, more time-specific activities.
There are companies like ManuscriptEdit and Scribendi that dispatch editing work to their own academics on contract. Many of their editors are PhDs with established careers and long publication records. These companies ask for, understandably, a proven record of copy editing ability and writing experience. Blog posts probably don't count.
There are a couple of certification programs, like the international one from the Board of Editors in the Life Sciences (BELS), and the one from the Editors' Association of Canada (EAC), whose scope is editing and proofreading in general. A certification would be great because it's a shorthand for proof of ability. For the BELS exam, the most convenient sitting is this November in Florida, and I doubt I could be ready for that even if I could go. The EAC exams are probably doable locally, especially since their annual meeting is in Vancouver this summer, but it's a multi-year process.
So, I tried something with a smaller commitment. I selected short articles from open access journals, specifically ones with grammatical mistakes in their abstracts. Then, I printed out these articles, copy edited them as if they had been given to me before publication, and sent the results to the journals' editors, each with a request to be considered for future contract work.
I copy edited four articles that I can share here. The first three are recently published open-access articles, for which I received two 'no' responses. The last one is one of mine that was recently submitted, but I have permission from the other authors to share it here. I didn't get a positive response, but I wasn't expecting one, and getting 2/3 responses to a cold request at all feels pretty encouraging; it means I'm getting attention. I also got an invitation to be a volunteer peer reviewer for future papers, so there's that for connections too.
I'm still reading about the copy editing process, so I think the last two are better than the first two.
Paper 1: Open Journal of Statistics - Predictive Modeling of Gas Production, Utilization, and Flaring in Nigeria...
Paper 2: American Journal of Computational Mathematics - Self Similarity Analysis of Web Users Arrival Pattern at Selected Web Centers
Paper 3: Journal of Data Analysis and Information Processing - Role of Feature Selection on Leaf Image Classification
Paper 4: Submitted - Tactics for Twenty20 Cricket.
Statistical education, publishing, sports analytics, and game theory - everything that makes math useful in real life. Now carbon negative!
Friday 22 April 2016
nhlscrapr revisited
VanHAC, the Vancouver Hockey Analytics Conference, was April 9th, and I was presenting a tutorial on the nhlscrapr package for R.
This post is excerpts from the code I presented and gave out at the tutorial. The full tutorial expands on my previous 'package spotlight' post on nhlscrapr. This post only includes the bare bones of downloading the raw games, examining the rate of goals scored and shots fired throughout the game, and making a basic player summary.
Also included is a patch to nhlscrapr I wrote that fixes a couple of functions ( full.game.database() , player.summary() ) that were throwing some errors, and adds a function ( aggregate.roster.by.name() ) that aids in matching player summaries to the proper names.
Monday 11 April 2016
Reflections / Postmortem on teaching Stat 302 1601
This was the second course I have been the lecturer for,
although I’ve had the bulk of the responsibility for several online courses as
well. Every other course I’ve been responsible for had between 30 and 140
students. This one had 300.
Stat 302 is a course aimed towards senior-undergrad life and
health science majors who have completed a couple of quantitative courses
before, including a similarly directed 200-level statistics course. It involved 3 hours of
lectures per week for 14 weeks, a drop-in tutoring center in lieu of a directed
lab section, 4 assignments, 2 midterms and a final exam. The topics to be
covered were largely surrounding ANOVA, regression, modeling in general, and an
introduction to some practical concerns like causality.
The standard textbook
for this course was Applied Regression Analysis and Other Multivariate
Methods (5th ed), but I opted not to use it to allow for more focus on
practical aspects (at the cost of mathematical depth), as well as to save my
students a collective $60,000.
I delivered about 75% of the lectures as fill-in-the-blank
notes, where I had written pdf slides and sent them out to the students, but
removed key words in the pre-lecture version of the slides. After each lecture
the filled slides were made available. The rest of the lectures were in a
practice problem / case studies format, where I sent out problems to solve
before class, and solved them on paper under a video camera, with written and
verbal commentary, during class. Most of these were made available too.
Everything can be found at http://www.sfu.ca/~jackd/Stat302.html
for now.
What worked:
1. Focusing on the practical aspects of the material. This
was a big gamble because it was a break from previous offerings of the course,
and meant I had a lot less external material to work from. It was worth the
risk, and I’m proud of the course that was delivered.
I was able to introduce the theory of an ambitious range of
topics, including logistic regression, with time to spare. The extra time was
used for in-depth examples. This example time added a lot more value than an
equal amount of time on formulae would have. It more closely reflected how these
students will encounter data in future courses and projects, and the skills
they will need to analyze that data.
The teaching assistants who talked to me about it had good things to say about the shift. The keener students asked questions of startling depth and insight. I feel that there were only a few cases where understanding was less than what it would have been if I had given a more solid, proof-based explanation of some method or phenomenon, rather than the data-based demonstrations I relied upon.
Although making the notes for the class was doubly hard
because it was my first time and because I was breaking from the textbook,
those notes are going to stand on their own in future offerings of Stat 302 and
of similar courses. As a long-term investment, it will probably pay off. For
this class, it probably hurt the attendance rate because students knew the
filled notes would be available to them without attending. My assumption about
these non-attendees is that they would gain little more from showing up than
they would from reading the notes and doing the assignments.
2. Using R. At the beginning of the semester, I polled the
students about their experience with different statistical software, and the
answers were diverse. A handful of students had done statistics with each of SPSS, JMP,
SAS, Excel, and R, without much overlap. That meant that any software I
chose would be new to most of the students. As such, I fell back to my personal
default of R.
Using R meant that I could essentially do the computation
for the students by providing them the necessary code with their assignments.
It saved me some lecture time that would have otherwise been spent providing
a step-by-step of how to manage in a point-and-click environment. It also saved
me the lecture time and personal time spent dealing with inevitable license and
compatibility issues that would have arisen from using anything not open
source.
Also, now the students have experience with an analysis tool
that they can access after the class is over. Even though many students had no
programming experience, I feel like they got over the barrier of programming
easily enough. There were some major hiccups which can hopefully be avoided in
the future.
3. Announcing and keeping a hard line on late assignments.
In my class, hard copies of assignments were to be handed in to a drop box for
a small army of teaching assistants to grade and return. Any late assignments
would have added a new layer of logistics to this distribution, so I announced
on the first day that late assignments would not be graded at all. This also
saved me a lot of grief with regards to judging which late excuses were
‘worthy’ of mercy or not, or trying to verify them.
4. Using a photo-to-PDF app on my phone. It’s faster and
more convenient than using a scanner. Once I started using one, posting keys
and those case study style lecture notes became a lot easier.
5. Including additional readings in the assignments. The readings provided the secondary voice to the material that would have otherwise been provided by the textbook. Since I've posted answers to the questions I wrote, I will need to make new questions in order to reuse the articles, but the discovery part is already done.
6. The Teaching Assistants. By breaking from the typical material, I was also putting an extra burden on the teaching assistants to also have knowledge beyond the typical Stat 302 offering. They kept this whole thing together, and they deserve some credit.
What I learned by making mistakes:
1. USE THE MICROPHONE. I have good lungs and a very strong
voice, so even when a microphone is available, my preference has been to deliver lectures unaided. This approach worked up until one morning in
Week 3 when I woke up mute and choking on my own uvula. Two hours of lectures had to be cancelled.
2. Use an online message board. For a large class, having a message
board goes a long way. It allows you to answer a student question once or twice,
rather than several times over e-mail. I had underestimated the number of times
I would get the same question, and answering the question in class didn’t seem to
help because of the 45-60% attendance rate. Other than the classroom, my only other option was to send out a mass email, which, aside from sending out lecture notes, was done sparingly.
A message board also serves the same purpose as a webpage, acting as
a repository of materials like course notes, datasets, and answer keys.
3. Do whatever you can in advance. Had I simply spent
more time writing more rough drafts of lectures, or made or found some datasets to use, before the start of class in January, that time spent would have
paid off more than one-to-one. How? Because I still had to do that work AND
deal with the effects of lost sleep afterwards. There were a few weeks where my life was a cycle of making material for the class at the last minute,
and recovering from working until dawn. Thank goodness I was only responsible
for one course.
4. Distrust your own code. I have a lot of experience with
writing R code on the fly, so I thought I could get away with minimal testing
of the example code I wrote and gave out with assignments. Never again.
One of my assignments was a logistical disaster. First, a
package dependency had recently changed, so even though on my system I could
get away with a single library() call to load every function needed for the
assignment, many students needed two. For others, the package couldn't even be installed.
Also, when testing the code for a question, I had removed
all the cases with missing data before running an analysis. I didn’t think it
would make any difference because the regression function, lm(), removes these
cases automatically anyways. It turns out that missing data can seriously
wreck the stepAIC() function, even if the individual lm() calls within the
function handle it fine.
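A rough sketch of the workaround, assuming the trouble comes from different candidate models being fitted to different sets of rows once lm() drops incomplete cases on its own. The data frame and variable names here are made up for illustration; stepAIC() is from the MASS package. The idea is to remove incomplete cases once, up front, so that every model in the stepwise search sees the same rows.

```r
library(MASS)  # provides stepAIC()

# Toy data standing in for an assignment dataset (hypothetical values).
set.seed(302)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 + rnorm(50)
dat$x2[c(3, 17, 40)] <- NA   # a few missing values in one predictor

# Keep only the rows that are complete for every variable a candidate
# model could use, so each model in the search is fitted to the same rows.
vars <- c("y", "x1", "x2", "x3")
dat_complete <- na.omit(dat[, vars])

full_model <- lm(y ~ x1 + x2 + x3, data = dat_complete)
selected <- stepAIC(full_model, direction = "backward", trace = FALSE)

nrow(dat_complete)   # 47 rows remain after dropping the 3 incomplete cases
formula(selected)    # the model kept by the backward search
```

Testing the handout code on the raw data, rather than on an already cleaned copy, would have caught this before the students did.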
In the future, I will either try to take any necessary
functions from packages and put them into a code file that can be called with source(),
or I will provide the desired output with the example code. This also ties back into
working until dawn: quality suffers.
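Another option, sketched here as an assumption rather than something I actually used in the course, is to put a short check at the top of the handout code. It loads every required package explicitly instead of relying on one package to pull in another, and it fails with a readable message if something is not installed. The package list below is only a placeholder.

```r
# Placeholder list of packages an assignment script depends on.
required <- c("stats", "MASS")

# requireNamespace() returns FALSE, rather than throwing an error,
# when a package is not installed.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  stop("Please install these packages first: ",
       paste(missing_pkgs, collapse = ", "))
}

# Attach each package by name, one library() call per package.
for (pkg in required) {
  library(pkg, character.only = TRUE)
}
```

This does not help students for whom a package cannot be installed at all, which is where a source() file of plain functions, or the expected output itself, would still be needed.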
5. Give zero weight
to assignments. The average score on my assignments was about 90%, and with
little variation. As a means of separating students by ability, the assignments failed
completely. As a means of providing a low-stakes venue for learning without my
supervision, I can’t really tell. The low variation and other factors in the
class suggest to me a lot of copying or collusion. Identifying which students are copying each
other, or are merely writing things verbatim from my notes, is infeasible – even
with teaching assistants. The number of comparisons grows with the SQUARE of
the number of students, and comparisons are hard to judge fairly in statistics already.
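To put a number on that, here is the pair count for a class of 300, assuming every student's assignment would have to be compared against every other student's:

```r
n_students <- 300

# Number of distinct pairs of students: n * (n - 1) / 2
n_pairs <- choose(n_students, 2)
n_pairs   # 44850

# Doubling the class size roughly quadruples the number of pairs.
choose(2 * n_students, 2) / n_pairs
```

That is 44,850 pairs for a single assignment, before accounting for the four assignments in the course.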
One professor here with a similar focus on practicality,
Carl Schwarz, gives 0% grade weight to assignments in some classes. The
assignments are still marked for feedback, I assume, but only exams are used
for summative evaluation. This would be ideal for the next time I teach this course.
I would expect the honest and interested students to hand in work for practice and feedback and they would not be penalized grade-wise for not handing in a better, but copied, answer. I would expect the rest of the students to simply not hand anything in, which isn’t much worse for them than copying, and would save my teaching assistants time and effort.