
Saturday 31 December 2016

Reflections on teaching Stat 342 1609

Stat 342 is a course about programming in the SAS language and environment. It is aimed at statistics major undergrads. This course was delivered as a single two-hour lecture each week for all 60-70 students together, and a one-hour lab with the students in smaller groups with a lab instructor.

This was only the second time the course had been offered at SFU, which introduced some challenges and opportunities that were new to me. It was also the first course I have delivered to an audience of statistics majors.
 
My biggest regret is not putting more content into the course, especially as assignments. I should have given four assignments instead of two, and allowed for much more depth and unsupervised coding practice. This is especially true with more open topics like the SQL and IML procedures, and data steps. SAS is an enormous topic, but I feel like I could have done more with the two credits this course entails.

My biggest triumph was the inclusion of SQL in the course. I covered what SQL is and its basic uses: inspecting, subsetting, and aggregating data. This meant committing two weeks of the course to material that hadn't been included before and wasn't in the text. I heard afterwards from two separate sources, as well as from students, that learning SQL was a priority but that it wasn't covered elsewhere in the curriculum.

In short, my personal conviction that SQL should be taught to stats students was validated.


The textbook, "SAS and R - Data Management, Statistical Analysis, and Graphics" by Ken Kleinman and Nicholas J. Horton, was half of the perfect textbook.

Very little theory or explanation is given for any of the programs in the textbook. It reads more like the old Schaum's Outline reference books than a modern text. There are simply hundreds of tasks, arranged into chapters and sections, with example code to perform each task in R and in SAS. Since most of these students were already familiar with R, they had a translation into a language they already knew in case the short exposition wasn't enough. It was a superb reference; it was the first book I have declared required in any course I've taught.

Having said that, "SAS and R" still left a lot of explanation for me to provide on my own or from elsewhere. It also lacked any practice problems to assign or to use as a basis. I relied on excerpts from a book on database systems [1], a book on data step programming [2], as well as several digital publications from the SAS Institute. You can find links to all of these on my course webpage at https://www.sfu.ca/~jackd/Stat342.html



SAS University Edition made this course a lot smoother than I was expecting. Installing the full version of SAS has a lot of technical difficulties due to legacy code and intellectual property rights.

Simon Fraser University has a license for some of its students, but it's still a 9 GB download, and it only works on certain versions of Windows. By comparison, SAS U Edition is 2 GB, and the actual processing happens in its own virtual machine, independent of the operating system on the computer. The virtual machine can be hosted by one's own computer or remotely through Amazon Web Services.

Actually using this version of SAS just requires a web browser. I had a virtual machine set up on my home computer, and a copy running through Amazon. That way, I could try more computationally demanding tasks at home, and demonstrate everything else live in class from a projector. Also, students' installation issues were rare (and exclusively the fault of a recent patch from VMWare), and those that did come up could be dealt with in the short term by giving access to my Amazon instance.


Exam questions were of one of three types: give the output of this code, write a program to these specifications, and describe what each line of this program does. Only the third type has room for interpretation. This made marking faster and the judgements clearer to students.


It's hard to make comparisons of the support burden of this course to others I have taught because it was much smaller. I taught two classes this term and the other was more than 5 times as large. Naturally, the other had more than 5 times as many questions and problems from individual students.

The nature of the tasks in the two assignments and on the two exams gave less opportunity for arguing for marks as well. The assignment questions had computer output that had clear indicators of correctness.


Compared to the audiences of 'service' courses (courses offered by the stats department in service to other departments), there are some differences that call for a change in style. Majors seem to be more stoic in class. That is, it's harder for me to tell how well the class is understanding the material by the reactions of the students. Often, there is no reaction. In some cases, I think I covered some ideas to the point of obviousness because I misjudged the audience (too many examples of the same procedure). At least once, I rushed through a part that the students didn't get at all (ANOVA theory). Also, my jokes never get a reaction in class. Ever.

On the flip side, these students seem more willing to give me written feedback, or verbal feedback outside of class. None of this should surprise me; as a student I have tried to blend into a class of three people.

[1] "Database Systems: Design, Implementation, and Management" by Carlos Coronel, Steven Morris, and Peter Rob.
[2] "Handbook of SAS DATA Step Programming" by Arthur Li.

Wednesday 28 December 2016

Reflections / Postmortem on teaching Stat 305 1609


Stat 305, Introduction to Statistics for the Life Sciences, is an intermediate-level service course, mainly for the health sciences. In its requirements, audience, and level of difficulty, it is a close relative of Stat 302, which I had taught previously. Compared to Stat 302, Stat 305 spends less time on regression and analysis of variance, and more time on contingency tables and survival analysis.








 

Changes that worked from last time: 
Using the microphone. Even though the microphone was set to almost zero (I have a theatre voice), using it saved my voice enough to keep going through the semester. Drawing from other sources also worked. Not everything has to be written originally and specifically for a given lecture. Between 40 and 50 percent of my notes were reused from Stat 302. Also, many of the assignment questions were textbook questions with additional parts rather than made from scratch.


   
Changes that didn't work from last time: 
De-emphasizing assignments. Only 4% of the course grade was on assignments, and even that was 'only graded on completion'. This was originally because copying had gotten out of control when 20% of the grade was assignments. It didn't have the desired effect of giving people a reason to actually do the assignments and learn, rather than copy to protect their grades.
 


Changes I should have done but didn't:

Keeping ahead of the course. I did it for a few weeks, but it got away from me, and I spent much of the semester doing things at the last feasible minute. This includes giving out practice materials. On multiple occasions I watched f.lux turn my screen from red to blue, which it does to match the colour profile of my screen to the rising sun.




What surprised me:

The amount of per-student effort this course took. There were fewer typos than in previous classes, and therefore fewer student questions about inconsistencies in the notes. However, there was an unusually large number of grade change requests. Maybe there was a demographic difference I didn't notice before, like more pre-med students, or maybe the questions I gave on midterms were more open to interpretation, or both.




What I need to change:

My assignment structure. There should have been more, smaller assignments, and ideally they should have included practice questions not to be handed in. Having more questions available in total is good because finding relevant practice material is hard for me, let alone for students. Having smaller and more frequent assignments mitigates the spikes in student workload, and means that the tutors at the stats workshop have to be aware of less of my material concurrently.




Tophat:

Tophat is a platform that lets instructors present slides and ask questions of an audience using the laptops and mobile devices that students already have. My original plan was to use iClickers to poll the audience, but Tophat's platform turned out to be a better alternative for almost the same cost. It also syncs the slides and other material I was presenting to these devices. My concerns about spectrum crunch (data issues from slides being sent to 200-400 devices) turned out to be unfounded.



Scaling was my biggest concern for this course, given that there were more students in the class than in my last two elementary schools combined. I turned to Tophat as a means of gathering student responses from the masses and not just from the vocal few. It also provided a lot of the microbreaks that I like to put in every 10-15 minutes to reset the attention span clock.



However, Tophat isn't just a polling system that uses people's devices. It's also a store of lecture notes, grades, and a forum for students. This is problematic because the students already have a learning management system, Canvas, that is used across all their classes. This means two sets of grades, two forums (fora? forae?), and two places to look for notes, on top of emails and a webpage.



To compound this, I was also trying to introduce a digital marking system called Crowdmark. That failed, partly because I wasn't prepared and partly because students' data would be stored in the United States, and that introduces a whole new layer of opt-in consent. Next term, Crowdmark will have Canadian storage and this won't be a problem.



I intend to use Tophat for my next two classes in the spring, and hopefully I can use it better in the future.





The sheep in the room:

During the first two midterms, there was blatant, out-of-control cheating. Invigilators (and even some students) reported seeing students copying from each other, writing past the allotted time, and consulting entire notebooks. There was no space to move people to the front for anything suspicious, and there was too much of it to properly identify and punish people with any sort of consistency. Students are protected, as they should be, from accusations of academic dishonesty by a process similar to that which handles criminal charges, so an argument that 'you punished me but not xxxx' for the same thing is a reasonable defense.
 

The final exam was less bad, in part because of the space between students and attempts to preemptively separate groups of friends. Also, I had two bonus questions about practices that constitute cheating and the possible consequences. For all I know, these questions did nothing, but some of the students told me they appreciated them nonetheless. Others were determined to try to copy off each other, and were moved to the front.
 

What else can be done? Even if I take the dozens of hours to meet with these students and go through the paperwork and arguing and tears to hand out zeros on exams, will it dissuade future cheaters? Will it improve the integrity of my courses? Will I be confronted with some retaliatory accusation?
 

Perhaps it's possible to create an environment where there are fewer obvious incentives to cheat. Excessive time pressure, for example, could push people to write past their time limit. Poor conditions are not an excuse for cheating, but if better conditions can reduce cheating, then my goal is met. But why a notebook? The students were allowed a double-sided aid sheet; that should have been enough for everything.
 

This is something I don't yet have an answer for.
 


Priming:

The midterm exam was very difficult for people, and I anticipated a lot of exam anxiety on the final. On the final exam, I had two other bonus questions on the first page.



One of them asked the student to copy every word that was HIGHLIGHTED LIKE THIS, which was five key words that had been overlooked on many students' midterms in similar questions.



The other question employed priming, which is a method of evoking a certain mindset by having someone process information that covertly requires that mindset. The question was



“What would you like to be the world's leading expert in?”



and was worth a bonus of 1% on the final for any non-blank answer. The point of the question was to have the students imagine themselves as being highly competent at something, anything, before doing a test that required acting competently. Most of them wrote 'statistics'. In literature on test taking, a similar question involving winning a Nobel Prize was found to have a positive effect on test scores in a randomized trial. It's impossible to tell if my question had any effect because it was given to everyone. However, several students told me after the exam that they enjoyed the bonus questions.



Priming is one of the exam conditions I want to test in a formal, randomly assigned experiment in the near future. It will need to pass the university's ethics board first, which it obviously will, but it's still required. It's funny how one can include something like this in an exam for everyone without ethical problems, but needs approval to test its effect, because that would be an experiment on human subjects.

Facebook wound up in a similar situation where they ran into ethical trouble for manipulating people's emotions by adjusting post order, but the trouble came from doing it for the purpose of published research and not something strictly commercial like advertising.




Reading assignments:

In the past, I have included reading assignments of relevant snippets of research papers using the methods being taught. Worrying about overwhelming the students, I had dropped this. However, I think I'll return to it, perhaps as a bonus assignment. A couple of students even told me after the class that the material was too simple, and hopefully some well-selected articles will satisfy them without scaring everyone else.

 

Using R in the classroom:

In the past, I also had students use R, often for the first time. I had mentioned in a previous reflection the need to test my own code more carefully before putting it in an assignment. Doing so was easily worth the effort.
 

Another improvement was to include the data as part of the code, rather than as separate csv files that had to be loaded in. Every assignment with R included code that defined each variable of each dataset as a vector and then combined the variables with the data.frame() function. The largest dataset I used had 6 columns and 100 rows; anything much larger would have to be pseudo-randomly generated. I received almost no questions about missing data or R errors; those I did receive involved installing a package or the use of one dataset in two separate questions.
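As a minimal sketch of the format I mean (the variable names and values here are invented for illustration, not taken from an actual assignment):

# Each variable is defined as a vector, then combined into a data frame,
# so no external csv file needs to be loaded.
dose     <- c(0.5, 0.5, 1.0, 1.0, 2.0, 2.0)
response <- c(4.2, 3.9, 6.1, 5.8, 9.0, 8.7)
group    <- c("a", "b", "a", "b", "a", "b")

drug <- data.frame(dose, response, group)
summary(drug)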


Saturday 3 December 2016

2016 did not suck.

The idea that 2016 sucked is an indication of the triumph of media over statistics.

In the United States, which I use because their data is very open and thorough, national unemployment is less than 5% and has been steadily dropping. Of those unemployed, most are either between jobs by choice or have never yet had major employment. We know this because the proportion of the workforce on unemployment assistance is below 1.20%, the lowest rate in more than 40 years.
 

Also in the US there was also a record low number (not proportion, raw number) of teenage pregnancies. That means both fewer abortions AND fewer unwanted births. So if you're pro-choice or pro-life, your side is winning.

Globally, the birth rate is falling faster than previously forecast, largely because of faster-than-forecast improvements to the quality of life in India. Last month, the Indian government's target for solar power capacity was raised dramatically for the 3rd time in 6 years, because bringing clean energy to people keeps getting easier and cheaper at a rate faster than anyone could reasonably expect.

Compared to 2015, worldwide sales of electric cars have increased 55%, and worldwide use of coal has decreased. Anthropogenic (man-made) carbon emissions were the same as in 2015, despite the world economy growing. This is an indication that we could get climate change under control.


A lot of the problems of the last couple of years were either mitigated well in 2016, or outright handled.

Remember Ebola? The outbreak is officially over, and was close to over for most of 2016. There is a vaccine that's currently in use, and if something goes wrong, we have another vaccine candidate in Phase III (late human-equivalent testing) trials to take its place. This was the ultimate Ebola outbreak - not just the biggest but the last we will ever see.

Remember the Fukushima disaster? Radioactivity in most of the region (everywhere except within a kilometre of the damaged reactor) has dropped to levels fit for permanent human habitation. After the disaster, Japan shut down all of its nuclear power plants for years of safety upgrades, and 2016 saw the last of them come back online. What happened with the Fukushima reactor was extraordinary, and a problem of that magnitude is impossible for modern CANDU reactors.

Remember Zika? The outbreak has been contained. There are scattered reports of new cases, but it's not showing up all over the world as was predicted after the Rio Olympics. On that note, the Rio Olympics seemed unremarkable as far as mishaps and problems are concerned. I saw Rio de Janeiro in October. It looked like it had survived well enough.


Remember that war that started in 2016? Me neither. A cursory search finds no armed conflict between two or more different countries that started this year.

You can remember 2016 as the year we lost Alan Rickman, but 40 million people will remember it as the year they got access to clean water.
 

Sunday 9 October 2016

Why Chess? Part 2: Onitama and Charity Chess

'Why chess' is an ongoing exploration of chess and its alternatives. The question connecting these posts is "why did the game we know as chess become the predominant strategy game?" Last time I looked at versions with smaller boards and fewer pieces than the orthodox game, like Martin Gardner's mini chess.

------
Onitama

Onitama is a recently released chess-like board game. It is played on a 5x5 board between two players controlling 5 pieces each. Four of the pieces are 'students', which function as pawns, and one is the 'master', which functions as the king. The goal is either to capture the opponent's master or to place any piece on the starting square of the opponent's master.



The movement capabilities of the pieces are determined by five cards. Your opponent possesses two of these cards, you possess two, and the fifth card is a 'swap' card. Each of these cards contains a move set. For example, the 'boar'  card allows a piece to move one space laterally, or forward.

Any of your pieces can be moved using either of your cards. However, when you make a move, the card you used is traded with the 'swap' card. For example, using the 'boar' to move means giving up the boar and allowing your opponent to obtain it after they make a move. (In the case where a move could have been made with either of your cards, you decide which one to give up.)

Games of Onitama are typically short. It took 10-25 minutes for each of 7 games a friend and I played when we were learning it. The game box includes 17 cards, so we were quickly able to see many different combinations of cards.



In two of these games, there was no card allowing the most basic of moves: 1 forward. All forward motion had to be done either diagonally or with a 2-side-1-forward move. These two games played radically differently; often adjacent opposing pieces were safe from each other.

Only one card, tiger, allows a 2-forward movement. To balance this card, it only allows 2-forward and 1-back; there is no side movement or 1-forward allowed. This card is incredibly powerful, but in the one game we played with it, it became 'too awesome to use'. That is, using it would mean handing over the ability to the opponent, which could be disastrous. However, never using it meant only having one card of movement options at any given time.

For the setup of the game, 17 cards seemed exactly right. We imagined other card possibilities, but most were either obviously stronger than an existing card (say, by containing all the moves of that card and more) or obviously weaker (containing a subset).

What Onitama is, to me, is an efficient, elegant way to try games with sets of hypothetical chess pieces and see the implications first hand. It's easy to imagine a chess-Onitama hybrid using an 8x8 board and cards of the orthodox pieces and some popular fairy chess pieces. There would be some complications with rank and with there being different kinds of pieces instead of identical movers, but the potential is there.

There is a detailed discussion of this game on the boardgames subreddit. There is also heated argument about whether Onitama is better than chess on the boardgamegeek forums.

----
Charity chess

I've been learning to play the orthodox chess game at a website called Charity Chess. This website hosts correspondence-style and live games, as well as articles and training exercises. The profit from the advertising revenue is split among five charities, in proportion to the weighted votes of registered players. The weight is determined by the amount of activity in the community, mostly from seeing ads but also from publishing articles that receive high peer ratings.

The community is small now, about 300 users, but it has the potential to be the best gamification of charity I have seen yet. Rather than trying to create a game from scratch and make that charitable, this group simply took an existing, proven game and attached ad revenue.



Friday 30 September 2016

Conference transcript: 50 Minutes of Action

Last week I spoke for 20 minutes at CASSIS, the Cascadia Symposium of Statistics in Sport, in Vancouver, BC. The other speakers were outstanding, and among the things I gained was a renewed appreciation for geography and the making of maps and graphics. It's something I'm very poor at, and I would very much like to make some contacts who are good at visual analytics. I also got to meet one of the co-creators of nhlscrapr, who has just made a similar program for American football called nflscrapr.



My talk was about the supposed tendency for hockey teams in the NHL to, when tied in the last ten minutes of regulation time, play very conservatively. That is, defensive strategies are adopted by both teams during this time that reduce the number of goals scored for either side, and generally make for very boring play.

At least that was my research hypothesis: fewer goals, shots, and shot attempts in tied situations compared to similar situations where one team is winning.


Why would this happen? Because teams behave rationally. The winning team in a National Hockey League game is awarded 2 points towards earning a place in the playoffs at the end of the season. The losing team, however, is awarded 0 points if the game ends after regulation play (60 minutes), or 1 point if the game ends in overtime or the shootout. That means there are 2 points distributed between the teams in a regulation game, and 3 points in an overtime or shootout game. Assuming the winner of an overtime game is essentially determined by a coin toss, a team that is tied can expect 1.5 points for remaining tied after 60 minutes.


From the 1.5 point perspective, the potential downside of breaking this tie is 3 times as large as the potential upside. We can assume that a team is already doing everything it can to maximize the chance that it scores the next goal simply because winning is better than losing. In other words, if there was a way a team could influence the game to score more goals while preventing the other team from doing the same, they would already be doing it. However, we can't necessarily assume they are doing everything they can to reduce or increase the rate of goal scoring in general.
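As a quick check of that arithmetic in R (the 0.5 overtime win probability is the coin-toss assumption above):

p_ot_win   <- 0.5                                # overtime/shootout treated as a coin toss
tied_value <- p_ot_win * 2 + (1 - p_ot_win) * 1  # expected points for staying tied: 1.5
upside     <- 2 - tied_value                     # score the next goal: +0.5 points
downside   <- tied_value - 0                     # concede the next goal: -1.5 points
downside / upside                                # 3: the asymmetry described above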


So my assumption is that teams DO exert this sort of influence on the game.


Consider the rate of goal scoring in the last two years of the 'zero-sum' era, before the extra overtime point was introduced. The black dotted line for tied games follows the same pattern as the rest of the games, up until the last minute or two when teams that are behind by 1 goal tend to pull their goalie for an extra attacker.




In the first two years after the overtime loss point is introduced, we see a divergence, in which tied games show less scoring in the last 10 minutes.


Likewise, we see this again five years later, in the first two years after the shootout is introduced, but less pronounced.



But these are old graphs. Is this the same divergence happening now, a decade into the shootout era? Here's the raw goals per hour for the most recent 3 seasons, which is measuring the same thing as the above graphs, but with less smoothing.

Without the smoothing, the empty net effect is concentrated on the last minute of play. For the team that's behind by a goal, it climbs to 8 goals per hour. For the team ahead by one, it climbs to 18 goals per hour.


The divergence between the tied teams and the rest has disappeared into the minute-by-minute noise. Maybe it will show up elsewhere. We can also get the shots and shot attempts per hour, like in this next graph.





With the shot attempts, we see a divergence, but not the one we would have expected from teams necessarily trying to preserve a tie. The worse situation a team is in, the more shot attempts they make. Even tied teams play very aggressively in the last minute. The team that's ahead makes fewer shot attempts, but has higher quality shots. The extreme case of this is in the last minute when the losing team has pulled their goalie - anything that would qualify as a shot is a goal against an empty net. It's possible that teams that are ahead are playing more conservatively because they have less to lose from a tie than the losing team has to gain, but that wasn't the hypothesis at the start.


With the newer data, not only can we get shots and shot attempts, but we can also isolate out the time and events in which a goalie was out of net, or when a team was on the powerplay. The following graphs show the goals and shot attempts for those situations. It's essentially the same story, except for the last minute. The one-ahead and one-behind lines can't be trusted in the last minute because there are very few times when the behind team didn't pull their goalie.



So are there 50 regulation minutes of action in an NHL game, or 60? My personal grudge against the asymmetry introduced by the overtime loss rule makes me want to say 50, but I can't seem to back that up with recent data. File this one under negative results.

Sunday 28 August 2016

Why Chess? Part 1: Smaller Boards

Chess fascinates me; it's a very pure board game, and it has existed in some recognizable form for centuries. There are a couple of general questions I have about chess that I return to occasionally, but haven't formally approached until now.

1) Why is the game that we consider chess the orthodox game and not some variant or alternative?

2) How robust is existing artificial intelligence to changes in chess?


Chess, at least in the western world, fills a niche in the game market. It's a competitive game between two players with no random chance and no physical requirement. The rules of the game take an hour to learn, a day to understand, a lifetime to master. In game design terms, chess has a great deal of depth, but limited complexity.

It's not the only such game. The game of Go, for example, has fewer rules, comparable depth, and an effective and convenient method to handicap.

Also, what about variants of chess? Would the game be less interesting or deep if the knight moved in a 3-and-1 jump instead of a 2-and-1 jump? What if the board dimensions were different? Chess has changed before, however slowly. Is it just an accident of history that this is the particular game we got, or is there something optimal about it?

I've been playing games of different variations of chess, starting with those available on the Android app Chess Variants. Several of the variants available in this app are games that use the same kinds of pieces as orthodox chess but fewer of them, on a smaller, simpler board. One clear pattern has emerged after 50 plays across 3 variants: I'm horrible. I am embarrassingly bad at chess compared to a computer opponent.

Mini chess (original).

The first variant I tried was Martin Gardner's 5x5 variant, made in 1969. I played as white (first) against an AI opponent (to be discussed in part 2) about 20 times. The result was 0 wins, 2 draws, and a heap of losses. The app allows you to play as black, or even as both for local play, but I'm stubborn.

The layout, shown in Figure 1, suggests a major advantage to black to a naive player like myself. There are 7 possible moves white can make at the beginning of the game. All of these moves let a piece be captured right away. What else could explain my tremendous defeat?

Figure 1 – Martin Gardner's original Minichess

I looked up the game to see if my suspicions about white's disadvantage were correct. They were not. It turns out that Gardner's 1969 version of 5x5 chess is a weakly solved problem. Each player can play perfectly and always draw or win. If both are playing perfectly, the game ends in a draw. Furthermore, the perfect play algorithm (also called the 'oracle') is a mere ~10 pages of if-then instructions.

The "weakly" in weakly solved refers to the fast that a perfect strategy exists and is known from the starting position of the game, but not necessarily every position. So if you play sub-optimally to start a game, the 'oracle' doesn't necessarily cover your situation.

A paper by Mhalla, M., & Prost, F. (2013) outlines this perfect strategy and talks about the winning rates of each side in a sample of historical correspondence games. The white player won slightly more of these games than black did.

Mini chess (updated).

This is a 1989 update to Gardner's 5x5 chess. It is identical except that the knight and bishop have been swapped on the black side.

I played 5 games on this version after playing the 1969 version. Games lasted longer and were closer, but with a small sample and an experience confound, I'm not confident in explaining why.

Micro Chess

This seems to be as minimal as chess gets without being a chess puzzle instead of a game. The layout, shown in Figure 2, only has one pawn per side. The pawn can be moved two squares in its first move. However, the pawn cannot be promoted to a queen. This seems reasonable as no player starts with a queen.

Figure 2 – Micro Chess

There is a one move checkmate for black:
White bishop to C2
Black knight to C3
White is in checkmate.

I was unable to win any match of this variant (again as white) either, but I did reach a draw quite often. My suspicion was that the knight is the most powerful piece, and that this change in the relative strength of pieces could cause an AI opponent to make poor trades if it was using piece valuations from classic chess. No such luck.


The Alpha-Beta Algorithm

The AI used in this app employs the alpha-beta tree algorithm, which is good for quick computation, like on a phone, because it eliminates large sets of possible actions quickly. It's also non-deterministic; it will choose different actions in identical situations if there is no single obviously best solution.

The alpha-beta algorithm is not specific to chess. It could be used for any discrete choice system, like Shogi or Go. However, it relies on a heuristic scoring system. To use the algorithm, an AI needs a way to evaluate how good or bad the consequence of an action is. Earlier, I talked about throwing off the AI by assuming it would under-value a knight. That is, I was assuming that the value assigned to the consequence "lose your knight" would be copied over from classic chess to micro chess, and that this value could be exploited.
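A minimal, self-contained sketch of alpha-beta pruning in R, using an abstract game tree rather than the app's actual chess engine (the tree and its leaf scores are invented for illustration):

# A node is either a numeric leaf (heuristic score) or a list of child nodes.
alphabeta <- function(node, alpha = -Inf, beta = Inf, maximizing = TRUE) {
  if (is.numeric(node)) return(node)      # leaf: heuristic evaluation of the position
  best <- if (maximizing) -Inf else Inf
  for (child in node) {
    val <- alphabeta(child, alpha, beta, !maximizing)
    if (maximizing) {
      best  <- max(best, val)
      alpha <- max(alpha, best)
    } else {
      best <- min(best, val)
      beta <- min(beta, best)
    }
    if (alpha >= beta) break              # prune: the other player would never allow this branch
  }
  best
}

# A tiny two-ply tree: the maximizer picks a move, the opponent replies.
tree <- list(list(3, 5), list(6, 9), list(1, 2))
alphabeta(tree)                           # 6; the last subtree is abandoned after its first leaf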

The Alpha-Beta algorithm also has some tuning parameters that determine the quality of the decisions it makes, chiefly the number and depth (turns removed from the current situation) of the possibilities that the algorithm will evaluate. These parameters can be manipulated indirectly in the Chess Variants app by a difficulty setting with four options: easy, medium, hard, and fast.

The last setting, fast, was the one I was using. In this case, the algorithm will evaluate solutions until 1 second of processor time has been used. This means the algorithm becomes harder to beat when it is provided more processing power, so I can blame my equipment for my losses. The AI may have been playing a close-to-perfect game on the simplified boards, because the space of possible states was much smaller. I also noticed that in orthodox chess matches, the AI got better as the game progressed and pieces were removed from the board. I could frequently take the opponent's queen without a sacrifice only to be crushed in the end-game.

For existing pieces, the value of retaining each piece, and of having them in various positions relative to an opposing king, is well established by humans. That, I suspect, is why Jocly, the platform used to make all these chess variants online, only has variants that use pieces that exist in orthodox chess and which alter the board - because the heuristics for these (combinations of) pieces are well-established.

Next, I intend to look into Chess 960, which is a variant played on an 8x8 board in which the starting positions of non-pawn pieces are randomized. There are 960 possible starting arrangements that fit the restrictions (King is between rooks, bishops are on opposite colours, black mirrors white, etc. ), hence the name.

An AI may not be able to adapt a few opening strategies from orthodox chess to Chess 960, but it fares well by treating any starting position as a game already in progress (even if that progress was impossible). If you introduced a common chess puzzle piece, such as the grasshopper or princess, would an entirely new set of heuristics be needed?



Relevant links:

Mhalla, M., & Prost, F. (2013). Gardner’s minichess variant is solved. ICGA Journal, 36(4), 215-221.

https://jocly.com The platform used for the Chess Variants app. Also playable in a web browser.

https://en.wikipedia.org/wiki/Minichess including Gardner's Minichess, and Microchess.

https://en.wikipedia.org/wiki/Fairy_chess_piece list of unorthodox chess pieces used in puzzles and variants.



Still to explore:
Onitama
The Duke
Nightmare Chess
Alice Chess
The Encyclopedia of Chess Variants, by David Pritchard
The Classified Encyclopedia of Chess Variants, by John Derek Beasley

Friday 22 July 2016

The Globalization of Baseball

As I write this, pro baseball player Ichiro Suzuki is on the verge of getting his 3,000th big-league hit*.

There's a big asterisk there because Ichiro has nearly 3,000 hits in the North American leagues (Major League Baseball, MLB), but about 4,300 if you include his hits in the professional Japanese leagues (Nippon Pro Baseball, NPB) as well. Ichiro came from Japan when that was a lot rarer, and perhaps at an older age (27), so it's hard to compare him fairly to players anywhere, other than to say he's a global all-star.

Baseball is a global game, and it's becoming more international quickly. Here are some trends I've seen that show increasing ties between baseball in North America and the rest of the world. My hope is for more MLB exhibition games overseas, and eventually overseas inter-league play, especially with Nippon Pro Baseball.


1. The World Baseball Classic

In 2006, 2009, and again in 2013, the USA hosted a world tournament modeled after the FIFA World Cup. The tournaments themselves are a sign that international baseball is strong enough to continue without the support of the Olympics, but the results are an even stronger sign. The team from Japan won the 2006 and 2009 WBCs and placed third in 2013. Cuba won second place in 2006. The USA has not yet placed higher than fourth.

The next World Baseball Classic will be in 2017.

In 2006, the team from Cuba was only allowed entry to the United States because other countries refused to play without them.



2. The end of the Cuban embargo

Earlier this year, the United States officially ended its cold war remnant embargo on Cuba. Cuba has been a source of baseball talent for many years, but in order to play for MLB during the embargo, a player would need to defect to the United States, leaving their old life behind completely. Most of Cuba's world-class superstars may have already done this before the embargo was removed, but what about the ones that could have been professional baseball players, but didn't invest the training time because of the risks and costs involved in defection?

We will see many more Cuban professional players, but it will take a couple of years.



3. Twenty-20 Cricket

Until 2005, professional cricket matches lasted either eight hours or up to five days. The Twenty20 format of cricket changed that by introducing a three-hour game. This change makes individual cricket matches a lot more appealing to North American sports fans, who are already used to Football, Soccer, Hockey, and Baseball games, which all last about three hours including non-play time.

There are substantial differences between cricket and baseball, but the skill sets are related. Both games focus their action on a single ball thrower and ball hitter, with the remaining play happening among the fielders (9 in baseball, 11 in cricket).

With time and television, this could lead to flow of talent between the sports, with amateur players trying to increase their play time and career prospects by being involved in both sports.



4. The precedent of Inter-League Play

Major League Baseball is split into two leagues of 15 teams each. This is a more meaningful break than that between conferences in hockey or basketball. The two leagues in MLB play by different sets of rules. There are several minor differences, and one big one: the designated hitter. In the American League (which includes the Toronto Blue Jays), there is a designated hitter whose only role is to bat. In the National League, there is no designated hitter, and the pitcher (thrower, like a bowler), must also bat.

Despite this rule difference, American and National League teams play against each other regularly. The game is played by the rules of the home team. Play between MLB and Japanese NPB would likely work the same way, and having the precedent of playing by the home team's rules makes that inter-league play simpler to establish.


5. The video appeal system

According to [2], the Japanese style is for umpires to huddle together and make a call through consensus; American decisions are made by the official in the best physical place to witness the event in question.

Starting in the 2016 season in MLB, when a call is appealed, the video footage is reviewed by a central panel of officials in New York and an ultimate decision is made there. Under this new system, the most important or contentious calls are made by consensus. This reduces the distance between the two styles.



6. The Posting System

Since 1998, MLB and NPB have used an agreement called the Posting System regarding NPB players leaving Japan to play in MLB instead. The merits or weaknesses of this system are beyond me, but at least it sets a precedent of coordination between the two league systems. Ichiro was the second player to join MLB under the Posting System. See [3] for more.


7. Interleague play with Major League Soccer

European soccer has lots of interaction between leagues. The best of each of the English Premier League, Spain's La Liga, Germany's Bundesliga, and others earn places in the UEFA Champion's League. Back in North America, Major League Soccer (MLS) has been more isolated, but that is changing.

There is a growing trend of star players from UEFA teams playing in MLS when their best years are over. They're still competitive in MLS, and names like Kaka and David Beckham bring in crowds.

There are also more games happening between MLS teams and those from the English Premier League. This month, the Vancouver Whitecaps FC tied 2-2 against Crystal Palace, and the Seattle Sounders beat West Ham 3-0.

Any commercial success from these matches is a signal to Major League Baseball that they could do the same, and that there is fan interest in the overseas matches.



[1] Japanese Baseball Rules and Customs
http://factsanddetails.com/japan/cat21/sub141/item771.html


[2] On the cultural distances in baseball
http://www.umich.edu/~wewantas/brooke/differences.html

[3] The Posting System, which describes how players are transferred between leagues
https://en.wikipedia.org/wiki/Posting_system

Monday 18 July 2016

Annual Report to Stakeholders 2015-16

As a grad student, I had to submit an annual report on my academic activities and the progress I was making in the PhD.

This year, I neglected to submit one, but since it's a good exercise in perspective, I’m doing a more general one here for myself and the major stakeholders in my life.


Executive Summary (informally, the ‘TL;DR’): 

The two most important things of this year were meeting Gabriela and defending my PhD thesis. Both of those successes owe more to the development and effort of previous years than to anything particular I did this year.


Personal:

This year I enjoyed a lot more success in making personal connections than professional progress; a year and a day ago I met Gabriela and we are very much in love. This may seem sappy and out of place in an annual report, but this is a life report, and it’s easy to understate these sorts of things because they’re difficult to quantify. Gabi is extremely important to me, and she deserves to be mentioned here. Establishing a relationship of this quality takes a lot of effort, but it has been tremendously rewarding.


Education (Learning):

They say completing a doctorate is a marathon, not a sprint. If that’s true, mine felt like the Boston Marathon; some parts crossed the finish line a lot sooner than others. The spirit of the thesis was done a year ago - written, submitted, and with a defense scheduled. This year, I added a new short chapter, and added a new layer of proofreading to the old chapters, but that’s it. While writing this extra chapter, which was an application of my new method to a database, I learned enough about the Canadian legal system to spark an appetite for more (so, very little).

For projects and curiosity, I’ve learned about meta-analysis, validation, item response theory, web-crawling, copy editing, and an introductory amount about the statistical method INLA.  

Duolingo says I’m now about 20% fluent in Portuguese. In the near-term future, I hope to push that higher at a rate of 10% per month.

In the short/mid-term, I hope to gain more depth in using SAS (and take more SAS certification exams to solidify this), and to learn more about the law (and to take the LSAT, likewise).


Education (Teaching):

In the spring I lectured Stat 302 to about 300 students. I’ve already talked about this in detail in my postmortem. Student evaluations rated my teaching higher than both the school and faculty average in almost every category. Also, I got applause, so 2/2 there.

Only 10-20% of the material could be recycled from Stat 203 and from other notes. However, that means a LOT of new notes and assignments were added to my personal corpus.

One of the two courses I’m teaching in the fall, Stat 305, will recycle 50% of its material from 302 and 203. Of the other half, I already have the notes of five lectures completed.

I also have been doing some assignment design for a 400-level course on big data methods that Luke Bornn is teaching in the fall, and I did a guest lecture for the 400-level Statistics of Sport course.


Research:

Lots of little scattered things. 

Harsha, Tim, and I wrote another cricket paper. Rajitha, Tim, and I wrote a hockey paper. Kurt Routley (in Computer Science) and I co-presented a talk on data mining in hockey. A health sciences paper linking depression and sexually risky behaviour in South Africa was published after years of delay.

Not really shown in the thesis was the additional development of the Approximate Bayesian Computation work that Steve Thompson and I made in the last year. Very recently we got some coveted data from the CDC that we can use to revive some analyses from 2014-5.

I’ll be presenting some hockey research on the overtime loss rule at the Cascadia Sports Conference in Vancouver in September, and Paramjit and I will be reviving an old paper on this with some new depth.

I’ve met with at least five other groups about other research projects all across the board, but without any deliverables that I can mention here.


Publishing:

4 papers were refereed: 3 for the Open Journal of Statistics, 1 for the Canadian Journal of Statistics. 

6 papers were copy-edited: 3 for networking purposes, 1 that I also did the analysis for, 1 at the request of a faculty member, and 1 as a skill test for the Canadian Journal of Statistics.

26 blog posts were made and kept, not including this one.




Game Design:

I haven’t posted the details here, but a very basic (pre-alpha) prototype of Scrabble Dungeon is working on PC. A player places tiles in a crossword grid from a designated starting tile to an ending point in order to win the game. Wall tiles, pre-placed tiles, and gem pickups are functioning. Boards of any layout and size that contain these elements can be loaded into the game. Parts of the card system are working, and two of the cards (life relic and meta relic) are included.

The goal is to eventually get the game onto the Android Play Store, but I may also settle for a PC version if it’s less of a headache. 

Friday 24 June 2016

Thesis Defence Notes - Approximate Bayesian Computation

These are notes from the slides that were given during my defense. I've clarified some points since then.

The other half of the notes, on the cricket simulator, are found here.

------------------

Approximate Bayesian Computation (ABC) is a method of estimating the distribution of parameters when the likelihood is very difficult to find. This feature makes ABC applicable to a great variety of problems.

The classical ABC method can be summarized as:

Step 1: Select a prior joint distribution F(theta) for the parameters of interest.

Step 2: Observe a sample s from the population you wish to characterize.

Step 3: Compute T(s), a statistic, or set of statistics, from the sample.

Step 4: Repeat the following many times

4A: Generate random parameter values p from F(theta)
4B: Use the values p to simulate a sample s'
4C: Compute the distance between T(s') and T(s), the statistics taken from the simulated and observed samples, respectively.
4D: If the distance is less than some tuning value, epsilon, then accept the values p.

Step 5: The collection of values p from all accepted values is taken as the posterior distribution of the parameters of interest.

Step 6: If the parameters of interest are continuous, a smoothing method is applied to this collection to produce a continuous posterior distribution.

The distance between two statistics can be determined by Manhattan or Euclidean distance, or a weighted distance if deviations between some statistic values are more important than others.

The tuning parameter epsilon determines the balance between accuracy and computational efficiency. A small epsilon will ensure that only generated samples that are very similar to the observed sample will be used. That is, only samples that produce statistics very close to the real sample's statistics. However, many samples will be rejected.

A large epsilon will ensure that many samples will be produced, but some of those samples will have statistics that deviate from the observed sample. Since the process of producing new samples could be time-consuming, depending on the application, this may be a worthwhile trade-off.
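As a toy illustration of the recipe above in R (a made-up example that estimates a normal mean from a simulated sample; it is not one of the applications discussed below):

set.seed(1)
s     <- rnorm(50, mean = 3, sd = 2)               # Step 2: the observed sample
T_obs <- c(mean(s), sd(s))                         # Step 3: summary statistics T(s)
eps   <- 0.3                                       # tuning value epsilon
accepted <- c()

for (i in 1:20000) {                               # Step 4
  p  <- rnorm(1, mean = 0, sd = 10)                # 4A: draw from the prior F(theta)
  s2 <- rnorm(50, mean = p, sd = 2)                # 4B: simulate a sample s'
  d  <- sqrt(sum((c(mean(s2), sd(s2)) - T_obs)^2)) # 4C: Euclidean distance between T(s') and T(s)
  if (d < eps) accepted <- c(accepted, p)          # 4D: keep p if the distance is under epsilon
}

plot(density(accepted))                            # Steps 5-6: smoothed posterior for the mean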







Compared to the classical Approximate Bayesian Computation method, I made some modifications.

- Instead of having an accept-reject rule, I opted to use every sample but assign weights to the parameters behind each sample based on the distance between T(s') and T(s).

- The weights and smoothing mechanism are determined through a Kernel Density Estimator (KDE)


Specifically, every generated sample gets treated as a point mass in a space across the statistic values and parameter values.


Consider the example in the following MS-Paint diagrams. 

In Figures 1-3, we have a sample with a single statistic value, T(s), vertical, and a single parameter of interest, theta, horizontal. We don't know the parameter value of the observed sample, so we represent the observed sample as a blue horizontal line.

Points A and B represent the statistic and parameter values from each of two generated samples (also A and B, respectively). These are points instead of lines because these samples are simulated using known parameter values. 

The gray area at the bottom of each figure is the conditional density of the parameter of the observed sample. The thin lines around points A and B are the contour lines of the probability density, with peaks at the points and decreasing outwards.




Figure 1: A generated sample that's far from the observed sample.

In Figure 1, we see the effect of generated sample A on the parameter estimate. The statistic from sample A is far from the statistic of the observed sample, so point A is far from the line. As a result, the probability density does not increase or decrease much along the line, and therefore the conditional density of the parameter estimate is diffuse, as shown by the shallow gray hill.

Figure 2: A generated sample that's close to the observed sample.
 In Figure 2, we see sample B's effect. This sample's statistic is close to that of the observed sample. Therefore, point B is close to the line. The conditional density is much steeper around sample B's parameter value. This is because the probability density increases and decreases a lot along the blue line.
Figure 3: The combined estimate from both samples.
In Figure 3, we see the cumulative effect of both samples. The conditional density from both resembles that from generated sample B. This is because sample B's effect was given more weight. Generated samples that more closely resemble the observed sample contribute more to the parameter estimate.
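Continuing the toy normal-mean example from the classical sketch above, the weighted version might look like this (the Gaussian kernel on the distance and its bandwidth are illustrative choices, not the ones used in the thesis):

s     <- rnorm(50, mean = 3, sd = 2)               # observed sample, as before
T_obs <- c(mean(s), sd(s))

draws <- rnorm(5000, mean = 0, sd = 10)            # keep every draw from the prior
dists <- sapply(draws, function(p) {
  s2 <- rnorm(50, mean = p, sd = 2)
  sqrt(sum((c(mean(s2), sd(s2)) - T_obs)^2))
})

w <- dnorm(dists, sd = 0.5)                        # closer samples get larger weights
w <- w / sum(w)
plot(density(draws, weights = w))                  # weighted KDE as the posterior estimate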




I used this method on two applications
1. A theoretical network based on a network of sexual partners in Africa.
2. A real network of precedence citations between decisions of the Supreme Court of Canada.

The theoretical dataset was based on a respondent driven sample. An initial seed of people was selected, presumably by a simple random process. Members of this seed were polled about the number of sexual partners they had in the previous year, their HIV status, and their sexual orientation. For each reported partner, contact information was taken in order to recruit these partners into the sample as well. If the recruitment pool was exhausted before the desired number of respondents, new seed members were found.

Like the dataset it was based on, we assumed that the sample we were given contained the mentioned covariates of those polled, including the number of partnerships, and we assumed that only the partnerships that led to successful recruitment were available. This implies that large parts of the network could be hidden from us, such as partnerships that 'lead back' into the sample already taken.

Consider these two figures.

Figure 4 is from a sample of a network in which identifying information about every reported partnership is given. Figure 5 is from the same sample, where only the partnerships that led to recruitment are known. Notice that Figure 4 includes lines going out to points that we did not sample directly. Notice also that Figure 4 includes cycles, in which you could go from one point back to itself by following the lines without backtracking.

We don't see these cycles in Figure 5, the recruitment-only network, because a person cannot be recruited if they are already in the sample. A cycle might be a triangle between persons J, K, and L, where all three are connected. Person J can recruit both K and L through those connections, but K cannot recruit L, so we never see the K-L connection. What we see is one person, J, with two distinct partners instead of the whole picture.  

Socially, the K-L connection is key information, and mathematically this lack of cycles makes estimation of anything about this network a lot harder.

Figure 4: A network with identifying information from the sample.
Figure 5: A network with only recruitment information from the sample.

 Nevertheless, I used ABC to estimate some features (parameters) of the population that this sample came from.
 
The parameters of interest were:

- The number of members in the population, (prior: geometric, mean 2000, truncated minimum 400)
- The probability of an HIV infection passing along a partnership, (prior: uniform 0 to 0.3)
- The proportion of this network that was already infected before any of these partnerships (prior: uniform 0 to 0.5)
- The 'closeness preference', a parameter governing the propensity to select partners that are similar to oneself. Zero is no preference, 10 is a strong preference, and a negative number represents a general dislike for those similar to you. (prior: uniform -2 to 10)
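For this application, step 4A of the ABC recipe amounts to drawing one parameter set from these priors. A small sketch in R, where the truncated geometric is handled by rejection and the success probability of 1/2001 is my assumed parameterization for a mean of 2000:

draw_prior <- function() {
  repeat {                                  # geometric with mean 2000, truncated below at 400
    N <- rgeom(1, prob = 1 / 2001)
    if (N >= 400) break
  }
  c(population_size     = N,
    transmission_chance = runif(1, 0, 0.3),
    initial_infection   = runif(1, 0, 0.5),
    closeness_pref      = runif(1, -2, 10))
}
draw_prior()                                # one candidate parameter set for simulating a sample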

I conducted two rounds of Approximate Bayesian Computation by producing 500 and 2500 samples, respectively. The results are shown in Figure 6. The yellow and red represent regions of high conditional probability. ABC allows us to narrow down the likely values of the real parameters to these yellow and red regions.

Parameter sets at points all across these graphs were tried, but only those in the non-blue regions produced samples that were anything like the observed sample.

Figure 6: Coloured contour maps of conditional densities, social network data.

From these, we can calculate marginal means to get our parameter estimates.


Parameter                True Value   Mean (Initial Round)   Mean (Final Round)
Population Size          1412         2049                   2094
Initial Infection Rate   0.467        0.399                  0.374
Transmission Chance      0.093        0.175                  0.201
Closeness Preference     7.298        6.753                  9.352


The second application was on the precedence citations of decisions in the Supreme Court of Canada (SCC). Because this is real data, I am not able to find the ground truth with which to compare my results, but I can showcase how the method behaves with real data.

When this data was collected, there were 10623 documents produced by the SCC, 9847 of which are cases (e.g. disputes between two parties). These decisions are among the more than 3,000,000 cases and other documents recorded in the Canadian Legal Information Institute (CanLII). Figure 7 shows part of the network of SCC cases.

Figure 7: Induced subgraph of cases and citations in Supreme Court of Canada


Cases span from 1876 to 2016, with more recent cases represented as points near the top. The larger points are those that have been cited often in Canadian law. Lines between points represent citations between SCC cases. Only 1000 cases are shown here, so the true network between SCC cases has roughly 10 times as many nodes and 100 times as many links.


My general interest in this dataset is to identify the features that make a law prone to being obscure over a long time.

I modelled each SCC case as a 'vendor' that was trying to 'sell' citations to future Canadian legal cases. Each SCC case had an attractiveness a, a function of the following parameters:

β_irrel : Parameter determining the decay in attractiveness over time (per 5 years without a citation).
β_pa : Parameter determining a preferential attachment factor, assuming that a case becomes more well-known whenever it is cited.
β_corp, β_crown, β_dis : Parameters determining increased/decreased attractiveness based on whether a case involved a corporation, the Crown, or dissent between Supreme Court judges, respectively.

What I found was less clear than it was in the synthetic data case, as shown in Figure 8.

Figure 8: Coloured contour maps of conditional densities, Supreme Court of Canada data.

It's impossible to read the variable names in this figure, I'm afraid. The gist, however, is that...

- a case that is not cited for five or more years becomes less and less likely to receive citations over time.
- a case that is cited often is more likely to receive citations in the next five years.
- a case that involves a corporation is less likely to be cited.
- a case that involves the crown is more likely to be cited.
- a case that had dissenting opinions by some of the judges is more likely to be cited.