Statistics et al.: February 2017

Thursday, 16 February 2017

I read this: Chess Variants - Ancient, Regional and Modern

Chess Variants - Ancient, Regional, and Modern by John Gollon is a book published in 1968 that describes 33 different versions of chess, and includes sample games for most of them. It is the closest source I have yet encountered to answering the 'why chess' questions.

In the author's words, it is intended to be a Hoyle's Book of Games, but for chess. It also served as a humbling reminder to look outside university resources sometimes. The Bennett library may have nearly every book one could want on statistics, but it has perhaps 10 on chess, and those are mostly about artificial intelligence.

There's a lot of useful information in this book, though many of the subtler implications are beyond me. Here's the gist of what I learned:

I read this: Chess Metaphors: Artificial Intelligence and the Human Mind

Chess Metaphors: Artificial Intelligence and the Human Mind, by Diego Rasskin-Gutman, uses chess to explain concepts of intelligence, both organic and artificial. The first part of the book is about the human mind and cognitive models like schemas and memory chunking, which I only skimmed. The second part discusses how an AI program can efficiently explore all the relevant consequences of any given chess move.

In the AI portion, 'Metaphors' explained how minimax algorithms work, and discussed in particular the Alpha-Beta algorithm, a chess-playing staple. Minimax algorithms are useful for two-player zero-sum games, like chess and most other head-to-head games.

Algorithms of this type work by compiling a set of possible moves, and for each move a set of possible responses by the opponent. The consequence of each considered response (e.g. the resulting board state) is evaluated. The worst board state possible (from the perspective of the AI) is assumed to be the response, and its evaluation (e.g. how 'good' that board state is) is recorded as the value of making that move. This repeats for each possible move, so that the AI has the worst case that comes from each possible move (the maximum loss for that move). It then chooses the move that produces the best of the worst cases (the smallest maximum loss, hence 'minimax'). If someone playing against this AI doesn't respond with that optimal response, all the better.
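The procedure above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the table TOY_PAYOFFS and the function choose_move are made up for the example, and the numbers stand in for board evaluations from the AI's perspective (higher is better for the AI).

```python
# Hypothetical toy game: each of the AI's moves maps to the opponent's
# possible replies, and each reply maps to an evaluation of the resulting
# position from the AI's perspective (higher is better for the AI).
TOY_PAYOFFS = {
    "a": {"x": 3, "y": -2},
    "b": {"x": 1, "y": 0},
    "c": {"x": 5, "y": -4},
}


def choose_move(payoffs):
    """Return the move with the best worst case, and that worst-case value."""
    # Assume the opponent always picks the reply that is worst for the AI.
    worst_cases = {move: min(replies.values()) for move, replies in payoffs.items()}
    # Pick the move whose worst case is the least bad.
    best_move = max(worst_cases, key=worst_cases.get)
    return best_move, worst_cases[best_move]


print(choose_move(TOY_PAYOFFS))  # prints ('b', 0)
```

Move "c" has the most attractive single outcome (5), but it also has the worst worst case (-4), so the cautious choice is "b".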

The algorithm described above would only work for looking one move ahead for each side (two plies, where a ply is a single move by one player). To consider deeper strategies, the process is repeated, with an exponential explosion of possibilities. Understandably, many of these algorithms differ mainly in how they avoid evaluating unnecessary positions, which is what Alpha-Beta does.
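Here is a rough sketch of how Alpha-Beta skips positions, assuming the game tree is already available as nested Python lists whose leaves are evaluations from the AI's perspective. The names alphabeta, TREE, and the visited log are invented for this example; a real chess program would generate positions on the fly rather than store a tree.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list game tree, with alpha-beta cutoffs.

    A leaf is a number (an evaluation from the AI's perspective); an inner
    node is a list of child nodes. `visited` optionally logs evaluated leaves.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:  # the AI's turn: look for the highest value
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return value
    value = math.inf  # the opponent's turn: look for the lowest value
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:  # the AI already has a better option elsewhere
            break
    return value


# Three possible moves for the AI, each with two possible replies.
TREE = [[3, 5], [2, 9], [0, 1]]
log = []
print(alphabeta(TREE, True, visited=log))  # prints 3
print(log)  # prints [3, 5, 2, 0]
```

Once the first move is known to guarantee at least 3, a single reply worth 2 (or 0) is enough to rule out the second (or third) move, so the leaves 9 and 1 are never evaluated. The result is the same as plain minimax; only the amount of work changes.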

What really surprised me is how simple the evaluations can be. A very common evaluation method is to assign a value for each piece and a value for each square that piece can move to. This implies that such an AI would be functional, although not optimal, for a wide range of chess variants. In fact, if only the orthodox pieces are used, no additional programming would be required other than altering the possible moves to fit the new board. An algorithm like Alpha-Beta could be applied 'out of the box' to variants that don't use new pieces. This would explain why the 'chess variants' app, which has smaller boards like Gardner's Minichess and rearranged boards like Chess960, uses Alpha-Beta.

Including new pieces would involve programming in their possible moves and assigning them a material value (e.g. worth 4 pawns). Therefore, adapting existing AI to many of the variations seen in John Gollon's "Ancient, Regional, and Modern" book should be feasible. There are many ways to tune the relative value of pieces and immediate movement ability, including pitting AIs with different parameter values against each other in an evolutionary pool.
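To make the evaluation idea concrete, here is a minimal sketch, assuming each piece is described by its type, its owner, and the number of squares it can currently move to (which a move generator, not shown, would supply). The piece values are the conventional pawn-unit values, and MOBILITY_WEIGHT and the new piece "C" are hypothetical choices for illustration, not values from either book.

```python
# Conventional material values in pawn units; the king is left at 0
# because both sides always have one.
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}

# Hypothetical weight: how many pawns one reachable square is worth.
MOBILITY_WEIGHT = 0.1


def evaluate(pieces, piece_values=PIECE_VALUES, mobility_weight=MOBILITY_WEIGHT):
    """Score a position from the AI's perspective.

    `pieces` is a list of (kind, is_ours, reachable_squares) tuples.
    """
    score = 0.0
    for kind, is_ours, reachable in pieces:
        worth = piece_values[kind] + mobility_weight * reachable
        score += worth if is_ours else -worth
    return score


# Our rook and pawn against their knight.
position = [("R", True, 10), ("P", True, 2), ("N", False, 4)]
print(round(evaluate(position), 2))

# A variant with a new piece only needs one more table entry
# (here a made-up piece "C", worth 4 pawns) plus its move rules.
variant_values = {**PIECE_VALUES, "C": 4.0}
print(evaluate([("C", True, 0)], piece_values=variant_values))
```

Tuning then amounts to changing the numbers in PIECE_VALUES and MOBILITY_WEIGHT, which is what makes pitting differently parameterized AIs against each other practical.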

Footnote: Judging by http://www.chessvariants.com/ and the chess variants subreddit, it's much easier to make a variant than to drum up support and a player base for it.

To consider later: Smess (also available as an app):
https://boardgamegeek.com/boardgame/1289/smess-ninnys-chess
In this game, the pieces have very simple moves and the board itself defines the difference in moves.

Saturday, 11 February 2017

Creating Homework Material for Statistical Writing Classes and Workshops.

This summer, I'll be offering a workshop on statistical writing to the graduate students in the SFU Stats department. In the year that follows, I hope to teach SFU's Stat 300 course, an undergraduate writing course that is mandatory for all statistics majors. I want this course to be offered at more universities. However, the typical undergrad degree in stats is already overloaded, so adding it would need some additional motivation. Here's my anecdotal hook:

When graduate schools ask for a letter about your motivations for applying, or about your general background, there are a few things being examined. First, different graduate supervisors have different specialties, and they may be looking at your letter for signs of a good match. Second, the letter is a means of seeing firsthand your personal ability to communicate in English. They ask for something personal rather than a set topic to discourage plagiarism. They are not interested in your level of motivation; everyone says they are highly motivated.

I've recently been writing reference letters and filling out graduate school reference forms for students that have been in my previous 300-level classes. All of these schools are asking about the ability to write. Many statistics undergrads are applying to fields outside of pure statistics, for example economics, medical science, and business. All of them are asking their references about skills in English, often spoken English and always written. Likewise, the Statistical Society of Canada asks referees about spoken English as a part of their A.Stat accreditation.

In order to develop these skills in a program that usually focuses on mathematical theory and programming ability, I've sketched out some projects and exercises that could fit into a workshop or course.

The instructor (me) would ask colleagues from other fields that use quantitative data, such as sociology, anthropology, biology, business, ecological restoration, and history, for reports from previous or current projects.

The students would read the reports, identify what the sampling method was and what analysis was done, and check whether the assumptions of those methods hold for the data. They would interpret the results in their own words (not in the words of the discussion or conclusion section), and compose a couple of questions about the research that weren't answered. These questions don't have to pertain to the domain; they should pertain to the methodology. These would be questions like: Were there any outliers that affected your results? Did you consider multiple testing? Did you consider regression/ANOVA?

For example, take this research report: Water Quality of Stoney Creek and its Effects on Salmon Spawning, by SFU undergrads Oak, Tony; Thai, Michelle; Orgil, Indra; Ngo, Kevin; Lu, Jerry, available at http://summit.sfu.ca/item/12770

A writing assignment could be:

What were the parameters to be estimated?
What biologically important thresholds or critical values are mentioned for the parameters being estimated?
What sampling method was used? How many sampling units are there?
Were there any null hypotheses being tested? If so, were they rejected? Should they have been?
Describe the qualitative differences between the four sites. Use the map and Figs 4,5,6, and 7.
In your own words, summarize the quantitative data described in Figures 2 and 3.
Compose two suggestions or questions about the analysis that may not have been considered.
Describe one way you could build upon (e.g. expand, follow up) this study, the type of information you would collect from this new work, and what additional conclusions you could make if the data matched your expectations.

(Note: Please don't take these questions as a criticism of this research report. It was selected to show how material is available within one's own campus. Even a quick skim of the report will show it's truly excellent work for undergrads.)

Another exercise I'm thinking of doing is to have students copy edit some scientific writing. This could be a publication or a blog post. I would do this twice: once with a well-written work, and once with something of questionable quality, like the unedited papers that come out of discount publishers. If I used something from ArXiv, with the authors' permission, I could use later, more complete versions of the same work as a 'key'. However, using ArXiv is untenable for graded work because the 'key' is publicly available.

Also, there are the reading comprehension exercises. The ones I had posted earlier (here and here) were meant to be done in less than 2 hours, and had questions focused on specific statistical topics. Another possibility is to write more such assignments that require less statistical expertise, but more in-depth composition.
