Statistics et al.: Book review: Big Data by Timandra Harkness

I picked up Big Data by Timandra Harkness solely on the testimonials of Hannah Fry and Matt Parker on the front and back covers. The 2017 printing that I read is a 300-odd page general interest book about recent advances in big data.

"Big Data" starts off pretty boilerplate for the topic, with a lot of definitions of what makes data "big": volume, variety, velocity, and the like. It also gives some historical context about the growth of data over time, from early censuses, through primitive computers, to today. The rest of the book is the result of interviews across the world with people working on different big data projects.

Some of these have received a lot of news coverage before, like the Large Hadron Collider, and consumer profiling programs like those made by Dunnhumby. Others haven't had as much attention, at least in the western world, like the field device that identifies, and selectively kills, insects based on the sound they make as they fly by.

There are two things that separate Harkness's book from the rest of the literature on big data I've read: its overall approachability and the extra attention paid to data ethics.

Regarding approachability, this book keeps both technical and business jargon to an absolute minimum. Someone who has only heard about big data in an article or two could pick up this book and follow along with ease. That means a few things are not exactly right (life expectancy is the mean years of life, but the median is described), but the descriptions are much, MUCH, easier to understand as a result.
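To see why the mean/median distinction matters for life expectancy, here is a minimal sketch in Python. The ages at death are made-up numbers chosen for illustration, not figures from the book; the point is only that a few early deaths pull the mean well below the median.

```python
from statistics import mean, median

# Hypothetical ages at death for a ten-person cohort (illustrative, not real data).
ages_at_death = [1, 62, 71, 78, 80, 83, 85, 88, 90, 92]

# Life expectancy at birth is the mean age at death.
life_expectancy = mean(ages_at_death)

# The median is the age by which half of the cohort has died.
median_age = median(ages_at_death)

print(f"mean (life expectancy): {life_expectancy}")
print(f"median age at death:    {median_age}")
```

In this toy cohort the single death in infancy drags the mean down to 73, while the median stays at 81.5, so describing one as the other changes the number noticeably.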

If you've read anything by Malcolm Gladwell (Outliers, The Tipping Point) or Stephen Baker (Final Jeopardy: The Story of Watson), you have a good idea of the difficulty and scope of Timandra Harkness's writing. If you've read any of the Freakonomics series by Dubner and Levitt, then you have an idea of the tone of "Big Data".

Data ethics is the focus of the third and final part of "Big Data". Harkness discusses the rise, pushback, and implications of mass surveillance. The cases she covers here include Stingray, a technology to locate a particular cellphone and remotely collect a startling amount of metadata and even message content from that phone. She also writes about a technology implemented in Oakland to listen for gunshots, which has also been used to listen in on conversations.

She also writes about the limits of confidentiality in the face of better and better profiling, specifically the ability to identify individual people by combining disparate datasets. This isn't always a bad thing; it can be used to find someone's medical history and pertinent medical information when they show up to an emergency room and aren't in a state to provide that information.

She talks about the perpetuation of inequalities through algorithmic decision making (e.g. for bank loans, criminal punishment and parole, job applications, and school admission), even in cases where the algorithm was designed to avoid discrimination. Ethical issues like this get rolled into a concept called 'algorithmic accountability' – as in 'who is to blame when unfair discrimination happens because of a machine's decision?'.

Other literature has discussed data ethics before, but not usually in such an applied and relevant manner. Most other work on the topic I've read puts too much focus on ethical issues that existed before big data, like abortion, or they require a background in philosophy to understand. The "Big ideas" part of Harkness's "Big Data" does neither; it gets straight to the newly emergent issues and does a good job of summarizing the problems, potential benefits, and complications surrounding them. There are large excerpts of these last few chapters that belong in, say, a statistics or computing science course on data ethics.

- Jack Davis

Thursday, 5 September 2019