Featured post

Textbook: Writing for Statistics and Data Science

If you are looking for my textbook Writing for Statistics and Data Science, here it is for free in the Open Educational Resource Commons. Wri...

Sunday 29 September 2019

Tips to successfully web scrape with a macro

Recently, I updated the cricket simulator that I made in grad school, which entailed gathering four years of new T20I, IPL, and ODI cricket data. That's about 1000 matches, and the website ESPNcricinfo has been dramatically updated since I first scraped it. For that matter, so have the tools in R for scraping.

However, for one part, the play-by-play commentaries, nothing was working, and I ended up relying on recording and replaying mouse-and-keyboard macros. It's crude, but the load-on-scroll mechanic was just too hard to handle programmatically, even with the otherwise very powerful RSelenium.

Using macros to scrape pages is a trial-and-error process, but with the following principles, you can drastically reduce the number of trials it takes to get it right.

Alternatives to cryptocurrency for seasteads

Cryptocurrency and blockchain technology (crypto for short) are frequently cited as necessities for a functioning seastead. In the long term, if and when seasteads are large enough and economically important enough to fight for sovereignty, having your very own currency makes sense.

However, cryptocurrency works best where there is a robust internet infrastructure to support digital transactions and a critical mass of people in the area willing to trade their goods and services for the currency. In the early days of floating hamlets, though, better alternatives to crypto exist.

Friday 6 September 2019

Making Crossword Entries from Jeopardy! Before and After Clues

Before and After clues in Jeopardy! are clues pointing to two answers in which the last part of one answer is also the first part of the second answer. For example, "Supernatural kids' cartoon meets Star Wars prequel" could be a clue to both "Danny Phantom" and "The Phantom Menace", which would be shortened to "Danny Phantom Menace". 
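The joining rule above can be sketched in a few lines of Python. This is a hypothetical helper, not the code used to build the crosswords: it matches whole words case-insensitively, tries the longest overlap first, and, as an assumption suggested by the "Danny Phantom Menace" example, drops a leading "The", "A", or "An" from the second answer before looking for the overlap.

```python
from typing import Optional

# Assumption: leading articles on the second answer are ignored, as in
# "The Phantom Menace" -> "Phantom Menace".
LEADING_ARTICLES = {"the", "a", "an"}


def before_and_after(first: str, second: str) -> Optional[str]:
    """Join two answers that share words at the seam, or return None."""
    first_words = first.split()
    second_words = second.split()

    # Drop a leading article from the second answer (assumption, see above).
    if len(second_words) > 1 and second_words[0].lower() in LEADING_ARTICLES:
        second_words = second_words[1:]

    # Each answer must keep at least one word outside the shared part,
    # so the overlap is at most one word shorter than the shorter answer.
    max_overlap = min(len(first_words), len(second_words)) - 1

    # Prefer the longest shared run of words at the end/start boundary.
    for k in range(max_overlap, 0, -1):
        tail = [w.lower() for w in first_words[-k:]]
        head = [w.lower() for w in second_words[:k]]
        if tail == head:
            return " ".join(first_words + second_words[k:])
    return None
```

For example, `before_and_after("Danny Phantom", "The Phantom Menace")` gives `"Danny Phantom Menace"`, while `before_and_after("Star Wars", "The Phantom Menace")` returns `None` because no words line up at the seam.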

These are a weak spot for me in Jeopardy!, so I tried to make some more crosswords using only the Before and After clues from Jeopardy!, as found in the J-archive. It worked, sort of.

Thursday 5 September 2019

Book review: Big Data by Timandra Harkness

I picked up Big Data by Timandra Harkness solely on the strength of the testimonials from Hannah Fry and Matt Parker on the front and back covers. The 2017 printing that I read is a 300-odd-page general-interest book about recent advances in big data.

"Big Data" starts off pretty boilerplate for the topic, with a lot of definitions of what makes data "big": volume, variety, velocity, and the like. It also gives some historical context about the growth of data over time, from early censuses through primitive computers to today. The rest of the book draws on interviews with people around the world working on different big data projects.