
Wednesday 30 September 2020

Review of The Theory of Gambling and Statistical Logic


There are two reasons why I read The Theory of Gambling and Statistical Logic, Second Edition (2009), by Richard A. Epstein, and they dictated which of the book's 440 pages I paid attention to and which I skimmed.


First, to learn more about the fundamentals of betting strategy for my current job at Sportlogiq. Second, to get material to include in a possible future Statistics and Gambling course.

Why this book and not something else?


Other material on the topic, like the book "Weighing the Odds in Sports Betting" by King Yao, covers a lot of the practicalities of working with casinos and some of the ways to find an edge in a few particular sports and markets. The podcasts "Guys & Bets Sports Betting" by OddsShark and "Hot Takedown" by FiveThirtyEight, while entertaining, are too specific and time-sensitive for learning the science of smart gambling. Research papers have the theoretical detachment and staying power to offer useful insights, but they're very narrow. Papers give you the right answers; books give you the right questions.


I won't reveal here much of what I learned for work, but I'll share all my notes on the teaching possibilities.




The Parrondo Principle


The Parrondo Principle is basically this: you can turn two losing games into a winning game if you are allowed to choose which rule set to use at each step of the game. A winning strategy might even be to choose between the rule sets at random.
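To see this happen, here is a minimal sketch of the standard capital-dependent version of Parrondo's games from the general literature, not necessarily the version in Epstein's book. Assumptions: game A is a coin that wins with probability 1/2 - eps; game B uses a bad coin (1/10 - eps) when your capital is a multiple of 3 and a good coin (3/4 - eps) otherwise; eps = 0.005. Instead of simulating, it finds the long-run distribution of capital mod 3 and the average gain per play. Choosing A or B by a fair coin flip is modelled by averaging their win probabilities in each state.

```python
EPS = 0.005  # small bias that makes each game a loser on its own (assumed value)

def stationary(p, iters=5000):
    """Long-run distribution of capital mod 3, where p[i] is the win
    probability when capital mod 3 == i (win: +1, loss: -1)."""
    pi = [1 / 3] * 3
    for _ in range(iters):
        new = [0.0] * 3
        for i in range(3):
            new[(i + 1) % 3] += pi[i] * p[i]        # a win moves capital up by 1
            new[(i - 1) % 3] += pi[i] * (1 - p[i])  # a loss moves it down by 1
        pi = new
    return pi

def gain_per_play(p):
    """Average long-run change in capital per play."""
    pi = stationary(p)
    return sum(pi[i] * (2 * p[i] - 1) for i in range(3))

game_a = [0.5 - EPS] * 3
game_b = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]
# Picking A or B at random each turn averages their win probabilities per state.
mixed = [(a + b) / 2 for a, b in zip(game_a, game_b)]

for name, p in [("A", game_a), ("B", game_b), ("random A/B", mixed)]:
    print(f"{name}: {gain_per_play(p):+.4f} per play")
```

With these numbers, A and B each lose a little per play, while the random mixture comes out ahead, because mixing changes how often you sit on the bad coin of game B.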


According to the book, it's a theoretical breakthrough and the next step beyond minimax strategies. Aside from a stock-trading strategy, though, I couldn't see its practicality. Most of the examples were contrived mathematical novelties, but this is an idea to watch in the future.



The Epaminondas system


The Epaminondas system (with examples in the roulette and craps sections) detects gambling edges (say, from a worn-out roulette wheel or imperfect dice) using extreme values. It needs less data to reach statistical significance than the more classical chi-squared test, which compares the entire observed distribution against a theoretical one.
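I don't know the details of the Epaminondas system, so this is only a sketch of the general extreme-value idea: look at the single most frequent pocket on a hypothetical 38-pocket wheel and ask how surprising that count is for a fair wheel. It uses a union (Bonferroni) bound, P(max >= k) <= m * P(X >= k) with X ~ Binomial(n, 1/m), which is my choice and not necessarily the book's procedure; the function names are hypothetical.

```python
from math import exp, lgamma, log

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), using log-space terms to avoid overflow."""
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                   + j * log(p) + (n - j) * log(1 - p))
        total += exp(log_pmf)
    return min(total, 1.0)

def max_count_pvalue(counts):
    """Upper bound on the p-value of the largest count, assuming all outcomes are equally likely."""
    m, n = len(counts), sum(counts)
    return min(1.0, m * binom_sf(max(counts), n, 1 / m))

fair = [100] * 38             # every pocket hit about as often as expected
biased = [100] * 37 + [160]   # one pocket hit far too often
print(max_count_pvalue(fair), max_count_pvalue(biased))
```

The test only looks at one number per sample, which is why it can flag a single hot pocket without needing enough data to pin down the whole distribution.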




Utility Curves


Pages 46-47 talk about utility curves and describe a person's utility curve as having a convex (risk-seeking) component for small gains and losses, and a more traditional concave (risk-averse) component for large gains and losses. In other words, winning $10 helps more than losing $10 hurts, but winning $100,000 helps a lot LESS than losing $100,000 hurts.


The concave utility curve is used a lot in actuarial work to explain why people purchase insurance, even though on average it costs more than it pays out: some financial losses are too large to recover from.


This version of the utility curve also factors in the small emotional thrill of gambling, I suppose. Losing a small gamble is just paying for entertainment, while winning a small gamble means profiting and getting entertainment.


The amount of convexity in the small-stakes region is called 'the acceptable amount of unfairness'.
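To make the insurance point concrete, here is a sketch with log utility (my assumption; the book may use a different risk-averse curve) and made-up numbers: $100,000 of wealth, a 1% chance of losing $90,000, and a $1,000 premium, which is more than the $900 expected loss.

```python
from math import log

WEALTH = 100_000   # hypothetical starting wealth
LOSS = 90_000      # hypothetical size of the insured loss
P_LOSS = 0.01      # hypothetical chance of the loss
PREMIUM = 1_000    # hypothetical premium, above the expected loss

def utility(w):
    """Log utility: each extra dollar matters less the richer you are."""
    return log(w)

expected_loss = P_LOSS * LOSS
eu_uninsured = (1 - P_LOSS) * utility(WEALTH) + P_LOSS * utility(WEALTH - LOSS)
eu_insured = utility(WEALTH - PREMIUM)   # the insurer absorbs the loss

print(f"expected loss {expected_loss:.0f}, premium {PREMIUM}")
print(f"expected utility: uninsured {eu_uninsured:.4f}, insured {eu_insured:.4f}")
```

Paying more than the expected loss still raises expected utility, because the rare $90,000 hit costs far more utility than the small premium does.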


Wagering and Bankroll Management


Pages 60-65 are about bankroll management, including a mathematical derivation of the Kelly Stakes Criterion, a surprisingly simple system to implement that provably maximizes the long-run growth rate of your bankroll. The catch is that you need to make profitable bets, and you need to know the probability of an event happening. These profitable opportunities exist in "real-world" events like sports, races, and elections, where the market doesn't know the true probability. The Kelly Stakes Criterion doesn't tell you what to bet on, but it does tell you how much to bet, assuming you have an 'edge', meaning you know the probability better than the market.
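For a single win/lose bet at decimal odds d, with win probability p and net odds b = d - 1, the Kelly fraction is f* = (bp - (1 - p)) / b. A minimal sketch, assuming a simple win/lose bet and a probability you trust; it also checks that f* beats nearby stakes on expected log growth of the bankroll.

```python
from math import log

def kelly_fraction(p, decimal_odds):
    """Fraction of bankroll to stake on a bet that wins with probability p."""
    b = decimal_odds - 1          # net profit per unit staked if the bet wins
    f = (b * p - (1 - p)) / b
    return max(f, 0.0)            # no edge means no bet

def log_growth(f, p, decimal_odds):
    """Expected log growth of the bankroll per bet when staking fraction f."""
    b = decimal_odds - 1
    return p * log(1 + b * f) + (1 - p) * log(1 - f)

# A 55% chance at even money (decimal odds 2.0) suggests staking about 10% of the bankroll.
f_star = kelly_fraction(0.55, 2.0)
print(f_star, log_growth(f_star, 0.55, 2.0))
```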


This book covers an adjustment to Kelly based on your confidence in your probability estimate. The book also talks about other considerations, such as practical limits on the number of wagers you can make.


Beyond limited information or a limited number of bets, there are many other modifications to Kelly that the book doesn't cover. If one or more bets will take a long time to pay out, so that several bets are open at once, then a Simultaneous Kelly strategy makes sense. If you are using additional risk-management strategies like hedging, then you can wager more than Kelly would suggest on the opening part of a bet.





Bingo


Page 155 states that there are (15 choose 4) = 1365 "distinct" bingo cards, because that's the number of possible center columns. It also states that there are (15 choose 4) × (15 choose 5)^4 "possible" bingo cards. I don't know what the author means by "distinct" here, and a quick internet search didn't turn up an answer.
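Both counts are quick to check with Python's math.comb. The (15 choose k) terms ignore the order of the numbers within a column, and the center column has only 4 numbers because of the free space.

```python
from math import comb

center_columns = comb(15, 4)   # N column: 4 numbers plus the free space
other_column = comb(15, 5)     # B, I, G, and O columns: 5 numbers each
possible_cards = center_columns * other_column ** 4

print(center_columns, other_column, possible_cards)
```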


The maximum number of cards that can be played together and produce exactly one winner is 15^4. Any more than that and you're guaranteed to have more than one winning card at the same time whenever there is a winner.


Page 156 has a table of the expected number of numbers called before there is a bingo, which I've previously worked out for a single card. However, the author provides the time to bingo for N < 15^4 simultaneous cards, which is more useful if you're planning a game. It would make a cool simulation project.
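A minimal sketch of that simulation project. Assumptions (mine, not necessarily the book's): standard 75-ball cards (B 1-15, I 16-30, N 31-45 with a free center, G 46-60, O 61-75), and a bingo is any complete row, column, or main diagonal. A line is complete exactly when its last number is called, so one shuffled call order settles every card at once.

```python
import random

def random_card(rng):
    """A card as a list of its 12 winning lines, each a set of required numbers."""
    cols = []
    for c in range(5):
        nums = rng.sample(range(15 * c + 1, 15 * c + 16), 4 if c == 2 else 5)
        if c == 2:
            nums.insert(2, None)   # free space in the center
        cols.append(nums)
    rows = [[cols[c][r] for c in range(5)] for r in range(5)]
    lines = rows + cols
    lines.append([rows[i][i] for i in range(5)])      # main diagonal
    lines.append([rows[i][4 - i] for i in range(5)])  # anti-diagonal
    return [{n for n in line if n is not None} for line in lines]

def calls_until_bingo(n_cards, rng):
    """Number of balls called before the first bingo among n_cards cards."""
    order = list(range(1, 76))
    rng.shuffle(order)
    when = {ball: i + 1 for i, ball in enumerate(order)}   # call number of each ball
    cards = [random_card(rng) for _ in range(n_cards)]
    # A line finishes on its last ball; a card finishes on its first finished line.
    return min(min(max(when[n] for n in line) for line in card) for card in cards)

def average_calls(n_cards, trials=1000, seed=1):
    rng = random.Random(seed)
    return sum(calls_until_bingo(n_cards, rng) for _ in range(trials)) / trials

for n in (1, 10, 50):
    print(n, round(average_calls(n), 1))
```

Recording which kind of line won in calls_until_bingo would also let you check the row-versus-column claims below.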


On page 157, there is a quick proof that in multi-card games, if there are multiple winners in a game, the wins are more likely to be along a column than along a row. It's true, but I would never have suspected it without thinking it through.


What's even weirder is that if there's any bingo at all, it's about twice as likely to come from a row.


So column wins come all at once, but row wins are more frequent. Weird.






Dice Games


Pages 174 and 175 talk about de Méré's Problem: How many dice rolls are needed until some arbitrary outcome happens?


It's a geometric distribution, and the number of rolls n needed for the outcome to have happened at least once with probability p_n comes from a cute formula. Let


p_n = probability of the outcome happening at least once in the first n rolls.

p_1 = probability of the outcome happening on any given roll.


The outcome misses all n rolls with probability (1 - p_1)^n, so 1 - p_n = (1 - p_1)^n, which gives

$$ n = \frac{\log(1 - p_n)}{\log(1 - p_1)} $$


so it takes


$$ n = \frac{\log(1 - 1/2)}{\log(1 - 1/6)} = \frac{\log 2}{\log 6 - \log 5} \approx 3.802 $$


rolls of a 6-sided die until there's a 50% chance of having rolled a 6, which in practice means 4 rolls. It's not complicated, but it's cute and I like it.
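A quick check of n = log(1 - p_n) / log(1 - p_1) in Python, including de Méré's two classic bets: at least one 6 in four rolls of one die, and at least one double six when rolling two dice. Rounding up gives the whole number of rolls, since you can't roll 3.8 times.

```python
from math import ceil, log

def rolls_needed(p1, target):
    """Rolls n so that P(at least one success in n rolls) = target, with per-roll probability p1."""
    return log(1 - target) / log(1 - p1)

n = rolls_needed(1 / 6, 0.5)
print(round(n, 3), ceil(n))              # fractional answer, then whole rolls
print(1 - (5 / 6) ** 4)                  # chance of at least one 6 in 4 rolls
print(ceil(rolls_needed(1 / 36, 0.5)))   # rolls of two dice for a 50% chance of a double six
```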


Page 183 features James Bernoulli's game:


Step 1) Roll a die, call the result x.

Step 2) Roll x dice. Your score is the total.


If the total is more than 12, you win $1.

If the total is exactly 12, you neither win nor lose.

If the total is 11 or fewer, you lose $1.


What's unusual about this game is that E[score] = 12.25, but E[payout] < 0.


Specifically, Pr[score > 12] = 0.468, Pr[score < 12] = 0.484.
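These numbers can be checked exactly by building the distribution of the total with fractions: convolve the dice for each possible x, then weight each x by 1/6. A sketch; the helper name is my own.

```python
from fractions import Fraction

def dice_sum_dist(n):
    """Exact distribution of the total of n fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0) + prob / 6
        dist = new
    return dist

# Step 1 picks x uniformly from 1..6; step 2 totals x dice.
score = {}
for x in range(1, 7):
    for total, prob in dice_sum_dist(x).items():
        score[total] = score.get(total, 0) + prob / 6

mean = sum(t * p for t, p in score.items())
p_win = sum(p for t, p in score.items() if t > 12)
p_lose = sum(p for t, p in score.items() if t < 12)
print(float(mean), float(p_win), float(p_lose), float(p_win - p_lose))
```

The mean is pulled above 12 by the large totals that only happen when x is 5 or 6, while the small values of x almost guarantee a loss, so the median sits below 12.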


This is a good lesson on how over/unders and point spreads should be built around the median outcome, not the mean outcome.


It's not in the book, but high-variance strategies when losing in sports work on a similar principle. When a hockey team that's behind by 1 pulls their goalie, the average number of goals they will lose by increases, but so does their chance of winning.






Page 184 talks about the optimal solution for a 'maximum roll' game that seems like a proto-Yahtzee. In this game, five dice are rolled, at least one die is kept, or 'frozen', and the rest are re-rolled. This process is repeated with the remaining dice until every die is frozen.


The goal is to get the highest average sum.


The optimal strategy can be found with the dynamic programming approach summarized below:


If you roll two dice, you can freeze either one or both of the dice. Since the expected score from rolling a d6 is 3.5, the optimal strategy is to keep both when both are in {4,5,6}, and to re-roll the lower die otherwise. This strategy produces an expected score of 8.236.


If you roll three dice, you can freeze one, two, or all three.


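The best choice compares keeping all three, re-rolling the lowest die (worth 3.5 on average), and re-rolling the lowest two (worth 8.236 on average). That works out to: if the middle die is 4 or less, re-roll the lowest two and use the two-dice strategy. Otherwise, re-roll the lowest die if it's 3 or less, and keep all three if not. This strategy produces an expected score of 13.425.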


A similar strategy produces an expected score of 18.844 for 4 dice, and 24.442 for 5 dice.
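The dynamic programming idea fits in a few lines: with k dice, try freezing the top j (at least one) and add the best expected value for the remaining k - j dice. Freezing anything other than the highest dice is never better, so only j needs to be searched. A brute-force sketch over all 6^k rolls, which is cheap for five dice; it should reproduce the 8.236 and 13.425 figures above.

```python
from itertools import product

def max_roll_values(max_dice=5):
    """values[k] = best expected total with k dice still to roll, for k = 0..max_dice."""
    values = [0.0]
    for k in range(1, max_dice + 1):
        total = 0.0
        for roll in product(range(1, 7), repeat=k):
            dice = sorted(roll, reverse=True)
            # Freeze the j highest dice (j >= 1) and continue with the other k - j.
            total += max(sum(dice[:j]) + values[k - j] for j in range(1, k + 1))
        values.append(total / 6 ** k)
    return values

for k, v in enumerate(max_roll_values()):
    print(k, round(v, 3))
```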


Only two pages, 193 and 194, are devoted to Yahtzee. There's a handy table on the expected value of aiming for different scoring possibilities, which I assume was derived with a dynamic programming method similar to the one for the maximum-roll game in the previous example. I'm still unclear about how to handle changing your goal after one or two rolls if the results are unexpected.


Page 195 talks about non-transitive dice, or rock-paper-scissors dice. Essentially, you can assign scores to the faces of dice A, B, and C such that E[score A] = E[score B] = E[score C], but Pr(A > B), Pr(B > C), and Pr(C > A) are all more than 0.5. In short, dice A, B, and C act like rock, paper, and scissors in a competition to roll higher than your opponent.
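A sketch with one well-known set of non-transitive dice that have equal means (a standard example, not necessarily the set in the book), computing each head-to-head probability exactly over the 36 face pairs.

```python
from fractions import Fraction
from itertools import product

DICE = {
    "A": [2, 2, 4, 4, 9, 9],
    "B": [1, 1, 6, 6, 8, 8],
    "C": [3, 3, 5, 5, 7, 7],
}

def p_beats(x, y):
    """Probability that die x rolls strictly higher than die y."""
    wins = sum(a > b for a, b in product(DICE[x], DICE[y]))
    return Fraction(wins, 36)

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"P({x} > {y}) = {p_beats(x, y)}")
print({name: Fraction(sum(faces), 6) for name, faces in DICE.items()})
```

All three dice average 5, yet each one beats the next 5/9 of the time.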



Pages 200-205 talk about Craps, but it's just a game description.


Page 207 talks about the average duration of a craps hand, which is a little more substantive.
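"Duration of a hand" could mean one pass-line decision or a shooter's whole turn, and I'm not sure which the book uses, so this sketch computes both exactly. Assumed rules: on the come-out, 7 or 11 wins and 2, 3, or 12 loses; any other total sets a point that must repeat before a 7; the shooter's turn ends on a seven-out.

```python
from fractions import Fraction

# Probability of each total on two fair dice.
P = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
POINTS = [4, 5, 6, 8, 9, 10]
P7 = P[7]

# One pass-line decision: the come-out roll, plus a geometric wait for the point or a 7.
rolls_per_decision = 1 + sum(P[p] / (P[p] + P7) for p in POINTS)

# A shooter's turn: naturals, craps, and made points all lead back to a come-out roll.
# Writing E = 1 + a*E + c for the expected rolls from a come-out gives E = (1 + c) / (1 - a).
a = (P[2] + P[3] + P[12] + P[7] + P[11]) + sum(P[p] ** 2 / (P[p] + P7) for p in POINTS)
c = sum(P[p] / (P[p] + P7) for p in POINTS)
rolls_per_turn = (1 + c) / (1 - a)

print(rolls_per_decision, float(rolls_per_decision))
print(rolls_per_turn, float(rolls_per_turn))
```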


Page 209 proposes a game of craps with 8-sided dice. That's a good idea for problems and exercises in general - how do classical games change when there are dice with more sides, or unfair dice?




Pages 212-213 talk briefly about backgammon, specifically how AI has quietly become a superhuman force in the game. What makes this interesting is the random element of rolling to move; a plain minimax-based search isn't feasible the way it is in chess.


Exercise from the book: Prove that no weighting of two 6-sided dice can make Pr(sum = 2) = Pr(sum = 3) = ... = Pr(sum = 12).


Possible expansion: Is this true for n dice or d sides?
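One short argument for the two-dice exercise (my own sketch of a standard proof by contradiction), where p_i and q_i are the probabilities of face i on the first and second die, and the last step uses the AM-GM inequality:

```latex
% Suppose every total 2, ..., 12 had probability 1/11. Then
p_1 q_1 = \Pr(\text{sum} = 2) = \tfrac{1}{11}, \qquad
p_6 q_6 = \Pr(\text{sum} = 12) = \tfrac{1}{11}

% A total of 7 includes the pairs (1, 6) and (6, 1), so
\Pr(\text{sum} = 7) \ge p_1 q_6 + p_6 q_1 \ge 2\sqrt{p_1 q_6 \, p_6 q_1}
  = 2\sqrt{(p_1 q_1)(p_6 q_6)} = \tfrac{2}{11} > \tfrac{1}{11}
```

The same pattern (compare the two extreme totals with the middle one) is a natural starting point for the n-dice or d-sided expansion.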


Card Games in General


Around page 210 there is a discussion of card shuffling methods and the amount of entropy they produce.


For example, a perfectly random shuffle produces log_2(52!) ≈ 225.6 bits of entropy, because each of the 52! deck orders is equally likely. Riffle shuffling and even bridge shuffling (interleaving) produce much less entropy.


As another example, an ideal cut produces log_2(52) bits of entropy because there are 52 equally likely configurations that the deck can be in. In reality, cuts closer to the middle of the deck are more likely, so the entropy is lower.
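A sketch of these entropy numbers. Assumptions: the middle-heavy cut distribution (weights proportional to k(53 - k) for cut point k) is made up for illustration, and the riffle figure is only an upper bound, using the known fact that one riffle shuffle of n cards can produce at most 2^n - n distinct orders.

```python
from math import lgamma, log, log2

# A perfectly random shuffle: all 52! orders equally likely.
full_shuffle_bits = lgamma(53) / log(2)      # log2(52!) without building 52!
# One riffle shuffle reaches at most 2**52 - 52 orders, so at most about 52 bits.
one_riffle_max_bits = log2(2 ** 52 - 52)
# An ideal cut: 52 equally likely cut points.
ideal_cut_bits = log2(52)

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical cut that favors the middle of the deck.
weights = [k * (53 - k) for k in range(1, 53)]
total = sum(weights)
middle_heavy_cut_bits = entropy_bits([w / total for w in weights])

print(round(full_shuffle_bits, 2), round(one_riffle_max_bits, 2))
print(round(ideal_cut_bits, 3), round(middle_heavy_cut_bits, 3))
```

Any non-uniform cut distribution has less entropy than the uniform one, so the middle-heavy cut always comes in under log_2(52).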


In a course, card decks are a good place to talk about sampling with and without replacement.


The book talks briefly about a game called "roller coaster dice" where a player rolls 2d6 and then predicts if the next 2d6 roll will be higher or lower than the previous roll. A similar game can be played by drawing cards from a deck, and prediction can be improved by keeping track of the cards that have already been seen and 'removed' from the deck.


A useful programming project to demonstrate the advantage gained from accounting for sampling without replacement could be to simulate playing this game using two strategies and tracking the average number of successful guesses in a deck (in total or until the first failure).


"With replacement" strategy: guess 'greater' after drawing 2-7, 'less' after drawing 9-A, and either one after drawing an 8.


"Without replacement" strategy: guess 'greater' when more than half the remaining cards have a greater rank than the previous card, and 'less' otherwise.

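A minimal sketch of that project, counting total correct guesses through a deck. Assumptions (mine, not the book's): ranks 2 through 14 with the ace high, four of each; a tie counts as a wrong guess; the naive strategy guesses 'greater' on an 8; and the counting strategy guesses 'greater' whenever more unseen cards are higher than lower, a slight variant of the "more than half" rule since equal ranks go to neither side.

```python
import random

RANKS = list(range(2, 15)) * 4   # 2..10, J=11, Q=12, K=13, A=14; four suits

def naive_guess(card, unseen):
    """Ignore which cards are gone (unseen is unused): 'greater' on 2-8, 'less' on 9-A."""
    return "greater" if card <= 8 else "less"

def counting_guess(card, unseen):
    """Guess whichever direction more of the unseen cards go."""
    higher = sum(c > card for c in unseen)
    lower = sum(c < card for c in unseen)
    return "greater" if higher > lower else "less"

def correct_guesses(deck, strategy):
    """Total correct calls when guessing through the whole deck."""
    unseen = list(deck[1:])
    score = 0
    for current, nxt in zip(deck, deck[1:]):
        guess = strategy(current, unseen)
        if (guess == "greater" and nxt > current) or (guess == "less" and nxt < current):
            score += 1
        unseen.remove(nxt)   # the next card is now seen
    return score

rng = random.Random(2020)
decks = [rng.sample(RANKS, len(RANKS)) for _ in range(500)]
for name, strategy in [("naive", naive_guess), ("counting", counting_guess)]:
    avg = sum(correct_guesses(d, strategy) for d in decks) / len(decks)
    print(f"{name}: {avg:.2f} correct out of 51")
```

Both strategies play the same shuffled decks, so the gap between the averages comes from the strategy rather than from luck of the deal.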

Page 229 has some information on the expected number of cards you need to draw until a certain combination is encountered, which is useful for card games like Bridge, Canasta, and Shanghai, or collectible card games like Magic: The Gathering.
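A small example of this kind of calculation: the expected number of cards drawn, without replacement, until the first of k special cards (say, one of the four aces) appears. It sums P(still waiting after t cards) = C(n-k, t) / C(n, t); the closed form (n + 1) / (k + 1) is a standard result I'm adding, not something taken from page 229.

```python
from fractions import Fraction
from math import comb

def expected_draws_until_first(n, k):
    """Expected cards drawn (no replacement) until the first of k special cards in an n-card deck."""
    # E[T] = sum over t of P(T > t), and T > t means the first t cards are all ordinary.
    return sum(Fraction(comb(n - k, t), comb(n, t)) for t in range(n - k + 1))

print(expected_draws_until_first(52, 4))                  # first ace in a standard deck
print(expected_draws_until_first(60, 4))                  # a 4-of card in a 60-card deck
print(Fraction(52 + 1, 4 + 1), Fraction(60 + 1, 4 + 1))   # closed form (n + 1) / (k + 1)
```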





Pages 248-260 cover poker, both in general and Texas Hold-Em.


There are thousands and thousands of resources specifically for poker, so I only took down a few tables and references of interest.


I recorded the probability of various hands for a 5-card hand (e.g. for draw poker) and for the best 5 of a 7-card hand (e.g. Texas Hold-Em), plus the multiplier between them (which I added).



[Table: Pr(in 5 cards) and Pr(in 7 cards), plus the multiplier between them, for each hand from High Card to Royal Flush]
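The 5-card probabilities can be reproduced with standard counting arguments (the usual textbook counts, not transcribed from the book's table); the 7-card figures need a heavier enumeration, so this sketch only covers 5 cards. Straights include the ace-low straight, and "Straight Flush" excludes the royal flush.

```python
from math import comb

HANDS = {
    "Royal Flush": 4,
    "Straight Flush": 9 * 4,                   # 10 straights per suit, minus the royal
    "4 of a kind": 13 * 48,
    "Full House": 13 * comb(4, 3) * 12 * comb(4, 2),
    "Flush": 4 * comb(13, 5) - 40,             # minus straight and royal flushes
    "Straight": 10 * 4 ** 5 - 40,              # minus straight and royal flushes
    "3 of a kind": 13 * comb(4, 3) * comb(12, 2) * 4 ** 2,
    "2 Pair": comb(13, 2) * comb(4, 2) ** 2 * 11 * 4,
    "1 Pair": 13 * comb(4, 2) * comb(12, 3) * 4 ** 3,
    "High Card": (comb(13, 5) - 10) * (4 ** 5 - 4),   # no straight ranks, no flush suits
}

total = comb(52, 5)
for hand, count in HANDS.items():
    print(f"{hand:15s} {count:>9,d} {count / total:.6f}")
```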







There was also a table on 3-card hand probabilities, which are useful in games like Texas Hold-Em for the 3 cards of the flop.




[Table: probabilities of 3-card hands, from High Card to 3 of a kind]





According to Shackleford and Galfond, the following hands are worth playing pre-flop (when all you have are the two cards in front of you and no community cards) if everyone ahead of you has folded already.


Early Position:

77 88 99 TT JJ QQ KK AA



QTs QJs T9s


Which is the top 12.8% of hands

(Note T stands for 'ten', and s stands for 'same suit').



In late position, since fewer players are left to act, the playable range extends to also include:


A6s A7s A8s A9s

K8s K9s KT

Q8s Q9s QT QJ

J8s J9s JT




Which is the top 25.5% of hands.



Page 268 - Blackjack Basic Strategy.


Possible homework/lecture project: How does the basic strategy change when the count is high (i.e. there are relatively more high cards left to be played in the shoe or deck), or when there are multiple decks? What if we removed all the hearts and diamonds?


Page 271 - Card Counting in Blackjack.


Peter Griffin has a concept called EOR (Effect of Removal), which is essentially the amount that the game gets worse for the dealer (and better for you) when certain cards are removed from the deck.


When the total EOR is positive, the dealer is more likely to bust than average, and therefore the expected value of your bet improves (although it will rarely improve all the way to positive).



[Table: Effect of Removal by card rank]

On page 274, there are simple point-count systems to approximate this, such as the Hi-Lo count that the MIT Blackjack team used.
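The Hi-Lo count itself is simple: +1 for 2-6, 0 for 7-9, and -1 for tens, face cards, and aces. A sketch of the running count and the "true count" (running count divided by decks left in the shoe), which is the usual way the count is scaled for multi-deck games; the function names and the rank encoding are my own.

```python
# Ranks: 2-9 as themselves, 10 for any ten-valued card, 11 for an ace.
HI_LO = {**{r: 1 for r in range(2, 7)}, **{r: 0 for r in range(7, 10)}, 10: -1, 11: -1}

def running_count(cards_seen):
    return sum(HI_LO[c] for c in cards_seen)

def true_count(cards_seen, decks_in_shoe):
    """Running count per remaining deck."""
    decks_left = decks_in_shoe - len(cards_seen) / 52
    return running_count(cards_seen) / decks_left

one_deck = [r for r in range(2, 10) for _ in range(4)] + [10] * 16 + [11] * 4
seen = [2, 3, 10, 11, 5, 6, 4]
print(running_count(seen), round(true_count(seen, 6), 2))
```

The count is balanced (a full deck sums to zero), so a positive count means the unseen cards are rich in tens and aces.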


If you're thinking about going to a casino blackjack table to make a lot of money counting cards, consider something that will net you more money with less work, like making YouTube videos, or standing alone in a parking lot.


Horse Racing, Parimutuel


Page 300 talks about the parimutuel, or 'bet among yourselves' system used in horse racing. In a parimutuel system, the odds offered at any given point (except right after the opening odds have been offered) are based on the proportion of money backing that horse or outcome.


The formula is


(decimal odds) = (1 - vig) / (proportion of bet money on this outcome).


The vig, or proportion of bets that the racetrack takes, is typically higher in parimutuel betting than in sports betting. A typical quoted amount was 12%, so a $1 bet on a horse that had 20% of the money behind it would pay out


$1 x (1 - 0.12) / (0.20) = $4.40


As the proportion of money changes, these odds can change a great deal too. Unlike fixed-odds sports betting, your winnings are based on the closing odds, not the odds you were quoted when you placed the bet. That volatility and the maintenance of the horse track might justify the higher vig.
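A minimal sketch of the payout formula, using the 12% take quoted above and made-up pool sizes, to show how one late bet moves the price for everyone backing that horse.

```python
def decimal_odds(pool_on_outcome, total_pool, take=0.12):
    """Parimutuel payout per $1 staked, from the share of the pool on this outcome."""
    return (1 - take) * total_pool / pool_on_outcome

# Hypothetical win pool: $10,000 in total, $2,000 (20%) on our horse.
before = decimal_odds(2_000, 10_000)
# A hypothetical late $1,000 on the same horse shrinks the payout for everyone on it.
after = decimal_odds(3_000, 11_000)
print(round(before, 2), round(after, 2))
```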




References worth a follow-up:




Other older games like Primero, Trijaques, Quinquenoue, Hazard, Esperance, Troid Dez, Passedix, Rafle, Her Jeu des Tas.


David Agard and Michael Shackleford - "A New Look at the Probabilities in Bingo," The College Mathematics Journal 33:4 (2002), 301-305.


Stewart N. Ethier - Optimal Play: Mathematical Studies of Games and Gambling.


Peter Griffin - The Theory of Blackjack.


Phil Galfond - "G-Bucks Conceptualizing Money Matters," Bluff Magazine, April 2007.


Martin Gardner - Mathematical Carnival.


John D. Beasley - The Mathematics of Games.


Michael Shackleford - "Texas Hold-Em," The Wizard of Odds.


King Yao - "Weighing the Odds in Poker."


Phil Woodward - "Yahtzee: The Solution," Chance 16:1 (2002), 17-20.
