LV Revealed
 
 

Commentary on Bayesian Poker

by Nick Christenson

This article was originally published in the June 2010 issue of TwoPlusTwo Magazine.

Background

The paper, Bayesian Poker, was written by Kevin Korb, Ann Nicholson, and Nathalie Jitnah at Monash University in Australia. It was published in the "Proceedings of Uncertainty in Artificial Intelligence" at Stockholm, Sweden in August 1999. The paper is available online via anonymous FTP in PostScript format. If you don't have a PostScript viewer, good free options are available for every platform.

In this article series I have previously examined papers that deal with a Bayesian approach to creating poker models. One example was Commentary on "Bayes Bluff: Opponent Modeling in Poker", published in the February 2010 TwoPlusTwo Magazine. The previous article discusses Bayesian statistics in some detail. Briefly, it's a probabilistic approach to analyzing possibilities where we adjust how we view our initial probabilities based on new information that we gain. A more complete look at Bayesian statistics can be found in the article cited above.

In the Bayesian Poker paper the authors aren't dealing directly with Bayesian statistics. Instead their poker model is based on what is called a Bayesian network. A Bayesian network is a probabilistic model that can be expressed by a directed graph that satisfies the local Markov property. Okay, that sounds really confusing, but the concepts are actually quite simple. Here's what it means.

Think of a directed graph as a flow chart where we transition from one node on the flow chart to another based on external events, such as opponent actions or newly dealt cards. By "directed" it means that there are no cycles, that is, there's no place in the flow chart where you can form a loop. By "Markov property" they mean that all the transitions from one node to another depend only on the state of the system at the particular node. See, that wasn't so bad.

Introduction

The authors' research interest is coming up with algorithms for automated reasoning in the face of uncertainty. This is why the paper was presented at the Conference on Uncertainty in Artificial Intelligence. Dealing with poker would seem to be an entirely appropriate way to approach problems in this field.

The authors take a learning approach to poker. This is much like the approach taken by the Computer Poker Research Group at the University of Alberta. This is not a game theoretic approach where we seek a mathematical solution to a restricted poker game. The authors are attempting to write a poker algorithm that can perform well in "real" poker games.

The game of choice for the Monash team is heads-up 5 card stud. This is obviously less sexy than, say, no-limit Texas hold 'em, but the game has some nice features. The fact that each player is dealt only five cards means a smaller game space than seven card poker games such as Texas hold 'em. Having only one unknown card means that there are fewer degrees of freedom for estimating opponent hands. Also, having no shared board means that relative hand strengths to the board are less of an issue. I suspect another influence on the choice of games was that the paper was written before the emergence of hold'em as the pre-eminent form of poker.

Five-card Stud Poker

The second section of the paper describes the rules of 5 card stud. As is common for academic papers, the game description contains both much more information than a serious poker player needs and, almost inevitably, overlooks some information that we would consider important.

In this case, it appears that each player antes one unit, then they are playing limit poker with a one unit bet or raise on each round. There is no bring-in, and the high hand always acts first. A bet and up to three raises is allowed.

The authors provide equations to define the call/fold boundary. However, they don't provide equivalent formulations for check/bet or call/raise boundaries. These are harder to solve explicitly, especially for earlier betting rounds, but they are crucially important. I wish they had included details on how their algorithm makes these decisions.

The authors also discuss some exploitive play details. They briefly mention tells, which they won't use in their model for obvious reasons, as well as the possibility of gaining information from opponent betting patterns. Many of these issues overlap with those discussed in another article in this series, Commentary on "On the Usefulness of Opponent Modeling: the Kuhn Poker case study" from the June 2010 TwoPlusTwo Magazine.

Bayesian Network for Poker

Section 3 of the paper provides a description of the architecture for the software they developed, which they call "BPP". This is mapped out in Figure 1 of the paper. This is the "flow chart" I alluded to earlier in this article. Note that there are no cycles in the graph. Readers are encouraged to consider what has to be the case at every decision point in order to satisfy the Markov property.

In the last paragraph of section 3.1 the authors list two assumptions they use in their model. The first is that the hand types are independent. That is, they don't take into account cards that have already been dealt. For 5 card stud this probably isn't a horrible assumption. Certainly, we can come up with cases where this assumption breaks down, but it's not a horrendous oversimplification.

The second assumption is that an opponent's actions are solely based on hand strength. I believe this one will have a bigger effect on the algorithm's EV. Even the most simple 5 card stud player will be less likely to bet a split pair of nines into a board consisting of three overcards than if one's opponent shows nothing higher than an eight. A player who does not allow for this behavior in opponents isn't going to perform very well against more sophisticated players.

In the Hand Types subsection the authors divide the space of all possible hands into a few categories. Once again, those who have been following this article series for a while will immediately notice that we've seen this abstraction before. The concept of grouping hands into bins was discussed in the article, Commentary on "Abstraction Methods for Game Theoretic Poker", from the June 2009 TwoPlusTwo Magazine.

The Monash researchers first tried a nine bin system. Apparently, it didn't work well. One bin was designated for all no-pair hands. Since these hands make up more than 50% of all five card deals, this is obviously going to be a problem. Since their Table 1 lists nine different hand types, even though the authors don't say, I strongly suspect these account for their nine bins. They adapted their system to use 17 bins, and they seem much happier with the results.

Frankly, I suspect nine bins would have been sufficient if they had divided their hands more judiciously. It's my opinion that one bin is probably sufficient for all hands that are a high two pair (with the top pair higher than any opponent card) or better, for example. Obviously, with half of all hands consisting of no pair, assigning them all to one bin is inappropriate.

Next we're treated to a description of what the authors call their Conditional Probability Matrices. These represent the model's view of the probability of betting, calling, and folding on the next round based on past behavior and prior action. These are learned values weighted by experience. As was mentioned earlier, the authors use only hands that reach showdown to assemble this data.

I believe this is a substantial mistake by the authors. First, hands that reach showdown are not at all representative of the hands that are played. Second, a great deal of information is available in non-showdown hands, at least in terms of call/fold, raise/call, and bet/check frequencies, among others. Under most circumstances, extrapolating hand ranges from betting information without being able to see the hand can be difficult. Usually, this is accomplished by using Bayesian statistics, and it can be a serious challenge to get right. In this case, though, the authors already have a Bayesian network set up! They're in an ideal position to exploit this information. I can't understand why they wouldn't.

Randomization

What the authors are talking about in their randomization section is using a randomizing influence to add deception to the game. The problem is that randomizing bet/call or call/fold decisions along the boundary between these two actions really doesn't add any deception and has very little effect on earn rate. Basically, the authors are bluffing and slow playing with the wrong hand categories. The curves they use to randomize actions along these boundaries are depicted in Figure 2.

Despite my criticisms of their methodology in designing these functions, they do have some good features that can be determined by inspection of Figure 2. For one thing, the peak of the "call" curve comes with d < 0. Because of the antes one needs to play more than half of all hands. This requirement is also in evidence by the area under the fold curve being less than half of the total probability space. Still, though, the fold/call ratio represented by this figure is almost certainly too large, especially if there has been betting on every street.

The authors admit to a problem with their betting model in the "Bluffing" subsection. They seem to believe that this function is partially fulfilled by their randomization, but I disagree. Nonetheless, they add additional bluffs, bluffing 5% of the time on the end.

Most experienced poker players will realize that for this game that's just not often enough. Assuming there is exactly one bet and a call on each betting round before the river, there will be 8 bets in the pot (three for each player plus the two antes). Mathematics of Poker and other sources tell us that it's optimal to bluff about 1/9 of the time when facing this situation, that the optimal bluffing ratio will be the bet size divided by the sum of the bet size and pot size. Of course, better and worse situations will arise depending on the composition of board cards.

The Bayesian poker model presented here also doesn't cover semi-bluffing. It's true that there aren't as many good opportunities for semi-bluffing in 5 card stud as there are in games such as hold 'em or 7 card stud, but it's still a part of the game.

The bottom line is that while the authors of this paper may have a firm grasp of Bayesian networks and learning A.I., their understanding of poker is not very sophisticated at all. There are certainly several modifications they could make to their model that I believe would improve its play significantly.

Experimentation

In section 5 they evaluate how their model actually plays against several opponents. Note that where they use the term "games", we would use the word "hands". Their model does well against an inferior non-learning opponent, but for even pathological models it's usually straightforward to create a worse opponent. We don't know much about this opponent, so I don't pay these results much credence.

The authors' model also wins against a simple rule-based model. At least here we can see what the algorithm the authors used looks like (Figure 4.) Some simple tweaks to this algorithm could make it much stronger, but it's not horrendous for thirteen lines of pseudo-code.

Against human opponents the algorithm lost, but not much. However, there are reasons to suspect that this result doesn't exactly demonstrate BPP's poker prowess. The best of the players is identified as an "experienced" poker player "who has frequently and successfully played amateur poker," whatever that means.

The authors lament that their learning algorithm didn't seem to make much of an impact. I believe that their model was hamstrung by some invalid assumptions about poker, and that the researchers don't have a good understanding about poker learning. For example, they note that BPP's win rate doesn't seem to change significantly after about 200 hands worth of information. That's the equivalent information of about six hours of full, live ring game observation. My opponent models aren't changing very rapidly after this much observation either.

Future Work

The authors suggest a number of areas in which they believe their model could be enhanced. I want to comment on a couple of these.

They suggest refining their hand types further, presumably by adding more "bins" for various hand types. My guess is that more bins isn't necessary, and that fewer bins is likely to be worthwhile. I suspect, though, that they need some different bins. Another avenue of attack that might be fruitful is to make the bins context dependent. That is, a split pair of eights is a lot stronger when our opponent is showing no card higher than a seven than if multiple overcards are apparent.

The authors also suggest that they might improve their network structure. I don't think card removal effects are likely to be large. However, board strength is a much bigger factor. Some algorithm that takes this into account and encourages semi-bluffing may produce some benefit. The authors are worried about driving opponents out with strong hands. When making an uncalled single unit bet on second street can yield two units of profit, I don't think this is a bad thing. Given the structure of the game they're playing, I expect that very aggressive play will be optimal.

They also suggest that adding bluffing to the opponent model might be worthwhile. Personally, I think they need to completely revamp their understanding of bluffing in general. There's little, if any, need to randomize play along decision boundaries, and bluffs should come from the hand range that is credibly strong but actually weak.

BPP also seems to suffer from a lack of multi-street context. That is, I expect it to bet a moderately strong hand on one street, be raised by its opponent, call, and then be quite likely to bet out again on the next street. I recommend incorporating a Bayesian analysis of an opponent's hand range based on behavior over the course of the hand.

Online Play

Since the original paper, several of the Monash folks have made some improvements to BPP. It now plays limit Texas hold 'em, and was entered into the 2006 computer poker competition. Since the web site intended to house the results for these events is currently undergoing some rework, I don't know how well it performed.

Anyone can play against the hold 'em version of the software online at: http://www.csse.monash.edu.au/~stevenm/texasholdem/texasholdem.html. The interface could use a little work, but that's a tangental issue to the issues discussed in this article.

Personally, I didn't find the program at all challenging. I believe it contains many of the faults I've pointed out earlier in this article. It doesn't seem to adjust its perceptions of the strength of its hand based on the board cards. That is, a small flush with three board cards seems to be treated the same as a small flush with four board cards. I also don't see evidence that it distinguishes between having two pair with an unpaired and paired board. As yet another example, I don't think it adjusts its perception of what its opponent is likely to be holding between betting rounds. In my opinion, while these shortcomings are problematic in 5 card stud, they're absolutely disastrous in Texas hold 'em.

Conclusion

Obviously, this research was carried out by folks better versed in their academic fields than in poker. That shouldn't be read as a knock on them, building a great poker algorithm wasn't their primary goal. Poker is just a means to help them understand some fundamental issues in artificial intelligence research.

Despite some weaknesses, the model they've adopted contains some interesting ideas that are worthy of consideration or adoption, even by models that already play superior poker. Just because the researchers programmed their algorithm using some significant misconceptions about the game of poker says nothing bad about the basis for their research.

Moreover, I believe there is a lot to be learned from the misconceptions their model demonstrates. It provides significant insight into the process, and especially in the pitfalls, of implementing a successful poker algorithm. Parameterizing the understanding that skilled poker players possess is not easy. When the researchers in question fall short, we have the opportunity to think about how we would improve on their results, and methods for doing so are not always as obvious as they would first appear. I believe this process helps us think about how difficult it is to create a comprehensive poker model. I hope I can use this analysis to better understand and improve how I play.