Commentary on Bayesian Poker
by Nick Christenson
This article was originally published in the June 2010 issue of
TwoPlusTwo
Magazine.
Background
The paper, Bayesian Poker, was written by Kevin Korb,
Ann Nicholson, and Nathalie Jitnah at Monash University in
Australia. It was published in the "Proceedings of Uncertainty in
Artificial Intelligence" at Stockholm, Sweden in August 1999.
The paper is available
online via anonymous FTP in PostScript format. If you don't have
a PostScript viewer, good free options are available for every platform.
In this article series I have previously examined papers that deal
with a Bayesian approach to creating poker models. One example
was Commentary
on "Bayes Bluff: Opponent Modeling in Poker",
published in the February 2010 TwoPlusTwo Magazine. The previous
article discusses Bayesian statistics in some detail. Briefly, it's
a probabilistic approach to analyzing possibilities where we adjust
how we view our initial probabilities based on new information that
we gain. A more complete look at Bayesian statistics can be found
in the article cited above.
In the Bayesian Poker paper the authors aren't dealing
directly with Bayesian statistics.
Instead their poker model is based on what is called a Bayesian network.
A Bayesian network is a probabilistic model that can be expressed by a
directed graph that satisfies the local Markov property. Okay, that
sounds really confusing, but the concepts are actually quite simple.
Here's what it means.
Think of a directed graph as a flow chart where we transition from one
node on the flow chart to another based on external events, such as
opponent actions or newly dealt cards. By "directed" it means that there
are no cycles, that is, there's no place in the flow chart where you can
form a loop. By "Markov property" they mean that all the transitions from
one node to another depend only on the state of the system at the particular
node. See, that wasn't so bad.
Introduction
The authors' research interest is coming up with algorithms for automated
reasoning in the face of uncertainty. This is why the paper was presented
at the Conference on Uncertainty in Artificial Intelligence. Dealing
with poker would seem to be an entirely appropriate way to approach
problems in this field.
The authors take a learning approach to poker. This is much like the
approach taken by the Computer Poker Research Group at the University
of Alberta. This is not a game theoretic approach where we seek a
mathematical solution to a restricted poker game. The authors are
attempting to write a poker algorithm that can perform well in "real"
poker games.
The game of choice for the Monash team is heads-up 5 card stud. This
is obviously less sexy than, say, no-limit Texas hold 'em, but the game has
some nice features. The fact that each player is dealt only five cards
means a smaller game space than seven card poker games such as Texas
hold 'em. Having only one unknown card means that there are fewer
degrees of freedom for estimating opponent hands. Also, having no
shared board means that relative hand strengths to the board are less
of an issue. I suspect another influence on the choice of games was
that the paper was written before the emergence of hold'em as the
pre-eminent form of poker.
Five-card Stud Poker
The second section of the paper describes the rules of 5 card stud.
As is common for academic papers, the game description contains both
much more information than a serious poker player needs and, almost
inevitably, overlooks some information that we would consider important.
In this case, it appears that each player antes one unit, then
they are playing limit poker with a one unit bet or raise on each
round. There is no bring-in, and the high hand always acts first.
A bet and up to three raises is allowed.
The authors provide equations to define the call/fold boundary.
However, they don't
provide equivalent formulations for check/bet or call/raise boundaries.
These are harder to solve explicitly, especially for earlier betting
rounds, but they are crucially important. I wish they had included
details on how their algorithm makes these decisions.
The authors also discuss some exploitive play details. They briefly
mention tells, which they won't use in their model for obvious reasons,
as well as the possibility of gaining information from opponent betting
patterns. Many of these issues overlap with those discussed in another
article in this series,
Commentary
on "On the Usefulness of Opponent Modeling: the Kuhn Poker case
study" from the June 2010 TwoPlusTwo Magazine.
Bayesian Network for Poker
Section 3 of the paper provides a description of the architecture for the
software they developed, which they call "BPP". This is mapped out in
Figure 1 of the paper. This is the "flow chart" I alluded to earlier in this
article. Note that there are no cycles in the graph. Readers are encouraged
to consider what has to be the case at every decision point in order to
satisfy the Markov property.
In the last paragraph of section 3.1 the authors list two assumptions
they use in their model. The first is that the hand types are independent.
That is, they don't take into account cards that have already been dealt.
For 5 card stud this probably isn't a horrible assumption. Certainly, we
can come up with cases where this assumption breaks down, but it's not
a horrendous oversimplification.
The second assumption is that an opponent's actions are solely based on
hand strength. I believe this one will have a bigger effect on the
algorithm's EV. Even the most simple 5 card stud player will be less
likely to bet a split pair of nines into a board consisting of three
overcards than if one's opponent shows nothing higher than an eight. A
player who does not allow for this behavior in opponents isn't going to
perform very well against more sophisticated players.
In the Hand Types subsection the authors divide the space of all possible
hands into a few categories. Once again, those who have been following
this article series for a while will immediately notice that we've seen
this abstraction before. The concept of grouping hands into bins was
discussed in the article,
Commentary
on "Abstraction Methods for Game Theoretic Poker", from the
June 2009 TwoPlusTwo Magazine.
The Monash researchers first tried a nine bin system. Apparently, it
didn't work well. One bin was designated for all no-pair hands.
Since these hands make up more than 50% of all five card deals, this
is obviously going to be a problem. Since their Table 1 lists nine
different hand types, even though the authors don't say, I strongly
suspect these account for their nine bins. They adapted their system
to use 17 bins, and they seem much happier with the results.
Frankly, I suspect nine bins would have been sufficient if they had
divided their hands more judiciously. It's my opinion that one bin
is probably sufficient for all hands that are a high two pair (with
the top pair higher than any opponent card) or better, for example.
Obviously, with half of all hands consisting of no pair, assigning
them all to one bin is inappropriate.
Next we're treated to a description of what the authors call their
Conditional Probability Matrices. These represent the model's view
of the probability of betting, calling, and folding on the next round
based on past behavior and prior action. These are learned values
weighted by experience. As was mentioned earlier, the authors use
only hands that reach showdown to assemble this data.
I believe this is a substantial mistake by the authors. First, hands
that reach showdown are not at all representative of the hands that
are played. Second, a great deal of information is available in
non-showdown hands, at least in terms of call/fold, raise/call, and
bet/check frequencies, among others. Under most circumstances,
extrapolating hand ranges from betting information without being able
to see the hand can be difficult. Usually, this is accomplished by
using Bayesian statistics, and it can be a serious challenge to get
right. In this case, though, the authors already have a Bayesian network
set up! They're in an ideal position to exploit this information. I
can't understand why they wouldn't.
Randomization
What the authors are talking about in their randomization section is
using a randomizing influence to add deception to the game.
The problem is that randomizing bet/call or call/fold decisions
along the boundary between these two actions really doesn't add any
deception and has very little effect on earn rate. Basically, the
authors are bluffing and slow playing with the wrong hand categories.
The curves they use to randomize actions along these boundaries are
depicted in Figure 2.
Despite my criticisms of their methodology in designing these functions,
they do have some good features that can be determined by inspection
of Figure 2. For one thing, the peak of the "call" curve comes with
d < 0. Because of the antes one needs to play more than half of all
hands. This requirement is also in evidence by the area under the fold
curve being less than half of the total probability space. Still, though,
the fold/call ratio represented by this figure is almost certainly too
large, especially if there has been betting on every street.
The authors admit to a problem with their betting model in the "Bluffing"
subsection. They seem to believe that this function is partially fulfilled
by their randomization, but I disagree. Nonetheless, they add additional
bluffs, bluffing 5% of the time on the end.
Most experienced poker players will realize that for this game that's
just not often enough. Assuming there is exactly one bet and a call
on each betting round before the river, there will be 8 bets in the pot
(three for each player plus the two antes). Mathematics of
Poker and other sources tell us that it's optimal to bluff about
1/9 of the time when facing this situation, that the optimal bluffing
ratio will be the bet size divided by the sum of the bet size and pot
size. Of course, better and worse situations will arise depending on
the composition of board cards.
The Bayesian poker model presented here also doesn't cover semi-bluffing.
It's true that there aren't as many good opportunities for semi-bluffing
in 5 card stud as there are in games such as hold 'em or 7 card stud,
but it's still a part of the game.
The bottom line is that while the authors of this paper may have a firm
grasp of Bayesian networks and learning A.I., their understanding of poker
is not very sophisticated at all. There are certainly several modifications
they could make to their model that I believe would improve its play
significantly.
Experimentation
In section 5 they evaluate how their model actually plays against several
opponents. Note that where they use the term "games", we would use the
word "hands". Their model does well against an inferior non-learning
opponent, but for even pathological models it's usually straightforward
to create a worse opponent. We don't know much about this opponent, so
I don't pay these results much credence.
The authors' model also wins against a simple rule-based model. At
least here we can see what the algorithm the authors used looks like
(Figure 4.) Some simple tweaks to this algorithm could make it much
stronger, but it's not horrendous for thirteen lines of pseudo-code.
Against human opponents the algorithm lost, but not much. However, there
are reasons to suspect that this result doesn't exactly demonstrate BPP's
poker prowess. The best of the players is identified as an "experienced"
poker player "who has frequently and successfully played amateur poker,"
whatever that means.
The authors lament that their learning algorithm didn't seem to make
much of an impact. I believe that their model was hamstrung by
some invalid assumptions about poker, and that the researchers don't
have a good understanding about poker learning. For example, they note
that BPP's win rate doesn't seem to change significantly after about 200
hands worth of information. That's the equivalent information of about
six hours of full, live ring game observation. My opponent models
aren't changing very rapidly after this much observation either.
Future Work
The authors suggest a number of areas in which they believe their model
could be enhanced. I want to comment on a couple of these.
They suggest refining their hand types further, presumably by adding
more "bins" for various hand types. My guess is that more bins isn't
necessary, and that fewer bins is likely to be worthwhile. I suspect,
though, that they need some different bins. Another avenue of attack
that might be fruitful is to make the bins context dependent. That is,
a split pair of eights is a lot stronger when our opponent is showing no
card higher than a seven than if multiple overcards are apparent.
The authors also suggest that they might improve their network structure.
I don't think card removal effects are likely to be large. However,
board strength is a much bigger factor. Some algorithm that takes this
into account and encourages semi-bluffing may produce some benefit.
The authors are worried about driving opponents out with strong hands.
When making an uncalled single unit bet on second street can yield two units
of profit, I don't think this is a bad thing. Given the structure of the
game they're playing, I expect that very aggressive play will be optimal.
They also suggest that adding bluffing to the opponent model might be
worthwhile. Personally, I think they need to completely revamp their
understanding of bluffing in general. There's little, if any, need to
randomize play along decision boundaries, and bluffs should come from
the hand range that is credibly strong but actually weak.
BPP also seems to suffer from a lack of multi-street context. That is,
I expect it to bet a moderately strong hand on one street, be raised
by its opponent, call, and then be quite likely to bet out again on the
next street. I recommend incorporating a Bayesian analysis of an opponent's
hand range based on behavior over the course of the hand.
Online Play
Since the original paper, several of the Monash folks have made some
improvements to BPP. It now plays limit Texas hold 'em, and was entered
into the 2006 computer poker competition. Since the web site intended
to house the results for these events is currently undergoing some rework,
I don't know how well it performed.
Anyone can play against the hold 'em version of the software online
at:
http://www.csse.monash.edu.au/~stevenm/texasholdem/texasholdem.html.
The interface could use a little work, but that's a tangental issue
to the issues discussed in this article.
Personally, I didn't find the program at all challenging. I believe it
contains many of the faults I've pointed out earlier in this article.
It doesn't seem to adjust its perceptions of the strength of its hand
based on the board cards. That is, a small flush with three board cards
seems to be treated the same as a small flush with four board cards. I
also don't see evidence that it distinguishes between having two pair with
an unpaired and paired board. As yet another example, I don't think it
adjusts its perception of what its opponent is likely to be holding between
betting rounds. In my opinion, while these shortcomings are problematic
in 5 card stud, they're absolutely disastrous in Texas hold 'em.
Conclusion
Obviously, this research was carried out by folks better versed in
their academic fields than in poker. That shouldn't be read as a knock
on them, building a great poker algorithm wasn't their primary goal.
Poker is just a means to help them understand some fundamental issues
in artificial intelligence research.
Despite some weaknesses, the model they've adopted contains some interesting
ideas that are worthy of consideration or adoption, even by models that
already play superior poker. Just because the researchers programmed
their algorithm using some significant misconceptions about the game
of poker says nothing bad about the basis for their research.
Moreover, I believe there is a lot to be learned from the misconceptions
their model demonstrates. It provides significant insight into the process,
and especially in the pitfalls, of implementing a successful poker algorithm.
Parameterizing the understanding that skilled poker players possess is
not easy. When the researchers in question fall short, we have the
opportunity to think about how we would improve on their results, and
methods for doing so are not always as obvious as they would first appear.
I believe this process helps us think about how difficult it is to create
a comprehensive poker model. I hope I can use this analysis to better
understand and improve how I play.
|