Commentary on "Bayes' Bluff: Opponent Modeling in Poker"
by Nick Christenson
This article originally appeared in the February 2010 issue of
TwoPlusTwo Magazine.
Preliminaries
Last time around I discussed a paper on optimal strategies for poker
from the University of Alberta's Computer Poker Research Group (CPRG).
This month I examine a paper
from the same group on the issue of opponent modeling. It is titled,
"Bayes'
Bluff: Opponent Modeling in Poker". It was presented at the
21st Conference on Uncertainty in Artificial Intelligence (UAI)
in 2005.
There are two ways to approach creating a winning poker algorithm,
and both are important in understanding the game. The first is trying
to play the game optimally, that is, trying to "solve" the game. However,
as was discussed in the last article in this series, by no means does
playing optimally guarantee that one will win the most money. It just
guarantees you won't lose (or, in the case of negative-sum games, you
will do no worse than lose your share of the rake.) In order to maximize
profits against opponents that don't play optimally, one must understand
their strategy and then compute a response against it that maximizes
profit. The first step in doing this, understanding opponents' strategy,
is called "opponent modeling".
Bayesian Poker
One of the ways to model an opponents strategy is using a mathematical
technique called Bayesian statistics. This is based around
Bayes theorem.
It's easy to misunderstand what Bayes theorem actually says, so I'll try
to explain it as simply as I can.
Suppose there are two events, A and B, and they influence each other.
If we find out that B has occurred, then the probability that A will
also occur depends not only on the probabilistic relationship between
A and B, but also on the probability that A will occur independently
of the information that B has occurred.
Here's an example. Suppose we know that sometimes after it rains, it
also snows. Suppose we go outside and see that it's snowing? What
does that tell us about whether it was raining earlier? If we know
the odds of it snowing after it rains, that's only part of the picture.
We also need to know the odds that it could just snow without raining.
If we have both of these pieces of information, then using Bayesian
statistics we can make a prediction about the likelihood that it was
raining earlier.
In the abstract case, if we don't know anything about whether or not
B occurred, then in a vacuum we may still make a prediction about the
probability that A will occur. In the example this corresponds to the
probability that it might have just rained today. This probability
distribution is called a "prior", as it's what we think the likelihood
of some event is "prior" to having any additional information that would
influence our opinion.
If we know B, or whether or not it's snowing in the example, that can change
our estimate of whether or not it might have been raining earlier. This
new probability distribution is called the "posterior" (meaning "after").
Yes, statistics students make jokes about this. On rare occasion they're
even funny. I bring this whole thing up because in the "Opponent Modeling"
paper I'm discussing, if you don't understand what "prior" and "posterior"
mean you'll be completely lost.
Sometimes Bayes' theorem sounds like simple if/then conditional
probabilities. It's more than that. This can be confusing. In any
case, this technique can be used to model poker opponents. If we know
what players show down and what percentage of the time they adopt various
betting strategies, we can make some estimates about how they would
play various hands. From that make predictions about what hand
ranges they might hold.
Introduction
The paper begins with the authors setting the stage for what
they're going to demonstrate. Assuming our opponent plays
sub-optimally, to maximize profit we must take two steps:
- Figure out our opponent's strategies.
- Determine a best response to their plays, including their mistakes.
Neither step is trivial, and both are discussed in this paper.
Several approaches to this problem are possible, and good solutions are
difficult. This difficulty has several causes. First, we won't ever
have complete knowledge of our opponents' strategy. This is due both to
the fact that we'll never have enough information to fill in all the
gaps, and it's unlikely that our opponent will continue to play the
way they always have.
Poker
The second section of the paper is more routine for poker players than
it is for mathematicians or artificial intelligence researchers. The
authors first define the characteristics of a generic hold 'em game: Each
player is dealt one or more hole cards, there will be one or more board
cards, there will be two or more rounds of betting, and between rounds
of betting one or more additional board cards will be revealed. The
authors are examining two-player limit Texas hold 'em, as has traditionally
been the focus of the CPRG folks.
In addition to looking at Texas hold 'em, they also examining an
abbreviated game, which they call "Leduc hold 'em". Leduc is a small
town near Edmonton, near the home of the University of Alberta and the
CPRG. This game is similar in purpose to Rhode Island hold 'em, which
was explained in
an
earlier article. The naming is essentially the same joke. I guess
the difference is that Canada doesn't have any truly small provinces.
For research purposes, the nice thing about Leduc (or Rhode Island)
hold 'em is that it has many of the same characteristics of Texas
hold 'em, but is small enough that optimal strategies can be
determined explicitly.
The authors then go on to discuss some of the difficulties in creating
good poker playing algorithms, but I won't discuss those here because
what they have to say is quite accessible in the paper.
Modeling the Opponent
When modeling the opponent, the authors assume their
opponent's strategy is "stationary", that is, it's not changing over
time. Of course, this is a bad assumption to make for poker players
in general. It precludes dealing with players who learn over time,
and it precludes adjusting effectively to players who change gears,
but it's a good place to start. As an extension of this, they also
assume hands are "i.i.d.", which stands for "independent and
identically-distributed". This means that in addition to not
having the effects of one hand influence subsequent hands, the
game is fair.
Before we can build a strategy for our opponent, we must come up with
a representation for the data we'll store about each hand in some
database. The authors explain that for each sample hand they're using
to derive an opponent strategy they're storing hole card information
for both players (when available), board cards, and the bet sequence
for each betting round.
The idea is that when we see a given hand being played, we can compare
it to previous hands in our database, and come up with a Bayesian
probability distribution for the types of hands our opponent might
have. This distribution is what is called the "posterior".
Responding to the Opponent
Having a distribution of what we think our opponent's possible hands
might be is only half the battle. Now we must compute a response to
this hand range. The authors consider several options:
Bayesian Best Response (BBR) - This is a direct computation of the
highest EV response given the expected distribution of all possible hole
cards. The authors point out that solving this is equivalent to solving
something called the
Expectimax algorithm.
The Expectimax algorithm is the solution to a game tree where we are picking
each node on the tree such that we expect to maximize our expectation for
the game. It's sort of a multi-round generalization of the Minimax
algorithm of elementary game theory. The problem with BBR is that it's
very difficult to compute, even if you have all the information you need.
So, for a "real" poker game, this isn't practical. The best we can do
is find approximations for BBR.
Max a posteriori response (MAP response) - "a posteriori" is Latin for
"from the latter", meaning "from effects to causes". In this method what
we do is determine the best possible result for our opponent given the
strategy they're playing, and then just calculate what our best response
is to that strategy. For complex games this isn't trivial either, but it's
much easier than computing a BBR solution.
Thompson's Response - Here we pick a strategy from our opponent's
distribution of previous hands weighted by Bayes theorem and play a
best response to that particular strategy. In some sense, with MAP
we find a counter-strategy to the worst-case scenario. With Thompson's
we find a counter-strategy to the most likely scenario.
Later in the paper the authors will compare each of these methods of
determining a response to an opponent's play.
Priors
What we call the "prior" is equivalent to what most poker players would
call an opponent's hand database. Without a good prior we can't make good
guesses about how our opponent's actions correspond with previous play,
and without this information we have no chance of calculating a
profitable response.
A good prior should have several properties. It should capture
the strategy of our opponent. It should be set up to make it easy to
calculate our posterior. It's structure should make it easy
to calculate good responses. Given all this, it should be as small
as possible. Obviously, some of these work in opposition to each other.
Generating a good prior is not simple, and I'm not aware of a general
approach to creating or improving a prior.
The authors explore different priors for the two different poker games.
For Leduc hold 'em they use a
Dirichlet distribution.
This is a special kind of probability distribution that is appropriate for
this situation. Unfortunately, I don't know how to explain the details of
why they chose this particular distribution in simple terms, so you'll just
have to trust the authors here. In any case, the game space is small enough
so they can take a good sample of an opponent's play that spans many possible
hand outcomes.
The second prior, which they use for the Texas hold 'em samples, is
what they call an "informed" prior. That is, a skilled player selects
what he or she feels is a subset of hands that adequately represents some
other player's strategy. They use these samples to create several parameters
to define how an opponent plays. These are described in Table 1 of the
paper. Many of these will seem quite familiar to anyone who has used
opponent data gathering software designed for online poker. The parameters
the authors use include the following:
| Parameter |
Measurement |
| r0 |
bet to check ratio |
| r1 |
raise to call ratio |
| b |
value bet to bluff ratio |
| f |
fold frequency |
| t |
slow play frequency |
Experimental Setup
Both Leduc hold 'em and Texas hold 'em were played against the
opponent. Unfortunately, we don't know much about this opponent,
nor do we know much about the hand history used to generate the
prior. I suppose the authors felt the details weren't that important,
but I would have liked some details.
Results were averaged over 1000 trials for Leduc hold'em and 280
trials for Texas hold'em. Each trial (what poker players would
call a session), consisted of 200 hands. For each trial
1000 strategies were sampled from the prior and used for
opponent modeling. Also in each trial, different Bayesian
methods, BBR, MAP, and Thompson's, were used to come up with a response
to each strategy.
These algorithms also played against two other
opponents: "Opti", which in Leduc hold 'em is the optimal strategy, and
a 'bot called "Frequentist", which is an adaptive opponent modeling
program that corresponds to the "Vexbot" opponent found in the
commercially available Poki's Poker Academy software.
Results
Leduc hold 'em:
The results of the various algorithms playing against the player
are shown in Figure 2. Since for Leduc
hold'em we can explicitly compute the best possible response against
any strategy, that result is shown in the top line. Needless to say,
none of the Bayesian strategies achieve this win rate. Of note, though,
is that all of the Bayesian methods converge to similar results quite
quickly. Moreover, all of them perform better than the Frequentist
non-Bayesian opponent modeling 'bot.
Note that all of these opponent modeling strategies produce a better
return than the Optimum strategy 'bot. By definition none of them played
"better" than Opti,
but all of them played more exploitively. While the results are impressive,
it would be easy to overstate their significance. Remember that our
Bayesian models are playing against opponents employing stationary
strategies from whom the prior was drawn. Clearly, though, these
methods have some merit.
I want to point out that in paragraph 4 of section 7.1 the authors
explain why they believe Frequentist didn't perform well. There is
some good information here that I won't repeat, because I can't
improve on what the authors said in that paragraph.
When the Bayesian models play against Opti, they all lose, which we
expect (actually, in the long run this has to be the case). The
performance of each of the Bayesian models is again comparable.
Also, once again the Bayesian 'bots do better than Frequentist.
The Bayesian models also win against Frequentist, although in this
case Opti wins more. We also see some small divergence in long-term
performance of each of the Bayesian models with MAP and Thompson's
outperforming BBR. This is unexpected, and the authors discuss why
this might be the case in the last two paragraphs of section 7.1.
To be honest, though, I find their justification a little unsatisfying.
Texas hold 'em:
For Texas hold'em we can't generate a Dirichlet prior, nor can we
solve BBR for a game this complex. I expect that we also can't solve MAP,
although the authors don't say so explicitly. In any case, this
limits the number of competitors to play against the the opponent
who generated the prior. These are Frequentist, Bayesian using the
Thompson's response strategy, and Opti.
I don't know what Opti represents in this case; the authors don't say.
Obviously, it can't represent a true optimal strategy as it does in
the Leduc hold 'em case. I presume it's the pseudo-optimal Opti algorithm
as discussed in their previous paper, "Approximating Game-Theoretic Optimal
Strategies for Full-Scale Poker", which I discussed in a
December '09 article.
In any case, both the Frequentist and the Bayesian opponents performed
comparably, with the Bayesian algorithm gaining a slight lead over
Frequentist late in the contest. Both beat the non-exploitive Opti
algorithm. The authors speculate that a 200 hand prior may not
have been enough for the Bayesian model to be able to assert a bigger
advantage over Frequentist.
Conclusions
This work shows that Bayesian methods of opponent modeling have
promise. Moreover, just a few hundred hands of data can provide useful
exploitive strategies, even for real poker games such as Texas hold 'em.
There are many areas of future work for this methodology that are likely
to be fruitful. One of these is research in coming up with better priors,
both in terms of strategy and in terms of representing an opponent
strategy in a minimal number of hands. Clearly, another avenue for
exploration is to make these models more effective against opponents
who vary their strategies.
|