LV Revealed
 
 

Commentary on "Bayes' Bluff: Opponent Modeling in Poker"


by Nick Christenson

This article originally appeared in the February 2010 issue of TwoPlusTwo Magazine.

Preliminaries

Last time around I discussed a paper on optimal strategies for poker from the University of Alberta's Computer Poker Research Group (CPRG). This month I examine a paper from the same group on the issue of opponent modeling. It is titled, "Bayes' Bluff: Opponent Modeling in Poker". It was presented at the 21st Conference on Uncertainty in Artificial Intelligence (UAI) in 2005.

There are two ways to approach creating a winning poker algorithm, and both are important in understanding the game. The first is trying to play the game optimally, that is, trying to "solve" the game. However, as was discussed in the last article in this series, by no means does playing optimally guarantee that one will win the most money. It just guarantees you won't lose (or, in the case of negative-sum games, you will do no worse than lose your share of the rake.) In order to maximize profits against opponents that don't play optimally, one must understand their strategy and then compute a response against it that maximizes profit. The first step in doing this, understanding opponents' strategy, is called "opponent modeling".

Bayesian Poker

One of the ways to model an opponents strategy is using a mathematical technique called Bayesian statistics. This is based around Bayes theorem. It's easy to misunderstand what Bayes theorem actually says, so I'll try to explain it as simply as I can.

Suppose there are two events, A and B, and they influence each other. If we find out that B has occurred, then the probability that A will also occur depends not only on the probabilistic relationship between A and B, but also on the probability that A will occur independently of the information that B has occurred.

Here's an example. Suppose we know that sometimes after it rains, it also snows. Suppose we go outside and see that it's snowing? What does that tell us about whether it was raining earlier? If we know the odds of it snowing after it rains, that's only part of the picture. We also need to know the odds that it could just snow without raining. If we have both of these pieces of information, then using Bayesian statistics we can make a prediction about the likelihood that it was raining earlier.

In the abstract case, if we don't know anything about whether or not B occurred, then in a vacuum we may still make a prediction about the probability that A will occur. In the example this corresponds to the probability that it might have just rained today. This probability distribution is called a "prior", as it's what we think the likelihood of some event is "prior" to having any additional information that would influence our opinion.

If we know B, or whether or not it's snowing in the example, that can change our estimate of whether or not it might have been raining earlier. This new probability distribution is called the "posterior" (meaning "after"). Yes, statistics students make jokes about this. On rare occasion they're even funny. I bring this whole thing up because in the "Opponent Modeling" paper I'm discussing, if you don't understand what "prior" and "posterior" mean you'll be completely lost.

Sometimes Bayes' theorem sounds like simple if/then conditional probabilities. It's more than that. This can be confusing. In any case, this technique can be used to model poker opponents. If we know what players show down and what percentage of the time they adopt various betting strategies, we can make some estimates about how they would play various hands. From that make predictions about what hand ranges they might hold.

Introduction

The paper begins with the authors setting the stage for what they're going to demonstrate. Assuming our opponent plays sub-optimally, to maximize profit we must take two steps:

  1. Figure out our opponent's strategies.
  2. Determine a best response to their plays, including their mistakes.
Neither step is trivial, and both are discussed in this paper.

Several approaches to this problem are possible, and good solutions are difficult. This difficulty has several causes. First, we won't ever have complete knowledge of our opponents' strategy. This is due both to the fact that we'll never have enough information to fill in all the gaps, and it's unlikely that our opponent will continue to play the way they always have.

Poker

The second section of the paper is more routine for poker players than it is for mathematicians or artificial intelligence researchers. The authors first define the characteristics of a generic hold 'em game: Each player is dealt one or more hole cards, there will be one or more board cards, there will be two or more rounds of betting, and between rounds of betting one or more additional board cards will be revealed. The authors are examining two-player limit Texas hold 'em, as has traditionally been the focus of the CPRG folks.

In addition to looking at Texas hold 'em, they also examining an abbreviated game, which they call "Leduc hold 'em". Leduc is a small town near Edmonton, near the home of the University of Alberta and the CPRG. This game is similar in purpose to Rhode Island hold 'em, which was explained in an earlier article. The naming is essentially the same joke. I guess the difference is that Canada doesn't have any truly small provinces. For research purposes, the nice thing about Leduc (or Rhode Island) hold 'em is that it has many of the same characteristics of Texas hold 'em, but is small enough that optimal strategies can be determined explicitly.

The authors then go on to discuss some of the difficulties in creating good poker playing algorithms, but I won't discuss those here because what they have to say is quite accessible in the paper.

Modeling the Opponent

When modeling the opponent, the authors assume their opponent's strategy is "stationary", that is, it's not changing over time. Of course, this is a bad assumption to make for poker players in general. It precludes dealing with players who learn over time, and it precludes adjusting effectively to players who change gears, but it's a good place to start. As an extension of this, they also assume hands are "i.i.d.", which stands for "independent and identically-distributed". This means that in addition to not having the effects of one hand influence subsequent hands, the game is fair.

Before we can build a strategy for our opponent, we must come up with a representation for the data we'll store about each hand in some database. The authors explain that for each sample hand they're using to derive an opponent strategy they're storing hole card information for both players (when available), board cards, and the bet sequence for each betting round.

The idea is that when we see a given hand being played, we can compare it to previous hands in our database, and come up with a Bayesian probability distribution for the types of hands our opponent might have. This distribution is what is called the "posterior".

Responding to the Opponent

Having a distribution of what we think our opponent's possible hands might be is only half the battle. Now we must compute a response to this hand range. The authors consider several options:

Bayesian Best Response (BBR) - This is a direct computation of the highest EV response given the expected distribution of all possible hole cards. The authors point out that solving this is equivalent to solving something called the Expectimax algorithm. The Expectimax algorithm is the solution to a game tree where we are picking each node on the tree such that we expect to maximize our expectation for the game. It's sort of a multi-round generalization of the Minimax algorithm of elementary game theory. The problem with BBR is that it's very difficult to compute, even if you have all the information you need. So, for a "real" poker game, this isn't practical. The best we can do is find approximations for BBR.

Max a posteriori response (MAP response) - "a posteriori" is Latin for "from the latter", meaning "from effects to causes". In this method what we do is determine the best possible result for our opponent given the strategy they're playing, and then just calculate what our best response is to that strategy. For complex games this isn't trivial either, but it's much easier than computing a BBR solution.

Thompson's Response - Here we pick a strategy from our opponent's distribution of previous hands weighted by Bayes theorem and play a best response to that particular strategy. In some sense, with MAP we find a counter-strategy to the worst-case scenario. With Thompson's we find a counter-strategy to the most likely scenario.

Later in the paper the authors will compare each of these methods of determining a response to an opponent's play.

Priors

What we call the "prior" is equivalent to what most poker players would call an opponent's hand database. Without a good prior we can't make good guesses about how our opponent's actions correspond with previous play, and without this information we have no chance of calculating a profitable response.

A good prior should have several properties. It should capture the strategy of our opponent. It should be set up to make it easy to calculate our posterior. It's structure should make it easy to calculate good responses. Given all this, it should be as small as possible. Obviously, some of these work in opposition to each other. Generating a good prior is not simple, and I'm not aware of a general approach to creating or improving a prior.

The authors explore different priors for the two different poker games. For Leduc hold 'em they use a Dirichlet distribution. This is a special kind of probability distribution that is appropriate for this situation. Unfortunately, I don't know how to explain the details of why they chose this particular distribution in simple terms, so you'll just have to trust the authors here. In any case, the game space is small enough so they can take a good sample of an opponent's play that spans many possible hand outcomes.

The second prior, which they use for the Texas hold 'em samples, is what they call an "informed" prior. That is, a skilled player selects what he or she feels is a subset of hands that adequately represents some other player's strategy. They use these samples to create several parameters to define how an opponent plays. These are described in Table 1 of the paper. Many of these will seem quite familiar to anyone who has used opponent data gathering software designed for online poker. The parameters the authors use include the following:

Parameter Measurement
r0 bet to check ratio
r1 raise to call ratio
b value bet to bluff ratio
f fold frequency
t slow play frequency

Experimental Setup

Both Leduc hold 'em and Texas hold 'em were played against the opponent. Unfortunately, we don't know much about this opponent, nor do we know much about the hand history used to generate the prior. I suppose the authors felt the details weren't that important, but I would have liked some details.

Results were averaged over 1000 trials for Leduc hold'em and 280 trials for Texas hold'em. Each trial (what poker players would call a session), consisted of 200 hands. For each trial 1000 strategies were sampled from the prior and used for opponent modeling. Also in each trial, different Bayesian methods, BBR, MAP, and Thompson's, were used to come up with a response to each strategy.

These algorithms also played against two other opponents: "Opti", which in Leduc hold 'em is the optimal strategy, and a 'bot called "Frequentist", which is an adaptive opponent modeling program that corresponds to the "Vexbot" opponent found in the commercially available Poki's Poker Academy software.

Results

Leduc hold 'em:

The results of the various algorithms playing against the player are shown in Figure 2. Since for Leduc hold'em we can explicitly compute the best possible response against any strategy, that result is shown in the top line. Needless to say, none of the Bayesian strategies achieve this win rate. Of note, though, is that all of the Bayesian methods converge to similar results quite quickly. Moreover, all of them perform better than the Frequentist non-Bayesian opponent modeling 'bot.

Note that all of these opponent modeling strategies produce a better return than the Optimum strategy 'bot. By definition none of them played "better" than Opti, but all of them played more exploitively. While the results are impressive, it would be easy to overstate their significance. Remember that our Bayesian models are playing against opponents employing stationary strategies from whom the prior was drawn. Clearly, though, these methods have some merit.

I want to point out that in paragraph 4 of section 7.1 the authors explain why they believe Frequentist didn't perform well. There is some good information here that I won't repeat, because I can't improve on what the authors said in that paragraph.

When the Bayesian models play against Opti, they all lose, which we expect (actually, in the long run this has to be the case). The performance of each of the Bayesian models is again comparable. Also, once again the Bayesian 'bots do better than Frequentist.

The Bayesian models also win against Frequentist, although in this case Opti wins more. We also see some small divergence in long-term performance of each of the Bayesian models with MAP and Thompson's outperforming BBR. This is unexpected, and the authors discuss why this might be the case in the last two paragraphs of section 7.1. To be honest, though, I find their justification a little unsatisfying.

Texas hold 'em:

For Texas hold'em we can't generate a Dirichlet prior, nor can we solve BBR for a game this complex. I expect that we also can't solve MAP, although the authors don't say so explicitly. In any case, this limits the number of competitors to play against the the opponent who generated the prior. These are Frequentist, Bayesian using the Thompson's response strategy, and Opti.

I don't know what Opti represents in this case; the authors don't say. Obviously, it can't represent a true optimal strategy as it does in the Leduc hold 'em case. I presume it's the pseudo-optimal Opti algorithm as discussed in their previous paper, "Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker", which I discussed in a December '09 article.

In any case, both the Frequentist and the Bayesian opponents performed comparably, with the Bayesian algorithm gaining a slight lead over Frequentist late in the contest. Both beat the non-exploitive Opti algorithm. The authors speculate that a 200 hand prior may not have been enough for the Bayesian model to be able to assert a bigger advantage over Frequentist.

Conclusions

This work shows that Bayesian methods of opponent modeling have promise. Moreover, just a few hundred hands of data can provide useful exploitive strategies, even for real poker games such as Texas hold 'em.

There are many areas of future work for this methodology that are likely to be fruitful. One of these is research in coming up with better priors, both in terms of strategy and in terms of representing an opponent strategy in a minimal number of hands. Clearly, another avenue for exploration is to make these models more effective against opponents who vary their strategies.