LV Revealed
 
 

Commentary on "Pattern Classification in No-Limit Poker: a Head-Start Evolutionary Approach"

by Nick Christenson

This article originally published in the October 2010 issue of TwoPlusTwo Magazine.

Background

This month I'm discussing the paper titled, "Pattern Classification in No-Limit Poker: a Head-Start Evolutionary Approach". It was written by Brien Beattie, Garrett Nicolai, David Gerhard, and Robert J. Hilderman from the University of Regina in Canada, and was published in the Proceedings of the 20th Canadian Conference on Artificial Intelligence in Montreal in May of 2007.

Throughout this article series I've discussed many approaches to designing poker playing algorithms. Among these are a game theoretic approach, a heuristic (rule-based) approach, and a Bayesian approach. In this paper we encounter yet another method for constructing a poker playing computer program. This time it's an evolutionary methods approach.

What are evolutionary methods? In nature evolution occurs by several mechanisms: selection, whereby an organism that does well compared to others will thrive; recombination, merging the characteristics of several successful organisms to create a hybrid; and mutation, random changes that will improve some organisms and worsen others. We can replicate these phenomena in the form of a computer program, and then we can apply them to a set of strategies these programs use to play poker.

Introduction

After some brief poker background, the paper discusses the "classification problem". This is a general problem in decision programming where we're trying to place objects into categories based on their properties. In the context of poker, what we're usually talking about is whether a given situation is a good poker situation for a given player or a bad one. The authors talk about this in terms of goals in poker. While their analysis is hardly new ground for anyone who is likely to take the time to read this column, it got me to think about poker in some new ways.

The yardstick stick by which we measure success in poker is who makes the most money. Of course, most players have other goals, and that's fine, but ultimately poker is about winning money from our opponents. We're not in it to see who wins the most pots. If that were the measure of a successful poker player, then the right strategy would be to call down every hand. We're also not in it to with the largest possible percentage of the pots we play. If that were the case, we'd only play our strongest hands. We're also not keeping score by some other more complex method, such as number of pots won minus the number of pots lost. If that were the case, we'd play all hands that had a greater than 50% chance of winning, and it would never be correct to slow pay.

If each of these non-goals represented our aims as poker players, we would be led to very different playing strategies. If the strategy we adopt does not align with our goals, we can't expect good results. When we try to create poker playing software, its easy for us to accidentally write our code in a way that doesn't coincide with our goals. As it turns out, the object of maximizing net earn in the long run is quite difficult to reflect in a set of rules. It means that sometimes we should be willing to give up some profit now if we can win more later. In some situations we might play a hand differently if we were beginning a long session than we would if it were our last hand of the evening. These factors are quite difficult to parameterize.

The authors also discuss the dangers of predictability. I get the feeling from this exposition they're not all that well versed in game theory. A game theoretic approach will create unpredictability through optimal bluffing. Once one adopts this approach the predictability issues mostly take care of themselves.

In the next subsection the authors discuss previous work in this area. They mention the CPRG work at the University of Alberta and their Poki program specifically. I think this includes a misconception about Poki which they relate at the end of the first paragraph:

Unlike our system, much of the decision making in Poki is rule based, and thus, the best choice will be made every time, making the player predictable to some extent.

It is true that Poki is primarily heuristic, but a heuristic approach can still contain random elements. Poki is capable of bluffing, and of playing a similar situation in different ways, but more importantly, it creates deception by playing different situations the same way. Being unpredictable in poker does not require someone to play the same hands differently at different times under similar circumstances. There's more than one way to be unpredictable in poker, and I don't think the authors understand this.

System Architecture

The algorithm the authors are developing is designed to play no-limit Texas hold'em. They're also trying to design something that can play against arbitrary numbers of opponents. There are good reasons that other research teams have chosen to tackle the heads-up game exclusively, including those teams, like the CPRG, that have a lot more real world poker experience than the authors of this paper. I'm not sure this team realizes just what they've bitten off.

Bet sizing is a key element of big bet poker. For this project, the authors decided to allow their agents to select from three different bet sizes, expressed as a fraction of the agent's stack. First, since the authors omit any discussion of blinds or antes, it's awfully difficult to evaluate these bet sizes. Second, anyone reading this article who has also read the book I wrote with Russ Fox, Winning Strategies for No-limit Hold'em, can anticipate my reaction to the notion that the authors of this paper choose to correlate bet size with hand strength. Russ and I firmly believe that bet size is dictated by many factors, but that hand strength shouldn't be one of them. For this to be the one and only criterion used in bet sizing seems entirely inappropriate to me.

The authors discuss the three main factors they use in evaluating each poker situation: hand value, risk, and opponent aggressiveness. The notion of hand value seems pretty self-explanatory, but it's not at all trivial, and the paper doesn't provide much in the way of details. How are hand strengths evaluated? Is there a table lookup or is it evaluated as the hand develops? How are hand strengths weighted based on the board cards? We don't know the answers to any of these questions, and they make a great deal of difference.

Also, it seems from the text of the paper that they're assuming that their agents' opponents have random hands, which is a problematic assumption, especially if those opponents are calling or raising. This is a place where a Bayesian approach would be beneficial, such as the one advocated by the CPRG folks in the paper I discussed in the February 2010 article.

In essence, the authors define risk as the chance of winning the pot. Again, this seems straightforward enough, but there's much more to it than is provided in this paper. Specifically, how is this evaluated against multiple opponents? How do they weight the increased gain from multiple players against the higher hand strength typically needed to take down the pot? We aren't told. I'm not certain the authors' thoughts on this are all that sophisticated.

They also discuss the concept of implied odds, stating that it is inappropriate in this case because it doesn't take into consideration the size of bets. This is nonsense. However, they are correct to bring up the difficulty in calculating implied odds due to the problems of evaluating the future size of the pot and the probability that an opponent will call. They describe a weighting function they use to account for this. While the approach seems reasonable, they notably don't provide any justification for their choices.

Once again, when it comes to the topic of aggressiveness, the authors provide little in the way of details. Basically, this is a stripped down opponent model which, I presume, would measure something like the ratio of bets and raises to calls and/or folds. However, neither the criteria that go into this factor nor the range of values it might produce are mentioned. Also, in their discussion of equations 3 and 4, they mention the variable "L", which is never used, and don't give ranges for values like G, which is important. I suspect G falls within the range of (0,1), and, consequently, A and w will fall in the [0,1] range, but I don't know this for certain. We also have no idea what the algorithm's idea of a "proper" number for A is. In any case, I promise that nobody can reproduce the algorithms they use on this topic from the description in this paper.

Training and Testing Data

An evolutionary learning algorithm can either evolve from a zero-knowledge starting state, or it can be seeded with some data. The authors chose the latter approach. The problem is, where could they find a source of that data? Public records of poker games typically don't reveal everyone's hole cards, so the Regina folks used hands and plays from televised poker. Yikes. Of course, with the possible exception of High Stakes Poker, the hands and plays shown on television won't be representative of a random selection of poker hands. I'm not sure such a skewed subset will be more useful than starting from scratch.

Implementation

The authors use what's called a "k-nearest neighbor" estimation to fill in the gaps in their playing charts when it comes to evaluating folds, calls, and raises. The k-NN method is a very simple machine learning algorithm that's commonly used in these sorts of situations. Conceptually, it works by assigning a value to a given set of conditions that doesn't have a known value (from the hand database or from previous experience) by assigning it the majority value from its nearest neighbors. It's a good place to start, and it's likely good enough for what they're doing here.

Next they take several different instances of their algorithm, called agents, and have them play a tournament against each other. Why a tournament? I have no idea, and I think that having them play a fixed number of hands as a cash game against each other would be a better approach. In any case, those agents that do well move on to the next round unaltered. These winners are also used as parents for recombination of various agents, and as the basis for agents "afflicted" with mutations.

In many evolutionary approaches, a mutation would be implemented by changing some value in the action probability tables. However, the authors felt that this would be too fine grained to have the desired effect, and I agree. So what they did was change the play for a range of situations with what they call a "hill function". Qualitatively, think of the fine grained approach as possibly changing how the algorithm plays 86 suited in second position against three opponents and the hill function approach to change how the agent plays suited one gappers in middle position against more than one opponent. At the very least, when varying poker strategy the effects of the latter approach will be easier to evaluate.

Comparison and Results

They create several different kinds of agents and have them duke it out. The first kind of agent they call "data-driven". It contains the knowledge accumulated from the televised poker hands but does no learning. The second agent is one that uses the same information, but then uses the evolutionary learning algorithm. The third agent also uses evolutionary learning methods, but it starts out with no poker knowledge.

After several generations of iteration, Figure 6 demonstrates that the performance of best evolutionary algorithms of a particular generation improve over those of previous generations. It doesn't take long before the evolutionary data-driven agent plays much better than the non-evolutionary data-driven agent. The improvement seems to come quickly at first, and then slows down, which shouldn't be surprising for any learning method. However, I'm a bit concerned by the y-axis of the graph. What is the nature of the top 90 algorithms that aren't represented in Figure 6?

The authors next talk some poker strategy. They seem surprised that folding would be the dominant strategy. Of course, they don't tell us how many players are at each table, but if it's more than three this result won't surprise any serious poker player. Raises occur with good hands. Calls occur with what they call "low risk" hands, although I'm not exactly sure how this translates into more familiar poker terms.

I find it hard to learn much from the figures in the paper. The exception to this is Figure 7. Note that the learning algorithm has come up with a strategy that consists of: lots of folding, a fair bit of raising, and very little calling. This seems quite reasonable. It's difficult to tell from the figures, but it also looks like the algorithm does at least some bluffing with its worst hands, which is also promising.

The researchers ran tournaments comparing these three agents along with two other trivial ones: an "always fold" agent and one that acts randomly. Table 1 shows these results, but to honest, I can't tell you what it says. The data make absolutely no sense to me.

Conclusions and Further Work

The authors admit that they don't know if their agents are good poker players or not. I think they're being a little disingenuous here. I suspect that even the best of their agents are poor players, and that they know it. Nonetheless, they demonstrate that evolutionary methods can lead to an improvement in algorithmic poker performance. I don't see any reason to think that if they started with a better framework that these methods couldn't be effective.

There is a lot of work to be done to turn this work into a worthy poker opponent. The authors note that they've performed little optimization of the evolutionary parameters. They also suggest that further evolution of the agents should continue to bear fruit.

My suggestions on the next steps they might take are a little different. First, I suggest they revisit their bet sizing algorithms. They should also incorporate game theoretic bluffing. Using Bayesian methods to adjust perceptions of opponent hand ranges would be a great addition, but would require significant work. Also, they could sure use some better seed data to jump start their learning algorithms. There are probably some other implementation issues that could use fixing but that we don't know about because the paper doesn't provide sufficient details.

Conclusion

Once again, to a serious student of poker this probably looks like a failure. As poker players, we tend to think of the success of a poker algorithm in terms of how well it plays. The goal of the authors is to demonstrate that their academic technique could be applied to a new kind of problem. It doesn't matter whether the final product actually ends up being any good at poker. They're only interested in demonstrating the viability of their technique. I know I have a tough time getting my head around this at times.

Nonetheless, we can add evolutionary methods to the list of legitimate ways me might improve a poker algorithm. Also, I found their unorthodox, from a poker standpoint, take on poker goals to stimulate my thinking about the issue. So, for me the paper is worth studying even if the resulting agent would get crushed by an average 3-6 player from a typical public card room.