LV Revealed
 
 

Commentary on "Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker"


by Nick Christenson

This article originally appeared in the December 2009 issue of Two Plus Two Magazine.

Introduction

Over the last decade or so, the research group that has pushed furthest the frontiers of computer poker is the Computer Poker Research Group (CPRG) at the University of Alberta. The most visible member of the group is the lead author of the paper I will discuss in this month's column, Darse Billings.

The CPRG has written many vitally important papers advancing the state of the art in the way computers play poker. Moreover, they're not just producing ivory tower research results. Their efforts have led to several competitive poker applications. These include commercial software products such as Poker Academy Pro, based around the Poki poker software. They are also responsible for Polaris, a program which has performed capably against top poker professionals in competition.

Much of the theoretical underpinnings that have gone into the Polaris software are discussed in the paper, "Approximating Game-Theoretic Optimal Strategies for Full-scale Poker", which was published in the Proceedings of the 18th International Joint Conference on Artificial Intelligence in August, 2003.

The goal of the research underlying this paper is to improve the state-of-the-art in a classic man vs. machine situation. Specifically, the authors try to come up with better poker playing algorithms. Their goal is to advance computer skill in poker the way similar research efforts have advanced computer skill in checkers and chess.

In practice, this is difficult since poker has a very large "game space". That is, the enumeration of all possible options in a given hand is very large. In fact, with present computing resources, it's impossible to solve even a relatively simple "real" poker game, such as heads-up limit hold'em.

Researchers can take either of two approaches in dealing with overly complex games. First, we can simplify the game and solve the easier problem. Second, we can make simplifying approximations about the complete game. Neither of these techniques lead to the ultimate goal of game theory, and that is provably correct strategies. However, they do get us a step closer to that goal by providing insight about the full game and creating testable hypotheses, some of which will lead to improved strategies.

At the end of Section 1 in the paper, the authors claim the following results for their research: Their simplifications reduce the search space for the poker games they're considering by 11 orders of magnitude. Also, their new algorithms prove superior to other computer opponents. Moreover, these new algorithms can often play competitively with strong human poker players.

Game Theory

This section describes the group's general approach to writing poker software. Unlike games such as chess or checkers, poker is a game with hidden information (our opponent's cards) and random events. This makes the algorithms necessary to solve poker much different than for other complex games that have been studied in the past.

The standard method for solving games is to make a tree of all possible decisions branching out from decision points. This is then converted to a system of mathematical equations, which we then solve. However, because of the large number of possible decisions combined with the number of decision points, for any popularly played game of poker we cannot solve these equations directly. Therefore, if we want to tackle a real poker game, we need to make some approximations.

The third section of the paper provides the reader background on the game of Texas hold 'em. Needless to say, such an introduction isn't necessary for readers of this article. However, it is important to note that the game the CPRG folks are trying to solve here is specifically heads-up limit Texas hold 'em. Since this is the least complex way Texas hold 'em, can be played, it is a logical place to start.

Abstractions

In this section, the authors list a possible set of abstractions that can be adopted to reduce the complexity of the game to more manageable levels without rendering the resulting strategies invalid. Some of these are "free", that is, they don't have any strategic effect, some are more dangerous. Some of these abstractions are adopted by the authors in the programs they developed, some aren't.

The first abstraction they consider is what they call "suit equivalence isomorphism". An "isomorphism" in this context is when one changes something that doesn't change any results. An example would be that playing A♣ K♣ with a K♠ 8♥ 3♦ flop will always be identical to how one approaches A♦ K♦ with a flop of K♣ 8♠ 3♥. Basically, this is a fancy math way of saying, "If I'm dealt two suited or unsuited cards, it doesn't matter which suits they are except in context with the board." This is something every poker player already knows. This abstraction does not affect strategy, and it will be used in the authors' algorithms.

The authors also discuss "rank equivalence" and "rank near equivalence". The authors don't provide any examples, but one possible example of "rank equivalence" is that a hand of A♠ 7♥ with a board of 9♠ 9♣ 8♥ 8♦ K♠ has the same equity as a hand of A♠ 3♥ with the same board. An example of "rank near equivalence" might be considering starting hands such as K7s and K6s as the same. They make the same number of trips or better hands. The only difference is that K7 is slightly more likely to make top pair hands, and K6 is more likely to make bottom pair hands. However, I expect most people would agree that considering these hands as having equivalent strength won't hurt our equity very much, if the difference is even measurable. Rank equivalence abstractions won't change strategy, rank near equivalence might alter our strategy a little bit, but only a little bit and only in rare occasions. The authors use both abstractions.

The next abstraction the authors discuss is "deck reduction". It becomes a lot easier to solve poker math problems if we reduce the number of cards in the deck. A radical example of "deck reduction" is the AKQ game discussed in Mathematics of Poker. Since this would alter the way the game is played too greatly, the authors don't use this form of abstraction in the poker-playing computer programs they discuss in this paper.

Similarly, one can also reduce the number of cards in a player's hand or that come on the board. An example that the authors cite of research that uses this sort of abstraction is Shi and Littman's paper discussing the abbreviated game of Rhode Island hold 'em. The paper they reference was discussed in an article I wrote in the June, 2009 issue of TwoPlusTwo Magazine. The authors use a restricted form of this abstraction which will be discussed in a little more detail later on.

If one adopts abstractions which might affect the game strategy, any solution to the game that includes these abstractions cannot be said to be an "optimal" solution to the game. However, if the abstractions are chosen judiciously, then the derived strategies should be close to optimal, or "pseudo-optimal", as the authors call them.

Next the authors discuss how they implemented each of these abstractions in practice. First, regarding betting round reductions, they restricted their strategies to considering the effects of a bet and two raises only on each street. That cut down by two the number of decision nodes per betting round. They ran some simulations using the restricted number of bets and don't feel that this affects strategy all that much. However, if they reduced the possible number of bets and raises to one bet and one raise, that would have altered strategies substantially, so that additional level of abstraction would be inappropriate.

They also came up with an interesting method of cutting down the number of betting rounds. Cutting a betting round out completely would have changed strategies too greatly to be useful as an abstraction. However, they can make some less drastic simplifications.

First, when considering pre-flop play only, they looked at eliminating late betting rounds. Instead of betting on these later rounds, they award the pot on a percentage basis based on each hand's equity at that time. They create one program that eliminates just the river betting round in this manner, and another that eliminates both the turn and river betting round. This cuts down the complexity for pre-flop decisions.

Another abstraction one can consider is making the strategies on each betting round independent of each other. Again, this was suggested in the paper by Shi and Littman, and, again, the ramifications of this were discussed in my June article. The authors believe that this abstraction is too problematic, as it eliminates intentional multi-street strategies.

Another abstraction method mentioned in the Shi and Littman paper is "bucketing". That is treating hands of similar strengths that play similarly as equivalent. The authors also give a nod to Sklansky and Malmuth's Hold 'em Poker for Advanced Players, and their nine groups for starting hands as another source expressing a similar concept. In this implementation they create six buckets for starting hands. Five of these buckets categorize hands by intrinsic strength, with a sixth bucket set aside for low-value but high-potential hands. The authors don't mention which starting hands go in which bucket, but since they mention high-potential hands being those hands that tend to make flushes and straights, we can assume that suited connectors, bad suited aces, and hands such as 98off might be in this bucket, perhaps among others.

The authors note that going from 6 to 7 to 8 buckets doesn't seem to make much of a difference. They also note that many successful human players probably don't make any more distinctions between types of hands than this.

Now that the authors have described how each of these betting rounds work, all that remains is to tie them together with a list of the probabilities of transitioning from one set of possibilities to another, and we have a pseudo-optimal solution to the game.

Testing

In order to explore the ramifications of some of the abstraction decisions the authors have made, they created a few slightly different versions of the poker software that uses these methods. They also employed some other poker 'bots, including Poki, Anti-Poki (a 'bot created specifically to exploit Poki's weaknesses), Adapti (a 'bot designed to detect and exploit opponent weaknesses), and two trivial 'bots called "Always Call" and "Always Raise" who play exactly the way one would expect. All of these programs play at least 20,000 hands against each other in a round robin fashion. Their win rates against each other can then be compared.

It's important to note that a game theoretically optimal poker strategy can be more properly described as playing "not to lose" than as "playing to maximize gain". A player who plays optimally will not lose against any game, but may not necessarily win at a high rate against anyone, even an especially weak player. The way to maximize one's win rate against a bad player is to play exploitively, that is, in a non-optimal way, but in such a way that takes maximum advantage of an opponent's mistakes. Overall, Section 5.1 of the paper does a really good job of presenting the issues involved in optimal vs. exploitive strategies in just a few paragraphs.

After the round-robin, the two biggest winners were the two most sophisticated programs that implemented the algorithms discussed in the paper. However, the projected number 2 program achieved the best results. The authors attribute this to some programming errors that were discovered after the fact. Interestingly, although these top two programs never lost a match (except against each other), they didn't perform all that well against the especially weak programs in the competition. This lines up with the discussion in the previous paragraph.

One interesting result of note is the performance of the program they called "PsiOpti0". This program played a pretty sophisticated post-flop game, but its pre-flop strategy is simply "always call". Yet, the only two matches it lost were to the two more sophisticated programs that implemented a more complete pseudo-optimal strategy. Plus, it did almost as well as the most sophisticated programs against the least skilled opponents. This provides some credence to the notion that if one plays well from the flop on, it's possible to be a successful poker player even if one plays badly pre-flop.

Next, the top two programs are pitted against human opponents of varying skill. Skill levels ranged from complete novices to world class players specializing in short-handed limit hold 'em games. Expert players were able to beat the program, but experienced but non-expert players played about break-even with the program, and less experienced players generally lost to it. This is no small accomplishment, especially considering that we're talking about a programs and computers that are at the present time more than 6 years old.

The authors make some observations about these matches. First, experienced poker players indicated that the program played in a different manner than they expected. Second, in these matches it becomes clear that against many types of opponents, one of the program's biggest advantages is that it doesn't go on tilt and doesn't get tired, whereas these problems plague their human opponents.

Recall that the programs described in this paper don't do any opponent modeling. They never attempt to exploit their opponents. The authors expect that adding opponent modeling will improve performance for two reasons: First, nearly all human players have exploitable weaknesses. Second, even slight variations in play caused by the program adapting to its opponent would make it much harder for a human to get a line on the computer's play. This makes it more difficult to find weaknesses and exploit them with an appropriate counter-strategy. The authors claim that as it turns out, most humans can be accurately modeled with only a small amount of information.

Conclusions

The best machine players in the world are getting much better at playing poker. The 2008 Polaris results are a good indication of this. We're probably not all that far away from having computers that can beat all humans in the restricted domain of playing heads-up limit Texas hold 'em.

The addition of opponent modeling, advancements in computing power, and refinements in the algorithms used will lead to programs that will require the best humans to improve their games in order to just stay even. In this way, the poker world will likely become analogous to the world of chess. We're still a long way from being able to completely solve the game of Texas hold 'em, but pseudo-optimal strategies will probably get us very close, perhaps in an unexpectedly short span of time.