LV Revealed
 
 

Commentary on Computing an Approximate Jam/Fold Equilibrium for 3-player No-Limit Texas Hold'em Tournaments

by Nick Christenson

This article was originally published in the May 2011 issue of TwoPlusTwo Magazine.

Background

The paper I'm discussing this month is titled "Computing an Approximate Jam/Fold Equilibrium for 3-player No-Limit Texas Hold'em Tournaments". It was written by Sam Ganzfried and Tuomas Sandholm from Carnegie Mellon University. The paper was included in the Proceedings of the 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008) in Estoril, Portugal from May 2008. It's available online at: http://www.cs.cmu.edu/~sandholm/3-player%20jam-fold.AAMAS08.pdf.

This paper comes out of a group at CMU who has emerged as a top rival to the University of Alberta group for pre-eminence in the field of computer poker, especially regarding no-limit hold 'em. The author, Sandholm, was also featured in a recent interview with National Public Radio on the issue of bots in online poker.

Introduction

As with the last article in this series, I'm looking at the special issues involved in finding strategies for 3-player poker. Specifically, the authors are looking at 3-player no-limit hold 'em. They constrain their problem to a given total number of chips in play and restrict strategy decisions to going all-in or folding pre-flop.

Ganzfried and Sandholm make a reference to another paper where this question is solved for 2-player no-limit hold 'em, and they discuss the techniques used to arrive at that solution. The authors also make the usual caveats about the increase in complexity in going from a 2-player to 3-player game and that in the 3-player case there is no guarantee that an equilibrium solution even exists.

Jam/Fold Strategies and Heads-up Results

Section 2 of the paper covers no-limit hold 'em. Readers of the paper may need this information, but readers of this article won't. The last paragraph of this section covers sit-n-go tournaments. The tournament results of this paper have relevance in that the problem discussed in the paper is the situation sit-n-go players (or players of any tournament) face once it gets down to the final three. In fact, the authors have gone out of their way to try to make the results as real-life relevant as possible, using blinds, antes, payouts, and stack sizes similar to those used in actual online events.

In the first paragraph of section 3 the authors justify the restriction to jam/fold strategies. Of course, anyone who has read any of the recent literature on tournament poker knows that there are many situations in which adopting a jam/fold strategy makes sense, and there are many more where even if it isn't optimal, it's still a pretty strong strategy. So, once again, this motivation is primarily for an audience less informed about the particulars of poker strategy.

The authors again refer to the paper by Miltersen and Sorensen that solves this problem for the 2-player game. Their synopsis of the key results from that paper go by a little fast, but in summary, Miltersen and Sorensen find a 2-player near-optimal solution using what looks a lot like the regret minimization technique discussed in the March 2011 article in this series. They can also guarantee that restricting this strategy to jam/fold creates only a very small deviation from a less restrictive strategy. They also point out that the optimal strategy for a cash game is almost identical to what it would be in a tournament. Honestly, I expect that it would be identical in the 2-player case, so I don't know what's going on here.

Independent Chip Model

Section 4 begins with an explanation of the Independent Chip Model (ICM). Again, this is something that readers of this article are probably quite familiar with. If not, the paper refers to the twoplustwo.com single table tournament FAQ as a source on the subject, which I find to be cool.

Just as a refresher, tournament play is influenced by the fact that when played multi-way, optimal strategy is different from what it would be in cash games because tournament chips have a non-linear value. The authors don't have a way to deal with this non-linearity with complete generality, so they adopt ICM as a widely accepted approximation for converting tournament stack sizes to their expected value.

Our Two-Level Iterative Algorithm for Equilibrium Finding

This section covers the algorithm they use to come up with their strategy. This is shown in pseudo-code form as Algorithm 5.1. Again, while they don't use this language, it looks a lot like they're using regret minimization to iteratively find an ε-Nash equilibrium for the game.

Unfortunately, this algorithm is not explained nearly as well as I would like. While δ is given a value, what it is measuring isn't explained in this paper. Further, we don't have a really good explanation of what the outer loop is counting. It seems, though, that the outer loop is repeatedly measuring the payoffs to each player and terminates when the difference in successive runs falls below the δ threshold.

The inner loop represents the implementation of a learning algorithm that calculates an updated best response for each player to the average strategy adopted by its opponents thus far. It loops until the maximum regret for a given strategy also falls below some threshold. Again, check out the March 2011 article in this series for a more complete explanation of regret minimization.

This approach would be guaranteed to converge toward an equilibrium in the 2-player game, but no such guarantee exists here. Fortunately, the authors are able to prove via Theorem 1 that if this method does converge, that it will converge to a Nash equilibrium.

Implementation Discussion

With three players and five board cards there are a lot of possible card combinations over which the algorithm must iterate. Obviously, the authors want to simplify the problem as much as possible, both using abstraction techniques we've seen before, such as those discussed in the article from this series published in December 2009, and those that are new.

From the folio of familiar abstraction methods the authors use suit isomorphism: the idea that once they've dealt with a state in their game space with certain suits represented, that they can rotate through the equivalent suit combinations and infer the same results.

In the list of new mechanisms, they use something I choose to call "rank isomorphism", where the results of players' hole cards or board cards that are revealed on a given street have the same value regardless of the order in which they are revealed. This is especially useful in this application. Since we're playing pre-flop jam/fold poker, all five board cards are dealt at once after the pre-flop action is concluded. Without loss of generality we can consider only those combinations where the cards are dealt by increasing rank and suit. That is, there's no need to evaluate a situation with a given set of hole cards and the board is T♦ Q♦ 4♣ 3♠ K♦ if we have already evaluated the same hole cards with the board cards in ascending order: 3♠ 4♣ T♦ Q♦ K♦.

Finally, they were able to shrink their lookup tables considerably by using a technique they call "colexicographical indexing", although, frankly, I'm unable to understand what this entails from the description they provide.

Results From Our Equilibrium Computation

Through this process the derived strategies converge to an approximate equilibrium for both a cash game and 3-player tournament situation. The solutions for the stack sizes they presuppose are presented in tables 5 through 16.

The first result they report is that in places ICM breaks down a bit. The places where this is most significant are exactly where we'd suspect. Breakdown occurs when the stack sizes and positions relative to the blinds are most pronounced. That is, a small stack under the gun is in a more precarious position than ICM would suggest because if that player doesn't play this hand, he will be essentially pot committed next hand, and this reduces his equity.

The authors point out the distinctions between cash game and tournament 3-player strategies. Namely, that these strategies are similar for the first player to enter the pot, but that players should be more willing to call an all-in bet in cash games than in tournaments. Poker players might rephrase this by saying that David Sklansky's Gap Concept is magnified in tournaments as compared to cash games, something that most experienced poker players already know.

An interesting artifact of the authors' derived strategies is that there is no fixed ranking of hands. That is, as stack sizes change, the ordering of hand values sometimes changes as well. For example, depending on stack sizes, sometimes Q2o is a jamming hand out of the small blind whereas 64s isn't, and sometimes the reverse is true. This is an artifact that Miltersen and Sorensen found for 2-player games as well.

The authors compare their results against the Sklansky-Chubukov All In No Limit Hold 'em Rankings and find good, but not 100% complete, agreement. Any differences aren't surprising due to approximations in both sets of calculations, but the fact that they're close provides some validation for the strategy depicted in the paper.

Finally, the authors note that, in their words, "approximate equilibrium strategy for three players involves little randomization." This is even more true than the charts might suggest. Ganzfried and Sandholm state that most of the times the charts refer to a strategy as one that should be adopted in the 97-99% range, the reason it isn't 100% is because in the iteration process the results didn't fully converge yet. They expect that the true value for these strategies to be 100%.

Regular readers of this article series may fairly accuse me of sounding like a broken record here, but one of the big themes I take away from the research into algorithmic poker is that of the two types of poker deception (playing a given situation in multiple ways and playing multiple different situations in the same way), equilibrium strategies seem to strongly prefer the latter. The quoted statement above and the tables at the end of the paper are further evidence of this.

Conclusions and Future Work

Even though the authors had to make some simplifications and assumptions in order to find an approximate solution to 3-player jam/fold no-limit hold 'em games, it is possible to use game theoretic techniques to derive a strong strategy. The authors would like to be able to demonstrate that jam/fold strategies still produce an equilibrium even if strategies other than jam/fold are allowed. It seems difficult to prove that, though.

The authors make the conjecture that colluders can't hurt a player that much if the hero confines himself to jamming or folding. Certainly, it stands to reason that this would minimize the effects of collusion in that it reduces opponents' strategic options dramatically. As an example, colluders can't whipsaw a player who is either already all-in or has already folded. It would be exciting to be able to bound the value of collusion in such a game.

Finally, the authors conjecture that an equilibrium for 3-player poker might be unique. This would be an exciting result in the research community if they could prove it. I'm not optimistic that they will be able to do so any time soon. However, given results such as those in this paper, it seems reasonable to proceed as if this were the case, at least until such time as someone demonstrates that this assumption is invalid.