Commentary on Computing an Approximate Jam/Fold Equilibrium
for 3-player No-Limit Texas Hold'em Tournaments
by Nick Christenson
This article was originally published in the May 2011 issue
of TwoPlusTwo
Magazine.
Background
The paper I'm discussing this month is titled "Computing an Approximate
Jam/Fold Equilibrium for 3-player No-Limit Texas Hold'em Tournaments". It
was written by Sam Ganzfried and Tuomas Sandholm from Carnegie Mellon
University. The paper was included in the Proceedings of the 7th International
Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008) in
Estoril, Portugal from May 2008. It's available online at:
http://www.cs.cmu.edu/~sandholm/3-player%20jam-fold.AAMAS08.pdf.
This paper comes out of a group at CMU who has emerged as a top rival
to the University of Alberta group for pre-eminence in the field of computer
poker, especially regarding no-limit hold 'em. The author, Sandholm, was
also featured in
a
recent interview with National Public Radio on the issue of bots in
online poker.
Introduction
As with the last article in this series, I'm looking at the special
issues involved in finding strategies for 3-player poker. Specifically,
the authors are looking at 3-player no-limit hold 'em. They constrain
their problem to a given total number of chips in play and restrict
strategy decisions to going all-in or folding pre-flop.
Ganzfried and Sandholm make a reference to another paper where this
question is solved for 2-player no-limit hold 'em, and they discuss
the techniques used to arrive at that solution. The authors also make
the usual caveats about the increase in complexity in going from a
2-player to 3-player game and that in the 3-player case there is no
guarantee that an equilibrium solution even exists.
Jam/Fold Strategies and Heads-up Results
Section 2 of the paper covers no-limit hold 'em. Readers of the paper
may need this information, but readers of this article won't. The
last paragraph of this section covers sit-n-go tournaments.
The tournament results of this paper have relevance in that the problem
discussed in the paper is the situation sit-n-go players (or players of
any tournament) face once it gets down to the final three. In fact, the
authors have gone out of their way to try to make the results as real-life
relevant as possible, using blinds, antes, payouts, and stack sizes similar
to those used in actual online events.
In the first paragraph of section 3 the authors justify the restriction
to jam/fold strategies. Of course, anyone who has read any of the recent
literature on tournament poker knows that there are many situations in
which adopting a jam/fold strategy makes sense, and there are many more
where even if it isn't optimal, it's still a pretty strong strategy.
So, once again, this motivation is primarily for an audience less informed
about the particulars of poker strategy.
The authors again refer to the paper by Miltersen and Sorensen that
solves this problem for the 2-player game. Their synopsis of the key
results from that paper go by a little fast, but in summary, Miltersen
and Sorensen find a 2-player near-optimal solution using what looks a
lot like the regret minimization technique discussed in the
March 2011 article in
this series. They can also guarantee that restricting this strategy
to jam/fold creates only a very small deviation from a less
restrictive strategy. They also point out that the optimal strategy for
a cash game is almost identical to what it would be in a tournament.
Honestly, I expect that it would be identical in the 2-player case, so
I don't know what's going on here.
Independent Chip Model
Section 4 begins with an explanation of the Independent Chip Model (ICM).
Again, this is something that readers of this article are probably quite
familiar with. If not, the paper refers to the
twoplustwo.com
single table tournament FAQ as a source on the subject, which I find
to be cool.
Just as a refresher, tournament play is influenced by the fact that when
played multi-way, optimal strategy is different from what it would be in
cash games because tournament chips have a non-linear value. The authors
don't have a way to deal with this non-linearity with complete generality,
so they adopt ICM as a widely accepted approximation for converting tournament
stack sizes to their expected value.
Our Two-Level Iterative Algorithm for Equilibrium Finding
This section covers the algorithm they use to come up with their strategy.
This is shown in pseudo-code form as Algorithm 5.1. Again, while they
don't use this language, it looks a lot like they're using regret minimization
to iteratively find an ε-Nash equilibrium for the game.
Unfortunately, this algorithm is not explained nearly as well as I would
like. While δ is given a value, what it is measuring isn't explained
in this paper. Further, we don't have a really good explanation of
what the outer loop is counting. It seems, though, that the outer loop
is repeatedly measuring the payoffs to each player and terminates when
the difference in successive runs falls below the δ threshold.
The inner loop represents the implementation of a learning algorithm that
calculates an updated best response for each player to the average strategy
adopted by its opponents thus far. It loops until the maximum regret
for a given strategy also falls below some threshold. Again, check out
the March 2011 article in this series for a more complete explanation of
regret minimization.
This approach would be guaranteed to converge toward an equilibrium in
the 2-player game, but no such guarantee exists here. Fortunately, the
authors are able to prove via Theorem 1 that if this method does converge,
that it will converge to a Nash equilibrium.
Implementation Discussion
With three players and five board cards there are a lot of possible
card combinations over which the algorithm must iterate. Obviously,
the authors want to simplify the problem as much as possible, both
using abstraction techniques we've seen before, such as those discussed
in the
article from this series published in December 2009,
and those that are new.
From the folio of familiar abstraction methods the authors use suit
isomorphism: the idea that once they've dealt with a state in their
game space with certain suits represented, that they can rotate through
the equivalent suit combinations and infer the same results.
In the list of new mechanisms, they use something I choose to call "rank
isomorphism", where the results of players' hole cards or board cards
that are revealed on a given street have the same value regardless of the
order in which they are revealed. This is especially useful in this
application. Since we're playing pre-flop jam/fold poker, all five
board cards are dealt at once after the pre-flop action is concluded.
Without loss of generality we can consider only those combinations where
the cards are dealt by increasing rank and suit. That is, there's no
need to evaluate a situation with a given set of hole cards and the board
is T♦ Q♦ 4♣ 3♠ K♦ if we have already evaluated
the same hole cards with the board cards in ascending order: 3♠
4♣ T♦ Q♦ K♦.
Finally, they were able to shrink their lookup tables considerably by
using a technique they call "colexicographical indexing", although,
frankly, I'm unable to understand what this entails from the description
they provide.
Results From Our Equilibrium Computation
Through this process the derived strategies converge to an approximate
equilibrium for both a cash game and 3-player tournament situation.
The solutions for the stack sizes they presuppose are presented in
tables 5 through 16.
The first result they report is that in places ICM breaks down a bit.
The places where this is most significant are exactly where we'd suspect.
Breakdown occurs when the stack sizes and positions relative to the blinds
are most pronounced. That is, a small stack under the gun is in a more
precarious position than ICM would suggest because if that player doesn't
play this hand, he will be essentially pot committed next hand, and this
reduces his equity.
The authors point out the distinctions between cash game and tournament
3-player strategies. Namely, that these strategies are similar for the
first player to enter the pot, but that players should be more willing
to call an all-in bet in cash games than in tournaments. Poker players
might rephrase this by saying that David Sklansky's Gap Concept is
magnified in tournaments as compared to cash games, something that most
experienced poker players already know.
An interesting artifact of the authors' derived strategies is that there is
no fixed ranking of hands. That is, as stack sizes change, the ordering
of hand values sometimes changes as well. For example, depending on stack
sizes, sometimes Q2o is a jamming hand out of the small blind whereas 64s
isn't, and sometimes the reverse is true. This is an artifact that
Miltersen and Sorensen found for 2-player games as well.
The authors compare their results against the
Sklansky-Chubukov
All In No Limit Hold 'em Rankings and find good, but not 100% complete,
agreement. Any differences aren't surprising due to approximations in
both sets of calculations, but the fact that they're close provides some
validation for the strategy depicted in the paper.
Finally, the authors note that, in their words, "approximate equilibrium
strategy for three players involves little randomization." This is even
more true than the charts might suggest. Ganzfried and Sandholm state
that most of the times the charts refer to a strategy as one that should
be adopted in the 97-99% range, the reason it isn't 100% is because in
the iteration process the results didn't fully converge yet. They expect
that the true value for these strategies to be 100%.
Regular readers of this article series may fairly accuse me of sounding
like a broken record here, but one of the big themes I take away from
the research into algorithmic poker is that of the two types of poker
deception (playing a given situation in multiple ways and playing
multiple different situations in the same way), equilibrium strategies
seem to strongly prefer the latter. The quoted statement above
and the tables at the end of the paper are further evidence of this.
Conclusions and Future Work
Even though the authors had to make some simplifications and assumptions
in order to find an approximate solution to 3-player jam/fold no-limit
hold 'em games, it is possible to use game theoretic techniques to derive
a strong strategy. The authors would like to be able to demonstrate that
jam/fold strategies still produce an equilibrium even if strategies other
than jam/fold are allowed. It seems difficult to prove that, though.
The authors make the conjecture that colluders can't hurt a player that
much if the hero confines himself to jamming or folding. Certainly, it
stands to reason that this would minimize the effects of collusion in
that it reduces opponents' strategic options dramatically. As an example,
colluders can't whipsaw a player who is either already all-in or has already
folded. It would be exciting to be able to bound the value of collusion
in such a game.
Finally, the authors conjecture that an equilibrium for 3-player poker
might be unique. This would be an exciting result in the research community
if they could prove it. I'm not optimistic that they will be able to do
so any time soon. However, given results such as those in this paper, it
seems reasonable to proceed as if this were the case, at least until such
time as someone demonstrates that this assumption is invalid.
|