Commentary on "A Memory-Based Approach to Two-Player Texas
Hold'em"
by Nick Christenson
This article was originally published in the January 2011 issue of
TwoPlusTwo
Magazine.
Background
The paper I'm discussing this time around is titled "A Memory-Based Approach
to Two-Player Texas Hold'em". It was written by Jonathan Rubin and Ian
Watson from the University of Auckland in New Zealand. The paper was
published in the proceedings of AI 2009: Advances in Artificial
Intelligence, and is available online at:
http://www.cs.auckland.ac.nz/research/gameai/projects/SartreAI09.pdf.
The abstract describes the methodology used in this paper as a "Case-Based
Reasoning system". What is that? It's a method by which solutions to
problems are obtained by comparing the present situation to previous
situations already known (thus the "memory-based approach" referenced in
the paper's title) and selecting the most appropriate solution from the
program's "experience base".
It's straightforward to understand how this might be applied to poker. In
fact, all human players with even the tiniest bit of experience incorporate
these methods in the way they play the game of poker.
So, the first obvious question to ask regarding programming a poker player
is, "Where does this list of previously known solutions come from?" The best
place to obtain such an experience base is from the experience of the best
poker player(s) one can find. Basically, it's a hand history from winning
players.
Introduction
The authors make the well-known but difficult to over-emphasize point
that there are really two approaches when it comes to parameterizing winning
poker play. The first is to try to compute an optimal or near-optimal
strategy. The second is to compute a strategy that achieves maximal
expectation against prospective opponents. The authors point out the danger
in this second approach, and that is that any exploitive strategy is,
itself, non-optimal, so it may be counter-exploited.
The interesting question that the paper doesn't address is to which of
these categories the authors' memory-based approach belongs? It's
obviously not computing an optimal or near-optimal strategy. On the
other hand, it possesses no opponent modeling capability, so it can't
be considered exploitive. So what is it? Well, it's a reflection of
the way the expert player(s) played from whom the program's experience
base is drawn. Which type of strategy can be best adapted to this memory-based
approach? Well, with no opponent modeling capability, it would be especially
dangerous for a memory-based poker algorithm to try to play exploitively, so
this approach will probably work best learning from poker players who are
playing something close to optimal poker.
The authors created their program to play two-player limit Texas hold 'em and
named it SARTRE. They selected two-player limit Texas hold 'em for three
reasons, two obvious, one less so. First, it's the baseline game for most
poker researchers approaching the idea of writing poker modeling software
for the first time. Second, this choice of game allows poker researchers
to measure their progress by playing against other programs, especially
through direct competition via the
Annual Computer Poker
Competition.
Third, because the hand histories from these competitions are available,
those that come from the best players in previous competitions can be
assembled to serve as the experience base for this particular program.
The poker algorithm chosen as the baseline for SARTRE's strategy is the 2008
CPC competition champion, the University of Alberta's "Hyperborean-eq".
I suspect that this is probably the set of hands from the best poker player
that the researchers had at their disposal.
Texas Hold'em
The second section of the paper covers the rules to the poker game played,
two-person limit Texas hold 'em. Again, this information is necessary in
the paper, but I won't go over it here since this audience doesn't need to
be reminded of this information.
One thing that I will point out is that in this second section,
second paragraph, fifth sentence the paper reads:
"... in heads-up play both players need to play a lot more
hands in order to be profitable."
What the authors mean is that successful heads-up players must play
a larger percentage of the hands dealt than they would in a full ring
game. They don't mean that a heads-up player needs to play more hands
against an opponent to expect to make (or lose) the same amount
money as they would in a full ring game. At least I hope that's what
they mean.
SARTRE: System Overview
In the third section of this paper the authors describe how SARTRE was
constructed. Again, in brief, the program compares the current situation
it faces to those in its experience base and picks the one that best
represents the current situation. One may ask, by what criteria is the
present situation is compared to those in the experience base? These are:
- The betting pattern in the hand up until now.
- Current hand strength.
- Composition of the board.
In my opinion, this is a pretty good list.
The betting pattern is a list of check/call and bet/raise actions from each
street for each betting round. Basically, this is represented as a path
along a decision tree. This all seems reasonable enough. The authors
measure the similarity of a betting path for the current hand to those in
the experience base. If they're identical, they set the similarity factor
to 1. If they're not identical, they set the similarity factor to 0. So,
either the betting pattern for the present situation exactly matches one in
the experience base, or it doesn't.
It's reasonable to consider implementation of an algorithm that allows for
similarity rather than an exact match, especially if the program is at a point
in a hand where an exact match to the corresponding betting pattern doesn't
exist in the experience base. The authors understand this, but they elected to
not implement it, at least not in this release. They don't say why they made
this decision, or whether it was due to time constraints, the complexity of the
issue, or problems that manifested themselves in an attempt to implement such
a feature.
At this point let us note that position was not included on the list of
criteria that defines a given poker situation. It doesn't have to be.
Position is implicitly accounted for through criterion 1. While it's
certainly the case that SARTRE's strategy does not explicitly vary based
on position, if Hyperborean-eq is any good at all, it will have taken
position into account when it played the hands from which SARTRE's
experience base is derived. Since the betting pattern will always be
different in situations where SARTRE starts in the small blind than when
it starts in the big blind, these distinctions are already reflected in
the data, and therefore they will be represented in the way that SARTRE
plays.
As far as hand strength goes, the authors qualitatively mapped a given
hand type into the familiar poker hand categories, i.e., pair, two pair,
trips, etc.. As I've discussed in previous articles, despite this being
the most obvious way to do it, it's simply not a good way of categorizing
hands. Distinguishing between a bottom pair and top pair/best kicker
is definitely going to be more significant than in differences between
hands that rate a full house or better.
Despite the naive categorizations of made hands, SARTRE does consider a
fairly sophisticated categorization of drawing hands, including straight,
flush, overcards, and whether both hole cards are used to make the draw.
It's not clear whether they consider nut straight or flush draws to be
superior to other draws, but I'm surprised that they would categorize drawing
hands this well yet still use such a weak categorization for made hands.
It doesn't take even a novice player very long to learn that board texture
is important, and SARTRE does account for this. For example, in the paper
they categorize three consecutive cards as a "straight possible" board. Of
course, there are other board configurations where straights are possible,
even though there are fewer combinations of cards that produce them. I'm not
certain they take this into account.
As was mentioned earlier, the SARTRE experience base is based on the play of
Hyperborean-eq from the previous year's CPC. This experience base consists of
over one million cases. Of course, this is derived from much less than one
million hands. Each poker hand can be broken down into one case per betting
decision, with a minimum of one per hand and two per post-flop betting round.
When SARTRE encounters a situation, it examines all cases that exactly match
all three criteria listed above: betting pattern, hand strength, and board
composition. If more than one case matches these particular criteria, then
a response is selected randomly from the matching cases, weighted by how
frequently Hyperborean-eq chose that specific response.
Experimental Results
Once the researchers completed work on SARTRE, they tested it against two
publicly available bots: Fell Omen 2 and BluffBot. Fell Omen 2 is a bot
whose source code was publicly available, but its home web site seems to
be offline these days. BluffBot is one of the opponents available to users of
Poker Academy Pro
software. Poker Academy Pro provides a software interface to allow users
to interface a bot they've written with this software. Note, the SARTRE results
in this paper are not against the "BluffBot" that was a competitor in several
of the Computer Poker Competitions. That was written by a different developer.
SARTRE plays duplicate hands against Fell Omen 2, but this feature isn't
available for the matches against BluffBot. The authors note that playing a
duplicate match reduces variance in the outcome, because it prevents one
player from getting dealt more fortunate situations than the other. This is
true, but with a duplicate poker setup there's still a chance that one program
will elect to play one or more suck-out hands that its opponent will pass on,
and these could skew the results. Playing duplicate poker should reduce the
variance, but it's not a complete substitute for very large sample sizes.
In any case, SARTRE gets beat by Fell Omen 2. It's pretty clear Fell Omen 2
is the better player, but check out the variance in the results in each trial,
it's huge. I don't believe the trials are long enough for anyone to be
truly confident at exactly what Fell Omen 2's expected win rate against SARTRE
is, and I notice that the authors don't include a margin for error in that
result.
Over 30,000 hands, SARTRE beat BluffBot. Again, I believe we can be
confident with this qualitative result, but I don't know what margin for error
should be assigned to SARTRE's win rate. Based on SARTRE's rate of loss to
Fell Omen 2, the authors project that SARTRE would have been competitive at
the 2008 CPC, finishing about the middle of the pack.
The authors compare SARTRE's results to those of the origin of its strategies,
Hyperborean-eq. Nobody should be surprised that Hyperborean-eq performs
better, as SARTRE is, essentially, an imperfect mimic of Hyperborean-eq's
strategy.
One footnote to the results in the paper, SARTRE finished 3rd in the 2010
CPC heads-up limit Texas hold 'em division out of fifteen competitors. It
lost out to champion PULPO and second place Hyperborean-TBR. I don't know
what improvements have been made to it after this paper was written. I'd
expect that it's experience base was updated, and I'd have to believe that
at a bare minimum its hand strength evaluator was improved.
Conclusion
To some people, writing a program that does nothing but imperfectly attempt
to ape a good poker player's strategy might not seem like a significant
accomplishment. However, I believe there is still plenty to learn from
this research project.
While mimicking a good opponent may seem like a trivial bot-building strategy,
this covers a lot of the way human poker players improve their games. It's
useful to see this methodology formally modeled. One thing that's worth
noticing is that through the data provided in this paper and the 2008 CPC
results, we can project how SARTRE might have performed against its paragon,
Hyperborean-eq. It looks to me like Hyperborean-eq would have crushed it.
This demonstrates that there can be a huge difference between "understanding"
how to play good poker and trying to play the same way as such a player
without having such an "understanding" oneself.
It would be interesting to hear what improvements were made to SARTRE before
its participation in the 2010 CPC. I'd be interested to know if those areas
of focus would be relevant to human players attempting to improve the quality
of their games. Perhaps, though, it came through correcting some of the more
obvious deficiencies in the version discussed in the paper, such as implementing
a weighting for similar but not identical situations from the experience base
and providing a better hand strength evaluation function.
Finally, I have to wonder how much better or worse such a program would do if
instead of learning from another program, it could form its experience base
from the hand histories of the best online heads-up limit hold 'em players?
Perhaps at that level such players play so exploitively that a program with
no opponent modeling would just have its strategy diluted by contradictory
strategies. In any case, it would be fascinating to find out.
|