Commentary on "Approximating Game-Theoretic Optimal Strategies for Full-Scale Poker"
by Nick Christenson
This article originally appeared in the December 2009 issue of
Two Plus Two Magazine.
Introduction
Over the last decade or so, the research group that has pushed furthest the
frontiers of computer poker is the
Computer
Poker Research Group
(CPRG) at the University of Alberta. The most visible member of the group
is the lead author of the paper I will discuss in this month's column,
Darse Billings.
The CPRG has written many vitally important papers advancing the state
of the art in the way computers play poker. Moreover, they're not just
producing ivory tower research results. Their efforts have led to
several competitive poker applications. These include
commercial software products such as
Poker Academy Pro, based
around the Poki poker software. They are also responsible for
Polaris, a program
which has performed capably against top poker professionals in
competition.
Much of the theoretical underpinnings that have gone into the Polaris
software are discussed in the paper,
"Approximating
Game-Theoretic Optimal Strategies for Full-scale Poker", which
was published in the Proceedings of the 18th International Joint
Conference on Artificial Intelligence in August, 2003.
The goal of the research underlying this paper is to improve the
state-of-the-art in a classic man vs. machine situation. Specifically,
the authors try to come up with better poker playing algorithms. Their
goal is to advance computer skill in poker the way similar research
efforts have advanced computer skill in checkers and chess.
In practice, this is difficult since poker has a very large "game space".
That is, the enumeration of all possible options in a given hand is very
large. In fact, with present computing resources, it's impossible to
solve even a relatively simple "real" poker game, such as heads-up limit
hold'em.
Researchers can take either of two approaches in dealing with overly
complex games. First, we can simplify the game and solve the easier
problem. Second, we can make simplifying approximations about the
complete game. Neither of these techniques lead to the ultimate goal
of game theory, and that is provably correct strategies. However,
they do get us a step closer to that goal by providing insight about
the full game and creating testable hypotheses, some of which will
lead to improved strategies.
At the end of Section 1 in the paper, the authors claim the following
results for their research: Their simplifications reduce the search space
for the poker games they're considering by 11 orders of magnitude.
Also, their new algorithms prove superior to other computer opponents.
Moreover, these new algorithms can often play competitively with
strong human poker players.
Game Theory
This section describes the group's general approach to writing poker
software. Unlike games such as chess or checkers, poker is a game
with hidden information (our opponent's cards) and random events. This
makes the algorithms necessary to solve poker much different than for
other complex games that have been studied in the past.
The standard method for solving games is to make a tree of all possible
decisions branching out from decision points. This is then converted
to a system of mathematical equations, which we then solve. However,
because of the large number of possible decisions combined with the
number of decision points, for any popularly played game of poker we
cannot solve these equations directly. Therefore, if we want to tackle
a real poker game, we need to make some approximations.
The third section of the paper provides the reader background on the
game of Texas hold 'em. Needless to say, such an introduction isn't
necessary for readers of this article. However, it is important to
note that the game the CPRG folks are trying to solve here is specifically
heads-up limit Texas hold 'em. Since this is the least complex way
Texas hold 'em, can be played, it is a logical place to start.
Abstractions
In this section, the authors list a possible set of abstractions that
can be adopted to reduce the complexity of the game to more manageable
levels without rendering the resulting strategies invalid. Some of
these are "free", that is, they don't have any strategic effect, some
are more dangerous. Some of these abstractions are adopted by the authors
in the programs they developed, some aren't.
The first abstraction they consider is what they call "suit equivalence
isomorphism". An "isomorphism" in this context is when one changes something
that doesn't change any results. An example would be that playing A♣
K♣ with a K♠ 8♥ 3♦ flop will always be identical
to how one approaches A♦ K♦ with a flop of K♣ 8♠
3♥. Basically, this is a fancy math way of saying, "If I'm dealt
two suited or unsuited cards, it doesn't matter which suits they are except
in context with the board." This is something every poker player already
knows. This abstraction does not affect strategy, and it will be used in
the authors' algorithms.
The authors also discuss "rank equivalence" and "rank near equivalence".
The authors don't provide any examples, but one possible example of "rank
equivalence" is that a hand of A♠ 7♥ with a board of 9♠
9♣ 8♥ 8♦ K♠ has the same equity as a hand of
A♠ 3♥ with the same board. An example of "rank near
equivalence" might be considering starting hands such as K7s and K6s as the
same. They make the same number of trips or better hands. The only
difference is that K7 is slightly more likely to make top pair hands, and
K6 is more likely to make bottom pair hands. However, I expect most
people would agree that considering these hands as having equivalent
strength won't hurt our equity very much, if the difference is even
measurable. Rank equivalence abstractions won't change strategy, rank
near equivalence might alter our strategy a little bit, but only a
little bit and only in rare occasions. The authors use both abstractions.
The next abstraction the authors discuss is "deck reduction". It becomes a
lot easier to solve poker math problems if we reduce the number of cards in
the deck. A radical example of "deck reduction" is the AKQ game discussed
in Mathematics of Poker. Since this would alter the way the
game is played too greatly, the authors don't use this form of abstraction
in the poker-playing computer programs they discuss in this paper.
Similarly, one can also reduce the number of cards in a
player's hand or that come on the board. An example that the authors
cite of research that uses this sort of abstraction is Shi and Littman's
paper discussing the abbreviated game of Rhode Island hold 'em.
The
paper they reference was discussed in an article I wrote in the June,
2009 issue of TwoPlusTwo Magazine. The authors use a restricted form
of this abstraction which will be discussed in a little more detail later
on.
If one adopts abstractions which might affect the game strategy, any
solution to the game that includes these abstractions cannot be said
to be an "optimal" solution to the game. However, if the abstractions
are chosen judiciously, then the derived strategies should be close to
optimal, or "pseudo-optimal", as the authors call them.
Next the authors discuss how they implemented each of these abstractions
in practice. First, regarding betting round reductions, they restricted
their strategies to considering the effects of a bet and two raises only
on each street. That cut down by two the number of decision nodes per
betting round. They ran some simulations using the restricted number
of bets and don't feel that this affects strategy all that much. However,
if they reduced the possible number of bets and raises to one bet and
one raise, that would have altered strategies substantially, so that
additional level of abstraction would be inappropriate.
They also came up with an interesting method of cutting down the number
of betting rounds. Cutting a betting round out completely would have
changed strategies too greatly to be useful as an abstraction. However,
they can make some less drastic simplifications.
First, when considering pre-flop play only, they looked at
eliminating late betting rounds. Instead of betting on these later rounds,
they award the pot on a percentage basis based on each hand's equity at
that time. They create one program that eliminates just the river betting
round in this manner, and another that eliminates both the turn and river
betting round. This cuts down the complexity for pre-flop decisions.
Another abstraction one can consider is making the strategies on each
betting round independent of each other. Again, this was suggested in
the paper by Shi and Littman, and, again, the ramifications of this were
discussed in my June article. The authors believe that this abstraction
is too problematic, as it eliminates intentional multi-street strategies.
Another abstraction method mentioned in the Shi and Littman paper is
"bucketing". That is treating hands of similar strengths that play
similarly as equivalent. The authors also give a nod to Sklansky and
Malmuth's Hold 'em Poker for Advanced Players, and their
nine groups for starting hands as another source expressing a similar
concept. In this implementation they create six buckets for starting
hands. Five of these buckets categorize hands by intrinsic strength,
with a sixth bucket set aside for low-value but high-potential hands.
The authors don't mention which starting hands go in which bucket,
but since they mention high-potential hands being those hands that
tend to make flushes and straights, we can assume that suited connectors,
bad suited aces, and hands such as 98off might be in this bucket,
perhaps among others.
The authors note that going from 6 to 7 to 8 buckets doesn't seem
to make much of a difference. They also note that many successful
human players probably don't make any more distinctions between
types of hands than this.
Now that the authors have described how each of these betting rounds
work, all that remains is to tie them together with a list of the
probabilities of transitioning from one set of possibilities to another,
and we have a pseudo-optimal solution to the game.
Testing
In order to explore the ramifications of some of the abstraction
decisions the authors have made, they created a few slightly different
versions of the poker software that uses these methods. They also
employed some other poker 'bots, including Poki, Anti-Poki (a 'bot
created specifically to exploit Poki's weaknesses), Adapti (a 'bot
designed to detect and exploit opponent weaknesses), and two trivial
'bots called "Always Call" and "Always Raise" who play exactly the
way one would expect. All of these programs play at least 20,000
hands against each other in a round robin fashion. Their win rates
against each other can then be compared.
It's important to note that a game theoretically optimal poker
strategy can be more properly described as playing "not to lose" than
as "playing to maximize gain". A player who plays optimally will not
lose against any game, but may not necessarily win at a high rate against
anyone, even an especially weak player. The way to maximize one's win
rate against a bad player is to play exploitively, that is, in a non-optimal
way, but in such a way that takes maximum advantage of an opponent's
mistakes. Overall, Section 5.1 of the paper does a really good job of
presenting the issues involved in optimal vs. exploitive strategies in
just a few paragraphs.
After the round-robin, the two biggest winners were the two most
sophisticated programs that implemented the algorithms discussed
in the paper. However, the projected number 2 program achieved the
best results. The authors attribute this to some programming errors
that were discovered after the fact. Interestingly, although these top
two programs never lost a match (except against each other), they
didn't perform all that well against the especially weak programs in
the competition. This lines up with the discussion in the previous
paragraph.
One interesting result of note is the performance of the program they
called "PsiOpti0". This program played a pretty sophisticated
post-flop game, but its pre-flop strategy is simply "always call".
Yet, the only two matches it lost were to the two more sophisticated
programs that implemented a more complete pseudo-optimal strategy.
Plus, it did almost as well as the most sophisticated programs against
the least skilled opponents. This provides some credence to the notion
that if one plays well from the flop on, it's possible to be a successful
poker player even if one plays badly pre-flop.
Next, the top two programs are pitted against human opponents of
varying skill. Skill levels ranged from complete novices to world
class players specializing in short-handed limit hold 'em games.
Expert players were able to beat the program, but experienced but
non-expert players played about break-even with the program, and
less experienced players generally lost to it. This is no small
accomplishment, especially considering that we're talking about
a programs and computers that are at the present time more than 6
years old.
The authors make some observations about these matches. First,
experienced poker players indicated that the program played in
a different manner than they expected. Second, in these matches
it becomes clear that against many types of opponents, one of the
program's biggest advantages is that it doesn't go on tilt and
doesn't get tired, whereas these problems plague their human
opponents.
Recall that the programs described in this paper don't do any opponent
modeling. They never attempt to exploit their opponents. The authors
expect that adding opponent modeling will improve performance for
two reasons: First, nearly all human players have exploitable weaknesses.
Second, even slight variations in play caused by the program adapting
to its opponent would make it much harder for a human to get a line on
the computer's play. This makes it more difficult to find weaknesses
and exploit them with an appropriate counter-strategy. The authors claim
that as it turns out, most humans can be accurately modeled with only a
small amount of information.
Conclusions
The best machine players in the world are getting much better at playing
poker. The 2008 Polaris results are a good indication of this. We're
probably not all that far away from having computers that can beat all
humans in the restricted domain of playing heads-up limit Texas hold 'em.
The addition of opponent modeling, advancements in computing power,
and refinements in the algorithms used will lead to programs that will
require the best humans to improve their games in order to just stay even.
In this way, the poker world will likely become analogous to the world
of chess. We're still a long way from being able to completely solve
the game of Texas hold 'em, but pseudo-optimal strategies will probably
get us very close, perhaps in an unexpectedly short span of time.
|