Commentary on "Pattern Classification in No-Limit Poker: a Head-Start
Evolutionary Approach"
by Nick Christenson
This article originally published in the October 2010 issue of
TwoPlusTwo
Magazine.
Background
This month I'm discussing the paper titled,
"Pattern
Classification in No-Limit Poker: a Head-Start Evolutionary Approach".
It was written by Brien Beattie, Garrett Nicolai, David Gerhard, and Robert
J. Hilderman from the University of Regina in Canada, and was published
in the Proceedings of the 20th Canadian Conference on Artificial Intelligence
in Montreal in May of 2007.
Throughout this article series I've discussed many approaches to designing
poker playing algorithms. Among these are a game theoretic approach,
a heuristic (rule-based) approach, and a Bayesian approach. In this paper
we encounter yet another method for constructing a poker playing computer
program. This time it's an evolutionary methods approach.
What are evolutionary methods? In nature evolution occurs by
several mechanisms: selection, whereby an organism that does well
compared to others will thrive; recombination, merging the characteristics
of several successful organisms to create a hybrid; and mutation, random
changes that will improve some organisms and worsen others. We can
replicate these phenomena in the form of a computer program, and then
we can apply them to a set of strategies these programs use to play poker.
Introduction
After some brief poker background, the paper discusses the "classification
problem". This is a general problem in decision programming where we're
trying to place objects into categories based on their properties. In
the context of poker, what we're usually talking about is whether a
given situation is a good poker situation for a given player or a bad one.
The authors talk about this in terms of goals in poker. While their
analysis is hardly new ground for anyone who is likely to take the time
to read this column, it got me to think about poker in some new ways.
The yardstick stick by which we measure success in poker is who
makes the most money. Of course, most players have other goals,
and that's fine, but ultimately poker is about winning money from our opponents.
We're not in it to see who wins the most pots. If that were
the measure of a successful poker player, then the right strategy would
be to call down every hand. We're also not in it to with the largest
possible percentage of the pots we play. If that were the case, we'd
only play our strongest hands. We're also not keeping score by some
other more complex method, such as number of pots won minus the number of
pots lost. If that were the case, we'd play all hands that had a greater
than 50% chance of winning, and it would never be correct to slow pay.
If each of these non-goals represented our aims as poker players, we
would be led to very different playing strategies. If the strategy we
adopt does not align with our goals, we can't expect good results.
When we try to create poker playing software, its easy for us to
accidentally write our code in a way that doesn't coincide with our goals. As it
turns out, the object of maximizing net earn in the long run is quite
difficult to reflect in a set of rules. It means that sometimes we should
be willing to give up some profit now if we can win more later. In
some situations we might play a hand differently if we were beginning
a long session than we would if it were our last hand of the evening.
These factors are quite difficult to parameterize.
The authors also discuss the dangers of predictability. I get the
feeling from this exposition they're not all that well versed in game
theory. A game theoretic approach will create unpredictability through
optimal bluffing. Once one adopts this approach the predictability issues
mostly take care of themselves.
In the next subsection the authors discuss previous work in this area.
They mention the CPRG work at the University of Alberta and their Poki
program specifically. I think this includes a misconception about
Poki which they relate at the end of the first paragraph:
Unlike our system, much of the decision making in Poki is rule based,
and thus, the best choice will be made every time, making the player
predictable to some extent.
It is true that Poki is primarily heuristic, but a heuristic approach
can still contain random elements. Poki is capable of bluffing, and
of playing a similar situation in different ways, but more importantly,
it creates deception by playing different situations the same way.
Being unpredictable in poker does not require someone to play the
same hands differently at different times under similar circumstances.
There's more than one way to be unpredictable in poker, and I don't think
the authors understand this.
System Architecture
The algorithm the authors are developing is designed to play no-limit
Texas hold'em. They're also trying to design something that can play
against arbitrary numbers of opponents. There are good reasons that
other research teams have chosen to tackle the heads-up game exclusively,
including those teams, like the CPRG, that have a lot more real world
poker experience than the authors of this paper. I'm not sure
this team realizes just what they've bitten off.
Bet sizing is a key element of big bet poker. For this project, the
authors decided to allow their agents to select from three different
bet sizes, expressed as a fraction of the agent's stack. First, since
the authors omit any discussion of blinds or antes, it's awfully
difficult to evaluate these bet sizes. Second, anyone reading this
article who has also read the book I wrote with Russ Fox, Winning
Strategies for No-limit Hold'em, can anticipate my reaction to
the notion that the authors of this paper choose to correlate bet size
with hand strength. Russ and I firmly believe that bet size is dictated
by many factors, but that hand strength shouldn't be one of them. For
this to be the one and only criterion used in bet sizing seems entirely
inappropriate to me.
The authors discuss the three main factors they use in evaluating
each poker situation: hand value, risk, and opponent aggressiveness.
The notion of hand value seems pretty self-explanatory, but it's not
at all trivial, and the paper doesn't provide much in the way of
details. How are hand strengths evaluated? Is there a table lookup
or is it evaluated as the hand develops? How are hand strengths
weighted based on the board cards? We don't know the answers to
any of these questions, and they make a great deal of difference.
Also, it seems from the text of the paper that they're assuming that
their agents' opponents have random hands, which is a problematic
assumption, especially if those opponents are calling or raising. This is a place
where a Bayesian approach would be beneficial, such as the one advocated
by the CPRG folks in the
paper
I discussed in the February 2010 article.
In essence, the authors define risk as the chance of winning the pot.
Again, this seems straightforward enough, but there's much more to it
than is provided in this paper. Specifically, how is this evaluated
against multiple opponents? How do they weight the increased gain
from multiple players against the higher hand strength typically needed
to take down the pot? We aren't told. I'm not certain the authors'
thoughts on this are all that sophisticated.
They also discuss the concept of implied odds, stating that it is
inappropriate in this case because it doesn't take into consideration
the size of bets. This is nonsense. However, they are correct to
bring up the difficulty in calculating implied odds due to the problems
of evaluating the future size of the pot and the probability that an
opponent will call. They describe a weighting function they use to
account for this. While the approach seems reasonable, they notably
don't provide any justification for their choices.
Once again, when it comes to the topic of aggressiveness, the authors
provide little in the way of details. Basically, this is a stripped
down opponent model which, I presume, would measure something like
the ratio of bets and raises to calls and/or folds. However, neither the
criteria that go into this factor nor the range of values it might
produce are mentioned. Also, in their discussion of equations 3 and 4,
they mention the variable "L", which is never used, and don't give
ranges for values like G, which is important. I suspect G falls within
the range of (0,1), and, consequently, A and w will fall in the
[0,1] range, but I don't know this for certain. We also have no idea
what the algorithm's idea of a "proper" number for A is. In any case,
I promise that nobody can reproduce the algorithms they use on this
topic from the description in this paper.
Training and Testing Data
An evolutionary learning algorithm can either evolve from a zero-knowledge
starting state, or it can be seeded with some data. The authors chose
the latter approach. The problem is, where could they find a source of that
data? Public records of poker games typically don't reveal everyone's
hole cards, so the Regina folks used hands and plays from televised poker.
Yikes. Of course, with the possible exception of High Stakes
Poker, the hands and plays shown on television won't be representative
of a random selection of poker hands. I'm not sure such a skewed subset
will be more useful than starting from scratch.
Implementation
The authors use what's called a "k-nearest neighbor" estimation to
fill in the gaps in their playing charts when it comes to evaluating
folds, calls, and raises. The k-NN method is a very simple machine
learning algorithm that's commonly used in these sorts of situations.
Conceptually, it works by assigning a value to a given set of conditions
that doesn't have a known value (from the hand database or from previous
experience) by assigning it the majority value from its nearest neighbors.
It's a good place to start, and it's likely good enough for what they're
doing here.
Next they take several different instances of their algorithm, called
agents, and have them play a tournament against each other. Why a
tournament? I have no idea, and I think that having them play a fixed
number of hands as a cash game against each other would be a better
approach. In any case, those agents that do well move on to the next
round unaltered. These winners are also used as parents for recombination
of various agents, and as the basis for agents "afflicted" with mutations.
In many evolutionary approaches, a mutation would be implemented by
changing some value in the action probability tables. However, the
authors felt that this would be too fine grained to have the desired
effect, and I agree. So what they did was change the play for a
range of situations with what they call a "hill function". Qualitatively,
think of the fine grained approach as possibly changing how the
algorithm plays 86 suited in second position against three opponents
and the hill function approach to change how the agent plays
suited one gappers in middle position against more than one opponent.
At the very least, when varying poker strategy the effects of the
latter approach will be easier to evaluate.
Comparison and Results
They create several different kinds of agents and have them duke
it out. The first kind of agent they call "data-driven". It
contains the knowledge accumulated from the televised poker hands
but does no learning. The second agent is one that uses the same
information, but then uses the evolutionary learning algorithm.
The third agent also uses evolutionary learning methods, but it
starts out with no poker knowledge.
After several generations of iteration, Figure 6 demonstrates that
the performance of best evolutionary algorithms of a particular
generation improve over those of previous generations. It doesn't
take long before the evolutionary data-driven agent plays much better
than the non-evolutionary data-driven agent. The improvement seems
to come quickly at first, and then slows down, which shouldn't be
surprising for any learning method. However, I'm a bit concerned by
the y-axis of the graph. What is the nature of the top 90 algorithms
that aren't represented in Figure 6?
The authors next talk some poker strategy. They seem surprised that
folding would be the dominant strategy. Of course, they don't tell us
how many players are at each table, but if it's more than three this
result won't surprise any serious poker player. Raises occur with good
hands. Calls occur with what they call "low risk" hands, although I'm
not exactly sure how this translates into more familiar poker terms.
I find it hard to learn much from the figures in the paper. The
exception to this is Figure 7. Note that the learning algorithm
has come up with a strategy that consists of: lots of folding, a
fair bit of raising, and very little calling. This seems quite reasonable.
It's difficult to tell from the figures, but it also looks like the
algorithm does at least some bluffing with its worst hands, which is
also promising.
The researchers ran tournaments comparing these three agents along
with two other trivial ones: an "always fold" agent and one that
acts randomly. Table 1 shows these results, but to honest, I can't
tell you what it says. The data make absolutely no sense to me.
Conclusions and Further Work
The authors admit that they don't know if their agents are good
poker players or not. I think they're being a little disingenuous
here. I suspect that even the best of their agents are poor players,
and that they know it. Nonetheless, they demonstrate that evolutionary
methods can lead to an improvement in algorithmic poker performance.
I don't see any reason to think that if they started with a better
framework that these methods couldn't be effective.
There is a lot of work to be done to turn this work into a worthy
poker opponent. The authors note that they've performed little
optimization of the evolutionary parameters. They also suggest that
further evolution of the agents should continue to bear fruit.
My suggestions on the next steps they might take are a little different.
First, I suggest they revisit their bet sizing algorithms. They should
also incorporate game theoretic bluffing. Using Bayesian methods to
adjust perceptions of opponent hand ranges would be a great addition,
but would require significant work. Also, they could sure use some
better seed data to jump start their learning algorithms. There are
probably some other implementation issues that could use fixing but
that we don't know about because the paper doesn't provide sufficient
details.
Conclusion
Once again, to a serious student of poker this probably looks like
a failure. As poker players, we tend to think of the success of a
poker algorithm in terms of how well it plays. The goal of the authors
is to demonstrate that their academic technique could be applied to
a new kind of problem. It doesn't matter whether the final product
actually ends up being any good at poker. They're only interested in
demonstrating the viability of their technique. I know I have a tough
time getting my head around this at times.
Nonetheless, we can add evolutionary methods to the list of
legitimate ways me might improve a poker algorithm. Also, I found
their unorthodox, from a poker standpoint, take on poker goals to
stimulate my thinking about the issue. So, for me the paper is
worth studying even if the resulting agent would get crushed by an
average 3-6 player from a typical public card room.
|