Commentary on
Poker Superstars: Skill or Luck?
by Nick Christenson
This article originally appeared in the August, 2009 issue of
TwoPlusTwo Magazine.
Introduction
A poker player's results during a limited time interval, such as a
cash game session or single tournament, will be partially due to
skill and partially due to luck. Over short time intervals, we expect
results to be dominated by luck. Over time intervals that are sufficiently
long, we expect skill to dominate results. For tournament poker, how
many results do we need before we can be confident in saying that the
poker players who have had the best results are the best tournament
players?
I've talked to several people who have tried to support or disprove this
hypothesis over the years, and each time the effort they have expended
has come to naught. However, a 2008 issue of the magazine Chance
(vol. 21, no. 4) contains an article that provides evidence that
yes, indeed, we can demonstrate that poker tournaments are skill-based.
The article is titled "Poker Superstars: Skill or Luck?", and it is written
by Rachel Croson, Peter Fishman, and Devin Pope. Disclaimer: I
know Dr. Croson personally and was asked to review an early copy of
the article.
Beginning
The article begins with a discussion of the political issues
surrounding the legalization of poker in the United States. This
discussion will be familiar to almost all serious students of poker,
but the crux is that much of the debate concerning how to treat
poker seems centered on whether poker is a game of chance or skill.
If it can be demonstrated that poker, specifically tournament poker, is
a game of skill, then that may pave the way for additional legalization.
The authors intend to compare tournament poker results to those of
professional golf, a game/sport widely considered to be skill-based.
If the correlation between player performance from tournament to tournament
is similar for the largest poker tournaments and professional golf, then
poker should be treated as a game of skill, right? At least, that's
the argument.
Data
The authors explain the nature of tournament poker, important to do
in Chance, but hardly necessary for my audience. Then
they explain their data set. One of the big problems with analyzing
poker tournament results is that we almost never get to look at the complete
list of who initially bought in, so it's difficult
to measure the rates at which people cash and their relative ROIs. To
deal with this, the authors just look at the final 18 players of all
limit and no-limit Texas hold'em tournaments between 2001 and 2005 with
a $3000 or greater buy-in from the WSOP, WPT, and World Poker Open.
Thus, they are ignoring everything that happens in each tournament until
they get down to the final 18 players, and then they treat the last two
tables as if they were their own mini-tournament. There are issues with
this that we'll get to later, but it's a clever way to deal with the
problem of not having information on complete tournament fields. There
were 81 tournaments that met these criteria, and 899 players who made
at least one final 18 in these events.
The results of these poker "sub-tournaments" are compared against golfers
who finished in the top 18 in at least one of the 48 men's PGA
tournaments held during 2005. There were 218 golfers who met this
criterion. For either type of contest, if there's skill involved, then
we'd expect players who score high in several events to be more likely,
by a statistically significant margin, to score high in subsequent
events. If this correlation is similar for both tournament poker and
golf, then that would be evidence that poker tournament and professional
golf results are due to a similar measure of skill.
Methods
Here's where the paper starts to get a little hairy, and the jargon gets
more than a little thick. The researchers are using "econometric" methods
to compare these two data sets, and if you're not familiar with their
terminology, following their methods will be difficult.
First, econometrics is a mathematical toolkit used by economists to
statistically analyze data. The method they're actually using in the
paper may be familiar to many reading this article. It's called
"ordinary least squares" (OLS) and it's a method for finding the best
fit for some curve to a data set. Basically, what they're saying in
the first paragraph of this section is that they're going to use
standard methods to fit a line to their data sets.
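For readers who haven't seen it written out, here is a minimal sketch of fitting a line by ordinary least squares. The function name fit_line and the sample numbers are made up for illustration; they are not from the paper, and the authors' actual analysis includes corrections this sketch leaves out.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1, so the fit should recover it.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With real tournament data the points would scatter widely around the line, and the interesting question becomes whether the slope is distinguishable from zero.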
They spend a lot of time, though, talking about whether the assumptions
that make OLS valid really hold here, using words like "heteroskedasticity"
that rarely come up in casual conversation. The second paragraph in
this section basically says, "We know that the standard OLS assumptions
might not hold, and we've taken that into account in our math, and if
you don't believe we've done it right, go read this book by Wooldridge
before you complain."
They're doing three calculations here with each data set. The first they
call "experience", the second they call "finishes", and the third they
call "previous rank". They believe that if there are high correlations
between players' finishes in one event and in other events, as defined
by these terms, that indicates the presence of skill.
Experience measures whether a player who finishes in the top 18 of either
a golf or poker tournament has finished in the top 18 of a previous
tournament. Finishes measures the number of times a player has previously
finished in the top 18. Previous rank is the average rank of players
based on all previous tournaments in which they had a result.
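The three measures can be sketched in a few lines of code. This is my reading of the definitions above, not the authors' code: the function name history_measures, the data layout, and the choice to leave previous rank undefined for first-time finalists are all assumptions, and the player names are invented.

```python
from collections import defaultdict

def history_measures(tournaments):
    """For each final-table finish, compute the three measures from earlier events.

    `tournaments` is a chronological list; each entry lists the finalists'
    names in finishing order (first name = 1st place).
    """
    past_ranks = defaultdict(list)  # player -> ranks in earlier events
    rows = []
    for finalists in tournaments:
        for rank, player in enumerate(finalists, start=1):
            earlier = past_ranks[player]
            rows.append({
                "player": player,
                "rank": rank,
                # Experience: has this player made a final before?
                "experience": 1 if earlier else 0,
                # Finishes: how many earlier finals?
                "finishes": len(earlier),
                # Previous rank: average of earlier ranks (assumed None if no history).
                "previous_rank": sum(earlier) / len(earlier) if earlier else None,
            })
        # Update histories only after the whole event has been recorded.
        for rank, player in enumerate(finalists, start=1):
            past_ranks[player].append(rank)
    return rows

# Hypothetical three-player "finals" to keep the example short.
events = [["Ann", "Bob", "Cy"], ["Bob", "Dee", "Ann"], ["Ann", "Bob", "Eve"]]
rows = history_measures(events)
print(rows[-3])
```

Each row pairs a finish with what was known about that player beforehand, which is the shape of data a line fit of rank against history needs.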
In the fourth paragraph, they discuss the possibility that their model
might not be valid and state that they ran their tests a second time using
an "ordered probit model", achieving the same results. With OLS, there's
a tacit assumption that the outcomes could be any real number; that they
don't fit nicely into bins. With these particular data sets, both our events
and our outcomes are discrete (nobody enters tournament 3.87, and nobody
gets 12.184th place), so technically OLS can run into problems. The "ordered"
in "ordered probit model" means that the outcome categories have a natural
order: first place is better than second, which is better than third, and
so on. An ordered probit model treats each finish as one of a set of ordered
categories rather than as a point on a continuous scale. As the
authors claim, this doesn't affect the outcome. It's there to say,
"Don't complain about this issue, we've already thought of it."
For each of these measures, they compare the players' ranks at the end of
the tournament series to their ranks along the way and fit lines to
these data sets. This leads to the data presented in the first three
rows of Table 2. The number in brackets is the margin of error for the
measure in question.
Basically, if there's no skill involved, then a person's performance
shouldn't be correlated to their previous performance, and the "coefficients"
as they call them in the paper should be within their margins of error
of zero. The further they are away from zero, or more specifically,
the more multiples of their margins of error these coefficients are
from zero, the stronger the evidence of a skill effect.
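The arithmetic behind that comparison is just a ratio. A small sketch, using hypothetical numbers rather than anything from Table 2 (the function name multiples_of_error is mine):

```python
def multiples_of_error(coefficient, margin):
    """How many margins of error the coefficient lies from zero."""
    return abs(coefficient) / margin

# Hypothetical (coefficient, margin of error) pairs, NOT values from the paper.
examples = {"weak": (0.10, 0.20), "strong": (-0.90, 0.30)}
for label, (coef, err) in examples.items():
    ratio = multiples_of_error(coef, err)
    # Within one margin of error of zero: no sign of an effect.
    verdict = "consistent with zero" if ratio < 1 else "away from zero"
    print(f"{label}: {ratio:.1f} margins of error, {verdict}")
```

The sign of the coefficient doesn't matter for this test, only its distance from zero relative to its uncertainty.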
Results
For each of the three metrics discussed above, both poker and golf
display a correlation number well outside the margin of error from
zero. In the "experience" metric, golf shows a considerably higher
correlation than poker for showing up in one final 18 and then another.
This may be due to the smaller number of overall participants in
the golf events. In the "finishes" metric, the two values are within
the margins of error of each other, indicating a similar effect. In
the "previous rank" metric, poker shows a considerably higher correlation
between a player's finish in one tournament and his average finish in
the previous events. I may have a theory to explain this that I describe
below. Based on these data, we would have to conclude that both tournament
golf and tournament poker are games of skill, and at least to a first-order
degree, they are similar in the extent to which skill plays a part in
determining rankings in an event.
The bottom three rows of the table indicate some statistical information
about each of the data samples themselves. Some who are statistically
minded may be drawn to the "R-Squared" values and how low they are. One
might wonder whether a data set with such low values can be trusted.
In this case, the error on each coefficient in the top three rows is
what tells us whether there is a skill component or not, and in the case
of poker, since each value is between 2.3 and 4 times the error away from
zero, that's a pretty strong indicator that the correlation is real.
What the low R-squared values really indicate is what a poor predictor
the data are of future results for any individual in either type of
contest, and that shouldn't be terribly surprising.
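A quick simulation shows how a real effect and a low R-squared can coexist. This is a toy model of my own, not the paper's data: the effect size (0.2), the noise level, and the sample size are all assumed, and the standard error is the textbook one rather than the corrected version the authors use.

```python
import math
import random

def ols_summary(xs, ys):
    """Return (slope, standard error of slope, R-squared) for a fitted line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: the variation the line does not explain.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    r_squared = 1 - rss / syy
    return slope, std_err, r_squared

rng = random.Random(2009)  # fixed seed so the run is repeatable
# Assumed toy model: a small true effect (0.2) buried in much larger noise.
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [0.2 * x + rng.gauss(0, 1) for x in xs]

slope, std_err, r_squared = ols_summary(xs, ys)
print(f"slope / error = {slope / std_err:.1f}, R-squared = {r_squared:.3f}")
```

The slope sits many standard errors from zero even though the line explains only a few percent of the variation: the effect is real, but it tells you little about any single outcome.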
Discussion and Conclusion
The authors conclude by making two points:
- There is a significant skill component to poker.
- The skill differences among poker players seem to be somewhat similar
to the skill differences among professional golfers.
Consequently, they believe that tournament poker should be judged as a
game of skill. What that should mean for the legality of poker in various
jurisdictions within the United States is a matter for another forum
(and methodology).
I believe that given the data, the authors' conclusions are reasonable.
There are a couple of things, though, that trouble me a little about
the results. First, the difference in the number of individuals in each
data set concerns me a little. In each men's PGA tour event, the entrants,
and the money finishers, are predominantly the same 150 players. According
to the data in the paper, 218 different people received a top 18 finish
at some point during 2005. In the data sample of poker tournaments
there were 899 individuals who achieved a top 18 finish, and I expect that
tens of thousands of people entered these events. I'm not sure
what this will do to statistical samples where we're measuring correlations
between a player's performance in one event vs. another.
Second, and my main concern, is that all poker players who make it into
the "sub-tournament" of the final 18 places in a poker tournament are
unlikely to do so on even footing. Consider this possibility for a moment:
Imagine that half of the starting poker tournament field were to adopt
a tournament strategy we call "survive at any cost", where their goal
is to maximize the chance that they make the top 18. Imagine that the
other half of the starting field were to adopt a strategy called "first
place or bust", where the goal is to sacrifice equity in order to maximize
the chance that they would be the eventual tournament winner. Also assume
that every player entering the tournament were of equal skill.
If this were the case, then we'd expect a majority of the
players in the top 18 of any tournament to come from the "survive at
any cost" group, although many of them would be short stacks. The few
who came from the "first place or bust" group would more likely be among
the chip leaders. In this case, performance for those who repeat
in the top 18 would be highly correlated from event to event, even if there
were no skill involved. This sort of effect might explain some of the
correlations we're seeing in these data. This possibility requires an
assumption, but not a huge one. There are a large number of poker
tournament players who claim that their goal is maximizing their overall
equity in a tournament, and a large number who have stated that they
play to win every tournament they enter. This philosophical schism
already exists among top-flight tournament poker players.
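The thought experiment is easy to simulate. What follows is a toy model under assumptions I've invented for illustration: 200 equally skilled players, a fixed 12-to-6 split of finalists between the two camps, and an arbitrary "stack bonus" that stands in for the bigger stacks of the "first place or bust" group. None of these numbers come from the paper.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(stack_bonus, n_events=80, seed=1):
    """Correlate each repeat finalist's average previous rank with current rank."""
    rng = random.Random(seed)
    survivors = [f"S{i}" for i in range(100)]  # "survive at any cost"
    gamblers = [f"G{i}" for i in range(100)]   # "first place or bust"
    history = {}  # player -> ranks in earlier final 18s
    prev_avg, current = [], []
    for _ in range(n_events):
        # Assumption: survivors reach the final 18 twice as often as gamblers.
        finalists = rng.sample(survivors, 12) + rng.sample(gamblers, 6)
        # Nobody has a skill edge: apart from the stack bonus, scores are noise.
        score = {p: rng.gauss(0, 1) + (stack_bonus if p[0] == "G" else 0.0)
                 for p in finalists}
        ranked = sorted(finalists, key=score.get, reverse=True)
        for rank, player in enumerate(ranked, start=1):
            if player in history:
                past = history[player]
                prev_avg.append(sum(past) / len(past))
                current.append(rank)
        # Record this event's ranks only after the comparisons are made.
        for rank, player in enumerate(ranked, start=1):
            history.setdefault(player, []).append(rank)
    return pearson(prev_avg, current)

print(f"with strategy split:    {simulate(stack_bonus=1.5):.2f}")
print(f"without (bonus = 0):    {simulate(stack_bonus=0.0):.2f}")
```

With the bonus switched on, previous rank correlates with current rank even though no player is better than any other; with the bonus at zero the correlation should hover near zero. That doesn't show the effect exists in the real data, only that strategy differences alone could produce this kind of signal.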
Of course, to the extent that it might exist, this potential effect is
very difficult to measure without knowing the complete list of who bought
in to each tournament. Also, even if it, or some similar effect, were
present, that still wouldn't mean that there isn't a measurable skill
component to tournament poker, nor would it mean that this is the
effect the authors are measuring.
There is a rich source of data where all the entries for poker tournaments
are collected, and that's in the online arena. I'd really like it if
one of the big online poker tournament sites could turn over their
tournament registration and rank data for some long time period, properly
anonymized, of course, to someone with the experience and interest to do
a strong analysis of the data. This may be able to settle this question
once and for all. Until then, this paper is a worthy study that goes
further toward answering this question than I had suspected possible with
publicly accessible data sets.
I'd like to thank Dr. Rachel Croson and Dr. Devin Pope for taking the
time to answer some of my questions about their research.