Commentary on Poker Superstars: Skill or Luck?


by Nick Christenson

This article originally appeared in the August, 2009 issue of TwoPlusTwo Magazine.

Introduction

A poker player's results during a limited time interval, such as a cash game session or single tournament, will be partially due to skill and partially due to luck. Over short time intervals, we expect results to be dominated by luck. Over time intervals that are sufficiently long, we expect skill to dominate results. For tournament poker, how many results do we need before we can be confident in saying that the poker players who have had the best results are the best tournament players?

I've talked to several people who have tried to settle this question over the years, and each time their efforts have come to naught. However, a 2008 issue of the magazine Chance (vol. 21, no. 4) contains an article providing evidence that yes, indeed, we can demonstrate that poker tournaments are skill-based. The article is titled "Poker Superstars: Skill or Luck?", and it is written by Rachel Croson, Peter Fishman, and Devin Pope. Disclaimer: I know Dr. Croson personally and was asked to review an early copy of the article.

Beginning

The article begins with a discussion of the political issues surrounding the legalization of poker in the United States. This examination will be familiar to almost all serious students of poker, but the crux is that much of the debate concerning how to treat poker seems centered on whether poker is a game of chance or skill. If it can be demonstrated that poker, specifically tournament poker, is a game of skill, then that may pave the way for additional legalization. The authors intend to compare tournament poker results to those of professional golf, a game/sport widely considered to be skill-based. If the correlation between player performance from tournament to tournament is similar for the largest poker tournaments and professional golf, then poker should be treated as a game of skill, right? At least, that's the argument.

Data

The authors explain the nature of tournament poker, important to do in Chance, but hardly necessary for my audience. Then they explain their data set. One of the big problems with analyzing poker tournament results is that we almost never get to look at the complete list of who initially bought in, so it's difficult to measure the rates at which people cash and their relative ROIs. To deal with this, the authors just look at the final 18 players of all limit and no-limit Texas hold'em tournaments between 2001 and 2005 with a $3000 or greater buy-in from the WSOP, WPT, and World Poker Open. Thus, they are ignoring everything that happens in each tournament until they get down to the final 18 players, and then they treat the last two tables as if they were their own mini-tournament. There are issues with this that we'll get to later, but it's a clever way to deal with the problem of not having information on complete tournament fields. There were 81 tournaments that met these criteria, and 899 players who made at least one final 18 in these events.

The results of these poker "sub-tournaments" are compared against golfers who finished in the top 18 in at least one of the 48 men's PGA tournaments held during 2005. There were 218 golfers who met this criterion. For either type of contest, if there's skill involved then we'd expect players who score high in several events to be significantly more likely to score high in subsequent events. If this correlation is similar for both tournament poker and golf, then that would be evidence that poker tournament and professional golf results are due to a similar measure of skill.

Methods

Here's where the paper starts to get a little hairy, and the jargon gets more than a little thick. The researchers are using "econometric" methods to compare these two data sets, and if you're not familiar with their terminology, following their methods will be difficult.

First, econometrics is a mathematical toolkit used by economists to statistically analyze data. The method they're actually using in the paper may be familiar to many reading this article. It's called "ordinary least squares" (OLS) and it's a method for finding the best fit for some curve to a data set. Basically, what they're saying in the first paragraph of this section is that they're going to use standard methods to fit a line to their data sets.
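
For the curious, here is what "fitting a line" by OLS amounts to in the simplest case of one explanatory variable. The symbols x_i, y_i, a, and b are my own notation, not the paper's, and the authors' actual regressions may well include more terms than this sketch does.

```latex
\begin{align*}
(\hat{a}, \hat{b}) &= \arg\min_{a,\,b} \sum_{i=1}^{n} \left( y_i - a - b\,x_i \right)^2, \\
\hat{b} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}.
\end{align*}
```

Here y_i would be a player's finish in some event and x_i some measure of that player's earlier results. The slope, b-hat, is the number that tells us whether past results say anything about the current one.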

They spend a lot of time, though, talking about whether the assumptions that make OLS valid really hold here, using words like "heteroskedasticity" that rarely come up in casual conversation. The second paragraph in this section basically says, "We know that the standard OLS assumptions might not hold, and we've taken that into account in our math, and if you don't believe we've done it right, go read this book by Wooldridge before you complain."

They're doing three calculations here with each data set. The first they call "experience", the second they call "finishes", and the third they call "previous rank". They believe that if there are high correlations between players' finishes in one event and in other events, as defined by these terms, that indicates the presence of skill.

Experience measures whether a player who finishes in the top 18 of either a golf or poker tournament has finished in the top 18 of a previous tournament. Finishes measures the number of times a player has previously finished in the top 18. Previous rank is the average rank of players based on all previous tournaments in which they had a result.
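
As a rough illustration of these three measures, here is a minimal Python sketch that computes them from a chronological list of results. The player names, the three tiny "tournaments" (top 3 only, for brevity), and the function name prior_performance are all invented for this example, and the way I handle a player with no history (previous rank left empty) is my own assumption, not something the paper specifies.

```python
from collections import defaultdict

def prior_performance(tournaments):
    """For each final-table appearance, summarize the player's earlier results.

    `tournaments` is a chronological list; each entry lists players in
    finishing order (index 0 = 1st place). Returns one row per appearance.
    """
    history = defaultdict(list)  # player -> ranks in earlier tournaments
    rows = []
    for event_number, finishing_order in enumerate(tournaments, start=1):
        for rank, player in enumerate(finishing_order, start=1):
            past = history[player]
            rows.append({
                "event": event_number,
                "player": player,
                "rank": rank,
                "experience": len(past) > 0,   # any earlier appearance?
                "finishes": len(past),         # how many earlier appearances
                "previous_rank": sum(past) / len(past) if past else None,
            })
        # Record this event only after its rows are built, so a player's
        # current result never counts as part of their own history.
        for rank, player in enumerate(finishing_order, start=1):
            history[player].append(rank)
    return rows

# Hypothetical results, oldest event first.
results = [
    ["Ann", "Bob", "Cal"],
    ["Bob", "Dee", "Ann"],
    ["Ann", "Bob", "Eve"],
]
rows = prior_performance(results)
for row in rows:
    if row["event"] == 3:
        print(row)
```

In the third event, Ann has two earlier finishes (1st and 3rd), so her previous rank is 2.0, while Eve shows up with no experience at all.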

In the fourth paragraph, they discuss the possibility that their model might not be valid and state that they ran their tests a second time using an "ordered probit model", achieving the same results. With OLS, there's a tacit assumption that the outcomes could be any real number; that they don't fit nicely into bins. With these particular data sets, both our events and our outcomes are discrete (nobody enters tournament 3.87, and nobody finishes in 12.184th place), so technically OLS can run into problems. The "ordered" in "ordered probit model" means that the outcomes are treated as ranked categories: only their order matters, not the numerical distance between them. The "probit" part means that each finishing place is modeled as the bin into which an unobserved, normally distributed score falls, rather than as a continuous quantity in its own right. As the authors claim, this doesn't affect the outcome. It's there to say, "Don't complain about this issue, we've already thought of it."

For each of these measures, they compare the players' ranks at the end of the tournament series to their ranks along the way and fit lines to these data sets. This leads to the data presented in the first three rows of Table 2. The number in brackets is the margin of error for the measure in question.

Basically, if there's no skill involved, then a person's performance shouldn't be correlated with their previous performance, and the "coefficients" as they call them in the paper should be within their margins of error of zero. The further they are from zero, or more specifically, the more multiples of their margins of error they are from zero, the stronger the evidence that skill is at work.
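
To make the "multiples of the margin of error" idea concrete, here is a minimal Python sketch that fits a line and reports how many standard errors the slope is from zero. I'm assuming the bracketed numbers in the paper are standard errors of roughly this kind. The data and the function name slope_and_standard_error are made up for illustration and have nothing to do with the values in Table 2.

```python
import math

def slope_and_standard_error(xs, ys):
    """Fit y = a + b*x by least squares; return (b, standard error of b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, standard_error

# Hypothetical data: average previous rank (x) vs. rank in the current event (y).
previous_rank = [2, 4, 6, 8, 10, 12, 14, 16]
current_rank = [5, 3, 9, 6, 12, 8, 15, 11]

slope, standard_error = slope_and_standard_error(previous_rank, current_rank)
ratio = slope / standard_error
print(f"slope = {slope:.3f}, standard error = {standard_error:.3f}")
print(f"the slope is {ratio:.1f} standard errors from zero")
```

With these invented numbers the slope comes out at about three standard errors from zero, which is the sort of distance that makes "no relationship at all" hard to believe.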

Results

For each of the three metrics discussed above, both poker and golf display a correlation number well outside the margin of error from zero. In the "experience" metric, golf shows a considerably higher correlation than poker for showing up in one final 18 and then another. This may be due to the smaller number of overall participants in the golf events. In the "finishes" metric, the two values are within the margins of error of each other, indicating a similar effect. In the "previous rank" metric, poker shows a considerably higher correlation between a player's finish in one tournament and his average finish in the previous events. I may have a theory to explain this that I describe below. Based on these data, we would have to conclude that both tournament golf and tournament poker are games of skill, and at least to a first-order degree, they are similar in the extent to which skill plays a part in determining rankings in an event.

The bottom three rows of the table indicate some statistical information about each of the data samples themselves. Some who are statistically minded may be drawn to the "R-Squared" values and how low they are. One might wonder whether a data set with such low values can be trusted. In this case, the error on each coefficient in the top three rows is the measure of whether there is a skill component or not, and in the case of poker, since each value is between 2.3 and 4 times the error away from zero, that's a pretty strong indicator that the correlation is real. What the low R-squared values really indicate is what a poor predictor the data are of future results for any individual in either type of contest, and that shouldn't be terribly surprising.
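
A short calculation shows how a low R-squared and a clearly nonzero coefficient can coexist. In the simplest case of a line fit with one explanatory variable, the distance of the slope from zero, measured in standard errors, is tied to R-squared and the number of observations n by a standard identity. The numbers plugged in below are hypothetical, chosen only to illustrate the point. They are not taken from Table 2, and the paper's own regressions may not be this simple.

```latex
\[
|t| = \frac{|\hat{b}|}{\mathrm{SE}(\hat{b})} = \sqrt{\frac{R^2\,(n-2)}{1-R^2}},
\qquad
R^2 = 0.01,\ n = 1000
\;\Rightarrow\;
|t| = \sqrt{\frac{0.01 \times 998}{0.99}} \approx 3.18 .
\]
```

So a fit that explains only 1% of the variation in individual results can still put the coefficient about three standard errors from zero, provided there are enough observations.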

Discussion and Conclusion

The authors conclude by making two points:

  1. There is a significant skill component to poker.
  2. The skill differences among poker players seem to be somewhat similar to the skill differences among professional golfers.

Consequently, they believe that tournament poker should be judged as a game of skill. What that should mean for the legality of poker in various jurisdictions within the United States is a matter for another forum (and methodology).

I believe that given the data, the authors' conclusions are reasonable. There are a couple of things, though, that trouble me a little about the results. First, the difference in the number of individuals in each data set concerns me a little. In each men's PGA tour event, the entrants, and the money finishers, are predominantly the same 150 players. According to the data in the paper, 218 different people received a top 18 finish at some point during 2005. In the data sample of poker tournaments there were 899 individuals who achieved a top 18 finish, and I expect that tens of thousands of people entered these events. I'm not sure what this will do to statistical samples where we're measuring correlations between a player's performance in one event vs. another.

Second, and this is my main concern, the poker players who make it into the "sub-tournament" of the final 18 places in a poker tournament are unlikely to do so on an even footing. Consider this possibility for a moment: Imagine that half of the starting poker tournament field were to adopt a tournament strategy we call "survive at any cost", where their goal is to maximize the chance that they make the top 18. Imagine that the other half of the starting field were to adopt a strategy called "first place or bust", where the goal is to sacrifice equity in order to maximize the chance that they would be the eventual tournament winner. Also assume that every player entering the tournament is of equal skill.

If this were the case, then we'd expect the majority of players in the top 18 of any tournament to come from the "survive at any cost" group, although many of them would be short stacks. The few who came from the "first place or bust" group would more likely be among the chip leaders. In this case, the performance of those who repeat in the top 18 would be highly correlated from event to event, even if there were no skill involved. This sort of effect might explain some of the correlations we're seeing in these data. This possibility requires an assumption, but not a huge one. There are a large number of poker tournament players who claim that their goal is maximizing their overall equity in a tournament, and a large number who have stated that they play to win every tournament they enter. This philosophical schism already exists among top-flight tournament poker players.
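
This thought experiment can be turned into a toy simulation. The Python sketch below is only that, a sketch: every number in it (200 players, 80 events, the 0.05 edge for "survive" players in reaching the final 18, the 0.5 edge for "bust" players once there) is an assumption of mine chosen for illustration, not anything measured or taken from the paper. All players have identical skill, and the function names simulate and pearson are my own.

```python
import math
import random

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def simulate(style_matters, n_players=200, n_events=80, final_size=18, seed=1):
    """Correlate each finalist's rank with their average rank in earlier finals.

    If style_matters is True, half the field reaches the final slightly more
    often ("survive") and the other half finishes higher once there ("bust").
    """
    rng = random.Random(seed)
    half = n_players // 2
    survivors = set(range(half)) if style_matters else set()
    busters = set(range(half, n_players)) if style_matters else set()
    history = {p: [] for p in range(n_players)}
    pairs = []  # (average previous rank, current rank)
    for _ in range(n_events):
        # Who reaches the final: pure luck plus a small edge for survivors.
        reach = {p: rng.random() + (0.05 if p in survivors else 0.0)
                 for p in range(n_players)}
        finalists = sorted(reach, key=reach.get, reverse=True)[:final_size]
        # Order of finish: pure luck plus a big-stack edge for busters.
        finish = {p: rng.random() + (0.5 if p in busters else 0.0)
                  for p in finalists}
        ordered = sorted(finish, key=finish.get, reverse=True)
        for rank, p in enumerate(ordered, start=1):
            if history[p]:
                pairs.append((sum(history[p]) / len(history[p]), rank))
            history[p].append(rank)
    return pearson(pairs)

with_styles = simulate(style_matters=True)
no_styles = simulate(style_matters=False)
print(f"two styles, equal skill: r = {with_styles:.2f}")
print(f"one style, equal skill:  r = {no_styles:.2f}")
```

Under these made-up assumptions, the two-style field shows a clearly positive correlation between a finalist's previous average rank and current rank, while the one-style field shows essentially none, even though nobody in either field is more skilled than anybody else. How large such an effect would be in real tournaments depends entirely on numbers I have guessed at here.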

Of course, to the extent that it might exist, this potential effect is very difficult to measure without knowing the complete list of who bought in to each tournament. Also, even if it, or some similar effect, were present, that still wouldn't mean that there isn't a measurable skill component to tournament poker, nor would it mean that this is the effect the authors are measuring.

There is a rich source of data where all the entries for poker tournaments are collected, and that's in the online arena. I'd really like it if one of the big online poker tournament sites could turn over their tournament registration and rank data for some long time period, properly anonymized, of course, to someone with the experience and interest to do a strong analysis of the data. This may be able to settle this question once and for all. Until then, this paper is a worthy study that goes further toward answering this question than I had suspected possible with publicly accessible data sets.

I'd like to thank Dr. Rachel Croson and Dr. Devin Pope for taking the time to answer some of my questions about their research.