LV Revealed
 
 

Commentary on "A Memory-Based Approach to Two-Player Texas Hold'em"

by Nick Christenson

This article was originally published in the January 2011 issue of TwoPlusTwo Magazine.

Background

The paper I'm discussing this time around is titled "A Memory-Based Approach to Two-Player Texas Hold'em". It was written by Jonathan Rubin and Ian Watson from the University of Auckland in New Zealand. The paper was published in the proceedings of AI 2009: Advances in Artificial Intelligence, and is available online at: http://www.cs.auckland.ac.nz/research/gameai/projects/SartreAI09.pdf.

The abstract describes the methodology used in this paper as a "Case-Based Reasoning system". What is that? It's a method by which solutions to problems are obtained by comparing the present situation to previous situations already known (thus the "memory-based approach" referenced in the paper's title) and selecting the most appropriate solution from the program's "experience base". It's straightforward to understand how this might be applied to poker. In fact, all human players with even the tiniest bit of experience incorporate these methods in the way they play the game of poker.

So, the first obvious question to ask regarding programming a poker player is, "Where does this list of previously known solutions come from?" The best place to obtain such an experience base is from the experience of the best poker player(s) one can find. Basically, it's a hand history from winning players.

Introduction

The authors make the well-known but difficult to over-emphasize point that there are really two approaches when it comes to parameterizing winning poker play. The first is to try to compute an optimal or near-optimal strategy. The second is to compute a strategy that achieves maximal expectation against prospective opponents. The authors point out the danger in this second approach, and that is that any exploitive strategy is, itself, non-optimal, so it may be counter-exploited.

The interesting question that the paper doesn't address is to which of these categories the authors' memory-based approach belongs? It's obviously not computing an optimal or near-optimal strategy. On the other hand, it possesses no opponent modeling capability, so it can't be considered exploitive. So what is it? Well, it's a reflection of the way the expert player(s) played from whom the program's experience base is drawn. Which type of strategy can be best adapted to this memory-based approach? Well, with no opponent modeling capability, it would be especially dangerous for a memory-based poker algorithm to try to play exploitively, so this approach will probably work best learning from poker players who are playing something close to optimal poker.

The authors created their program to play two-player limit Texas hold 'em and named it SARTRE. They selected two-player limit Texas hold 'em for three reasons, two obvious, one less so. First, it's the baseline game for most poker researchers approaching the idea of writing poker modeling software for the first time. Second, this choice of game allows poker researchers to measure their progress by playing against other programs, especially through direct competition via the Annual Computer Poker Competition. Third, because the hand histories from these competitions are available, those that come from the best players in previous competitions can be assembled to serve as the experience base for this particular program.

The poker algorithm chosen as the baseline for SARTRE's strategy is the 2008 CPC competition champion, the University of Alberta's "Hyperborean-eq". I suspect that this is probably the set of hands from the best poker player that the researchers had at their disposal.

Texas Hold'em

The second section of the paper covers the rules to the poker game played, two-person limit Texas hold 'em. Again, this information is necessary in the paper, but I won't go over it here since this audience doesn't need to be reminded of this information.

One thing that I will point out is that in this second section, second paragraph, fifth sentence the paper reads: "... in heads-up play both players need to play a lot more hands in order to be profitable."

What the authors mean is that successful heads-up players must play a larger percentage of the hands dealt than they would in a full ring game. They don't mean that a heads-up player needs to play more hands against an opponent to expect to make (or lose) the same amount money as they would in a full ring game. At least I hope that's what they mean.

SARTRE: System Overview

In the third section of this paper the authors describe how SARTRE was constructed. Again, in brief, the program compares the current situation it faces to those in its experience base and picks the one that best represents the current situation. One may ask, by what criteria is the present situation is compared to those in the experience base? These are:

  1. The betting pattern in the hand up until now.
  2. Current hand strength.
  3. Composition of the board.

In my opinion, this is a pretty good list.

The betting pattern is a list of check/call and bet/raise actions from each street for each betting round. Basically, this is represented as a path along a decision tree. This all seems reasonable enough. The authors measure the similarity of a betting path for the current hand to those in the experience base. If they're identical, they set the similarity factor to 1. If they're not identical, they set the similarity factor to 0. So, either the betting pattern for the present situation exactly matches one in the experience base, or it doesn't.

It's reasonable to consider implementation of an algorithm that allows for similarity rather than an exact match, especially if the program is at a point in a hand where an exact match to the corresponding betting pattern doesn't exist in the experience base. The authors understand this, but they elected to not implement it, at least not in this release. They don't say why they made this decision, or whether it was due to time constraints, the complexity of the issue, or problems that manifested themselves in an attempt to implement such a feature.

At this point let us note that position was not included on the list of criteria that defines a given poker situation. It doesn't have to be. Position is implicitly accounted for through criterion 1. While it's certainly the case that SARTRE's strategy does not explicitly vary based on position, if Hyperborean-eq is any good at all, it will have taken position into account when it played the hands from which SARTRE's experience base is derived. Since the betting pattern will always be different in situations where SARTRE starts in the small blind than when it starts in the big blind, these distinctions are already reflected in the data, and therefore they will be represented in the way that SARTRE plays.

As far as hand strength goes, the authors qualitatively mapped a given hand type into the familiar poker hand categories, i.e., pair, two pair, trips, etc.. As I've discussed in previous articles, despite this being the most obvious way to do it, it's simply not a good way of categorizing hands. Distinguishing between a bottom pair and top pair/best kicker is definitely going to be more significant than in differences between hands that rate a full house or better.

Despite the naive categorizations of made hands, SARTRE does consider a fairly sophisticated categorization of drawing hands, including straight, flush, overcards, and whether both hole cards are used to make the draw. It's not clear whether they consider nut straight or flush draws to be superior to other draws, but I'm surprised that they would categorize drawing hands this well yet still use such a weak categorization for made hands.

It doesn't take even a novice player very long to learn that board texture is important, and SARTRE does account for this. For example, in the paper they categorize three consecutive cards as a "straight possible" board. Of course, there are other board configurations where straights are possible, even though there are fewer combinations of cards that produce them. I'm not certain they take this into account.

As was mentioned earlier, the SARTRE experience base is based on the play of Hyperborean-eq from the previous year's CPC. This experience base consists of over one million cases. Of course, this is derived from much less than one million hands. Each poker hand can be broken down into one case per betting decision, with a minimum of one per hand and two per post-flop betting round.

When SARTRE encounters a situation, it examines all cases that exactly match all three criteria listed above: betting pattern, hand strength, and board composition. If more than one case matches these particular criteria, then a response is selected randomly from the matching cases, weighted by how frequently Hyperborean-eq chose that specific response.

Experimental Results

Once the researchers completed work on SARTRE, they tested it against two publicly available bots: Fell Omen 2 and BluffBot. Fell Omen 2 is a bot whose source code was publicly available, but its home web site seems to be offline these days. BluffBot is one of the opponents available to users of Poker Academy Pro software. Poker Academy Pro provides a software interface to allow users to interface a bot they've written with this software. Note, the SARTRE results in this paper are not against the "BluffBot" that was a competitor in several of the Computer Poker Competitions. That was written by a different developer.

SARTRE plays duplicate hands against Fell Omen 2, but this feature isn't available for the matches against BluffBot. The authors note that playing a duplicate match reduces variance in the outcome, because it prevents one player from getting dealt more fortunate situations than the other. This is true, but with a duplicate poker setup there's still a chance that one program will elect to play one or more suck-out hands that its opponent will pass on, and these could skew the results. Playing duplicate poker should reduce the variance, but it's not a complete substitute for very large sample sizes.

In any case, SARTRE gets beat by Fell Omen 2. It's pretty clear Fell Omen 2 is the better player, but check out the variance in the results in each trial, it's huge. I don't believe the trials are long enough for anyone to be truly confident at exactly what Fell Omen 2's expected win rate against SARTRE is, and I notice that the authors don't include a margin for error in that result.

Over 30,000 hands, SARTRE beat BluffBot. Again, I believe we can be confident with this qualitative result, but I don't know what margin for error should be assigned to SARTRE's win rate. Based on SARTRE's rate of loss to Fell Omen 2, the authors project that SARTRE would have been competitive at the 2008 CPC, finishing about the middle of the pack.

The authors compare SARTRE's results to those of the origin of its strategies, Hyperborean-eq. Nobody should be surprised that Hyperborean-eq performs better, as SARTRE is, essentially, an imperfect mimic of Hyperborean-eq's strategy.

One footnote to the results in the paper, SARTRE finished 3rd in the 2010 CPC heads-up limit Texas hold 'em division out of fifteen competitors. It lost out to champion PULPO and second place Hyperborean-TBR. I don't know what improvements have been made to it after this paper was written. I'd expect that it's experience base was updated, and I'd have to believe that at a bare minimum its hand strength evaluator was improved.

Conclusion

To some people, writing a program that does nothing but imperfectly attempt to ape a good poker player's strategy might not seem like a significant accomplishment. However, I believe there is still plenty to learn from this research project.

While mimicking a good opponent may seem like a trivial bot-building strategy, this covers a lot of the way human poker players improve their games. It's useful to see this methodology formally modeled. One thing that's worth noticing is that through the data provided in this paper and the 2008 CPC results, we can project how SARTRE might have performed against its paragon, Hyperborean-eq. It looks to me like Hyperborean-eq would have crushed it. This demonstrates that there can be a huge difference between "understanding" how to play good poker and trying to play the same way as such a player without having such an "understanding" oneself.

It would be interesting to hear what improvements were made to SARTRE before its participation in the 2010 CPC. I'd be interested to know if those areas of focus would be relevant to human players attempting to improve the quality of their games. Perhaps, though, it came through correcting some of the more obvious deficiencies in the version discussed in the paper, such as implementing a weighting for similar but not identical situations from the experience base and providing a better hand strength evaluation function.

Finally, I have to wonder how much better or worse such a program would do if instead of learning from another program, it could form its experience base from the hand histories of the best online heads-up limit hold 'em players? Perhaps at that level such players play so exploitively that a program with no opponent modeling would just have its strategy diluted by contradictory strategies. In any case, it would be fascinating to find out.