Wednesday, 10 October 2018

How to Train a Computer to Bluff

The world's top Go player, Ke Jie, couldn't get past Google's AlphaGo AI.

In 1997, IBM's Deep Blue became the first computer to beat a reigning world chess champion in a match. In 2007, after almost 20 years of computation, checkers was solved. In 2011, IBM's Watson won Jeopardy! In 2016, Google's AlphaGo triumphed over world champion Lee Sedol, only to repeat its success the following year against Ke Jie, the player widely considered the best at the game.

During the past several decades, the field of artificial intelligence (AI) research has produced many "firsts," each more impressive than the last. In 2017, Noam Brown and Tuomas Sandholm of Carnegie Mellon University (CMU) achieved the latest remarkable feat: they programmed an AI called Libratus, which won a No-Limit Texas Hold 'em heads-up tournament against four poker pros. The scientists' work was outlined in a series of papers; here's a summary.

Libratus: To Err is Human and Makes Computers Smarter

Noam Brown (L) and one of Libratus' opponents (R), poker player Daniel McAulay.

CMU's Computer Science Department published a paper titled "Endgame Solving in Large Imperfect-Information Games." The paper, co-authored by Sandholm, opens with: "The leading approach for computing strong game-theoretic strategies in large imperfect-information games is to first solve an abstracted version of the game offline, then perform a table lookup during gameplay."

The first algorithm that the scientists "fed" Libratus, called "counterfactual regret minimization" (CFR), allowed it to create a "blueprint" of the situations a poker player will face during play. But because "a popular variant of two-player no-limit Texas Hold 'em has about 10^165 nodes [of possible outcomes]," the team behind Libratus bundled similar situations together, shrinking the range of possibilities to roughly 10^12. For example, the initial Libratus blueprint didn't consider a king-high flush and a queen-high flush to be separate events requiring independent strategies. That gave the AI an underlying sense of the game upon which the other two algorithms could build.
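To make the idea concrete, here is a minimal sketch of regret matching, the update rule at the heart of counterfactual regret minimization, applied to rock-paper-scissors rather than poker. The payoff matrix and the opponent's mix are illustrative; this is not anything from Libratus itself:

```python
ACTIONS = 3  # rock, paper, scissors

# Payoff for the row player: PAYOFF[my_action][opp_action].
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(regrets):
    """Turn cumulative regrets into a strategy: normalize the positive
    regrets; if none are positive, play uniformly."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS

def train(iterations, opp_strategy):
    """Accumulate regrets against a fixed opponent mix and return the
    average strategy, which converges to a best response."""
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strategy = regret_matching(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strategy[a]
        # Value of each pure action against the opponent's mix.
        action_values = [
            sum(PAYOFF[a][o] * opp_strategy[o] for o in range(ACTIONS))
            for a in range(ACTIONS)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(ACTIONS))
        # Regret = how much better each action would have done.
        for a in range(ACTIONS):
            regrets[a] += action_values[a] - expected
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# Against an opponent who overplays rock, the average strategy
# converges to (almost) always playing paper.
avg_strategy = train(10_000, [0.5, 0.25, 0.25])
```

Run in self-play, with both sides updating regrets this way, the average strategies approach a Nash equilibrium; that is the mechanism CFR scales up to build the blueprint for vastly larger games.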

The second-tier algorithm went into more detail. Using "nested subgame solving" (NSS), Libratus treated each round of poker as a subgame of the bigger game, fine-tuning its immediate play based on the blueprint and the opponent's decisions. And while subgame solving had already been applied successfully in AIs that "conquered" perfect-information games such as chess and Go, using it in poker required some tweaking.

The duo behind Libratus published a paper titled "Safe and Nested Subgame Solving for Imperfect-Information Games," which detailed how they implemented NSS, building on existing models and adapting them to poker. To get more flexible gameplay from Libratus, the researchers kept the trunk of the "game tree" (the initial stages of the game) from the first algorithm and combined it with the second algorithm, which went into finer detail to solve the "endgame" (the last stage of the game). This way, the scientists taught Libratus to find the best play for each round while keeping the long-term strategy in mind. Thus, the AI combined the initially generated blueprint with the developing actual gameplay. A significant implication of Brown and Sandholm's work is that they "show that subgame solving can be repeated as the game progresses down the line, leading to significantly lower exploitability."
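As a toy illustration of that trunk/endgame split, the sketch below keeps only a coarse blueprint value for a subgame and re-solves the finer tree when play actually reaches it. Plain backward induction like this is only valid for perfect-information games, and the tree values and action names are made up; real poker subgames additionally need the safety constraints described in the paper:

```python
def solve(node, maximizing):
    """Backward induction: leaves are numeric payoffs, internal nodes
    map action names to child nodes. Returns (value, best_action)."""
    if isinstance(node, (int, float)):
        return node, None
    values = {a: solve(child, not maximizing)[0] for a, child in node.items()}
    pick = max if maximizing else min
    best = pick(values, key=values.get)
    return values[best], best

# Coarse blueprint: the whole "bet" subgame abstracted to one estimate,
# computed offline.
blueprint = {"bet": 1.5, "check": 1.0}

# Finer endgame tree, consulted only when "bet" is actually reached
# during play (opponent to act, minimizing our payoff).
endgame = {"call": {"raise": 2, "give_up": 0}, "fold": 1}
exact_value, opp_action = solve(endgame, False)  # refines the 1.5 estimate
```

The design point is that the expensive fine-grained solve happens only for the subgame that actually arises, not for every branch the blueprint covers.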

While Libratus' three algorithms worked in sync, the last one is arguably the most exciting part, and the one that kept the AI "awake" at night. The first two algorithms allowed Libratus to construct models and continually refine them based on how the game was progressing. But the third, employing machine learning, instructed Libratus to go over each hand from the past day and learn from its mistakes.

The AI did so by analyzing how its human opponents had managed to outplay it. "Typically, researchers develop algorithms that try to exploit the opponent's weaknesses. In contrast, here the daily improvement is about algorithmically fixing holes in our own strategy," said Sandholm. That made Libratus all but unbeatable: not only could it calculate probabilities far beyond the scope of a human brain, it also learned from its mistakes, a fact that would be proved once again by Libratus' successor a couple of months later.
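A hypothetical sketch of such an overnight loop might log the bet sizes opponents actually used, flag those the current abstraction covers only coarsely, and queue the most frequent offenders for an exact re-solve. Every name, number, and threshold here is illustrative, not Libratus' actual code:

```python
from collections import Counter

# Bet sizes (as fractions of the pot) that the blueprint abstraction
# already handles in detail.
ABSTRACTED_SIZES = [0.5, 1.0, 2.0]

def abstraction_gap(bet):
    """Distance from an observed bet size to the nearest size the
    abstraction covers; large gaps mark potential holes."""
    return min(abs(bet - s) for s in ABSTRACTED_SIZES)

def plan_overnight_solves(observed_bets, threshold=0.2, budget=2):
    """Pick the most frequent off-abstraction bet sizes, up to the
    nightly compute budget, for exact re-solving overnight."""
    gaps = Counter()
    for bet in observed_bets:
        if abstraction_gap(bet) > threshold:
            gaps[bet] += 1
    return [bet for bet, _ in gaps.most_common(budget)]

# Opponents probed with 0.75- and 1.5-pot bets that the blueprint
# only rounds to its nearest covered sizes.
todays_bets = [0.75, 0.75, 1.5, 0.5, 0.75, 1.5, 1.0]
solve_queue = plan_overnight_solves(todays_bets)
```

The key idea matches Sandholm's description: the loop patches the AI's own strategy where opponents actually probed it, rather than modeling the opponents themselves.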

Lengpudashi: The Computer That Bluffs

Entrepreneur and poker player Alan "Yue" Du, who led the team against Libratus' successor, was the first Chinese player to win a WSOP bracelet. Image credits: PokerNews

Libratus proved that computers are no longer rigid machines that require continuous human oversight to do their job. The poker-playing AI managed to perfect its play after receiving only a limited set of data. And because in the field of AI one breakthrough quickly leads to another, a couple of months after the successful run against the poker pros, the team behind Libratus came up with an updated version named Lengpudashi.

Lengpudashi is a refined version of an AI that had already proved unbeatable by humans, so the players hoping to claim a victory for humankind received some perks. This time, the machine faced a single team of six people: a mix of venture capitalists and computer scientists (including Brown and Sandholm) led by entrepreneur and poker player Alan "Yue" Du.

The approach to the game was also different: instead of trying to beat the AI at poker, the team set out to defeat Lengpudashi on scientific grounds by attempting to exploit its vulnerabilities. Despite the combined brainpower of six able minds, Lengpudashi dealt them a crushing defeat, once again proving that computers are undisputed champions of mind games.

What's more, Lengpudashi bluffed, prompting Brown to point out that "people have a misunderstanding of what computers and people are each capable of doing well. A computer can learn from experience that if it has a weak hand and it bluffs, it can make more money." An equally impressive detail is that, per the researchers, Lengpudashi learned to bluff not by studying other players but by "comput[ing] from just the rules of the game." In other words, Lengpudashi figured out that even with a weak hand, acting as if it held a strong one could win it more money by causing its opponents to fold.
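The arithmetic behind that discovery follows from the rules alone: betting a hand that always loses at showdown is profitable whenever the opponent folds often enough. A small expected-value calculation (chip amounts illustrative):

```python
def bluff_ev(pot, bet, fold_prob):
    """Expected chips from betting a hand that always loses at showdown:
    win the pot when the opponent folds, lose the bet when called."""
    return fold_prob * pot - (1 - fold_prob) * bet

# A pot-sized bluff into a 100-chip pot against an opponent who folds
# 60% of the time: 0.6 * 100 - 0.4 * 100 = 20 chips in expectation,
# versus 0 chips for simply giving up the hand.
ev = bluff_ev(pot=100, bet=100, fold_prob=0.6)
```

With a pot-sized bet, the break-even point is a 50% fold rate; any fold frequency above that makes the bluff profitable, regardless of the cards held.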

What does it all mean?

From real-time translation to robotic surgeons, AI is reshaping every industry.

For starters, it does not mean that computers will take over poker rooms and chess boards, at least not yet. Rather, scientists use recreational games as a testing ground for AI because of their complexity and because they draw on capabilities of the human brain long believed to be unachievable by any other species, let alone a machine.

Making chess, poker, Go, or Jeopardy! "unfun" for humans is not the focus of the research. On the contrary: players now widely use computers to hone their skills in these games. And AI proves useful in countless other areas. "Many real-world applications can be modeled as imperfect-information games, such as negotiations, business strategy, security interactions and auctions," point out Brown and Sandholm in their paper titled "Libratus: The Superhuman AI for No-Limit Poker."

Granted, AI is still quite resource-intensive, meaning that only a handful of entities can drive the research further. For example, during its 20-day poker streak, Libratus used 19 million core hours of computation on 600 nodes of the Pittsburgh Supercomputing Center's "Bridges" supercomputer. That's a considerable load, especially compared to Libratus' predecessor, the poker bot Claudico, which used 2-3 million core hours. But this steep increase in resource demands is a testament to how far AI research has come since its infancy only a couple of decades ago.
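For a rough sense of that scale, the reported figures translate into tens of thousands of CPU cores kept busy around the clock:

```python
core_hours = 19_000_000  # reported compute for the 20-day event
days = 20

# Core hours divided by wall-clock hours gives the number of cores
# that would have to run continuously for the whole event.
equivalent_cores = core_hours / (days * 24)  # about 39,583 cores
```

That is far beyond what a hobbyist, or even most labs, can bring to bear, which is exactly the resource barrier the paragraph above describes.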

