What big data can teach us about our favorite sports.
Data analysis is a fact of life in pro and college sports. Teams have statistics gurus on the payroll, fans track every player movement, and corporate owners collect data on the optimal stadium experience. More than ever before, we know which variables matter in determining who wins and who loses. So what if it turns out that the games we love are far less elaborate than everyone makes them out to be? Could the scientific deconstruction of sports end up killing the religion of the game?
If that’s the case, Aaron Clauset could be the man to blame. Clauset, a computer science professor at the University of Colorado, recently analyzed every point scored in every game over a decade of college football, pro football, pro hockey, and pro basketball—more than 1.25 million scoring events across 40,000-plus games. Among his conclusions? “These games look a lot less complicated than most people think.”
This is what Clauset does for a living. As one of the young stars of the big data movement, he’s adept at finding simple patterns amid complicated chaos. With little background in biology, the 34-year-old devised a basic mathematical model that replicated the sizes of 4,002 land-mammal species from the last 2.5 million years. More notably, he parsed the data from the nearly 30,000 terrorist attacks worldwide since 1968 and found that they followed the same pattern as earthquakes. This discovery, which could help predict future attacks, became a chapter in Nate Silver’s best-seller The Signal and the Noise.
Clauset recently turned his attention to team sports, not because he’s a fan (“I’ve never really been a sports person,” he says), but because it seemed like a subject ripe for investigation. “I always took a dim view of data analysis of sports statistics,” he says. “A lot of it tends to focus on numbers about the players or about the teams, with uncertain relevance to game outcomes or game dynamics.”
With the help of then-PhD student Sears Merritt, Clauset worked to change that. Just as he did with mammals and terrorist attacks, the computer scientist found basic patterns in all that scoring data. As he detailed in a paper submitted to the Journal of Quantitative Analysis in Sports, Clauset discovered that scoring rhythms remained remarkably stable throughout hockey, football, and basketball games. At the beginning of a game or period, the scoring rate is relatively slow before rising to a plateau after that initial warm-up phase. At the end of a period, scoring spikes as the opportunity for future points, goals, or touchdowns wanes. This pattern might seem like a no-brainer, but to Clauset these stable tempos suggest that each scoring play is an independent process—that “there is very little correlation between one point and the next.” In other words, he found no evidence at all that hot hands or “momentum” exist in any of these sports. What you might think is a hot streak is just a random sequence of events.
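To see why a run of scores by one team need not signal momentum, consider a toy simulation in which every scoring event is an independent coin flip. This is only an illustrative sketch, not the analysis from Clauset and Merritt's paper: the function names, the 50/50 scoring probability, and the seed are all assumptions made for the example.

```python
import random


def simulate_scorers(n_events, p_team_a=0.5, seed=0):
    """Return a list of 'A'/'B' labels, each drawn independently.

    Assumption: every scoring event is a fresh coin flip, with no
    memory of which team scored before.
    """
    rng = random.Random(seed)
    return ["A" if rng.random() < p_team_a else "B" for _ in range(n_events)]


def longest_run(labels):
    """Length of the longest streak of consecutive identical labels."""
    best = 0
    current = 0
    previous = None
    for label in labels:
        current = current + 1 if label == previous else 1
        best = max(best, current)
        previous = label
    return best


def repeat_rate(labels):
    """Fraction of events scored by the same team as the previous event."""
    if len(labels) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(labels, labels[1:]))
    return repeats / (len(labels) - 1)


if __name__ == "__main__":
    events = simulate_scorers(100)
    print("longest streak:", longest_run(events))
    print("repeat rate:", round(repeat_rate(events), 3))
```

With evenly matched teams, the repeat rate in a long simulated game hovers near 0.5, yet streaks of several consecutive scores by one side still turn up by chance alone.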
Clauset and Merritt found another interesting pattern: While hockey and football teams tend to extend their leads, pro basketball squads play worse when they’re ahead. They’re not the first to notice this pattern. Jonah Berger, a professor at Wharton and the author of Contagious: Why Things Catch On, has argued that this phenomenon suggests losing teams are inspired to play harder. Berger tells me via email that we only see the pattern in basketball because “teams score frequently, and differences in motivation can easily impact whether a scoring event occurs. In hockey, and even football to some extent, scoring occurs less frequently and is more discrete. So even if players were more motivated it would be harder for that motivation to translate into additional scoring events.”
Clauset considers Berger’s “underdog inspiration” theory interesting, but as a data scientist, he wants to see the numbers. “How do you measure motivation?” he asks. “I don’t know.” Clauset thinks the NBA’s “restoring force”—the tendency for teams to lose their leads—might instead be due to player management. In basketball, he theorizes, coaches often pull their best players from the lineup when they’re in the lead, meaning they’re less likely to score. In football, by contrast, coaches rarely substitute in this manner, and there’s so much rotation in hockey that it’s more difficult to orchestrate when the best players shuffle in and out.
For Clauset, though, it wasn’t just about finding these patterns in team sports; it was about testing them. Factoring in these few basic findings, Clauset and Merritt developed a mathematical model that, after observing just a few scoring events, predicted game outcomes for college and pro football, the NHL, and the NBA with surprising accuracy. Their model proved more accurate than the simple metric of looking at who was in the lead at a given time, and it outperformed SportsbookReview.com’s pregame betting odds while more or less matching the accuracy of the live-betting site Bovada. Impressive results, considering Clauset and Merritt spent just three months analyzing the data and coming up with their model.
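The article does not spell out how the model works, so the following is not Clauset and Merritt's method. It is a hedged sketch of the general idea that a current lead plus a simple random scoring process can be turned into a win probability. The assumptions are loud ones: every remaining scoring event is worth one point (closer to hockey than to football or basketball), each event goes to the leading team with a fixed probability `p`, the number of remaining events is known, and a tie counts as half a win. The name `win_probability` is hypothetical.

```python
from math import comb


def win_probability(lead, remaining_events, p=0.5):
    """Chance the currently leading team ends up ahead.

    Assumptions (illustrative only): each remaining scoring event is
    worth one point, goes to the leading team with probability p,
    and is independent of every other event. A final tie counts as
    half a win.
    """
    total = 0.0
    for k in range(remaining_events + 1):
        # k = number of remaining events won by the leading team
        prob = comb(remaining_events, k) * p**k * (1 - p) ** (remaining_events - k)
        final_lead = lead + k - (remaining_events - k)
        if final_lead > 0:
            total += prob
        elif final_lead == 0:
            total += 0.5 * prob
    return total


if __name__ == "__main__":
    print(win_probability(1, 1))   # → 0.75
    print(win_probability(2, 10))
```

Even in this stripped-down form, the estimate depends on both the size of the lead and how much scoring is still to come, which is more information than simply asking who is ahead.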
Clauset believes his big-data discoveries could prove a boon to teams and oddsmakers alike. By looking at scoring patterns across these various sports, it’s possible that a manager could change his strategies to take advantage of the natural flow of the game. But his findings can also be seen as a bit of a downer. After all, Clauset shows that when it comes to scoring dynamics, football, hockey, and basketball are essentially the same game. In all of these sports, he writes, there’s a “strong focus on short-term maximization of scoring opportunities” and “no evidence of strategic planning across plays, as in games like chess or Go. Teams largely react to events as they occur.” Despite all of the time and money spent on developing strategies, perhaps our favorite games are a lot simpler than they appear.
Russell Carleton, a sabermetrician and regular contributor to Baseball Prospectus, concedes that Clauset’s work lends credence to the xkcd comic that suggests team sports are just fancy random number generators. But he’s not worried that big data is going to invalidate all the puzzling over on-field tactics and the other “little data” of sports. “If somebody came along and said, ‘I just need these three things, and my model is 97 percent accurate,’ I don’t see that as a specific threat,” he says. “I would just write an article about the theoretical underpinnings of what is going on there, and I would try to split the atom from there. It’s kind of like a fractal in that way. If you zoom down another level, there is more complexity, and if you zoom down another level after that, there is even more complexity.”
Clauset agrees that there are incredibly complicated forces underpinning his seemingly simple findings. “These teams are working so hard to beat everyone else,” he says. “But in the end, it’s like the Red Queen in Alice in Wonderland—you have to run as hard as you can just to stand still. Only by working so hard and figuring out these strategies can you achieve this system that is like a random coin flip.”
That, to Clauset, is good news for the fans. It means these games are inherently balanced, that all teams have more or less the same advantages. What victories hinge on are the rare chance events—the tragic mistakes or lucky breaks, the stuff that gets stadiums and arenas cheering. “From a fan perspective, that’s the most exciting kind of system,” says Clauset. Or so he thinks, since he’s not a fan himself.
By Joel Warner
Source: slate.com