April 4, 2014

Why Big Data Analysis Can’t Predict March Madness

Big Data is a term on the lips of companies everywhere these days, and it’s even making its way into popular culture. Big Data applications now show up in advertising, political campaigns, entertainment, and sports. Data scientists, strategists, and engineers are finding healthy job prospects in nearly every industry sector and at companies of every size, from small Web startups to multinational enterprises.

From its famous use in building the Oakland Athletics’ roster to March Madness 2014, Big Data analysis is here to stay, particularly in sports. Sports generate millions of statistics each season, and people care about this data for everything from fantasy football to Vegas odds-making.

Statistician and Big Data strategist Nate Silver, who famously and successfully predicted the outcome of the 2012 United States Presidential election in every state, decided to pit his statistical prowess against every matchup in the 2014 March Madness field. Silver’s extremely complex stacked model, developed over the last four years, is based on what is known as predictive analytics.

This type of analysis uses data to find patterns, which are then used to estimate probabilities and predict future outcomes. Silver’s model is called “stacked” because it combines five separately weighted, computer-generated power rankings developed by Ken Pomeroy, Jeff Sagarin, Sonny Moore, Joel Sokol, and ESPN. Silver stacks these five rankings with the Associated Press preseason rankings, the tournament’s seeding, and other factors including travel distance and injuries. Here are his initial picks compared to the current tournament results:
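To make the idea of stacking concrete, here is a minimal sketch of how several power ratings might be blended into one number and turned into a win probability. It is not Silver’s actual model: the team names, ratings, weights, and the logistic scale are all made-up assumptions, and the sketch leaves out the preseason rankings, seeding, travel distance, and injuries mentioned above.

```python
import math

# Hypothetical power ratings from five systems (illustrative numbers only).
RATINGS = {
    "Team A": [20.0, 21.0, 19.0, 22.0, 18.0],
    "Team B": [15.0, 14.0, 16.0, 15.0, 15.0],
}

# Hypothetical weights for the five systems; chosen to sum to 1.
WEIGHTS = [0.3, 0.2, 0.2, 0.2, 0.1]

# Assumed scale controlling how quickly a rating gap becomes a sure win.
SCALE = 10.0


def stacked_rating(ratings, weights):
    """Blend several ratings into one weighted average."""
    return sum(r * w for r, w in zip(ratings, weights))


def win_probability(rating_a, rating_b, scale=SCALE):
    """Map a rating gap to a win probability with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


a = stacked_rating(RATINGS["Team A"], WEIGHTS)
b = stacked_rating(RATINGS["Team B"], WEIGHTS)
p = win_probability(a, b)
print(f"Team A: {a:.1f}, Team B: {b:.1f}, P(Team A wins) = {p:.2f}")
```

The point of the weighting is that no single ranking system gets the final say. Two evenly rated teams come out at exactly 50 percent, and the probability moves toward certainty as the blended gap grows.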


Silver’s Picks

[Image: bracket showing Silver’s initial picks]

Current Bracket Standings

[Image: bracket showing the current tournament standings]

Silver initially gave Florida, Louisville, and Arizona the best chances to win the tournament, but only one of those top three picks has made it to the Final Four, a hit rate of just 33 percent. His next three picks were Virginia, Michigan State, and Kansas, which didn’t pan out that well either. Their dances ended in the Sweet Sixteen, Elite Eight, and round of 32, respectively.

But while Silver’s model may not have performed as well as some members of your office pool, it’s important to remember that if every game is treated as a coin flip, the odds of picking all 63 March Madness games correctly are 1 in 2^63, which is 1 in 9,223,372,036,854,775,808, or about 1 in 9.2 quintillion.

Even the best predictive analysis struggles with such odds.
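The 9.2 quintillion figure can be checked with a few lines of arithmetic. The sketch below assumes 63 independent games and, for the second calculation, a hypothetical forecaster who picks 70 percent of individual games correctly; that 70 percent rate is an illustrative assumption, not a measured number.

```python
from fractions import Fraction

GAMES = 63  # games in the main 64-team single-elimination bracket


def perfect_bracket_probability(p_correct, games=GAMES):
    """Chance of a perfect bracket if each pick is independently
    correct with probability p_correct."""
    return Fraction(p_correct) ** games


# Every game treated as a coin flip.
coin_flip = perfect_bracket_probability(Fraction(1, 2))
print(f"Coin flip: 1 in {coin_flip.denominator:,}")

# Hypothetical forecaster who gets 70% of individual games right.
skilled = perfect_bracket_probability(Fraction(7, 10))
print(f"70% per game: about 1 in {float(1 / skilled):,.0f}")
```

Even under the more generous assumption, the odds of a perfect bracket remain in the billions to one, because small per-game uncertainties multiply across 63 games.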

Warren Buffett’s billion dollars, the prize he backed for anyone submitting a perfect bracket, was essentially safe before the second day of games was even complete, as only 16 potential perfect brackets remained eligible. Silver, who has continued to update his model as the games are played, now gives Florida a 38 percent chance of winning it all, with Wisconsin, Kentucky, and Connecticut following at 31, 19, and 11 percent, respectively. As we go into Final Four weekend, it will be interesting to see whether the odds line up with the results.
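It helps to read those percentages as probabilities rather than picks. A small sketch using only the four numbers quoted above shows how much room they leave for the favorite to lose; the note about rounding is an assumption about why the figures do not add up to exactly 100.

```python
# Title chances quoted above, in percent.
TITLE_CHANCES = {"Florida": 38, "Wisconsin": 31, "Kentucky": 19, "Connecticut": 11}

favorite = max(TITLE_CHANCES, key=TITLE_CHANCES.get)
favorite_loses = 100 - TITLE_CHANCES[favorite]
total = sum(TITLE_CHANCES.values())

print(f"Favorite: {favorite} at {TITLE_CHANCES[favorite]}%")
print(f"Chance the favorite does not win: {favorite_loses}%")
# Assumption: the shortfall from 100% is rounding in the published figures.
print(f"Listed chances sum to {total}%")
```

By the model’s own numbers, the most likely single outcome is still less likely than not.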

UPDATE: Following the Final Four and the 2014 NCAA Men’s Basketball Championship, we took another look at Silver’s predictions. Interestingly enough, both of Silver’s top two picks lost in the Final Four, and the team he rated least likely to win the championship, Connecticut, handily defeated Kentucky, the second-least-likely team. This result shows that while Big Data analysis is a powerful tool, it has limits.

While Big Data analysis may not have helped Nate Silver’s bracket, the continued input of data into his predictive model will only enhance his ability to make future predictions. Companies interested in how business intelligence software can uncover profit/loss opportunities through predictive analytic models should take note. The more data, the better. So don’t wait too long to see how Big Data analysis can help you achieve your operational goals.

Find the best business intelligence software for your company with the TechnologyAdvice product selection tool.
