Friday, February 18, 2011

A Basketball Jones and a Binomial Distribution

85.6% of all people make up their own statistics

I have a basketball jones, an addiction to hoops.  I’ve had it since I was ten.

I got it from dribbling a basketball in the dirt and shooting at a rim with no net. 

I still have it. I’m Tyrone Shoelaces with an AARP card.

I also have a statistics jones. I’ve seen mental health professionals about it. They couldn't explain why someone would be interested in statistics and, completely flummoxed, they finally suggested I just keep it a secret and learn to live with it.

Apparently a lot of people are fascinated with numbers, and not just people who grasp them. Take the Facebook conversation back in January about the excitement over the date 1/11/11.

Someone noted that if you take your age and add it to the year you were born and disregard the first two digits, no matter what your age, it always adds up to 11.

Several people immediately commented, “Wow! That’s cool!”

“Hold on,” I objected. “You’re amazed that in 2011 you can add your age to the year you were born and it totals 2011? Are you going to be excited again next year when they add up to 2012?”

The most perplexing comment, however, came from a guy who insisted, “It doesn’t work for my age. I get 10.”

Maybe he was born in the wrong year.

Ever notice how, around the end of February, a lot of college basketball fans wish they had paid more attention in statistics class?

Ken Pomeroy, a well-known basketball statistician, maintains an enormous amount of statistical information on NCAA Division I basketball teams on his website. He projects winners for every game, and overall conference records for every team, based on cumulative probabilities.

The Kentucky Sports Report tweeted yesterday that UK has an 80% to 90% chance of winning each of its next six conference games, based on Pomeroy’s projections. I’m sure that to most Kentucky Wildcat fans, that means their team should handily win their next six games. But, how likely is that, really?

If a team plays six games with a 90% chance of winning each individual game, the probability of winning all six of those games is nowhere near 90%. You may recall, if you happened to stay awake in statistics class (no easy task, I admit), that the number of wins is binomially distributed, so the probability of winning all six games is 0.9⁶, or only about 53%.

If your team has an 80% chance of winning each of six individual games, the probability of winning all six is 0.8⁶, or only about 26%.
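For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function name `binomial_pmf` is only an illustrative label, and the sketch assumes every game is independent with the same win probability, which a real schedule doesn't quite honor.

```python
from math import comb

def binomial_pmf(wins, games, p):
    """Probability of exactly `wins` wins in `games` independent games,
    each won with probability p."""
    return comb(games, wins) * p**wins * (1 - p)**(games - wins)

# Chance of winning all six games at the two per-game probabilities in the text.
for p in (0.9, 0.8):
    sweep = binomial_pmf(6, 6, p)
    print(f"p = {p:.0%}: chance of winning all six = {sweep:.1%}")
```

Winning all six is just the top end of the binomial distribution, where the formula collapses to p multiplied by itself six times.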

So, a Wildcat fan hears 80% to 90% chance of winning each of the next six games and thinks, “We’ll win ‘em all,” while a statistician looks at the same data and says, “You have somewhere between a 26% and a 53% chance of winning them all. Call it 40%.”
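When the per-game chances differ, the same logic applies: multiply them together. The sketch below uses made-up probabilities in the 80% to 90% range purely for illustration. They are not Pomeroy's actual projections, and `chance_of_sweep` is a hypothetical helper name.

```python
from math import prod

# Hypothetical per-game win probabilities, invented for illustration only.
# They are NOT Pomeroy's actual projections; they just fall between 80% and 90%.
game_probs = [0.90, 0.85, 0.80, 0.88, 0.82, 0.86]

def chance_of_sweep(probs):
    """Probability of winning every game, assuming the games are independent."""
    return prod(probs)

print(f"Chance of winning all {len(game_probs)} games: "
      f"{chance_of_sweep(game_probs):.0%}")
```

With these made-up numbers the answer lands just under 40%, in line with the statistician's rough estimate.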

I would bet that Wildcat fans are disappointed more often than statisticians, but I don't have any statistics to prove that. 

If these probabilities seem low, think about another binomial distribution: tossing a fair coin.  There is a 50% chance that you will toss heads (or tails) on each individual coin toss, but the probability of tossing heads (or tails) six times in a row is a mere one in 64, or about 1.6%.  Those are the odds of a six-game sweep for a team with a 50% probability of winning each game.

The nature of cumulative probabilities is such that even small probabilities of losing individual games "accumulate" to actual losses over a lot of games.

What constitutes a "lot" of games? Not as many as you might think.

Check out the chart below.  It shows the probability of winning all games (running the table) with 1 to 6 games remaining, when the probability of winning each individual game is 90% (blue), 80% (red), or 60% (yellow).
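The same numbers can be tabulated with a few lines of Python. This is a rough sketch, assuming independent games with a constant win probability; `run_the_table` is just an illustrative name.

```python
def run_the_table(p, games):
    """Chance of winning every one of `games` independent games,
    each won with probability p."""
    return p ** games

win_probs = (0.9, 0.8, 0.6)

# Header row, then one row per number of games remaining (1 through 6).
print("games  " + "  ".join(f"{p:>6.0%}" for p in win_probs))
for games in range(1, 7):
    row = "  ".join(f"{run_the_table(p, games):>6.1%}" for p in win_probs)
    print(f"{games:>5}  {row}")
```

Each column shrinks geometrically, which is why the 60% column falls below 5% by the sixth game.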

Long winning streaks are statistically challenging. And if a long winning streak with an 80% chance of winning each game is challenging (26%), it looks darned near impossible as the individual game winning probability tends toward 50%.

A team with six games remaining and an 80% chance of winning each game, as I have said, has only a 26% chance of winning all six. But with the same odds and only three games remaining, that team still has only a 51% probability of winning all three. So, the probability of winning all remaining games that a team is expected to win increases significantly as the season draws to a close, but probably not as fast as you think.

So, what am I saying? That you should’ve stayed awake in statistics class if you wanted to enjoy college basketball?

Nah, I have to look up that stuff every time I use it, too, or use an online calculator once I figure out the correct distribution. I’m just trying to point out how difficult it is to win several games in a row, no matter how likely you are to win each individual game.

There are so few perfect seasons in college basketball, even in years with a dominant team, because there are so many games.  It isn't like football. Play enough games and the small probabilities catch up with you. Ohio State won 24 straight games before losing on Saturday.
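One way to see how quickly the small probabilities catch up is to ask how long a streak has to be before it is less likely than not. The sketch below does that under the usual simplifying assumption of independent games with a fixed win probability. The function name and the 50% threshold are illustrative choices, and the 95% case is a hypothetical dominant team, not a figure for Ohio State.

```python
def games_until_unlikely(p, threshold=0.5):
    """Smallest number of consecutive games at which the chance of winning
    them all drops below `threshold`, assuming independent games each won
    with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    games, streak_prob = 0, 1.0
    while streak_prob >= threshold:
        games += 1
        streak_prob *= p  # one more game that has to be won
    return games

for p in (0.80, 0.90, 0.95):
    print(f"{p:.0%} per game: a perfect run drops below even money "
          f"at {games_until_unlikely(p)} games")
```

Even a team that wins 95% of the time is more likely than not to slip somewhere within its first 14 games.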

That has some serious implications for the NCAA tournament, too, and the recently discussed expansion of the field.

I don’t believe most fans understand how difficult it is to win a 6-game national championship tournament.

The NCAA championship will be won this year by the only team that puts together a 6-game winning streak. Until 1950, the national championship field consisted of just 8 teams, so the champion had to win 3 games in a row. Back in the day, when UCLA won championship after championship, there were only 22 to 32 teams in the NCAA tournament. The champion had to win as many as five games. The champion now has to win six games in a row with a field of 64, ignoring the play-in game.

A strongly favored team, one with a 90% chance of winning each individual game, had a 73% chance of winning a 1940s tournament with three consecutive wins but only a 59% chance of winning 5 in a row in the ’70s. The same team’s chances of going on a 6-game win streak drop to 53%. Expand the tournament to seven games and it will drop to 48%.
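Here is a small sketch of those tournament numbers, along with the flip side: the chance that somebody other than the favorite cuts down the nets. It assumes the favorite has the same 90% chance in every round and that rounds are independent, which is a simplification since opponents get tougher as the bracket narrows. The names `title_odds` and `formats` are illustrative.

```python
P_WIN = 0.9  # the strongly favored team's per-game win probability, from the text

# Consecutive wins the champion needs in each format discussed above.
formats = {
    "3 wins (1940s)": 3,
    "5 wins (1970s)": 5,
    "6 wins (64 teams)": 6,
    "7 wins (expanded)": 7,
}

def title_odds(p, rounds):
    """Return (chance the favorite wins every round, chance someone else
    takes the title), assuming independent rounds won with probability p."""
    favorite = p ** rounds
    return favorite, 1 - favorite

for label, rounds in formats.items():
    favorite, field = title_odds(P_WIN, rounds)
    print(f"{label:<18} favorite {favorite:.0%}, anyone else {field:.0%}")
```

At seven rounds the rest of the field, taken together, becomes the better bet.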

That’s my biggest beef with the talk of expanding the field, but I’m a UK fan and we’re often favored. Had I attended a college that has never won a championship, I’d support the expansion. It expands the field and the probability that the strongest team won’t win.

Your preference would depend on whether you want the NCAA tournament winner to be considered the best team in the country, as most people seem to consider it now, or you want a tournament that lots of teams might win. Larger fields favor the latter.

Before the Vandy game, UK was projected to win all seven of its remaining conference games, but statisticians knew they probably wouldn’t. And they didn’t. They lost their next game in a close one to Vandy.

And that’s what makes basketball fun. No matter what your chances of winning, you still have to play the game. And you can lose a game that you have a 95% probability of winning.

And once in a blue moon, and maybe even once ever, a 6-seed like Jim Valvano’s 1983 NC State team can toss heads six times in a row and win a national championship.

PS Writing a column that includes lots of statistics is challenging, too, but I'm 92% certain that I calculated them correctly.
