Can Nate Silver Be Open Sourced?



Nate Silver is yet another example of data reinventing the world we live in.


The day before the presidential election, Silver’s FiveThirtyEight blog drove 20 percent of the traffic to the New York Times website, according to The New Republic. Some said the methods of this new-age political forecaster were bunk, but people certainly paid attention. And in the end, he was right, predicting the outcome of the presidential race in all 50 states using hard data rather than gut feel.


In 2008, he was nearly as successful, predicting 49 out of 50 states.


No doubt, some will continue to badmouth his methods. The 34-year-old has tested his model on only two presidential elections, and he says only so much about how the model works. What we really need is an open source version of Silver’s methods. As Zeynep Tufekci points out in her opinion piece on Silver, this would allow for peer review and eliminate so much of the controversy around his predictions. It would also let so many others benefit from his methods — not only in the political world but perhaps other areas as well.


It’s understandable that Silver and The Times want to keep the methodology under wraps. Silver’s work is driving valuable traffic to The Times’ website, and if he reveals his methods, the site loses a competitive advantage. In the end, peer review isn’t all that important to The Times. But the peer review problem only gets bigger as publications start to imitate The Times, as they surely will. We’ll have all sorts of secret algorithms competing against each other, and no one will quite know whom to trust.


With Silver and The Times unwilling to reveal the details, the question is whether we could build our own open source version of his methods. Ideally, this would be software that anyone could use. But most importantly, it would allow anyone to review the algorithms.


According to Anthony Goldbloom, the CEO and founder of Kaggle (a San Francisco outfit that seeks to solve data problems by running contests among some of the world’s top data scientists), Silver’s methods are pretty sophisticated. Silver collects public poll data, weights it by historical reliability, and makes various other adjustments based on factors such as momentum and incumbency status. He then combines this data in a regression model and uses the model to simulate 100,000 fake elections, all with an eye toward determining the probability that each candidate will win.
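
To make the pipeline Goldbloom describes a little more concrete, here is a rough Python sketch of its general shape. It is not Silver's model, whose details are unpublished. The poll numbers, reliability weights, and error sizes below are made up for illustration, a plain weighted average stands in for the regression step, and the momentum and incumbency adjustments are left out entirely.

```python
import random

# Hypothetical poll records: (state, candidate A's lead in points, reliability weight).
# These numbers are invented for illustration; they are not real polls.
POLLS = [
    ("Ohio", 2.0, 0.9),
    ("Ohio", 3.5, 0.5),
    ("Florida", -0.5, 0.8),
    ("Florida", 1.0, 0.6),
    ("Virginia", 1.5, 0.7),
]

# Electoral votes for the three states in this toy example.
ELECTORAL_VOTES = {"Ohio": 18, "Florida": 29, "Virginia": 13}


def weighted_margins(polls):
    """Return the reliability-weighted average lead for each state."""
    totals = {}
    for state, margin, weight in polls:
        num, den = totals.get(state, (0.0, 0.0))
        totals[state] = (num + margin * weight, den + weight)
    return {state: num / den for state, (num, den) in totals.items()}


def simulate(margins, votes, votes_to_win, runs=100_000,
             state_sd=3.0, national_sd=2.0, seed=0):
    """Estimate candidate A's win probability by simulating many elections.

    Each run draws one shared national polling error plus an independent
    error per state. The standard deviations are assumed values, not
    figures taken from any published model.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        national_shift = rng.gauss(0.0, national_sd)
        total = 0
        for state, margin in margins.items():
            if margin + national_shift + rng.gauss(0.0, state_sd) > 0:
                total += votes[state]
        if total >= votes_to_win:
            wins += 1
    return wins / runs


if __name__ == "__main__":
    margins = weighted_margins(POLLS)
    # 31 of the 60 electoral votes in this toy map are needed to win.
    probability = simulate(margins, ELECTORAL_VOTES, votes_to_win=31)
    print(f"Estimated win probability for candidate A: {probability:.3f}")
```

The shared national error is the design choice that matters most here: if the polls miss in one state, they tend to miss in the same direction elsewhere, and treating states as independent would overstate the leader's chances. Changing the weights in the poll list shifts the result, which is exactly why the undisclosed weighting draws scrutiny.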


The trouble is that Silver doesn’t reveal how he weights different polls. This was a point of contention among conservative commentators, who thought Silver’s political biases might factor into the weighting.


But his methods aren’t immune to reverse engineering. After all, Silver wasn’t the only quant with freakishly accurate predictions. Princeton University’s Sam Wang and Davidson College’s Josh Putnam proved pretty prescient as well. And we imagine that the data scientist community overseen by Kaggle would have a field day playing with political data.


But again, the idea is not to tie these methods to any one individual. What we really need is Nate Silver software. Earlier this year, Wired looked at Narrative Science, a company that makes software that can write news stories without human intervention. But what would be far more useful is software that could make humans better reporters — i.e. make them more like Nate Silver.


The market for this type of software would extend far beyond news publications. As Wired reported earlier this week, the Obama and Romney campaigns relied heavily on data analysis this election cycle. If the methods used by the campaign quants could be turned into software, open source or not, it could serve candidates across the political landscape, taking a lot of the fat out of campaigns and maybe even saving the world some cash. It wouldn’t be the first purpose-built analytics tool. Look at exPOS, a business analytics system built specifically for restaurants.


And why stop at elections? Silver began with baseball before moving into the political game. There are so many places where the Moneyball ethos has yet to take hold. How about a Nate Silver for the data center game? Clearly, no one quite knows what’s going on there.


What Silver has done, at least to a certain extent, is take the guessing out of political punditry. Dick Morris’ prediction that Romney would win by a landslide looks bad, but it looks even worse beside Silver’s success rate. It’s too early to say whether data-driven analysis will replace traditional punditry or merely supplement it. It’s certainly faster to make off-the-cuff predictions than to wait for the results to come in, and the public may still demand this type of analysis. But like it or not, the Nate Silver effect is very real.


How nice it would be to then shine a light on the army of Dick Morrises practicing in so many other areas of the news world. You can do that with data. And if you share your methods, the light is that much brighter.

