Super Crunchers

Super Crunchers by Ian Ayres

months would be pretty small, but for inaccurate predictions this probability might begin to balloon. A lot of people want to know whether they can really trust a regression prediction. If the prediction is imprecise (say because of poor or incomplete data), the regression itself will be the first one to tell you not to rely on it. When was the last time you heard a traditional expert tell you the precision of his or her estimate?
    And finally, the regression output tells Wal-Mart how precisely it was able to measure the impact of individual parts of the regression equation. Wal-Mart isn’t about to report the results of its regression formula. However, the regression output might tell Wal-Mart that applicants who think “there is room in every corporation for a non-conformist” are likely to work 2.8 months less than people who disagree. The prediction associated with the specific question is 2.8 fewer months, holding everything else about the applicant constant. The regression output can go even further and tell Wal-Mart the chance that “non-conformist” applicants will end up working longer. Depending on the accuracy of the 2.8-month prediction, this probability of a contrary influence might be 2 percent or 40 percent. The regression begins the process of validating itself. It tells you the impact of more rainfall on wine, and whether that particular influence is really valid.
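    To make the 2 percent versus 40 percent contrast concrete, here is a minimal sketch of how the precision of an estimate translates into the chance of a contrary influence. It assumes the estimate is roughly normally distributed around the true effect, and the two standard errors are invented for illustration; only the 2.8-month figure comes from the text. The function names are hypothetical, not part of any regression package.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_of_contrary_effect(estimate: float, std_error: float) -> float:
    """Approximate chance that the true effect has the opposite sign.

    Assumption: the estimate is roughly normally distributed around the
    true effect, so the standard error measures how precise it is.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    return normal_cdf(-abs(estimate) / std_error)


# The -2.8 months is from the text; both standard errors are made up.
for std_error in (1.4, 11.2):
    p = chance_of_contrary_effect(-2.8, std_error)
    print(f"std error {std_error} months: chance of working longer = {p:.0%}")
```

    With a tight standard error of 1.4 months the chance that non-conformists actually work longer is about 2 percent; with a loose one of 11.2 months it climbs to about 40 percent, even though the headline estimate is the same 2.8 months.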
    All the World’s a Mine
    Tera mining of customer records, airline prices, and inventories is peanuts compared to Google’s goal of organizing all the world’s information. Google reportedly has five petabytes of storage capacity. That’s a whopping 5,000 terabytes (or five quadrillion bytes). At first, it may not seem that a search engine really has much to do with data mining. Google makes a concordance of all the words used on the Internet and then if you search for “kumquat,” it simply sends you a list of all the web pages that use that word the most times. Yet Google uses all kinds of Super Crunching to help you find the kumquat pages you really want to see.
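    The naive concordance approach described above can be sketched in a few lines. This is only a toy illustration of "count the word, list the pages that use it most"; the page URLs and text are invented, the function names are hypothetical, and nothing here reflects how Google's index is actually built.

```python
import re
from collections import Counter


def build_concordance(pages: dict[str, str]) -> dict[str, Counter]:
    """Map each word to a Counter of {page_url: times the word appears}."""
    concordance: dict[str, Counter] = {}
    for url, text in pages.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            concordance.setdefault(word, Counter())[url] += 1
    return concordance


def naive_search(concordance: dict[str, Counter], word: str) -> list[str]:
    """Return page URLs ordered by how often they use the word, most first."""
    counts = concordance.get(word.lower(), Counter())
    return [url for url, _ in counts.most_common()]


# Invented sample pages, for illustration only.
pages = {
    "a.example/kumquat-jam": "Kumquat jam: slice each kumquat, then simmer the kumquat peel.",
    "b.example/citrus": "Citrus trees include lemon, lime, and kumquat.",
    "c.example/apples": "Apples keep well in a cool cellar.",
}
concordance = build_concordance(pages)
print(naive_search(concordance, "kumquat"))
```

    The jam page comes first simply because it repeats the word three times, which shows why raw counts alone are a weak guide to the pages a searcher really wants.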
    Google has developed a Personalized Search feature that uses your past search history to further refine what you really have in mind. If Bill Gates and Martha Stewart both Google “blackberry,” Gates is more likely to see web pages about the email device at the top of his results list, while Stewart is more likely to see web pages about the fruit. Google is pushing this personalized data mining into almost every one of its features. Its new web accelerator dramatically speeds up access to the Internet—not by some breakthrough in hardware or software technology—but by predicting what you are going to want to read next. Google’s web accelerator is continually pre-picking web pages from the net. So while you’re reading the first page of an article, it’s already downloading pages two and three. And even before you fire up your browser tomorrow morning, simple data mining helps Google predict what sites you’re going to want to look at (hint: it’s probably the same sites that you look at most days).
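    The "simple data mining" behind that hint can be as plain as counting. The sketch below guesses tomorrow's sites by ranking them by the number of past days on which each was visited. The browsing history and the `predict_sites` helper are hypothetical, and this is not a description of how Google's accelerator actually chooses what to prefetch.

```python
from collections import Counter


def predict_sites(daily_history: list[set[str]], top_n: int = 2) -> list[str]:
    """Rank sites by the number of days on which they were visited."""
    days_seen: Counter = Counter()
    for sites_that_day in daily_history:
        # Each day is a set, so a site counts at most once per day.
        days_seen.update(sites_that_day)
    return [site for site, _ in days_seen.most_common(top_n)]


# Invented three-day browsing history, for illustration only.
daily_history = [
    {"news.example", "mail.example"},
    {"news.example", "mail.example", "weather.example"},
    {"news.example"},
]
print(predict_sites(daily_history))
```

    A prefetcher could start loading the top-ranked sites before the browser opens; a more serious predictor would also weigh recency, time of day, and the page currently being read.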
    Yahoo! and Microsoft are desperately trying to play catch-up in this analytic competition. Google has deservedly become a verb. I’m frankly in awe of how it has improved my life. Nonetheless, we Internet users are fickle friends. The search engine that can best guess what we’re really looking for is likely to win the lion’s share of our traffic. If Microsoft or Yahoo! can figure out how to outcrunch Google, they will very quickly take its place. To the Super Crunching victor go the web traffic spoils.
    Guilt by Association
    The granddaddy of all of Google’s Super Crunching is its vaunted PageRank. Among all the web pages that include the word “kumquat,” Google will rank a page higher if it has more web pages that are linking to it. To Google, every link to a page is a kind of vote for
