Why Our Numbers Are Always Wrong
Our data-driven society requires hard numbers. We take those numbers, plug them into models to create solid plans and execute those plans with ruthless efficiency. If we do it right, things are supposed to go well.
The problem is that our numbers are fantasies, our models are broken and our budgets are farces. We all know it and try to make allowances for it, but the game goes on because, quite frankly, it is the only one we know how to play.
Somewhere along the way we became enamoured of certainty and obsessed with precision in the hope that, if we only built better tools, we could conquer complexity. That effort has failed miserably. There is, however, another way that was abandoned long ago. It has a rich history of solving the thorniest, most uncertain problems. It’s time we returned to it.
The Guessing Game
Sometime in the 1740s, Thomas Bayes, a minister and amateur scholar, had a brilliant idea. He wrote it down, tucked it away and there it stayed until his death. His friend, Richard Price, found it among his papers, refined it and published it in 1763. The theory was later augmented and formalized by Laplace, the greatest mathematician of the age.
The idea, inverse probability, built on Abraham de Moivre’s work in The Doctrine of Chances, which provided rules for predicting future events from present information and became a hit with gamblers. Bayes wanted to reverse the process, to ascertain causes from events. Could we, through observation, determine why things occur?
His solution was to start with a guess. Even if it was far off, there would still be a quantified, working hypothesis that could be adjusted as new information came in. What it lacked in precision, it made up for in common sense, and it proved invaluable in solving problems ranging from hunting German subs in World War II to determining who wrote the Federalist Papers.
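To make "start with a guess" concrete, here is a minimal Python sketch of Bayesian updating over a handful of competing hypotheses. The scenario is hypothetical and not from the article: we are unsure whether a new product converts 2%, 5% or 10% of visitors, start with an even guess, and adjust after each visitor. The helper names `normalize` and `update` are our own.

```python
# Minimal sketch of Bayesian updating over discrete hypotheses.
# Hypothetical scenario: is a product's conversion rate 2%, 5% or 10%?

def normalize(weights):
    """Scale a dict of non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def update(prior, likelihood):
    """Bayes' rule: the posterior is proportional to prior times likelihood."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})

rates = {"low": 0.02, "medium": 0.05, "high": 0.10}

# Start with a guess: equal belief in each hypothesis.
belief = {h: 1 / 3 for h in rates}

# Visitors arrive one at a time: 1 = converted, 0 = did not convert.
observations = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

for converted in observations:
    # How likely this single observation is under each hypothesis.
    likelihood = {h: (r if converted else 1 - r) for h, r in rates.items()}
    belief = update(belief, likelihood)

for h, p in belief.items():
    print(f"{h:>6}: {p:.3f}")
```

With 2 conversions in 10 visits, the 10% hypothesis ends up favoured, but the others are not ruled out, and the next batch of visitors can shift the balance again.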
The idea was controversial even at the time of its inception. There’s just something about guessing that seems unscientific and unprofessional. It was only a matter of time before Bayes and his simple idea fell into disrepute.
Hard Numbers, Soft Facts
The man who would lead the charge against the Bayesian method was the brilliant and famously cantankerous Ronald A. Fisher, who railed against the guessing game. He felt that science was only valuable when it was built on the solid edifice of clear data, and he established many of the methods that you find in standard statistics textbooks today.
The key to his approach was the design of controlled experiments. He vociferously advocated large, randomized samples which could then be analyzed using the Gaussian bell curve. Sample data would be collected and the significance of a result derived mathematically through formal significance tests.
Because it defines probability as the frequency with which outcomes occur across many repeated samples, his method became known as the frequentist approach. Guessing would be replaced by cold, hard facts augmented by complex mathematics (Fisher pioneered the use of modern techniques such as the z test, the t statistic and chi-squared).
By the middle of the 20th century, the frequentist approach had become standard. Controlled experiments would lead to scientifically verifiable conclusions that could be trusted and treated as fact. Or so it was hoped.
The Problem of Uncertainty
To understand the difference between the two approaches, imagine a basketball tryout where a free-throw test is used to measure skill. Under the frequentist approach, a fixed number of trials (say 100) would be required to establish confidence. Under the Bayesian method, confidence increases with each shot and you just take as many as you need.
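The Bayesian side of the tryout can be sketched as a running update of a Beta distribution over the player's free-throw percentage. This is a minimal Python sketch under assumptions of our own, none of which come from the article: a flat Beta(1, 1) starting guess, a hypothetical 75% shooter, and a stopping rule that ends the test once the belief's standard deviation falls below 0.05. The function names are hypothetical.

```python
import math
import random

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def bayesian_tryout(shoot, target_sd=0.05, max_shots=500):
    """Update a Beta belief after every shot; stop once it is precise enough.

    Assumed stopping rule (ours): end when the posterior standard
    deviation drops below target_sd, or after max_shots attempts.
    """
    a, b = 1.0, 1.0  # Beta(1, 1): a flat initial guess about shooting skill
    for shots in range(1, max_shots + 1):
        if shoot():
            a += 1  # a made shot counts as a success
        else:
            b += 1  # a miss counts as a failure
        mean, sd = beta_mean_sd(a, b)
        if sd < target_sd:
            break
    return shots, mean, sd

rng = random.Random(42)
true_skill = 0.75  # hypothetical player; the evaluator never sees this
shots, estimate, sd = bayesian_tryout(lambda: rng.random() < true_skill)
print(f"stopped after {shots} shots: estimate {estimate:.2f} +/- {sd:.2f}")
```

Because the stopping point depends on the evidence, a very strong or very weak shooter is pinned down in fewer attempts than a 50/50 one; with these settings, a player who never misses is done after 17 shots.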
The problem, of course, is that the world is an uncertain place no matter how many Greek-letter equations you affix to a problem. It is extremely difficult, if not impossible, to create controlled experiments that match real-life conditions. In fact, a recent study in the journal Nature reported that a majority of the landmark cancer research studies it examined could not be replicated.
If highly trained scientists working in controlled lab settings can get it so wrong, what does that say about the market research we spend billions of dollars on every year, which is not nearly as tightly controlled or, to be frank, as transparent? What, for that matter, are we supposed to make of business planning based on market research?
This is the basic dilemma of frequentist statistics. We take studies which, if done properly (often a generous assumption), tell us that we can be 95% confident that a result falls within a certain range, and then treat that conclusion as if it were forever settled, never to be questioned or returned to.
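For contrast, the frequentist calculation behind a "95% confident" claim can be as simple as the normal-approximation (Wald) interval for a proportion, sketched below in Python. The survey numbers are hypothetical. Strictly speaking, the 95% describes how often this procedure captures the true value across many repeated samples, not the probability that this particular interval contains it.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical study: 62 of 100 respondents say they would buy the product.
low, high = wald_interval(62, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

On these numbers the interval runs from roughly 0.525 to 0.715. The calculation summarizes one fixed sample; it carries no explicit prior for new evidence to revise.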
What is possibly worse is that the frequentist approach leaves us no avenue for taking an assertion that is clouded in uncertainty and making it more concrete over time, causing us to miss opportunities in the name of “sound evidence.”
Micromotives and Macrobehavior
The economist Thomas Schelling started by thinking about segregation. He asked what would happen if people wanted to live in mixed neighborhoods but preferred not to be outnumbered by people of a different race. As they moved around to satisfy these seemingly reasonable preferences, they would end up with extreme segregation.
Schelling’s key insight was that because our decisions often affect the actions of others and theirs, in turn, affect ours, a small change in preferences can lead to large changes in behavior. Controlled experiments using independent variables fail to account for this kind of feedback loop.
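Schelling’s mechanism is easy to reproduce. Below is a minimal Python sketch of one common variant of his model, with assumptions of our own: two equal groups on a 20 by 20 grid that wraps at the edges, 10% of cells left empty, and agents who are unhappy only when outnumbered (fewer than half of their neighbors share their group). Each round, every unhappy agent moves to a random empty cell. All function names are hypothetical.

```python
import random

def neighbors(grid, r, c):
    """Yield occupants of the 8 surrounding cells (the grid wraps at the edges)."""
    size = len(grid)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                v = grid[(r + dr) % size][(c + dc) % size]
                if v is not None:
                    yield v

def like_share(grid, r, c):
    """Fraction of an agent's occupied neighbors in its own group, or None."""
    near = list(neighbors(grid, r, c))
    if not near:
        return None
    return sum(v == grid[r][c] for v in near) / len(near)

def average_similarity(grid):
    """Mean share of like neighbors over all agents that have neighbors."""
    shares = [like_share(grid, r, c)
              for r in range(len(grid)) for c in range(len(grid))
              if grid[r][c] is not None]
    shares = [s for s in shares if s is not None]
    return sum(shares) / len(shares)

def schelling(size=20, empty_frac=0.1, threshold=0.5, steps=100, seed=0):
    rng = random.Random(seed)
    cells = size * size
    n_each = (cells - int(cells * empty_frac)) // 2
    pool = ["A"] * n_each + ["B"] * n_each + [None] * (cells - 2 * n_each)
    rng.shuffle(pool)  # start from a random, well-mixed arrangement
    grid = [pool[i * size:(i + 1) * size] for i in range(size)]
    start = average_similarity(grid)
    for _ in range(steps):
        # Snapshot who is outnumbered at the start of this round.
        unhappy = []
        for r in range(size):
            for c in range(size):
                if grid[r][c] is not None:
                    s = like_share(grid, r, c)
                    if s is not None and s < threshold:
                        unhappy.append((r, c))
        if not unhappy:
            break  # everyone is content; the pattern has settled
        empties = [(r, c) for r in range(size) for c in range(size)
                   if grid[r][c] is None]
        for r, c in unhappy:
            # Move to a random empty cell; the vacated cell becomes available.
            er, ec = empties.pop(rng.randrange(len(empties)))
            grid[er][ec], grid[r][c] = grid[r][c], None
            empties.append((r, c))
    return start, average_similarity(grid), grid

start, end, grid = schelling()
print(f"average share of like neighbors: {start:.2f} -> {end:.2f}")
```

Even though no agent asks for more than parity, the settled grid typically shows far more clustering than the random start.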
The problem, of course, goes far beyond the makeup of neighborhoods. As I explained in a previous post, reliance on frequentist methods contributed to the recent financial crisis and, as marketing practitioners increasingly rely on similar methods, we have ample reason for concern.
The world is a chaotic place, and we need to account for the fact that anything can happen. Business planning based on the false certainty of “controlled” studies isn’t science; it is pseudoscience. We need to return to a more iterative, less certain model for strategy.
In the world of statistics, Bayes is making a comeback. Noted polling analyst Nate Silver, for one, is a strong advocate and many college textbooks are being revised to put greater emphasis on Bayesian inference. However, business strategy is still largely mired in misleading conclusions driven by confidence intervals.
We need to move to an approach which becomes less wrong over time rather than the present paradigm of false certainty. Merely extrapolating past data is not enough; we need to factor in new data as it becomes available. Bayes’ rule gives us a mathematically viable way to do that.
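For reference, Bayes’ rule and its sequential form are written out below. H is a hypothesis and E a piece of evidence. The sequential version assumes each new observation is conditionally independent of earlier ones given H; that assumption is ours, and it does not hold for every problem.

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad P(E) = \sum_{h} P(E \mid h)\,P(h)

P(H \mid E_1, \dots, E_n) \propto P(E_n \mid H)\,P(H \mid E_1, \dots, E_{n-1})
```

In words, yesterday’s posterior becomes today’s prior, which is what lets an estimate become less wrong as data accumulates.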
Another development is the increased use of agent-based models, which build on Schelling’s work, along with Markov chain simulations and other types of sequential analysis. While these won’t bring us certainty, they will enable us to account for interdependence between variables, uncover new insights and manage a dynamic marketplace.
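As a flavor of these sequential models, the Python sketch below propagates a simple Markov chain of brand switching period by period. The three brands, the switching probabilities and the starting shares are all hypothetical. This plain chain treats each customer’s switching odds as fixed; an agent-based version would let those odds depend on what other customers do, which is where Schelling-style feedback enters.

```python
def step(shares, transitions):
    """One period: new_share[j] = sum over i of shares[i] * transitions[i][j]."""
    n = len(shares)
    return [sum(shares[i] * transitions[i][j] for i in range(n)) for j in range(n)]

# Hypothetical switching probabilities. Rows: this period's brand;
# columns: next period's brand. Each row sums to 1.
transitions = [
    [0.80, 0.15, 0.05],  # customers of brand A
    [0.10, 0.80, 0.10],  # customers of brand B
    [0.10, 0.20, 0.70],  # customers of brand C
]
shares = [0.5, 0.3, 0.2]  # hypothetical starting market shares

history = [shares]
for _ in range(24):
    shares = step(shares, transitions)
    history.append(shares)

for t in (0, 1, 6, 24):
    print(t, [round(s, 3) for s in history[t]])
```

Because every switching probability here is positive, the shares converge toward the same steady state whatever the starting mix; changing the matrix, not the starting point, is what moves the long-run outcome.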
In the end, our numbers will always be wrong. It is our choice whether to believe them blindly or to keep testing and refining them.