It’s The Math, Stupid
Are you data driven? Do you live by the numbers? If you are, then you’re probably wasting an enormous amount of time and energy. Even worse, you’re probably getting a lot wrong.
In the never-ending quest for substantiation and certainty in corporate life, we have used numbers as a panacea. All too often, more numbers are considered better and fewer are considered worse. This is, for lack of a better term, really stupid.
In truth, what we really need is fewer numbers and a whole lot more math. Math is what the ancients invented when they ran out of fingers and toes, because they realized that they needed to start thinking about abstract relationships in order to advance. The good news is that math is much simpler than numbers, more elegant and more likely to be right.
The Tyranny of Averages
Imagine that you’re standing in a room with a group of people, each earning about $50,000 a year, which is a fairly typical income in the US. Then Bill Gates walks in. If you calculate the average the way your grade school teacher showed you, you will conclude that the average income in the room went up by millions. But no one is really any richer, so what gives?
Average is a word that gets thrown around a lot, but most people don’t know what it really means. In school, we were taught to calculate an arithmetic mean by adding up all of the numbers in a set and then dividing by the number of entities. However, the term is often used to denote a median, which is a “middle value.” These are often very different.
A mean can easily be thrown off by extreme values if the data is skewed. A median is what mathematicians call a robust statistic. It doesn’t move around much even when there are extreme values, because it merely sets the point at which 50% of the data points are below and 50% are above. To see what I mean, look at the chart below:
A few unusual values can ruin the whole concept of an average as a central tendency by moving the mean far away from the most common value (i.e. the mode). That’s why many of the statistics we see reported, such as typical incomes or house prices, are actually medians.
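The room example can be checked with a few lines of Python. This is only a sketch: the ten people and the newcomer's income are made-up figures chosen for illustration, not real data.

```python
from statistics import mean, median

# Hypothetical room: ten people, each earning $50,000 (illustrative only).
incomes = [50_000] * 10

# Assumed income for the newcomer; the exact figure is invented for the example.
incomes_with_outlier = incomes + [1_000_000_000]

print("before:", mean(incomes), median(incomes))
print("after: ", mean(incomes_with_outlier), median(incomes_with_outlier))
```

The mean jumps to roughly $91 million while the median stays at $50,000, which is the sense in which the median is robust.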
There is a special case in which the mean, median and mode are all equal. It is known as a “normal” or Gaussian distribution (after Carl Friedrich Gauss, one of the first people to use it effectively), but is often called a “bell curve.” It looks like this:
We tend to see these types of curves when values are shaped by many small, independent random influences. Although true randomness is relatively rare, statisticians often assume errors in data are random (for reasons that will become clear soon) and therefore “average out.”
Accounting For Deviance
In my blissful youth, many people considered me to be a deviant (and to some extent still do), meaning that I very rarely did what was expected of me. In any data set, you can expect to find entities that are a lot like me, ones that refuse to conform to the average. Mathematicians have a way to account for this called standard deviation.
There’s a complicated formula for it, but you can find it pretty easily by simply subtracting each value from the mean, squaring each result (to get rid of negative numbers) and then averaging those squares (i.e. taking their mean) to arrive at the variance. After that, you can just take the square root to arrive at standard deviation.
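Those steps translate almost word for word into code. The sketch below uses a small made-up data set and divides by the number of values, which gives what statisticians call the population standard deviation (a sample version would divide by one less); the function name is my own.

```python
from math import sqrt


def standard_deviation(values):
    """Population standard deviation, following the steps described in the text."""
    m = sum(values) / len(values)                       # the mean
    squared_diffs = [(m - v) ** 2 for v in values]      # subtract, then square
    variance = sum(squared_diffs) / len(squared_diffs)  # average the squares
    return sqrt(variance)                               # square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(standard_deviation(data))  # 2.0
```

Python's built-in `statistics.pstdev` performs the same calculation.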
The chart above shows how useful this value is. If we can assume that the data is normally distributed (meaning errors are random and therefore average out) then roughly 68% of values will fall within 1 standard deviation of the mean, 95% within 2 standard deviations and 99.7% within 3 standard deviations.
It is from this concept that we get the idea of standard error, because we can predict roughly how many values will fall outside a certain confidence interval. For instance, we can be about 95% confident that any particular value will fall within two standard deviations of the mean, about 99.7% confident that it will fall within three, and so on.
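The coverage figures for a normal distribution do not have to be taken on faith. As a minimal sketch, the share of values within k standard deviations of the mean can be computed from the error function in Python's standard library (the helper name `share_within` is my own):

```python
from math import erf, sqrt


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

This gives roughly 68%, 95% and 99.7%, and it holds only if the data really are normally distributed.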
So if the amount of variation within that area of the graph is something we can live with, we can accept the result. Again, there are some complicated formulas that they use to torture kids in school, but as usual there is an easier way. For a survey percentage at the usual 95% confidence level, you can merely divide 1 by the square root of the sample size, like this:
margin of error ≈ 1 / √n
So for a sample size of 100, you can expect a margin of error of about 10%, or +/- 10 percentage points. For some reason, many people consider 100 to be the minimal “proper” sample size, but that’s not really true. You can decide for yourself how much error you’re willing to live with.
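To get a feel for the trade-off, here is a quick sketch of the rule of thumb at a few sample sizes. It assumes a simple random sample and a survey percentage somewhere near 50%, and it reads 1 divided by the square root of n as an approximate 95% margin of error on either side of the result. The function name is my own.

```python
from math import sqrt


def approx_margin_of_error(sample_size):
    """Rule-of-thumb 95% margin of error for a proportion: 1 / sqrt(n)."""
    return 1 / sqrt(sample_size)


for n in (100, 400, 1000, 2500):
    print(f"n = {n:>4}: about +/- {approx_margin_of_error(n):.1%}")
```

Because of the square root, halving the error takes four times the sample, so precision gets expensive quickly.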
Progress and Regress
There was a reason why people spent so much time figuring all this stuff out. Guys like Gauss were trying to understand how planets and other celestial bodies moved around, but they knew that their measurements weren’t very good. The data was usually messy and looked like this:
You can imagine how frustrating it was to try to delve into the mysteries of the universe with messy data, so Gauss came up with a workaround. If he simply assumed that the errors were random, then they would be normally distributed and all of the stuff about standard deviations would apply.
So using the same concepts, he developed the method of least squares, which finds the line with the smallest sum of squared residuals (i.e. the squared errors between the line and the data), effectively minimizing the variance of the errors. We can then tell how well the line fits the data by calculating the R-squared value. This is now known as a regression analysis.
There is a little twist here, because many people think that R-squared and correlation are the same thing. In fact, they’re not. Correlation measures how strongly, and in which direction, two values move together along a straight line, while R-squared is the share of the variation that the model explains. By a convenient quirk, though, correlation is written “r” and, for a straight-line model with a single predictor, R-squared is simply r squared, so you can get from one to the other pretty easily.
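A small worked example may make the relationship concrete. The sketch below fits a least-squares line to five made-up points, then computes R-squared and the correlation coefficient separately. All function names and data are my own, for illustration only.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]               # made-up measurements
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
r = correlation(xs, ys)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.4f} r={r:.4f} r^2={r * r:.4f}")
```

For a straight-line fit with one predictor the last two numbers match. Note that taking the square root of R-squared loses the sign, so the direction of the relationship has to come from the slope.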
Unfortunately, many things don’t follow a straight line, but curve. If they do, we can still use the method of least squares to “fit” a model. However, the correlation coefficient r only measures straight-line relationships, so in that sense there is no such thing as “non-linear correlation.” Some people unfortunately use that term, but they are profoundly mistaken about the basic concepts of data analysis.
One last problem is overfitting, where people make the model curve around just to get a good fit (i.e. a high R-squared value). This is probably the best example of people losing the math in the numbers. Every model should tell a clear story and, if your story is too complicated, chances are your model is wrong no matter how well the numbers work out.
Always, always, use the simplest model that fits the data.
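Here is a toy illustration of why the simplest model tends to win. It compares a straight line with a curve that bends to pass through every single data point (a Lagrange polynomial, which I am using as a stand-in for an overfit model), then asks both to predict one later observation. The data, the later observation and the function names are all invented for the example.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def wiggly_predict(xs, ys, x):
    """Polynomial that passes exactly through every training point (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Made-up data: roughly y = x plus a little noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.2, 0.8, 2.3, 2.7, 4.2, 4.8]
new_x, new_y = 6, 6.1  # hypothetical later observation

slope, intercept = fit_line(train_x, train_y)
line_error = abs(slope * new_x + intercept - new_y)
wiggly_error = abs(wiggly_predict(train_x, train_y, new_x) - new_y)

print(f"straight line misses the new point by {line_error:.2f}")
print(f"perfect-fit curve misses it by {wiggly_error:.2f}")
```

The curve has a perfect R-squared on the original six points, yet it misses the new one by far more than the humble straight line does. The new point sits just past the end of the data, which is where overfit curves tend to misbehave most.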
When Chaos Erupts
You might have noticed by this point that I’ve used the word “assume” a lot. More specifically, everything we’ve discussed to this point assumes that the data are random and independent, meaning that there is no interaction between entities and therefore no feedback.
But what if that assumption isn’t true? What if some people simply liked Justin Bieber because other people like Justin Bieber and that convinced even more people to like him as well? Or what if people tended to buy stock in companies when they were going up, but would sell them when they went down?
The chart above shows what happens: we end up with far more extreme values (also known as outliers) than conventional models would predict. So, for instance, if financial traders were evaluating risk based on the random assumptions of normal distributions, they would badly underestimate volatility and could cause a lot of damage.
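To see how little room the normal model leaves for outliers, the sketch below computes how often it says a value should land more than k standard deviations from the mean. It does not model feedback at all; it only shows the odds that a purely random, normally distributed world would imply. The function name is my own.

```python
from math import erfc, sqrt


def normal_tail_probability(k):
    """Chance a normal value lands more than k standard deviations from the mean (either side)."""
    return erfc(k / sqrt(2))


for k in (3, 4, 5, 6):
    p = normal_tail_probability(k)
    print(f"{k} standard deviations: about 1 in {1 / p:,.0f} observations")
```

Under these assumptions, a move of five standard deviations should turn up less than once in a million observations. When feedback is at work, such events arrive far more often than the table suggests, which is exactly how risk gets underestimated.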
And that’s the problem with numbers. They tell us a lot about normal situations, but very little about extreme ones. After all, it’s the outliers that are really interesting. We’d much rather hear about Justin Bieber than the “average” teenager singing in the shower, just like we are fascinated by companies like Apple, but most firms bore us to death.
Beauty in Patterns
We live in a technological age where computers juggle numbers at the speed of light, far faster than the relatively feeble 200 MPH at which our brains tend to work. They spit out numbers far faster than we can figure out what to do with them. Over-quantification is the chronic disease of the digital age.
We humans do have a secret weapon though. We recognize patterns very well, far better than computers can (at least for the next decade or two anyway). The great mathematician G.H. Hardy put it this way:
A mathematician, like a painter or a poet, is a maker of patterns. If his patterns are more permanent than theirs, it is because they are made with ideas.
So for all the confusion about numbers, math is pretty straightforward. You look for important patterns that tell a good story and you keep that story as simple as possible. We should always strive to explain as much as possible in the fewest possible statements. That is what is meant by mathematical beauty and elegance.
Numbers often lie. Math never does.