The Downside Of Data
Today, we find ourselves in the midst of a data revolution so vast, pervasive—and young—that it’s hard to take it all in. It is likely to lead to a transformation no less consequential than the Industrial Revolution, creating new wealth, prosperity and convenience on a truly massive scale.
Yet like any major change, the data revolution is often misunderstood, and that can lead to serious problems. Just as the Industrial Revolution led many to devalue basic precepts of humanity, an error which led to enormous social strife, the data revolution is leading some to abandon common sense in the name of expediency.
It’s relatively easy to collect information, aggregate it and apply algorithms to derive insights, but much harder to understand where the data comes from, what type of analysis is being applied and what types of error are involved. Insights, even when powered by impressive technology, never come easy. To get full value from data, we must understand its downside.
The Data Revolution Explained
One reason this new era of data is hard to fully grasp is that it looks so much like what we already know. After all, for generations we have analyzed data and applied the resulting insights to make decisions. Notions like “you manage what you measure” and being “data driven” are not products of the digital age, but came long before.
Yet data today is different for several reasons. First is the Internet of Things, which can collect information automatically from a vast array of sensors embedded in just about anything you can think of. Second are new distributed computing frameworks, like Hadoop and Spark, which allow us to aggregate that data and analyze it continuously on thousands of servers strewn across the world.
Finally, with virtually unlimited computing power and increasingly potent artificial intelligence systems like IBM’s Watson, we are able to analyze patterns in unstructured data, like text, voice and even video. So our technology is now connected to the physical world and able to understand it in a way that would have seemed like science fiction even a decade ago.
This creates value for business in myriad ways. For example, UPS has used sensors and data to tell it when its trucks need maintenance, saving the company millions. In one often-repeated story, Target used purchasing data to figure out that a teenage girl was pregnant, and sent her special offers, even before her own father knew about it.
Data’s Achilles Heel
These new data capabilities are impressive and real. However, as Zeynep Ton explains in The Good Jobs Strategy, they don’t tell the whole story. Even the most powerful systems require human input and judgment, which means that a purely technological approach is sure to go awry.
Let’s take another look at the Target example, where purchasing data was used to predict buyers’ intent with amazing accuracy. For that data to be accurate and useful, however, cashiers need to ring up products with the right codes, shelves need to be stocked with the right products and salespeople in the store have to be able to help customers find them.
Yet one study found that 65% of a retailer’s inventory data was inaccurate. In other cases, products are in the store, but not where they are supposed to be. Often, these discrepancies lead to inaccurate assessments of product demand, because customers are looking for products that they can’t find. The world can be a very messy place.
With such sophisticated systems, it seems unbelievable that these kinds of errors would be so pervasive. However, when you treat people as mere data points, pay them poorly, neglect their working conditions and cut back on training to save money, data quality suffers. Is it any wonder that overworked, poorly treated, ill-trained employees make mistakes?
Cognitive scientists call this problem the availability bias, and it is data’s Achilles heel. We tend to give more weight to information that is readily available, such as numbers floating across a computer screen back at headquarters, and less to that which is harder to come by, like the day-to-day realities of store operations.
The Replication Crisis
It’s easy to see how data problems could arise in retail, where workers are often poorly educated and paid minimum wage. High turnover rates and questionable management practices only exacerbate the situation. However, the problem is not confined to low-cost, low-wage retail, but afflicts even the most highly educated, most motivated professionals.
Consider this: In 2012, the prestigious journal Nature reported that a large majority of cancer studies can’t be replicated. In 2010, two Harvard economists published a working paper which warned that US debt was approaching a critical level. As it turned out, they had made a simple Excel error. These are just two examples of a disturbingly common problem.
Many are calling this a replication crisis. Duncan Watts, a Principal Researcher at Microsoft, says, “Journals are heavily biased toward novel findings. There’s very little incentive for a scientist to replicate another scientist’s work. It’s hard to get such work funded and it’s hard to publish it. It’s almost unheard of to build a career on checking other people’s work.”
These errors have important consequences. Invalid medical studies cost lives. The inaccurate data on public debt consumed public discourse and distorted policy. If our most highly trained and educated professionals can make these kinds of mistakes on issues of the highest importance, which data can we trust?
Data Divorced From Context
Another major trend is the use of data in journalism, in which a new breed of reporters, rather than pounding a physical beat and cultivating human sources, immerse themselves in statistical data and policy papers. Sites like FiveThirtyEight and Vox have come to the fore and have had an outsized impact on public discussion.
Yet data is not a panacea, and true understanding requires real world expertise. As I pointed out in an earlier post about data journalism, you need to know what you’re looking at. This particularly struck me in Vox’s early reporting of the Ukraine crisis, in which it used census and polling data to show that Ukraine was a country divided by language and culture.
Ukraine is a bilingual country. Electoral posters are in both languages. Candidates switch from one language to another on political talk shows. The giant banners on government buildings that read “One Country” are in both languages. If you watch a soccer game on television you might notice that the man doing the play by play speaks Ukrainian while the man doing color speaks Russian: almost all Ukrainians understand both and most speak both. If you go to a coffee shop you might find a polite waitress who adjusts to the language she thinks you speak best. No country in Europe is more cosmopolitan than Ukraine in this respect.
Much of this would be immediately apparent to a reasonably observant tourist, but still eluded Vox’s foreign policy analyst as well as many others. The problem is that data divorced from context can be misleading. It only gives you part of what is often a very complex picture. Without real world experience, it’s hard to know what’s relevant and what isn’t.
We Can’t Outsource Our Brains To The Cloud
Marshall McLuhan, in the subtitle to his classic, Understanding Media, called media “extensions of man.” He meant that, while we can only see and hear so far, electronic media allow us to extend our senses across the globe, much like a light bulb, which he also considered a medium, extends our eyes into the dark.
Much the same can be said about data, which some have described as “the plural of anecdote,” because it aggregates otherwise banal experiences into a quantifiable entity. Infused with technology, these experiences take on a new life, first as ones and zeros, then as columns and rows in a database and finally in charts and tables.
Yet with all the wonderful tools that we now have to capture, store and analyze data, we often forget that we need to do our part. The great gift of the digital age is that we can now effectively collaborate with immensely powerful machines and extend our human faculties farther than McLuhan had ever imagined. Still, all that is for naught if we cut ourselves out of the process.
And that’s the downside of data. It makes it all too easy to outsource our brains to the cloud, instead of extending ourselves out into the world. As Richard Feynman put it, “The first principle is that you must not fool yourself and you are the easiest person to fool.”