What is Big Data and why is it actually kind of small?

In the era of Facebook and the all-seeing eye of Google, techies and statisticians have made "Big Data" an actual thing. But is Big Data really that big? Or is a lot of this data still… small?

First, what exactly is "Big Data"? SAS defines Big Data as "a popular term used to describe the exponential growth and availability of data, both structured and unstructured." But most of that growing, available data centers on you, the user.

So far, Big Data has been used primarily to target advertisements at people through what essentially amounts to a gigantic relational database, one that connects your name to your email address, your browsing history, and what you buy. Its most visible products are the ads in Facebook sidebars and the Google AdSense banners on blogs. But when I'm trying to find industry-specific market data from around the globe, Big Data suddenly feels really small.
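To make the "gigantic relational database" idea concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The tables, columns, and sample rows are hypothetical and exist only for illustration. Real ad platforms are far larger and don't necessarily work this way. The sketch shows only how a shared user ID lets separate records about one person be joined into a single profile.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with hypothetical tables about one user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE page_views (user_id INTEGER REFERENCES users(user_id), url TEXT);
        CREATE TABLE purchases (user_id INTEGER REFERENCES users(user_id), item TEXT);
        """
    )
    # Hypothetical sample data: one user, two page views, one purchase.
    conn.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [(1, "coffee-shop-reviews.example"), (1, "running-shoes.example")],
    )
    conn.execute("INSERT INTO purchases VALUES (1, 'espresso beans')")
    conn.commit()
    return conn


def build_profile(conn, user_id):
    """Join every table on user_id to assemble one person's targeting profile.

    Two LEFT JOINs on one-to-many tables multiply rows (page views x purchases),
    so the results are de-duplicated with sets before being returned.
    """
    rows = conn.execute(
        """
        SELECT u.name, u.email, v.url, p.item
        FROM users AS u
        LEFT JOIN page_views AS v ON v.user_id = u.user_id
        LEFT JOIN purchases AS p ON p.user_id = u.user_id
        WHERE u.user_id = ?
        """,
        (user_id,),
    ).fetchall()
    if not rows:
        return None  # no user with this ID
    name, email = rows[0][0], rows[0][1]
    return {
        "name": name,
        "email": email,
        "pages": sorted({r[2] for r in rows if r[2] is not None}),
        "purchases": sorted({r[3] for r in rows if r[3] is not None}),
    }


conn = make_demo_db()
print(build_profile(conn, 1))
```

The single join key is the point. Once every record carries the same ID, adding another kind of data about you is just one more table and one more join.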

I want to know where labor is actually moving without having to rely on predictive economic models that may or may not be accurate. I want to know whether governments are spending more money building (specific types of) cell phone towers or suspension bridges (and bridge data cannot be lumped under "roads" or "general transit"). Unfortunately, Big Data hasn't fixed this problem yet, because Big Data is about as narcissistic as the users it targets. Big Data is actually kind of small because its biggest growth has been in information harvested about people's interests, likes, and favorite coffee shops rather than about the physical world around us: our production, output, and labor (the things we build and create).

It's not as if the data is completely missing or totally inaccessible. The World Bank DataBank has a whole slew of fantastic information about, for example, roads. But while that road data is fairly thorough, nothing in the database distinguishes roads from bridges, which matters because bridges usually cost way more than roads do. Worse, the road data was removed in March 2015 pending a licensing agreement with the International Road Federation. Things like this make Big Data smaller and smaller. Meanwhile, tech giants like Facebook and Google are quick to grow their databases exponentially with your personal information. That isn't Big Data; it's just your data.

There's nothing groundbreaking about Big Data today, because it's essentially just a large dataset useful mostly to marketing agencies and, I suppose, the NSA. It's hardly available to the average user. And the more you value Likes on your Instagram selfie, the more Big Data will keep being engineered for useless things, just as it is today.

Perhaps the only real headway it's made has been in medicine, where Big Data has been slowly making the traditional scientific method obsolete. Instead of forming a hypothesis, collecting data, and then testing the hypothesis against that data, researchers find the information already there, ready to be compiled and analyzed. With Big Data, we can cross-reference lab results, medical records, and health apps to find the most efficient ways to treat a condition.

What about other fields, like engineering? Big Data has also been used in "smart factories" for manufacturing things like missiles, but this isn't quite the Big Data described above. That information is growing, but it's hardly available to anyone without a security clearance or a bottomless wallet.

The United States Bureau of Labor Statistics (BLS) has some fantastic information about American markets and labor mobility, but some of this data is spotty. Excel sheets downloaded from many of the BLS's pages are formatted inconsistently, and it's increasingly difficult to download a single sheet with all the information you want. The same goes for data from the World Bank, OECD, and IMF: it's either incomplete or missing outright.

While too many of us see Big Data as the future knocking on our front door, it's not as big as I had hoped. Until more data is compiled efficiently and completely, it's going to resemble the same Excel sheets we've been sharing with each other for most of the past two decades.