Most discussions of big data focus on how to find the nuggets of valuable information contained in a company’s storage media. If you just apply the right algorithm with the right equipment, you’ll get a return on your investment—right? In some cases, yes, but will your journey into this realm yield big data value, or will it just result in big data hoarding?
The flood of data is growing: from increasingly high-resolution video (along with cameras everywhere) to various gadgets that record all manner of information, companies and consumers are awash in bits. And with the Internet of Things on the horizon, the problem is set to become much worse. Imagine every appliance, electronic device and electrical outlet in your home—along with many other items—fitted with a processor to enable measurements and intercommunication with each other and with the Internet. Now imagine all the additional data this scenario entails.
An EMC-sponsored forecast by IDC predicts that “[f]rom 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020).” Obviously, that data won’t be evenly distributed; some companies (such as cloud providers) will store a larger share of it. Whether all of it can be stored economically is debatable, but assuming traditional storage is inexpensive enough or a new technology emerges to handle more data, another (perhaps greater) problem remains: what to do with it all.
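To get a feel for the scale of the IDC figures quoted above, here is a quick arithmetic check. It uses only the numbers in the quote; the decimal conversion (one exabyte equals one billion gigabytes) and the derived values (implied population, average yearly growth) are this sketch's own assumptions and calculations, not part of the forecast.

```python
# Sanity check of the figures quoted from the IDC forecast.
# Assumption: decimal units, so 1 exabyte = 10**9 gigabytes.
EXABYTE_IN_GB = 10**9

start_eb = 130          # digital universe in 2005, per the quote
end_eb = 40_000         # digital universe in 2020, per the quote
per_person_gb = 5_200   # gigabytes per person in 2020, per the quote
years = 2020 - 2005

# Overall growth factor (the quote rounds this to 300).
growth_factor = end_eb / start_eb

# Total size in gigabytes (the quote says 40 trillion).
total_gb = end_eb * EXABYTE_IN_GB

# Population implied by the per-person figure (derived, not quoted).
implied_population = total_gb / per_person_gb

# Average compound growth per year over the period (derived, not quoted).
annual_growth = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.0f}x")
print(f"total: {total_gb / 10**12:.0f} trillion GB")
print(f"implied population: {implied_population / 10**9:.1f} billion")
print(f"average annual growth: {annual_growth:.1%}")
```

The figures are internally consistent: the growth factor comes out slightly above 300, and the implied compound growth rate is well over 40 percent per year.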
Big Data Pack Rats
If you’ve ever lived in a house for a number of years and then moved, you have probably experienced the shocking realization that you have accumulated lots of junk. In a noble desire to avoid waste, we often accumulate worthless goods on the rationale that “they might come in handy someday.” For many individuals, getting rid of stuff takes a conscious effort. How much easier it is to be a pack rat when it comes to data, which can easily be moved out of sight.
The mantra of big data is that all these ones and zeroes contain information that can yield significant business value, if we can just figure out how to extract it. And therein lies the problem: most of the data contributes little or no value and is just taking up space. For some businesses, the value to be gained from analyzing massive amounts of information is insufficient to justify the costs of implementing a big data analytics system.
Even if a company decides not to pursue big data analytics, its data-storage needs will continue growing unless it implements some approach to eliminating useless data. But how should useful and useless data be differentiated? Companies are thus in a difficult situation: the rising flood of data may or may not be worth storing for analysis (which itself requires investment in a platform to process that data), but in either case, identifying the useful data is difficult.
Fixing Data Hoarding
If you’re not interested in big data analytics, but instead want to simply cut storage costs by eliminating useless data, you must determine a means of differentiating the good from the bad. (Eliminating useless data can also aid big data analytics in many cases.) But this is not a trivial matter. Although older data may tend to be less useful than newer data, simply deleting files older than a certain date is sure to destroy much valuable information. Attempts to differentiate by file type, frequency of use, location in storage, source, size and so on all run into the same problem: any blanket rule sweeps up some valuable data along with the junk. Going through data file by file is a tedious task that is probably uneconomical.
Some tradeoffs could be devised in which, for instance, data of a certain type and older than a certain date is deleted on the assumption that the costs saved outweigh any value lost. Of course, short of quantifying ahead of time the value that is lost, justifying such approaches objectively is difficult. Individuals (consumers and employees) can take some steps to delete information they know to be useless (the 40 draft versions of a report, for instance), but this approach fails to address data created automatically by sensors, monitoring equipment and so on. Unfortunately, identifying and disposing of useless data may seem more costly (or at least more troublesome) than simply paying for additional storage and ignoring the problem. Data hoarding thus has momentum for many companies and individuals.
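The type-plus-age tradeoff described above can be sketched in a few lines. This is a minimal illustration, not a recommended retention policy: the `FileRecord` structure, the sample inventory, the "sensor_log" category, the cutoff date and the storage price per gigabyte are all hypothetical. The sketch only flags candidates and estimates savings; it deletes nothing.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileRecord:
    """One entry in a (hypothetical) storage inventory."""
    path: str
    kind: str            # illustrative category, e.g. "sensor_log" or "report"
    last_modified: date
    size_gb: float


def select_for_deletion(records, kind, cutoff):
    """Return records of the given kind last modified before the cutoff date."""
    return [r for r in records if r.kind == kind and r.last_modified < cutoff]


def monthly_savings(records, cost_per_gb_month):
    """Estimate the monthly storage cost freed by removing the given records."""
    return sum(r.size_gb for r in records) * cost_per_gb_month


# Hypothetical inventory; all values are made up for illustration.
inventory = [
    FileRecord("logs/a.csv", "sensor_log", date(2010, 1, 5), 120.0),
    FileRecord("logs/b.csv", "sensor_log", date(2013, 6, 1), 80.0),
    FileRecord("docs/plan.doc", "report", date(2009, 3, 2), 0.5),
]

# Rule: flag sensor logs untouched since before 2012.
candidates = select_for_deletion(inventory, "sensor_log", date(2012, 1, 1))
savings = monthly_savings(candidates, cost_per_gb_month=0.05)

for record in candidates:
    print("candidate:", record.path)
print(f"estimated monthly savings: {savings:.2f}")
```

Note what the rule cannot tell you: whether the flagged log held anything valuable. The savings side of the tradeoff is easy to compute; the value lost is the part that resists quantification.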
Big Data Analytics: To Analyze or Not To Analyze
Processing large amounts of data is costly: it requires primary and backup storage, capital outlays for storage and processing equipment and software, labor to implement the system and ongoing costs to run everything. Implementing a big data platform should, like any business decision, be justified by a legitimate potential for returns. You need more than just a little bit of valuable insight from your reams of data; you need to cover the platform costs and provide enough of a return to justify focusing on this pursuit instead of something else.
Yes, your massive amounts of data probably have some useful insights to offer if you look long and hard enough, but you might also be better off simply deleting a large portion of that data. By doing so, you save on storage and backup costs—value that must be considered when deciding whether big data analytics has something to offer your company. As Baseline notes, deleting data (rather than storing it all) can also yield cost benefits in the event of litigation.
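The comparison between analyzing and deleting can be made concrete with a rough back-of-the-envelope model. Everything here is an assumption for illustration: the function names, the yearly dollar figures and the choice of baseline (keep all the data and do nothing with it). Real estimates of insight value and of value lost to deletion are far harder to pin down than this sketch suggests.

```python
def net_gain_from_analytics(insight_value, platform_cost):
    """Yearly gain over the do-nothing baseline from running an analytics platform."""
    return insight_value - platform_cost


def net_gain_from_deleting(storage_cost, deleted_fraction, lost_value):
    """Yearly gain over the do-nothing baseline from deleting part of the data.

    storage_cost is the yearly cost of storing and backing up everything;
    deleted_fraction is the share of that data removed (0.0 to 1.0);
    lost_value is the estimated value of whatever useful data was deleted.
    """
    return storage_cost * deleted_fraction - lost_value


# Hypothetical yearly figures, in dollars.
analytics = net_gain_from_analytics(insight_value=500_000, platform_cost=650_000)
deleting = net_gain_from_deleting(
    storage_cost=400_000, deleted_fraction=0.5, lost_value=50_000
)

print(f"analytics vs. baseline: {analytics:+,.0f}")
print(f"deleting vs. baseline:  {deleting:+,.0f}")
```

With these made-up numbers, the analytics platform fails to cover its costs while deletion comes out ahead. Different inputs can easily reverse the result, which is why the decision has to be made company by company.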
Whether big data analytics offers your company a source of valuable insights depends on a number of factors, including your industry, budget and goals. For all companies (with, perhaps, a few exceptions), a campaign against data hoarding can help by reducing storage costs and by enabling a sharper focus for those that choose to pursue big data. Unfortunately, determining what data to delete is problematic, largely because of the sheer amount of it.
The best approach to ending data hoarding, whether you’re a company or an individual, is to accept that you must make some tradeoffs. If you delete data, even if you’re extremely careful and do it by hand (a time-consuming and, arguably, wasteful approach), you’ll probably kick yourself one day over something of value that you “should’ve known better than to delete.” Accept that this will happen, but consider also the costs saved in the meantime. By reducing your data load, you cut storage costs, and that value could easily outweigh the tidbits of information you would otherwise have been able to use later.
Big data appeals to the “it might come in handy someday” attitude that so many of us harbor. Unfortunately, this attitude can be costly in numerous ways. Big data analytics is an area that may prove to offer enough value to justify itself, but it is also surrounded by a lot of hype. In the meantime, all your company’s data is soaking up money in storage and backup costs. The question each company must answer is whether the value of maintaining that data—and, perhaps, the value of big data analytics—is enough to justify the costs. Are you in search of big data value, or are you just big data hoarding?