Industry Outlook is a regular Data Center Journal Q&A series that presents expert views on market trends, technologies and other issues relevant to data centers and IT.
This week, Industry Outlook talks with Peter Godman about the explosion of data and how companies can manage it. Peter is cofounder and CTO of universal-scale file-storage company Qumulo, where he uses his expertise in distributed file systems and high-performance distributed systems to guide product development and management.
Industry Outlook: What’s the biggest challenge businesses face today when it comes to data?
Peter Godman: Data is growing at an explosive rate, roughly doubling every two years. It’s the digital currency of the global economy, yet much of it is effectively stuffed under the mattress in legacy storage systems. To deploy mission-critical workflows with breakthrough innovation, data-intensive organizations must be able to unlock the value of their data anywhere, at any time. They need the freedom to store, manage and access their file-based data in any operating environment, at petabyte and global scale.
IO: What major IT changes and advances have taken place over the last 10 years or so that have contributed to this situation?
PG: A global operating model for businesses has created new requirements for scale, including the number of files stored, the ability to manage enormous data footprints in real time, the global distribution of data and the need to take advantage of the cloud. As a result, businesses are looking for technology that can help them move and share file-based workloads between the data center and the cloud. An intelligent file-storage system, one designed to meet the demands of the modern enterprise by scaling both performance and capacity on premises and in the cloud with no hard limits, is essential.
IO: Which industries are the most affected by this data crisis and why?
PG: Data-intensive industries where innovation is the name of the game—such as media and entertainment, scientific computing, telecom, life sciences and medical research, and automotive—are the most affected, as they must work with globally distributed data sets across time zones and locations, and up to billion-file scale.
For example, in media and entertainment, digital animation rendering for a motion picture can generate from hundreds of terabytes to many petabytes. A single film can comprise more than 500 million files and 250 billion pixels, and data-intensive simulations can range from small-scale sequences for video games to billions of data points and multi-gigabyte-per-second throughput requirements.
Likewise, scientific computing and imaging generate tremendous amounts of file data. Whether researchers are involved with 3D medical imaging, electron microscopy or models of natural phenomena, they’re using ever more-complex simulations and images of higher and higher resolution to make their breakthroughs—making the need to handle billions of files while maintaining high performance and gaining insight into the data more critical than ever.
IO: Exponential rates of increase are unsustainable, so eventually something will dampen growth of data storage. Do you see anything on the horizon that could do so—cost, technology limitations, physical space or anything else? About how long do you foresee this tremendous growth continuing?
PG: Storage capacity will continue to double every two years for at least the next 10 to 15 years. My friend Luis Ceze at the University of Washington and others have been working on encoding data in DNA with phenomenal density and longevity. (Imagine an exabyte in a test tube that could be decoded 1,000 years from now at room temperature, and compare that to a tape that stores less than one percent of that amount and degrades after 20 years.) A big challenge we’ll start to face is that the amount of availability delivered relative to capacity is dwindling rapidly. Fifteen years ago, HDDs delivered 1,000 IOPS/TB. Today that number is more like 10–12 IOPS/TB, and ten years from now it will be about 2. So we’ll have a mountain of data but be able to actively process less and less of it. I have a mental image of WALL-E where all the mountains are made of Blu-ray discs.
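The decline Godman describes follows directly from per-drive mechanics: a spinning disk delivers roughly the same random-IOPS budget no matter how large it gets, so IOPS per terabyte falls as capacities grow. A back-of-the-envelope sketch (the drive capacities and per-drive IOPS figures here are illustrative assumptions, not vendor specs):

```python
# Back-of-the-envelope IOPS-per-TB trend for hard drives.
# Assumption: a 7,200 RPM HDD delivers on the order of 100-150
# random IOPS regardless of capacity, so IOPS/TB shrinks as
# capacity grows.

def iops_per_tb(drive_iops: float, capacity_tb: float) -> float:
    """Random IOPS available per terabyte stored on one drive."""
    return drive_iops / capacity_tb

# Illustrative eras (capacities assumed for the sketch):
eras = {
    "~2003, 0.1 TB drive": iops_per_tb(100, 0.1),  # ~1,000 IOPS/TB
    "today, 10 TB drive":  iops_per_tb(120, 10),   # ~12 IOPS/TB
    "~2028, 50 TB drive":  iops_per_tb(100, 50),   # ~2 IOPS/TB
}

for era, value in eras.items():
    print(f"{era}: {value:.0f} IOPS/TB")
```

The per-drive IOPS stays roughly flat while capacity grows two orders of magnitude, which reproduces the 1,000 → ~12 → ~2 progression in the quote.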
The density of storage-class memories must at least double every four years to compensate for Rock’s law, which says that fab costs double every four years! Doing so seems achievable: as with the switch to 3D NAND, there’s opportunity both to stack higher on the die and to resume process shrinks. That said, at the rate we’re going, semiconductor fabs will come with a price tag of $100 billion within 10 years, ensuring that only a select few can afford to own and operate one.
It’s mentally taxing to decide what data to delete, and mistakes are painful. Density increases make it easy to keep everything, so as long as storage continues to become denser, people will keep filling it.
IO: What role does the enterprise storage industry play in helping businesses keep up? What innovations are under way?
PG: To span on-premises and public cloud storage at petabyte scale, we need an entirely new class of enterprise storage, one that allows companies to create a single global data footprint. This new class of enterprise storage will be to traditional data storage what banks are to safes.
IO: What role does the cloud play?
PG: The cloud provides several core benefits to the modern data-driven enterprise. It offers agility: provisioning and releasing resources takes seconds rather than months. It offers access to TPUs, GPUs and other esoteric computation resources on demand. Last, it offers elasticity, allowing the data-driven enterprise to employ enormous amounts of computation for short durations.
IO: Can you provide some picture or estimate of how much data is in the cloud versus on premises? Do you foresee some kind of limit?
PG: This depends quite a bit on how the cloud is defined. If it’s IaaS + PaaS + SaaS, the majority of all data probably lives in the cloud already. The growth of the data is at the edge, though, and the natural limit on how much data can live in the cloud is the speed of light. Peter Levine recently gave a presentation called “Return to the Edge and the End of Cloud Computing” that digs into this issue. Much of the data in the world needs to be separated from decisions by nanoseconds or microseconds—for example, in autonomous vehicles. A nanosecond is a foot, give or take, and five microseconds is a mile. The cloud is just too far away from where most of the decisions are being made. We’re going to be in a place where there are tensions over sovereignty, cost, latency and security for some time. Beyond that, there’s no practical limit on how much data we can store in the cloud.
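The foot-per-nanosecond rule of thumb is just speed-of-light arithmetic; a quick check (using the vacuum speed of light, so this is a best case: signals in fiber travel roughly a third slower):

```python
# Latency-to-distance figures behind "a nanosecond is a foot,
# five microseconds is a mile".

C_M_PER_S = 299_792_458  # speed of light in vacuum, m/s

def light_distance_m(seconds: float) -> float:
    """Distance light covers in vacuum in the given time."""
    return C_M_PER_S * seconds

ns_feet = light_distance_m(1e-9) * 3.28084        # ~0.98 ft
five_us_miles = light_distance_m(5e-6) / 1609.34  # ~0.93 mi

print(f"1 ns  -> {ns_feet:.2f} ft")
print(f"5 us  -> {five_us_miles:.2f} mi")
```

Both rules of thumb come out within a few percent, which is why a data center hundreds of miles away can never serve a microsecond-scale decision loop.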
IO: How big a concern will data ownership and privacy be as companies increasingly rely on the cloud for data storage? Will such concerns have a big effect on the transition, or just a passing one?
PG: Data-sovereignty laws are at once a big challenge for users of the public cloud and also a great opportunity for public clouds. Being able to easily move data to comply with laws, or audit accesses, through a standard API is powerful. For example, although tens of thousands of IT pros may still need to learn data-sovereignty laws, and although the cloud makes it possible to inadvertently violate these laws, the cloud also makes compliance one API away. In short, the cloud should make sovereignty and ownership easier by deduplicating the efforts of many IT teams.
IO: Do you have any data or estimates on the cost of data storage (per gigabyte or terabyte, for example) and, in particular, what the long- and short-term trends are?
PG: On Newegg today I can buy a 10TB HDD for around $360, or $36/TB. For around $850 I can buy a 1.6TB SSD, or about $530/TB. Flash capacity is still roughly an order of magnitude more expensive than HDD, no matter what the whitepapers say (they invariably assume you can compress and deduplicate on flash but, for some unstated reason, not on HDD). LTO7 can be had for less than $100 for 15TB, or $6/TB. So tape is one-sixth the cost of capacity HDD, which is in turn roughly one-tenth the cost of capacity NAND flash. The spread between capacity HDD and NAND flash increased this year.
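The per-terabyte figures are straightforward division over the quoted street prices; a quick sketch (with these exact numbers the flash/HDD ratio works out closer to 15x, which rounds to the order-of-magnitude gap described):

```python
# $/TB arithmetic for the three media, using the street prices
# quoted in the interview (assumptions: spot prices, raw capacity).

def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Dollars per terabyte of raw capacity."""
    return price_usd / capacity_tb

hdd  = cost_per_tb(360, 10)    # $36/TB   (10TB HDD at ~$360)
ssd  = cost_per_tb(850, 1.6)   # ~$531/TB (1.6TB SSD at ~$850)
tape = cost_per_tb(90, 15)     # $6/TB    (LTO7 cartridge under $100)

print(f"HDD:  ${hdd:.0f}/TB")
print(f"SSD:  ${ssd:.0f}/TB")
print(f"Tape: ${tape:.0f}/TB")
print(f"flash/HDD ratio: {ssd / hdd:.1f}x")
print(f"HDD/tape ratio:  {hdd / tape:.1f}x")
```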
A couple of years ago it looked like HDD would just disappear as it was squeezed between flash and tape. Recently, though, HGST announced that it has made microwave-assisted magnetic recording practical; it predicted this technology would allow it to deliver 50% density increases year over year for many years to come. If that prediction turns out to be correct, HDDs will probably be one-tenth the cost of flash on a capacity basis for a decade. This situation places intense pressure on storage systems to continue to offer hybrid solutions that can deliver flash performance but HDD economics.
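Compounding makes the density prediction concrete: 50% annual growth sustained for a decade multiplies capacity nearly sixty-fold. A minimal calculation:

```python
# Compounding the predicted 50% year-over-year areal-density gains:
# after n years, per-platter capacity multiplies by (1 + gain) ** n.

def density_multiplier(annual_gain: float, years: int) -> float:
    """Cumulative capacity multiplier after compounding annual gains."""
    return (1 + annual_gain) ** years

decade = density_multiplier(0.50, 10)  # ~57.7x over ten years
print(f"10 years of 50% gains: {decade:.1f}x")
```

Sustained compounding at that rate is what would keep HDDs an order of magnitude cheaper than flash per terabyte, even as flash density also improves.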
IO: How do you see the enterprise storage and data center industries changing over the next five years?
PG: Over the next five years we’ll see a rationalization of expectations about cloud and on-premises data centers. Companies will move many applications to public clouds, and, yes, they will move some back again. The tough thing about moving applications is moving their data. Storage products that facilitate that movement will thrive, and storage products that obstruct it will lose relevance as organizations leave them behind in the course of an application migration. In the end, storage will divide into broadly used, relevant portable data-management platforms and less relevant storage point solutions.