We seem to be in the midst of another cycle in IT. The push toward outsourcing is waning as new technologies like virtualization and the cloud put more cost-effective control back in the hands of IT leaders. Next-generation thinking around data center design and operations, some of it fueled by born-in-the-cloud companies like Google and Facebook, is also driving new perspectives.
The thinking has advanced, yet the inefficiencies that have plagued IT for decades remain. In fact, many of them have become more acute in the face of continued data growth, increased heterogeneity and complexity, and leaner operations. Whether you’re taking back control of a previously outsourced data center or dealing with a suboptimal current state, it’s important to understand the root cause of these inefficiencies and build a roadmap to mitigate them.
Of course, it would be better to eliminate these inefficiencies entirely. Yet I’d argue that improving efficiency is the wrong target. In a management course in my first year of business school, the professor went to great pains to articulate the difference between efficiency and effectiveness. “Efficiency,” he said, “tends to be process focused and is about doing things better, whereas effectiveness is about doing better things. It’s about the outcome.” In other words, we should be aiming towards a goal of a more effective data center. The continuous improvement efforts around efficiency certainly deliver positive results, but they will not provide the transformation that many IT operations demand.
Let’s unpack some of the primary barriers to achieving a truly effective data center.
Barrier 1—Copy Data Sprawl
It’s a fact that data growth rates continue unabated. But as IDC described in the white paper “The Copy Data Management Market: Market Opportunity and Analysis” (ID #241047), the story behind the story is the growth rate of copy data: separate copies of production data made and kept for different uses. The fact is that we’re spending the majority of our storage budget (hardware, software and operations) on managing copies of our mission-critical data rather than the source data itself. The result: dramatically overbuilt storage infrastructures, orphaned capacity and limited visibility across the environment. This leads to longer provisioning lead times, which bottleneck growth- and innovation-oriented projects. It’s also an economic disaster, as it reduces the return on assets and invested capital.
Getting ahead of copy data sprawl starts with understanding the magnitude of the problem. IDC suggests calculating your copy data ratio (CDR): total data divided by production data, multiplied by 100. A score below 150 is considered optimum; as the number rises, the situation becomes more acute and the need for action increases, and a score over 700 is considered a crisis. Plot this number against the number of separate systems and/or tools in use to support these copies of data, and you get the second dimension of the copy data problem, as well as a roadmap for consolidation. As IDC indicated, a rising crop of new tools and vendors is building solutions to address the copy data problem.
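The CDR arithmetic above is simple enough to sketch directly. Here is a minimal illustration; the assessment thresholds come from the IDC guidance cited above, and the example capacities are illustrative assumptions.

```python
def copy_data_ratio(total_data_tb, production_data_tb):
    """Copy data ratio (CDR): (total data / production data) x 100."""
    return (total_data_tb / production_data_tb) * 100

def cdr_assessment(cdr):
    # Thresholds per the IDC guidance: below 150 is optimum,
    # over 700 is a crisis, anything between calls for action.
    if cdr < 150:
        return "optimum"
    if cdr <= 700:
        return "action needed"
    return "crisis"

# Illustrative example: a 12 TB total footprint against 2 TB of production data.
cdr = copy_data_ratio(12, 2)   # 600.0
print(cdr, cdr_assessment(cdr))
```

At a CDR of 600, every terabyte of production data is carrying five terabytes of copies behind it, which is exactly the overbuilt-infrastructure symptom described above.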
Barrier 2—Application Development
There is tremendous pent-up demand for new applications in the enterprise, from transforming legacy code to enabling new business initiatives. The tools to develop these applications have become easier to use and more powerful. Compute resources, whether local or in the cloud, have also become more available and elastic. Unfortunately, the storage infrastructure underpinning the application development, test, QA and staging environments hasn’t kept pace. Consider a typical workflow in which a developer wants a copy of the production database to work against. Let’s assume that database is 2TB in size. The developer requests a copy from the DBA, who in turn requests storage from the storage team. Given the typical turnaround time, measured in weeks, the DBA requests more than 2TB of storage, just to be on the safe side. The storage team knows that the DBA pads these requests, but it also knows the DBA will likely come back with a different request tomorrow, and the next day. So they provision the storage and wait for the next request.
This workflow takes place over one to two weeks and creates interruptions and downtime in the development schedule. And this workflow repeats itself for the user-acceptance testing team and again for the preproduction/staging environment. The result: delayed projects that could have a direct impact on revenue and/or customer satisfaction. In other words, infrastructure inefficiency translates into lost revenue or unhappy customers.
The roadmap for addressing this bottleneck starts with driving virtualization beyond compute and network into storage and even the underlying data. Virtual data copies are a particularly important point in this value chain. Once the data is virtualized, the time to provision those copies can be reduced from weeks to minutes, or even seconds, depending on the size of the data set. It’s important to ensure that the chosen solution supports true read/write access by development teams in a highly space-efficient manner. Otherwise, you’ve traded the problem of costly delays for another: storage overspending.
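The space efficiency of a virtual data copy comes from copy-on-write semantics: the clone shares the parent image and stores only its own changes. This toy model is purely illustrative (real copy-data platforms do this at the block/storage layer, not in application code), but it shows why a full read/write copy of a 2TB database can consume almost no additional space:

```python
class VirtualCopy:
    """Toy copy-on-write clone: reads fall through to the shared parent
    image; writes allocate space only for the changed blocks."""

    def __init__(self, parent_blocks):
        self.parent = parent_blocks   # shared, read-only base image
        self.delta = {}               # private blocks written by this copy

    def read(self, block_id):
        # A block written by this copy shadows the parent's version.
        return self.delta.get(block_id, self.parent.get(block_id))

    def write(self, block_id, data):
        self.delta[block_id] = data   # only the change consumes space

    @property
    def space_used(self):
        return len(self.delta)        # independent of parent size

# A production image "cloned" for a developer in seconds, not weeks.
production = {i: f"block-{i}" for i in range(1000)}  # stand-in for 2TB
dev_copy = VirtualCopy(production)
dev_copy.write(7, "modified")
# dev_copy has full read/write access to every block, yet space_used is 1.
```

The design point to test for when evaluating vendors is exactly this: true read/write clones whose footprint grows only with the delta, not with the size of the source.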
Barrier 3—Resilience
It seems that every week a new survey is published highlighting the gap between the desired state of resilience and the actual ability to protect, recover and resume operations in a timely fashion. This is not an artifact of poor planning or sloppy operations. Rather, it’s the compounding effect of data growth, environmental complexity and the business’s shrinking tolerance for downtime. Infrastructure sprawl, increased heterogeneity and complexity, and a spectrum of tools designed to serve point needs all lead to a situation in which we must make tradeoffs about which applications get the highest levels of protection, or any at all. Unfortunately, that’s the reality many businesses face today. There are no longer enough hours in the day, appropriate and/or cost-effective technologies, or people on the ground to deliver 100% resilience (zero data loss, instant recovery and geographic protection) for all of their applications.
But given your business, do all of your applications need 100% resilience as defined? It’s vital to take inventory of your current and pending applications and map each one to appropriate service-level objectives. For example, is the transaction volume frequent enough to require mirroring of the data? If you’re running payment processing or electronic exchanges, the answer is probably yes; for an automobile manufacturing operation, perhaps the requirement can be relaxed. Can the business withstand a 24-hour restart of the entire application infrastructure (server, storage, network, user access, support and so on)? Evaluating the tools currently in use against this service-level catalog will then identify gaps and/or overlaps in technologies: opportunities for decommissioning and savings. Ultimately, the technology implemented should support flexible SLAs that can be applied to all applications and host types, both physical and virtual. This is the culmination of the application-defined data center, a vision in which applications orchestrate the underlying infrastructure on their behalf to meet SLAs. Many vendors are approaching parts of this problem under the heading of “software defined,” yet their strategies remain infrastructure-centric rather than application-centric. This will be an important area for innovation and investment over the next decade.
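The catalog-versus-tooling evaluation described above can be expressed as a simple gap analysis. In this sketch the application names, RPO targets and tool capabilities are all illustrative assumptions (and it checks RPO only, for brevity), but the pattern is the point: compare what each application needs against what the deployed tools actually deliver.

```python
# Hypothetical service-level catalog: required recovery point objective
# (RPO) per application, in minutes. All values are illustrative.
CATALOG = {
    "payment-processing": {"rpo_minutes": 0},     # needs mirroring
    "manufacturing-erp":  {"rpo_minutes": 240},   # can be relaxed
    "internal-wiki":      {"rpo_minutes": 1440},  # daily backup is fine
}

# What the currently deployed tools actually deliver per application.
TOOL_CAPABILITY = {
    "payment-processing": {"rpo_minutes": 15},
    "manufacturing-erp":  {"rpo_minutes": 60},
}

def find_gaps_and_overlaps(catalog, capability):
    """Gaps: under-protected or unprotected apps. Overlaps: apps
    protected beyond their target, i.e., decommissioning savings."""
    gaps, overlaps = [], []
    for app, target in catalog.items():
        actual = capability.get(app)
        if actual is None:
            gaps.append(app)                               # no protection
        elif actual["rpo_minutes"] > target["rpo_minutes"]:
            gaps.append(app)                               # under-protected
        elif actual["rpo_minutes"] < target["rpo_minutes"]:
            overlaps.append(app)                           # over-protected
    return gaps, overlaps

print(find_gaps_and_overlaps(CATALOG, TOOL_CAPABILITY))
```

In this example the payment system is under-protected, the wiki has no protection at all, and the ERP system is over-protected, which is exactly the kind of overlap that funds the rest of the program.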
Barrier 4—Energy Consumption
Energy has become an increasingly scarce resource in many cities around the world. One issue: legacy architectures continue to serve mission-critical workloads while consuming more kilowatt-hours than their modern equivalents. Consider that VMware estimates nearly 3,000 kWh are saved per year for every workload virtualized; the continued use of standalone physical servers drives significant inefficiency. Similarly, the energy consumed by a legacy 73GB FC disk drive is 94% higher than that of even a 1TB SATA drive. With drive densities now in the 4TB range, moving workloads off older drive technologies can yield significant energy savings. And although flash drives consume less energy than their disk-based counterparts, they also deliver an order-of-magnitude improvement in performance per watt: a modern flash array in a single rack can easily match the performance of an array that consumes three to four racks or more.
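The virtualization figure above makes for easy back-of-the-envelope math. The 3,000 kWh/year number is the VMware estimate cited in the text; the workload count and electricity rate in this sketch are illustrative assumptions.

```python
# Per the VMware estimate cited above: ~3,000 kWh saved per year
# for every physical workload that is virtualized.
KWH_SAVED_PER_VIRTUALIZED_WORKLOAD = 3000

def annual_virtualization_savings(workloads_virtualized, cost_per_kwh=0.12):
    """Return (kWh saved per year, dollars saved per year).
    The $0.12/kWh rate is an illustrative assumption."""
    kwh = workloads_virtualized * KWH_SAVED_PER_VIRTUALIZED_WORKLOAD
    return kwh, kwh * cost_per_kwh

# Example: virtualizing 200 standalone physical servers.
kwh, dollars = annual_virtualization_savings(200)
# -> 600,000 kWh/year, roughly $72,000/year at $0.12/kWh
```

Even before counting the storage-side savings from retiring low-density drives, a consolidation program of modest size pays a visible energy dividend.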
Transforming energy consumption is a multistep program that ultimately must be institutionalized into how the business measures itself. These programs at their most mature move beyond efficiency into true sustainability initiatives. In terms of roadmap, the first and most important steps are to drive virtualization and the corresponding consolidation in the environment. These themes are consistent across many of the barriers described, yet they pay real dividends, particularly when modernizing infrastructure. Included in this arena are intelligent power management, adaptive cooling, spin-down technologies and even leveraging off-hour compute time. The subject of energy truly spans both facilities and IT domains. The Green Grid has done an outstanding job of defining a maturity model to assist businesses in measuring their current state across both domains and in building a roadmap toward best practice.
Barrier 5—Data Center Operations
To this point, the focus of these barriers has been infrastructure-centric: servers, storage, power and cooling. Perhaps the most important dimension of achieving truly transformational change in the data center rests in people and process (and the automation they imply). Data center operations are where the rubber meets the road. This is where the line of sight between the physical and virtual infrastructure, in pursuit of application and business service levels, is defined and maintained. It’s also where everyday, seemingly pedestrian issues with individual hardware or software components can derail weeks of progress, and where a mountain of data, both system- and people-generated, converges and can easily overwhelm even the most sophisticated operations.
The approach here centers on automation, but it starts with an outcome focus based on process discipline. The key questions to ask are the following: What are the key performance indicators for the business? How do those translate into IT objectives? And what are the optimized processes to meet or exceed those objectives? The automation strategy should flow from that logic, not start merely with the goal of reducing human touch. In terms of tools, it’s about choosing wisely, leveraging standards—RESTful APIs for programmatic access and extension of vendor products—and standard frameworks like ITIL to align and drive processes and skills.
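The "KPIs first, automation second" logic above can be made concrete in a few lines. In this sketch the KPI names, metrics and thresholds are all illustrative assumptions; the point is the direction of flow, from business KPI to IT objective to automated check, rather than automation for its own sake.

```python
# Hypothetical translation table: each business KPI maps to the IT
# objective (metric name and threshold) that supports it.
IT_OBJECTIVES = {
    # business KPI             -> (IT metric,              threshold)
    "order-completion-rate":     ("app_availability_pct",  99.9),
    "support-ticket-resolution": ("backup_success_pct",    99.0),
}

def objectives_breached(measurements):
    """Return the (KPI, metric) pairs whose measured value misses the
    threshold, i.e., where automated remediation should react first."""
    breached = []
    for kpi, (metric, threshold) in IT_OBJECTIVES.items():
        if measurements.get(metric, 0.0) < threshold:
            breached.append((kpi, metric))
    return breached

# Example: backup success has slipped; availability is still on target.
print(objectives_breached({"app_availability_pct": 99.95,
                           "backup_success_pct": 97.5}))
# [('support-ticket-resolution', 'backup_success_pct')]
```

In a real deployment the measurements would arrive via the vendors' RESTful APIs mentioned above, and the breach list would feed an ITIL-aligned remediation process rather than a print statement.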
Overcoming the inefficiencies in the data center can deliver dramatic improvements in time to market, operating and capital costs, return on investment, and quality of service. A continuous improvement approach toward efficiency delivers these gains stepwise over time; a focus on effectiveness can deliver a true transformation. With today’s technologies for virtualizing compute, storage and even the copy data rampant in the enterprise, these projects can be started and completed with minimal disruption to the steady-state infrastructure while providing near-term consolidation benefits and significant reductions in cost and complexity.
Leading article image courtesy of Acoustic Dimensions
About the Author
Brian Reagan is the VP of Product Marketing at Actifio. He was previously CTO of the global Business Continuity and Resiliency Services division at IBM Corporation, responsible for the technology strategy, R&D, solution engineering and application development for all global offerings including cloud services. Before IBM, he was CMO for performance data storage firm Xiotech and also CMO for Arsenal Digital Solutions, which was purchased by IBM in 2008. A technology industry veteran, Brian also held senior-level strategy and marketing roles at EMC Corporation and MCI Telecommunications, and he has spent over two decades in the areas of storage and information management. Brian holds a B.A. from Bennington College and an MBA from George Mason University and is a frequent speaker at industry events and conferences.