The data center is at the core of every agile business: it must provide continuous delivery of data and services at the velocity required to proactively meet the dynamic requirements of the business.
Data center capacity is the aggregate capacity of all data center assets. These assets extend beyond the physical and virtual systems of an IT infrastructure to the equipment that powers the data center and regulates its climate and environment. Capacity is not easy to manufacture, nor does it come cheaply. Today, IT must shift data center capacity management from a “what’s required now” mentality to a far more strategic vision of dynamic capacity allocation. Already forced to balance business demands against increasingly complex data center provisioning strategies, IT is further hampered by a flat or only marginally growing budget.
Under such tight conditions, data centers have no room for capacity mismanagement. Overprovisioning servers is an unacceptable money drain on the organization, and capacity shortages can cause performance degradation or even service disruption. Because business advantage and competitiveness depend so much on both an enhanced technology infrastructure and the optimized capacity to run it, the data center must shift pace and become more agile—ready to anticipate urgent new business demands and embrace new technologies while remaining within its current cost structure.
Achieving data center agility requires optimized capacity based on a unified view of data center performance starting at the individual asset level. Comprehensive asset visibility, analysis and control are the prerequisites for every agile data center.
Data Center Performance Management (DCPM): The Business and Data Center Agility Imperative
Data center performance management (DCPM) takes a unified approach to benchmarking and forecasting the aggregate performance of data center assets against dynamic business needs. Forrester Research analyst Jean-Pierre Garbani makes the case for this unified approach in his 2011 article “If you don’t manage everything, you don’t manage anything,” and his message is clear: failure to monitor one element can lead to the failure of the entire system. Garbani uses the early-generation Citroen 2CV as an analogy to explain why IT should monitor every component of the IT infrastructure. In keeping with its minimalist design, the highly efficient car had no dashboard fuel gauge, only a dipstick for checking the fuel level in the tank, and drivers who forgot to check it were often left stranded. It was “a great means of transportation [that] failed regularly for lack of instrumentation.”
Garbani emphasizes that application performance management (APM) must take a unified approach to managing IT infrastructure—hardware, software, virtual and physical—and ensure that all components perform to keep an application up and running in an optimized fashion.
Data center performance management takes this APM model one step further, measuring not only IT asset utilization and response time but also environmental metrics such as power utilization and peak demand against business objectives. Why? Failure of one underlying component supporting the IT infrastructure (e.g., maxed-out power capacity) can disrupt business-critical applications and processes. Because the data center is at the core of every business-critical application, IT shortens the path to agility by managing data center performance holistically: accurately analyzing, forecasting and planning for system and environmental variables to ensure uptime and available capacity for every ongoing and upcoming project.
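To make the point about a single maxed-out component concrete, here is a minimal Python sketch, assuming each metric, whether IT or environmental, is reduced to a list of recent readings and a hard capacity limit. The `Metric` class, the metric names and the numbers are hypothetical; a DCPM product would source them from its own connectors. The usable headroom of the whole facility is set by whichever resource sits closest to its limit at peak.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    """One monitored quantity, IT or environmental (hypothetical schema)."""
    name: str
    samples: list[float]  # recent readings, in the same unit as capacity
    capacity: float       # hard limit for this resource

    def peak(self) -> float:
        # Plan against peak demand, not the average: an average can look
        # comfortable while short peaks still hit the limit.
        return max(self.samples)

    def headroom_pct(self) -> float:
        return 100.0 * (self.capacity - self.peak()) / self.capacity


def binding_constraint(metrics: list[Metric]) -> Metric:
    """Return the metric with the least headroom at peak.

    The facility is only as healthy as this one resource, whether it is
    CPU on a cluster or power on a circuit.
    """
    return min(metrics, key=lambda m: m.headroom_pct())


metrics = [
    Metric("cluster CPU (cores)", [410, 455, 430], capacity=640),
    Metric("rack row A power (kW)", [38.0, 46.5, 41.2], capacity=48.0),
    Metric("SAN capacity (TB)", [180, 182, 185], capacity=300),
]
worst = binding_constraint(metrics)
print(f"{worst.name}: {worst.headroom_pct():.1f}% headroom at peak")
```

Here compute and storage look comfortable, but the row's power circuit is within about 3% of its limit at peak, so power, not compute, caps what can be deployed next.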
- Visibility. Broad, deep and continuous monitoring unites comprehensive asset visibility with performance trends to give IT both “in the moment” and historical performance intelligence about all the data center resources that support its business-critical applications. Myriad application performance and enterprise management tools exist today, but each provides only a partial view of those resources. What IT needs is a “manager of managers” approach to monitoring, one that integrates physical and virtual systems with any enterprise/IT asset management and building management systems already in place to produce a unified view of the data center.
- Analysis. Continuous performance analysis correlates individual asset KPIs, such as compute utilization, memory, storage, network and power, and compares each against the asset’s capacity limit to determine available headroom; it also tracks historical maximum, minimum and average values for performance comparison. These measurements bring critical insight into each component’s current operating capacity and highlight potential bottlenecks and wasted resources that demand attention. Using DCPM, IT can pinpoint which applications are the major consumers of each resource and drill down into the root causes of resource contention or capacity shortages. IT can also investigate virtual-machine memory allocation, storage oversubscription or peak power demand in the physical environment by analyzing deviations from planned efficiencies. This analysis is a continuous, iterative process that must begin with baseline measurements of workloads, systems and equipment. With a historical perspective on consumption, utilization and costs, IT can analyze variances and act to improve data center performance through accurate demand forecasts and what-if planning.
- Control. Capacity forecasting and what-if planning give IT hands-on control to ensure data center agility. DCPM functions can accurately portray current resource usage and the capacity available for new applications on the basis of each asset’s historical resource consumption and ongoing utilization patterns. In addition, continuous capacity planning enables IT to forecast which physical or virtual assets can effectively serve specific applications or workloads, so that IT can proactively shift workloads on the fly to more suitable assets and better meet the dynamic needs of the business.
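The headroom and historical analysis described above, together with a basic utilization forecast, might look like the following minimal Python sketch. It assumes one reading per day for a single KPI and uses an ordinary least-squares straight line as the forecast; that model is an illustrative assumption, not how any particular DCPM product forecasts, and the function names are hypothetical.

```python
from statistics import mean
from typing import Optional


def summarize(samples: list[float], capacity: float) -> dict:
    """Historical min/max/average and current headroom for one KPI."""
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": mean(samples),
        "headroom": capacity - samples[-1],  # latest reading vs. the limit
    }


def days_until_full(samples: list[float], capacity: float) -> Optional[float]:
    """Fit a least-squares line to daily samples and extrapolate to capacity.

    Returns None when there is too little data or usage is flat or
    shrinking. A straight line is the simplest possible forecast;
    seasonality and step changes (e.g. a new application going live)
    need a richer model.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)  # day index 0 .. n-1
    x_bar, y_bar = (n - 1) / 2, mean(samples)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    # Day index where the fitted line reaches capacity, counted from today.
    return (capacity - intercept) / slope - (n - 1)


usage_gb = [200, 204, 208, 212, 216]  # illustrative daily memory readings
print(summarize(usage_gb, capacity=256))
print(days_until_full(usage_gb, capacity=256))
```

For memory growing from 200 GB to 216 GB over five days against a 256 GB limit, the summary shows a minimum of 200, a maximum of 216, an average of 208 and 40 GB of headroom, and the linear trend reaches the limit in 10 days.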
In addition to forecasting asset utilization and capacity, DCPM provides what-if scenarios to help IT evaluate cost/performance tradeoffs for multiple hardware options and service deployment strategies: virtualization, cloud computing, server consolidation, hardware refresh and so on. Approaching this type of planning manually, using spreadsheets or a hodgepodge of planning tools, requires “an army of elves” to map, track and analyze change and its related impact. With DCPM, IT can automate data center capacity planning by focusing on the important objectives: providing accurate data center capacity forecasts and appropriate data center capacity to meet business needs.
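A what-if comparison of deployment strategies comes down to costing each option over a planning horizon and discarding the options that cannot meet forecast demand. The minimal Python sketch below assumes three illustrative scenarios, a three-year horizon and just two constraints (CPU cores and facility power); every name and figure in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """One what-if option; all figures are illustrative assumptions."""
    name: str
    capex: float         # up-front cost
    annual_opex: float   # power, cooling, licenses, support per year
    cpu_capacity: float  # cores available after the change
    power_kw: float      # facility power the option draws


def total_cost(s: Scenario, years: int) -> float:
    return s.capex + s.annual_opex * years


def best_scenario(
    scenarios: list[Scenario], cpu_demand: float, power_limit_kw: float, years: int = 3
) -> Optional[Scenario]:
    """Cheapest option over the horizon that meets forecast CPU demand
    without exceeding the facility's power budget (None if none qualify)."""
    feasible = [
        s for s in scenarios
        if s.cpu_capacity >= cpu_demand and s.power_kw <= power_limit_kw
    ]
    return min(feasible, key=lambda s: total_cost(s, years), default=None)


options = [
    Scenario("status quo", 0, 120_000, 640, 46),
    Scenario("hardware refresh", 250_000, 70_000, 900, 38),
    Scenario("consolidate + virtualize", 90_000, 85_000, 760, 40),
]
choice = best_scenario(options, cpu_demand=720, power_limit_kw=48)
print(choice.name if choice else "no feasible scenario")
```

With forecast demand of 720 cores the status quo is ruled out, and consolidation wins on three-year cost (345,000 versus 460,000 for the refresh). A real tool would weigh more dimensions, such as storage, space and risk, but the selection logic is the same: filter on feasibility, then rank on cost.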
DCPM With Six Sigma
By combining a DCPM platform with the Six Sigma DMAIC best-practice framework for capacity planning, IT and data center professionals can achieve continuous data center resource optimization via the following steps:
1. Define data center goals: IT initiates the capacity planning process by setting the overall data center SLA and key metrics against business goals, then identifying future requirements and trends.
2. Measure data center baseline: Through its extensive library of connectors and integration with enterprise management software, a DCPM platform delivers comprehensive data center infrastructure metrics for baseline measurements. Where direct asset connectivity is not available, virtual meters and inference engines can fill in the gaps with highly accurate inferred metrics. The resulting baseline measurements allow IT to compare actual data center performance with its overall performance targets and to spotlight any variance from that baseline over time.
3. Analyze data center variances: By correlating data across multiple data centers, a DCPM platform can provide a rich and complete history. Using in-memory data storage and query techniques, time-series precalculations and other performance optimizations, a DCPM platform can present root-cause analysis of these variances and help formulate corrective actions.
4. Improve data center infrastructure: A DCPM platform can use what-if scenarios to assess new workloads and application deployment strategies, evaluate new equipment, or determine capacity for VM placement. These scenarios help IT not only evaluate and select the best option but also accurately forecast and plan for the capacity needed to support new application rollouts or increased demand.
4a. Forecast utilization and capacity: A DCPM platform uses historical consumption and utilization patterns of assets to predict their future demand and available capacity.
4b. Plan with what-if scenarios: A DCPM platform should auto-populate key parameters with smart default values (which are calculated from real utilization, consumption and cost data), allowing IT to predict operating impact on the basis of asset changes.
4c. Select best scenario: A DCPM platform would allow IT to compare, choose and save desired scenarios.
4d. Execute on the selected plan: A DCPM platform routes the selected plan to service desk tickets (in HP, BMC, IBM Tivoli and so on).
4e. Check actual performance against plan: A DCPM platform compares actual performance against the selected plan, providing IT with insights that lead to corrective actions.
5. Control data center operations: A DCPM platform should provide real-time monitoring for new variances, enabling IT to repeat plan-do-check-act steps continuously to correct variances.
5a. Check variance against plan: A DCPM platform alerts and reports on threshold crossings.
5b. Analyze root causes: A DCPM platform drills down to details of failure and determines what to fix.
5c. Act to resolve problems: A DCPM platform should route issues to service desk tickets for problem resolution.
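The Measure, Analyze and Control steps can be illustrated with a simple variance check: establish a baseline, compare daily actuals with the plan, and raise an issue when a reading exceeds plan or drifts too far from the baseline. This is a minimal Python sketch under stated assumptions: the three-sigma limit is borrowed from conventional control-chart practice rather than from the text, and the returned dictionaries merely stand in for service desk tickets, since creating real tickets in HP, BMC or IBM Tivoli products would go through each product's own interface.

```python
from statistics import mean, pstdev


def baseline(history: list[float]) -> tuple[float, float]:
    """Measure step: mean and (population) standard deviation of a baseline period."""
    return mean(history), pstdev(history)


def check_variance(name, history, actual, plan, sigma_limit=3.0):
    """Control step: compare actual readings with plan and with the baseline.

    Returns a list of hypothetical ticket dicts, one per day that needs action.
    """
    mu, sd = baseline(history)
    tickets = []
    for day, (a, p) in enumerate(zip(actual, plan)):
        reasons = []
        if a > p:  # threshold crossing against the selected plan
            reasons.append(f"above plan by {a - p:.1f}")
        if sd > 0 and abs(a - mu) > sigma_limit * sd:  # drift from baseline
            reasons.append(f"{abs(a - mu) / sd:.1f} sigma from baseline")
        if reasons:
            tickets.append({"metric": name, "day": day, "reasons": reasons})
    return tickets


tickets = check_variance(
    "row A power (kW)",
    history=[40, 42, 41, 39, 43, 41],  # illustrative baseline readings
    actual=[41, 44, 47],
    plan=[45, 45, 45],
)
for t in tickets:
    print(t)
```

With a baseline of 39 to 43 kW and a plan of 45 kW, only the third reading (47 kW) produces a ticket, flagged both for exceeding plan and for sitting roughly 4.6 standard deviations above the baseline mean.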
Through this intelligent capacity planning process, IT/data center professionals can see whether available physical and virtual IT infrastructure, power, and space capacity exist to support the upcoming application rollout. The capacity forecast and performance prediction reduce risk by enabling IT to analyze cascading impacts of upcoming projects before installing any systems.
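Analyzing the cascading impact of a rollout means following one decision, such as buying servers, through every capacity dimension it touches. The minimal Python sketch below assumes a hypothetical server model and hypothetical available-capacity figures, and uses the standard conversion that each watt of IT load produces about 3.412 BTU/hr of heat that the cooling plant must remove.

```python
from dataclasses import dataclass

WATTS_TO_BTU_PER_HR = 3.412  # 1 W of IT load becomes about 3.412 BTU/hr of heat


@dataclass
class ServerModel:
    cores: int
    watts: float     # typical draw under load (assumed)
    rack_units: int


@dataclass
class Available:
    cores: int
    power_w: float
    rack_units: int
    cooling_btu_hr: float


def rollout_impact(model: ServerModel, count: int, avail: Available) -> dict:
    """Propagate a server purchase through compute, power, space and cooling.

    Returns the capacity remaining in each dimension after the rollout;
    any negative value is a shortfall to resolve before installation.
    """
    power = model.watts * count
    return {
        "cores": avail.cores - model.cores * count,
        "power_w": avail.power_w - power,
        "rack_units": avail.rack_units - model.rack_units * count,
        "cooling_btu_hr": avail.cooling_btu_hr - power * WATTS_TO_BTU_PER_HR,
    }


impact = rollout_impact(
    ServerModel(cores=64, watts=650, rack_units=2),
    count=20,
    avail=Available(cores=2000, power_w=15_000, rack_units=48, cooling_btu_hr=40_000),
)
shortfalls = [k for k, v in impact.items() if v < 0]
print(shortfalls)
```

In this example compute, power and rack space all fit, but the added 13 kW of load exceeds the remaining cooling capacity: a downstream shortfall that is far cheaper to find before installation than after.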
DCPM is at the heart of the agile data center. With clear insight into downstream resource requirements, IT can focus on implementing the necessary changes to operate—continuously—at optimum capacity, meeting the requirements that support an agile business.
About the Author
Bob Ertl is senior director of product management at Sentilla Corporation (www.sentilla.com), headquartered in Redwood City, Calif. Before joining Sentilla in 2011, Bob worked in product management roles at Oracle, Hyperion Solutions and Brio Software.
Capacity management critical success factors identified by the Information Technology Infrastructure Library (ITIL), an industry-recognized set of practices for IT service management that focuses on aligning IT services with the needs of the business.