Power Capping Puts IT Back in Control
Part One of this series, “Energy Management Innovations Redefine DR Practices,” explained how data center technology advancements have changed disaster recovery practices. Those same advancements also enable introduction of proactive practices that can actually prevent many outages. Data center managers can accurately gauge power consumption and proactively identify issues before they escalate into outages. As an extra benefit, these energy-management practices can also optimize energy efficiencies during normal operation to reduce the budget impact of escalating energy costs.
Monitoring for Power Spikes and Hot Spots in the Data Center
Power monitoring has come a long way. In the past, real-time monitoring was impractical, and data center managers relied on modeling and estimations derived from vendors’ equipment specifications. Overprovisioning was a routine practice to avoid power spikes and hot spots.
As vendors enhanced products, IT managers were able to monitor some data points. This monitoring usually focused on return-air temperature at the air-conditioning (AC) units, and perhaps the power consumption for each rack.
Today’s new, holistic energy-management solutions offer much more fine-grained levels of monitoring. It is now practical to aggregate return-air temperature at cooling units and air handlers with server inlet temperatures, along with real-time power-consumption characteristics for servers, blades, power distribution units (PDUs) and uninterruptible power supplies (UPSs).
Real-time, fine-grained monitoring removes the need for overprovisioning. Previously, an IT manager might test a fully loaded server and arrive at a requirement of 400 watts per server, or 4 kW per rack of 10 servers. Using real-time monitoring, the same manager can accurately measure the maximum power draw in the production environment. I have personally seen energy management solutions help data center managers boost rack densities by as much as 60 percent (or up to 16 servers per rack, in this example).
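The rack-density arithmetic in this example can be sketched in a few lines of Python. The 400-watt and 4 kW figures come from the example above; the 250-watt measured peak is a hypothetical value, chosen only because it yields the 16 servers per rack mentioned in the text.

```python
def servers_per_rack(rack_budget_watts: int, per_server_watts: int) -> int:
    """Number of whole servers that fit within a rack power budget."""
    return rack_budget_watts // per_server_watts

RACK_BUDGET_W = 4000     # 4 kW per rack, from the example above
TESTED_MAX_W = 400       # fully loaded test figure, from the example above
MEASURED_PEAK_W = 250    # hypothetical peak measured in production

before = servers_per_rack(RACK_BUDGET_W, TESTED_MAX_W)
after = servers_per_rack(RACK_BUDGET_W, MEASURED_PEAK_W)
density_gain_pct = (after - before) * 100 // before

print(before, after, density_gain_pct)
```

In practice the measured peak would come from logged production data rather than a constant, and a safety margin would normally be subtracted from the rack budget first.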
Middleware vendors can also automate the logging of the monitored data, and analysis and planning tools are available to extract valuable insights about data center patterns. Some energy management solutions use real-time and logged data to generate thermal and energy maps of the data center. IT managers get an extremely accurate picture of power at the server, rack, row and room levels.
The at-a-glance maps can also quickly highlight hot spots before they result in equipment failures.
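As a rough illustration of how such a map might flag hot spots, the sketch below reduces server inlet readings to the hottest value per rack and reports the racks above a limit. The reading format, the function name and the 27-degree limit are all assumptions made for illustration, not part of any particular product.

```python
def find_hot_racks(readings, limit_c=27.0):
    """Return {rack: hottest inlet temperature} for racks above limit_c.

    readings: iterable of (rack_id, server_id, inlet_temp_c) tuples.
    limit_c:  hypothetical alert threshold in degrees Celsius.
    """
    hottest = {}
    for rack, _server, inlet_c in readings:
        # Track the highest inlet temperature seen in each rack.
        hottest[rack] = max(inlet_c, hottest.get(rack, inlet_c))
    return {rack: temp for rack, temp in hottest.items() if temp > limit_c}

# Hypothetical sample readings
sample = [
    ("A1", "srv-01", 24.0),
    ("A1", "srv-02", 29.5),
    ("B1", "srv-01", 22.0),
]
print(find_hot_racks(sample))
```

The same grouping could be repeated at the row or room level to build the coarser views of the map.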
Power Capping to Avert Failures
Once monitoring is automated, the next logical steps involve introducing more controls and preventative practices. Once again, data center equipment providers and middleware vendors give IT managers some new options today. In particular, power capping makes it possible to set power limits for each rack. Power thresholds can be automatically monitored to detect and block power spikes that would otherwise damage equipment. The accuracy of holistic data center monitoring solutions enables setting of limits that will not hinder normal operations.
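A minimal sketch of a rack-level cap follows. It assumes one simple policy, scaling every server's allowance proportionally when the rack total exceeds its limit; real solutions may apply different policies, and the function and server names here are hypothetical.

```python
def enforce_rack_cap(server_watts, rack_cap_w):
    """Return per-server power allowances that keep the rack under its cap.

    server_watts: {server_name: current draw in watts}
    rack_cap_w:   power limit configured for the whole rack
    Assumed policy: if the rack is over its cap, shrink every server's
    allowance by the same factor; otherwise leave the draws unchanged.
    """
    total = sum(server_watts.values())
    if total <= rack_cap_w:
        return dict(server_watts)
    scale = rack_cap_w / total
    return {name: watts * scale for name, watts in server_watts.items()}

# Hypothetical readings: the rack draws 600 W against a 300 W cap.
print(enforce_rack_cap({"srv-01": 400, "srv-02": 200}, rack_cap_w=300))
```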
Although simple and effective, power capping alone is still a crude preventative measure because server power directly affects server performance. IT managers need a more intelligent power-capping method.
Dynamic power capping is a feature that differentiates the leading energy management solutions, making it possible to fine-tune power and server performance such that CPU operating frequencies are adjusted on the fly. This approach calls for a highly integrated solution because it requires interactions with the operating system or hypervisor, as well as continual monitoring of predefined power and temperature thresholds.
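The core decision in a dynamic capping loop can be sketched as a single step function: lower the CPU frequency when measured power exceeds the cap, and raise it again when there is comfortable headroom. The frequency steps, the 10 percent headroom and the function name are illustrative assumptions. A real solution would read power from the platform and apply the frequency through the operating system or hypervisor, which this sketch does not attempt.

```python
# Hypothetical list of available CPU frequency steps, lowest to highest.
FREQ_STEPS_MHZ = [1200, 1600, 2000, 2400, 2800]

def next_frequency(current_mhz, measured_w, cap_w, headroom=0.10):
    """Pick the next CPU frequency step for one pass of the control loop.

    Step down when measured power is over the cap; step up only when
    power is below the cap by more than the headroom fraction, so the
    loop does not oscillate around the cap.
    """
    i = FREQ_STEPS_MHZ.index(current_mhz)
    if measured_w > cap_w and i > 0:
        return FREQ_STEPS_MHZ[i - 1]
    if measured_w < cap_w * (1 - headroom) and i < len(FREQ_STEPS_MHZ) - 1:
        return FREQ_STEPS_MHZ[i + 1]
    return current_mhz

print(next_frequency(2400, measured_w=320, cap_w=300))  # over the cap
print(next_frequency(2000, measured_w=250, cap_w=300))  # well under the cap
```

The headroom band is the design choice worth noting: without it, a server running close to its cap would bounce between two frequency steps on every pass.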
Automated, dynamic power capping, combined with real-time temperature and power monitoring, builds in a layer of intelligence that can drive extremely effective power management. The resulting energy-management practices allow data center architects to adjust rack densities while staying within known ranges for normal operations and avoiding spikes during peak operation.
Over time, the data center team gains a knowledge base of power characteristics for various workloads and performance levels, which can then enable improved energy efficiencies and lower costs.
Because it gives data center managers the ability to manipulate power while protecting performance and service-level agreements (SLAs), power capping is gaining popularity. A broad range of use cases is documented, including examples of state-of-the-art energy management solutions reducing server power consumption by as much as 20 percent without any noticeable impact on performance.
Companies like EMC have used power capping to meet energy efficiency and usage targets. EMC offers the Atmos service, a cloud-optimized storage (COS) solution, for which it employs power capping, adjusting voltage and frequency to keep servers below a preset power cap. In a proof of concept, the company demonstrated that the energy management solution met both power targets and end-user performance goals.
Baidu, the largest search engine in China, is another documented case where power capping and dynamic frequency adjustments were applied. Baidu’s goal was to increase server density and therefore decrease the charges the company paid for data center rack space. As much as 75 percent of its racks were unused, and power capping allowed it to increase density by 40 to 60 percent.
The areas with less to gain from power capping include HPC clusters and legacy hardware unable to respond to a power cap. In these two situations, efficiency is best measured by getting the most out of the hardware compute cycles while monitoring consumption and thermals to ensure business continuity.
The time is right for using advanced energy management practices in the data center, given the maturation of technologies and their proven ROI. Owing to the high cost of downtime, the ability to identify “hot spots” should be a high priority for most organizations as a way to avoid the risks to servers and other temperature-sensitive equipment. Furthermore, proactive energy management solutions provide insights into the power patterns that lead to problematic events, and they offer remedial controls to avoid wasted power, equipment failures and service disruptions.
In addition, today’s upward trends in energy costs and data center growth show no signs of diminishing. They demand energy monitoring and control technologies with the granularity and accuracy needed to yield high efficiencies. Clearly, the days of overprovisioning must come to an end. Armed with the insights that can be gained from advanced energy management solutions, most organizations can replace outdated practices with green, energy-efficient ones.
Using detailed power and temperature maps, along with data for servers, racks, rows and entire data centers, IT managers are also equipped to identify energy-wasting behaviors. They can educate the users behind the services about energy-conserving data center practices. The same energy management solutions that monitor servers and air-handling equipment can introduce controls that enforce green policies. For example, energy management solutions can automatically generate alerts and trigger power adjustments whenever a group or department exceeds predefined power limits.
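The alerting idea in the last sentence can be sketched as follows: sum the measured power of each group's servers and report any group over its predefined limit. The data layout, group names and limits are hypothetical, and the sketch only computes the overage; generating the alert or triggering a power adjustment would be up to the surrounding solution.

```python
def over_limit_groups(server_power_w, server_group, group_limit_w):
    """Return {group: watts over limit} for groups exceeding their limit.

    server_power_w: {server: measured watts}
    server_group:   {server: owning group or department}
    group_limit_w:  {group: predefined power limit in watts}
    """
    totals = {}
    for server, watts in server_power_w.items():
        group = server_group[server]
        totals[group] = totals.get(group, 0.0) + watts
    return {
        group: total - group_limit_w[group]
        for group, total in totals.items()
        if group in group_limit_w and total > group_limit_w[group]
    }

# Hypothetical measurements and limits
power = {"srv-01": 300, "srv-02": 350, "srv-03": 200}
owner = {"srv-01": "finance", "srv-02": "finance", "srv-03": "hr"}
limits = {"finance": 600, "hr": 500}
print(over_limit_groups(power, owner, limits))
```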
Alternatively, a holistic energy management solution can facilitate tracking and charge-back mechanisms. Companies can use energy logs for power-based metering and energy-cost charge-backs. By raising awareness about energy costs as they directly tie back to specific data center services, companies can gain insight into the true cost of resources and ultimately develop more-economical and more-sustainable business practices.
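A power-based charge-back can be sketched from logged samples: convert each sample's average watts over its logging interval into kilowatt-hours, total them per department, and multiply by a tariff. The log format, the one-hour interval and the rate of 0.10 per kWh are assumptions for illustration only.

```python
def energy_chargeback(samples, rate_per_kwh, interval_s=3600):
    """Return {department: energy cost} from logged power samples.

    samples:      iterable of (department, average_watts) per interval
    rate_per_kwh: hypothetical energy tariff, in currency units per kWh
    interval_s:   length of each logging interval in seconds
    """
    kwh = {}
    for dept, avg_watts in samples:
        # watts * seconds = joules; 3,600,000 joules = 1 kWh
        kwh[dept] = kwh.get(dept, 0.0) + avg_watts * interval_s / 3_600_000
    return {dept: round(energy * rate_per_kwh, 2) for dept, energy in kwh.items()}

# Hypothetical hourly log entries
log = [("finance", 1000), ("finance", 2000), ("hr", 500)]
print(energy_chargeback(log, rate_per_kwh=0.10))
```

A fuller model would also apportion cooling and power-distribution overhead to each department, which this sketch leaves out.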
An increasingly vital business resource, energy will always need to be managed and optimized. Therefore, best practices must continually evolve and use the latest technology developments.
In Part Three, the series will conclude with an overview of energy management best practices relating to the operation of data centers at high ambient temperatures.
Leading article photo courtesy of 123net
About the Author
Jeff Klaus is the general manager of Data Center Manager (DCM) Solutions at Intel Corporation, where he has managed various groups for over 13 years. Klaus’s team is pioneering power- and thermal-management middleware, which is sold through an ecosystem of data center infrastructure management (DCIM) software companies and OEMs. A graduate of Boston College, Klaus also holds an MBA from Boston University. He can be reached at Jeffrey.S.Klaus@intel.com.
White Paper, PoC at EMC, “Using Intel Intelligent Power Node Manager to Minimize the Infrastructure Impact of the EMC Atmos Cloud Storage Appliances,” http://software.intel.com/sites/datacentermanager/atmospoweropt1_3c.pdf
White Paper, PoC at Baidu, “Intelligent Power Optimization for Higher Server Density Racks,” http://software.intel.com/sites/datacentermanager/intel_node_manager_v2e.pdf