At the start of the year, many of us resolve to take better care of ourselves and make plans to maintain our bodies and minds. We recommend a similar New Year’s resolution for data center managers: make the critical switch from a “run until parts failure” approach to proactive maintenance of the uninterruptible power supply (UPS).
If a UPS unit fails, the results can be catastrophic. In December 2013, the Ponemon Institute released a study of the costs of data center outages. It found that in 2013 a single outage cost organizations more than $627,000 on average. UPS failure was by far the biggest single cause, accounting for one-quarter of the outages the Ponemon Institute assessed and for more than $11 million of the $45 million in total revenue losses from those outages. Many of these outages could have been avoided with a proactive maintenance approach.
In addition to protecting data centers from downtime, proactive maintenance keeps a UPS operating efficiently, which saves money on replacement parts and energy, and it allows data center managers to plan and budget for the future.
Not Just a Box in a Closet
Not long ago, spending time and money to maintain what was perceived as an inactive “box in a closet” seemed unnecessary. What facilities managers didn’t realize, and what some still don’t, is that a UPS isn’t sitting dormant, waiting for a storm or earthquake to knock out power before coming to life. It is at work all the time, keeping normal variations in the power supply from damaging servers.
The UPS: A 24/7 Data Center Employee
In addition to the obvious job of preventing outages, a properly maintained UPS unit correctly regulates power from a constantly fluctuating grid. A 2012 study by The New York Times found that more than 80 percent of the power consumed by data centers goes to keeping servers at the ready in case of an increased demand for data or sudden heavy traffic. UPS units ensure a constant, steady supply of power and control the flow of energy throughout the day, every day.
This constant adjustment takes a toll on UPS parts. Combined with the sheer size of the load a UPS carries, even for fractions of a second, it puts tremendous strain on the components. According to a 2007 Department of Energy report, a single data center can consume up to 100 times more energy than a standard office building. Research commissioned by The New York Times estimated that data centers worldwide use about 30 GW of power, roughly the output of 30 nuclear power plants.
All that power is managed by that “box in the closet,” second by second. A proactive maintenance approach is critical to ensuring that your UPS equipment continues to keep servers running at peak performance without interruptions.
Proactive Maintenance Defined
What does it mean to be proactive when talking about maintaining UPS equipment? The first step is regular preventive service conducted by trained professionals. UPS infrastructures are complicated. Servicing and fixing them requires skill and confidence, as they cannot be turned off while being maintained. One false move during routine service could trigger a shutdown. The work is both technical and delicate. Field experience in servicing UPS systems is key to providing competent preventive service.
Good preventive maintenance addresses the whole UPS assembly, from the battery and semiconductors down to the wiring, resistors, breakers, capacitors and fans. Every UPS component requires regular attention to run at maximum efficiency.
For example, fans have the critical job of removing waste heat from the UPS internal structures, and that heat can be significant. An Emerson white paper, “Reliability of Air Moving Fans, and Their Impact on System Reliability,” reports that a 100 kVA UPS system can generate 5 kW to more than 10 kW of heat. Left unchecked, that heat can cause parts to fail and reduce the efficiency of the UPS system, driving up energy and replacement costs. Fans dissipate heat from silicon-controlled rectifiers (SCRs), insulated-gate bipolar transistors (IGBTs) and power modules, and a single IGBT destroyed by excessive heat can cost upwards of $1,200. Making sure your fans are fully operational and replaced on time can avoid that expense.
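To see where a figure like 5 kW to 10 kW comes from, note that every watt a UPS draws but does not deliver to its load ends up as heat inside the cabinet. The sketch below is a back-of-the-envelope estimate, not a method from the Emerson paper; it assumes full rated load, a 0.9 load power factor and efficiencies of 90% to 95%, all of which are illustrative assumptions.

```python
def ups_heat_kw(rating_kva: float, power_factor: float, efficiency: float) -> float:
    """Estimate the waste heat (kW) of a UPS carrying its full rated load.

    Real output power is the kVA rating times the load power factor; the UPS
    draws output / efficiency from its source, and the difference becomes heat.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    output_kw = rating_kva * power_factor
    input_kw = output_kw / efficiency
    return input_kw - output_kw


KW_TO_BTU_PER_HOUR = 3412.14  # 1 kW of heat is about 3,412 BTU/h of cooling load

# Assumed operating points: a 100 kVA unit at a 0.9 load power factor.
for eff in (0.95, 0.90):
    heat_kw = ups_heat_kw(100, 0.9, eff)
    print(f"{eff:.0%} efficient: {heat_kw:.1f} kW of heat, "
          f"about {heat_kw * KW_TO_BTU_PER_HOUR:,.0f} BTU/h")
```

Under these assumptions the unit sheds roughly 4.7 kW at 95% efficiency and 10 kW at 90%, consistent with the range the white paper reports. All of that heat has to be carried away by the fans.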
Regular preventive maintenance of fans reduces the risk of both mechanical and electrical failures. A skilled field technician will begin by recording the ambient air temperature, a gauge of overall performance, and then move on to a comprehensive visual check of cleanliness and parts. The technician will assess the motor coils to prevent electrical failure, and will head off mechanical problems by replacing filters, monitoring bearing wear, and checking that fan blades and housings have not been distorted. Regular checks on every part of the fan ensure that it operates efficiently and keeps the rest of the UPS unit running at the optimal temperature.
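One way to make sure no step of that fan inspection is skipped is to record each visit as a structured checklist. The sketch below is hypothetical: the `FanInspection` class and its field names are illustrative, not part of any DC Group tool, and simply mirror the order of the procedure described above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FanInspection:
    """Hypothetical record of one preventive-maintenance visit to a UPS fan.

    Fields follow the order of the procedure described above; a boolean is
    True once that step has been completed.
    """
    fan_id: str
    ambient_temp_c: Optional[float] = None  # recorded first as a performance baseline
    visual_check_done: bool = False         # cleanliness and condition of parts
    motor_coils_checked: bool = False       # guards against electrical failure
    filters_replaced: bool = False          # mechanical: restricted airflow
    bearings_checked: bool = False          # mechanical: bearing wear
    blades_housing_ok: bool = False         # mechanical: no distorted blades or housing

    def open_items(self) -> List[str]:
        """Return the steps not yet completed, in procedure order."""
        items = []
        if self.ambient_temp_c is None:
            items.append("ambient_temp_c")
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) and not value:
                items.append(f.name)
        return items


visit = FanInspection("UPS-1 fan 2", ambient_temp_c=24.5,
                      visual_check_done=True, motor_coils_checked=True)
print(visit.open_items())
# ['filters_replaced', 'bearings_checked', 'blades_housing_ok']
```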
Preventive Service Cuts Expenditures
Preventive service is a crucial, complicated job that cannot be overlooked, especially because, in addition to maintaining uptime, it yields financial benefits. The most immediate gain comes from preventing outages, which can cost anywhere from $500 to $16,000 per minute. Second, a well-maintained UPS system requires few or no costly emergency service calls, in some cases cutting emergency service costs by 50% or more. Third, when completed regularly and well, preventive service can extend the operational life of UPS components by 25% to 50% beyond manufacturer-reported lifespans, reducing spending on replacements. Preventive service also keeps the UPS assembly running at peak performance, which reduces energy expenses. Over time, all of these savings become significant.
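The arithmetic behind these savings is simple enough to check directly. The sketch below assumes a flat per-minute outage cost and spreads a part's replacement cost evenly over its service life; the 30-minute outage and the 7-year rated life for a $1,200 IGBT are hypothetical values chosen only for illustration.

```python
def outage_cost(minutes: float, cost_per_minute: float) -> float:
    """Direct cost of an outage at a flat per-minute rate."""
    return minutes * cost_per_minute


def annualized_cost(replacement_cost: float, rated_life_years: float,
                    life_extension: float = 0.0) -> float:
    """Spread a part's replacement cost over its service life.

    life_extension is the fractional gain over the rated life (0.25 = 25% longer).
    """
    return replacement_cost / (rated_life_years * (1 + life_extension))


def annual_savings_fraction(life_extension: float) -> float:
    """Fractional drop in annualized replacement cost from a longer life."""
    return 1 - 1 / (1 + life_extension)


# A hypothetical 30-minute outage at the low and high per-minute rates cited above.
print(outage_cost(30, 500), outage_cost(30, 16_000))  # 15000 480000

# A $1,200 IGBT with a hypothetical 7-year rated life, extended by 25% and 50%.
for ext in (0.25, 0.50):
    print(f"+{ext:.0%} life: ${annualized_cost(1200, 7, ext):,.2f}/year, "
          f"{annual_savings_fraction(ext):.0%} less than "
          f"${annualized_cost(1200, 7):,.2f}/year")
```

Because the savings fraction is 1 - 1/(1 + extension), a 25% to 50% life extension lowers the annualized replacement cost by 20% to about 33%, whatever the part's price or rated life.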
Keeping Track of the Entire System Ensures Uptime
For larger data centers, UPS redundancy is an important strategy for avoiding costly downtime. Tier III and Tier IV facilities meet their uptime goals by running multiple UPS units in parallel in redundant configurations, providing backups for backups that have their own backups. Layering the UPS in this way creates a highly complex infrastructure that requires rigorous monitoring.
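Redundant UPS configurations are commonly described as N+1 or N+2, where N is the number of modules the load needs. The sketch below estimates how much a spare module helps, under the simplifying assumption that modules fail independently and each has the same availability; the 99.9% module availability is a hypothetical value, not a measured one.

```python
from math import comb


def k_of_n_availability(module_availability: float, required: int,
                        installed: int) -> float:
    """Probability that at least `required` of `installed` identical UPS modules are up.

    Assumes each module is independently available with the same probability,
    which ignores common-mode failures (shared batteries, switchgear, heat).
    """
    a = module_availability
    return sum(comb(installed, k) * a**k * (1 - a) ** (installed - k)
               for k in range(required, installed + 1))


A = 0.999  # hypothetical availability of a single UPS module
N = 2      # modules needed to carry the load
for label, installed in (("N", N), ("N+1", N + 1), ("N+2", N + 2)):
    print(f"{label:4} ({installed} modules): "
          f"{k_of_n_availability(A, N, installed):.9f}")
```

Under these assumptions, one spare module raises the modeled availability from about 99.80% to about 99.9997%. The independence assumption is optimistic: shared conditions such as heat or aging components can degrade several modules at once, which is why layered redundancy still calls for close monitoring.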
The first step toward the necessary rigor is a detailed list of UPS constituent parts. Compiling such an inventory can be onerous and confusing, especially if there are multiple sites and numerous systems across a number of floors. DC Group developed its proprietary D-Tech software to help data center managers take stock of their entire UPS portfolio. It captures equipment details such as serial number, location and dates of service, all linked in real time to actionable information such as service history, operational deficiencies and online status. Such information gives a data center manager an easy way to implement a more proactive maintenance approach.
Figure 1 below shows a model UPS inventory for several data centers across the country owned by one company. A detailed inventory such as the one pictured allows the data center manager to understand the state of the entire portfolio from one screen or on the run via a mobile device. Managing a complicated system full of redundancies and various configurations requires a complete view of all the parts. Without one, needed replacements may be overlooked, and it can take additional time to find faulty UPS equipment in the event of a failure.
Once the inventory is created, managers can analyze their UPS portfolio by location, part or age. Drilling down to individual parts, for example, helps a manager understand the state of each component. In Figure 4 below, a search for all parts with minor deficiencies yields seven units at locations across the country, including Massachusetts, Colorado, Minnesota, Georgia and California. Vital information such as each unit’s date of installation and last maintenance visit is easy to see. This kind of specific, in-depth knowledge is a critical tool for planning future repairs, service and replacement parts.
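The kind of query described above can be sketched in a few lines against any structured inventory. This is a hypothetical model, not D-Tech's schema or API: the `UpsUnit` fields, the "none"/"minor"/"major" deficiency grades and the sample records are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class UpsUnit:
    """Hypothetical inventory record; field names are illustrative only."""
    serial: str
    site: str
    state: str
    installed: date
    last_service: date
    deficiency: str  # hypothetical grading: "none", "minor" or "major"


def with_deficiency(inventory: Iterable[UpsUnit], level: str) -> List[UpsUnit]:
    """Units graded at `level`, longest since last service first."""
    return sorted((u for u in inventory if u.deficiency == level),
                  key=lambda u: u.last_service)


def overdue(inventory: Iterable[UpsUnit], today: date, max_days: int) -> List[UpsUnit]:
    """Units whose last service is more than `max_days` before `today`."""
    return [u for u in inventory if (today - u.last_service).days > max_days]


# Made-up sample records for illustration only.
inventory = [
    UpsUnit("SN-0001", "Site A", "MA", date(2008, 5, 1), date(2013, 6, 10), "minor"),
    UpsUnit("SN-0002", "Site B", "CO", date(2011, 2, 15), date(2013, 11, 2), "none"),
    UpsUnit("SN-0003", "Site C", "GA", date(2006, 9, 30), date(2012, 12, 5), "minor"),
]

minor = with_deficiency(inventory, "minor")
print([u.serial for u in minor])  # ['SN-0003', 'SN-0001']
print(Counter(u.state for u in minor))
print([u.serial for u in overdue(inventory, date(2014, 1, 2), 365)])
```

Sorting by last service date puts the units that have gone longest without a visit at the top, which is usually where a manager wants to start.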
Proactive Maintenance Improves the Budget Process
Planning for the future is key to success in data center management. Replacing a whole UPS unit is a costly decision for a company. With a proactive maintenance approach, part replacements and overhauls can be predicted and planned, so their costs can be forecast and built into budgets. Without that thoroughness, a part can break down unexpectedly, leading to an unplanned expense.
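Once install dates and expected lifespans are known, turning them into a year-by-year replacement budget is straightforward. The sketch below is a hypothetical planning aid: the `Component` record, the lifespans and the prices are made-up assumptions, and real planning figures would come from manufacturer data and the unit's own service history.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Component:
    """Hypothetical planning record for a replaceable UPS part."""
    name: str
    installed_year: int
    expected_life_years: int  # planning assumption, not a guaranteed lifespan
    replacement_cost: float


def replacement_budget(components: Iterable[Component], start_year: int,
                       horizon_years: int) -> Dict[int, float]:
    """Planned replacement spend per year for start_year .. start_year + horizon_years - 1.

    Parts already past their expected life are budgeted in the first year;
    each part is then replaced again every expected_life_years.
    """
    end_year = start_year + horizon_years
    budget: Dict[int, float] = defaultdict(float)
    for c in components:
        if c.expected_life_years <= 0:
            raise ValueError(f"{c.name}: expected life must be positive")
        due = max(c.installed_year + c.expected_life_years, start_year)
        while due < end_year:
            budget[due] += c.replacement_cost
            due += c.expected_life_years
    return dict(sorted(budget.items()))


# Made-up parts, lifespans and prices, for illustration only.
parts = [
    Component("DC capacitors", 2006, 7, 3_500),
    Component("battery string", 2011, 4, 9_000),
    Component("cooling fans", 2010, 7, 2_400),
]
print(replacement_budget(parts, start_year=2014, horizon_years=5))
# {2014: 3500.0, 2015: 9000.0, 2017: 2400.0}
```

Budgeting overdue parts in the first year reflects the point above: an item that is already past its expected life is a likely unplanned expense if it is not scheduled now.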
A Proactive Plan for All Data Centers
From entire buildings with an army of UPS units to a single, small unit in the basement, your UPS system—and your approach to it—is a critical part of a secure, reliable data center infrastructure. Preventive, planned service ensures peak performance, greater energy efficiency, reliable prevention of outages, and the tools data center managers need to plan ahead and increase bottom-line results.
About the Author
Jon Frank is the CEO and cofounder of DC Group, which was established in 1991. Named on the Inc. 500|5000 six consecutive years, DC Group provides uninterruptible power supply service and maintenance throughout the United States and Canada.
Ponemon Institute.
James Glanz, “Power, Pollution and the Internet,” The New York Times, September 22, 2012.
Henry Hu and Jeff Donato, “Extending UPS Operations,” Electrical Construction and Maintenance, August 1, 2009.
Ponemon Institute.
Based on DC Group data and results.
Based on DC Group data.
Uptime Institute.