The generally accepted formula for calculating availability is an inaccurate reflection of actual downtime in the field. The data center industry acknowledges this fact, but it typically lacks any other way to calculate the amount of time that a device is “in service” compared with the total amount of time it could potentially support the critical load.
The generally accepted formula for calculating availability is the following:
Ai = MTBF / (MTBF + MTTR)
where Ai is the inherent (or implied) availability, MTBF is the mean time between failures, and MTTR is the mean time to repair.
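To see what this formula yields in practice, here is a minimal sketch in Python. The MTBF and MTTR values are hypothetical, chosen only to illustrate the calculation, not drawn from any vendor’s data.

```python
# Inherent availability: Ai = MTBF / (MTBF + MTTR)
# The figures below are hypothetical, chosen only to illustrate the formula.

def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return Ai given mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Example: a module with a 200,000-hour MTBF and a 4-hour MTTR
ai = inherent_availability(200_000, 4)
print(f"Ai = {ai:.6f} ({ai * 100:.4f}%)")  # Ai = 0.999980 (99.9980%)
```

Note how even a multi-hour MTTR produces a near-perfect figure; the formula has no term for maintenance windows, logistics delays or diagnostics, which is exactly the gap discussed below.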
The intent of this formula is to enable equipment owners to establish a reasonable expectation for the amount of time that their UPS system will be out of service. But an Eaton white paper titled “Maximizing UPS Availability” accurately states that there are a number of issues with this approach.[i] Most importantly, the method fails to account for time that the machine is taken down for inspection, general maintenance, troubleshooting, waiting for a technician to arrive or waiting for a part to be shipped in. The actual amount of time that the UPS will be down and out of service is therefore far greater than the formula would have you believe. In most instances, it’s extremely difficult for owners, facility managers and maintenance providers to realistically calculate the actual amount of time that a particular piece of equipment is either running or not running. The reasons range from equipment that cannot record its own start/stop times to service organizations that cannot report when work is actually performed. The result is a huge gap in accurately determining the period during which a piece of equipment is actually performing its intended job.
One way that manufacturers of static UPS systems get around the issue of availability is to incorporate into the equation a second utility source through a static bypass in the UPS.[ii] The single-source equation above thereby becomes a dual-supply calculation: the availability of the primary path (rectifier, battery and inverter) is now coupled with the MTBF and MTTR of the static bypass. This scenario is fine under two conditions: the static switch must transfer, and the backup source (utility) must be available. So although the implied availability is now greater, the actual availability is the same. If the UPS is offline, the UPS is in fact unavailable, regardless of whether the load is being fed from the bypass.
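The dual-supply math is not spelled out here, but the standard reliability-engineering treatment models two independent sources in parallel, so the combined unavailability is the product of the individual unavailabilities. A sketch under that independence assumption, with hypothetical input figures:

```python
# Parallel (dual-supply) availability under the standard independence
# assumption: A_dual = 1 - (1 - A_primary) * (1 - A_bypass).
# The input availabilities below are hypothetical.

def dual_supply_availability(a_primary: float, a_bypass: float) -> float:
    """Combined availability of two independent parallel sources."""
    return 1.0 - (1.0 - a_primary) * (1.0 - a_bypass)

a = dual_supply_availability(0.99998, 0.9999)
print(f"A_dual = {a:.9f}")  # A_dual = 0.999999998
```

This is why the brochure number climbs toward a long string of nines even though, as noted above, the UPS itself is just as unavailable whenever the load is on bypass.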
The true measure of availability is operational availability, where the total time a system is in operation is expressed as a percentage of the total time it’s needed.
Ao = Uptime / Operating Cycle
This equation takes into account all factors that can interrupt service or delay equipment from going back into service. These factors can include everything from waiting for a technician to show up to waiting for a part to be flown in. The equation measures the amount of downtime a customer actually experiences, whether planned or unplanned.
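A minimal sketch of this calculation; the downtime categories below are illustrative examples of the interruptions described above, not fields from any particular service record:

```python
# Operational availability: Ao = uptime / operating cycle.
# Downtime entries are illustrative categories, not real service records.

downtime_hours = {
    "scheduled maintenance": 6.0,
    "waiting for technician": 9.0,
    "waiting for parts": 30.0,
    "troubleshooting and repair": 5.0,
}

operating_cycle_hours = 8_760  # one year
uptime_hours = operating_cycle_hours - sum(downtime_hours.values())

ao = uptime_hours / operating_cycle_hours
print(f"Ao = {ao:.4f} ({ao * 100:.2f}%)")  # Ao = 0.9943 (99.43%)
```

Fifty hours of real-world delays, invisible to the MTBF/MTTR formula, already pull the figure well below the inherent-availability numbers quoted in brochures.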
Although it’s fine to estimate how long it will take before a part fails (MTBF) and how long it will take to fix (MTTR), doing so fails to offer realistic expectations of what can and will likely happen once a piece of equipment goes into service. Now consider all the additional components that constitute typical mission-critical infrastructure (paralleling switchgear, automatic transfer switches, switchgear and generators, for example) and it becomes clear that calculating the operational availability of a mission-critical system is extremely difficult.
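A common first-order model treats the components in such a series chain as independent, so system availability is the product of the component availabilities, and every added component pulls the total down. A sketch under that independence assumption, with made-up per-component figures:

```python
import math

# First-order series model: A_system = product of component availabilities.
# The component values below are hypothetical.

components = {
    "UPS": 0.9995,
    "paralleling switchgear": 0.9998,
    "automatic transfer switch": 0.9997,
    "distribution switchgear": 0.9998,
    "generator": 0.9990,
}

a_system = math.prod(components.values())
print(f"A_system = {a_system:.4f} ({a_system * 100:.2f}%)")  # ~99.78%
```

Each component multiplies the shortfall, which is one reason whole-system availability is so hard to pin down from component-level MTBF figures alone.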
Using the real-time data available on the Euro-Diesel dynamic UPS, it’s possible to calculate the operational availability of each module in every system. For this example, we use data from the first project built in the U.S. by E1 Dynamics. This project, conducted for the federal government, consisted of two phases. The first phase involved a single 2,000kVA/1,600kW diesel rotary UPS (DRUPS) module commissioned in September 2009. A second, redundant 2,000kVA/1,600kW DRUPS was commissioned in August 2011. The system was designed from the outset for expansion, which made adding the second module straightforward.
DRUPS: Operational Availability
During normal DRUPS operation, the load is fed from the utility through a choke that connects to the DRUPS alternator. For the load to be considered “protected,” the alternator must be rotating. The alternator has two power sources: the utility and the diesel engine. One or the other must be available for the alternator to rotate.
This is where the actual field availability can be calculated. The Euro-Diesel No-Break KS has the unique ability to track the exact time during which the alternator is rotating, the utility is present and the engine is operating. During this time, the load is protected. Using this data, we can calculate the operational availability.
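The internal logging format of the No-Break KS is not described here, so the sketch below is illustrative only: it aggregates hypothetical timestamped run intervals into protected hours.

```python
from datetime import datetime

# Hypothetical log of intervals during which the alternator was rotating
# with either utility or engine available (i.e., the load was protected).
# The timestamps and record format are invented for illustration.

protected_intervals = [
    (datetime(2014, 1, 1, 0, 0), datetime(2014, 3, 2, 7, 30)),
    (datetime(2014, 3, 2, 11, 0), datetime(2014, 12, 31, 23, 59)),
]

protected_hours = sum(
    (end - start).total_seconds() / 3600 for start, end in protected_intervals
)
print(f"Protected hours: {protected_hours:,.1f}")
```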
With this data, the operational availability is easy to calculate. In 2014, DRUPS B was online supporting the load for 8,752.1 hours out of the 8,760 hours in the year. These numbers yield an operational availability of 99.91%. In other words, the DRUPS was offline for a total of 7.9 hours that year, the sum of all maintenance, repair, diagnostics and emergency service.
When the data is examined over a longer period, the operational availability over the life cycle of the equipment becomes apparent. DRUPS B was commissioned in September 2009; our data collection began five months later. Through June 2016, the unit operated a total of 55,400 hours online out of a possible 55,536 hours. The operational availability is 99.76%, or just 21 hours of downtime a year for maintenance, diagnostics, troubleshooting and repair.
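Both figures follow directly from the reported hours:

```python
# Reproducing the operational-availability figures reported above.

# 2014, DRUPS B: 8,752.1 hours online out of 8,760 possible.
ao_2014 = 8_752.1 / 8_760
print(f"2014: {ao_2014 * 100:.2f}%")               # 99.91%
print(f"2014 downtime: {8_760 - 8_752.1:.1f} h")   # 7.9 h

# Life cycle through June 2016: 55,400 hours online out of 55,536.
ao_life = 55_400 / 55_536
years = 55_536 / 8_760
print(f"Life cycle: {ao_life * 100:.2f}%")         # 99.76%
print(f"Downtime per year: {(55_536 - 55_400) / years:.0f} h")  # ~21 h
```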
Examining the data further, we can look for anomalies that point to life-cycle issues, operational problems or service-related events. DRUPS A experienced such an event in 2014, when the unit was taken out of service for a total of 153 hours: the manufacturer of the diesel engine required that the unit be taken offline to perform a factory recall. Afterward, the DRUPS returned to service.
This type of event would have no impact on the theoretical availability calculated from MTBF and MTTR; it’s simply not part of the equation. Using operational availability as the metric, the impact can be calculated. Furthermore, adding the 153 hours back to DRUPS A’s online total (8,572.1 + 153) yields an operational availability of 99.60%. Comparing that value to Module B establishes a good baseline for the total time during which a DRUPS is likely to be offline.
Operational availability is an important metric for evaluating different technologies for mission-critical applications. When comparing one technology to another, it’s important to take into account all the areas affected by the change in technology. The larger capacity of a DRUPS makes a direct comparison to a static UPS system complicated: a 2,000kVA/1,600kW DRUPS has no static equivalent, often requiring two or three static units to perform the same job. The number of maintenance events, and with it the downtime, grows with the amount of equipment. In addition, the DRUPS in this example resides outdoors in sound-attenuated enclosures; it needs no air conditioning, spill containment, battery maintenance or, for that matter, a physical building to house and protect it. All of these factors should be taken into account when measuring the operational availability of mission-critical infrastructure.
[i] Eaton, “Maximizing UPS Availability,” white paper, January 2011.
[ii] ABB, “Reliability of Uninterruptible Power Supplies,” NW-MTBF/0610.
About the Author
Brian Olsen is the Eastern Regional Sales Manager for E1 Dynamics, the North American distributor for diesel rotary UPS systems (DRUPS) manufactured by Euro-Diesel S.A. Focused on global business development, Brian excels at conceptualizing and providing executive oversight for large, complex assignments for critical infrastructure. He has worked with some of the largest data center owners, engineers and facility managers in North America.