The data center industry, facing exponentially growing demand for data and networking capacity, is challenging its power distribution and protection partners to provide electrical infrastructure topology solutions—including uninterruptible power supply (UPS) modules—with a wider range of power reliability to protect against utility or system power anomalies and failures. This level of reliability is being calculated not just in time (hours or days), but also by number of events (measured in “single events over years”). For the typical mission-critical data center, the number of failure events matters as much as the duration of the event.
The mission-critical power industry has responded with a wide range of UPS protection topologies that rely on layers of equipment and power-distribution redundancy. This redundancy certainly provides critical levels of reliability, load sharing and efficiency, but at an escalating capital expense (capex) and operating expense (opex) cost.
These redundant topologies (described later) can, at higher levels, provide reliability that the Uptime Institute estimates at less than one event per year and less than 0.8 hours of downtime per year for a Tier IV data center. But it’s fair to ask questions such as “At what cost?” and “For what kind of data center?” or simply “How can we right-size the critical power system to match the function of our data center?”
Right-Sizing Redundancy and Reliability
As the data center market diversifies, some segments and applications will require very little critical power protection (e.g., Uptime Institute Tier I data centers that handle cloud computing for social media or search engines). Others, such as colocation data centers with service-level agreements (SLAs) of 100 percent uptime, video streaming, e-commerce and financial/stock trading, strive for Tier III/IV ratings for their mission-critical applications. A range of data center applications also falls in the middle of this tier ranking (Tier II/III), with varying requirements for uptime and reliability.
Each of these Uptime Tier rankings requires a different level of redundancy that must be delivered by the UPS system topology. Each of these topologies can be implemented in several different configurations. The selection of the optimal UPS system depends on important factors including redundancy, load power (in kilowatts, or kW), fault isolation, load sharing, asset utilization, capacity scaling and total cost of ownership (TCO) measured in capex and opex.
The N System Topology
The N system is the most basic critical power-distribution topology, where “N” is the load capacity measured in kW. These systems do not place UPS modules in parallel (redundant) positions, which limits system reliability.
This system topology also has multiple single points of failure, with an estimated failure rate of one to two events per year, which makes it the least reliable. A single point of failure is defined as a part of a system that, if it fails, will stop the entire system from working. For reference, the typical U.S. utility electrical grid averages 24 failure events outside the ITIC/CBEMA curve per year. For certain low-risk applications, such as internal information technology (IT) processes where failure has no impact on the business or a large group of users, the N topology can still be very effective.
The main advantage of the N system topology is the low initial acquisition and operational costs (excluding the costs associated with unplanned outages). Another advantage is high utilization rates of the system assets. UPS modules for an N system topology are sized to have a design load of 80 to 90 percent of the full load rating.
The N+1 System Topology
An N+1 system topology begins to add redundant components to improve reliability. “N” is, again, the load capacity and “+1” refers to one additional UPS in the system for redundant power protection. These systems operate UPS modules in a parallel configuration, but they still have multiple single points of failure, including the paralleling bus for the output of the UPS modules. An N+1 system also lacks redundant distribution paths and therefore has some risk of single points of failure with an estimated failure rate of one event per year.
This topology has seen widespread adoption for both call centers and colocation data centers with SLAs of less than 100 percent. It is also suitable for any enterprise with a low dependency on delivering Internet-based services.
An N+1 system topology, with fewer redundant elements and higher utilization rates, has low initial and operational costs. Its utilization rate depends on the number of UPS modules or generators required for the N load. UPS modules for an N load are sized with a design load of 80 to 90 percent of the full load rating, with one additional UPS module and generator added to the system. For example, an N+1 system consisting of two UPS modules will have a normal module loading of 40 to 45 percent, whereas an N+1 system consisting of five modules would still be limited to a module loading of 65 to 70 percent.
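The module-loading arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool; the function name and the 0.85 design-load fraction (the midpoint of the article's 80 to 90 percent sizing range) are assumptions for the example.

```python
def n_plus_1_module_loading(n_modules: int, design_load_fraction: float = 0.85) -> float:
    """Average per-module loading for an N+1 UPS system.

    n_modules: number of modules needed to carry the load (the "N").
    design_load_fraction: system design load as a fraction of the
    N-module full rating (assumed 0.85, per the article's 80-90 percent).
    """
    total_modules = n_modules + 1  # one redundant (+1) module shares the load
    return n_modules * design_load_fraction / total_modules

# 1+1 system: the design load is spread over two modules
print(round(n_plus_1_module_loading(1), 3))  # 0.425 (the 40-45 percent range)
# 4+1 system: five modules share the load of four
print(round(n_plus_1_module_loading(4), 3))  # 0.68 (the 65-70 percent range)
```

Varying `design_load_fraction` between 0.80 and 0.90 reproduces the ranges quoted in the text.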
The Block-Redundant (Catcher) System Topology
Another variation of this parallel power architecture is the block-redundant system topology, commonly referred to as a catcher system. This approach is an economical way to improve system reliability without building a complete 2N system. It relies on static transfer switches (STS) and on the ability of the catcher UPS module to instantly accept a step load when the load is shifted from the affected UPS to the standby UPS. In most block-redundant implementations, however, the STS is itself a single point of failure, and although UPS module utilization is improved, it is still limited to 70–75 percent loading to preserve redundancy.
The Shared Redundant (4N/3) System Topology
A shared redundant 4N/3 system topology is very similar to the block-redundant topology, except that the load is spread across multiple paths and all the UPSs carry load, avoiding the block loading of the “catcher” system. The 4N/3 and 3N/2 variations are the most common forms of the shared redundant topology, and utilization levels for these topologies are in the 60–70 percent range. The shared redundant designation, such as 3N/2, is the ratio of maximum UPS capacity in megawatts (MW) to maximum critical load (MW), so the maximum UPS loading is the inverse of that ratio: 2 MW (load) / 3 MW (UPS capacity), or 67 percent utilization.
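The designation-to-utilization relationship can be written out explicitly. A short sketch, with an illustrative function name (not part of any standard):

```python
def shared_redundant_utilization(capacity_units: int, load_units: int) -> float:
    """Maximum UPS utilization for a shared redundant topology
    designated capacity_units*N / load_units (e.g., 3N/2 or 4N/3).

    The designation is the ratio of total UPS capacity to critical
    load, so maximum utilization is its inverse: load / capacity.
    """
    return load_units / capacity_units

print(f"3N/2: {shared_redundant_utilization(3, 2):.0%}")  # 3N/2: 67%
print(f"4N/3: {shared_redundant_utilization(4, 3):.0%}")  # 4N/3: 75%
```

Note that these are theoretical maximums; as the text states, actual operating utilization typically lands in the 60–70 percent range.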
As Figure 4 shows, this topology also requires a significant cable and distribution infrastructure, which increases the initial capital and installation costs and makes system scaling more difficult. In addition, the system can have single points of failure in the power distribution on the output of the UPS.
Both the block-redundant and shared redundant systems provide higher reliability than N+1, with estimated failure rates of less than one event per year. This performance is well suited to most organizations where real-time delivery of data or applications has no direct or significant impact on service delivery, revenue or even corporate reputation. The challenge with these systems is that maximum utilization is limited to less than 70–75 percent, and actual utilization is usually much lower owing to the limited capability to share loads across this power infrastructure. The UPS and critical power assets of block-redundant or shared redundant systems can become stranded and underutilized because the actual critical load often changes after the systems are deployed, as IT loads and servers are added, removed, upgraded or moved during the life of the data center.
The System Plus System (N+N) Topology
A system plus system (or N+N) topology incorporates two independent and redundant electrical-distribution systems. This topology can be designed with either N components in each system or with N+1 components in each system. The two independent systems provide for concurrent maintainability and, in some designs, can be fault tolerant.
A system-plus-system topology provides very high levels of reliability, but it also has the highest initial cost and TCO, combined with a low asset-utilization rate (40 to 45 percent of maximum design load). The topology is estimated to experience only one to two unplanned outages (load drops) in a five-year period. These designs are generally used in corporate or financial-services settings where high availability—measured in single events per five years—is core to guaranteed services (such as an SLA for a colocation center), has a significant impact on revenue, or creates corporate operational risk or liability.
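The 40 to 45 percent utilization figure follows directly from the topology: each of the two independent systems is sized for the full load, so in normal operation each carries half of the design load. A minimal sketch, again assuming the article's 80 to 90 percent design-load sizing (0.85 used as the midpoint):

```python
def n_plus_n_utilization(design_load_fraction: float = 0.85) -> float:
    """Asset utilization for a system-plus-system (N+N) topology.

    Each of the two independent, redundant systems is sized to carry
    the entire critical load, so under normal conditions each system
    runs at half of the design load.
    """
    return design_load_fraction / 2

print(n_plus_n_utilization())  # 0.425, i.e. within the 40-45 percent range
```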
The data center industry is dynamic and changing, and system reliability must be matched to each data center’s “mission” and “critical” deliverables. As the evolution of existing UPS system topologies demonstrates, the market offers systems that provide optimum levels of high reliability (N+N) but at very high cost, systems that reduce cost but with much lower reliability (N or N+1), and systems that provide a middle ground (block or shared redundant), forcing complicated tradeoffs among cost, reliability and utilization. The next challenge for the industry will be to push these boundaries and find new system solutions that provide the right level of redundancy and reliability while driving down both capex and opex and yielding a lower TCO.
About the Author
Brad Thrash, product manager in GE’s Critical Power business, is part of a team of dedicated people who work with GE’s data center, communications, computing and content customers that are wrestling with the exponential and insatiable demand for ever increasing capacity. These customers challenge Brad to help them increase data capacity, build better and smarter data infrastructures and improve operational returns. Brad holds a B.S. in mechanical engineering and is a licensed professional engineer. He is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the American Society of Mechanical Engineers (ASME). Brad is also on the Power Sub Work Group of The Green Grid.
 Uptime Institute, Tier Classifications Define Site Infrastructure Performance, W. Pitt Turner IV et al., 2008, p. 9
 Electric Power Research Institute (EPRI), Distribution System Power Quality Assessment: Phase II: Voltage Sag and Interruption Analysis, March 2003, Table 5-12, p. 5–25 http://www.epri.com/abstracts/Pages/ProductAbstract.aspx?ProductId=000000000001001678
The Information Technology Industry Council/Computer Business Equipment Manufacturers Association (ITIC/CBEMA) curve is a voltage-immunity specification that helps the critical power industry understand what technical problem the UPS is expected to solve.