This article is Part 3 of a three-part series examining the main challenges of acquiring, implementing and using data center infrastructure management (DCIM). Part 1 presented a broad review of the different functions of DCIM in light of the operational challenges in the data center. Part 2 presented a possible method to expand DCIM from a data center management tool to one that manages IT, capacity, energy and cost. This final part addresses the applications of computational fluid dynamics (CFD) alongside DCIM.
Integration of DCIM With CFD
Part 1 of this series highlighted a number of instances where DCIM products tried to deliver add-on modules to their capacity-planning and analytics functions. Some of these modules can be dressed up as computational fluid dynamics (CFD), driven by primitive analytics, that will more often than not mislead the operator. CFD is a powerful and well-respected tool used by many industries to study airflow physics and heat transfer. In the data center industry, one can argue that CFD is often misused and misunderstood.
Part 1 also discussed the ability of DCIM platforms to integrate with other systems. CFD is trying to serve many functions in the data center, some of which are discussed in more detail here. Figure 1 shows a modified version of the DCIM workflow model (previously presented in Part 1), based on integration with a standalone CFD package.
Prediction of airflow and temperature using turbulence models and numerical simulations is a complex problem and the subject of continuing research. The turbulence models that represent the unresolved terms in the time-averaged momentum and continuity equations (Figure 1) have been the subject of research for the last 30 years and have led to the development of more-robust CFD solver and meshing technology. More often than not, the DCIM add-on modules that look like CFD are based on interpolation (trending) to predict the impact of a change: for example, installing a server in a cabinet and predicting the temperature effect at the click of a button. The interpolation model in many of these DCIM add-ons assumes that the conditions of the previous state remain fixed when predicting the future state, which is no better than a “fingers crossed” approach and often yields erroneous results.
CFD and the Tetris Effect
The application of CFD at the design stage to study different cooling scenarios, load densities, new technologies, airflow containment and open racks is well understood. The challenges are at the operations stage, however. IT equipment can be deployed on the basis of three criteria: space, power and cooling. The availability of space and power is easier to determine, which makes it possible to identify a cabinet with available U-slots and provisioned power capacity from a power-distribution unit (PDU). The third element, cooling, is difficult to determine.
Figure 2: The Tetris effect and its similarities to operational planning in data centers: (a) playing Tetris where the blocks and board are known upfront, and (b) playing Tetris with a sequence of random blocks.
IT managers will deploy servers and fill the space while remaining blind as to where the cooling capacity is. Likewise, changes in IT configurations in the space are generally invisible to the engineering and facilities teams. The result is the Tetris effect (see Figure 2), where any plan for coordinating the blocks is invalidated as soon as the game starts. Now imagine blindly carrying out a plan based on square blocks while ignoring the fact that the blocks are changing. Data center operators commonly follow a plan despite changes to the IT units (blocks). The end result is that the operator arrives at 100% of the data-hall cooling capacity while at only 70% of design power capacity, as shown in Figure 2b. As mentioned in the preceding parts of this series, the data center owner is concerned with using the maximum capacity in their facility. If they designed and built a 1,000 kW facility, they intend to use all 1,000 kW.
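The cost of the Tetris effect can be put into numbers with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that once cooling is exhausted no further IT load can be deployed, so the remaining design power is stranded.

```python
def stranded_capacity_kw(design_kw: float, power_fraction_at_cooling_limit: float) -> float:
    """Design IT power (kW) that can no longer be used once cooling runs out.

    power_fraction_at_cooling_limit is the share of design power capacity
    in use at the moment the data hall reaches 100% of its cooling capacity.
    """
    if not 0.0 <= power_fraction_at_cooling_limit <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return design_kw * (1.0 - power_fraction_at_cooling_limit)


# Figures from the article: a 1,000 kW facility that hits its
# cooling limit at 70% of design power capacity.
stranded = stranded_capacity_kw(1000.0, 0.70)
print(f"Stranded capacity: {stranded:.0f} kW")
```

For the facility described above, roughly 300 kW of the design capacity the owner paid for would be left unused.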
To effectively apply the space, power and cooling determinations, the intelligence and exchange of information between the facilities and IT disciplines must be addressed (see Figure 3). If this communication breaks down, the consequences are catastrophic, resulting in the Tetris effect in the deployment of IT capacity shown in Figure 2b.
The Complexities of Airflow in the Data Hall
A common misconception is that the air from floor grills will always cool the servers. It may do so when airflow-containment strategies are applied correctly, but an IT server draws air along the path of least resistance. For a typical floor grill (600 mm square), the airflow can vary from 300 to 600 l/s. In general, the higher the static pressure under the grill, the greater the airflow, and the lower the static pressure, the less the airflow (see Figure 4). In cooling terms, this range can correlate to 3–6 kW of cooling per grill. Generally, anything lower than 300 l/s may provide insufficient pressure at raised-floor level to cool the servers in a cabinet, and anything higher than 600 l/s may simply bypass the cabinets altogether (overshoot).
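The link between grill airflow and cooling capacity follows from the sensible-heat relation (cooling = air density × volume flow × specific heat × temperature rise). The sketch below is a rough check rather than a design calculation. The air properties and the temperature rise of about 8.3 K across the IT equipment are assumptions chosen here because they reproduce the 3–6 kW range quoted above; the article does not state them.

```python
AIR_DENSITY_KG_M3 = 1.2    # assumed: air at roughly 20 °C near sea level
AIR_CP_KJ_PER_KG_K = 1.005  # assumed: specific heat of dry air


def sensible_cooling_kw(airflow_l_s: float, delta_t_k: float) -> float:
    """Sensible cooling (kW) delivered by an airflow for a given air temperature rise."""
    airflow_m3_s = airflow_l_s / 1000.0  # litres per second -> cubic metres per second
    return AIR_DENSITY_KG_M3 * airflow_m3_s * AIR_CP_KJ_PER_KG_K * delta_t_k


ASSUMED_DELTA_T_K = 8.3  # assumption, not a figure from the article

for flow in (300.0, 600.0):
    kw = sensible_cooling_kw(flow, ASSUMED_DELTA_T_K)
    print(f"{flow:.0f} l/s -> {kw:.1f} kW")
```

Under these assumptions, 300 l/s and 600 l/s correspond to approximately 3 kW and 6 kW per grill. A different temperature rise would scale the result proportionally.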
Figure 5 demonstrates the principle. The cooling units on the left side of the data hall in Figure 5(a) induce a velocity jet in the floor void, which causes the low-static-pressure regions identified in Figure 5(b). These regions correlate well with the reduced net flow from the affected floor grills identified in Figure 5(c).
The concern here is in addressing the high-velocity jets from the scoops on the cooling units. Other common considerations include the management of floor tiles in relation to the IT load—for example, determining the appropriate number of floor grills to maintain reasonable static pressure from cold aisles in front of cabinets. Too many grills can reduce static pressure in the floor void and starve cabinets of air from the cold aisle, promoting the undesired effect of second-hand air cooling.
The second complexity that must be addressed is temperature. The temperature distribution in the data center depends on several factors, such as cooling-unit configuration, airflow, spatial configuration of servers and load density. Data center owners commonly install cold-aisle containment in their data halls. This approach is quite often sold to them on the promise of energy savings, or of increased reliability at higher operating temperatures through reduced mixing of cold and warm air. More often than not, such installations blindly eat into the availability of cooling capacity elsewhere in the data hall and may present only a superficial return on investment (ROI) to the data center owner. Figure 6 shows such an example, where the cold-aisle containment pods are overcooled to 18°C, whereas equipment in the legacy region experiences temperatures close to 24–26°C. In this case, the cooling units close to the legacy equipment are working less hard than those serving the cold-aisle containment pods, resulting in the temperature differential observed in the floor void.
Challenges of Using CFD
Even the most established CFD packages face challenges. The first and most obvious is computational power, which was a more prominent issue 10 years ago. With the development of turbulence models and smart grid meshing, however, time-averaged CFD can be conducted relatively quickly on a good modern laptop with a high-spec processor, plenty of RAM and a decent graphics card to view the output. To be clear, we are referring to hours, not the seconds that some vendors claim. If the output takes only seconds or minutes, the tool is likely using shortcuts, primitive meshing and analytics, which can lead to unreliable results.
Unsteady CFD, used to model temporal behavior in the data hall (that is, to analyze the rate of temperature increase following a site power interruption), can also be carried out on the same hardware, but it requires more computation time.
The next important challenge is calibration. In general, the output from a CFD model can be very detailed, but it is based on many assumptions. This situation may be acceptable during the design phase, but for an existing site the model, once constructed, needs calibration against the facility parameters to establish a starting baseline. This effort may include matching the airflow volume from the grills, followed by the temperature and flow parameters at the cooling units, against the model. Skipping the calibration process can result in unreliable information being conveyed to the data center operator and can have harmful consequences for their operations.
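The first calibration step, matching grill airflow against the model, amounts to a comparison of measured and modeled values. The following is a minimal sketch of that comparison, not a feature of any particular CFD package. The data structure, the grill identifiers, the sample readings and the 10% tolerance are all hypothetical and would be set by the site's own calibration criteria.

```python
from dataclasses import dataclass


@dataclass
class GrillReading:
    grill_id: str
    measured_l_s: float  # site measurement, e.g. from an airflow survey
    modeled_l_s: float   # value predicted by the CFD model


def out_of_tolerance(readings, tolerance=0.10):
    """Return (grill_id, relative_error) for grills where the model deviates
    from the measurement by more than the given fraction (assumed 10%)."""
    flagged = []
    for r in readings:
        if r.measured_l_s <= 0:
            # A relative error is undefined here; such grills need separate review.
            raise ValueError(f"non-positive measurement for {r.grill_id}")
        rel_err = (r.modeled_l_s - r.measured_l_s) / r.measured_l_s
        if abs(rel_err) > tolerance:
            flagged.append((r.grill_id, rel_err))
    return flagged


# Hypothetical sample data for illustration only.
readings = [
    GrillReading("G01", measured_l_s=450.0, modeled_l_s=460.0),
    GrillReading("G02", measured_l_s=320.0, modeled_l_s=410.0),
    GrillReading("G03", measured_l_s=580.0, modeled_l_s=500.0),
]

for grill_id, err in out_of_tolerance(readings):
    print(f"{grill_id}: model off by {err:+.0%}")
```

Grills flagged in this way indicate where the model inputs (for example, floor-void obstructions or cooling-unit flow rates) may need adjusting before the model is trusted as a baseline. The same comparison could then be repeated for temperatures and flows at the cooling units.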
The most common problem is a third party carrying out a six-month CFD analysis and advising on capacity planning, hot spots and so on. This approach results in repeated, unnecessary setup costs, which are usually passed on to the data center operator. Given the rapid changes in a modern data center, the expiry date on a calibrated model is no more than two to three weeks. Therefore, to get the full value of CFD, the analysis must remain in house and within the workflow of the facilities and IT teams (see Figure 7) and must continuously aid the decision-making process. The ongoing recalibration process is part of the engineering cause and effect that ties the IT and facilities teams into the data center politics chain (see Figure 2).
Using CFD as an Operational Tool
An important part of the engineering cause and effect is the set of changes outside the whitespace (i.e., equipment maintenance) that can have consequences for the IT equipment, and vice versa. One example is the maintenance of a power-distribution board feeding the cooling units in the data hall, and the resulting effect on the environment with N cooling units in operation during the maintenance mode. Use of CFD to assess the impact of such scenarios on the data hall, before implementation, is well documented. One of the most important operational processes is regular assessment of space, power and cooling, as well as determining where to install new servers.
In the example shown in Figure 8(a), the data center operator has a virtual model of the facility with the IT inventory. The IT manager identifies the requirement for new servers to be deployed and installs them in cabinets on the basis of available space and power (Figure 8b). The virtual facility model, however, predicts that such a configuration will create cooling problems, as Figure 8(c) highlights, where the circled region indicates average server air-intake temperatures greater than 27°C. The facilities personnel are then responsible for identifying the appropriate location for these servers using the virtual model and for coordinating with the IT manager. This effort may involve several iterations of the workflow model in Figure 7. The end result is an ever-evolving cooling map of the facility, as Figure 9 shows, with cooling availability across different cabinet locations.
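The placement decision described above can be thought of as a three-way filter on space, power and cooling. The sketch below illustrates the idea only. The cabinet names, field names and sample values are hypothetical, and the predicted intake temperature is assumed to come from a calibrated CFD run with the new servers installed in that cabinet. The 27°C limit is the threshold used in the Figure 8(c) example.

```python
from dataclasses import dataclass

INLET_LIMIT_C = 27.0  # average server air-intake limit from the Figure 8(c) example


@dataclass
class Cabinet:
    name: str
    free_u: int               # available U-slots
    free_power_kw: float      # provisioned power still available
    predicted_inlet_c: float  # model prediction with the new servers installed


def viable_cabinets(cabinets, needed_u, needed_kw):
    """Names of cabinets that satisfy space, power AND cooling."""
    return [
        c.name
        for c in cabinets
        if c.free_u >= needed_u
        and c.free_power_kw >= needed_kw
        and c.predicted_inlet_c <= INLET_LIMIT_C
    ]


# Hypothetical inventory for illustration only.
cabinets = [
    Cabinet("A1", free_u=10, free_power_kw=4.0, predicted_inlet_c=29.5),
    Cabinet("B4", free_u=6, free_power_kw=3.0, predicted_inlet_c=24.0),
    Cabinet("C2", free_u=2, free_power_kw=5.0, predicted_inlet_c=22.0),
]

print(viable_cabinets(cabinets, needed_u=4, needed_kw=2.0))
```

In this made-up inventory, cabinet A1 passes the space and power checks that an IT manager would normally apply, but it fails on cooling, which is exactly the case the virtual facility model is meant to catch.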
Needless to say, in some situations, such as the uniform application of airflow containment or data halls with ducted hot-aisle return to cooling units (no mixing of the airstreams), common sense can go a long way. For scenarios with staggered equipment in open air, or with mixed airflow containment alongside legacy equipment, CFD as an operational tool is not a nice-to-have but a necessity for fully utilizing the data center's design IT load.
This article has highlighted some of the data center challenges that can be addressed through the correct use of CFD. This technology supports the bidirectional transfer of intelligence between the IT teams and the operations teams.
The discussion reviewed the importance of integrating with an established package, rather than an add-on module to the existing DCIM product. The solver and meshing technology lie at the core of the CFD engine. Although these elements are both hidden from the user, they are developed to be robust and adaptable to different data center environments. Even the most established CFD packages face challenges in solving the complex airflow physics problems in the data hall.
CFD is best kept in house with the data center operator as an operational planning tool, as opposed to outsourcing simulations. This approach allows the operator to regularly predict the physical response of the data center to different scenarios in a safe offline environment before deployment. More importantly, it helps avoid the classic Tetris effect in IT-deployment planning. Most importantly, continuous calibration of the model is essential to unlocking the full benefits of CFD for the IT and facilities teams.
About the Author
Ehsaan Farsimadan is Director of Engineering at i3 Solutions Group. He previously worked as a technical consultant for the Uptime Institute, being responsible for data center design and facility Tier certifications. Before Uptime Institute, he served at Romonet as Head of Modelling and Customer Engineering, being responsible for the company’s client engineering and consulting engagements worldwide. He is a mechanical engineer with diverse industry experience. Ehsaan also previously worked as an M&E design consultant at Cundall, where he was responsible for developing data center concepts to scheme design in addition to leading the modeling team. He has developed specialities in data center predictive modeling and IT services, and he has good knowledge of electrical systems. Ehsaan is a chartered engineer accredited by the IMechE. In 2008, he obtained his doctorate (PhD) in mechanical engineering and also holds a bachelor’s degree in mechanical engineering with aeronautics. He has made significant contributions to the field of turbulence research and the data center industry through publication of a number of journal papers.