The data center is a hub for stakeholders with backgrounds spanning a broad range of disciplines: facility-management teams, server and storage admins, database specialists, and network engineers. These groups, however, are often siloed in separate organizational units, which makes orchestrated, synchronized collaboration a rarity in many contemporary data centers.
Although most data center infrastructure management (DCIM) professionals declare that bridging the gap between organizational silos and supporting agile methodologies across multiple departments are their primary goals, DCIM involves more than a simple merging of facility and server management. Considering factors such as the rise of innovations in virtualization and software-defined infrastructure (SDx), it is necessary to be mindful of multiple other stakeholders to ensure truly professional, efficient data center operation.
Most teams lack a big-picture, long-term focus: their responsibilities are ongoing, and operational concepts evolve over time. Attempts to improve things through internal restructuring are often dead on arrival amid constantly changing conditions, new technologies, acquisitions and expansions. The result is a major discrepancy between actual position-specific duties, organizational structures and the demands of flexible, agile data center operation.
Silos Create Virtualization Problems
What do we really mean when we talk about a “data center”? Depending on the perspective, the definition can vary greatly. For some, it’s just the room that houses a company’s servers. For others, it’s the entire building—that is, the “white space” and associated IT rooms, the “gray space,” and various other facilities.
Just as language barriers create communication difficulties, the differing perspectives of individual stakeholders can hinder operational progress. This situation can reduce the efficiency of their processes and methods. In most cases, each team—or silo—implements its own proprietary set of methods and tools for managing the tasks and operating the systems they are responsible for. Although the server team may find it helpful to concentrate solely on its own assets, this insular view isn’t really productive for the organization as a whole. Planning, management and monitoring in a single system are the key to modern data center management.
System virtualization and new programmable, software-defined networking have both delivered enormous gains in service-delivery speed. Yet room remains for improvement in these methods and in the technologies' planning and introduction phases. Part of the deficiency stems from the discussion around "virtual" and "software-defined" technologies itself: it often overlooks the fact that these assets aren't entirely virtual and abstract, but run on physical hardware resources.
Every private cloud system runs on a physical server, and every data package travels through numerous cables and patch panels. But it’s impossible to detect passive components using autodiscovery tools, and these components cannot be controlled through software. This physical layer also has its own life cycles, its own particular rhythm and—occasionally and unfortunately—its own faults. Without adequate documentation and planning of the real-world relationships, identifying the physical cause of a fault is extremely difficult and time consuming. It’s also impossible to conduct a full contingency analysis for a planned change—or a reliability analysis that addresses each individual layer in the stack.
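Documenting those real-world relationships is what makes physical fault isolation tractable. A minimal sketch, in Python, of tracing a virtual asset down through its documented physical dependencies; the asset names and the flat dictionary data model are illustrative assumptions, not a real DCIM schema:

```python
# Each asset points to the physical component it depends on. Passive parts
# (patch-panel ports, cables) appear only because someone documented them --
# autodiscovery tools cannot see them.
DEPENDS_ON = {
    "vm-web-01":      "srv-rack4-u12",   # VM runs on this physical server
    "srv-rack4-u12":  "patch-A7-port3",  # server uplink via this panel port
    "patch-A7-port3": "cable-0042",      # passive cable, invisible to software
    "cable-0042":     "switch-core-1",
}

def physical_path(asset: str) -> list[str]:
    """Follow documented dependencies down to the last known component."""
    path = [asset]
    while path[-1] in DEPENDS_ON:
        path.append(DEPENDS_ON[path[-1]])
    return path

print(" -> ".join(physical_path("vm-web-01")))
```

With such a chain recorded, a fault at the switch port can be traced back to every virtual asset riding on it, and a planned change to the cable can be checked for impact before anyone touches it.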
Adopting a Coordinated Strategy for Digital Transformation
The situation described above intensifies when factoring in anticipated increases in virtualization, cloud, private-cloud and hybrid-cloud services. The ongoing digital transformation will place even greater pressure on data centers to quicken service delivery. Clients are no longer willing to wait as long as six months for a bare-metal system; indeed, most cloud clients now consider a wait of six minutes for a virtual system to be too long.
Of course, these expectations will continue to rise, and they probably won't stop until service delivery is truly instantaneous. The digital transformation not only creates new application scenarios and business models, but also new methods and models for greater agility, flexibility and speed. Twenty-year-old DCIM processes are not always practicable today. Technology evolves, and it's time to challenge the viability of established operational processes and management methods.
To measure tangible improvements in the use of IT, we must look beyond optimization of individual subsystems and instead focus on the entire system. A classic example of subsystem optimization is deployment of UPS systems with better PUE (power usage effectiveness)—or, rather, pPUE (partial power usage effectiveness). The problem is that improving an isolated subsystem doesn’t guarantee an overall system upgrade. In many cases, the investment costs end up exceeding the benefits. Improving the PUE of an asset can often be difficult to justify if it’s accompanied by more-expensive electric bills. PUE optimization runs the risk of becoming an end in itself, overshadowing all other efforts to reduce costs or dooming them to fail through lack of appropriate means.
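The distinction between site-wide and subsystem metrics can be made concrete. A short sketch of the two ratios; the example figures are illustrative assumptions:

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """PUE = total facility power / IT equipment power (ideal value: 1.0)."""
    return total_facility_kw / it_load_kw

def ppue(zone_overhead_kw: float, zone_it_kw: float) -> float:
    """Partial PUE for one subsystem/zone: (zone overhead + zone IT) / zone IT."""
    return (zone_overhead_kw + zone_it_kw) / zone_it_kw

# A UPS upgrade can yield an excellent pPUE for its own zone...
print(ppue(zone_overhead_kw=40.0, zone_it_kw=400.0))    # 1.1
# ...while the site-wide PUE barely moves if cooling dominates the overhead:
print(pue(total_facility_kw=1500.0, it_load_kw=1000.0))  # 1.5
```

The gap between the two numbers is exactly the trap the text describes: optimizing the subsystem ratio says nothing about whether the overall system, or the electric bill, actually improved.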
Choose a Holistic Approach Over Partial Optimization
To understand partial optimization in the IT environment, we need only consider a transition from one server technology (or one server supplier) to the next.
Just as pizza-box servers were replaced by blade servers, now many data centers are switching to converged systems. Regular tendering for new server hardware often results in a change of supplier. Typically, this change is viewed in isolation, with comparisons limited to the previous server generation or the previous supplier. The assessment covers the procurement and operating costs, a brief examination of the new hardware in terms of processing power and perhaps space requirements or power efficiency, plus maybe the impact on licensing costs.
Many operators, however, neglect to conduct a complete and detailed examination of the changes in current density (i.e., power requirements per height unit in the rack or area unit in the room), heat emissions (heat-flow volume and outlet temperature) or cooling requirements (required air-flow rate, optimum temperature range, air velocity and pressure differences). Yet precisely herein lies a major risk during both the rollout and operational phases.
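Those neglected quantities are straightforward to estimate up front. A rough sketch using the standard sensible-heat relation for air; the rack figures are illustrative assumptions, and real planning would also account for air velocity, pressure differences and containment:

```python
AIR_DENSITY = 1.2  # kg/m^3, approximate for air at sea level
AIR_CP = 1005.0    # J/(kg*K), specific heat capacity of air

def required_airflow_m3h(heat_load_w: float, delta_t_k: float) -> float:
    """Air-flow rate (m^3/h) needed to remove heat_load_w at a delta_t_k rise."""
    m3_per_s = heat_load_w / (AIR_DENSITY * AIR_CP * delta_t_k)
    return m3_per_s * 3600.0

def rack_power_density(rack_power_w: float, rack_units: int) -> float:
    """Power per height unit (W per U) -- the 'current density' in the text."""
    return rack_power_w / rack_units

# Replacing 5 kW of blades with a 12 kW converged system in the same 42U rack
# more than doubles both the power density and the required air-flow rate:
old_flow = required_airflow_m3h(5_000, delta_t_k=12)
new_flow = required_airflow_m3h(12_000, delta_t_k=12)
```

Running these numbers during the tendering phase, rather than after the rollout, is precisely the detailed examination the paragraph above calls for.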
Another problem with even the most successful partial optimization is that neighboring teams or users are often completely unaware of the change and may inadvertently work against it. Even the most efficient server will still consume too much power if left running in idle mode. Similarly, a basic test system need not run in a Tier III environment if a simpler alternative is available.
The aim should be to optimize the data center’s overall performance. To do so, it’s necessary to consider every part of the organization. That includes all IT, each of the data center components and every stack layer. The senior management team must also approve the objective and add it to the agenda. Rather than focusing on a stylish new office building, for instance, it often makes better long-term sense to restructure the internals or optimize the operational processes.
There is also a need to rethink redundancy in light of the latest NFV (network functions virtualization) technologies and methods, given that system reliability will increasingly be assured at the application level. Among other things, this means placing fewer demands on the data center's physical infrastructure. Large-scale service providers, who rely on self-healing and auto-scaling at the application layer, have long understood the potential savings: A and B power feeds, for example, need not both be provided through a UPS when this layer requires a lower level of reliability. Without planning and monitoring, however, a management team has no objective picture of the facility's current state on which to base such decisions.
New Demands on DCIM Solutions
Fortunately for companies that are unable to develop and deploy their own management solutions, a wide range of powerful DCIM products are available. These standardized tools enable planning and operation, process automation, fault prevention and better resource utilization. In doing so, they merge the three major areas of DCIM: planning, management and monitoring.
It’s easy to see why these three components are a natural fit and why no sophisticated data center management system can do without them. Monitoring separated from management and planning is purely reactive: the operator waits until a threshold value is breached or an alarm sounds, and only then starts to think about what to do. It’s clearly better to adopt a preventative mindset and put measures in place that keep this type of situation from occurring in the first place—not merely through lower threshold values, but through proper planning that ensures better operating conditions and an active stance. Conversely, planning without monitoring means there is no way of comparing planned and actual states, verifying implemented changes, or detecting deviations before they become a problem.
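The planned-versus-actual comparison can be sketched in a few lines. A minimal example, assuming hypothetical metric names and a simple percentage tolerance; a real DCIM tool would draw both sides from its database:

```python
# Planned operating values from the planning module vs. live measurements
# from monitoring. Names and figures are illustrative assumptions.
PLANNED  = {"rack-A7 inlet temp C": 24.0, "rack-A7 power kW": 8.0}
MEASURED = {"rack-A7 inlet temp C": 25.1, "rack-A7 power kW": 9.6}

def deviations(planned: dict, measured: dict, tolerance: float = 0.10) -> dict:
    """Return metrics whose measured value drifts more than `tolerance`
    (as a fraction of the planned value) from the plan."""
    out = {}
    for metric, plan in planned.items():
        actual = measured.get(metric)
        if actual is not None and abs(actual - plan) / plan > tolerance:
            out[metric] = (plan, actual)
    return out

print(deviations(PLANNED, MEASURED))
# Power has drifted 20% above plan and is flagged; the temperature
# deviation (under 5%) stays within tolerance.
```

The point of the comparison is timing: the drift surfaces while it is still a planning question, not yet an alarm.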
A proven DCIM solution is the only way to ensure better resource use and faster processes while maintaining or improving system reliability. Using an appropriate data model across all layers—from the smallest fuse to the server, VM, application, and business service—ensures that data center stakeholders have access to planning and analysis/optimization options. The addition of workflows helps to improve efficiency by simplifying recurring tasks and preventing errors while also meeting compliance and security requirements. The result is a reduction in workload, freeing up staff to focus on non-automatable tasks and planning.
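A data model spanning every layer makes questions like "which business services fail if this fuse blows?" answerable by a simple traversal. A sketch under assumed names and edges, not a vendor schema:

```python
# Each parent supplies or hosts its children, from the electrical layer up
# to the business-service layer. All identifiers are illustrative.
FEEDS = {
    "fuse-12":  ["pdu-3"],
    "pdu-3":    ["srv-01", "srv-02"],
    "srv-01":   ["vm-crm"],
    "srv-02":   ["vm-shop"],
    "vm-crm":   ["app-crm"],
    "vm-shop":  ["app-shop"],
    "app-crm":  ["service-sales"],
    "app-shop": ["service-sales"],
}

def impacted(asset: str) -> set[str]:
    """All downstream assets, up to business services, affected if `asset` fails."""
    hit, stack = set(), [asset]
    while stack:
        for child in FEEDS.get(stack.pop(), []):
            if child not in hit:
                hit.add(child)
                stack.append(child)
    return hit

print(sorted(impacted("fuse-12")))
```

Here the failure of one fuse propagates through the PDU, two servers, two VMs and two applications to a single business service—exactly the kind of cross-layer contingency analysis that is impossible when each silo documents only its own assets.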
Involve the User, Enjoy Success
Bridging the gap is only possible by involving all stakeholders. When all participants use the same database and can access the same up-to-date information, planning reliably and assessing situations correctly become realistic goals. Share all planning steps with colleagues before making a decision: email and word of mouth are no way to synchronize processes, and a spreadsheet is not an orderly, revision-proof means of documenting critical infrastructure.
Even though rolling out a DCIM solution to a large number of stakeholders isn’t always easy, it’s an obstacle worth overcoming. Cost-based objections are no longer valid today, as numerous studies indicate a very reasonable ROI of less than one year in some cases.
In short, the question is no longer whether data center operators can afford a DCIM solution, but whether they can still afford to do without one, since inefficient operation will end up costing them more in the long term.
About the Author
Oliver Lindner serves as senior consultant for server management at FNT Software. He has over 20 years of industry experience working as a system analyst. At FNT, he heads the data center infrastructure management business line.