Are You Positioned for Success When Your New Data Center Begins Operation?

February 16, 2012

Data Center Operations

Once approved, a new data center project carries high expectations and becomes one of the
most visible objectives you will be measured on during your career. A major interruption to
processing within the first year of operation will cause much more than embarrassment for the
organization, particularly with today’s level of public visibility. Yet, this initial break-in period
typically presents the highest risk. Have you adequately prepared your Facilities team for the
successful start-up of your new data center facility? A carefully implemented strategy will
significantly enhance your chance of success.
Over the last fifteen years, the industry has quantified the importance of devoting considerable
attention to a Facilities Operations Plan when a new data center design project is initiated. Even
though electrical and cooling system designs are more robust than ever, typically allowing for
system maintenance without a data center shutdown and often eliminating most single points of
failure, these facility designs do not eliminate the risk of interruption.
Human error is consistently found to be the primary cause of facilities-related computer
downtime events. As Robert McFarlane, principal and data center design expert at Shen Milsom
and Wilke, stated in the June 2011 edition of SearchDataCenter.com: “Reputable studies have
concluded that as much as 75% of downtime is the result of some sort of human error.”
An example is found in this article excerpt from the August 10, 2007 issue of
ComputerWeekly.com: “The Cisco website went down for three hours on Wednesday… Cisco
experienced some facility issues that impacted Cisco.com… Cisco confirmed that the outage
was due to human error at one of its data centers.”
Creating and implementing a thorough plan for Facilities Operations is the best means to apply
this knowledge and achieve optimal systems performance. The strategy recommended below
draws on experience gained through more than 150 data center facilities consulting
engagements and in managing the start-up of a critical data center facility.

Staff Design

The first step is to design the structure of the department that will operate the electrical,
cooling, and fire detection/suppression systems in the data center. For example, a minimum of
two trained individuals per shift on a continuous shift schedule is required to effectively
intervene when a generator or cooling system fails to start automatically.
This number also ensures personnel on each shift will be able to perform productive work,
instead of simply serving as shift “watchmen.” Counter-intuitively, employing two individuals per
shift will actually show a cost savings compared to a single-person-per-shift plan, due to the
elimination of some contracted work. A sample organization chart for this level of support
follows.
Annual objectives for this group should include collective goals for consistent facilities systems
uptime and successful/safe completion of all assigned PM tasks and customer requests.
Individual objectives should vary by position, allowing ownership of specific systems, tasks, and
projects to be clear. The Facilities team should report to the internal organization that will ensure
it receives the best ongoing communications, funding, and support in order to meet the data
center’s specific objectives.
If your company can afford an interruption of computer operations due to facilities system
failures roughly once a year, the industry average, you should be able to operate with a
substantially smaller Facilities staff. Once an appropriate staff plan for your business
requirement is developed and annual objectives are defined, a schedule for hiring is needed.
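
The headcount implied by a continuous shift schedule can be estimated with simple arithmetic. The sketch below is illustrative only: the 40-hour work week and the 15% allowance for vacation, illness, and training are assumptions rather than figures from this article, and the function name is hypothetical.

```python
import math

HOURS_PER_WEEK = 24 * 7  # continuous (24x7) coverage


def staff_needed(people_per_shift, weekly_hours_per_person=40.0,
                 absence_allowance=0.15):
    """Estimate the shift technicians needed for continuous coverage.

    weekly_hours_per_person and absence_allowance are assumed values;
    substitute your own labor rules and leave history.
    """
    if people_per_shift < 1:
        raise ValueError("people_per_shift must be at least 1")
    if not 0 <= absence_allowance < 1:
        raise ValueError("absence_allowance must be in [0, 1)")
    # Person-hours that must be covered every week.
    coverage_hours = HOURS_PER_WEEK * people_per_shift
    # Hours one person actually delivers after time away from shift.
    productive_hours = weekly_hours_per_person * (1 - absence_allowance)
    return math.ceil(coverage_hours / productive_hours)


if __name__ == "__main__":
    for n in (1, 2):
        print(f"{n} per shift: about {staff_needed(n)} shift technicians")
```

Under these assumptions, one person per shift requires about five technicians and two per shift about ten. Any savings from reduced contracted work would have to be weighed against that difference using your own figures.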

Hiring Schedule

Most owners fail to hire the Facilities team early enough in the design/construction process.
Involvement in construction monitoring will pay off over the years you operate the new facility.
For example, root cause analysis during a system failure will be greatly enhanced with detailed
knowledge of system construction and configuration. This personal observation during
construction can often make the difference between outage avoidance and the need to explain
“what went wrong.” Several of your Facilities team members should be hired in time to
participate in factory witness testing of equipment, as well as in systems commissioning once the
equipment is installed at your new facility.

Procedures and Training

With a fully developed staff plan and a hiring schedule in place, your next objective should be
defining site-specific procedures and training programs with detailed schedules. Just as airline
pilots must be trained and certified on specific models of airplanes, data center facilities systems
training must be customized to the unique systems configuration at each site. Many owners
assume the general training provided by equipment manufacturers will enable the Facilities
team to confidently operate new systems without error. Although critical facilities-experienced
individuals should be sought as you make hiring decisions, they will need the benefit of
procedures and training specific to the system configurations they will be responsible for.
Depending on the complexity of your facilities systems configuration, the number of emergency
response and system transfer procedures required will range from 50 to 200. If you have not
contracted with your design engineer(s) or commissioning agent to develop these, an
operations consultant should be engaged to develop the needed procedures, or to assist one of
your staff in developing them, before the building is completed. Regardless of who is responsible
for development, each procedure should be “tested” individually with your Facilities staff members
for clarity before it is finalized. The process to create and test procedures is normally a
three- to six-month endeavor.
A clear, concise, and consistent procedure format should be employed, one that includes a
means to “check off” each step as it is completed. One team member must read the
step aloud, and the second individual must repeat back what they are about to do before
proceeding (as a pilot and co-pilot would). Failure to follow this simple process is the cause of
an alarming number of downtime events.
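
The read-aloud, repeat-back, check-off sequence can be sketched as a small data structure. This is a minimal illustration, not a format prescribed by this article: the class names, the example step text, and the exact-wording match used for the repeat-back are all assumptions (a real repeat-back is judged by the reader, not by string comparison).

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    text: str
    read_aloud: bool = False     # the reader has spoken the step
    repeated_back: bool = False  # the operator has repeated it back
    done: bool = False           # the step has been checked off


@dataclass
class Procedure:
    title: str
    steps: list = field(default_factory=list)
    current: int = 0  # index of the next step to perform

    @property
    def complete(self):
        return self.current == len(self.steps)

    def _step(self):
        if self.complete:
            raise RuntimeError("procedure is already complete")
        return self.steps[self.current]

    def read(self):
        """Reader speaks the current step aloud."""
        step = self._step()
        step.read_aloud = True
        return step.text

    def repeat_back(self, spoken):
        """Operator repeats the step; simplified to a wording match."""
        step = self._step()
        if not step.read_aloud:
            raise RuntimeError("step has not been read aloud yet")
        step.repeated_back = spoken.strip().lower() == step.text.strip().lower()
        return step.repeated_back

    def check_off(self):
        """Mark the step complete only after read-aloud and repeat-back."""
        step = self._step()
        if not (step.read_aloud and step.repeated_back):
            raise RuntimeError("read aloud and repeat back before acting")
        step.done = True
        self.current += 1
```

The point of the design is that `check_off` refuses to advance until both the read-aloud and the repeat-back have occurred, which mirrors the discipline described above.
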
Training programs for your Facilities team should include:
· Initial testing of new procedures as they are developed
· Systems overviews provided by design engineers
· Manufacturer-provided training on the individual systems/components installed
· Participation in integrated systems commissioning
· Scheduled practice time
· Repetitive site-specific training (monthly sessions)
Your Facilities group should manage and dictate the schedule for each of these training
programs. Testing of procedures should be spread evenly among your Facilities staff, who will
each be working with the person responsible for creating and refining the documents. Systems
overview training should be presented to your team as a group, prior to the manufacturers’
training, so that “how all the pieces fit together” is understood first. Manufacturers’ training
should be spread over multiple weeks and follow a consistent format, so your group may better
retain the information (no more than three sessions in one week). As your group participates in
commissioning, team members will see and hear how systems perform individually and
collectively. If done professionally, video recordings of systems overview and manufacturers’
training sessions will be a valuable resource for re-training and future new employee orientation.
By far the most valuable training opportunity is a one- to two-month window of practice time
scheduled between the completion of commissioning and the “go-live” date for the new computer
operation. Organizations are increasingly recognizing the value of this process in imparting real
confidence in the normal operation, system transfer, and emergency response methods for all
facilities systems while there is no risk of downtime. No comparable opportunity will exist
once operation has begun.
Ongoing monthly site-specific training sessions should be developed by your Facilities team
with a focus on which emergencies, typically system failures, they wish to be most prepared for.
Emergency response procedures and systems overview documents will be the basis for these
sessions. Individual “system experts,” such as manufacturers’ installation technicians, design
engineers, service providers, and some of your own staff members should serve as initial
trainers.
Rotating through one system group each month gives your team an annual chance to simulate
the desired response to an emergency involving each system. Similarly, system transfer
procedures will be the basis for another form of training, as planned preventive maintenance
activities are conducted throughout the year. A sample schedule of monthly topics follows.
Month        Training focus
January      Generators, controls, and fuel systems
February     Life safety and evacuation
March        Cooling towers, chillers
April        UPS, UPS switchgear, batteries
May          CRAHs, AHUs, VFDs, VAVs
June         Fire detection and suppression
July         Water treatment, heat exchangers
August       BMS system
September    Pumps, valves, motors
October      MCCs, grounding, TVSS
November     Power distribution, PDUs, RPPs
December     EPMS system
Confidence instilled through practice will pay off. Without repetitive training, your staff will be
trying to “land the plane” from only the memory of the initial training provided when construction
was completed.

Operations Budget

Following completion of a plan for procedures and training, the next step will be to define an
appropriate operations budget. Expenses for individual data center facilities vary widely due to:
· Cost of electric utility service
· Levels of systems redundancy and concurrent maintainability
· Load variances due to the variety of computer hardware models installed
· Salary and benefits for the Facilities Operations team
· Union vs. non-union staff
· Single shift vs. continuous shift coverage
· Number of employees per shift
· Outsourced vs. in-house staff
As a result, your first year’s operations budget will necessarily be less than accurate. If your
company has operated a data center previously, that expense history should be utilized when
developing the new facility’s initial budget. Historical expenses within the same region will be the
most useful. Benchmarking with other data center operations in the local area can also be
helpful.
Utility expenses are consistently the largest operating expense in a data center facility. Your
electrical design engineer and your electric utility representative can help you estimate your
initial year’s utility costs. The second highest cost category will be labor and related expenses
for Facilities Operations staff, whether in-house or contracted. Collectively, contracts for
preventive maintenance on major systems will be the next highest expense, depending on the
amount of in-house work you are staffed to perform.
Remaining categories will include:
· Training programs
· Procedures refinement
· Spare parts, materials, tools, uniforms, supplies (first confirm what the construction
budget will cover)
· Janitorial (include sub-floor cleaning)
· Grounds maintenance
A more accurate and predictable operations budget will become evident after one year of
operation. Because each data center facility is unique, operating cost comparisons with other
facilities cannot be truly meaningful. Only year-to-year comparisons for the same facility, after
factoring in load variations, will permit a comparison of operating efficiency. The most important
measure of a successful operations budget remains adequate funding to perform
consistent preventive maintenance while employing a staff sized and trained to respond
effectively when a system fails.
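
One simple way to factor in load variations is to divide each year's operating cost by that year's average critical load before comparing. The sketch below assumes that normalization; the article does not prescribe a method, the function names are hypothetical, and the dollar and kW figures in the usage note are invented for illustration. Because some costs are fixed regardless of load, treat the result as a rough indicator only.

```python
def cost_per_kw(annual_operating_cost, average_load_kw):
    """Operating cost per kW of average critical load (assumed metric)."""
    if average_load_kw <= 0:
        raise ValueError("average_load_kw must be positive")
    return annual_operating_cost / average_load_kw


def year_over_year_change(previous, current):
    """Percent change in load-normalized cost between two years.

    previous and current are (annual_operating_cost, average_load_kw)
    tuples for the same facility.
    """
    before = cost_per_kw(*previous)
    after = cost_per_kw(*current)
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Hypothetical figures: cost rose, but the load rose faster.
    change = year_over_year_change((2_000_000, 500), (2_310_000, 600))
    print(f"Load-normalized cost change: {change:.2f}%")
```

With these invented figures, raw spending rises 15.5% while cost per kW falls 3.75%, which is the kind of distinction a raw year-to-year comparison would hide.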

Additional Processes

In addition to a well implemented Facilities Operations strategy, every new data center
operation will benefit from carefully articulated control processes that apply to all who work in
the facility, including:
· Specific data center work rules, which are thoroughly reviewed and signed by each
individual before entering the facility the first time (and again annually)
· Limited access – minimize those permitted unescorted access
· Shipping/receiving only on a planned basis – unscheduled deliveries turned away
· Computer hardware installation planned in advance by a team of IT and Facilities
individuals
· Power and network cabling connections performed only by designated and trained
individuals
· Team development
o Clearly defined IT and Facilities relationship, mutual expectations (or SLAs),
shared incentives
o Defined Data Center Facilities and Office Facilities relationship (if other buildings
on campus)

Summary

A data center facility will operate successfully if the Facilities team is provided management
support, appropriate resources, site-specific systems experience, and the resulting confidence
level. Effectively deploying this Facilities Operations strategy and the additional recommended
control processes will substantially improve the likelihood of reliable operation in the first year.
Continued application will greatly increase the likelihood of uninterrupted facilities systems
operation over the life of the facility. With these practices in place, you may realistically achieve
multiple years of continuous facilities systems availability, a multimillion-dollar savings when
compared to the average operating experience in the critical data center industry.

Author’s profile:

David Boston is President of David Boston Consulting, a firm specializing in Critical Facilities
Operations consulting. While he was Facilities Manager for GTE Data Services from 1985 to 1995, his
department achieved multiple years of uninterrupted uptime for a 100,000 square foot data
center facilities operation with a continuous operation objective.
Boston was responsible for helping member companies achieve continuous availability as
Program Director for the Site Uptime Network, a critical facilities consortium, from 1996 through
2009. During the same period, he assisted data center management teams with Facilities
operations objectives and led a team of electrical and mechanical engineering consultants in
providing facilities system assessments as a consultant representing the Uptime Institute.
Since 2006, David Boston Consulting has independently assisted clients with the creation of
critical facilities staffing strategies, the development and testing of comprehensive training and
procedures programs, the facilitation of detailed working agreements and improved interaction
between Information Technology and Facilities Operations groups, and the functional layout of
new data center facilities work spaces. Boston may be contacted at 727-595-3039 or
dfboston@DavidBostonConsulting.com.
