High availability is the ultimate goal of every cloud-service provider (CSP). It is a testament to a provider’s reliability and standard of service, measuring its ability to remain continuously operational. But in their quest to attain the highest number of “nines” (99.9 percent, 99.99 percent and so on) in their service-level agreements, players in the cloud industry may have been losing sight of what availability really means.
The cloud storage industry has traditionally approached availability from the inside out, measuring downtime by how long the provider’s own infrastructure stays running. But just as many of us struggle to keep even the lights on at home without interruption, factors outside a provider’s control may pose even greater problems.
Therefore, the way the cloud industry measures availability needs to evolve. It’s no longer sufficient to measure availability at the cloud-service provider’s infrastructure; it must be measured from the customer’s perspective instead. This approach takes into account every issue that could affect availability, in what ServiceNow defines as “real availability.”
Real availability captures the real user experience from end to end. It includes everything within our control (such as our infrastructure and network) as well as factors outside our control (such as the customer or third-party providers).
Even if they achieve 100 percent uptime in their own networks, cloud-service providers must recognize that the services a customer uses are only as good as the weakest point in the chain. For this reason, it’s insufficient to consider only the factors in our own infrastructure that might lead to downtime or further disruption.
Hardware failures on the customer side or an outage at the Internet service provider are factors that can also reduce the overall availability of the services. And although you should do all you can to avoid being the weak link, from a customer’s viewpoint, a disruption is a disruption regardless of the source.
The industry must fine-tune the way it measures (and subsequently addresses) availability. It’s no longer enough to measure availability up to the edge of the provider’s infrastructure; CSPs must expand their definition of availability to examine the root causes of incidents, whether they lie on the customer’s end, a third-party provider’s end or the CSP’s own end.
For instance, even if the root cause of downtime is a fault on the customer’s wireless network that has nothing to do with the provider, real availability still counts the outage time associated with the incident. Although the root cause doesn’t originate with the CSP, the framework reflects the availability the customer actually experiences. ServiceNow includes these incidents in its real-availability calculation because they are part of the service availability the customer sees.
A Change in Perspective
By seeing the situation as the customer sees it and providing a real-world view of availability, CSPs can take the steps needed to change the way the industry looks at and addresses availability issues.
Earlier this year, Singapore’s Infocomm Development Authority (IDA) released a set of cloud outage incident response (COIR) guidelines outlining resilience measures to prepare for cloud storage downtime. The responses in these guidelines are tailored to the severity of the impact on the customer, which can range from an air-traffic-control failure to downtime on a general information website.
In a similar fashion, to determine real availability for customers, providers must examine every incident behind each unique customer disruption. In our experience, incidents affecting a customer fall into one of the following four categories:
- Incidents caused by the service provider’s infrastructure: This category includes any and all disruptions that occur in the service provider’s infrastructure.
- Incidents caused by software on the service provider’s platform: This category covers additional software programs from the service provider that experience a glitch or outage.
- Incidents caused by a third-party provider: This category comprises third-party solutions such as the customer’s Internet service provider, data center management or hosting-services provider.
- Incidents caused by the customer: This category covers internal customer network issues, authentication issues and customer use of the service provider’s offerings in ways that affect service.
By thinking of availability in the above categories, we can measure real incidents rather than synthetic transactions. Because this calculation reflects availability as the customer experiences it, it is more effective and accurate than relying on error logs generated by a monitoring system.
More importantly, it sends a signal to customers that you are working in partnership with them to identify and resolve their issues, even self-inflicted ones.
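To make the difference concrete, here is a minimal Python sketch of an incident-based availability calculation. It is an illustration, not ServiceNow’s actual method: the `Incident` record, the `Cause` enum mirroring the four categories above, and the `outage_minutes` and `availability` helpers are hypothetical. It assumes each incident is logged as the start and end (in minutes) of the disruption the customer experienced, and it merges overlapping incidents so that the same outage minute is not counted twice.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """Hypothetical labels for the four incident categories."""
    PROVIDER_INFRASTRUCTURE = "provider infrastructure"
    PROVIDER_SOFTWARE = "provider software"
    THIRD_PARTY = "third-party provider"
    CUSTOMER = "customer"


@dataclass(frozen=True)
class Incident:
    # Assumption: times are minutes from the start of the measurement period
    # and describe the disruption as the customer experienced it.
    start: float
    end: float
    cause: Cause


def outage_minutes(incidents):
    """Total disrupted minutes, merging overlaps so no minute is counted twice."""
    total = 0.0
    current_start = current_end = None
    for inc in sorted(incidents, key=lambda i: i.start):
        if current_end is None or inc.start > current_end:
            # Close out the previous merged interval, if any, and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = inc.start, inc.end
        else:
            # Overlapping or touching incident: extend the current interval.
            current_end = max(current_end, inc.end)
    if current_end is not None:
        total += current_end - current_start
    return total


def availability(incidents, period_minutes, causes=None):
    """Percent of the period without disruption, counting only `causes` (None = all)."""
    counted = [i for i in incidents if causes is None or i.cause in causes]
    return 100.0 * (1 - outage_minutes(counted) / period_minutes)


# Made-up example month (43,200 minutes) with one incident per source.
MONTH = 30 * 24 * 60
incidents = [
    Incident(100, 130, Cause.PROVIDER_INFRASTRUCTURE),  # 30-minute provider outage
    Incident(5000, 5090, Cause.CUSTOMER),               # customer Wi-Fi fault, 90 minutes
    Incident(9000, 9060, Cause.THIRD_PARTY),            # ISP outage, 60 minutes
]

provider_view = availability(incidents, MONTH, {Cause.PROVIDER_INFRASTRUCTURE})
real = availability(incidents, MONTH)
print(f"provider view: {provider_view:.3f}%  real: {real:.3f}%")
# prints "provider view: 99.931%  real: 99.583%"
```

In this made-up month, a provider counting only its own infrastructure would report about 99.93 percent availability, while the customer experienced about 99.58 percent once the Wi-Fi fault and the ISP outage are included. Passing a subset of `causes` also lets the same incident log report the breakdown by category.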
Moving From Supplier to Partner Is Good Business
It’s crucial to help customers manage the situation when disruptions occur, including identifying the source of the disruption. By considering every point in the process that could lead to downtime, you work with your customers in an active and collaborative way. This partnership and transparency are critical to your customer relationships and can dramatically improve the customer experience.
Evolving from a supplier into a partner dedicated to a customer’s success also makes good business sense. Although many cloud providers focus on acquiring new customers, industry studies show that acquiring a customer can cost seven times more than retaining one. Broadening your focus to take into account the real availability and health of a cloud service can pay off in the long term.
About the Author
James “Jimmy” Fitzgerald has been the Vice President of Asia Pacific & Japan at ServiceNow since 2013. He joined ServiceNow in 2011 as Global Vice President of Professional Services, and previously spent five years in Asia working for major players in the industry.