Information technology operations analytics (ITOA) has emerged as a valuable practice that enables tech managers to increase efficiency. It uses data-science principles to perform pattern discovery, correlation, anomaly detection and root-cause analysis against data collected from underlying infrastructure and applications.
More simply, ITOA provides a way to retrieve, analyze and report data to improve the outcomes of IT operations. No single product or vendor can be a silver-bullet solution; rather, ITOA is an overarching activity that enables IT teams to become the insight engines for their organizations, potentially leading to higher budgets and greater influence over time.
ITOA relies on machine learning to understand behaviors, discover patterns, provide supervised and unsupervised learning for event correlation and anomaly detection, and perform root-cause analysis. This approach creates a way to forecast probable end states that could negatively affect IT-service performance.
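The unsupervised anomaly detection mentioned above can be illustrated with a minimal sketch. This is not any vendor's actual algorithm — production ITOA tools use far richer models — just a simple z-score detector that flags readings far from the mean of a metric series; the function name and threshold are illustrative.

```python
from statistics import mean, stdev

def detect_anomalies(readings, threshold=3.0):
    """Flag indices of readings more than `threshold` standard
    deviations from the mean -- a toy stand-in for the unsupervised
    anomaly detection an ITOA platform would perform."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(readings) if abs(r - mu) / sigma > threshold]

# A CPU-utilization series with one obvious spike at index 5.
cpu = [22, 25, 21, 24, 23, 98, 22, 24, 23, 25, 22, 24]
print(detect_anomalies(cpu))  # -> [5]
```

Real deployments would layer event correlation and forecasting on top of signals like this one, but the core idea — learn normal behavior, then flag deviations — is the same.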
Trace3, for example, has extended the concepts of ITOA to apply across all of IT operations. It does so through a “system of action” that breaks IT operations into six main areas, or component layers, as the diagram below portrays.
ITOA system of action.
The following is a description of each area:
Monitoring ecosystem. This foundational layer of the stack underlies the entire ITOA framework. A monitoring ecosystem collects telemetry about what’s happening in real time across IT systems. The ongoing activity involves recording and transmitting the readings from data center and network gear, making this bedrock layer noisy, with lots of data percolating up all the time.
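A minimal sketch of this monitoring layer might look like the following. The class and field names are hypothetical; the point is that the layer timestamps readings from devices and keeps a bounded, recent window, since this bedrock layer is noisy by design.

```python
import time
from collections import deque

class TelemetryBuffer:
    """Illustrative monitoring-ecosystem sketch: record timestamped
    readings from data center and network gear, keeping only a
    bounded window of the most recent data."""

    def __init__(self, max_readings=10_000):
        self.readings = deque(maxlen=max_readings)

    def record(self, device, metric, value, ts=None):
        self.readings.append({
            "device": device,
            "metric": metric,
            "value": value,
            "ts": ts if ts is not None else time.time(),
        })

    def latest(self, device, metric):
        # Scan from newest to oldest for the most recent matching reading.
        for r in reversed(self.readings):
            if r["device"] == device and r["metric"] == metric:
                return r["value"]
        return None

buf = TelemetryBuffer()
buf.record("router-1", "cpu_pct", 35)
buf.record("router-1", "cpu_pct", 41)
print(buf.latest("router-1", "cpu_pct"))  # -> 41, the most recent reading
```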
System of automation. This orchestration piece allows teams to enact changes in the various networked systems. DevOps automation tools such as Puppet or Chef can be deployed to recognize specific events. When a certain event occurs, the system of automation can trigger the proper response to correct any problem behaviors through self-healing.
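The event-triggered self-healing described above can be sketched as an event-to-remediation registry. This is not how Puppet or Chef are actually configured — both have their own DSLs — merely an illustration of the pattern: recognize a named event, then run the registered corrective action.

```python
# Hypothetical event-to-remediation registry illustrating a system of
# automation: a recognized event triggers a registered self-healing action.

remediations = {}

def on_event(name):
    """Register a remediation handler for a named event type."""
    def register(fn):
        remediations[name] = fn
        return fn
    return register

@on_event("service_down")
def restart_service(event):
    # In practice this would invoke an orchestration tool, not return a string.
    return f"restarting {event['service']} on {event['host']}"

def handle(event):
    handler = remediations.get(event["type"])
    return handler(event) if handler else "no remediation registered"

print(handle({"type": "service_down", "service": "nginx", "host": "web-3"}))
# -> restarting nginx on web-3
```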
System of engagement. This event-management layer can be thought of as a “manager of managers” because it consumes events from across the organization. The system of engagement serves as a window into events such as hardware failures and software crashes, which are then reported to the higher layers.
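A rough sketch of this “manager of managers” role, under the assumption that raw events arrive as simple records from many monitoring sources: duplicates are rolled into one consolidated event with a count, ready to report upward.

```python
from collections import Counter

def consolidate(events):
    """Illustrative system-of-engagement sketch: consume raw events
    from many sources and collapse duplicates into consolidated
    events with occurrence counts."""
    counts = Counter((e["source"], e["message"]) for e in events)
    return [
        {"source": src, "message": msg, "count": n}
        for (src, msg), n in counts.items()
    ]

raw = [
    {"source": "dc-1", "message": "disk failure on node-7"},
    {"source": "dc-1", "message": "disk failure on node-7"},
    {"source": "app-mon", "message": "crash in billing service"},
]
for e in consolidate(raw):
    print(e)
```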
Data management. This piece sits alongside the system of engagement to collect and store data for longer periods. By assembling these larger data sets, managers can conduct forensic analyses that tease out meaningful patterns and identify performance anomalies. Think of data management as an institutional memory that tracks the contextual history of IT operations. So, if a router were to go down at 8 AM today, it would be flagged by the system of engagement. If the same router goes down regularly at 8 AM every morning, however, the data management piece would signal which related components or issues could be causing the pattern of failure.
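The router example above — one outage is just an event, but a daily 8 AM outage is a pattern — can be sketched as a forensic query over stored outage history. Field names and thresholds here are illustrative, not from any particular product.

```python
from collections import Counter
from datetime import datetime

def recurring_failures(outages, min_occurrences=3):
    """Illustrative data-management sketch: scan stored outage history
    for (device, hour-of-day) pairs that fail repeatedly, such as a
    router that goes down at 8 AM every morning."""
    counts = Counter(
        (o["device"], datetime.fromisoformat(o["when"]).hour) for o in outages
    )
    return {pair: n for pair, n in counts.items() if n >= min_occurrences}

history = [
    {"device": "router-1", "when": "2023-05-01T08:02:00"},
    {"device": "router-1", "when": "2023-05-02T08:05:00"},
    {"device": "router-1", "when": "2023-05-03T07:58:00"},  # hour 7: near miss
    {"device": "router-1", "when": "2023-05-04T08:01:00"},
    {"device": "switch-2", "when": "2023-05-02T14:30:00"},
]
print(recurring_failures(history))  # -> {('router-1', 8): 3}
```

With the recurring pair identified, an analyst can then pull the related components recorded around those timestamps to investigate the cause.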
System of record. This part is the ticketing platform that generates records of customer-service levels for the operations team. The system of record can create a ticket for any outage, and it can also show network configurations and software settings to enrich the system of engagement. Another facet provides feedback to end customers about what’s happening to their services, as well as updates about the status of outages.
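A minimal sketch of the ticketing behavior just described, assuming a hypothetical in-memory `config_db` standing in for the configuration data that enriches the system of engagement:

```python
import itertools

_ticket_ids = itertools.count(1)  # simple sequential ticket numbers

def open_ticket(outage, config_db):
    """Illustrative system-of-record sketch: open a ticket for an
    outage and enrich it with the device's recorded configuration."""
    return {
        "id": next(_ticket_ids),
        "device": outage["device"],
        "summary": outage["summary"],
        "status": "open",
        "config": config_db.get(outage["device"], {}),
    }

config_db = {"router-1": {"firmware": "15.2", "location": "rack A3"}}
ticket = open_ticket({"device": "router-1", "summary": "link down"}, config_db)
print(ticket["id"], ticket["status"], ticket["config"]["location"])
# -> 1 open rack A3
```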
Visualization. This top layer provides the ability to extract all the underlying components to report on vital metrics such as outages, consumption models, total costs and monthly comparisons. The visualization layer is usually a dashboard that’s accessible through a web browser, and the data is presented on the basis of each end user’s role. For instance, a utility-company technician may receive updates about systemwide performance and outages, whereas the consumer would see metrics about home energy use.
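The role-based presentation in the utility-company example can be sketched as a simple role-to-metric mapping; the role names and metrics below are taken from that example, while the function and dictionary names are invented for illustration.

```python
# Hypothetical role-to-metric mapping: the dashboard presents data
# on the basis of each end user's role.
ROLE_METRICS = {
    "technician": {"systemwide_performance", "outages"},
    "consumer": {"home_energy_use"},
}

def dashboard_view(role, all_metrics):
    """Return only the metrics this role is entitled to see."""
    allowed = ROLE_METRICS.get(role, set())
    return {k: v for k, v in all_metrics.items() if k in allowed}

metrics = {
    "systemwide_performance": "99.2% uptime",
    "outages": 3,
    "home_energy_use": "412 kWh",
}
print(dashboard_view("consumer", metrics))  # -> {'home_energy_use': '412 kWh'}
```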
Taken together, IT operations analytics requires a choreographed interplay of people, processes and technologies. In many cases, the weakest link is the people. To succeed, they must have the proper technical skills, of course. But equally important, people need to recognize that comfort and change are mutually exclusive. One cannot continue to do the same things in the same ways after they no longer work. IT managers who refuse to change will become obsolete and be surpassed by their nimbler rivals.
All ITOA processes must be clearly defined in terms of IT service-management levels, types of measurements and overall accountability. In many ways, the technology becomes the simplest part. Although thousands of hardware and software products are available, it’s easy to select the proper tools on the basis of a customer’s IT environment and business objectives. Technology sits at the center of all ITOA initiatives, surrounded by the various people and processes.
The benefits of ITOA become clearer as more quantifiable metrics surround IT operations.
Applying ITOA to the customer environment greatly improves the quality of life for IT managers by allowing them to get ahead of looming problems and even predict when such problems are likely to strike—without waiting for end users to report that something has gone wrong again. In this way, IT teams can speed up the mean time to discover outages and the mean time to restore downed systems, thus improving customer satisfaction and increasing their organization’s competitive edge.
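The two metrics named above — mean time to discover and mean time to restore — are straightforward to compute from incident records. A sketch, with illustrative field names (`occurred`, `detected`, `restored`):

```python
from datetime import datetime

def mean_minutes(incidents, start_key, end_key):
    """Mean elapsed minutes between two timestamps across incidents.
    With occurred->detected this gives mean time to discover;
    with detected->restored, mean time to restore."""
    spans = [
        (datetime.fromisoformat(i[end_key])
         - datetime.fromisoformat(i[start_key])).total_seconds() / 60
        for i in incidents
    ]
    return sum(spans) / len(spans)

incidents = [
    {"occurred": "2023-06-01T09:00", "detected": "2023-06-01T09:10",
     "restored": "2023-06-01T09:40"},
    {"occurred": "2023-06-02T14:00", "detected": "2023-06-02T14:04",
     "restored": "2023-06-02T15:04"},
]
print("MTTD:", mean_minutes(incidents, "occurred", "detected"), "min")   # -> 7.0
print("MTTR:", mean_minutes(incidents, "detected", "restored"), "min")   # -> 45.0
```

Tracking these numbers over time is one concrete way to quantify the improvement ITOA delivers.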
About the Author
David Ishmael is the Director of IT Operations Analytics at Trace3, a technology consulting firm. With nearly 20 years of consulting experience in operations and analytics across a myriad of enterprise applications, he has worked in research, system architecture, network engineering and design, performance management, system administration, software development, process development, and project management of highly complex environments for diverse customers in both the private and government sectors.