If you’ve spent any amount of time working in infrastructure operations or engineering, you’re likely to be familiar with the term noisy neighbors. The generally accepted definition of a noisy neighbor is a cotenant or application that monopolizes bandwidth, disk I/O, CPU and other networked resources and that can reduce other users’ application performance. Bandwidth, for example, carries data throughout a network, so when one application or instance uses too much, other applications suffer from latency or poor response times.
The noisy-neighbor issue is ubiquitous, striking corporate data centers along with software-as-a-service and infrastructure-as-a-service solutions. Any host that may serve over a hundred virtual machines (VMs) creates a condition ripe for a “rogue” VM to command more than its share of CPU, memory and/or bandwidth, much as residents of an apartment building are more likely to disturb one another than residents of houses separated by gardens or yards. This misallocation of resources deprives other VMs, as well as mission-critical apps, of the infrastructure services they need to deliver the high performance that users expect.
Creative strategies have arisen to help infrastructure managers mitigate the application-performance issues created by noisy neighbors. These solutions include, but aren’t limited to, dynamically provisioning VMs and applying load-balancing techniques to democratize access to bandwidth, massively overprovisioning infrastructure resources, deploying a bare-metal cloud to create a single-tenant environment, and keeping flash storage arrays in the best possible shape. But what if you could identify the troublemakers before the rest of the neighborhood suffered?
The way to identify disruptive cotenants and prevent application-performance issues can be likened to establishing a neighborhood watch in your local cul-de-sac. The warning signs that your neighbors are planning a party are hard to ignore—if you know what you’re looking for. It’s easier to ask your neighbor to turn down the music while the night is still young than it is to pick up the beer cans from your lawn and repair the broken window the next day. Implementing an infrastructure-wide application-centric monitoring tool that analyzes workloads in real time can help find and flag those warning signs without disrupting the neighborhood’s day-to-day activities.
Context Is Critical
Preventing noisy neighbors in the first place requires real-time monitoring that provides context about applications’ performance and dependencies, not just measurements. For example, seeing that your neighbor has two or three cars in the driveway may not be cause for alarm, but what if you live in a duplex with a shared driveway and you know your neighbor lives alone? Those two or three extra cars, then, would seem excessive. The same idea applies to your infrastructure. When you not only monitor the performance of your applications but also capture context about how and where they’re being used, and how that utilization compares with historical data, you can trust that the warning signs are accurate.
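To make the idea of comparing utilization with historical data concrete, here is a minimal Python sketch. It is an illustration only, not how any particular monitoring product works: the function name, the three-standard-deviation threshold and the sample numbers are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Return True when `current` is more than `threshold` standard
    deviations above the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        # Not enough history to form a baseline, so raise no alarm.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase at all stands out.
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical disk I/O readings (MB/s) for one VM, taken at the same
# hour on seven previous days. These values are made up.
history = [40, 42, 38, 41, 39, 43, 40]

print(is_anomalous(history, 44))  # a little above normal
print(is_anomalous(history, 95))  # far above the historical baseline
```

The same reading can be harmless for one workload and alarming for another, which is why the check is made against each workload's own history instead of a fixed limit.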
Let’s take a real-world example from a large communications-service provider. Thanks to the widespread customer acceptance of a new mobile-phone launch, response times for the provider’s critical cell-phone activation process degraded rapidly. Consider that when someone orders a new phone, either online or in a store, a multitude of critical applications and processes go into motion: account setup, service provisioning, billing, credit check, inventory management, warranty notification and so on. This end-to-end process relies on different interdependent applications running in environments that may be shared with less critical applications.
The communications-service provider’s highly regarded application-performance monitoring (APM) tool showed evidence of a slowdown, but that evidence was insufficient to find the root cause of the performance problems the provider was encountering. Likewise, its network-performance monitoring (NPM) tool showed hints of the slowdown but wasn’t capturing the metrics that would have uncovered the problem.
Because this service provider considers IT to be a competitive edge, it was under pressure to maintain superior customer-facing application performance, as activations are frequently the first opportunity for customer engagement.
By implementing an application-centric infrastructure-performance monitoring (IPM) solution, the IT staff discovered that a previously scheduled backup application was starting during the periods of high demand for the applications that supported activations. As is typical, the staff hadn’t instrumented every application with the APM tool, owing to the expense, and the backup application was one of the uncovered apps. Because the backup ran over the storage network, the NPM tool failed to detect it. Once the backup was rescheduled, performance immediately improved and further issues were successfully avoided.
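The kind of conflict found here, a scheduled job colliding with a period of peak demand, can be illustrated with a short Python sketch. This is not the provider's method or any vendor's API. The job names, peak windows and helper functions are hypothetical, and the check assumes that no window crosses midnight.

```python
from datetime import time


def overlaps(start_a, end_a, start_b, end_b):
    """True when two same-day windows [start, end) intersect.
    Assumes neither window crosses midnight."""
    return start_a < end_b and start_b < end_a


def conflicting_jobs(jobs, peaks):
    """Return the sorted names of jobs whose window overlaps any peak window."""
    return sorted(
        name
        for name, (start, end) in jobs.items()
        if any(overlaps(start, end, p_start, p_end) for p_start, p_end in peaks)
    )


# Hypothetical periods of high demand for the critical applications.
peak_windows = [(time(9, 0), time(12, 0)), (time(17, 0), time(20, 0))]

# Hypothetical scheduled background jobs and their run windows.
scheduled_jobs = {
    "nightly-backup": (time(18, 30), time(21, 0)),
    "log-rotation": (time(2, 0), time(2, 30)),
}

print(conflicting_jobs(scheduled_jobs, peak_windows))
```

With these made-up schedules, only the backup job is reported, which makes it the candidate for rescheduling.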
To bring it back to our analogy, the cars are starting to clog the street and the music is getting louder. It’s time to intervene, but you don’t necessarily want to get the police involved. Because you’ve kept close tabs on the activities leading up to the party, you can take preventative action before the problem becomes too big to handle. Some preventative measures can include offering additional driveway parking for cars parked in the street (reallocating bandwidth to more-important services), asking your neighbors politely to turn the music down (scaling back non-mission-critical applications) and helping the neighbors break up any loud arguments (shutting down power-hungry services temporarily to ensure the performance levels of other applications).
Don’t Get Lazy
You’ve successfully managed to get through the night without a 911 call or barrage of car alarms, but the work isn’t over yet. Just as you wouldn’t disband the neighborhood watch after a burglary is resolved, don’t stop monitoring your infrastructure after you address a performance issue, even if it was only a potential performance issue. As IT environments continue to increase in complexity and neighborhoods (data centers or multitenant cloud environments) become more crowded, the noisy-neighbor issue is likely to worsen.
By remaining vigilant and continually monitoring for resource-hungry applications, you can detect and eliminate noisy-neighbor problems early and ensure that your organization’s mission-critical applications keep running without disruption. After all, a peaceful neighborhood makes for happier, and hopefully quieter, neighbors.
About the Author
A 25-year-plus product-marketing veteran, Jim Bahn specializes in developing and implementing go-to-market strategies for SMB and enterprise markets through both direct and indirect channels. He’s held roles in product, channel and solutions marketing, as well as sales, business development and engineering management. Jim specializes in marketing for storage, virtualization, private/hybrid cloud, high-performance computing and data-protection solutions. He serves as the Senior Director of Product Marketing at Virtual Instruments.