“Once upon a time, there was a little network...”
No, there wasn’t. Your network was never little, or if it was, it didn’t stay that way for long. Corporate environments are meant to grow, and despite careful planning, they grow in unexpected ways. We may keep a tight rein on our firewall rules only to lose sight of the explosive growth of wireless devices on the factory floor. We may carefully manage our remote-location designs but end up blindsided when the central office runs out of switch ports.
There may have been a simpler time, but it has passed. Recently, Honeycomb.io co-founder Charity Majors observed that the time has come when “you can’t hold the entire system in your head or reason about it; you will live or die by the thoroughness of your instrumentation and observability tooling.”
We live in a world where “the network” (by which I mean the entire confederation of devices, virtual machines, spontaneously instantiated containerized microservices, and the routing and switching that act as plumbing to get data from one point to points unknown) is too big, fluid and dynamic to be diagrammed, listed on the command line, or, as Charity says, “reasoned about.”
What’s a poor tech pro to do in such a world? In short, you need to up your monitoring game. You need tools and solutions that are just as dynamic as the environment they watch over. You need software that can employ all the pillars of observability, including monitoring, visualization, tracing and logging.
To paraphrase the dictionary: to be scalable, something must be easily expanded or upgraded on demand. The important word here is “easily.” The other important phrase is “on demand.”
The hard truth we must accept as we take our first tentative steps toward a scalable monitoring solution is that nothing, whether baking, dogsitting, party planning or building a network, is inherently scalable. Scalability requires planning. It requires designers to think not just about the immediate goals of the project in front of them, but also about potential downstream needs. While envisioning those needs, they must build in ways to expand and change, which is a tall order.
In IT terms, scalability means allowing for additional ports on a switch, additional slots in a rack, additional space in the data center floor plan, and additional power drops and cooling capacity. It might mean provisioning a chassis with two more slots than you currently need, to support more VMs in the future.
But designing for scalability doesn’t just mean more capacity. It means adding flexibility up front, along with documentation and standard procedures that take advantage of that flexibility. Scalability as a concept applies to monitoring solutions in two distinct ways:
- The monitoring solution should be able to easily increase in capacity.
- As the monitored environment grows, the monitoring solution should accommodate the larger quantity of devices, applications, services and so on (with or without scaling itself).
It’s important to keep these two aspects separate. Although the second sometimes relies on the first, you may occasionally need to scale the monitoring solution even when the enterprise environment has experienced no measurable change, for example after shortening the polling cycle or starting to collect additional metrics from existing devices.
Like most IT work, scaling your monitoring solution isn’t a project that exists in a vacuum. Will you be designing scalability into a new solution or scaling out an existing one? Is the scalability issue you’re facing related to the number of devices, the quantity of data or the breadth of technologies you’re trying to cover? Depending on what you need to accomplish, you must consider things like the following:
- Device count
- Device type
- “Element” count
- Data sources
- Data volume
- Polling cycle
- Device location
- Firewall rules
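Several of these items (device count, element count and polling cycle) combine into one number that drives polling load. The Python sketch below shows that arithmetic. The per-engine capacity of 200 polls per second and the 25 percent headroom are hypothetical placeholders, not figures from any particular product; substitute values measured in your own environment.

```python
import math


def polls_per_second(device_count: int, elements_per_device: float,
                     polling_cycle_s: float) -> float:
    """Average number of element polls the solution must complete each second."""
    return device_count * elements_per_device / polling_cycle_s


def engines_needed(rate: float, engine_capacity: float,
                   headroom: float = 0.25) -> int:
    """Polling engines required, leaving `headroom` (a fraction) spare on each.

    `engine_capacity` is a HYPOTHETICAL polls-per-second figure; measure or
    look up the real value for your own monitoring platform.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = engine_capacity * (1 - headroom)
    return math.ceil(rate / usable)


if __name__ == "__main__":
    # Hypothetical environment: 2,000 devices averaging 25 elements each.
    for cycle in (120, 60):
        rate = polls_per_second(2000, 25, cycle)
        print(f"{cycle}s cycle: {rate:.0f} polls/s, "
              f"{engines_needed(rate, engine_capacity=200)} engine(s)")
```

With those assumed numbers, a 120-second cycle works out to about 417 polls per second and three engines; halving the cycle to 60 seconds doubles the load to about 833 polls per second and six engines, even though not a single device was added.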
Specific to the monitoring solution, you’ll want to have a solid grasp of things such as the following:
- Database design
- Hardware requirements
- Upgrade cycles
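Database design and hardware requirements depend heavily on how long you keep data and at what granularity. As a rough illustration, this Python sketch estimates raw storage for a hypothetical two-tier retention scheme: detailed samples kept for a few days, then hourly rollups kept longer. The bytes-per-row figure, the retention periods and the two-tier design itself are assumptions, and the model ignores indexes, compression and other database overhead.

```python
SECONDS_PER_DAY = 86_400


def storage_bytes(elements: int, metrics_per_element: int,
                  polling_cycle_s: float, bytes_per_row: int,
                  detail_days: int, hourly_days: int) -> float:
    """Rough raw-storage estimate for a detail-plus-hourly retention scheme.

    All inputs are HYPOTHETICAL; real databases add index, compression and
    replication effects that this model leaves out.
    """
    series = elements * metrics_per_element  # distinct time series stored
    # Detail tier: one row per series per poll, kept for `detail_days`.
    detail_rows = series * (SECONDS_PER_DAY / polling_cycle_s) * detail_days
    # Rollup tier: one row per series per hour, kept for `hourly_days`.
    hourly_rows = series * 24 * hourly_days
    return (detail_rows + hourly_rows) * bytes_per_row


if __name__ == "__main__":
    total = storage_bytes(elements=50_000, metrics_per_element=4,
                          polling_cycle_s=120, bytes_per_row=100,
                          detail_days=7, hourly_days=90)
    print(f"{total / 1e9:.0f} GB")
```

Under these assumptions the estimate is 144 GB. Shortening the polling cycle to 60 seconds raises it to about 245 GB, because the detail tier grows with every additional poll.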
Meanwhile, I see clients fall into a common set of pitfalls when building their monitoring solutions, such as the following:
- Promoting a proof-of-concept system to full production
- Failing to keep the solution patched and current
- Allowing interrupt-based messaging volume (i.e., SNMP traps and syslog) to overwhelm the solution
- Being “penny wise but processor foolish” by failing to add new hardware as needed
(If it seems like I’m just rattling off a laundry list of items here, it’s partially true. The details of all of these elements would easily consume five or six posts. Instead, I invite you to read the whitepaper version.)
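One common way to keep interrupt-driven messages from swamping a monitoring system is to rate-limit them per source before they reach expensive processing such as parsing, correlation and database writes. The Python sketch below uses a token bucket for that purpose. The `InterruptThrottle` class, its rate and burst values, and the idea of placing it in front of trap and syslog handling are illustrative assumptions, not a feature of any specific product.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Allow `rate` messages per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst  # start full so a quiet source can burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst ceiling.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class InterruptThrottle:
    """HYPOTHETICAL per-source throttle in front of trap/syslog processing."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.buckets = {}
        self.dropped = defaultdict(int)

    def accept(self, source: str) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = self.buckets[source] = TokenBucket(
                self.rate, self.burst, self.clock)
        if bucket.allow():
            return True
        # Count the excess per source instead of losing it silently.
        self.dropped[source] += 1
        return False
```

Because each source gets its own bucket, one flapping switch exhausts only its own allowance while messages from other devices keep flowing, and the per-source `dropped` counts point at the noisy devices that deserve a closer look.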
Technologies such as the cloud, hybrid IT, the Internet of Things, and containerized services are upon us, with obvious benefits to the business but also consequences for technology professionals as we try to wrestle each one into a form that works for our organization. At the same time, near-future tech such as machine learning and artificial intelligence holds the promise of huge gains in the C-suite but an equal promise of disruption and headache in the data center. And all those trends point to an explosion of new infrastructure, device types, software and platforms.
My point is that as monitoring experts, we seldom have control over how, when or in what direction the environment expands. Our best course of action is to ensure that our solutions are scaled for today’s reality and are both capable of and prepared to scale when the visions of tomorrow materialize. Hopefully within these pages you’ve found some resources, techniques and philosophies that will serve you in that effort.
About the Author
Leon Adato, SolarWinds Head Geek and longtime IT systems management and monitoring expert, discusses all things data center in this ongoing series.