Any parent of a two-year-old appreciates the power of speaking a common language. Nothing is more frustrating to my two-year-old son than his inability to communicate what he wants. Learning to say things such as “milk” and “applesauce” has transformed the breakfast experience.
I see this difficulty in my world as well. Millions of dollars are spent in the software world translating from one “language” to another. In fact, a whole industry has popped up around shared application programming interfaces (APIs) to standardize how systems communicate. Despite this trend, more emphasis seems to be on the communication than on the data. Data-analytics products in particular seem happy to shove all types of data into one mold, since the output is the same.
It’s important for data analytics to take the road less traveled. Data must be treated “natively”—in other words, don’t try to shove a square data peg in a round systems hole. Just like speaking a language natively changes the experience of travel, speaking the native language of data transforms the analytics experience. In particular, data analytics must be at least bilingual, speaking the native language of both raw logs and time-series metrics. And here is why it matters—according to my toddler.
Answer My Question—Quickly
Toddlers are well acquainted with the frustration of not being understood. Everything takes too long, and they need it now. And you know what? We have all been in that situation. Not being understood is torture. This problem is elemental in the world of analytics. If you store data in a nonoptimized format, every query must be translated from one language to another, and it takes forever.
Let’s take logs—essentially, records of something that happened—compared with performance metrics, which are measurements of something over time. A log-analytics system optimizes for searches for text and patterns by indexing on those things. Doing so enables the equivalent of speed reading. That approach is completely incompatible with performance metrics, where you want to optimize for grabbing lots of data points at once and processing them as a unit, since you are typically looking at the behavior of that metric over time. You can try to use a log system for storing millions of data points, but eventually the laws of physics intervene. You can only make it so fast. Instead of one or two simple operations (“I want this metric for the last three days”—“sure, here you go”), the log system must do a few orders of magnitude more operations (“I want this metric for the last three days”—“what is a metric?”; “search for this text”; “let me grab all of those indices”; “let me extract that data”; and so on). It takes too long. And like my toddler, you will just stop asking the question. What’s the use in that?
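The difference in work is easy to sketch. Here is a toy comparison (illustrative names and layouts only, not any product’s internals): a metrics store that keeps timestamps as a sorted, packed column, versus a log store that keeps the same data as free-form text lines.

```python
from bisect import bisect_left

# Metrics store: timestamps in a sorted, packed column, values alongside.
timestamps = list(range(86_400))   # one point per second for a day
values = [0.5] * 86_400

def query_metric(start, end):
    # Two binary searches, then one contiguous slice: a handful of operations.
    i, j = bisect_left(timestamps, start), bisect_left(timestamps, end)
    return list(zip(timestamps[i:j], values[i:j]))

# Log store: the same data as free-form text. Every query must scan,
# match and parse each line before it can even see the numbers.
logs = [f"ts={t} metric=cpu.user value=0.5" for t in range(86_400)]

def query_metric_from_logs(name, start, end):
    out = []
    for line in logs:              # full scan of every record
        fields = dict(kv.split("=") for kv in line.split())
        if fields["metric"] == name and start <= int(fields["ts"]) < end:
            out.append((int(fields["ts"]), float(fields["value"])))
    return out
```

Both functions return the same answer, but the log path does per-line parsing work proportional to the entire data set, while the metrics path does work proportional only to the answer.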
Cleaning Up Is Hard
I’m always amazed at how my two-year-old can turn a nice stack of puzzles or a bucket of toys into a room-size disaster zone. It’s the same components, but vastly different results. Storage optimization is essential in the world of operational data. A natural assumption lies underneath a true log-analytics system: we assume on some level that each log is a special snowflake. There is, of course, a lot of repetition, but the key is to be flexible and optimize for finding important terms very quickly. To be exact, you need to allow the log to be unstructured—not conforming to a strict, predetermined pattern—while also making sure that you can find the logs quickly. So, high-performance, scalable log-analytics systems must be enormously flexible and resilient and must allow for all of the complexities of log patterns. It’s worth mentioning that a trend these days is to make the problem simpler by forcing structure on the logs, forgoing the complexities that a flexible approach involves. The problem with that approach is that by presupposing structure you inevitably force too many decisions early in the process of designing your data structure. And it’s too late to change things when you realize your mistake at 2 a.m. during an outage.
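The core trick behind “unstructured, yet findable” is the inverted index: map every term to the records that contain it, without demanding any schema up front. A minimal sketch of the idea (real systems add tokenization rules, compression and much more):

```python
from collections import defaultdict

# Unstructured log lines: no shared schema, no predetermined fields.
logs = [
    "ERROR payment service timeout after 30s",
    "user 42 logged in from 10.0.0.7",
    "ERROR disk full on /var/log",
]

# Inverted index: term -> set of line numbers containing that term.
index = defaultdict(set)
for line_no, line in enumerate(logs):
    for token in line.lower().split():   # naive tokenizer for the sketch
        index[token].add(line_no)

def search(term):
    # One hash lookup finds every matching line, however messy the logs are.
    return [logs[i] for i in sorted(index[term.lower()])]
```

Because nothing about a line’s structure is assumed at write time, any decision about what matters can be deferred until query time—exactly what you want at 2 a.m.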
Metrics, on the other hand, are repetitive by design. Every record of a measurement is the same; in other words, metrics are structured. So once you know you are collecting something (say, system CPU performance on some server), you need not capture that reference every time. You can optimize heavily for storing and retrieving long lists of numbers. Storing time-series metrics as logs is, of course, extremely wasteful. You can incur anywhere from 3x to 10x more storage costs, as well as much lower analytics performance. To achieve the same performance that most metrics systems can reach, you are looking at 10–20x higher storage costs. For this reason, no pure log-analytics products are used for performance metrics at scale; the immense costs just don’t justify consolidating on a single tool.
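A rough back-of-the-envelope comparison makes the overhead concrete. The snippet below (illustrative formats only; real metrics stores compress far better than this with techniques such as delta encoding) stores the same thousand measurements as repeated log text and as a packed column of (timestamp, value) pairs:

```python
import struct

# The same 1,000 measurements, two ways.
points = [(1_700_000_000 + i, 0.5 + i * 0.001) for i in range(1000)]

# As log lines: the metric name, host and field labels repeat every time.
as_logs = "\n".join(
    f"2023-11-14T22:13:{i % 60:02d}Z host=web-01 metric=system.cpu.user value={v:.4f}"
    for i, (_, v) in enumerate(points)
).encode()

# As a packed column: 8-byte timestamp + 8-byte float = 16 bytes per point,
# with the metric name stored once, elsewhere.
as_column = b"".join(struct.pack("<Qd", t, v) for t, v in points)

ratio = len(as_logs) / len(as_column)   # text overhead vs. packed storage
```

Even this naive packing lands squarely in the 3x–10x range mentioned above, before any real compression is applied.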
I Want to Play With My Cars Anywhere
One of the funniest things about my son is how he plays with his toy cars. He has racetracks, roads and other appropriate surfaces. He rarely uses them. He prefers to race his cars on tables, up walls and on daddy’s leg. The flexibility of the wheels is essential. He has other “cars” that don’t roll—he doesn’t really play with them. The same is true with data analytics. Once you have high performance with cost-effective storage, use cases just present themselves. Now you can perform complex analytics without fear of slowing the system to a crawl. You can compare performance over months and years rather than minutes and hours, because storage is so much cheaper. Innovative use cases will always fill up the new space created by platform enhancements, just as restricted platforms will always restrict the use cases.
So, it’s 2 a.m. Your application is down. Your DevOps/ops/engineering team is trying to solve the problem. They can either be frustrated that they can’t get their questions answered, or they can breeze through their operational data to get the answers they need. It depends on the tools you do or don’t choose. So choose wisely.
About the Author
Ben Newton has spent the last decade and a half in the world of IT. He is a principal product manager for Sumo Logic and is part of a team focused on a groundbreaking approach to machine-data/big-data analytics. Before Sumo Logic, Ben worked at LoudCloud and BladeLogic. He is interested in current discussions about DevOps, big data, machine-data analytics and so on because they recognize the need to address both technological and organizational challenges. Follow him on Twitter @benoitnewton.