Industry Outlook is a regular Data Center Journal Q&A series that presents expert views on market trends, technologies and other issues relevant to data centers and IT.
This week, Industry Outlook asks Stephen Goldberg, CEO of HarperDB, about what’s happening in the IoT database industry. Stephen has previously founded two startups and most recently was CTO at Phizzle, managing product, engineering, product marketing and support. He has worked at companies large and small, including Red Hat, where he led infrastructure for their Global Support Services division. Stephen has four pending patents and has spoken at both Salesforce.com’s Dreamforce and SAP’s Sapphire.
Industry Outlook: What are the biggest challenges facing the IoT database industry when it comes to real-time data analytics?
Stephen Goldberg: The industry lacks a database solution for IoT that’s simple, secure and scalable and covers the entire data value chain from ingestion to action. Organizations know their data is valuable, but even after building out complex and costly infrastructure to manage that data, they’re finding it’s still not usable. The challenge with most big-data solutions is they’re too inflexible to solve business problems in real time. Data ingestion and collection occurs at the edge, but the transformation and analysis occurs in the cloud. As the number of IoT applications grows, these solutions converge on a similar architecture, which is quickly becoming the standard. When it comes to real-time usability of your data, this round trip between edge and cloud takes considerable time and infrastructure and becomes incredibly expensive.
IO: How are organizations addressing these issues?
SG: Most organizations have resorted to using a multi-tiered database architecture, but doing so requires multiple licenses, servers and staff with different skill sets to maintain and support. They potentially have a light database or caching mechanism running on the edge, an IoT gateway and/or a NoSQL database for handling big-data ingestion and transactions, an SQL database to run the business applications, and data lakes built on tools such as Hadoop or Spark for more-advanced analytics. Middleware and infrastructure tools then tie it all together. These database architectures, therefore, essentially rely on two things: first, that you’ll buy multiple database products to support your data value chain, and second, that you’ll build out a large and expensive team to support this massive big-data footprint.
I find the most successful IoT infrastructures are using a hybrid transactional/analytical processing (HTAP) model that can ingest high volumes of data while also enabling simultaneous analytical processing. Database scale historically came through vertically scaling a server, creating a single point of failure. HTAP—and modern database architectures—avoid this pitfall by employing clustering and replication. Clustering your IoT devices with an HTAP database and native REST API provides the ability to network across regions, warehouses or continents, granting businesses a seamless view of their organization. This capability allows organizations to best manage costs and innovate quickly while internally managing what’s most important.
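The clustering-and-replication idea above can be sketched in a few lines. This is a toy, single-process illustration of writes fanning out from an edge node to its peers; the `Node`, `subscribe` and `put` names are invented for this sketch and are not any vendor's API, and real clustering additionally handles networking, conflict resolution and durability.

```python
# Toy sketch: each node keeps its own tables and pushes every write
# to the peers it replicates to, so any node can serve the data.
class Node:
    def __init__(self, name):
        self.name = name
        self.tables = {}   # table name -> {record_id: record}
        self.peers = []    # nodes this one replicates writes to

    def subscribe(self, peer):
        """Replicate every future write on this node to `peer`."""
        self.peers.append(peer)

    def put(self, table, record_id, record, _seen=None):
        """Write locally, then fan the write out to peers."""
        seen = _seen if _seen is not None else {self.name}
        self.tables.setdefault(table, {})[record_id] = record
        for peer in self.peers:
            if peer.name not in seen:   # avoid replication loops
                seen.add(peer.name)
                peer.put(table, record_id, record, _seen=seen)

# A store on the edge and corporate HQ form a tiny two-node cluster.
hq, store_a = Node("hq"), Node("store_a")
store_a.subscribe(hq)   # store_a pushes its writes to hq
store_a.put("sensors", "fridge-1", {"temp_f": 38.2})

print(hq.tables["sensors"]["fridge-1"])   # HQ sees the edge write
```

The point of the sketch is the topology, not the storage: because each write propagates to subscribed peers, HQ gets a seamless view across stores without the stores routing every query through the cloud.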
IO: Can you expand on the single-model approach versus multi-model and the impact it has on cost, efficiency, footprint and overall effectiveness of data?
SG: Both NoSQL and SQL have incredible value to contribute to the technology landscape. But to support both of these database capabilities, many have adopted a concept called multi-model. This approach works by storing data using one primary NoSQL modality and then transforming that same data when you want to perform SQL operations. These database architectures seemingly provide flexibility to handle data needs, but they suffer from inflated memory- and storage-infrastructure costs. It’s expensive from the perspective of resource utilization, and it’s risky, as it means your data is duplicated. Moreover, these multi-model solutions often lack support for full ANSI SQL, or they incur a major cost in the form of memory and time, preventing organizations from running complex SQL queries and driving them to use data lakes.
Using a single-model NoSQL and SQL architecture, you’re supporting both SQL and NoSQL transactions in one storage mechanism. You can therefore harness the power and scale of NoSQL while using full ANSI SQL without duplicating your data in memory or on disk. This approach can in some cases eliminate the need for map-reduce and data-lake solutions. Additionally, thanks to a smaller data footprint, it can run on microcomputing devices, making it ideal for IoT.
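The single-model idea can be demonstrated with SQLite's JSON functions, assuming the SQLite build bundled with your Python has the JSON1 functions compiled in (standard in modern releases). Documents are stored once, and SQL reads fields directly out of the stored JSON, with no second, transformed copy of the data; the table and field names here are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (doc TEXT)")  # one storage mechanism

# NoSQL-style ingestion: schemaless JSON documents.
docs = [
    {"store": "a", "sensor": "fridge-1", "temp_f": 38.2},
    {"store": "a", "sensor": "fridge-2", "temp_f": 41.7},
    {"store": "b", "sensor": "fridge-1", "temp_f": 36.9},
]
conn.executemany("INSERT INTO readings VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# SQL-style analytics over the very same stored documents.
rows = conn.execute(
    """SELECT json_extract(doc, '$.store') AS store,
              MAX(json_extract(doc, '$.temp_f')) AS max_temp
       FROM readings
       GROUP BY store
       ORDER BY store"""
).fetchall()
print(rows)  # [('a', 41.7), ('b', 36.9)]
```

SQLite is standing in for a single-model engine here; the design point is simply that the `GROUP BY` query never required exporting or duplicating the documents into a second, SQL-shaped store.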
IO: Discussions of HTAP have been increasing. Can you explain what it means and how it addresses these larger IoT concerns?
SG: In many IoT applications you’re dealing with enormous amounts of data, and that data is often dynamic. Furthermore, real time is becoming more and more important to organizations with IoT and industrial IoT (IIoT) applications. In public safety, for example, seconds matter. When designing a database, you must make some hard decisions, and they have consequences in the form of tradeoffs. Many products are built with flexibility and ingestion speed in mind—mostly NoSQL products. Other products are built with read and analytical capability in mind. In the former case, you’re trading analytical capability for write speed; in the latter, you’re often trading ingestion speed for analytical capability.
HTAP databases strive to find the perfect balance of these two paradigms. The goal of an HTAP database is to ingest data at high scale while analyzing that data in real time without crashing the database.
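The shape of that balance can be shown in miniature: transactional writes and analytical reads hit the same store, interleaved, with no export step between them. Real HTAP engines do this under true concurrency with techniques like MVCC; this single-threaded sketch, with invented `ingest` and `avg_temp_by_sensor` helpers, only shows the shape.

```python
from collections import defaultdict

store = []   # the one shared store serving both workloads

def ingest(record):
    """Transactional path: append one sensor reading."""
    store.append(record)

def avg_temp_by_sensor():
    """Analytical path: aggregate over everything ingested so far."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in store:
        sums[r["sensor"]] += r["temp_f"]
        counts[r["sensor"]] += 1
    return {s: sums[s] / counts[s] for s in sums}

# Interleave ingestion with analytics over the same data -- no ETL.
for i, temp in enumerate([38.0, 40.0, 36.0, 42.0]):
    ingest({"sensor": f"fridge-{i % 2}", "temp_f": temp})
    running = avg_temp_by_sensor()   # analytics mid-stream

print(running)  # {'fridge-0': 37.0, 'fridge-1': 41.0}
```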
IO: Why is it important to have a database that can be installed on micro-devices and run on the edge? What industries are adopting this technology?
SG: It’s important for multiple reasons and applications, but for IoT in particular it’s a game changer. First, using existing devices rather than relying on a server or cloud cluster keeps costs down, since IoT projects can put the network of devices they already have to work. Most IoT and edge database solutions are light versions of an existing product. Having the same product, with the same features and code base, deployed on both the device and the server allows organizations to make the devices more independent and intelligent, and it enables distributed analytics on the edge.
We’ve seen particular interest in this technology from the health-care, logistics and retail industries, but it expands to new industries every day.
A great example would be a chain of convenience stores maintaining an IoT network where each store on the edge is recording sensor data on inventory, refrigerator temperatures, point-of-sale data and so on. Corporate HQ can continuously monitor this information across stores, making the organization more agile with just-in-time manufacturing capability, advanced health safety and the ability to improve the customer experience by ensuring safety and product availability. Furthermore, by using the edge for distributed query and storage, organizations can drive down costs while also having a more fault-tolerant system in the event of Internet outages.
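A minimal sketch of that distributed pattern: each store summarizes its own sensor stream at the edge, so only compact summaries and alerts cross the network to HQ. The threshold, field names and `summarize_edge` helper are all hypothetical, chosen only to illustrate the shape of edge-side reduction.

```python
SAFE_MAX_F = 40.0   # assumed food-safety threshold for refrigerators

def summarize_edge(readings):
    """Runs on the store's edge device: reduce raw readings to a
    per-fridge summary plus out-of-range alerts bound for HQ."""
    summary, alerts = {}, []
    for r in readings:
        fridge = summary.setdefault(
            r["fridge"], {"n": 0, "max_f": float("-inf")})
        fridge["n"] += 1
        fridge["max_f"] = max(fridge["max_f"], r["temp_f"])
        if r["temp_f"] > SAFE_MAX_F:
            alerts.append({"fridge": r["fridge"], "temp_f": r["temp_f"]})
    return {"summary": summary, "alerts": alerts}

# One store's raw sensor stream, processed locally.
readings = [
    {"fridge": "f1", "temp_f": 38.5},
    {"fridge": "f1", "temp_f": 41.2},   # over threshold -> alert
    {"fridge": "f2", "temp_f": 36.0},
]
report = summarize_edge(readings)
print(report["alerts"])   # [{'fridge': 'f1', 'temp_f': 41.2}]
```

Because the raw readings never leave the store, the design also degrades gracefully: during an Internet outage the edge device keeps recording and alerting locally, and only the queued summaries wait for connectivity.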
Similarly, major hospitals can install database solutions on micro-devices to track real-time patient data with the goal of increasing resource utilization as well as improving quality of care and patient safety. In a clinical setting, they can monitor real-time patient information in a secure, HIPAA-compliant environment.
IO: With new IoT initiatives popping up every day across every industry, real-time analytics is quickly becoming a requirement. What’s next for the industry?
SG: I believe the data-value chain will move directly to the edge over the next few years as more and more organizations see the need for real-time analytics directly on an intelligent edge. The issue with hosting the database in large cloud-based environments is that you’re duplicating your resource spending across the edge and the cloud, and throughput is becoming increasingly expensive for organizations as well. Add in security concerns and a lack of transparency into what’s happening with your data, and you have a growing trend of moving to a hybrid-cloud model. Cloud providers will continue to excel at providing incredible services for AI, facial recognition and other APIs that require massive compute and complex processes. The hybrid cloud will continue to grow as cloud-service adopters realize that strategically carving architecture into their own hosted environments saves capital and enables a more fluid infrastructure.