Industry Perspective is a regular Data Center Journal Q&A series that presents expert views on market trends, technologies and other issues relevant to data centers and IT.
This week, Industry Perspective asks Samer Forzley about what companies can gain from big data. Samer is Vice President of Marketing at Pythian, a global data management consulting company that specializes in planning, deploying and managing mission-critical data infrastructures. A veteran at blending marketing with customer relations, Samer draws on years of experience to help Pythian develop and manage marketing strategies and programs that increase customer satisfaction and support.
Industry Perspective: Companies are awash in information, but what exactly does the “big” mean in “big data”?
Samer Forzley: Big data means different things to different people. At face value, big data refers to large amounts of data, usually multiple terabytes or petabytes. But the “big” in big data does not just refer to the size of the data but also to its nature. Big data is unstructured. Typically, the data that companies keep in their databases follows a high-integrity structure that must be maintained for the system to work. For example, when you sign up to receive an online newsletter, your information is entered into a database following a predetermined structure that includes your first name, last name, email address and so on. Big data, however, is unstructured, meaning that different data sources have different formats that do not follow a predefined structure. Data included in a log file is different from data created by a mobile device, which is different from data generated on social media sites and so on.
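A toy sketch of the distinction Samer describes, using invented records (Python for illustration). The structured signup fits one fixed schema; the log line, mobile event and social post each have their own shape and would not fit a single relational table layout:

```python
import json

# A structured record: every newsletter signup follows the same schema.
signup = {"first_name": "Ada", "last_name": "Lovelace",
          "email": "ada@example.com"}

# Unstructured sources: each has its own format and no shared schema.
log_line = '203.0.113.7 - - [10/Oct/2023:13:55:36] "GET /index.html" 200'
mobile_event = {"device": "phone-123", "event": "tap",
                "coords": [47.6, -122.3]}
social_post = {"user": "@example", "text": "Loving this product!",
               "likes": 42, "replies": [{"user": "@friend", "text": "Same!"}]}

# The structured record can be validated against a fixed set of fields;
# the unstructured ones vary in nesting and fields from source to source.
required = {"first_name", "last_name", "email"}
print(required.issubset(signup))        # the signup matches the schema
print(json.dumps(social_post)[:40])     # nested, variable-shape record
```

The point is not the code itself but that only the first record could be loaded into a predefined table without transformation.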
IP: Is there really value to be garnered cost-effectively from all the data businesses collect?
SF: Absolutely. For starters, the cost of storage is low, so the value required for a return is also low. Second, there are cost savings to storing data using big data technologies instead of traditional means. Let me explain.
Although storage costs are often low, operating a traditional database can be expensive. And it’s not hard to understand why: the database must be instantly responsive to customers. If an online retailer’s website takes longer than three seconds to load, it risks cart abandonment.
Here is one way you can think of it. Let’s say you have an urgent letter that needs to be in Japan by tomorrow morning, and you will earn a million dollars in revenue if this letter is delivered on time. Paying a $500 delivery fee to fly that letter first class and ensure it arrives on time is well worth the return on investment. Traditional databases must return data like a first-class delivery. When a customer makes a transaction, the database response must be fast and on time to confirm the order, the inventory, the shipment information and so on.
Big data technologies, however, offer an advantage when there is data in your database that does not need first-class delivery. Imagine an online retailer with 10 million products in its catalog, each with different images, descriptions, details and reviews. This information could all be stored in a first-class, structured traditional relational database. But what if that product catalog is from 2009, and some products are no longer available or have gone out of style? The question becomes, what do you do with all this data? You can continue first-class deliveries in a traditional database or you can offload it to technology like Hadoop, a much more cost-effective big data option. This approach has come to be known as data landfilling.
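The offload Samer describes can be sketched in a few lines. This is a hypothetical illustration only (SQLite stands in for the relational database, and JSON Lines files stand in for the format a Hadoop cluster might ingest); the table and cutoff year are invented:

```python
import json
import sqlite3

# Stand-in for the "first-class" relational catalog database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, year INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "widget", 2009), (2, "gadget", 2014)])

# "Data landfilling": export stale rows to cheap flat files that a
# big data system such as Hadoop could later process.
stale = conn.execute(
    "SELECT id, name, year FROM products WHERE year <= 2009").fetchall()
with open("stale_products.jsonl", "w") as f:
    for pid, name, year in stale:
        f.write(json.dumps({"id": pid, "name": name, "year": year}) + "\n")

# Delete the archived rows so the hot database stays small and fast.
conn.execute("DELETE FROM products WHERE year <= 2009")
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # 1
```

In practice the export would go through a tool such as Sqoop or an ETL pipeline rather than hand-written SQL, but the division of labor is the same: fast, expensive storage for live data, cheap bulk storage for the rest.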
There are also several industry examples of big data analytics generating more business. Walmart’s well-documented big data project, for example, discovered that during the threat of a hurricane, shoppers stock up on Pop-Tarts and beer. This insight allowed the retailer to shift inventory accordingly to meet customer demand.
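The kind of insight Samer cites boils down to comparing sales under one condition with sales under another. A minimal sketch with invented toy numbers (not Walmart’s actual data):

```python
# Illustrative only: toy daily sales records tagged with whether a
# hurricane warning was in effect.
sales = [
    {"item": "pop_tarts", "hurricane_warning": True,  "units": 700},
    {"item": "pop_tarts", "hurricane_warning": True,  "units": 650},
    {"item": "pop_tarts", "hurricane_warning": False, "units": 100},
    {"item": "pop_tarts", "hurricane_warning": False, "units": 120},
]

def avg_units(rows, warning):
    """Average daily units sold, filtered by warning status."""
    picked = [r["units"] for r in rows if r["hurricane_warning"] is warning]
    return sum(picked) / len(picked)

# Ratio of warning-day sales to ordinary-day sales.
lift = avg_units(sales, True) / avg_units(sales, False)
print(f"sales lift during warnings: {lift:.1f}x")
```

A real project would run this comparison across millions of transactions and thousands of products, but the underlying question, how does demand shift under a given condition, is the same.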
IP: Are organizations alone in trying to gain benefits from their masses of data, or can they follow an outsourcing model in this regard?
SF: Big data is still emerging, and there are a lot of solutions in the market. Companies will benefit from working with a third party to help them navigate the big data world and select and implement a solution that best fits their business objectives. By using a big data outsourcing provider such as Pythian, companies can receive round-the-clock advice and consultation from experts who work on big data projects daily, so they can focus on running their business.
Fundamentally speaking, the promise of big data is that you can use data and statistics to unlock insight. That is why demand has grown for data scientists who have a strong command of data and statistics and can discover these trends and correlations. Good data scientists, however, are extremely difficult to find and keep, and they typically cost a lot. That is why most companies struggle to justify having a data scientist on staff and would rather work with a third party that can offer this expertise and knowledge.
IP: Who needs big data solutions, and what value do these solutions offer?
SF: Big data is not just for companies that have large sets of data. Companies can use big data tools for smaller data sets as well to gain insight and increase revenue and customer satisfaction. The most important thing is to first understand your business objectives, and then map out what solutions can help you achieve those objectives. The solutions vary—some provide cost-effective storage, some pull out insights and uncover patterns, and some provide analytics in real time. In the end, a clear understanding of your business objectives will drive your big data solution choice.
IP: What is the role of the database administrator when dealing with big data?
SF: The database administrator (DBA) plays a critical role in big data. The promise of big data is to help uncover insights from all your data sets—data in your high-integrity relational database as well as the unstructured data your company collects. The DBA’s role is to ensure that data is flowing between your traditional environment and your big data environment so that full insight is realized and the insight is leveraged in the core delivery of service.
DBAs are well suited to handling big data because they are generally highly skilled in managing systems with many configurations, all of which have a critical impact on performance and availability.
There are actually new roles that are emerging in many organizations—for example, the Hadoop Cluster Administrator. This individual is responsible for the performance and availability of the Hadoop cluster and the data it contains. Hadoop administrators need troubleshooting skills; an understanding of a system’s capacity, storage, networks and so on; strong knowledge of Linux; and, of course, Hadoop skills. The job shares similarities with that of a DBA, so it’s a role that can be filled by a well-skilled DBA who has undergone some training.
IP: Big data is currently the subject of lots of hype; how can companies see through the hype to find real value?
SF: To start, companies should seek advice from a trusted source. This could be an industry colleague who has recently implemented a big data project, a consulting firm like Pythian that implements big data projects on behalf of clients and can help companies find real value, or an analyst firm like Forrester or Gartner that can offer a good overview of the market.
Companies considering big data do not have to go “all in” on day one. A proof of concept or small-scale big data project can be implemented to make sure the solution will address their business needs first. Companies can even deploy small projects in the cloud at a reasonable cost. Once a proof of concept yields answers for the basic business objectives, companies can then expand their projects and include more analysis. Big data does not have to mean big budget.