Dynamite Data Improves MySQL Performance and Access to Comparative Online Product Information with OCZ Deneva 2 SATA-Based SSDs

August 16, 2012
Application Focus

  • Reduce/eliminate disk I/O bottlenecks
  • Improve server application performance
  • Accelerate access to stored data

Return on Investment (ROI)

  • Improved database application read performance by 430%
  • Improved database application write performance by 400%
  • Improved overall database application performance by 420%
  • Improved the average MySQL queries per second by 430%

User Quote

“We put the call out to transition our core databases to solid-state, and only OCZ could give us the reliability and performance our applications demanded.”

Kristopher Kubicki, Chief Architect

Dynamite Data

Introduction

Dynamite Data provides unprecedented retail channel intelligence for merchants, manufacturers and consumers, offering access to the most comprehensive, timely and actionable information. Through its proprietary technology and sophisticated big data collection and manipulation techniques, Dynamite Data automatically extracts real-time data, analyzing and distributing over 30 million extractions per day from nearly a half billion buy pages and more than 3,000 online merchants worldwide. It empowers clients by delivering the most accurate real-time information on channel pricing, product assortment, MAP (minimum advertised price) violations, ratings and reviews, and stock availability for the ultimate competitive advantage.

Dynamite Data’s comparative shopping service was originally established to compile electronic and high-tech products sold through online merchant buy pages (“e-tail”), but was expanded to include home appliances, sporting goods and children’s toys. In principle, any product sold online could be developed as an individual Dynamite Data database and be made available for comparative shopping or information retrieval. With three billion buy pages available online globally, Dynamite Data’s objective is to data warehouse all e-commerce products for its retail channel.

As part of its original system, Dynamite Data developed a robot (aka “Internet bot”) to automatically read and extract product data from online merchant site maps. The patented Internet bot was capable of extracting approximately 20,000 buy pages per minute or 30 to 50 million buy pages per day, and as of this writing, it has nearly a half billion buy pages stored in its MySQL database. The products in the system center on IT and networking equipment as well as consumer electronic devices, and over time, Dynamite Data was storing approximately three to four terabytes of data each week. These storage demands became very taxing on the initial network and storage infrastructure.

As more products were added to the Dynamite Data retail intelligence data and the web pages were organized into user-preferred formats, data traffic increased significantly, causing I/O bandwidth bottlenecks. The increased downloading of product comparisons and pricing scenarios put considerable strain on the conventional enterprise hard disk drives (HDDs) the company had deployed and led to suboptimal system performance. It was apparent that an infrastructure upgrade was needed.

Though Dynamite Data deployed additional HDDs to solve disk I/O bottlenecks, the results were still not optimal. The more hard drives that were added, the higher the costs rose for power consumption, cooling and maintenance. And since HDDs inherently have endurance and reliability issues, they required Dynamite Data to implement RAID capabilities so a redundant data path would be in place in the event of hard drive failures. The storage segment of Dynamite Data’s enterprise became expensive and inefficient, and adding HDDs to improve performance was not the solution.

This case study addresses the storage challenges that Dynamite Data initially faced and its upgrade of the retail channel infrastructure to solid-state storage that delivers flash data caching, enterprise-class endurance and reliability, scalability, and real-time access to product data. By adding Deneva 2 solid-state drives (SSDs) developed by storage leader OCZ Technology, Dynamite Data achieved significant improvements in server application performance and data access while reducing overall system costs, solidifying its position as an industry leader with a trusted and indispensable tool for high-profile merchants and manufacturers.

Dynamite Data’s Network Infrastructure

Dynamite Data created its business model to be the world’s first and premier central repository of all retail channel intelligence, and it uses state-of-the-art big data collection and manipulation techniques to compile comparative information, as well as actionable insights. It does so in Linux-based MySQL; being open source, it enables web page data to be obtained freely. Instead of the merchant providing data feeds for Internet bots to parse, the Dynamite Data Internet bot identifies products and their street prices organically using advanced auto-adjusting natural language processing, giving clients the ability to search for fresh, accurate, comprehensive data on any product, at any time.

To develop this repository of e-commerce data, Dynamite Data selected MySQL since it is the world’s most used relational database management system (RDBMS) and operates as a server that provides multi-user access to the product databases developed by the company. In a comparative retail channel business, user contention must be eliminated and data access must be immediate; this became the mantra for the upgraded Dynamite Data network.

The majority of web buy pages arrive as unstructured data that must be structured and segmented into the appropriate MySQL databases, which requires write-intensive enterprise performance. At the same time, thousands of subscribers access Dynamite Data’s comparative shopping system concurrently, which requires read-intensive data access performance to eliminate delays as subscriber screens populate with data. To provide a heightened comparative shopping experience for its subscribers, improvements in read/write performance, as well as the ability to increase the average MySQL queries per server, were required.

The initial Dynamite Data retail channel infrastructure was composed of four AMD 2x six-core 2.2GHz servers that were configured as a read server cluster. The server cluster supported two main applications: MySQL version 5.5 for data warehousing and retrieval, and Sphinx 2.0.4 for full-text indexing and searching of the data stored in MySQL. Sphinx works like a database server, and a typical cluster can scale to billions of documents and tens of millions of search queries per day, powering such top websites as Craigslist and DailyMotion.

From a storage perspective, the original system configuration included four 10,000 revolutions per minute (rpm) HDDs per server, all supporting RAID0 striping, resulting in 16 HDDs total for the four-server cluster. The HDDs resided in a dual-socket SuperMicro SAN array connected to the server cluster via SATA. Generating 30 to 50 million buy pages per day challenged the system and created a number of problematic issues. In addition, the initial Dynamite Data network had been operational since 2007, so an IT refresh to improve server application performance and data access was required.

Infrastructure Issues

The excessive downloading of product comparisons and pricing scenarios by subscribers, database loading at 30 to 50 million web pages per day, and network I/O activity (addressing connectivity, communications, management, etc.) placed considerable strain on enterprise storage, and the conventional HDDs became a disk I/O bottleneck that penalized system performance. In addition, the MySQL databases did not scale adequately on conventional HDDs, requiring more drives to satisfy the server performance demands. The more drives that were added, the more power and associated cooling were required, in addition to maintenance, support and replacement costs, all of which drove up the total cost of ownership (TCO) for Dynamite Data’s initial enterprise.

Although servers can handle millions of input/output operations per second (IOPS), a conventional HDD can only deliver between 100 and 200 IOPS. As more and more subscribers use the Dynamite Data system concurrently, in addition to the system loading databases with millions of web buy pages daily, the HDDs within the SAN array simply could not keep up with the server workload demands. In addition, HDDs have physical limitations: their mechanical heads must move every time data is requested from a different location on the disk, limiting each drive’s ability to quickly access random data. Each HDD head movement takes time, so latency rises and read/write I/O performance slows considerably until the data is accessed. As a result, the speed and quality of Dynamite Data’s data collection and manipulation techniques were compromised.
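
The 100 to 200 IOPS range for a single HDD follows from simple arithmetic on the drive's mechanics. The sketch below is a rough estimate only: it ignores data transfer time, caching and command queuing, and the 4.5 ms average seek time is an assumed, illustrative value for a 10,000 rpm drive, not a figure from this case study.

```python
def estimate_hdd_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for one spinning disk (seek + rotational latency only)."""
    # On average the platter must turn half a revolution before the
    # requested sector passes under the head.
    avg_rotational_latency_ms = 0.5 * 60_000.0 / rpm
    # Time to service one random request, in milliseconds.
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms
    return 1000.0 / service_time_ms


# ASSUMPTION: 4.5 ms average seek time (illustrative, not from the case study).
iops = estimate_hdd_random_iops(rpm=10_000, avg_seek_ms=4.5)
print(f"Estimated random IOPS per 10k rpm HDD: {iops:.0f}")
```

Under these assumptions, each random request costs about 7.5 ms (3 ms of rotational latency plus the assumed seek time), which places a single drive inside the 100 to 200 IOPS range quoted above.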

Solid-State Drives

In contrast to HDD storage, solid-state drives store data using NAND flash memory, and with no moving parts, they handle random data access effortlessly. In fact, a single host-based flash SSD can typically deliver random IOPS performance comparable to a large SAN array with hundreds, if not thousands, of HDDs incorporated. Dynamite Data realized that adding SSDs to its enterprise would significantly improve its MySQL and Sphinx database performance. But the real benefit would come when “hot” data could be cached on SSDs residing in the host physical server, providing significantly faster data access to database indices.
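
The case study does not describe how "hot" data was selected or cached, so the following is only a minimal sketch of the general idea, using a least-recently-used (LRU) policy as an assumed stand-in. The class name `HotDataCache` and the function `read_from_hdd` are hypothetical and exist only for this illustration.

```python
from collections import OrderedDict


class HotDataCache:
    """Tiny LRU cache standing in for a fast tier in front of slow storage."""

    def __init__(self, capacity, backing_read):
        self._capacity = capacity
        self._backing_read = backing_read  # called only on a cache miss
        self._entries = OrderedDict()      # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            # Hit: mark the entry as most recently used and serve it.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: fetch from the slow tier, then keep a copy in the fast tier.
        self.misses += 1
        value = self._backing_read(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


slow_reads = []


def read_from_hdd(key):
    """Hypothetical slow read; records which keys reached the slow tier."""
    slow_reads.append(key)
    return f"index-page-{key}"


cache = HotDataCache(capacity=2, backing_read=read_from_hdd)
for key in [1, 2, 1, 3, 1, 2]:
    cache.read(key)
print(cache.hits, cache.misses, slow_reads)
```

In this toy access pattern the frequently requested key 1 stays in the fast tier, so only the colder keys fall through to the slow read path.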

In preparation for adding SSDs to its network infrastructure, Dynamite Data required SATA-based SSDs with leading-edge performance, enterprise-class reliability and endurance, and support for open-source Linux-based MySQL and Sphinx databases. It was Dynamite Data’s intent from an IT perspective to have the entire networked system and applications on SSDs, without having to rely on hard disks.

After researching competitive enterprise SSDs, Dynamite Data selected SATA 6Gbps-based Deneva 2 SSDs from OCZ Technology, one of the leading providers of solid-state storage solutions in the world. Since databases need to be transactional so that data becomes instantly available, the Deneva 2 SSDs are ideally suited to the Dynamite Data databases, delivering 80,000 IOPS (random 4K writes) with a maximum throughput of 525MB/s and supporting 480GB capacities and 2.5-inch form factors. They also provide power loss data protection, best-in-class endurance (e.g., minimal write amplification, intelligent block management and wear-leveling), and advanced encryption and error correction coding (ECC), making these drives ideal for enterprise applications.

“We put the call out to move our core databases to solid-state, and only OCZ could give us the reliability, scalability and performance that our applications demanded,” said Kristopher Kubicki, Chief Architect for Dynamite Data. “With these improved enterprise capabilities, the implemented Deneva 2 SSDs will enable us to collect and manipulate even more data while providing our clients with heightened experiences.”

Testing and Implementation

Before implementation, Dynamite Data conducted extensive performance tests to gauge the improvements that OCZ flash-based SSDs provide, especially with the ability to cache data on flash for significantly faster data access to stored database indices. The test environment was a scaled-down version of the actual Dynamite Data infrastructure and included the following:

  • 4 Read Servers: AMD 2.2GHz, 2 x 6 core CPUs
  • Key Applications: MySQL v5.5 and Sphinx 2.0.4
  • 2 HDDs 10k RPM with RAID0 striping
  • 1 SSD OCZ Deneva 2 with 480GB capacity

One set of tests conducted by Dynamite Data compared the read and write IOPS performance of the OCZ Deneva 2 SSD with that of a small hard drive array. The tests, which were run for at least one week to determine a performance average, included constant downloading of product comparisons and pricing scenarios, database index loading of millions of web pages per day, and general network I/O activity. The results are as follows:

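| Drive | Read Performance | Read Performance Improvement | Write Performance | Write Performance Improvement | Total Performance | Total Performance Improvement |
| --- | --- | --- | --- | --- | --- | --- |
| HDD Array (10k RPM) | 325 IOPS | baseline | 125 IOPS | baseline | 450 IOPS | baseline |
| Deneva 2 SSD (480GB cap.) | 1400 IOPS | 430% | 500 IOPS | 400% | 1900 IOPS | 420% |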
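
The improvement percentages can be checked directly against the measured IOPS. The sketch below uses only the values reported above; the function names are illustrative. It suggests that the reported figures line up with the SSD result expressed as a percentage of the HDD result (to within rounding), which is a different quantity from the percentage increase over the HDD baseline.

```python
# Measured IOPS as reported in the results above.
measured_iops = {
    "read": {"hdd": 325, "ssd": 1400},
    "write": {"hdd": 125, "ssd": 500},
    "total": {"hdd": 450, "ssd": 1900},
}


def ssd_as_percent_of_hdd(hdd: float, ssd: float) -> float:
    """SSD result expressed as a percentage of the HDD result."""
    return 100.0 * ssd / hdd


def percent_increase(hdd: float, ssd: float) -> float:
    """Increase of the SSD result over the HDD baseline, in percent."""
    return 100.0 * (ssd - hdd) / hdd


for workload, iops in measured_iops.items():
    ratio = ssd_as_percent_of_hdd(iops["hdd"], iops["ssd"])
    increase = percent_increase(iops["hdd"], iops["ssd"])
    print(f"{workload}: SSD is {ratio:.1f}% of HDD, a {increase:.1f}% increase")
```

Read this way, the SSD delivered roughly 4.0 to 4.3 times the IOPS of the hard drive array in each category.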

With MySQL being the dominant database in its infrastructure, Dynamite Data also measured the average MySQL queries per second on each server over a one-week period, comparing the OCZ Deneva 2 SSD with the small hard drive array. A typical MySQL query could range anywhere from downloading product comparisons to generating pricing scenarios, or identifying discount, rebate or shipping cost scenarios, and so on. The average MySQL query results (per server) are as follows:

| Drive | Average MySQL Queries (per server) | Performance Improvement |
| --- | --- | --- |
| HDD Array (10k RPM) | 400 per second | baseline |
| Deneva 2 SSD (480GB capacity) | 1725 per second | 430% |
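
The case study does not say how the queries-per-second averages were collected. One common approach, shown here purely as an assumption, is to sample MySQL's cumulative `Questions` status counter (for example with `SHOW GLOBAL STATUS LIKE 'Questions'`) at two points in time and divide the difference by the elapsed seconds. The counter readings below are made-up illustrative numbers, chosen so the result matches the SSD figure reported above.

```python
def queries_per_second(questions_start: int, questions_end: int,
                       elapsed_seconds: float) -> float:
    """Average query rate between two samples of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (questions_end - questions_start) / elapsed_seconds


# HYPOTHETICAL samples taken 60 seconds apart (not data from the case study).
first_sample = 1_000_000
second_sample = 1_103_500
qps = queries_per_second(first_sample, second_sample, elapsed_seconds=60)
print(f"Average queries per second: {qps:.0f}")
```

Averaging many such intervals over a week would yield a per-server figure comparable to the ones reported, although the exact method Dynamite Data used is not stated.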

Conclusions

As outlined in this case study, the intense downloading of product comparisons and pricing scenarios by clients and the loading of millions of web pages per day, in combination with general network I/O activity, caused considerable disk strain in Dynamite Data’s enterprise, and using conventional HDDs caused disk I/O bottlenecks and penalized system performance. The MySQL database indices did not scale well on conventional HDDs, requiring more drives, more RAID, more power and cooling, and more maintenance, all of which drove up TCO for Dynamite Data’s enterprise.

The storage challenges that Dynamite Data faced required solid-state capabilities to enable flash data caching, enterprise-class endurance and reliability, scalability, and real-time access to information. In an online comparative retail channel business, user contention needs to be eliminated and data access must be immediate. By adding OCZ Deneva 2 SSDs to its infrastructure, Dynamite Data achieved significant improvements in server application performance and data access while reducing overall system costs. The real benefit came from caching data on SSD flash, which provided significantly faster access to database indices; the expectation was that moving to SSDs would improve system performance four to five times over the previous HDD configuration.

To provide a heightened comparative retail channel experience for its clients, improvements in read/write performance, as well as the ability to increase the average MySQL queries per server, were required. After extensive testing, Dynamite Data achieved its system performance goals versus its hard drive model:

  • Read IOPS performance improved 430%
  • Write IOPS performance improved 400%
  • Total IOPS performance improved 420%
  • Average MySQL queries improved 430%

“The OCZ Deneva devices were unique in that they were the first available eMLC solutions with rock star controllers, and featuring enterprise-class write durability, eliminated the drive endurance issues and associated costs that were typical with our initial disk storage infrastructure,” said Brian Stratman, Systems and Database Engineer for Dynamite Data.

With three billion buy pages available online globally, Dynamite Data’s objective to data warehouse all retail channel products and provide immediate access to product comparisons and pricing scenarios is now a reality with the addition of OCZ Deneva 2 SSDs to the infrastructure. The ability to cache data on flash provides much faster access to information and an improvement in server application performance, ultimately enabling Dynamite Data to provide its clients with unmatched critical, competitive and actionable insights on retail channel big data, while delivering the optimal user experience.
