One of the most compelling advantages of cloud computing is that (many) companies no longer need to get their hands dirty with all that messy underlying hardware. Using virtual machines, hardware can be anything—and anywhere—that it must be to best serve a business. No longer is a crystal ball necessary to determine the correct quantity of processors, RAM and drives that will satisfy future requirements. Nor is there a need to then be constrained by those decisions if the predictions don’t pan out. And no longer must in-house IT teams install and maintain these infrastructures.
Instead, using virtual machines in the cloud, the specifics of a company’s hardware can be changed simply and easily with the press of a few buttons, whether to meet changing needs or to optimize systems. Often enough, small changes deliver massive performance improvements—changes that never would have been attempted if they had meant experimenting with cumbersome physical-hardware installs. In this way, the cloud offers power and flexibility that would otherwise be unavailable. So, to summarize what most of us already know: the cloud is good.
As a cloud-service provider, we select hardware so users don’t have to. Or, said another way, we don’t have it so lucky on our side. All those virtual machines must run on physical hardware at some level, and it’s been our undertaking to determine what hardware ought to be on those shelves in the data center. As we initially embarked on our beginner’s foray into the wild blue yonder of cloud services with our first OpenStack implementation—the DreamCompute Beta cluster—we had a lot to learn about the data center hardware that would be the best fit for our customers’ needs. Needless to say, it was an education.
Four years ago we launched (somewhat blindly, we can admit at this point) into the cloud, guided by our best brainstorming and guesswork as to which hardware our DreamCompute Beta cluster should use. Designing a back-end cloud infrastructure to supply virtual machines to enterprise users while working within our software limits required making a series of choices—and, as it happened, finding out what would have worked much better, then adapting. Customer feedback was quick to show us where we went astray in our efforts. (Those pesky customers and their propensity for always being right.)
Here’s a sampling of what we learned as we sought the right mix of hardware for our cloud-computing product.
Testing the Limits (Then Sticking to Them)
It’s said there is wisdom in knowing what we can and can’t change. That certainly proved true when building a cloud-computing cluster. We began with certain unalterable physical and software-based limits. The racks in our data centers are nine feet (58 rack units) tall, and they have two 60-amp three-phase 208-volt power strips installed. That foundational physical infrastructure isn’t going anywhere. The operating-system software for this OpenStack implementation is Ubuntu Linux on the hosts and Cumulus Linux on the switches, so we could work with Linux-compatible hardware only.
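To make those limits concrete, here’s the kind of back-of-the-envelope arithmetic that a fixed rack and power budget forces. The 80% continuous-load derating and the per-node wattage below are illustrative assumptions, not our exact figures:

import math

# Fixed rack constraints from above: two 60A three-phase 208V strips per rack.
VOLTS = 208          # line-to-line voltage
AMPS = 60            # breaker rating per strip
DERATE = 0.80        # assumed continuous-load derating
STRIPS = 2

per_strip_watts = math.sqrt(3) * VOLTS * AMPS * DERATE
rack_watts = per_strip_watts * STRIPS

# Hypothetical average draw per server node, just to see how many fit.
server_watts = 800
print(f"usable power per strip: {per_strip_watts / 1000:.1f} kW")
print(f"rack budget: {rack_watts / 1000:.1f} kW "
      f"(~{int(rack_watts // server_watts)} nodes at {server_watts} W each)")

Every hardware decision downstream, from CPU TDP to drive count, has to fit inside that envelope.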
Also, as a business matter, customer demand for processor/RAM/disk usage remains in a certain constant ratio. We can alter our offerings and pricing, but the ratio is a fact of life that we must respect as a hard limit. Death, taxes and customer-usage ratios.
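As a rough sketch of how that ratio drives hardware selection, consider sizing candidate hosts against a fixed demand ratio. The ratio and the host specs here are hypothetical; the point is that whichever resource runs out first strands the rest:

# Hypothetical per-unit customer demand: 1 vCPU : 4GB RAM : 80GB disk.
DEMAND_RATIO = {"vcpu": 1, "ram_gb": 4, "disk_gb": 80}

def units_supported(host):
    # A host serves demand only until its scarcest resource is exhausted.
    return min(host[k] / DEMAND_RATIO[k] for k in DEMAND_RATIO)

host_a = {"vcpu": 64, "ram_gb": 128, "disk_gb": 8000}   # many cores, light on RAM
host_b = {"vcpu": 32, "ram_gb": 256, "disk_gb": 4000}   # fewer cores, more RAM

for name, host in (("host_a", host_a), ("host_b", host_b)):
    bottleneck = min(DEMAND_RATIO, key=lambda k: host[k] / DEMAND_RATIO[k])
    print(f"{name}: serves {units_supported(host):.0f} demand units; "
          f"bottleneck is {bottleneck}")

In this made-up example, host_a’s extra cores and disk sit idle because RAM runs out first; respecting the ratio means buying hardware that exhausts all three resources at roughly the same time.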
Strengthening Our Core: Quantity or Quality?
For the DreamCompute Beta cluster we used AMD Opteron 6200-series processors, which deliver an impressive 64 cores per machine. The downside was that those cores were slow and underpowered relative to other options. The large number of cores allowed us to put a lot of virtual machines on a single host and offer outstanding specs, on the order of a 32-CPU, 64GB RAM virtual-machine size. Customers certainly liked having many cores, but feedback was lukewarm because more-powerful cores turned out to be even more essential to their needs. Lesson learned: in the design for the next cluster, we were much more careful about balancing core power against core quantity.
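The trade-off looks something like the following sketch, with made-up per-core performance numbers. More cores mean more virtual machines per host and more aggregate throughput, but every one of those virtual machines feels the slower single-thread speed:

# Illustrative comparison of two CPU choices; perf_per_core values are invented.
candidates = {
    "many slow cores":  {"cores": 64, "perf_per_core": 1.0},
    "fewer fast cores": {"cores": 32, "perf_per_core": 2.2},
}

VCPUS_PER_VM = 4   # assumed flavor size

for name, cpu in candidates.items():
    aggregate = cpu["cores"] * cpu["perf_per_core"]
    vms_per_host = cpu["cores"] // VCPUS_PER_VM
    print(f"{name}: {vms_per_host} VMs per host, "
          f"aggregate throughput {aggregate:.0f}, "
          f"per-thread speed {cpu['perf_per_core']:.1f}x")

The many-core option wins on density and even on total throughput, yet each customer’s workload runs on the slower cores, which is exactly what the lukewarm feedback was telling us.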
Two Cords, One Machine: Not for the Faint of Heart
Speaking of power issues, the density of processors and RAM sticks in each of our hypervisor machines (coupled with our setup of housing two systems in each chassis) created a situation where a server could not run on a single power cord; it needed both of its 1,600W power supplies. It was therefore impossible to build power redundancy into this system, and our hypervisors were not resilient. They would lose power during any maintenance event, power fluctuation, PDU failure and so on. Making this situation worse, the power cables included with the machines happened to have C13 connectors that were on the small side, so they easily slipped out of the sockets on the PDU. That, in turn, meant the hypervisors would lose power. For the new cluster, we made sure to invest in hardware that made power redundancy possible.
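The redundancy question itself is simple arithmetic: can one supply carry the whole chassis if the other cord slips out or its PDU fails? The component draws in this sketch are assumptions, but they show how a dense dual-node chassis blows past a single 1,600W supply:

PSU_WATTS = 1600   # rating of each of the two supplies

# Assumed worst-case draw for one node of a dual-node chassis.
per_node_watts = (
    4 * 140      # four high-TDP CPU sockets
    + 24 * 6     # a couple dozen DIMMs
    + 200        # drives, fans, motherboard
)
chassis_watts = 2 * per_node_watts   # two nodes share the chassis and its supplies

if chassis_watts <= PSU_WATTS:
    print(f"{chassis_watts} W fits on one {PSU_WATTS} W supply: N+1 redundant")
else:
    print(f"{chassis_watts} W exceeds one {PSU_WATTS} W supply: both cords are "
          "load-bearing, so a single PDU or cable failure drops the whole chassis")

With numbers like these, “redundant” supplies are really just load-sharing supplies, and the only fix is hardware whose worst-case draw fits within a single PSU.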
When Choosing Hardware, Don’t Cluster Storage Clusters
When it came to building block storage for the DreamCompute Beta cluster, we used Ceph, the object-storage software initially developed at DreamHost. At the time, our only reference point for what sort of hardware to use was our large object-storage cluster DreamObjects, so we put the same type of dedicated storage machines in place.
It turned out that large object-storage clusters and cloud-computing storage clusters have different needs. Large object-storage clusters are intended to hold a sizeable amount of data that isn’t accessed very often, so it’s fine to use low-end processors, a small amount of RAM and a simple RAID card. In contrast, the data back end for an OpenStack cloud needs the power to handle a variety of concurrent and ongoing processes: virtual machines spinning up and down, MySQL performing operations and so on. Again we faced an issue of balance: we had nearly 10 times the storage space we actually needed, and pretty poor performance. The lesson presented itself clearly once more, and with the next attempt we designed for access speed rather than raw capacity.
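The mismatch is easy to see when you size a cluster for capacity versus sizing it for I/O. The drive specs and workload target in this sketch are illustrative, but they show how a capacity-first design ends up with an order of magnitude more space than the workload needs:

import math

# Hypothetical block-storage workload: lots of small, constant I/O.
need = {"iops": 50_000, "capacity_tb": 100}

drives = {
    "7200rpm HDD": {"iops": 150,    "capacity_tb": 4},   # capacity-oriented
    "SATA SSD":    {"iops": 50_000, "capacity_tb": 1},   # IOPS-oriented
}

for name, d in drives.items():
    # Buy enough drives to satisfy BOTH the IOPS target and the capacity target.
    count = math.ceil(max(need["iops"] / d["iops"],
                          need["capacity_tb"] / d["capacity_tb"]))
    total_tb = count * d["capacity_tb"]
    print(f"{name}: {count} drives, {total_tb} TB raw "
          f"({total_tb / need['capacity_tb']:.1f}x the capacity actually needed)")

Sizing spinning disks to hit the IOPS target is what leaves a cluster drowning in unused terabytes, which is roughly the position we found ourselves in.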
In the end, the iterative process of trial, error, frustration, customer feedback and informed improvements has led us to have the right hardware in place. The lessons of how to strike the right balance between power and breadth in our hardware capabilities have gradually improved the quality of our product, and they continue to guide us.
About the Author
Luke Odom is the Data Center Manager at DreamHost, a global web-hosting, domain-registrar and cloud-services provider.