I feel your pain: Your current backup application came out when Pong was all the rage in computer games, and you are keeping it running with spit, glue, duct tape, baling wire and many long nights and weekends. You have no real disaster recovery (DR) strategy to speak of other than shipping tapes off site, and the last time you really tested DR with any reasonable hope of success was when your kids were still in grammar school.
The amount of data you need to protect is doubling, your tape drives work occasionally and the person who wrote all the scripts to provide at least some level of automation just found another job. Happy New Year. Although this sounds like a comedy of errors, these are the cards that are dealt to many a backup administrator.
An organization’s data is its lifeblood. Corporate and government executives understand this but still are unwilling to allocate the right budget to the IT department to protect it. Why?
Backup and DR are insurance policies. They do nothing to enhance a business organization’s bottom line or its mission. The process is a necessary evil, just like going to the bathroom. If IT managers could, they would enjoy the meal and pay someone else to go potty for them. Why do you think cloud computing is making an impact? It allows IT to outsource the hard stuff to someone else so the core team can focus on the mission at hand—which brings up the reason for this article.
To understand whether outsourcing backup and DR to a cloud provider makes sense, you need to determine how much it costs you today to provide a similar service level. This also holds true for the cost of that new solution when outsourced. If the solution will replace your old one and greatly improve your quality of life, how do you justify the cost to accounting? The goal is to create a win/win message for finance. “Yes, the solution does cost $X, and it will enable me to stay home at night and drink beer, but I can prove it will save us $Y in the long term while providing a much more robust recovery paradigm for our applications.”
Here are a few handy tips to validate any new technology solution to determine whether the cost of implementing it would be justified.
Top Five Issues in Analyzing Backup and DR Costs
1) The backup target: Tape sucks for recovery, but it’s great for archives. For every 20 terabytes of production data with a 2 percent change and 3 percent growth rate, and using a traditional backup operation of daily incremental and weekly full backups, you need to manage 110 terabytes of tape media. Therefore, a shop with 100 terabytes, which is five times 20 terabytes, would need 550 terabytes of tape (5 x 110 = 550). Your goal should be to include tape as an archive storage tier so you can minimize the need to recover from tape.
2) Infrastructure and time: The time required to back up your data center is equal to the current capacity of all the applications needing protection divided by the performance in terabytes/hour of your current backup solution. The performance metrics must include the entire data path: network, backup server, storage and backup target (tape or disk) as a whole. You could buy the fastest backup target on the planet, but if you can’t feed it fast enough, it doesn’t help a bit. Remember that the time to recover using traditional approaches is usually twice the time to back up.
3) The network: Your wide area network (WAN) needs to be robust enough to handle the amount of data change you encounter on a daily basis. WAN bandwidth should be measured for peak loads so that critical batch-processing periods, such as end-of-month or end-of-year processing data, are not at risk. This is why WAN costs can be the most expensive part of a good DR strategy.
4) The building and the plan: The DR target site must be able to handle peak workloads, even if in a degraded fashion, for all the critical applications the company requires; and it must be available in a timeframe that supports continuity of operations at minimal impact to the business. This is a hard number to determine, so using readily available information from industry analysts can help you determine the hourly cost of downtime for your industry.
Here is an example of lost-revenue data based on publicly available information. It may be outdated, but it is useful for at least determining a baseline outage cost. More-accurate and up-to-date data can be obtained from the major analyst firms.
|Industry||Lost Revenue per Hour (U.S. dollars)|
5) Intelligent recovery: You need to have the right infrastructure at the DR site to handle operations. And you need to understand how to restore applications that have crashed so they actually work. There are typically many interdependencies between application servers and databases for most modern applications, and they need to be brought up on the right platform in the correct sequence. Clients must be able to connect to the new servers as if they were connecting to the old servers; so infrastructure servers such as directory services, network routers and phone lines, if required, need to be available.
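The arithmetic behind items 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the per-stage throughput figures are hypothetical, while the 110-terabytes-per-20-terabytes ratio and the "recovery takes about twice as long" rule of thumb come from the text.

```python
def tape_media_tb(production_tb: float) -> float:
    """Tape media to manage, using the article's ratio of 110 TB per 20 TB."""
    return production_tb / 20 * 110


def end_to_end_tb_per_hour(*stage_rates: float) -> float:
    """The slowest stage in the data path sets the effective backup rate."""
    return min(stage_rates)


def backup_hours(capacity_tb: float, tb_per_hour: float) -> float:
    """Backup time = capacity needing protection / end-to-end throughput."""
    if tb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return capacity_tb / tb_per_hour


def recovery_hours(backup_time: float, factor: float = 2.0) -> float:
    """Rule of thumb from the text: recovery is usually twice the backup time."""
    return backup_time * factor


# Hypothetical rates in TB/hour: network, backup server, storage, backup target.
rate = end_to_end_tb_per_hour(1.0, 2.0, 3.0, 4.0)
window = backup_hours(20, rate)
print(tape_media_tb(100), rate, window, recovery_hours(window))
```

Note how the fast backup target (4 TB/hour in this made-up example) does nothing for the window, because the network stage caps the whole path.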
Let’s consider each of these issues individually. We’ll use a phased approach to optimize the process and reduce current costs, so you can justify a new solution to replace your existing systems.
Phases to Optimize Backup and DR
1) Add data deduplication to your backup operations to reduce backup costs and optimize replication.
2) Implement snapshots and continuous data protection (CDP) to eliminate bulk data movement for backup and DR.
3) Use WAN optimization and delta versioning with encryption to reduce risk and WAN requirements by 90 percent.
4) Use CDP to reduce recovery times to a few minutes and eliminate data loss.
5) Virtualize your storage and servers to reduce infrastructure costs.
How to calculate the benefits:
Phase 1: Add Dedupe
These calculations do not cover the cost of replacing backup software, as it is difficult to rip and replace. Just remember that, using standard backup software, for every 20 terabytes of production data you may need 110 terabytes of tape, depending on your data retention requirements.
Assuming LTO3 drives and 20 terabytes of production data:
- 110TB/400GB (capacity of LTO3) = 112,640GB/400GB = 282 tapes
- 282 x $70 = $19,740
- 80MB/s (speed of LTO3) = 6.91TB per day
- To back up 20TB in a 12-hour window, you need 6 drives
- 6 drives = $4,500 x 6 = $27,000
- Total = $46,740 for each 20TB.
- Price to provide 40TB of tape capacity = $46,740 (base costs) + $19,740 (media) = $66,480
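The tape figures above can be reproduced with a short Python sketch. The prices and LTO3 figures are the ones quoted in the list; the function names are made up for illustration, and the unit handling (1,024GB per terabyte for tape counts, decimal megabytes for drive throughput) simply mirrors the way the list appears to do its math.

```python
import math

LTO3_CAPACITY_GB = 400   # native capacity quoted in the text
LTO3_SPEED_MB_S = 80     # native speed quoted in the text
TAPE_PRICE = 70          # dollars per tape, from the text
DRIVE_PRICE = 4_500      # dollars per drive, from the text


def tapes_needed(media_tb: float) -> int:
    """Tapes required to hold media_tb, counting 1,024 GB per TB."""
    return math.ceil(media_tb * 1024 / LTO3_CAPACITY_GB)


def drives_needed(production_tb: float, window_hours: float) -> int:
    """Drives required to move production_tb within the backup window."""
    tb_per_drive = LTO3_SPEED_MB_S * 3600 * window_hours / 1_000_000
    return math.ceil(production_tb / tb_per_drive)


tapes = tapes_needed(110)
drives = drives_needed(20, 12)
media_cost = tapes * TAPE_PRICE
drive_cost = drives * DRIVE_PRICE
base_cost = media_cost + drive_cost
# The text's 40TB figure adds one more set of media to the base cost.
cost_40tb = base_cost + media_cost
print(tapes, drives, media_cost, drive_cost, base_cost, cost_40tb)
```

Swapping in your own tape generation, prices and backup window is a matter of changing the four constants at the top.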
Dedupe to the rescue:
Implement a single 4TB dedupe appliance that will hold 40TB at a 10:1 ratio = approximately $17,700.
And $66,480 – $17,700 = $48,780 in savings, or about 73 percent. Put another way, the tape setup costs roughly 375 percent of the appliance price. Purchase another dedupe solution to replicate the data off site; this eliminates off-site tape movement and recall costs, eliminates array-based replication licenses for that data and, at a 10:1 dedupe ratio, reduces the WAN requirements to replicate that data by 90 percent. Deduped replication savings:
- Cost of off-site tape storage contract – $17,700 (the second dedupe appliance) = net savings
- Cost of array licenses and storage for replication = array license, storage and maintenance savings
- WAN costs to replicate 20TB of data versus 2TB of data (10:1 dedupe ratio) = annual WAN savings
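As a rough check on the dedupe numbers, the following Python sketch recomputes the savings, the effective appliance capacity and the replicated volume. The dollar amounts and the 10:1 ratio are taken from the text; the function names are hypothetical, and your actual dedupe ratio depends heavily on the data.

```python
def savings(old_cost: float, new_cost: float) -> tuple[float, float]:
    """Return (dollars saved, percent saved relative to the old cost)."""
    saved = old_cost - new_cost
    return saved, 100 * saved / old_cost


def effective_capacity_tb(raw_tb: float, dedupe_ratio: float) -> float:
    """Logical data an appliance can hold at the given dedupe ratio."""
    return raw_tb * dedupe_ratio


def replicated_tb(data_tb: float, dedupe_ratio: float) -> float:
    """Data that actually crosses the WAN after deduplication."""
    return data_tb / dedupe_ratio


def wan_reduction_percent(dedupe_ratio: float) -> float:
    """Percent of WAN traffic avoided at the given dedupe ratio."""
    return 100 * (dedupe_ratio - 1) / dedupe_ratio


saved, percent = savings(66_480, 17_700)
print(saved, round(percent, 1))
print(effective_capacity_tb(4, 10), replicated_tb(20, 10), wan_reduction_percent(10))
```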
Phase 2: Implement Snapshots and CDP
Let’s say your current backup window is 12 hours for your 20 terabytes of production data to tape. Adding more tape drives may not fix the problem if the network is your bottleneck. Implementing snapshot-based backup will reduce the backup window from the current 12 hours to a couple of seconds. Moving from traditional bulk backup to snapshot backup eliminates the physical data movement from Point A to Point B to copy the data, which means the physics of the process changes. With snapshots, you can back up more often, and data recovery is typically much faster. CDP takes that paradigm even further by moving from more frequent periodic backup to always backed up, which means zero data loss.
Now let’s calculate the benefits of implementing Phase 2. Assuming the cost of downtime is calculated using the numbers provided in the chart for a media company and assuming an average recovery time of only four hours to rebuild a critical server, the calculations are as follows:
- $340,000 x 4 = $1,360,000 is the cost of a four-hour outage today.
- Cost of new CDP solution (assume 2 sites at $100,000 per location) = $200,000.
- $1,360,000 – $200,000 = $1,160,000.
- Recovery time for CDP solution = 30 minutes ($340,000/2 = $170,000).
- $1,160,000 – $170,000 = $990,000 savings per outage.
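The same outage math can be written as a small Python sketch so you can plug in your own hourly downtime cost. The $340,000 hourly figure, the four-hour and 30-minute recovery times and the $100,000-per-site price are the assumptions used in the text; the names are illustrative only.

```python
def outage_cost(cost_per_hour: float, recovery_hours: float) -> float:
    """Lost revenue for one outage of the given length."""
    return cost_per_hour * recovery_hours


HOURLY_COST = 340_000          # media company figure used in the text
SITES = 2
CDP_COST_PER_SITE = 100_000    # assumed price per location, from the text

current = outage_cost(HOURLY_COST, 4)       # rebuild a critical server today
with_cdp = outage_cost(HOURLY_COST, 0.5)    # 30-minute recovery with CDP
cdp_purchase = SITES * CDP_COST_PER_SITE

# Savings on the first outage, after paying for the CDP solution.
first_outage_savings = current - cdp_purchase - with_cdp
print(current, with_cdp, cdp_purchase, first_outage_savings)
```

Because the purchase price is paid once, any later outage would save the full difference between the two recovery costs.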
Phase 3: WAN Optimization
If you have already purchased the dedupe solution in Phase 1, you may not even need to purchase anything else to reduce WAN requirements. Many have found that disk-to-disk-based replication from a CDP or dedupe solution enables them to turn off their current expensive array-based replication systems. CDP solutions tend to replicate data very efficiently, and dedupe reduces WAN requirements by 90 percent or more. Let’s say you can turn off just half of the current WAN bandwidth you need to replicate your 20 terabytes—that’s a 50 percent WAN savings over the current solution.
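To get a feel for what a 90 percent reduction means for the WAN link, here is a minimal Python sketch that estimates the sustained bandwidth needed to replicate one day of changed data. It assumes the 2 percent daily change rate mentioned earlier, a 24-hour replication window and decimal units; the function name and the window are hypothetical, and the estimate ignores protocol overhead and peak loads.

```python
def wan_mbps_needed(change_tb_per_day: float,
                    window_hours: float = 24.0,
                    dedupe_ratio: float = 1.0) -> float:
    """Sustained megabits per second needed to ship one day's changes."""
    megabits = change_tb_per_day * 8_000_000 / dedupe_ratio  # 1 TB = 8,000,000 Mb
    return megabits / (window_hours * 3600)


daily_change_tb = 20 * 0.02    # 2 percent of 20 TB of production data
raw_mbps = wan_mbps_needed(daily_change_tb)
deduped_mbps = wan_mbps_needed(daily_change_tb, dedupe_ratio=10)
print(round(raw_mbps, 2), round(deduped_mbps, 2))
```

Size the link for your peak periods, such as end-of-month processing, rather than for the daily average this sketch produces.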
Phase 4: Use CDP to Speed Recovery and Eliminate Data Loss
The benefits of Phase 4 are already outlined in the savings of Phase 2. The added benefit of reducing data loss to zero is also difficult to calculate. The savings depend on the type of business. As an example, in a hospital, loss of data may mean loss of life. If you are running a trading application for a major stock exchange, a single second may encompass thousands of individual financial transactions worth millions of dollars. If your company writes software, an outage may mean hours of irretrievably lost coding. I recommend using an industry chart to determine the benefits of reducing the possibility of data loss and the improved recovery times provided by CDP.
Phase 5: Virtualize Servers and Storage
Implementing virtualization can have a huge impact in multiple areas:
- Server virtualization commoditizes servers and enables server consolidation.
- Storage virtualization commoditizes storage and enables complete data mobility.
- The ability to replicate data between unlike storage reduces storage costs by up to 50 percent.
- Consolidating multiple applications onto virtual servers reduces infrastructure costs by 90 percent.
- Dense virtual storage and servers reduce data center power, cooling and floor space requirements.
For example, a single blade server may be able to run 50 to 100 applications, versus 50 to 100 physical servers. Virtual storage simplifies provisioning and enables storage pooling and tiering.
With the right data, you can justify your purchase of innovative technology that can optimize your IT infrastructure to help the business and its mission. And an additional benefit of that purchase just might be reduced stress and some free time to take in a game over the weekend!
About the Author
FalconStor VP of Enterprise Solutions Christopher Poelker is a highly regarded storage expert with decades of experience, including positions as storage architect at HDS and lead storage architect at Compaq. Poelker also worked as an engineer and VMS consultant at Digital Equipment Corporation. Recently, Poelker served as deputy commissioner of the TechAmerica Foundation Commission on the Leadership Opportunity in U.S. Deployment of the Cloud (CLOUD²).
Photo courtesy of Dan Century.