Niemann Capital Management (NCM) is an innovative investment management firm distinguished by its tactical asset allocation and rotation methodology. With preservation of capital as the cornerstone of its philosophy, NCM offers a range of conservative, moderate and aggressive managed-account strategies that use mutual funds and exchange-traded funds. Since 1991, the firm has used a proprietary methodology and a disciplined process to analyze current world market conditions daily, seeking the greatest potential return with the least possible risk for investors.
The Challenge: Enable Quick Recovery of Business Operations During Downtime Events
Niemann Capital Management is headquartered in Scotts Valley, California, and uses a Silicon Valley–area Equinix facility as its primary data center. A secondary site in Carson City, Nevada, serves as its recovery site. The company's IT infrastructure is largely Windows based and virtualized, with at least 150 virtual servers. NCM backed up data directly from production machines to disk and supplemented those backups with tape copies stored offsite to meet the retention requirements of current compliance regulations. Numerous feeds arrive daily from financial institutions at NCM's mission-critical SQL servers, providing the information advisers need to make investment management decisions. The company needed to be able to recover quickly from a downtime event and to have a backup process that did not affect the production environment.
“When you’re managing other people’s money, you’re highly accountable for everything that goes on, and the liability can be pretty extreme,” said John Etheridge, IT Director, Niemann Capital Management. “In our business, if the servers go offline or we have downtime at the wrong time in the day, that hinders our ability to trade, distribute funds, liquidate and so forth, so it’s critical for us to be able to recover our data and maintain operations. In the past, if I were to lose a site, the best I could hope for was to get back online in two days. It would then take another 12 days to get everything running at full capacity, and that just wasn’t acceptable. My goal was to find a solution with the recovery capabilities we needed, within the parameters we had to work with.”
The Solution: Application-Consistent Recovery With InMage
As he began the search for the right recovery solution, Etheridge analyzed such factors as the number of IT personnel, financial commitment and time commitment that each product would ultimately require.
He initially considered fault-tolerant sites but ultimately decided the return on investment was not compelling enough.
“I call it the 100-mile limit,” said Etheridge. “Even if we have a fault-tolerant scenario with two sites running real-time data, any time you get past that 100-mile limit, you have a significant problem to deal with because of the distance the data has to travel. Fault-tolerant sites are very expensive and require a lot of maintenance, work and personnel. The ROI just wasn’t there, so we decided to look for other solutions that focused on bringing our primary site back online.”
“There wasn’t a single solution out there that we didn’t evaluate,” continued Etheridge. “We looked at the very mature products, yet they still had limitations, and the cost factor was enormous. Other products were fully automated, but they still required a tremendous amount of maintenance to make sure they worked properly, and that wasn’t a task we wanted to take on. Customer support was often lacking as well. When Hitachi Data Systems introduced us to InMage, we found what we were looking for.”
NCM implemented InMage Scout, a disaster recovery and business continuity software solution from Santa Clara–based InMage Systems. InMage's hybrid recovery technology provides granular recovery capabilities designed to meet the most stringent recovery point objectives (RPOs) and recovery time objectives (RTOs) while eliminating backups as a discrete operation. InMage's cornerstone technologies include continuous data protection (CDP), asynchronous replication, application failover/failback and WAN optimization.
“The strategy was to replicate data from our primary site in California to our secondary site in Nevada, and be able to perform a manual recovery that met our RTO of two hours and our RPO of 30 minutes,” says Etheridge. “Most manual recovery solutions are extremely onerous – a lot of things have to happen at the same time in order for them to work properly. With InMage, we could do point-in-time recovery with application-consistent recovery points. This was a huge benefit, since we work with multi-tier applications. Whether we go back 10 minutes or to the exact same second, we know that our data and applications will come up properly.”
InMage Scout uses application-specific APIs to mark application-consistent points (referred to as AppShots) in the data stream. AppShots are most often used for recovery because they support the shortest RTOs, but InMage can reliably re-create any previous point in time, regardless of whether it is application consistent (like an AppShot) or crash consistent (any other point in the data stream).
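To make the AppShot/crash-consistent distinction concrete, the sketch below shows one way a recovery tool might pick a point in time from a CDP journal: take the latest application-consistent bookmark at or before the requested time, and fall back to the latest crash-consistent point only if no bookmark qualifies. The `RecoveryPoint` type, the `choose_recovery_point` function and the sample timestamps are hypothetical illustrations, not part of InMage Scout's interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecoveryPoint:
    """One point in a hypothetical CDP journal."""
    timestamp: datetime
    app_consistent: bool  # True for an application-consistent bookmark ("AppShot")


def choose_recovery_point(
    points: Sequence[RecoveryPoint],
    target: datetime,
    prefer_app_consistent: bool = True,
) -> Optional[RecoveryPoint]:
    """Return the latest point at or before `target`, or None if there is none.

    With `prefer_app_consistent`, the latest application-consistent bookmark
    wins if one exists; otherwise the latest point of any kind is returned
    (which may be merely crash consistent).
    """
    eligible = [p for p in points if p.timestamp <= target]
    if not eligible:
        return None
    if prefer_app_consistent:
        app_points = [p for p in eligible if p.app_consistent]
        if app_points:
            return max(app_points, key=lambda p: p.timestamp)
    return max(eligible, key=lambda p: p.timestamp)


# Hypothetical journal: AppShots at 09:00 and 09:10, crash-consistent points between.
base = datetime(2024, 1, 1, 9, 0)
journal = [
    RecoveryPoint(base, True),
    RecoveryPoint(base + timedelta(minutes=5), False),
    RecoveryPoint(base + timedelta(minutes=10), True),
    RecoveryPoint(base + timedelta(minutes=12), False),
]
target = base + timedelta(minutes=13)

print(choose_recovery_point(journal, target).timestamp)  # 2024-01-01 09:10:00
print(choose_recovery_point(journal, target, prefer_app_consistent=False).timestamp)  # 2024-01-01 09:12:00
```

Preferring the bookmark gives up two minutes of changes in this sample (09:10 instead of 09:12) in exchange for a point at which the applications are known to come up cleanly, which is consistent with the text's note that AppShots support the shortest RTOs.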
Other advantageous features for NCM included InMage’s replication processes, bandwidth utilization and bandwidth optimization. Etheridge had previously looked at several bandwidth optimization products, but he found the learning curve and constantly changing compression ratios to be troublesome.
"The compression we get from InMage is phenomenal," says Etheridge. "We didn't need to spend weeks or months learning how to use it. The throttling works well, so we've always been able to use only a fraction of the bandwidth we thought we'd need. Instead of over 200 Mbit/s, 50 Mbit/s ended up working just fine for us."
Another particularly useful feature for NCM was InMage's sparse retention, especially for meeting compliance mandates. Sparse retention keeps long-term CDP information on disk while minimizing target storage utilization. As recovery points age, application-consistent bookmarks are retained at progressively less frequent intervals. For example, a sparse retention policy could be specified as follows: retain all changes for the last three days, one recovery point per hour for the four days beyond that, and one recovery point per day for older data. Users can then recover to application-consistent points further back in time without consuming significantly more target storage beyond the first week.
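The example policy above can be expressed as a small thinning routine. This is a minimal sketch under stated assumptions: every input timestamp is treated as an application-consistent bookmark, tier ages are measured from `now`, and within the hourly and daily tiers the latest bookmark in each calendar hour or day is kept. The names `apply_sparse_retention`, `KEEP_ALL` and `HOURLY_UNTIL` are hypothetical; the source does not describe how InMage's retention engine is implemented.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

# Tier boundaries from the example policy in the text (hypothetical configuration).
KEEP_ALL = timedelta(days=3)                  # keep every bookmark for the last 3 days
HOURLY_UNTIL = KEEP_ALL + timedelta(days=4)   # then one per hour for 4 more days
# Anything older than HOURLY_UNTIL keeps one bookmark per calendar day.


def apply_sparse_retention(bookmarks: Iterable[datetime], now: datetime) -> List[datetime]:
    """Thin bookmarks according to the tiered policy; returns them newest first."""
    kept: List[datetime] = []
    seen_buckets = set()
    # Walk newest to oldest so the first bookmark seen in a bucket is its latest.
    for ts in sorted(bookmarks, reverse=True):
        age = now - ts
        if age <= KEEP_ALL:
            kept.append(ts)
            continue
        if age <= HOURLY_UNTIL:
            bucket = ("hour", ts.replace(minute=0, second=0, microsecond=0))
        else:
            bucket = ("day", ts.date())
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            kept.append(ts)
    return kept


# Hypothetical input: one bookmark every 15 minutes for ten days.
now = datetime(2024, 1, 11)
bookmarks = [now - timedelta(minutes=15 * i) for i in range(10 * 24 * 4)]
kept = apply_sparse_retention(bookmarks, now)
print(len(bookmarks), len(kept))  # 960 388
```

In this sample the sketch keeps 388 of 960 bookmarks: all 289 from the last three days, 96 hourly points for the next four days, and 3 daily points for the oldest stretch, which illustrates why storage grows slowly once data ages past the first week.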
“Sarbanes-Oxley is a major mandate in our industry,” says Etheridge. “We use InMage’s sparse retention as part of our backup strategy and our primary methodology for recovering a file. Since the SEC requires seven-year retention and proof that files have not been amended during that timeframe, the simplest and easiest way to accomplish that is sparse retention. We can go back to a designated point in time, and our compliance department can quickly satisfy any requests that arise. Sparse retention is incredibly important in meeting many of our needs, and has proved especially beneficial for SOX guidelines.”
Putting It to the Test: InMage Meets RPO/RTO Benchmarks
InMage enabled Niemann Capital Management to meet its recovery time and recovery point objectives with benefits including application-consistent and granular recoveries, bandwidth and capacity optimizations, and remote administration so additional IT personnel were not required to manage the solution.
"We recently conducted a DR test, and our failover time from the moment we stopped operations at our primary site until we were completely operational at the secondary site was 26 minutes," says Etheridge. "That included all validation that everything was working properly and that we could perform all required functions. Our goal was for mission-critical applications to be functional within two hours, and we met that RTO in less than a quarter of the time. That really proved to us the value of the InMage solution. Failing back to the primary site was also incredibly simple in a test scenario. Another thing that's advanced about InMage is its file-system-aware replication. With most products, replication is at block level. If you have large blocks and only need one tiny bit of information, that's a lot of wasted bandwidth. With InMage's file-system-aware replication, we have cut back on bandwidth even more and really sped things up. Our RPO is 30 minutes, but most of the time we end up under two minutes, which is phenomenal."
"The product speaks for itself, but the cooperation we've received from InMage has also been second to none," concludes Etheridge. "We need to know we have a partner that takes our needs seriously and makes sure trained people are available in the event of a disaster. I personally test relationships, and the one we've had with InMage is the best support relationship I've ever had. It does a lot for our confidence level. If our entire IT department were unavailable, I'm certain InMage could perform a full site recovery. The bottom line is that we have a system for a fraction of the price we would have had to pay for a more conventional method."