How One SMB Slashed Disaster Recovery Time
Midmarket firm Hit Promotional cut its backup window from around 17 hours to two. The CIO explains how it streamlined disaster preparedness in its virtualized environment.
Ask Hit Promotional Products CIO Eric Shonebarger about his biggest pain point and you'll hear a familiar answer: Data, and lots of it.
The branded promotional gear wholesale distributor has grown rapidly in recent years. At the same time, the business has increasingly asked IT to connect disparate systems--even if those systems were never intended to interact. Compounding the challenge: Shonebarger's 10-person team, most of whom are focused on development, serves 900 employees and counting. All told, Shonebarger's department must manage about 24 TB of data, triple the amount it had three years ago--no small chore for a midmarket firm.
"When you're dealing with both [unstructured and structured data], it's just a massive undertaking," Shonebarger said in an interview. "To make sure the data is up-to-date, to get it where it needs to belong at the right time, to store it, back it up, retrieve it--all of that is getting more complicated."
Virtualization is another part of the story--more than 90% of Hit Promotional's IT infrastructure runs virtually, and that will soon hit 100%. That wasn't always the case. The company first dipped its toes into the virtual pool back in 2007 when it turned an extra server into a virtual host to test the technology. "Before we knew it, we had 15 or 16 [virtual machines] running on it, and no redundancy," Shonebarger said. In 2009, Hit Promotional made a significant investment to take everything but its email system virtual, deploying a VMware environment on Dell servers, backed up to an EqualLogic storage area network (SAN). For all of virtualization's benefits, the transformation shined a light on some flaws in Hit Promotional's data strategy: Like plenty of other small and midsize businesses (SMBs), it tried to retrofit existing technology and processes to a new world order.
"We were treating those virtual machines just like they were physical machines," Shonebarger said. That was particularly true of Hit Promotional's existing backup and recovery methods--and therein lies an early lesson learned: A significant change like moving from physical to virtual infrastructure has wider implications than a few servers or desktops. The company had been using Symantec's Backup Exec software, but Shonebarger encountered a significant problem after the virtualization makeover: The backup proxy required a direct connection to the storage logical unit number (LUN).
"That's really, really dangerous to have, say, a Windows box with full-blown access to do whatever it wants to VMware LUNs," Shonebarger said. In that scenario, based on an actual one that occurred at Hit Promotional, if an administrator were to provision more storage locally, those LUNs would appear in Device Manager as if they were the same as USB drives. "Hey, there's a 2 TB LUN here, would you like to format it? All it takes is a single admin to say 'yes' and your whole LUN's gone. And VMware's resilient enough that it won't even tell you there's a problem until it runs out of disk space, can't put stuff into cache anymore, and takes a complete dive on you," Shonebarger said.
The breaking point: Shonebarger realized that a total IT disaster requiring a complete restore would likely take days to recover from, a scenario his business couldn't afford. Hit Promotional tried out a few different options and settled on Veeam, which builds its backup and recovery platform specifically for virtual environments. The first key gain: In that total restore scenario, Shonebarger said data interruption to his users would be a matter of seconds rather than days. That's enabled by Veeam's vPower feature, which will run a VM directly from Shonebarger's disk-based backups while the SAN is restored simultaneously. It becomes a see-no-evil, hear-no-evil proposition for end users.
"[As] IT, we're sometimes aware of problems or mistakes that, if we can keep it from the users and they can still have their normal experience, they don't care if it takes me a day to recover from failure," Shonebarger said. "They just want to know: Is my data available, is the system up and running, and can I get my job done?"
Similarly, Shonebarger has cut down a 17-to-18-hour window for a full SQL database backup--which rendered the database effectively unusable--to around 2 hours. The Veeam platform has also enabled significant compression of Hit Promotional's growing data; the company is in the process of upgrading its SAN, but can currently back up its 24 TB to a target that only actually holds 16 TB, with 5 TB still free.
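For readers who want to see what those figures imply, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The function names are illustrative, and it assumes that all the space used on the 16 TB target belongs to the 24 TB being protected, which ignores how many restore points are retained. The result is an estimate, not a figure reported by Hit Promotional or Veeam.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption (not from the source): every used terabyte on the backup
# target is attributable to the 24 TB of protected data.

def used_space_tb(capacity_tb: float, free_tb: float) -> float:
    """Space consumed on the backup target."""
    return capacity_tb - free_tb

def reduction_ratio(source_tb: float, stored_tb: float) -> float:
    """How many TB of source data fit in one TB on the target."""
    return source_tb / stored_tb

source_tb = 24.0            # data under management
target_capacity_tb = 16.0   # size of the backup target
target_free_tb = 5.0        # space still free on the target

stored_tb = used_space_tb(target_capacity_tb, target_free_tb)
ratio = reduction_ratio(source_tb, stored_tb)
space_saved_pct = (1 - stored_tb / source_tb) * 100

# Backup window: 17 to 18 hours before, about 2 hours after.
speedup_low = 17 / 2
speedup_high = 18 / 2

print(f"Stored on target: {stored_tb:.0f} TB")
print(f"Implied reduction ratio: {ratio:.2f}:1")
print(f"Space saved: {space_saved_pct:.1f}%")
print(f"Backup window speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

Under that assumption, the 24 TB occupies about 11 TB on the target, a reduction of a little better than 2:1, and the SQL backup window shrinks by a factor of roughly 8.5 to 9.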
Shonebarger offers some parting shots of advice for fellow SMB execs: He beats the recovery testing drum, for one. In his view, too many SMBs back up data without ever pondering the total disaster scenario--and what it would take to recover those backups effectively and efficiently. More importantly, he thinks there's a common misconception that replication equals safety.
"Don't confuse high availability with backup and recovery," Shonebarger said. By that, he means deploying a replicated SAN, for example, and thinking your job is done. While redundancy can be a good thing, it's no use if the original data is bad, because the replica faithfully copies the damage. The aforementioned LUN-formatting scenario is one example in the bad data category, and there are plenty of others.
"That's not smart," Shonebarger said. "That can come back to bite you in a pretty bad way."
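Shonebarger's distinction between replication and backup can be illustrated with a toy model. The Python sketch below is purely illustrative and does not model any real SAN or Veeam behavior. Plain dictionaries stand in for storage volumes, and the names `mirror` and `take_backup` are hypothetical. It shows only the general principle: a replica tracks the primary's current state, mistakes included, while a backup preserves an earlier point in time.

```python
import copy

def mirror(primary: dict) -> dict:
    """Hypothetical replication step: the replica becomes a copy of
    whatever the primary holds right now, good or bad."""
    return dict(primary)

def take_backup(primary: dict) -> dict:
    """Hypothetical backup step: an independent point-in-time copy."""
    return copy.deepcopy(primary)

# Healthy state: primary volume, replica, and last night's backup agree.
primary = {"orders.db": "good data"}
replica = mirror(primary)
backup = take_backup(primary)

# An admin accidentally formats the primary volume.
primary.clear()

# Replication does its job and carries the empty state to the replica.
replica = mirror(primary)

# Only the point-in-time backup still holds the original data.
restored = copy.deepcopy(backup)

print("replica after the mistake:", replica)
print("restored from backup:", restored)
```

In this toy run the replica ends up just as empty as the primary, which is the trap Shonebarger describes. Recovery depends on the separate point-in-time copy.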