2/26/2009
12:15 PM
Howard Marks

New Options Power Always-On Apps

Know your systems to find the failsafe failover that works best.

Enterprises of all sizes demand 24/7 application delivery. Server failures, maintenance downtime, and acts of nature are no excuse. If keeping key applications online is your job, you should consider yourself lucky: You have more options for keeping apps up than ever before.

With the tools available today, organizations have few excuses -- not even budgetary ones -- for relying on the hours-long process of manually restoring mission-critical apps from backup.


Application failover approaches run the gamut, from basic clustered server "ping and a prayer" software to complete virtualized systems and application-specific schemes. Finding the one that's right for you will involve more than a glance at the price tag, which runs from $1,500 to $10,000-plus per protected server. You'll also need to consider ease of use, speed of failover, bandwidth consumption, and how much data is at risk.

When most system administrators look to improve application availability, they start with server clusters. Failover clustering has been available in Windows Server's Enterprise Editions since the mid-1990s, when Windows NT 4 was state of the art, but it developed a well-deserved reputation for being finicky.

Windows clusters used shared storage, which of course made the storage subsystem a single point of failure. And until Windows Server 2008 was released, Microsoft insisted on supporting only integrated server and storage solutions, so users who had Hewlett-Packard servers and EqualLogic storage, for example, were out of luck when it came to support. Most significantly, applications had to be cluster-aware to fail over smoothly from one server node to another.

Even before Microsoft added clustering to Windows itself, vendors like Double-Take Software released solutions that combined data replication, which eliminates storage as a single point of failure, with automatic failover. Early versions of these products required a lot of setup and tweaking, including installing the OS and applications on both servers. However, the current crop, such as SteelEye's LifeKeeper, CA/XOsoft's WANsync, NeverFail's Continuous Protection Suite, and, of course, Double-Take, can clone a production server to the standby server, both speeding setup and ensuring the servers are similarly configured. And some of these offerings support Linux clustering as well as traditional Windows clusters.

In a generic cluster or high-availability system, the failover server, or servers, monitor the primary host by exchanging heartbeat messages across the network (see diagram, "Two Ways To Keep Apps At Your Service"). If the primary host doesn't respond within a given period of time, the standby server assumes the primary host's identity and starts processing data in its place.
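The heartbeat logic described above can be sketched in a few lines of Python. This is a minimal illustration and not any vendor's implementation: the class name `HeartbeatMonitor`, the five-second timeout, and the explicit `now_s` timestamps are all assumptions chosen to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Standby-side view of the primary host (hypothetical, for illustration)."""
    timeout_s: float          # assumed grace period before failover
    last_seen_s: float = 0.0  # time of the most recent heartbeat
    active: bool = False      # True once the standby has taken over

    def record_heartbeat(self, now_s: float) -> None:
        # Called whenever a heartbeat message arrives from the primary.
        self.last_seen_s = now_s

    def check(self, now_s: float) -> bool:
        # Promote the standby if the primary has been silent too long.
        if not self.active and now_s - self.last_seen_s > self.timeout_s:
            self.active = True
        return self.active


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.record_heartbeat(now_s=10.0)
print(monitor.check(now_s=12.0))  # False: primary heard from 2 seconds ago
print(monitor.check(now_s=16.0))  # True: primary silent for 6 seconds
```

Once promoted, the standby in this sketch stays active even if heartbeats resume, which mirrors the common practice of failing back manually rather than automatically.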

This method can prevent data loss due to a complete failure of the primary host and allows manual failovers for patching and other server maintenance, but it can't detect more subtle failures of services and daemon processes. Vendors including SonaSoft and Marathon sell more app-aware offerings, which check the state of services or connect directly to applications to ensure they're running.
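To make the difference concrete, here is a rough sketch of a service-level check, as opposed to a host-level heartbeat. The function names `tcp_probe` and `failed_services` are hypothetical, and the sketch does not reflect how SonaSoft, Marathon, or any other product works internally. It shows only the general idea: a host can answer heartbeats while one of its services is down.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(host: str, port: int, timeout_s: float = 1.0) -> Callable[[], bool]:
    """Build a probe that reports whether a TCP service accepts connections."""
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            return False
    return probe


def failed_services(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of services whose probe reports failure."""
    failed = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a crashing probe counts as a failed service
        if not healthy:
            failed.append(name)
    return failed


# Stand-in probes: the host is up, but its mail service has stopped.
probes = {"web": lambda: True, "mail": lambda: False}
print(failed_services(probes))  # ['mail']
```

In practice, a probe such as `tcp_probe("mail.example.internal", 25)` (a made-up host name) could replace the stand-in lambdas, and a non-empty result would trigger failover even though the server itself still responds.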

Products also use different methods to allow a standby server to assume the identity of a production server in the event of a failure. The simplest way is to assume the production server's IP address and start appropriate services. A more sophisticated approach used by NeverFail and others is to hide the standby server behind an internal firewall to prevent users from accessing it until it's called on to take on the primary server role. At the top end of the product spectrum, Marathon's EverRun runs the primary and standby servers in lockstep in a virtual environment. Each server processes all data, but users access only the primary one. The backup server waits in the wings until something goes wrong.
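The lockstep idea, in which both servers process every request but users see only the primary's replies, can be illustrated with a toy model. This is a conceptual sketch only, not a description of how EverRun is built: the `Replica` and `LockstepPair` classes are hypothetical, and the "application" is just a running total.

```python
class Replica:
    """A toy server whose state is a running total of the requests it has seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0
        self.alive = True

    def process(self, amount: int) -> int:
        self.total += amount
        return self.total


class LockstepPair:
    """Feeds every request to both replicas; only the primary's reply is returned."""

    def __init__(self) -> None:
        self.primary = Replica("primary")
        self.backup = Replica("backup")

    def handle(self, amount: int) -> int:
        if not self.primary.alive:
            # Promote the backup, which already holds identical state.
            self.primary, self.backup = self.backup, self.primary
        if not self.primary.alive:
            raise RuntimeError("no live replica")
        reply = self.primary.process(amount)
        if self.backup.alive:
            self.backup.process(amount)  # same input, output discarded
        return reply


pair = LockstepPair()
pair.handle(5)
pair.handle(3)
pair.primary.alive = False  # simulate a failure of the primary
print(pair.handle(2))       # 10: the former backup continues from the same state
```

Because the backup has applied the same inputs all along, no state needs to be copied at failover time, which is what distinguishes this approach from replication followed by a service restart.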

diagram: Two Ways To Keep Apps At Your Service
In a high-availability cluster (left), a data center's standby server can adopt the primary server's IP address and identity when the primary doesn't respond over the heartbeat link. Effective disaster recovery (right) demands a more sophisticated scheme: In the event of server failure, the data must fail over to the standby server, often in a remote facility, via a WAN link.
