What are the most common mistakes you're likely to hear a Linux sysadmin talk about, anyway? Here are a few that you'll see on almost everyone's short list of unpleasant learning experiences:
- Tripped Up Running Root. You should log in as root or use root privileges only when a specific task requires it. Simple mistakes, such as typing "rm -fr" when you're operating as the superuser, are fixed only by restoring your system from a backup -- not by saying "whoops," which is all you'll have to do if you're not running as root at the time.
Whether or not you meant to run as the superuser, think at least twice before hitting "return" when you see the telltale '#' prompt on the command line. Instead of becoming another sysadmin statistic, you can laugh about your brush with disaster over a beer -- or enjoy a good single-malt Scotch. (I recommend Dalwhinnie, at least 15yo.) And remember: Friends don't let friends see '#' at all, if it's avoidable.
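One way to build "think at least twice" into a script is to make root retype the command before anything runs. The following is only a minimal sketch under stated assumptions: `confirm_if_root` is a hypothetical helper (not a standard tool), and it relies on `os.geteuid()`, which exists only on Unix-like systems.

```python
import os


def confirm_if_root(command, euid=None, ask=input):
    """Return True if it is OK to go ahead with `command`.

    Ordinary users proceed without a prompt. The superuser must
    retype the command exactly, which forces a second look at it.
    `euid` and `ask` are injectable so the check can be tested.
    """
    if euid is None:
        euid = os.geteuid()  # effective user ID; 0 means root (Unix only)
    if euid != 0:
        return True
    typed = ask(f"You are root. Retype the command to run it:\n  {command}\n# ")
    return typed.strip() == command
```

A wrapper script might call `confirm_if_root("rm -fr /srv/old-site")` (the path is made up) and bail out when it returns False. It won't stop a determined typist, but it does turn one keystroke of regret into two deliberate ones.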
- This Is Only A Test. Or Is It? Confusing a test system with a live system is a silly mistake, but it happens to just about everybody -- once. Don't modify a working system unless you have the old system (i.e. the one that worked before your "harmless" tweak made it inoperable) fully backed up. Which brings us directly to our next mistake:
- The Blown-Off Backup. "I was going to make a backup next! Really!" That's sure to look good on your resume -- the one you'll need to search for a new job.
Seriously, make a decent backup plan and adhere to it. Make a full system backup right now. Hard disks are cheap these days: Pick up a spare, perhaps a USB or FireWire removable drive, and put it to good use. Even burning essential data to CDs or DVDs is better than doing nothing.
When you make a backup -- the one you're working on right now -- arrange to store it off-site if possible. Then, when you make the next backup, using another disk or other media, begin to rotate the disks. Also, after you complete every backup, be sure to confirm that it "took" and created a usable copy of your data. You'd be shocked to find out how many sysadmins learned the hard way that it didn't -- and there's rarely any way besides the hard way to learn something like this.
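Finally, don't blindly rely upon RAID to protect your data. RAID is a great idea and a good part of a data-protection policy -- if it's designed and used properly and consistently -- but it guards against a dead disk, not against a mistyped command or a corrupted file, which it will faithfully copy to every drive in the array. Using RAID improperly or infrequently is just a waste of time, and it will either make you go blind or grow hair on the palms of your sweaty sysadmin hands.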
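Confirming that a backup "took" can start with something as simple as comparing checksums between the original files and the copy. The sketch below assumes the backup is a plain directory tree that mirrors the source; `verify_backup` and `sha256_of` are hypothetical names, and a real test restore is still the stronger proof.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return sorted relative paths that are missing or differ in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue  # directories are covered by the files inside them
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(rel.as_posix())
    return sorted(problems)
```

An empty list from `verify_backup("/home", "/mnt/backup/home")` (both paths are illustrative) means every source file has a byte-identical copy. Files that changed after the backup ran will show up as mismatches, so run the check right after the backup finishes.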
- Patch Now Or Pay Later. It is better to reboot a server every once in a while, if possible when no one is using it, than to have it down all day when people need it the most. Applying a kernel patch or libc security fix may cost you some uptime, but not applying it could cost you a lot more.
It's a given that you'll want to update your system every now and then, whether for a minor desktop GUI tweak, a kernel upgrade, an application update, or a security patch. Don't avoid updates (fully backed up and at your convenience, of course), but don't get in too much of a hurry to install them, either. When you find out about an update for your system or application software, you should generally avoid being among the first to install it on a live system. Let somebody else play the guinea pig and try it out first: There's no real need to rush most upgrades, anyway.
There is an exception to the laid-back approach: security-inspired upgrades. If there's a new rootkit (hacker parlance for a tool kit that gives an intruder root privileges and hides the break-in) making the rounds, or any other security hole with a fix available, you should install the fix with as little delay as possible. That is, install it after you've backed up your system <hint> you know, the backup you've been meaning to make anyway </hint> and only after you've avoided another mistake:
- Don't Order 'Server Surprise.' In spite of the near-panic that ensues when a security-related release goes public, you should still make sure it's appropriate and actually fixes the problem it addresses. For production servers, this type of software testing should include deploying updates on a backup server: an identical machine with an identical software configuration.
Remember: Having a skeptical attitude and planning for the worst-case scenario are what really separate wizard sysadmins from newbies. Even if you carefully test software before you deploy it, you should always have a contingency plan in the event an upgrade doesn't go smoothly -- and upgrades rarely do. One such plan: Run the old and new systems in parallel until you're sure the new system works "as advertised." And did I mention backing up your system before you deploy new software?
Ross M. Greenberg has been sysadmining and enjoying good single malt since before there was a Linux. He's enjoyed a lot of Scotch and has made more than his share of stupid mistakes -- but not at the same time!