Whether you want to customize Knoppix, respin an existing distribution of the open-source operating system, like Puppy Linux, or are intent on creating your own package from scratch, we'll walk you through the process.

Serdar Yegulalp, Contributor

January 23, 2008


DIY: Do It Yourself. That's how Linux got started. A group of volunteers, inspired and led by Linus Torvalds, created the greatest DIY operating system the world has ever seen. You, too, can create your own Linux distribution. Here's how.





The Simple Remaster script in Puppy Linux lets you take an existing installation and respin the results out to an ISO file.


It sounds daunting, and there's a lot that's definitely not for beginners. But in terms of what you can learn along the way and what you end up with, it's on a par with building your own PC from scratch. Granted, as with building a PC from scratch, there are still plenty of reasons to buy something off the shelf -- or to grab an existing distribution and simply use that. That said, there are a few solid rationales for rolling your own distro:

For education. Sometimes the best way to get to know something is just to stick your hands under the hood and get 'em dirty -- with a little guidance, of course. There's no guarantee that you'll get to know everything, but you will gain a fearlessness about the process of learning and a sense of where to go to get answers about something (and how to get those answers) that you may not get by simply installing a populist's distribution and getting quotidian work done.

For specific problem-solving or filling a gap. If there's some oddball hardware configuration that you want to work with, for instance, you can create a customization of a given distribution to work well with that hardware. This might be something as simple as adding kernel-level support for a given device, or something more elaborate. Or maybe you want to create a distribution that fills a very specific need. (A while back, I wrote about Hikarunix, a sadly-now-discontinued distribution devoted to Go players; it doesn't get more specialized than that.)

For fun. I call this the "why not?" rule. Linux exists to be tinkered with, so tinker with it, and have a blast. Obviously you won't want to trust any production data to a system you're doing such work with, but that's no reason you can't have fun with it. And if something gets messed up, you can always wipe the slate clean and start over. (After all, you're not doing any of this in a production environment, right?)

Caveats

A few things are worth keeping in mind before you dive in and start rolling.

Learn a little something about Linux first. Before you attempt to create your own custom distro, make yourself at least moderately familiar with Linux if you haven't done so already. In short: use it at least a little bit. If you don't know the basics, there's little point in trying to create a custom distro.

Have patience and pay attention. Any project of this scope, even when automated to some degree, is likely to be long and frustrating. Don't expect to get it all done in a day, and be prepared to take notes as to what everything does and why.

Tweak at your own risk. Be mindful that any modifications you make could have unexpected repercussions. For instance, if you decide to disable kernel support for PC Card devices (perhaps because you're not using your custom build on a notebook and don't want to include those modules), you'll want to document that, so that anyone who uses a notebook and depends on such devices won't be stuck if they use your custom build. Even if you don't plan on distributing the resulting build, it's still a good idea to document any major functional changes. You'd be amazed at what you can forget further on down the line.

Go with stable over speedy. One thing you might discover along the way is the possibility of using compiler optimizations to speed things up. This means compiling the source code for your distribution's kernel or tool chain to use instructions implemented on a specific variety of processor, such as the multimedia MMX / SSE / 3DNow! extensions. The bad news is that implementing these optimizations can be a flaky affair, and sometimes you don't find out just how flaky until you've gone fairly far down into the build process. The default compiler flags should work just fine for beginners. As Jim Gifford puts it, "The fact that I don't have any problems compiling everything with [a certain instruction set] doesn't mean you won't have any problems either."

Don't eat your own dog food -- yet. The term "eating your own dog food" means using what you create in a quotidian fashion to test how good it is. The roll-your-own-Linux version of this sentiment would be to use the system you've created as your daily, production system. Unless you're working on copies and not original data, this is not the hottest idea in the world. Existing, publicly vetted distributions already have been given a thorough shakedown by the community and are less likely to have showstopper issues. Creating your own distribution comes with far less of an assurance that you'll get a given level of stability or feature completeness. But sometimes running into such walls headlong is a way to learn about them firsthand, if you have the time and inclination for it.

Respin An Existing Distribution

There are three basic ways to roll your own distribution, depending on the scope of what you want to accomplish and the level of technical expertise you have to bring to the project. The first, easiest, and possibly the most immediately useful to most people, is remastering an existing distribution.





Starting a Gentoo installation process with the Gentoo Live CD. The regular command-line version of Gentoo's install CD will work just fine as well, but some people may be more comfortable with a GUI.


Remastering, or respinning, involves installing a given distribution, customizing it, and then repackaging the distribution, modifications and all, back into an image file (typically an .ISO). In the last couple of years this approach has become much easier thanks to collections of community-created tools and scripts that automate the process, so it's rapidly becoming a native function for many distributions. If you're just getting your feet wet with Linux and want to try your hand at creating a modified distribution, this is the best place to start.

One of my most personally beloved distributions, Puppy Linux, can be remastered in this fashion in a number of different ways. The most basic way to do this is through the built-in Puppy Simple CD Remaster script, which repackages everything in the current live file system to a CD. The script will pause and prompt you as it goes along, letting you know when and where to make any changes you'd like to apply -- e.g., adding hardware customizations to the /etc directory. (Tip: Puppy's default file manager, ROX-Filer, lets you explore an .ISO file like a read-only directory, so you can peek manually at the results once you're done.)

Note that if you want to do this with a hard-drive-based install of Puppy, your best bet is to create a "frugal" install of Puppy on the hard drive, make whatever changes you want, and then respin that. The "frugal" install allows Puppy to coexist with other operating systems on the same partition (mainly other Linux installations). It stores the entire contents of the Puppy install as five big files, one of them an image file that represents Puppy's filesystem. This is as opposed to the "full" installation, where Puppy requires a whole partition unto itself and writes out all the files conventionally. The Simple CD Remaster script will not work properly as-is if you have a full installation, although allegedly it can be forced to do so.

The Simple CD Remaster script is probably the best place to get your feet wet, since it gives you some exposure to how all this works in a fairly controlled fashion. A more complicated and more technically advanced approach is the third-party HackyRemaster script. This script takes the contents of the Puppy CD or .ISO file, expands it to a working folder (which can be hosted anywhere), and lets you make any changes you like directly to the file system. Once finished, the whole thing can be recompressed and remastered back to an .ISO file.
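The expand-modify-rebuild cycle that remaster scripts of this kind automate can be sketched roughly as follows. This is not HackyRemaster's actual code: the function name and paths are hypothetical, the boot options assume the disc boots through ISOLINUX, and the sketch only prints the commands instead of running them. It also leaves out unpacking and recompressing the file system image stored inside the ISO.

```python
import shlex

def remaster_plan(iso_path, work_dir, out_iso):
    """Return the shell commands for a basic expand-modify-rebuild cycle.

    Nothing is executed; the caller decides whether to run the commands.
    All paths are hypothetical, and the boot options assume ISOLINUX.
    """
    mount_point = f"{work_dir}/mnt"
    tree = f"{work_dir}/tree"
    return [
        ["mkdir", "-p", mount_point, tree],
        # Loop-mount the ISO read-only so its contents can be copied out.
        ["mount", "-o", "loop,ro", iso_path, mount_point],
        # Copy everything, preserving permissions and links, into a writable tree.
        ["cp", "-a", f"{mount_point}/.", tree],
        ["umount", mount_point],
        # (Make your changes under `tree` at this point.)
        # Rebuild a bootable ISO from the modified tree.
        ["mkisofs", "-o", out_iso, "-R", "-J",
         "-b", "isolinux/isolinux.bin", "-c", "isolinux/boot.cat",
         "-no-emul-boot", "-boot-load-size", "4", "-boot-info-table",
         tree],
    ]

if __name__ == "__main__":
    for argv in remaster_plan("puppy.iso", "/tmp/remaster", "custom.iso"):
        print(shlex.join(argv))
```

Printing the plan first, rather than executing it, is a deliberate choice here: mounting and rebuilding images needs root privileges, and it is worth reading each command before letting it loose on a real disk.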

Puppy Unleashed involves taking a large (1.5 GB) archive of all the available packages for Puppy and using that to build a custom distribution. Obviously, the downside to this approach is that it requires that you download the whole package archive and know a fair amount about what you're tinkering with. Another possible starting point with Puppy is Empty Crust, a heavily stripped down version of Puppy Linux 1.0.7. It's admittedly several revisions behind the current version of Puppy when you take it out of the box, but still useful for this sort of work.

I've used Puppy as the major example for remastering/respinning Linux because it's one of the simplest, but there are many other major distributions that have similar functions. The ever-useful Knoppix, the live-CD distribution from which many others are commonly built, has a walkthrough that describes how to customize Knoppix quite thoroughly, from adding or removing packages to customizing the look-and-feel of the system. And Ubuntu, too, has a method for customizing its install CDs, although the tutorial in question isn't very automatic -- there are, however, community-written scripts to make the job that much easier.

As a side note, I should mention a clever online tool I encountered for creating your own custom distribution: the Custom NimbleX CD builder, which generates a custom-assembled .ISO of the tiny-but-useful NimbleX distribution. It's not as flexible as actually assembling a distribution by hand, but it's still quite useful.

Linux From Scratch

The next step up for those who are a little more ambitious about rolling their own distribution is Linux From Scratch.

LFS (as I'll abbreviate it herein) is both a distribution and an online guidebook for creating your own Linux distribution. The LFS LiveCD, which gives you a thoroughly spec'd-out environment for building your own Linux installation, includes a full copy of the book itself on the CD, and contains the sources you'll need to perform all the builds. It's to Linux what Heathkit was to radios and early personal computers.

LFS assumes that you already have a fair amount of working knowledge of Linux. At the very least, you should be able to find your way around the command line and follow directions. That said, one of the beauties of the LFS approach is that every single command you use to build the whole distribution is documented from the inside out, so you aren't just blindly following a set of instructions. The implications of everything you're doing -- every command, every syntax switch -- are made clear to you all along.





Preparing a Linux From Scratch session via one of the automation scripts. This isn't a substitute for knowing how to build a distribution in LFS; you still need to know how to conduct the build process from beginning to end.


The actual creation of the new LFS Linux system is done by using an existing Linux installation, a "host," as an environment in which to do the work. Most of the time, you can just grab the LFS Live CD and use that, since it includes an environment that's been specifically tailored for this kind of work and reduces the number of variables that might crop up.

There are some parallels between erecting a building from scratch and using LFS to build a Linux distribution, and since it's a metaphor that the LFS authors also have employed throughout their book, I'll use it as well when it's convenient.

1. Preliminaries. The first several steps are sort of like breaking ground and pouring the foundation for a new building. You'll be walked through setting up a file system (4 GB or so -- I'd say devote 8 GB or more; space is cheap), grabbing the basic set of packages needed to get things running, and setting up a few other prefatory bits like the user account you'll be using for most of the LFS work.

2. The Temporary System. The temp system is a little like the scaffolding for the building you're putting up -- it's not the building itself, but is essential to erecting it, and it will be removed when we no longer need it. The temp system consists mainly of the tool chain -- a set of utilities that you build that will in turn be used to build the distribution proper, such as the GCC compiler. The tools in the tool chain are themselves compiled from source -- a nice way to get some crash-course exposure to the concept of compiling from source, which is pretty indispensable when dealing with Linux and open-source software as a whole.

3. Building And Booting The System Itself. Here we actually get to begin constructing the distribution proper -- i.e., raising the building. As before, all this work -- like creating the directories most commonly used by the system -- will be done "by hand," with details along the way about what everything is and why it's implemented in this particular fashion. Then comes creating the boot scripts, which control the system startup process, making the system bootable, and (finally!) starting up your newly created LFS system.
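The compile-from-source pattern used to build the temporary tool chain usually comes down to three commands per package. The sketch below only assembles those commands for a hypothetical package directory; the /tools prefix, the package name, and the function name are assumptions for illustration, and the LFS book gives the exact options each package really needs.

```python
def build_steps(source_dir, prefix="/tools"):
    """Return (working directory, argv) pairs for a typical source build.

    The prefix keeps the temporary tools apart from the final system,
    so they can be removed once the real system has been built.
    """
    return [
        # Probe the host and generate Makefiles targeting the chosen prefix.
        (source_dir, ["./configure", f"--prefix={prefix}"]),
        # Compile the package.
        (source_dir, ["make"]),
        # Copy the results under the prefix.
        (source_dir, ["make", "install"]),
    ]

if __name__ == "__main__":
    # Hypothetical package directory, used only to show the output format.
    for cwd, argv in build_steps("/mnt/lfs/sources/example-1.0"):
        print(f"(cd {cwd} && {' '.join(argv)})")
```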

The project doesn't end there, either. There are several other entries in the LFS "family" of distribution-building projects, which you can use as the next step up. First and most likely is the appropriately named Beyond Linux From Scratch, which delves into the nitty-gritty of customizing just about every aspect of your newly created Linux distribution. This is where you want to go if you either have ambitions to turn your Linux distribution into something more upscale and usable by others, or if you just want to learn all the more about what goes into your typical Linux distribution (the various X servers, the different command-line shells, etc.).

Hardened Linux From Scratch lets you create a security-conscious version of Linux from the ground up, although the project is still somewhat in flux (in their words, "This book may be broken in some places, but less broken than before"). Cross Linux From Scratch lets you perform the LFS build process using cross-compilation: to quote their example, you could build a Sparc toolchain on an x86 machine, and then use that toolchain on the Sparc to build a Linux distribution entirely from source there. This is probably the most advanced of the LFS projects, and also the one with the narrowest scope of appeal, but still fascinating in its own right and useful if you're trying to build a Linux distribution for some exotic brand of hardware.

Yet another approach to the original LFS construction process is Automated Linux From Scratch, which gives you a high degree of automation for the LFS build process. The way this is done deserves some kind of Nobel Prize for cleverness: the entire LFS book itself is downloaded, and the script commands in the text are extracted and run automatically. This way, it works directly from the most recent version of the book, whatever it may be, and can be used to build any of the LFS projects listed above. Note that this is not a substitute for reading the book; you still need to perform a certain amount of work to prepare the system, and have an understanding of the LFS build process to begin with.
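The idea of pulling commands straight out of the book's text can be illustrated with a small sketch. It assumes the commands are marked up in userinput elements inside an XML source, in the style of DocBook; the sample fragment and the function name are made up for illustration, and this is not the actual ALFS code.

```python
import xml.etree.ElementTree as ET

# A made-up fragment in the style of a DocBook chapter (not real LFS text).
SAMPLE = """<sect1>
  <title>Example package</title>
  <para>Prepare the package for compilation:</para>
  <screen><userinput>./configure --prefix=/usr</userinput></screen>
  <para>Compile and install it:</para>
  <screen><userinput>make
make install</userinput></screen>
  <screen><computeroutput>sample output, not a command</computeroutput></screen>
</sect1>"""

def extract_commands(xml_text):
    """Return the text of every userinput element, in document order.

    Each element is kept as one block so multi-line commands stay together.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for node in root.iter("userinput"):
        text = "".join(node.itertext()).strip()
        if text:
            blocks.append(text)
    return blocks

if __name__ == "__main__":
    for block in extract_commands(SAMPLE):
        print(block)
        print("---")
```

Because the commands come from the same source as the prose, a tool built this way follows the book as it is revised, which is the property the paragraph above describes.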

Gentoo

No discussion of creating your own distribution from scratch would be complete without at least some discussion of Gentoo. Like LFS, Gentoo is built from the ground up using source code, not just during the initial setup process but later on down the line: packages can be obtained as source and compiled specifically for your machine's architecture, on-demand. And because of this, pretty much every aspect of the entire system -- the packages used, the optimizations used to compile them, and all the rest -- is at your disposal.

That's both the good news and the bad news. Yes, it means you have the power to configure the entire system from top to bottom, to make it yours in a way almost no other distribution allows -- a perfect tool for rolling your own. It also means you have the power to totally mess things up, or at the very least get stuck at any number of points along the way. Gentoo is definitely for experts only. As a friend once put it, "If you have to ask how hard it is, you probably shouldn't try it." That said, Gentoo can be, and is, used as a production environment by the careful and knowledgeable.

The process for creating a Gentoo system is fairly similar to creating an LFS system. First, you obtain one of several possible installation CDs, depending on the machine architecture you're working with. The 32-bit x86 version of Gentoo, for instance, comes on a different CD than the 64-bit edition. (Note that the links from here are to the x86 version of the documentation for Gentoo; you will need to look in a different place for other architectures.)

After booting the CD and configuring the network and installation partitions, you then download what's called a stage3 archive or stage3 tarball. The name "stage3" comes from the fact that earlier versions of Gentoo required that you go through up to two earlier stages of the installation -- bootstrapping the buildchain. The stage3 archive contains everything you need to get started as quickly as possible for your particular machine architecture without having to go through those first two stages. The next step involves making use of Gentoo's Portage package system to assemble the rest of the pieces and bring the system up to date. Because Gentoo revolves around source code as closely as possible, Portage doesn't grab precompiled packages from a repository -- it downloads the most recent source code for a given package, compiles it "on demand" for the platform you're running on, and then adds it to your system. Because compiling can be slow, especially if you're compiling a whole system's complement of A-list programs, some major applications (Firefox, for instance) are available in a precompiled package.

At this stage you also have the option of tweaking how Portage compiles packages, such as what optimizations are applied, although you'll generally want to keep this minimal the first time out. A similar amount of tweaking can be applied to building the kernel itself, so that features like multiprocessor/multicore or PC Card support can be enabled or disabled. Again, be warned that trying to tweak everything the first time out may only make things worse -- save it for when you've run through this whole process at least once with as close to a stock configuration as you're planning to use.
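As a rough illustration of what "keeping it minimal" can look like, the sketch below renders a few conservative settings in the style of Portage's make.conf file. The function name and default values are assumptions chosen for the example, not recommendations from the Gentoo documentation; the Gentoo Handbook lists the values appropriate to each architecture.

```python
def render_make_conf(cflags="-O2 -pipe", jobs=2):
    """Return make.conf-style text with deliberately conservative settings.

    The defaults avoid processor-specific optimizations on purpose.
    """
    lines = [
        # Compiler flags for C code; no CPU-specific tuning in this sketch.
        f'CFLAGS="{cflags}"',
        # Reuse the same flags for C++ code.
        'CXXFLAGS="${CFLAGS}"',
        # Number of parallel compile jobs handed to make.
        f'MAKEOPTS="-j{jobs}"',
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_make_conf(), end="")
```

Starting from something this plain makes it easier to tell, later on, whether a build failure comes from the package or from an optimization you added.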

Once you've created the system proper, one of the other key elements you may work with to further shape Gentoo is USE flags. These flags, set in an environment variable, are used in conjunction with building packages -- they let you include or exclude support for specific features from various packages as a way to streamline or expand your Gentoo build. For instance, if you're creating a system that doesn't need support for X11 (like a headless server that you're only administering from a command line), you can use the -X flag to negate compiling in support for X11.
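How a negated flag such as -X overrides a default can be sketched in a few lines. This is a simplified model, not Portage's own logic: real Portage combines more sources of flags than the two shown here, and the default flags listed are hypothetical.

```python
def resolve_use(*layers):
    """Combine USE-style flag strings; later layers override earlier ones.

    A bare flag enables a feature; a flag prefixed with "-" disables it.
    """
    enabled = set()
    for layer in layers:
        for flag in layer.split():
            if flag.startswith("-"):
                enabled.discard(flag[1:])
            else:
                enabled.add(flag)
    return sorted(enabled)

if __name__ == "__main__":
    defaults = "X alsa gtk ssl"   # hypothetical defaults
    mine = "-X -gtk ipv6"         # overrides for a headless server
    print(" ".join(resolve_use(defaults, mine)))  # prints "alsa ipv6 ssl"
```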

To create a system that you will actually be distributing to others, you need to get some experience with a Gentoo tool named Catalyst, which is designed specifically for building components of a released Linux distribution. The stage3 tarball you installed by hand earlier is one of the things you build with Catalyst. It's also possible to build a live CD using Catalyst as one of the tools to accomplish that (or build one from scratch without Catalyst, which allegedly has slightly more flexibility).

Share And Share Alike

So what's the next step after creating a distribution of your very own? Aside from "eating your own dog food," as described above, you might want to share the love: release it to the public.

It's an optional step, but a hugely useful one, and you may realize that the microdistribution you cobbled together has an audience after all. If your audience is thoughtful, you'll get valuable feedback on what works and what doesn't. And even if you have no intention of releasing the distribution broadly, you may learn things that help you improve what you're trying to do.

There are a few ways to get a distribution out there. One of the most common is to submit mention of it to DistroWatch, probably the most heavily trafficked site that deals with the panoply of Linux distributions out there. As I write this, its submissions form is offline pending some reworking, but DistroWatch can be contacted via e-mail; also note that it doesn't accept submissions for certain types of distributions at all (such as those hosted within Windows).

Actual hosting space for a distribution is another story. SourceForge generally doesn't accept hosting for Linux distributions due to the size involved, although it can be used to host the bug-tracking or discussion lists for such a project as long as the data itself is kept somewhere else. Fortunately, Web space has become amazingly cheap as of late, although to make the distribution a little less onerous you may want to use BitTorrent as a way to offset the bandwidth burden.

Remember to distribute the source tree as well. I'm assuming for the sake of argument that the distribution you'll be creating, and its component packages, are GPLed, so remember to also distribute the source code. The source doesn't have to be provided in the same package as the binaries, especially since the source tree can be quite large; several gigabytes isn't unheard of. What matters is that you make it available and document that fact, especially if you end up making any changes to the source.

Finally, speaking of documentation, that's something else you may want to provide with your new distribution. Whether it's just a simple set of README pages or a full wiki, you'll want to take the time to talk about what's special about your distro -- quirks you've noticed, things to try (or not try!), and ideas for where you're going with it in future builds. A distribution is, after all, always a work in progress.
