
Gentoo for all the Unusual Reasons


Using a source-based Linux distribution to solve the real-world problem of dealing with newer versions of software.

In the February 2005 issue of Linux Journal.




Introduction

I have a confession to make. I use Gentoo Linux. All my colleagues at the various Linux User Groups I attend think I'm nuts.

“Everyone knows” that Gentoo is a “source based” Linux distribution. Gentoo's reputation (in large measure pushed by the people who develop the distribution) is that it's for people who want super crazy optimizations, and that it's really only suitable for those who use desktops.

It turns out that Gentoo is really ideal for a whole bunch of other, unexpected, reasons - and that, much to my surprise, there are actually people using Gentoo in production environments for these very reasons.

Speed

Before I can move on to the weird, wonderful, and totally non-standard reasons I'm actually using Gentoo, I need to address a bit of a religious issue: optimizations, and just how much of a performance [speed] gain you get from using them. Unfortunately many people get all wrapped up about this and don't see past it.

Since they're based around binary packages, the bulk of the other Linux distributions (not to mention Microsoft Windows) are limited by their desire to support the lowest common denominator. This is not a bad thing – indeed, the fact that there is binary compatibility across all the descendants of the original i386 processor allows prepackaged software to be run on so many systems. It does mean, however, that they are unable to take advantage of any new optimizations that your fancy [expensive] CPU might offer, which is a pity.

One of the primary ways that Gentoo achieves its performance goals is by optimizing for the processor the system is running on. Since Gentoo is a built-from-source distribution, you are able to specify the compiler flags to be used when building software for your system. gcc in particular lets you specify what kind of CPU you're going to build the code for. By specifying the processor type (Intel Pentium III, AMD Athlon Thunderbird, Sun UltraSPARC, etc.), you let the compiler generate processor-specific code that (in theory) will result in better (i.e. hopefully faster) machine code.
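As a rough illustration, these flags are conventionally set as shell-style variables in Gentoo's /etc/make.conf. The values below are assumptions for a Pentium III machine, not a recommendation; the right -march value depends on your actual CPU and gcc version.

```shell
# /etc/make.conf fragment (illustrative values only)
# -march tells gcc which CPU type to generate code for. pentium3 is
# assumed here; an Athlon Thunderbird would use -march=athlon-tbird.
CFLAGS="-march=pentium3 -O2 -pipe"
# Use the same flags when compiling C++ code
CXXFLAGS="${CFLAGS}"
```

Code built with these flags is not guaranteed to run on older processors, which is exactly the trade-off the binary distributions avoid.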

But is Gentoo's way really faster? Anecdotal evidence is mixed. It seems to work out that a Gentoo system will run somewhat faster than an identically configured one from Red Hat or Debian, but any minor performance advantage will be completely squandered if the system is not installed, configured and tuned correctly. Since many of us don't know how to do that, and since Gentoo does offer so much latitude to do your own thing, it's easy to lose the benefits of slightly faster programs if you do something silly.

The long and the short of it seems to be that in the real world, from a speed/performance perspective, it really doesn't matter whether you use a build-it-from-source distribution or a binary-package distribution.

So if that's not a reason to use Gentoo, why would you want this built-from-source thing?

Common problems in production environments

There are two reasons why people start getting annoyed at their computers. [Actually, come to think of it, there is a whole galaxy of reasons why people get annoyed at computers, but I'll focus on just these for today.] Both are forms of what I call “the newer version problem”, and modern operating systems run into it in two ways:

The newer version problem (1) – what if I need something the OS doesn't provide?

Every distribution (be it Linux or Unix) faces a similar problem in the real world.

Let's say we've chosen Sun's Solaris Unix for reasons that are right for our business. Or perhaps I'm rolling out SUSE Linux for a medium-sized company, or maybe it's Debian Linux in a school. Doesn't matter. The vendor provides the operating system, the GNU toolchain that we all depend on, and tons of other software that gives us a fantastic computing environment. Inevitably, however, there is something that we need that the distro doesn't provide.

This is a crucial point. Invariably, there is something you're going to have to roll out on your own. Two examples:

On the server: you've developed an application which relies upon a new feature in the latest PHP, version 4.3.4. Unfortunately the version of PHP packaged with your Debian system, 4.1.2, doesn't have the functionality you need.

On the desktop: perhaps you're doing some graphics work on your Red Hat workstation, and want to take advantage of the new soft-ray tracing that the latest blender has, but Red Hat doesn't provide it for you. Whatever your particular need is, if you want it you're going to have to build and install it yourself.

So if upgrading when you choose to is a task you're going to have to do, does the operating system help you do it?

The newer version problem (2) – hey, they fixed something! Why can't I have it?

Let's face it – free software isn't always perfect. (I know, big secret, wasn't supposed to tell anyone). Commercial software isn't perfect either, but there's a crucial difference: sometimes (and especially in the case of major software like an operating system or a database - I'm thinking Solaris and Oracle here) the vendor will support older versions. If I'm running Solaris 7, and there's a critical driver bug or security issue, then Sun will probably issue a patch that I can use to upgrade my systems.

It's not like that in the free software world. With Apache and the Linux kernel itself being about the only exceptions, no one supports older versions of their software. Any time there's a bug fix or a usability enhancement it's in the next version, the newly released one! Free Software is largely the work of volunteers who have limited time. If they've tracked down a problem, or made an improvement, it's quite reasonable to expect that they've bundled those fixes into the latest version of their software.

This is where Linux (and Unix, for that matter) distributions aren't as strong as they might be.

The current release version of Debian Linux, “stable”, as it is known, is renowned for its robustness and its ability to run in a wide variety of environments. They achieve this by freezing the versions of various packages at a particular point, then extensively testing that everything works together. Unfortunately, while the core system libraries are rock solid as a result, the user applications (say, a graphical editor or mail client) and common system applications (like the aforementioned PHP) are also frozen in time. Take evolution, Ximian's outstanding email client. Debian stable has version 1.0.5. Evo, on the other hand, has moved through an entire 1.2 stable release, and is now well into a second stable release – [at time of writing] the current version of evolution is 1.4.5! When someone running Debian stable (and hence evo version 1.0.5) contacts the evolution user mailing list asking for help, she is invariably told that the solution to her problem can be obtained by “upgrading to the latest version”. Debian stable, however, doesn't help her do that!

Help me out here!

There's a key issue at play: ease of operation. Does the operating system help you with the challenges that everyday life administering a system presents? Much to my surprise, Gentoo Linux turns out to be really good in this regard.

With Gentoo, one installs new packages by downloading sources and then compiling them – same user experience as with Debian. You want a piece of software, no problem. Just issue the instruction, and away it goes. A little while later, it's installed.

But when I'm faced with the need to get a newer version of a piece of software, that's when Gentoo shines.

Here's an example: let's say I'm using bluefish, the HTML editor, and there's a bug that's annoying me. First of all, a newer version might be available in portage (Gentoo's package system [1]) so I might be able to just ask for the upgrade. Gentoo has a handy command called etcat [2] which is one of several ways I can tell what's available:

# etcat versions bluefish



This tells me I've got version 0.9 of bluefish installed, and now there's a 0.12 available. From reading their website, I know that they've fixed the problem I'm having, so I want the upgrade!

The emerge command can tell us what will happen if I do upgrade:

# emerge --pretend bluefish



Apparently this version of bluefish needs a library called libpcre. Portage has shown me that in addition to doing an upgrade of bluefish, it's going to bring in libpcre as well. Hey, fine with me. So off we go:

# emerge bluefish

First Portage downloads, builds and installs libpcre, and then it does the same for bluefish. Four minutes later, I had my upgrade. Pretty easy!

You might have noticed that it didn't say it was going to install version 0.13. That's because, at present, it's “masked” (that's why it showed up in red). In this scenario, 0.13 just came out, and there's now an ebuild for it. The ebuild, though, is still being tested to see that the software actually installs and that there's nothing blatantly wrong with it.

Maybe it'll be “unmasked” tomorrow, maybe next week, and maybe never if it turns out to have problems and is superseded by an even newer version. If I really needed it, I could override portage and tell it to bring 0.13 in [3]. Likewise, I could have picked version 0.11 if I'd had a reason to. In my case, I know my issue is fixed in 0.12, it's the most recent available, and so I just let Portage do its thing.

This flexibility is one of Gentoo's greatest strengths.

Do it yourself packages

A trickier situation occurs when I need to install a piece of software that the system doesn't provide.

One of the significant reasons why various distributions evolved package management tools was so there would be a single, unified view about what is installed on the system. For each piece of software (be it a basic system tool, a core library, a server program, or a user application) a package is made. As each is installed on your system, the OS records what files got put where, and that the package is installed. That way other software which depends on that package can be installed knowing that its prerequisite pieces are in place.

But what happens if you install a newer version of software and don't have a package appropriate to your OS? You typically go through the same build steps that the person who built the package did, only you probably either

  1. install it in some private place, perhaps /usr/local/bin, and then go to the effort of making sure “your” program is getting run (not the older one); or

  2. blindly install your software to the root filesystem, hoping that you don't clobber anything on the way in, and praying that nothing in the future will ever overwrite the programs and files you have installed.
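The first option can be sketched in a scratch directory. Everything here is hypothetical: "mytool" and its two fake versions stand in for the OS-provided copy and a hand-built copy, and the directories live under a temporary sandbox rather than the real /usr. It only demonstrates that whichever directory comes first in PATH decides which copy runs.

```shell
#!/bin/sh
# Two copies of a hypothetical program "mytool": one standing in for the
# OS-provided package, one for a privately built copy under a local prefix.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/usr/bin" "$sandbox/usr/local/bin"

printf '#!/bin/sh\necho "mytool 1.0 (OS package)"\n' > "$sandbox/usr/bin/mytool"
printf '#!/bin/sh\necho "mytool 2.0 (built by hand)"\n' > "$sandbox/usr/local/bin/mytool"
chmod +x "$sandbox/usr/bin/mytool" "$sandbox/usr/local/bin/mytool"

# The private directory is listed first, so its copy shadows the other one.
PATH="$sandbox/usr/local/bin:$sandbox/usr/bin:$PATH"

which_mytool=$(command -v mytool)
mytool_version=$(mytool)
echo "running: $which_mytool"
echo "reports: $mytool_version"
```

The catch is that every user, cron job and init script has to have its PATH arranged the same way, and the package manager knows nothing about the second copy.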

Think about that for a minute. Doesn't having to worry about that strike you as being sort of silly? I mean, after all, isn't that what the package management system is supposed to prevent?

The usual answer at this point is “well, just create your own package from the software you want to install”. Fair enough.

How easy is that to do?

The question I'm posing isn't "does the ability to make packages exist" (because of course the answer to that is yes across the board), nor am I asking "can you create your own packages", but rather "how easy is it to do so"?

Let's say you've got the OS-provided copy of bogofilter, and let's say you've got an .rpm for version 0.16.1. All good. But suddenly the authors of bogofilter discovered that a silly (but serious) error had crept in, and so a short time later released 0.16.2.

That upstream has fixed the problem is evident - but you're impatient and you want this silly problem (that happens to be affecting you personally) done away with now! The problem is that you're now stuck with waiting for your distribution to release a new version of the .rpm (or .deb, or .pkg, or ...), and that can sometimes take a very long while. Which leaves you in the position of wanting to roll your own.

That's where the trouble creeps in. Conceptually, just creating your own new .rpm or .deb package is easy, right? "Just use the existing 0.16.1 package as a prototype." But for most people (ie anyone not at wizard level and sometimes not even then) it's actually rather tough to do: You have to:

While this is all doable there's a fairly steep learning curve (especially for newbies) in getting the skills needed here. More to the point, it's a lot of work!

Enter Gentoo

While conceptually no different than above, on a Gentoo system it's much easier.

The magical part is that package description files in Gentoo – “ebuilds” – follow a very simple format. They're basically shell scripts (see sidebar). Along the way you specify where to get the source tarball from. When you build, Portage downloads the source for you and then proceeds to unpack and compile it. Because they're shell scripts, they can use shell variables to great effect; in particular, they take the version number by parsing the ebuild filename and putting it in a variable the script can use.

In our bogofilter example above, the package file (called bogofilter-0.16.1.ebuild) contains a line like this:

SRC_URI="http://sourceforge.net/downloads/bogofilter-${PV}.tar.gz"

When you go to build and install bogofilter, Portage sets $PV to be 0.16.1 based on the filename, and fetches the appropriate .tar.gz [4]. It then unpacks it, runs ./configure; make; make install, and builds the package as instructed. So, guess how hard it is to create an ebuild script for the new version you want, 0.16.2?
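To make the filename trick concrete, here is a minimal sketch of how a version could be pulled out of an ebuild filename with plain shell parameter expansion. It illustrates the idea only and is not Portage's actual code; the function name version_from_ebuild is made up for this example, and it assumes a simple name-version.ebuild filename with no revision suffix such as -r1.

```shell
#!/bin/sh
# Illustration only: Portage does this internally, by its own means.
# Assumes "name-version.ebuild" with no "-rN" revision suffix.
version_from_ebuild() {
    file=$(basename "$1")    # drop any leading directory
    base=${file%.ebuild}     # bogofilter-0.16.1
    echo "${base##*-}"       # everything after the last hyphen
}

PV=$(version_from_ebuild "bogofilter-0.16.1.ebuild")
SRC_URI="http://sourceforge.net/downloads/bogofilter-${PV}.tar.gz"
echo "${SRC_URI}"
```

Because the version lives only in the filename, nothing inside the script has to change when the version does.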

# cd /usr/portage/net-mail/bogofilter/
# cp bogofilter-0.16.1.ebuild bogofilter-0.16.2.ebuild

Done.

DONE!!!

Assuming, of course, that nothing in the package description, unpacking instructions, etc, needs to be updated. Usually you're all right [5]. Now you can just tell Portage to:

# emerge bogofilter

and you'll have your new version! [6]
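If you find yourself doing this often, the copy step is easy to wrap in a small helper. The sketch below is hypothetical (bump_ebuild is not a Portage command) and runs against a scratch directory standing in for the directory that holds the package's ebuilds, so it can be tried on any machine.

```shell
#!/bin/sh
# bump_ebuild DIR NAME OLD NEW
# Hypothetical helper: copy NAME-OLD.ebuild to NAME-NEW.ebuild inside DIR,
# refusing to overwrite an ebuild that already exists.
bump_ebuild() {
    src="$1/$2-$3.ebuild"
    dst="$1/$2-$4.ebuild"
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "$dst already exists" >&2
        return 1
    fi
    cp "$src" "$dst"
}

# Demonstration in a scratch directory with a one-line stand-in ebuild
pkgdir=$(mktemp -d)/net-mail/bogofilter
mkdir -p "$pkgdir"
echo 'SRC_URI="http://sourceforge.net/downloads/bogofilter-${PV}.tar.gz"' \
    > "$pkgdir/bogofilter-0.16.1.ebuild"

bump_ebuild "$pkgdir" bogofilter 0.16.1 0.16.2
ls "$pkgdir"
```

On a real system the directory would be the package's directory in your Portage tree, followed by the emerge step described in the text.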

If you want software for which your OS doesn't provide a package, you of course have to write your own. With Gentoo, writing a custom .ebuild is easy – see sidebar!

Administering multiple machines

Consider these problems not just on a single desktop, but in the context of a production platform of dozens of servers or thousands of workstations.

Frankly, there aren't any operating systems out there that give you much help here. There's an entire body of literature on the subject of infrastructure management; sadly, the consensus seems to be that, despite efforts to cobble things together, there's still a great deal of ad-hoc deployment that goes on. While many vendors have tools that help you build a series of systems the first time, the task of maintaining them over time is left to the individual site to deal with. The newer version problem isn't just about single machines – it's about entire networks of them.

Given that Portage helps solve these problems so well, is it any surprise that people are using Gentoo in production environments?

Portage can be told to build binary packages. This allows you to have one machine over in the corner doing all the compilation work, and then the packages can be shared out and used by all your target machines (instead of them having to build the packages themselves).
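A rough sketch of that split follows. The --buildpkg and --usepkgonly options are emerge's switches for creating and consuming binary packages; how the package directory gets from the build server to the targets (NFS, rsync, a web server) is a site decision and is left out here. The run wrapper is an assumption of this example: by default it only prints each command, so the sketch can be read and tried safely on a machine without Portage.

```shell
#!/bin/sh
# Sketch only. Commands are printed rather than executed unless RUN=1 is
# set in the environment on a real Gentoo system.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# On the build server: compile bogofilter and keep a binary package of it
build_out=$(run emerge --buildpkg bogofilter)

# On each target machine, with the build server's package directory made
# available to it: install from the binary package, never from source
install_out=$(run emerge --usepkgonly bogofilter)

echo "$build_out"
echo "$install_out"
```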

You might be tempted to say “but that's what the other Linux distributions do!” The difference is that selecting the right mix of packages is a site decision, and the newer version problem is definitely a site burden to deal with. Gentoo gives the local systems team the tools to deal with solving these version issues themselves.

By using a local build server you can concentrate horsepower and version management effectively, but still have room for local customization. Staging environments are easy to set up. Then, once you're happy with the set of versions you've tested, you just make a snapshot of those binary packages and share them out to your rank-and-file machines.

But that's a story for another day.

Conclusion

Create your own package, or privately version bump an existing one – the newer version problem comes up all the time. The more mainstream package management tools, while "mature", require a much greater level of effort to accomplish these tasks. Conceptually, though, the tasks are trivial. Quite to my surprise (since they don't run around advertising this aspect), the design of Gentoo's tools makes it really easy to do these things yourself.

Use the source, Luke

As I watch packages build, I'm awed again and again at the vast contribution made by so many in the Free Software world. The code really is right there, and that makes it that much easier to make a change and maybe contribute back.

And there's just something nice about using Open Source software that I can see coming from the source.

Unusual reasons indeed.

Acknowledgments

The author would like to thank Stephen White (University of Adelaide) and Andrea Barisani (University of Trieste and Gentoo developer) for having helped develop the ideas here as they relate to production use of Gentoo. They also kindly reviewed the article, as did Pia Smith (Linux Australia), Jeff Waugh (GNOME), Craige McWhirter (Sydney Linux Users Group), and Wade Mealing (Gentoo server project).

Author

Andrew Cowie runs Operational Dynamics, an operations and infrastructure engineering consultancy. He helps organizations get value from their technology, but does so by focusing on people and the processes around people – which is probably why he's so obsessed with finding easier ways to do things. You can reach him at andrew@operationaldynamics.com or as AfC on irc.freenode.net





All about ebuilds (a sidebar)

Gentoo's package descriptions are “written” in bash. They are basically just shell scripts! The various instructions go in functions which get called by Portage at various points along the way. The major ones are:

pkg_setup()
src_unpack()
src_compile()
src_install()
pkg_preinst()
pkg_postinst()

which get called in order. To tell Portage how to build your software, you just write functions for each of the steps, and precede them with a bit of information (like the SRC_URI discussed in the main text).

To compile your sources, you might have

src_compile () {
    ./configure --prefix=/usr
    make
}

The amazing thing about the shell script thing is that by overloading functions, they can provide sensible defaults. In fact, the default for src_compile() is pretty much what I showed above. And for many packages, that's perfect. In fact, you could write an ebuild which relies on the defaults and has no custom functions defined at all. It works quite frequently! Talk about easy.
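The mechanism is ordinary shell behaviour: defining a function a second time replaces the first definition. The sketch below shows the idea with stand-in phases that only record that they ran. It is not Portage's real implementation, and the log variable and the phase bodies are invented for the illustration.

```shell
#!/bin/sh
# Illustration of "defaults by redefinition", not Portage's real code.
log=""

# Defaults supplied by the package system (stand-ins that just record a name)
src_unpack()  { log="$log unpack:default"; }
src_compile() { log="$log compile:default"; }
src_install() { log="$log install:default"; }

# The ebuild would be sourced at this point. Any function it defines
# replaces the default of the same name; the others are left alone.
src_compile() { log="$log compile:custom"; }

# The driver then calls each phase in order
for phase in src_unpack src_compile src_install; do
    $phase
done
echo "phases run:$log"
```

Here only src_compile was overridden, so the unpack and install phases fall through to their defaults.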

Sometimes you want to ./configure a package differently depending on what sort of system you have. Portage has an environment variable called USE which contains tokens you can use to describe and customize your system. Say you've got a package that can be told to build differently depending on whether or not you want, say, X Windows support, or IPv6 support. Your src_compile() function might look something like this:

src_compile () {
    use X && conf="${conf} --with-x"
    use ipv6 || conf="${conf} --without-ipv6"

    ./configure --prefix=/usr ${conf}
    make
}

You can see various features of shell scripting being used. In this example, if your system (like most users' machines) has X windows on it, then this package will be told to go ahead and build X support in. But if it's a server, and you don't need any of that, then your software gets built without that extra overhead. The USE variables can be overridden on the command line, so you have even more precise control if you need it!
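To try the logic outside Portage, the sketch below supplies a simplified stand-in for the use helper: it succeeds when the named flag appears as a word in $USE. The real helper does more (for instance, flags can be switched off with a leading minus), and configure_args is a name made up for this example.

```shell
#!/bin/sh
# Simplified stand-in for Portage's "use" helper: succeeds if the flag
# appears as a whole word in $USE. Does not handle "-flag" negation.
use() {
    case " ${USE} " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: build the ./configure arguments the same way the
# src_compile() example does
configure_args() {
    conf=""
    use X    && conf="${conf} --with-x"
    use ipv6 || conf="${conf} --without-ipv6"
    echo "--prefix=/usr${conf}"
}

USE="X ipv6"                 # a typical desktop
desktop_args=$(configure_args)
USE=""                       # a minimal server
server_args=$(configure_args)

echo "desktop: ./configure ${desktop_args}"
echo "server:  ./configure ${server_args}"
```

The same ebuild therefore produces different builds on different machines purely from the contents of USE.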

src_unpack() is the same. If you don't include one, Portage will just plow ahead, untar the source tarball in the default place, chdir to that directory, and set the working directory environment variable, $WORKDIR, accordingly. On the other hand, if something unusual has to happen (say, a patch needs to be applied) then you can write a simple unpack function yourself; Portage provides some really useful tools to help things along:

src_unpack () {
    unpack ${A}
    epatch ${FILESDIR}/fixit.patch
}

So, with knowledge of the default activities, the assumed working directories, and the automatically set environment variables, you have enormous power at your fingertips.

This only scratches the surface. For more details, see

[1] “Portage” is the name for the software package management system in Gentoo. It refers both to the small suite of programs that comprise the system, and also to the collection of build descriptions maintained and made available by the Gentoo Linux developers. emerge is the command you use to do most actions such as installing new software. It, in turn, calls a program called ebuild which actually does the work.

[2] etcat is in the gentoolkit package, for those wondering where to find it. Great little program!

[3] There wasn't actually a version 0.13 of bluefish when I wrote this – I just quickly made up an ebuild pretending there was to show you about unstable/masked packages.

[4] Gentoo has copies of the source tarballs required for all of its various packages on its mirrors around the world. So, normally Portage just gets the source from one of them. If, however, you're building something which isn't in Gentoo's mirrors, no problem – Portage will just reach out to the original upstream download site.

[5] There's a touch more to keep abreast of – for example, you probably would do the above action in a copy of the /usr/portage tree (say, /usr/local/portage), so you don't lose your changes when the primary tree updates, that sort of thing. But the essence doesn't change: copy the file, and then tell it to get on with building.

[6] Portage uses md5sums to help ensure that you get an uncorrupted download. You actually have to run # ebuild bogofilter-0.16.2.ebuild digest first, which downloads the source and then computes the md5sum for you.

Contents copyright © 2002-2008 Operational Dynamics Consulting, Pty Ltd unless otherwise noted. If you wish to use material found herein, see attribution policy for details.