Unusual reasons to use Gentoo Linux
Gentoo for all the Unusual Reasons
Using a source-based Linux distribution to
solve the real-world problem of dealing with newer versions
of software.
In the February 2005 issue of
Linux Journal.
You can download a PDF of the article at right, or see it
online
at Linux Journal's web site. Or, read on!
Introduction
I have a confession to make. I use Gentoo Linux. All my colleagues
at the various Linux User Groups I attend think I'm nuts.
“Everyone knows” that Gentoo is a “source based”
Linux distribution. Gentoo's reputation (in large measure pushed by
the people who develop the distribution) is that it's for people who
want super crazy optimizations, and that it's really only suitable for
those who use desktops.
It turns out that Gentoo is really ideal for a whole bunch of
other, unexpected, reasons - and that, much to my surprise, there are
actually people using Gentoo in production environments for these
very reasons.
Speed
Before I can move on to the weird, wonderful, and totally
non-standard reasons I'm actually using Gentoo, I need to address a
bit of a religious issue: optimizations, and just how much of a
performance [speed] gain you get from using them. Unfortunately many
people get all wrapped up about this and don't see past it.
Since they're based around binary packages, the bulk of the other
Linux distributions (not to mention Microsoft Windows) are limited by
their desire to support the lowest common denominator. This is not
a bad thing – indeed, the fact that there is binary
compatibility across all the descendants of the original i386
processor allows prepackaged software to be run on so many systems.
It does mean, however, that they are unable to take advantage of any
new optimizations that your fancy [expensive] CPU might offer, which
is a pity.
One of the primary ways that Gentoo achieves its performance goals
is by optimizing for the processor the system is running on. Since
Gentoo is a built-from-source distribution, you are able to specify
compiler flags to be used when building software for your system.
gcc in particular allows you
to specify what kind of CPU you're going to build the code for. By
specifying the processor type (Intel Pentium III, AMD Athlon
Thunderbird, Sun UltraSPARC, etc.), the
compiler is able to generate processor-specific code and use CPU
features that (in theory) will result in better (i.e. hopefully faster)
machine code.
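As a rough sketch of what that looks like in practice, the compiler flags live in a shell-style configuration file, /etc/make.conf on the Gentoo systems of this era. The values below are only an assumption for a Pentium III box; -march=athlon-tbird would be the gcc spelling for an Athlon Thunderbird.

```shell
# /etc/make.conf (fragment). Assumes a Pentium III; choose the -march
# value that matches your own CPU.
CFLAGS="-march=pentium3 -O2 -pipe"

# Use the same flags for C++ code.
CXXFLAGS="${CFLAGS}"
```

Every package built afterwards picks up these flags, which is all the "crazy optimization" really amounts to.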
But is Gentoo's way really faster? Anecdotal evidence is mixed. It
seems to work out that a Gentoo system will run somewhat faster than an
identically configured one
from Red Hat or Debian, but any minor performance advantage
will be completely squandered if the system is not installed,
configured and tuned correctly. Since many of us don't know
how to do that, and since Gentoo does offer so much latitude to do
your own thing, it's easy to lose the benefits of slightly faster
programs if you do something silly.
The long and the short of it seems to be that in the real world,
from a speed/performance perspective, it really doesn't matter
whether you use a build-it-from-source distribution or a
binary-package distribution.
So if that's not a reason to use Gentoo, why would you want
this built-from-source thing?
Common problems in
production environments
There is one problem in particular that gets people annoyed at their
computers. [Actually, come to think of it, there is a whole galaxy
of reasons why people get annoyed at computers, but I'll focus on
just this one for today]. I call it “The newer version problem”,
and there are two ways that modern operating systems run into it:
The newer version problem (1) – what if I
need something the OS doesn't provide?
Every distribution (be it Linux or
Unix) faces a similar problem in the real world.
Let's say we've chosen Sun's
Solaris Unix for reasons that are right for our business. Or perhaps
I'm rolling out SUSE Linux for a medium-sized company, or maybe it's
Debian Linux in a school. Doesn't matter. The vendor provides the
operating system, the GNU toolchain that we all depend on, and tons
of other software that gives us a fantastic computing environment.
Inevitably, however, there is something that we need
that the distro doesn't provide.
This is a crucial point. Invariably, there is something you're
going to have to roll out on your own. Two examples:
On the server: you've developed an application which relies upon a
new feature in the latest PHP,
version 4.3.4. Unfortunately the version of PHP
packaged with your Debian system, 4.1.2, doesn't have the
functionality you need.
On the desktop: perhaps you're doing some graphics work on your
Red Hat workstation, and want to take advantage of the new soft-ray
tracing that the latest blender
has, and Red Hat doesn't provide it for you. Whatever your particular need
is, if you want it you're going to have to build and install it
yourself.
So if upgrading when you choose to is
a task you're going to have to do, does the operating system help you
do it?
The newer version problem (2) – hey, they
fixed something! Why can't I have it?
Let's face it – free software isn't always perfect. (I know,
big secret, wasn't supposed to tell anyone). Commercial software
isn't perfect either, but there's a crucial difference: sometimes
(and especially in the case of major software like an operating
system or a database - I'm thinking Solaris and Oracle here) the
vendor will support older versions. If I'm running Solaris 7, and
there's a critical driver bug or security issue, then Sun will
probably issue a patch that I can use to upgrade my systems.
It's not like that in the free
software world. With Apache and the Linux kernel itself being about
the only exceptions, no one supports older versions of their
software. Any time there's a bug fix or a usability enhancement it's
in the next version, the newly
released one! Free Software is largely the work of volunteers who
have limited time. If they've tracked down a problem, or made an
improvement, it's quite reasonable to expect that they've bundled
those fixes into the latest version of their software.
This is where Linux (and Unix, for that
matter) distributions aren't as strong as they might be.
The current release version of
Debian Linux, “stable”,
as it is known, is renowned for its robustness and its ability to
run in a wide variety of environments. They achieve this by freezing
the versions of various packages at a particular point, then
extensively testing that everything works together. Unfortunately,
while the core system libraries are rock solid as a result, the user
applications (say, a graphical editor or mail client) and common
system applications (like the aforementioned PHP) are also frozen in
time. Take evolution,
Ximian's outstanding email client. Debian stable has version
1.0.5. Evo, on the other hand, has moved
through an entire 1.2 stable release, and is now well into a second
stable release – [at time of writing] the current version of
evolution is 1.4.5! When someone running Debian stable (and hence evo
version 1.0.5) contacts the evolution user mailing list asking for
help, she is invariably told that the solution to her problem can be
obtained by “upgrading to the latest version”.
Debian stable, however, doesn't
help her do that!
Help me out here!
There's a key
issue at play: ease of operation. Does the operating system help you
with the challenges that everyday life administering a system
presents? Much to my surprise, Gentoo Linux turns out to be really
good in this regard.
With
Gentoo, one installs new packages by downloading sources and then
compiling them – same user experience as with
Debian. You want a piece of software, no problem. Just issue the
instruction, and away it goes. A little while later, it's installed.
But when I'm faced with the need to get a newer version of
a piece of software, that's when Gentoo shines.
Here's an example: let's say I'm using bluefish,
the HTML editor, and there's a bug that's annoying me. First of all,
a newer version might be available in portage
(Gentoo's package system)
so I might be able to just ask for the upgrade. Gentoo has a handy
command called etcat
which is one of several ways I can tell what's available:
# etcat versions bluefish

This tells me I've got version 0.9 of bluefish installed, and now
there's a 0.12 available. From reading their website, I know that
they've fixed the problem I'm having, so I want the upgrade!
The emerge command can
tell us what will happen if I do upgrade:
# emerge --pretend bluefish

Apparently this version of bluefish needs a library called
libpcre. Portage has shown me
that in addition to doing an upgrade of bluefish, it's going to bring
in libpcre as well. Hey, fine with me. So off we go:
# emerge bluefish
First Portage downloads, builds and installs libpcre, and then it
does the same for bluefish. Four minutes later, I had my upgrade.
Pretty easy!
You might have noticed that it didn't say it was going to install
version 0.13. That's because, at present, it's “masked”
(that's why it showed up in red). In this scenario, 0.13 just came
out, and there's now an ebuild for it. The ebuild, though, is still
being tested to see that the software actually installs and that
there's nothing blatantly wrong with it.
Maybe it'll be “unmasked” tomorrow, maybe next week,
and maybe never if it turns out to have problems and is superseded by
an even newer version. If I really needed it, I could override
Portage and tell it to bring 0.13
in. Likewise, I could have picked version 0.11 if I'd had a reason
to. In my case, I know my issue is fixed in 0.12, it's the most
recent available, and so I just let Portage do its thing.
This flexibility is one of Gentoo's greatest strengths.
Do it yourself packages
A trickier
situation occurs when I need to install a piece of software that the
system doesn't provide.
One
of the significant reasons why various distributions evolved package
management tools was so there would be a single, unified view about
what is installed on the system. For each piece of software
(be it a basic system tool, a core library, a server program, or a
user application) a package is made. As each is installed on
your system, the OS records what files got put where, and that the
package is installed. That way other software which depends on that
package can be installed knowing that their prerequisite pieces are
in place.
But what happens if you install a newer
version of software and don't have a package appropriate to your OS?
You typically go through the same build steps that the person who
built the package did, only you probably either
install it in some private place,
perhaps /usr/local/bin, and
then go to the effort of making sure “your” program is
getting run (not the older one); or
blindly install your software to
the root filesystem, hoping that you don't clobber anything on the
way in, and praying that nothing in the future will ever overwrite
the programs and files you have installed.
Think about that for a minute. Doesn't
having to worry about that strike you as being sort of silly? I mean,
after all, isn't that what the package management system is supposed
to prevent?
The usual answer at this point is
“well, just create your own package from the software you want
to install”. Fair enough.
How easy is that to
do?
The question I'm posing isn't "does the ability to make
packages exist" (because of course the answer to that is yes
across the board), nor am I asking "can you create your own
packages", but rather "how easy is it to do so"?
Let's say you've got the OS-provided copy of bogofilter,
and let's say you've got an .rpm for version 0.16.1. All good. But
suddenly the authors of bogofilter discovered that there was a silly
(but serious) error that crept in, and so a short time later released
0.16.2.
That upstream has fixed the problem is evident - but you're
impatient and you want this silly problem (that happens to be
affecting you personally) done away with now! The problem is that
you're now stuck with waiting for your distribution to release a new
version of the .rpm (or .deb, or .pkg, or ...), and that can
sometimes take a very long while. Which leaves you in the position of
wanting to roll your own.
That's where the trouble creeps in. Conceptually, just creating
your own new .rpm or .deb package is easy, right? "Just use the
existing 0.16.1 package as a prototype." But for most people (i.e.
anyone not at wizard level, and sometimes not even then) it's actually
rather tough to do. You have to:
download the package description or somehow extract it from
the existing package file;
manually download the new version of the upstream .tar.gz
(or whatever) source and unpack it;
transplant the build descriptions (in the case of Debian,
into the new upstream sources) and maybe even patch against
those sources;
you might have to modify the build script to instruct it
about the new version;
actually try and create the package, which of course involves
compiling it (which probably also will require you to install a
large number of "-dev" packages
you hadn't previously known about - real pain);
then you install, and test.
While this is all doable, there's a fairly steep learning
curve (especially for newbies) in getting the skills needed here.
More to the point, it's a lot of work!
Enter Gentoo
While conceptually no different than above, on a Gentoo system
it's much easier.
The magical part is that package description files in Gentoo –
“ebuilds” – follow a very simple format. They're
basically shell scripts (see sidebar). Along the way you specify
where to get the source tarball from. When
you build, Portage downloads the source for you and then
proceeds to unpack and compile it. Because they're shell scripts,
they can use shell variables to great effect; in particular, they
take the version number by parsing the ebuild
filename and putting it in a variable the script can use.
In our bogofilter example above, the package file (called
bogofilter-0.16.1.ebuild) contains a line like this:
SRC_URI="http://sourceforge.net/downloads/bogofilter-${PV}.tar.gz"
When you go to build and install bogofilter, Portage sets $PV
to be 0.16.1 based on the filename, and fetches the appropriate
.tar.gz.
It then unpacks it, proceeds to ./configure; make; make install,
and then builds the package as instructed. So, guess how hard it is to
create an ebuild script for the new version you want, 0.16.2?
# cd /usr/portage/net-mail/
# cp bogofilter-0.16.1.ebuild bogofilter-0.16.2.ebuild
Done.
DONE!!!
Assuming, of course, that nothing in the package description,
unpacking instructions, etc, needs to be updated. Usually you're all
right.
Now you can just tell Portage to:
# emerge bogofilter
and you'll have your new version!
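To see why a plain copy is enough, here is a small sketch you can run in a scratch directory, well away from the real Portage tree. It is not Portage's actual code: version_from_filename and expand_src_uri are hypothetical helpers that only imitate the idea of pulling the version out of the ebuild's filename and substituting it into the SRC_URI line shown earlier.

```shell
#!/bin/bash
# Sketch only: imitates how a filename-derived version makes a
# "version bump" a simple copy. Not Portage's real implementation.

# Hypothetical helper: print the version embedded in "name-version.ebuild".
# Deliberately naive: it does not handle revision suffixes such as "-r1".
version_from_filename() {
    local base
    base="$(basename "$1" .ebuild)"   # e.g. bogofilter-0.16.1
    echo "${base##*-}"                # e.g. 0.16.1
}

# Hypothetical helper: show what the SRC_URI line would expand to.
expand_src_uri() {
    local PV
    PV="$(version_from_filename "$1")"
    echo "http://sourceforge.net/downloads/bogofilter-${PV}.tar.gz"
}

# Work in a throwaway directory with an empty stand-in ebuild.
scratch="$(mktemp -d)"
touch "${scratch}/bogofilter-0.16.1.ebuild"

# The version bump from the text: copy the ebuild under the new name.
cp "${scratch}/bogofilter-0.16.1.ebuild" "${scratch}/bogofilter-0.16.2.ebuild"

# Each file now resolves to its own tarball, with no edits inside the file.
for f in "${scratch}"/bogofilter-*.ebuild; do
    expand_src_uri "$f"
done
```

The loop prints one URL ending in bogofilter-0.16.1.tar.gz and one ending in bogofilter-0.16.2.tar.gz, even though the two files are byte-for-byte identical.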
If you
want software for which your OS doesn't provide a package, you of
course have to write your own. With Gentoo, writing a custom .ebuild
is easy – see sidebar!
Administering multiple machines
Consider these problems not just on a single desktop, but in the
context of a production platform of dozens of servers or thousands of
workstations.
Frankly, there aren't any Operating Systems out there that give
you much help here. There's an entire body of literature on the
subject of infrastructure management; sadly, the consensus seems to
be that, despite efforts to cobble things together, there's still
a great deal of ad-hoc deployment that goes on. While many vendors
have tools that help you build a series of systems the first time,
the task of maintaining them over time is left to the individual site
to deal with. The newer version problem isn't just about single
machines – it's about entire networks of them.
Given that Portage helps solve these problems so well, is it any
surprise that people are using Gentoo in production environments?
Portage can be told to build binary packages. This allows you to
have one machine over in the corner doing all the compilation work,
and then the packages can be shared out and used by all your target
machines (instead of them having to build the packages themselves).
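A minimal sketch of that arrangement, expressed as make.conf settings, might look like the following. FEATURES, PKGDIR and PORTAGE_BINHOST are Portage variables; the host name is hypothetical, and the exact path a binhost URL needs depends on how you serve the packages directory, so treat the values as placeholders.

```shell
# Build host, /etc/make.conf (sketch): keep a binary package of
# everything that gets built.
FEATURES="buildpkg"
PKGDIR="/usr/portage/packages"

# Target machines, /etc/make.conf (sketch): where to fetch those packages.
# buildhost.example.com is a made-up name for the machine in the corner.
PORTAGE_BINHOST="http://buildhost.example.com/packages"
```

On the target machines, something along the lines of emerge --usepkg --getbinpkg bluefish then installs the prebuilt package instead of compiling it locally.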
You might be tempted to say “but that's what the other Linux
distributions do!” The difference is that selecting the right
mix of packages is a site decision, and the newer version problem is
definitely a site burden to deal with. Gentoo gives the local systems
team the tools to deal with solving these version issues themselves.
By using a local build server you can concentrate horsepower and
version management effectively, but still have room for local
customization. Staging environments are easy to set up. Then, once
you're happy with the set of versions you've tested, you just
make a snapshot of those binary packages and share them out to your
rank-and-file machines.
But that's a story for another day.
Conclusion
Create your own package, or privately version bump an existing one
– the newer version problem comes up all the time. The more
mainstream package management tools, while "mature",
require a much greater level of effort to accomplish these tasks.
Conceptually, though, the tasks are trivial. Quite to my surprise
(since they don't run around advertising this aspect), the design of
Gentoo's tools makes it really easy to do these things yourself.
Use the source, Luke
As I watch packages build, I'm awed
again and again at the vast contribution made by so many in the Free
Software world. The code really is right there, and that makes it
that much easier to make a change and maybe contribute back.
And there's just something nice
about using Open Source software that I can see coming from
the source.
Unusual reasons indeed.
Acknowledgments
The author would like to thank
Stephen White (University of Adelaide) and Andrea Barisani
(University of Trieste and Gentoo developer) for having helped
develop the ideas here as they relate to production use of Gentoo.
They also kindly reviewed the article, as did Pia Smith (Linux
Australia), Jeff Waugh (GNOME), Craige McWhirter (Sydney Linux Users
Group), and Wade Mealing (Gentoo server project).
Author
Andrew
Cowie runs Operational
Dynamics,
an operations and infrastructure engineering consultancy. He helps
organizations get value from their technology, but does so by
focusing on people and the processes around people – which is
probably why he's so obsessed with finding easier ways to do things.
You can reach him at andrew@operationaldynamics.com
or as AfC
on irc.freenode.net
All about
ebuilds (a sidebar)
Gentoo's package descriptions are “written” in bash.
They are basically just shell scripts! The various instructions go in
functions which get called by Portage at various points along the
way. The major ones are:
pkg_setup()
src_unpack()
src_compile()
src_install()
pkg_preinst()
pkg_postinst()
which get called in order. To tell Portage how to build your
software, you just write functions for each of the steps, and precede
them with a bit of information (like the SRC_URI
discussed in the main text).
To compile your sources, you might have
src_compile () {
    ./configure --prefix=/usr
    make
}
The amazing thing about the shell script thing is that by
overloading functions, they can provide sensible defaults. In fact,
the default for src_compile() is pretty much what I showed above. And
for many packages, that's perfect. In fact, you could write an ebuild
which relies on the defaults and has no custom functions defined at
all. It works quite frequently! Talk about easy.
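The trick of defaults-by-overriding is plain bash, and a toy driver shows it. This is only a sketch of the idea, not how Portage is actually written: run_phases and load_ebuild are made-up names, and load_ebuild stands in for sourcing a real ebuild file.

```shell
#!/bin/bash
# Toy driver, not Portage itself: define defaults first, then let the
# "ebuild" redefine whichever phases it cares about.

src_unpack()  { echo "default unpack"; }
src_compile() { echo "default compile"; }
src_install() { echo "default install"; }

# Stand-in for sourcing an ebuild that customizes only one phase.
# In bash, defining a function again simply replaces the earlier one.
load_ebuild() {
    src_compile() { echo "custom compile"; }
}

# Call each phase in order, whoever ended up defining it.
run_phases() {
    local phase
    for phase in src_unpack src_compile src_install; do
        "${phase}"
    done
}

load_ebuild
run_phases
```

Running it prints "default unpack", "custom compile" and "default install": the one function the pretend ebuild supplied wins, and everything else falls back to the default.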
Sometimes you want to ./configure a package differently depending
on what sort of system you have. Portage has an environment variable
called USE which contains
tokens you can use to describe and customize your system. Say you've
got a package that can be told to build differently depending on
whether or not you want, say, X Windows support, or IPv6 support.
Your src_compile() function might look something like this:
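src_compile () {
    local conf=""
    # add X support only if the X flag is set in USE
    use X && conf="${conf} --with-x"
    # switch IPv6 off unless the ipv6 flag is set in USE
    use ipv6 || conf="${conf} --without-ipv6"
    ./configure --prefix=/usr ${conf}
    make
}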
You can see various features of shell scripting being used. In
this example, if your system (like most users' machines) has X
windows on it, then this package will be told to go ahead and build X
support in. But if it's a server, and you don't need any of that,
then your software gets built without that extra overhead. The USE
variables can be overridden on the command line, so you have even more
precise control if you need it!
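If you'd like to watch the USE logic work without a Gentoo box handy, here is a self-contained sketch. The use function below is a toy stand-in I've written for illustration (the real one is supplied by Portage), and configure_args is a hypothetical helper that just prints the ./configure line rather than running it.

```shell
#!/bin/bash
# Toy stand-in for Portage's "use": succeed if the flag appears in $USE.
use() {
    local flag
    for flag in ${USE}; do
        [ "${flag}" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical helper: print the configure line for a given USE string.
configure_args() {
    local USE="$1" conf=""
    use X && conf="${conf} --with-x"
    use ipv6 || conf="${conf} --without-ipv6"
    echo "./configure --prefix=/usr${conf}"
}

configure_args "X ipv6"   # a desktop with X and IPv6
configure_args ""         # a bare server with neither
```

The first call prints ./configure --prefix=/usr --with-x and the second prints ./configure --prefix=/usr --without-ipv6, which is the same branching the src_compile() example performs.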
src_unpack() is the same. If you don't include one, Portage will
just plow ahead, untar the source tarball in the default place, chdir
to that directory, and set the working directory environment
variable, $WORKDIR,
accordingly. On the other hand, if something unusual has to happen
(say, a patch be applied) then you can write a simple unpack function
yourself; they provide some really useful tools to help things along:
src_unpack () {
    unpack ${A}
    epatch ${FILESDIR}/fixit.patch
}
So, with knowledge of the default
activities, the assumed working directories, and the automatically
set environment variables, you have enormous power at your
fingertips.
This only scratches the surface. For
more details, see