There are a number of third-party package repositories out there for Linux distributions. For example, the Fedora Project runs the Extra Packages for Enterprise Linux (EPEL) repository, which contains builds of open source software that isn’t supplied with Red Hat Enterprise Linux. Similarly, a lot of projects have their own repositories that supply builds of software for openSUSE, Debian, Ubuntu, etc.
Maybe it’s just because I’m old-school, and maybe it’s because I enjoy compiling things, but I really don’t like the idea of using binaries found on the Internet. I never have.
As an aside, I really tried hard to make the rest of this post not be critical, or seem like criticism, but instead just be a reflection of what I think is the reality of the situation for me. A ton of very valuable work is done in the open source community by hard-working, dedicated, smart, honorable volunteers, to whom we all owe a lot of beer. I will consider it a failure if those people take this the wrong way.
I’m also going to use RPMs as my examples, but this is all likely true for any package management system, on Linux, Solaris, AIX, whatever.
Anyhow, the reasons I shy away from third-party precompiled & pre-assembled packages are:
1. Trust. The people that are doing the packaging are unknown to me, and I cannot make any assumptions about them or their intentions. For example, I can’t easily audit a binary package to confirm that it’s built from exactly the source that I trust. I can’t even audit it to confirm that it’s built from a particular source package, and I cannot assume that a source RPM & binary RPM pair have the same contents. What if the binary package is built from different source that includes a rootkit?
Source RPMs help this by allowing me to confirm that the tarball inside is the same as what was released by the developers, and I can see what the build parameters are. Then I can build the RPM myself. I like that transparency.
2. Timeliness. Binary builds of software are often done by volunteers, and volunteers, though awesome, have other stuff going on in their lives. In many cases updates are built when someone has time. When a security problem is what triggers a new release, it’s important that the updates appear quickly, and not just whenever someone has time in the future. A number of Linux distributions have been dinged in the last year for not providing timely security updates, and it’s pretty common to see RPMs lag behind upstream releases.
Solutions to this sort of problem require a project-level commitment, and are often helped by automation and teams. Automation helps reduce the time a volunteer needs to spend on packaging, and having a team of people who are capable of doing this work increases the odds that someone will have time for it. That said, it’s up to the project to do this sort of thing, and many don’t.
3. Understanding. I like building the software, or the binary RPMs, on my own, because it means I understand the decisions that were made in the process. Why did the packager decide to put the binaries in /usr/local rather than /opt? Who thought those crazy compiler flags were a good idea? Why aren’t the logs in /var/log?
It also helps me understand the dependencies. Too often a third-party package requires another set of packages in order to resolve dependencies, and that quick installation of OpenNMS, for example, via the pre-built binaries turns into a giant chain of installations. The more third-party stuff you install on a host the better the chance of creating dependency loops and conflicts with vendor-supplied software.
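As a rough sketch of the check described under point 1 (confirming that the tarball inside a source RPM is the same as what the developers released), something like the following Python would do. It assumes the tarball has already been unpacked from the SRPM (for example with rpm2cpio piped to cpio) and that the upstream tarball has been downloaded separately. The function names and the paths in the usage note are hypothetical, made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs don't need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(srpm_tarball, upstream_tarball):
    """True only if both files hash to the same SHA-256 digest."""
    return sha256_of(srpm_tarball) == sha256_of(upstream_tarball)
```

Calling `same_contents("from-srpm/foo-1.0.tar.gz", "upstream/foo-1.0.tar.gz")` returns `True` only when the two files are byte-for-byte identical. If the developers publish a checksum or a GPG signature, comparing against that is the better test. Either way, a match only says the source is what was released; it says nothing about the binary RPM, which is why I still rebuild it myself.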
To each their own, but I’m a control freak, and I don’t like the idea that some unknown person is making decisions for me and my organization, or that my organization is now dependent in some way on someone who is unaccountable to us. Similarly, I don’t like the idea that I don’t know where binaries really came from, since I install them on hosts that sit inside all of my security defenses. As a result I build my own software most of the time, or create my own in-house RPMs when I need to, documenting the process and leaving the build environment around so if I’m on vacation someone from my team can take over.
Hi Bob. Most of your points are sound, but it’s really more “why I don’t use *shitty* third party binary packages”. For instance, you should really take a closer look at EPEL. It is really just an “official but unsupported” package set from the greater Red Hat/Fedora community. They use the Fedora build infrastructure and packaging guidelines, and everything’s GPG-signed. I don’t know how I’d do my job without EPEL; there’s at least a dozen or two packages from there I rely on every day, and it would be an insane time-suck to bring that in house. But yeah, after EPEL, third-party RPMs start to get pretty ghetto pretty fast.
Jim, perhaps my example of EPEL is exactly the wrong example to have, because as I wrote about them I was really thinking about the “ghetto” repositories, and the projects out there that build binaries for people.
I popped out of Google Reader to comment only to find that Jim B. has already touched all the points I wanted to make. So, ditto. EPEL is great and is maintained largely by Red Hat employees.
I’ll throw in my 2 cents to back you up.
The biggest problems I have with pre-built packages are dependency resolution and API compatibility.
I can’t totally fault the API part on the packager, because they may have no idea there’s even going to be a problem. Too many code authors out there rarely document their API changes, because they don’t think it will be an issue, probably because their own API was never properly documented in the first place. So yes, upgrading to the latest and greatest version of RRDTool will break a LOT of things for me.
If the latest version of my OS only offers the latest version of package X which is incompatible with the latest version of package Y, then the whole thing is worthless. More regression testing would go a long way towards resolving problems like this.
Dependencies can tie into the API issue, but most dependency problems come from packaging systems that think they are doing the right thing. And the packagers let it slide because disk space is cheap. I don’t want to have a server loaded with Perl, Python, TCL, and Ruby just because it satisfies dependencies somebody else never really investigated.
Those are the biggies for me, too, and where you get into conflicts with vendor-installed stuff. At least when you build it yourself you can isolate things into /usr/local or another directory of your choosing, and if it doesn’t work you can rm the whole mess without a big dependency fight.
Shitty repos make them all look bad. But the trust thing is a per-repo kind of thing. I trust EPEL, even though the RH pros do make mistakes; just far fewer than others.
A quick read of the above suggests you want to build everything from bare source, almost without an SRPM. I know you wouldn’t risk repeatability for the sake of verifiability, but it’s not reinforced above, and the inexperienced reader may believe it’s the way to go. The worst thing one can do is open up the support nightmare of crusty, hand-built software rotting on unmaintainable machines. Your spec file is your build-doc, after all.
The next trade-off is security vs effort — and I like having the huge teams at RH and such, testing and watching for exploits. I like the network effect where all the other RH subscribers test for corner-cases that slip through System Test at the distro, and that I benefit from that as well just by being in that herd. I like that I can give RH Support an RPM list and they can build a machine very close to one I’m reporting an issue on. But that’s just repeatability working for us.
Binary RPMs, then, get a lot of the benefits of black box software, but with SRPMs you can check and rebuild; welcome to the joys of open source! Once you find a few repos you know you can trust, keep one eye open but do relax a bit. And if you can avoid packaging something like wview or trying to install it by hand on more than one machine, count yourself lucky!
Very well put.
I love bare source, but like you said it’s not repeatable. I like SPEC files for exactly the reasons you describe.
Thank you!