It's major upgrade time again! The Debian project just published
the Debian 11 "bullseye" release, and it's pretty awesome! This
made me realize that I have never written here about my peculiar
upgrade process, and I figured it was worth
bringing it up to a wider audience.
My upgrade process also has a
notable changes section
which includes major version changes (e.g. Inkscape 1.0!), new
packages (e.g. podman!) and important behavior changes
(e.g. driverless scanning and printing!).
I'm particularly interested to hear about any significant change I
might have missed. If you know of a cool new package that shipped with
bullseye and that I forgot, do let me know!
But that's for the cool new stuff. We need to talk about the problems
with Debian major upgrades.
I have been maintaining detailed upgrade guides
on my wiki, starting with the jessie
release, but I have actually written such
guides for Koumbit.org as far back as Debian squeeze in 2011
(another worker wrote the older Debian lenny upgrade guide in
2009). Koumbit, since then, has kept maintaining those guides all the
way to the latest bullseye upgrade, through 7 major releases!
Over the years, those guides evolved from a quick "cheat-sheet" format
copied from the release notes into a more or less "scripted" form
that I currently use.
Each guide has a procedure made of a few steps that can be basically
copy-pasted to batch-upgrade a host (or multiple hosts in parallel) as
quickly as possible. There is also the predict-os script which
allows you to keep track of progress of the upgrades in a Puppet cluster.
Limitations of the official procedure
In comparison with my procedure, the official upgrade guide is
mostly designed to upgrade a single machine, typically a workstation,
with a rather slow and exhaustive process. The PDF version of the
upgrade guide is 14 pages long! This, obviously, does not work
when you have tens or hundreds of machines to upgrade.
Debian upgrades are renowned for being extremely reliable, but we
have a lot of packages, and there are always corner cases where
the upgrade will just fail because of a bug specific to your
environment. Those will only be fixed after some back and forth in the
community (and that's assuming users report those bugs, which is not
always the case). There's no obvious way to deploy "hot fixes" in this
context, at least not without fixing the package and publishing it
on an unofficial Debian archive while the official ones catch up. This
is slow and difficult.
Also, some packages require manual labor. Examples of this are the
PostgreSQL or Ganeti packages which require you to upgrade your
clusters by hand, while the old and new packages live side by
side. Debian packages bring you far in the upgrade process, but
sometimes not all the way.
Which means every Debian install needs to be manually upgraded and
inspected when a new release comes out. That's slow and error prone
and we can do better.
How to automate major upgrades
I have a proposal to automate this. It has been mostly dormant in
the Debian wiki for 5 years now. Fundamentally, this is a hard
problem: Debian gets installed in so many different environments, from
workstations to physical servers to virtual machines, embedded systems
and so on, that it's extremely hard to come up with a "one size fits
all" solution.
The (manual) procedure I'm using is mostly targeting servers, but I'm
also using it on workstations. And I'll note that it's specific to my
home setup: I have a different procedure at work, although it has
a lot of common code.
To automate this, I would factor out that common code with hooks where
you could easily inject special code like "you need to upgrade
first", "you need an extra reboot here", or "this is how you finish
the PostgreSQL upgrade".
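Here is a minimal sketch of what such hooks could look like. It is not
the code from my procedure: the stage names, the `UpgradePlan` class
and the commands in `COMMON` are placeholders made up for this
example, and a real bullseye upgrade has more steps than those.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# A hook takes a host name and returns extra shell commands for that host.
Hook = Callable[[str], List[str]]

# Hypothetical stage names, in the order they are executed.
STAGES = ["pre-upgrade", "upgrade", "post-upgrade"]

# Placeholder "common code": a real procedure has many more steps.
COMMON: Dict[str, List[str]] = {
    "pre-upgrade": ["apt update"],
    "upgrade": ["apt full-upgrade -y"],
    "post-upgrade": ["apt autoremove -y"],
}


class UpgradePlan:
    """Common steps, with hook points where site-specific code is injected."""

    def __init__(self) -> None:
        self.hooks: DefaultDict[str, List[Hook]] = defaultdict(list)

    def hook(self, stage: str) -> Callable[[Hook], Hook]:
        """Decorator registering a hook to run after a stage's common steps."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")

        def register(func: Hook) -> Hook:
            self.hooks[stage].append(func)
            return func

        return register

    def commands(self, host: str) -> List[str]:
        """Return the full, ordered command list for one host."""
        result: List[str] = []
        for stage in STAGES:
            result.extend(COMMON[stage])
            for func in self.hooks[stage]:
                result.extend(func(host))
        return result


plan = UpgradePlan()


@plan.hook("post-upgrade")
def finish_postgresql(host: str) -> List[str]:
    # Only database hosts (here: names starting with "db") get the extra step.
    # The echo stands in for the actual cluster upgrade commands.
    if host.startswith("db"):
        return ["echo 'finish the PostgreSQL upgrade here'"]
    return []


for command in plan.commands("db1"):
    print(command)
```

The point of the design is that the common steps stay in one place,
while the "extra reboot" or "finish the PostgreSQL upgrade" cases
become small functions that each site can add or leave out.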
With Debian getting closer to a 2 year release cycle, with the
previous release being supported basically only one year after the
new stable comes out, I feel more and more strongly that this needs
So I'm thinking that I should write a prototype for this. Ubuntu has
do-release-upgrade, but it is too Ubuntu-specific to be reused. An
attempt at collaborating on this has been mostly met with silence
from Ubuntu's side as well.
I'm thinking of using something like Fabric, Mitogen, or
Transilience: anything that will allow me to write simple,
portable Python code that can run transparently on a local machine
(for single-system upgrades, possibly with a GUI frontend) or on remote
servers (for large clusters of servers, maybe with canaries and
grouping using Cumin). I'll note that Koumbit started
experimenting with Puppet Bolt in the bullseye upgrade process,
but that feels too site-specific to be useful more broadly.
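To make the "same code, local or remote" idea concrete, here is a
rough sketch using only the Python standard library and plain `ssh`
as a stand-in. It does not use the Fabric, Mitogen, Transilience or
Cumin APIs; the function names (`build_argv`, `batches`, `run`) are
invented for this example, and the canary logic is simply "upgrade
the first few hosts alone, then the rest in groups".

```python
import subprocess
from typing import Iterator, List, Optional, Sequence


def build_argv(command: str, host: Optional[str] = None) -> List[str]:
    """Build the argument list to run a shell command locally or over SSH."""
    if host is None:
        # Local machine: hand the command to a shell directly.
        return ["sh", "-c", command]
    # Remote machine: ssh passes the command to the remote shell.
    return ["ssh", host, command]


def batches(
    hosts: Sequence[str], canaries: int = 1, size: int = 10
) -> Iterator[List[str]]:
    """Yield the canary hosts first, then the rest in groups of `size`."""
    canaries = max(canaries, 0)
    if hosts[:canaries]:
        yield list(hosts[:canaries])
    rest = hosts[canaries:]
    for start in range(0, len(rest), size):
        yield list(rest[start:start + size])


def run(command: str, host: Optional[str] = None) -> int:
    """Run the command and return its exit status."""
    return subprocess.run(build_argv(command, host), check=False).returncode


if __name__ == "__main__":
    # Hypothetical host names; only print what would be run.
    for group in batches(["web1", "web2", "db1", "dns1"], canaries=1, size=2):
        for host in group:
            print(build_argv("apt update", host))
```

A real tool would stop after the canary batch if anything failed
there, and would run each group in parallel rather than one host at a
time.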
I am not sure where this stands in the XKCD time trade-off
evaluation, because the table doesn't actually cover the
frequency of Debian releases (which is basically "biennial") or the
amount of time the upgrade would take across a cluster (which varies a
lot, but which I estimate to be between one and six hours per machine).
Assuming I have 80 machines to upgrade, that is 80 to 480 hours
(between ~3 and 20 days) of work! It's unclear how much work such an
automated system would shave off, however. Assuming things are an
order of magnitude faster (say I upgrade 10 machines at a time), I
would shave off between 3 and 18 days of work, which implies I might
allow myself to spend a minimum of 5 days working on such a
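The arithmetic behind those figures can be checked with a few lines
of Python. The inputs are the numbers from the paragraph above; the
variable names are just for this example. Note that the day counts
only come out as ~3 to 20 when dividing by 24 hours, so these are
round-the-clock days, not working days.

```python
machines = 80
hours_per_machine = (1, 6)  # low and high estimate, in hours
speedup = 10                # "an order of magnitude faster"

# Total manual work, in hours: (80, 480)
manual_hours = tuple(machines * h for h in hours_per_machine)

# Same work if upgrades go ten times faster, in hours: (8.0, 48.0)
automated_hours = tuple(h / speedup for h in manual_hours)

# Convert to 24-hour days.
manual_days = tuple(h / 24 for h in manual_hours)
saved_days = tuple(
    (m - a) / 24 for m, a in zip(manual_hours, automated_hours)
)

print(f"manual: {manual_days[0]:.1f} to {manual_days[1]:.1f} days")  # 3.3 to 20.0
print(f"saved:  {saved_days[0]:.1f} to {saved_days[1]:.1f} days")    # 3.0 to 18.0
```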
The other option: never upgrade
Before people mention those: I am aware of containers, Kubernetes, and
other deployment mechanisms. While those may be a long-term
solution, we can't afford to migrate everything over to
containers right now: that is a huge migration and a total paradigm
shift. At that point, whatever is left might not even be Debian in the
first place. And besides, if you run Kubernetes, you still need to run
some OS underneath and upgrade that, so that problem never
Still, maybe that's the final answer: never upgrade.
For some stateless machines like DNS replicas or load balancers, that
might make a lot of sense as there's no or little data to carry to the
new host. But this implies a seamless and fast provisioning process,
and we don't have that either: at my work, installing a machine takes
about as long as upgrading it, and that's after a significant amount
of work automating that process, partly writing my own Debian
installer with Fabric (!).
What is your process?
I'm curious to hear what people think of those ideas. It strikes me as
really odd that no one has really tackled that problem yet,
considering how many clusters of Debian machines are out there. Surely
people are upgrading those, and not following that slow step by step
I suspect everyone is doing the same thing: we all have our little
copy-paste script we batch onto multiple machines, sometimes in
parallel. That is what the Debian.org sysadmins are doing as well.
There must be a better way. What is yours?
My upgrades so far
So far, I have upgraded 2 out of my 3 home machines running buster --
others have been installed directly in bullseye -- with only my main,
old, messy server left. Upgrades have been pretty painless so far (see
another report, for example), much better than the previous
buster upgrade. Obviously, for my personal
use, automating this is pointless.
Work-side, however, is another story: we have over 80 boxes to
upgrade there and that will take a while. The last upgrade cycle, to
buster, took about two years to complete, so we might be done by the
time the next release (12, "bookworm") comes out, but that's
actually a full year after "buster" becomes EOL, so it's actually
At least I fixed the installers so that the new machines we create all
ship with bullseye, so we stopped accumulating new buster hosts...
Thanks to lelutin and pabs for reviewing a draft of this post.