Planet Tor

@ooni November 24, 2022 - 00:00 • 3 days ago
Join the 10th Ooniversary Events!
You’re invited to join us for the celebration of OONI’s 10th anniversary! To share OONI highlights from the last 10 years, as well as how community members have used OONI tools and data as part of their work, we’ll be hosting 2 live-streamed events: OONI Highlights: Date and time: 5th December 2022 at 14:00 UTC (1 hour event) ...
@blog November 23, 2022 - 00:00 • 4 days ago
New Release: Tor Browser 11.5.9 (Android)
...
@meejah November 23, 2022 - 00:00 • 4 days ago
Winden: magic-wormhole for the Web
Least Authority has launched a Web-based magic-wormhole client ...
@blog November 22, 2022 - 00:00 • 5 days ago
New Release: Tor Browser 11.5.8 (Android, Windows, macOS, Linux)

Tor Browser 11.5.8 is now available from the Tor Browser download page and also from our distribution directory. This release will not be published on Google Play due to their target API level requirements. Assuming we do not run into any major problems, Tor Browser 11.5.9 will be an Android-only release that fixes this issue.

Tor Browser 11.5.8 backports the following security updates from Firefox ESR 102.5 to Firefox ESR 91.13 on Windows, macOS and Linux:

Tor Browser 11.5.8 updates GeckoView on Android to Firefox ESR 102.5 and includes important security updates. Tor Browser 11.5.8 backports the following security updates from Firefox 107 to Firefox ESR 102.5 on Android:

The full changelog since Tor Browser 11.5.7 is:

...
@anarcat November 16, 2022 - 22:04 • 11 days ago
Wayland: i3 to Sway migration

I started migrating my graphical workstations to Wayland, specifically migrating from i3 to Sway. This is mostly to address serious graphics bugs in the latest Framework laptop, but it's also something I felt was inevitable.

The current status is that I've been able to convert my i3 configuration to Sway, and adapt my systemd startup sequence to the new environment. Screen sharing only works with Pipewire, so I also did that migration, which basically requires an upgrade to Debian bookworm to get a nice enough Pipewire release.

I'm testing Wayland on my laptop, but I'm not using it as a daily driver because I first need to upgrade to Debian bookworm on my main workstation.

Most irritants have been solved one way or the other. My main problem with Wayland right now is that I spent a frigging week doing the conversion: it's exciting and new, but it basically sucked the life out of all my other projects and it's distracting, and I want it to stop.

The rest of this page documents why I made the switch, how it happened, and what's left to do. Hopefully it will keep you from spending as much time as I did in fixing this.

TL;DR: Wayland is mostly ready. Main blockers you might find are that you need to do manual configurations, DisplayLink (multiple monitors on a single cable) doesn't work in Sway, HDR and color management are still in development.

I had to install the following packages:

apt install \
    brightnessctl \
    foot \
    gammastep \
    gdm3 \
    grim slurp \
    pipewire-pulse \
    sway \
    swayidle \
    swaylock \
    wdisplays \
    wev \
    wireplumber \
    wlr-randr \
    xdg-desktop-portal-wlr

And did some tweaks in my $HOME, mostly dealing with my esoteric systemd startup sequence, which you won't have to deal with if you are not a fan.

Why switch?

I originally held back from migrating to Wayland: it seemed like a complicated endeavor hardly worth the cost. It also didn't seem actually ready.

But after reading this blurb on LWN, I decided to at least document the situation here. The actual quote that convinced me it might be worth it was:

It’s amazing. I have never experienced gaming on Linux that looked this smooth in my life.

... I'm not a gamer, but I do care about latency. The longer version is worth a read as well.

The point here is not to bash one side or the other, or even do a thorough comparison. I start with the premise that Xorg is likely going away in the future and that I will need to adapt some day. In fact, the last major Xorg release (21.1, October 2021) is rumored to be the last ("just like the previous release...", that said, minor releases are still coming out, e.g. 21.1.4). Indeed, it seems even core Xorg people have moved on to developing Wayland, or at least Xwayland, which was spun off it its own source tree.

X, or at least Xorg, is in maintenance mode and has been for years. Granted, the X Window System is getting close to forty years old at this point: it got us amazingly far for something that was designed around the time of the first graphical interfaces. Since Mac and (especially?) Windows released theirs, they have rebuilt their graphical backends numerous times, but UNIX derivatives have stuck with Xorg this entire time, which is a testament to the design and reliability of X. (Or our incapacity at developing meaningful architectural change across the entire ecosystem, take your pick I guess.)

What pushed me over the edge is that I had some pretty bad driver crashes with Xorg while screen sharing under Firefox, in Debian bookworm (around November 2022). The symptom would be that the UI would completely crash, reverting to a text-only console, while Firefox would keep running, audio and everything still working. People could still see my screen, but I couldn't, of course, let alone interact with it. All processes still running, including Xorg.

(And no, sorry, I haven't reported that bug; maybe I should have, and it's actually possible it comes up again in Wayland, of course. But at first, screen sharing didn't work at all under Wayland, so things have already come a long way. After making screen sharing work, though, the bug didn't occur again, so I consider this an Xorg-specific problem until further notice.)

There were also frustrating glitches in the UI, in general. I actually had to setup a compositor alongside i3 to make things bearable at all. Video playback in a window was laggy, sluggish, and out of sync.

Wayland fixed all of this.

Wayland equivalents

This section documents each tool I have picked as an alternative to the current Xorg tool I am using for the task at hand. It also touches on other alternatives and how the tool was configured.

Note that this list is based on the series of tools I use in desktop.

TODO: update desktop with the following when done, possibly moving old configs to a ?xorg archive.

Window manager: i3 → sway

This seems like kind of a no-brainer. Sway is around, it's feature-complete, and it's in Debian.

I'm a bit worried about the "Drew DeVault community", to be honest. There's a certain aggressiveness in the community I don't like so much; at the very least, an open hostility towards more modern UNIX tools like containers and systemd that makes it hard to do my work while interacting with that community.

I'm also concerned about the lack of unit tests and user manual for Sway. The i3 window manager has been designed by a fellow (ex-)Debian developer I have a lot of respect for (Michael Stapelberg), partly because of i3 itself, but also from working with him on other projects. Beyond the people involved, i3 has a user guide, a code of conduct, and lots more documentation. It has a test suite.

Sway has... manual pages, with the homepage just telling users to use man -k sway to find what they need. I don't think we need that kind of elitism in our communities, to put this bluntly.

But let's put that aside: Sway is still a no-brainer. It's the easiest thing to migrate to, because it's mostly compatible with i3. I had to immediately fix those resources to get a minimal session going:

i3                  Sway                    note
set_from_resources  set                     no support for X resources, naturally
new_window pixel 1  default_border pixel 1  actually supported in i3 as well

That's it. All of the other changes I had to do (and there were actually a lot) were all Wayland-specific changes, not Sway-specific changes. For example, use brightnessctl instead of xbacklight to change the backlight levels.
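
For example, the brightness keys can be bound to brightnessctl in the Sway config; a minimal sketch (the 5% step is my arbitrary choice):

bindsym XF86MonBrightnessUp exec brightnessctl set +5%
bindsym XF86MonBrightnessDown exec brightnessctl set 5%-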

See a copy of my full sway/config for details.

Other options include:

  • dwl: tiling, minimalist, dwm for Wayland, not in Debian
  • Hyprland: tiling, fancy animations, not in Debian
  • Qtile: tiling, extensible, in Python, not in Debian (1015267)
  • river: Zig, stackable, tagging, not in Debian (1006593)
  • velox: inspired by xmonad and dwm, not in Debian
  • vivarium: inspired by xmonad, not in Debian

Status bar: py3status → waybar

I have invested quite a bit of effort in setting up my status bar with py3status. It supports Sway directly, and did not actually require any change when migrating to Wayland.

Unfortunately, I had trouble making nm-applet work. Based on this nm-applet.service, I found that you need to pass --indicator for it to show up at all.
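
That service file is essentially a thin wrapper around that flag; a minimal sketch of such a user unit, assuming the sway-session.target setup described in the systemd integration section below:

[Unit]
Description=Network Manager applet
PartOf=graphical-session.target
After=graphical-session.target

[Service]
ExecStart=/usr/bin/nm-applet --indicator

[Install]
WantedBy=sway-session.target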

In theory, tray icon support was merged in 1.5, but in practice there are still several limitations, like icons not being clickable. Also, on startup, nm-applet --indicator triggers this error in the Sway logs:

nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property “IconPixmap”
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property “AttentionIconPixmap”
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property “ItemIsMenu”
nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00

... but that seems innocuous. The tray icon displays but is not clickable.

Note that there is currently (November 2022) a pull request to hook up a "Tray D-Bus Menu" which, according to Reddit might fix this, or at least be somewhat relevant.

If you don't see the icon, check the bar.tray_output property in the Sway config, try: tray_output *.
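
In the Sway config, that would look something like this:

bar {
    tray_output *
}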

The non-working tray was the biggest irritant in my migration. I have used nmtui to connect to new Wifi hotspots or change connection settings, but that doesn't support actions like "turn off WiFi".

I eventually fixed this by switching from py3status to waybar, which was another yak horde shaving session, but ultimately, it worked.

Web browser: Firefox

Firefox has had support for Wayland for a while now, with the team enabling it by default in nightlies around January 2022. It's actually not easy to figure out the state of the port: the meta bug report is still open and it's huge: it currently (Sept 2022) depends on 76 open bugs, it was opened twelve years ago (2010), and it's still getting daily updates (mostly linking to other tickets).

Firefox 106 presumably shipped with "Better screen sharing for Windows and Linux Wayland users", but I couldn't quite figure out what those were.

TL;DR: echo MOZ_ENABLE_WAYLAND=1 >> ~/.config/environment.d/firefox.conf && apt install xdg-desktop-portal-wlr

How to enable it

Firefox depends on this silly variable to start correctly under Wayland (otherwise it starts inside Xwayland and looks fuzzy and fails to screen share):

MOZ_ENABLE_WAYLAND=1 firefox

To make the change permanent, many recipes recommend adding this to an environment startup script:

if [ "$XDG_SESSION_TYPE" == "wayland" ]; then
    export MOZ_ENABLE_WAYLAND=1
fi

At least that's the theory. In practice, Sway doesn't actually run any startup shell script, so that can't possibly work. Furthermore, XDG_SESSION_TYPE is not actually set when starting Sway from gdm3 which I find really confusing, and I'm not the only one. So the above trick doesn't actually work, even if the environment (XDG_SESSION_TYPE) is set correctly, because we don't have conditionals in environment.d(5).

(Note that systemd.environment-generator(7) generators do support running arbitrary commands to generate the environment, but for some reason do not support user-specific configuration files... Even then, it might be a solution for a conditional MOZ_ENABLE_WAYLAND environment, but I'm not sure it would work because the ordering between those two isn't clear: maybe XDG_SESSION_TYPE wouldn't be set just yet...)

At first, I made this ridiculous script to work around those issues. Really, it seems to me Firefox should just parse the XDG_SESSION_TYPE variable here... but then I realized that Firefox works fine in Xorg when MOZ_ENABLE_WAYLAND is set.

So now I just set that variable in environment.d and It Just Works™:

MOZ_ENABLE_WAYLAND=1
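
Since environment.d(5) files are read by the systemd user manager, you can confirm the variable actually made it into the session (after logging back in) with:

systemctl --user show-environment | grep MOZ_ENABLE_WAYLAND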

Screen sharing

Out of the box, screen sharing doesn't work until you install xdg-desktop-portal-wlr or similar (e.g. xdg-desktop-portal-gnome on GNOME). I had to reboot for the change to take effect.

Without those tools, it shows the usual permission prompt with "Use operating system settings" as the only choice, but when we accept... nothing happens. After installing the portals, it actually works, and works well!

This was tested in Debian bookworm/testing with Firefox ESR 102 and Firefox 106.

Major caveat: we can only share a full screen; we can't currently share just a window. The major upside is that, by default, it streams only one output, which is actually what I want most of the time! See the screencast compatibility page for more information on what is supposed to work.

This is actually a huge improvement over the situation in Xorg, where Firefox can only share a window or all monitors, which led me to use Chromium a lot for video-conferencing. With this change, in other words, I will not need Chromium for anything anymore, whoohoo!

If slurp, wofi, or bemenu are installed, one of them will be used to pick the monitor to share, which effectively acts as some minimal security measure. See xdg-desktop-portal-wlr(1) for how to configure that.
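
For example, to force the simple slurp-based chooser, something like this in ~/.config/xdg-desktop-portal-wlr/config should work (a sketch based on the manual page; the slurp arguments are the documented defaults):

[screencast]
chooser_type=simple
chooser_cmd=slurp -f %o -or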

Side note: Chrome fails to share a full screen

I was still using Google Chrome (or, more accurately, Debian's Chromium package) for some videoconferencing. It's mainly because Chromium was the only browser which would allow me to share only one of my two monitors, which is extremely useful.

To start chrome with the Wayland backend, you need to use:

chromium --enable-features=UseOzonePlatform --ozone-platform=wayland

If it shows an ugly gray border, check the Use system title bar and borders setting.

It can do some screen sharing. Sharing a window or a tab seems to work, but sharing a full screen doesn't: it's all black. Maybe not ready for prime time.

And since Firefox can do what I need under Wayland now, I will not need to fight with Chromium to work under Wayland:

apt purge chromium

Note that a similar fix was necessary for Signal Desktop, see this commit. Basically you need to figure out a way to pass those same flags to signal:

--enable-features=WaylandWindowDecorations --ozone-platform-hint=auto
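
One way to do that, as a sketch, is a small wrapper script placed earlier in your $PATH (the /usr/bin/signal-desktop path is an assumption; adjust to wherever the real binary lives):

#!/bin/sh
# force Signal to start natively under Wayland
exec /usr/bin/signal-desktop --enable-features=WaylandWindowDecorations --ozone-platform-hint=auto "$@"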

Email: notmuch

See Emacs, below.

File manager: thunar

Unchanged.

News: feed2exec, gnus

See Email, above, or Emacs in Editor, below.

Editor: Emacs okay-ish

Emacs is being actively ported to Wayland. According to this LWN article, the first (partial, to Cairo) port was done in 2014 and a working port (to GTK3) was completed in 2021, but it wasn't merged until late 2021, too late to make it into Emacs 28 (released April 2022).

So we'll probably need to wait for Emacs 29 to get native Wayland support, which, in turn, is unlikely to arrive in time for the Debian bookworm freeze. There are, however, unofficial builds for both Emacs 28 and 29 provided by spwhitton which may provide native Wayland support.

I tested the snapshot packages and they do not quite work well enough. First off, they completely take over the builtin Emacs — they hijack the $PATH in /etc! — and certain things are simply not working in my setup. For example, this hook never gets run on startup:

(add-hook 'after-init-hook 'server-start t) 

Still, like many X11 applications, Emacs mostly works fine under Xwayland. The clipboard works as expected, for example.

Scaling is a bit of an issue: fonts look fuzzy.

I have heard anecdotal evidence of hard lockups with Emacs running under Xwayland as well, but haven't experienced any problem so far. I did experience a Wayland crash with the snapshot version however.

TODO: look again at Wayland in Emacs 29.

Backups: borg

Mostly irrelevant, as I do not use a GUI.

Color theme: srcery, redshift → gammastep

I am keeping Srcery as a color theme, in general.

Redshift is another story: it has no support for Wayland out of the box, but it's apparently possible to apply a hack on the TTY before starting Wayland, with:

redshift -m drm -PO 3000

This tip is from the Arch wiki, which also has other suggestions for Wayland-based alternatives. Both KDE and GNOME have their own "red shifters", and the wiki (currently, Sept. 2022) lists a few alternatives for wlroots-based compositors.

I configured gammastep with a simple gammastep.service file associated with the sway-session.target.
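
A minimal sketch of what that unit can look like (my actual file may differ slightly):

[Unit]
Description=gammastep color temperature adjuster
PartOf=graphical-session.target
After=graphical-session.target

[Service]
ExecStart=/usr/bin/gammastep

[Install]
WantedBy=sway-session.target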

Display manager: lightdm → gdm3

Switched because lightdm failed to start sway:

nov 16 16:41:43 angela sway[843121]: 00:00:00.002 [ERROR] [wlr] [libseat] [common/terminal.c:162] Could not open target tty: Permission denied

Possible alternatives:

Terminal: xterm → foot

One of the biggest question marks in this transition was what to do about Xterm. After writing two articles about terminal emulators as a professional journalist, decades of working in the terminal, and probably using dozens of different terminal emulators, I'm still not happy with any of them.

This is such a big topic that I actually have an entire blog post specifically about this.

For starters, using xterm under Xwayland works well enough, although the font scaling makes things look a bit too fuzzy.

I have also tried foot: it ... just works!

Fonts are much crisper than in Xterm and Emacs. URLs are not clickable, but the URL selector (control-shift-u) is just plain awesome (think "vimperator" for the terminal).

There's a cool hack to jump between prompts.

Copy-paste works. True colors work. The word-wrapping is excellent: it doesn't lose one byte. Emojis are nicely sized and colored. Font resize works. There's even scroll back search (control-shift-r).

Foot went from a question mark to being a reason to switch to Wayland, just for this little goodie, which says a lot about the quality of that software.

The selection clicks are not quite what I would expect though. In rxvt and others, you have the following patterns:

  • single click: reset selection, or drag to select
  • double: select word
  • triple: select quotes or line
  • quadruple: select line

I particularly find the "select quotes" bit useful. It seems like foot just supports double and triple clicks, with word and line selection. You can select a rectangle with control+drag. It correctly extends the selection word-wise with right click if double-click was used first.

One major problem with Foot is that it's a new terminal, with its own termcap entry. Support for foot was added to ncurses in the 20210731 release, which was shipped after the current Debian stable release (Debian bullseye, which ships 6.2+20201114-2). A workaround for this problem is to install the foot-terminfo package on the remote host, which is available in Debian stable.
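
Concretely, the workaround is a one-liner, with a cruder client-side fallback for hosts where you can't install packages (xterm-256color is close enough for most purposes):

# on the remote Debian stable host
apt install foot-terminfo

# or, as a quick hack from the client side
TERM=xterm-256color ssh remotehost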

This should eventually resolve itself, as Debian bookworm has a newer version. Note that some corrections were also shipped in the 20211113 release, but that is also shipped in Debian bookworm.

That said, I am almost certain I will have to revert back to xterm under Xwayland at some point in the future. Back when I was using GNOME Terminal, it would mostly work for everything until I had to use the serial console on a (HP ProCurve) network switch, which has a fancy TUI that was basically unusable there. I fully expect such problems with foot, or any other terminal than xterm, for that matter.

The foot wiki has good troubleshooting instructions as well.

Update: I did find one tiny thing to improve with foot, and it's the default logging level which I found pretty verbose. After discussing it with the maintainer on IRC, I submitted this patch to tweak it, which I described like this on Mastodon:

today's reason why i will go to hell when i die (TRWIWGTHWID?): a 600-word, 63 lines commit log for a one line change: https://codeberg.org/dnkl/foot/pulls/1215

It's Friday.

Launcher: rofi → rofi??

rofi does not support Wayland. There was a rather disgraceful battle in the pull request that led to the creation of a fork (lbonn/rofi), so it's unclear how that will turn out.

Given how relatively trivial the problem space is, there is of course a profusion of options:

Tool                 In Debian       Notes
alfred               yes             general launcher/assistant tool
bemenu               yes, bookworm+  inspired by dmenu
cerebro              no              Javascript ... uh... thing
dmenu-wl             no              fork of dmenu, straight port to Wayland
Fuzzel               ITP 982140      dmenu/drun replacement, app icon overlay
gmenu                no              drun replacement, with app icons
kickoff              no              dmenu/run replacement, fuzzy search, "snappy", history, copy-paste, Rust
krunner              yes             KDE's runner
mauncher             no              dmenu/drun replacement, math
nwg-launchers        no              dmenu/drun replacement, JSON config, app icons, nwg-shell project
Onagre               no              rofi/alfred inspired, multiple plugins, Rust
πmenu                no              dmenu/drun rewrite
Rofi (lbonn's fork)  no              see above
sirula               no              .desktop based app launcher
Ulauncher            ITP 949358      generic launcher like Onagre/rofi/alfred, might be overkill
tofi                 yes, bookworm+  dmenu/drun replacement, C
wmenu                no              fork of dmenu-wl, but mostly a rewrite
Wofi                 yes             dmenu/drun replacement, not actively maintained
yofi                 no              dmenu/drun replacement, Rust

The above list comes partly from https://arewewaylandyet.com/ and awesome-wayland. It is likely incomplete.

I have read some good things about bemenu, fuzzel, and wofi.

A particularly tricky issue is that my rofi password management depends on xdotool for some operations. At first, I thought this was just going to be (thankfully?) impossible, because we actually like the idea that one app cannot send keystrokes to another. But it turns out there are alternatives, like wtype or ydotool, the latter of which requires root access. wl-ime-type does that through the input-method-unstable-v2 protocol (sample emoji picker), but is not packaged in Debian.

As it turns out, wtype just works as expected, and fixing this was basically a two-line patch. Another alternative, not in Debian, is wofi-pass.
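
For the curious, the kind of pipeline this enables looks roughly like this (a sketch, assuming a pass password store and wtype's stdin mode):

pass show example.com | head -n1 | tr -d '\n' | wtype -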

The other problem is that I actually heavily modified rofi. I use "modi" which are not actually implemented in wofi or tofi, so I'm left with reinventing those wheels from scratch or using the rofi + wayland fork... It's really too bad that fork isn't being reintegrated...

For now, I'm actually still using rofi under Xwayland. The main downside is that fonts are fuzzy, but it otherwise just works.

Note that wlogout could be a partial replacement (just for the "power menu").

Image viewers: geeqie → ?

I'm not very happy with geeqie in the first place, and I suspect the Wayland switch will just pile impossible things on top of the ones I already find irritating (Geeqie doesn't support copy-pasting images).

In practice, Geeqie doesn't seem to work so well under Wayland. The fonts are fuzzy and the thumbnail preview just doesn't work anymore (filed as Debian bug 1024092). It seems it also has problems with scaling.

Alternatives:

See also this list and that list for other lists of image viewers, not necessarily ported to Wayland.

TODO: pick an alternative to geeqie; nomacs would be gorgeous if it weren't basically abandoned upstream (no release since 2020): it has an unpatched CVE-2020-23884 since July 2020, does bad vendoring, and is in bad shape in Debian (4 minor releases behind).

So for now I'm still grumpily using Geeqie.

Media player: mpv, gmpc / sublime

This is basically unchanged. mpv seems to work fine under Wayland, better than Xorg on my new laptop (as mentioned in the introduction), and that was before the version that improves Wayland support significantly by bringing native Pipewire support and DMA-BUF support.

gmpc is more of a problem, mainly because it is abandoned. See 2022-08-22-gmpc-alternatives for the full discussion, one of the alternatives there will likely support Wayland.

Finally, I might just switch to sublime-music instead... In any case, not many changes here, thankfully.

Screensaver: xsecurelock → swaylock

I was previously using xss-lock and xsecurelock as a screensaver, with xscreensaver "hacks" as a backend for xsecurelock.

The basic screensaver in Sway seems to be built with swayidle and swaylock. It's interesting because it's the same "split" design as xss-lock and xsecurelock.

That, unfortunately, does not include the fancy "hacks" provided by xscreensaver, and that is unlikely to be implemented upstream.

Other alternatives include gtklock and waylock (zig), which do not solve that problem either.

swaylock-plugin, a swaylock fork, at least attempts to solve this problem, although not by using the real xscreensaver hacks directly. swaylock-effects is another attempt at this, but it only adds more effects; it doesn't delegate the image display.

Other than that, maybe it's time to just let go of those funky animations and let swaylock do its thing, which is to display a static image or just a black screen, which is fine by me.

In the end, I am just using swayidle with a configuration based on the systemd integration wiki page but with additional tweaks from this service, see the resulting swayidle.service file.

Interestingly, damjan also has a service for swaylock itself, although it's not clear to me what its purpose is...

Screenshot: maim → grim, pubpaste

I'm a heavy user of maim (and a package uploader in Debian). It looks like the direct replacement to maim (and slop) is grim (and slurp). There's also swappy which goes on top of grim and allows preview/edit of the resulting image, nice touch (not in Debian though).

See also awesome-wayland screenshots for other alternatives: there are many, including X11 tools like Flameshot that also support Wayland.

One key problem here was that I have my own screenshot / pastebin software which needed an update for Wayland as well. That, thankfully, meant actually cleaning up a lot of horrible code that involved calling xterm and xmessage for user interaction. Now, pubpaste uses GTK for prompts and looks much better. (And before anyone freaks out, I already had to use GTK for proper clipboard support, so this isn't much of a stretch...)

Screen recorder: simplescreenrecorder → wf-recorder

In Xorg, I have used both peek and simplescreenrecorder for screen recordings. The former will work in Wayland, but has no sound support. The latter has a fork with Wayland support but it is limited and buggy ("doesn't support recording area selection and has issues with multiple screens").

It looks like wf-recorder will just do everything correctly out of the box, including audio support (with --audio, duh). It's also packaged in Debian.

One has to wonder how this works while keeping the "between app security" that Wayland promises, however... Would installing such a program make my system less secure?

Many other options are available, see the awesome Wayland screencasting list.

RSI: workrave → nothing?

Workrave has no support for Wayland. ActivityWatch is a time tracker alternative, but is not an RSI watcher. KDE has RSIbreak, but that's a bit too much on the heavy side for my taste.

SafeEyes looks like an alternative at first, but it has many issues under Wayland (escape doesn't work, idle doesn't work, it just doesn't work really). timekpr-next could be an alternative as well, and has support for Wayland.

I am also considering just abandoning workrave, even if I stick with Xorg, because it apparently introduces significant latency in the input pipeline.

And besides, I've developed a pretty unhealthy alert fatigue with Workrave. I have used the program for so long that my fingers know exactly where to click to dismiss those warnings very effectively. It makes my work just more irritating, and doesn't fix the fundamental problem I have with computers.

Other apps

This is a constantly changing list, of course. There's a bit of a "death by a thousand cuts" in migrating to Wayland because you realize how many things you were using are tightly bound to X.

  • .Xresources - just say goodbye to that old resource system, it was used, in my case, only for rofi, xterm, and ... Xboard!?

  • keyboard layout switcher: built-in to Sway since 2017 (PR 1505, 1.5rc2+), requires a small configuration change, see this answer as well, looks something like this command:

     swaymsg input 0:0:X11_keyboard xkb_layout de
    

    or using this config:

     input * {
         xkb_layout "ca,us"
         xkb_options "grp:sclk_toggle"
     }
    

    That works refreshingly well, even better than in Xorg, I must say.

    swaykbdd is an alternative that supports per-window layouts (in Debian).

  • wallpaper: currently using feh, will need a replacement, TODO: figure out something that does, like feh, a random shuffle. swaybg just loads a single image, duh. oguri might be a solution, but unmaintained, used here, not in Debian. wallutils is another option, also not in Debian. For now I just don't have a wallpaper, the background is a solid gray, which is better than Xorg's default (which is whatever crap was left around a buffer by the previous collection of programs, basically)
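
    In the meantime, a shuffle can be faked by picking a random file at startup, a sketch (the wallpapers directory is an assumption):

     swaybg -m fill -i "$(find ~/Pictures/wallpapers -type f | shuf -n 1)"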

  • notifications: currently using dunst in some places, which works well in both Xorg and Wayland, not a blocker, salut a possible alternative (not in Debian), damjan uses mako. TODO: install dunst everywhere

  • notification area: see the Status bar section above for the full story. In short: nm-applet needs the --indicator flag to show up at all, the tray icon then displays but is not clickable, and that non-working tray was the biggest irritant in my migration, eventually fixed by switching from py3status to waybar.

  • window switcher: in i3 I was using this bespoke i3-focus script, which doesn't work under Sway; swayr is an option, but not in Debian. So I put together this other bespoke hack from multiple sources, which works.
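
    A minimal version of such a hack, as an untested sketch (assuming jq and wofi are installed), could be built out of the get_tree output:

     swaymsg -t get_tree \
         | jq -r '.. | select(.type? == "con" and .name != null) | "\(.id) \(.name)"' \
         | wofi --dmenu \
         | cut -d' ' -f1 \
         | xargs -I{} swaymsg '[con_id={}] focus'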

  • PDF viewer: currently using atril (which supports Wayland), could also just switch to zathura/mupdf permanently, see also calibre for a discussion on document viewers

See also this list of useful addons and this other list for other app alternatives.

More X11 / Wayland equivalents

For all the tools above, it's not exactly clear what options exist in Wayland, or when they do, which one should be used. But for some basic tools, it seems the options are actually quite clear. If that's the case, they should be listed here:

X11         Wayland              In Debian
arandr      wdisplays            yes
autorandr   kanshi               yes
xdotool     wtype                yes
xev         wev                  yes
xlsclients  swaymsg -t get_tree  yes
xrandr      wlr-randr            yes

lswt is a more direct replacement for xlsclients but is not packaged in Debian.
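
As a concrete example, the xrandr to wlr-randr translation is fairly direct (eDP-1 being a typical laptop output name, an assumption here):

xrandr --output eDP-1 --auto              # X11
wlr-randr --output eDP-1 --on --scale 2   # Wayland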

See also:

Note that arandr and autorandr are not directly part of X. arewewaylandyet.com refers to a few alternatives. We suggest wdisplays and kanshi above (see also this service file) but wallutils can also do the autorandr stuff, apparently, and nwg-displays can do the arandr part. Neither are packaged in Debian yet.

So I have tried wdisplays, and it Just Works, and works well. The UI even looks better and more usable than arandr's, so another clean win for Wayland here.

TODO: test kanshi as an autorandr replacement

Other issues

systemd integration

I've had trouble getting session startup to work. This is partly because I had a kind of funky system to start my session in the first place. I used to have my whole session started from .xsession like this:

#!/bin/sh

. ~/.shenv

systemctl --user import-environment

exec systemctl --user start --wait xsession.target

But obviously, the xsession.target is not started by the Sway session. It seems to just start a default.target, which is really not what we want because we want to associate the services directly with the graphical-session.target, so that they don't start when logging in over (say) SSH.

damjan on #debian-systemd showed me his sway-setup which features systemd integration. It involves starting a different session in a completely new .desktop file. That work was submitted upstream but refused on the grounds that "I'd rather not give a preference to any particular init system." Another PR was abandoned because "restarting sway does not makes sense: that kills everything".

The work was therefore moved to the wiki.

So. Not a great situation. The upstream wiki systemd integration suggests starting the systemd target from within Sway, which has all sorts of problems:

  • you don't get Sway logs anywhere
  • control groups are all messed up

I have done a lot of work trying to figure this out, but I remember that starting systemd from Sway didn't actually work for me: my previously configured systemd units didn't correctly start, and especially not with the right $PATH and environment.

So I went down that rabbit hole and managed to correctly configure Sway to be started from the systemd --user session. I have partly followed the wiki but also picked ideas from damjan's sway-setup and xdbob's sway-services. Another option is uwsm (not in Debian).

This is the config I have in .config/systemd/user/:

I have also configured those services, but that's somewhat optional:

You will also need at least part of my sway/config, which sends the systemd notification (because, no, Sway doesn't support any sort of readiness notification, that would be too easy). And you might like to see my swayidle-config while you're there.

Finally, you need to hook this up somehow to the login manager. This is typically done with a desktop file, so drop sway-session.desktop in /usr/share/wayland-sessions and sway-user-service somewhere in your $PATH (typically /usr/bin/sway-user-service).
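
A sketch of what that desktop file can look like (the Exec line assumes sway-user-service is the wrapper mentioned above):

[Desktop Entry]
Name=Sway (systemd user session)
Comment=An i3-compatible Wayland compositor, started through the systemd user session
Exec=sway-user-service
Type=Application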

The session then looks something like this:

$ systemd-cgls | head -101
Control group /:
-.slice
├─user.slice (#472)
│ → user.invocation_id: bc405c6341de4e93a545bde6d7abbeec
│ → trusted.invocation_id: bc405c6341de4e93a545bde6d7abbeec
│ └─user-1000.slice (#10072)
│   → user.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
│   → trusted.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
│   ├─user@1000.service … (#10156)
│   │ → user.delegate: 1
│   │ → trusted.delegate: 1
│   │ → user.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
│   │ → trusted.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
│   │ ├─session.slice (#10282)
│   │ │ ├─xdg-document-portal.service (#12248)
│   │ │ │ ├─9533 /usr/libexec/xdg-document-portal
│   │ │ │ └─9542 fusermount3 -o rw,nosuid,nodev,fsname=portal,auto_unmount,subt…
│   │ │ ├─xdg-desktop-portal.service (#12211)
│   │ │ │ └─9529 /usr/libexec/xdg-desktop-portal
│   │ │ ├─pipewire-pulse.service (#10778)
│   │ │ │ └─6002 /usr/bin/pipewire-pulse
│   │ │ ├─wireplumber.service (#10519)
│   │ │ │ └─5944 /usr/bin/wireplumber
│   │ │ ├─gvfs-daemon.service (#10667)
│   │ │ │ └─5960 /usr/libexec/gvfsd
│   │ │ ├─gvfs-udisks2-volume-monitor.service (#10852)
│   │ │ │ └─6021 /usr/libexec/gvfs-udisks2-volume-monitor
│   │ │ ├─at-spi-dbus-bus.service (#11481)
│   │ │ │ ├─6210 /usr/libexec/at-spi-bus-launcher
│   │ │ │ ├─6216 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2…
│   │ │ │ └─6450 /usr/libexec/at-spi2-registryd --use-gnome-session
│   │ │ ├─pipewire.service (#10403)
│   │ │ │ └─5940 /usr/bin/pipewire
│   │ │ └─dbus.service (#10593)
│   │ │   └─5946 /usr/bin/dbus-daemon --session --address=systemd: --nofork --n…
│   │ ├─background.slice (#10324)
│   │ │ └─tracker-miner-fs-3.service (#10741)
│   │ │   └─6001 /usr/libexec/tracker-miner-fs-3
│   │ ├─app.slice (#10240)
│   │ │ ├─xdg-permission-store.service (#12285)
│   │ │ │ └─9536 /usr/libexec/xdg-permission-store
│   │ │ ├─gammastep.service (#11370)
│   │ │ │ └─6197 gammastep
│   │ │ ├─dunst.service (#11958)
│   │ │ │ └─7460 /usr/bin/dunst
│   │ │ ├─wterminal.service (#13980)
│   │ │ │ ├─69100 foot --title pop-up
│   │ │ │ ├─69101 /bin/bash
│   │ │ │ ├─77660 sudo systemd-cgls
│   │ │ │ ├─77661 head -101
│   │ │ │ ├─77662 wl-copy
│   │ │ │ ├─77663 sudo systemd-cgls
│   │ │ │ └─77664 systemd-cgls
│   │ │ ├─syncthing.service (#11995)
│   │ │ │ ├─7529 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo…
│   │ │ │ └─7537 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo…
│   │ │ ├─dconf.service (#10704)
│   │ │ │ └─5967 /usr/libexec/dconf-service
│   │ │ ├─gnome-keyring-daemon.service (#10630)
│   │ │ │ └─5951 /usr/bin/gnome-keyring-daemon --foreground --components=pkcs11…
│   │ │ ├─gcr-ssh-agent.service (#10963)
│   │ │ │ └─6035 /usr/libexec/gcr-ssh-agent /run/user/1000/gcr
│   │ │ ├─swayidle.service (#11444)
│   │ │ │ └─6199 /usr/bin/swayidle -w
│   │ │ ├─nm-applet.service (#11407)
│   │ │ │ └─6198 /usr/bin/nm-applet --indicator
│   │ │ ├─wcolortaillog.service (#11518)
│   │ │ │ ├─6226 foot colortaillog
│   │ │ │ ├─6228 /bin/sh /home/anarcat/bin/colortaillog
│   │ │ │ ├─6230 sudo journalctl -f
│   │ │ │ ├─6233 ccze -m ansi
│   │ │ │ ├─6235 sudo journalctl -f
│   │ │ │ └─6236 journalctl -f
│   │ │ ├─afuse.service (#10889)
│   │ │ │ └─6051 /usr/bin/afuse -o mount_template=sshfs -o transform_symlinks -…
│   │ │ ├─gpg-agent.service (#13547)
│   │ │ │ ├─51662 /usr/bin/gpg-agent --supervised
│   │ │ │ └─51719 scdaemon --multi-server
│   │ │ ├─emacs.service (#10926)
│   │ │ │ ├─ 6034 /usr/bin/emacs --fg-daemon
│   │ │ │ └─33203 /usr/bin/aspell -a -m -d en --encoding=utf-8
│   │ │ ├─xdg-desktop-portal-gtk.service (#12322)
│   │ │ │ └─9546 /usr/libexec/xdg-desktop-portal-gtk
│   │ │ ├─xdg-desktop-portal-wlr.service (#12359)
│   │ │ │ └─9555 /usr/libexec/xdg-desktop-portal-wlr
│   │ │ └─sway.service (#11037)
│   │ │   ├─6037 /usr/bin/sway
│   │ │   ├─6181 swaybar -b bar-0
│   │ │   ├─6209 py3status
│   │ │   ├─6309 /usr/bin/i3status -c /tmp/py3status_oy4ntfnq
│   │ │   └─6969 Xwayland :0 -rootless -terminate -core -listen 29 -listen 30 -…
│   │ └─init.scope (#10198)
│   │   ├─5909 /lib/systemd/systemd --user
│   │   └─5911 (sd-pam)
│   └─session-7.scope (#10440)
│     ├─5895 gdm-session-worker [pam/gdm-password]
│     ├─6028 /usr/libexec/gdm-wayland-session --register-session sway-user-serv…
[...]

I think that's pretty neat.

Environment propagation

At first, my terminals and rofi didn't have the right $PATH, which broke a lot of my workflow. It's hard to tell exactly how Wayland gets started or where to inject environment. This discussion suggests a few alternatives and this Debian bug report discusses this issue as well.

I eventually picked environment.d(5) since I already manage my user session with systemd, and it fixes a bunch of other problems. I used to have a .shenv that I had to manually source everywhere. The only problem with that approach is that it doesn't support conditionals, but that's something that's rarely needed.
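
For the record, the environment.d(5) equivalent of my old .shenv is just a set of KEY=VALUE assignments, e.g. in a file like ~/.config/environment.d/50-path.conf (the filename is arbitrary, files are read in lexical order; variable expansion rules are documented in the manual page):

PATH=${HOME}/bin:${PATH}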

Pipewire

This is a whole topic unto itself, but migrating to Wayland also involves using Pipewire if you want screen sharing to work. You can actually keep using Pulseaudio for audio, that said, but that migration is something I've wanted to do anyways: Pipewire's design seems much better than Pulseaudio's, as it folds in JACK features, which allows for pretty neat tricks. (Which I should probably show in a separate post, because this one is getting rather long.)

I first tried this migration in Debian bullseye, and it didn't work very well. Ardour would fail to export tracks and I would get into weird situations where streams would just drop mid-way.

A particularly funny incident is when I was in a meeting and I couldn't hear my colleagues speak anymore (but they could) and I went on blabbering on my own for a solid 5 minutes until I realized what was going on. By then, people had tried numerous ways of letting me know that something was off, including (apparently) coughing, saying "hello?", chat messages, IRC, and so on, until they just gave up and left.

I suspect that was also a Pipewire bug, but it could also have been that I muted the tab by mistake, as I recently learned that clicking on the tiny speaker icon on a tab mutes that tab. Since the tab itself can get pretty small when you have lots of them, it actually happens quite frequently that I mute tabs by mistake.

Anyways. Point is: I already knew how to make the migration, and I had already documented how to make the change in Puppet. It's basically:

apt install pipewire pipewire-audio-client-libraries pipewire-pulse wireplumber 

Then, as a regular user:

systemctl --user daemon-reload
systemctl --user --now disable pulseaudio.service pulseaudio.socket
systemctl --user --now enable pipewire pipewire-pulse
systemctl --user mask pulseaudio
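
Since pipewire-pulse impersonates Pulseaudio, the usual tools keep working and double as a sanity check that the switch actually happened:

pactl info | grep '^Server Name'
# expected: Server Name: PulseAudio (on PipeWire 0.3.x)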

An optional (but key, IMHO) configuration you should also make is to "switch on connect", which will make your Bluetooth or USB headset automatically be the default route for audio, when connected. In ~/.config/pipewire/pipewire-pulse.conf.d/autoconnect.conf:

context.exec = [
    { path = "pactl"        args = "load-module module-always-sink" }
    { path = "pactl"        args = "load-module module-switch-on-connect" }
    #{ path = "/usr/bin/sh"  args = "~/.config/pipewire/default.pw" }
]

See the excellent — as usual — Arch wiki page about Pipewire for that trick and more information about Pipewire. Note that you must not put the file in ~/.config/pipewire/pipewire.conf (or pipewire-pulse.conf, maybe) directly, as that will break your setup. If you want to add to that file, copy the template from /usr/share/pipewire/pipewire-pulse.conf first.

So far I'm happy with Pipewire in bookworm, but I've heard mixed reports about it. I have high hopes it will become the standard media server for Linux in the coming months or years, which is great because I've been (rather boldly, I admit) on the record saying I don't like PulseAudio.

Rereading this now, I feel it might have been a little unfair, as "over-engineered and tries to do too many things at once" applies probably even more to Pipewire than PulseAudio (since it also handles video dispatching).

That said, I think Pipewire took the right approach by implementing existing interfaces like Pulseaudio and JACK. That way we're not adding a third (or fourth?) way of doing audio in Linux; we're just making the server better.

Improvements over i3

Tiling improvements

There are a lot of improvements Sway could bring over plain i3. For example, there are pretty neat auto-tilers that could replicate the configurations I used to have in Xmonad or Awesome.

Display latency tweaks

TODO: You can tweak the display latency in wlroots compositors with the max_render_time parameter, possibly getting lower latency than X11 in the end.
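
If I get to it, the output-level knob is a one-liner in the Sway config (the 2ms value is a guess on my part; the default is off):

output * max_render_time 2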

Sound/brightness changes notifications

TODO: Avizo can display a pop-up to give feedback on volume and brightness changes. Not in Debian. Other alternatives include SwayOSD and sway-nc, also not in Debian.

Debugging tricks

xeyes (in the x11-apps package) will run in Wayland, and can actually be used to easily see whether a given window runs natively: if the "eyes" follow the cursor while it hovers over a window, that app is actually running in Xwayland, so not natively in Wayland.

Another way to see what is using Wayland in Sway is with the command:

swaymsg -t get_tree
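
To narrow that down to just the Xwayland holdouts, the tree can be filtered with jq, a sketch (Sway marks view nodes with a "shell" field):

swaymsg -t get_tree | jq -r '.. | select(.shell? == "xwayland") | .name'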

Other documentation

Conclusion

In general, this took me a long time, but it mostly works. The tray icon situation is pretty frustrating, but there's a workaround, and I have high hopes it will eventually fix itself. I'm also actually worried about DisplayLink support, because I eventually want to be using that, but hopefully it's another thing that will fix itself before I need it.

A word on the security model

I'm kind of worried about all the hacks that have been added to Wayland just to make things work. Pretty much everywhere we needed to, we punched a hole in the security model.

Wikipedia describes the security properties of Wayland as follows: it "isolates the input and output of every window, achieving confidentiality, integrity and availability for both." I'm not sure those are actually realized in the implementation, because of all those holes punched in the design, at least in Sway. For example, apparently the GNOME compositor doesn't have the virtual-keyboard protocol, but it does have (another?!) text input protocol.

Wayland does offer a better basis to implement such a system, however. It feels like the Linux applications security model lacks critical decision points in the UI, like the user approving "yes, this application can share my screen now". Applications themselves might have some of those prompts, but it's not mandatory, and that is worrisome.

...
@anarcat November 16, 2022 - 17:40 • 11 days ago
A ZFS migration

In my tubman setup, I started using ZFS on an old server I had lying around. The machine is really old though (2011!) and it "feels" pretty slow. I want to see how much of that is ZFS and how much is the machine. Synthetic benchmarks show that ZFS may be slower than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm that on a live workload: my workstation. Plus, I want easy, regular, high performance backups (with send/receive snapshots) and there's no way I'm going to use BTRFS because I find it too confusing and unreliable.

So off we go.

Installation

Since this is a conversion (and not a new install), our procedure is slightly different than the official documentation but otherwise it's pretty much in the same spirit: we're going to use ZFS for everything, including the root filesystem.

So, install the required packages, on the current system:

apt install --yes gdisk zfs-dkms zfs-initramfs zfsutils-linux

We also tell DKMS that we need to rebuild the initrd when upgrading:

echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Partitioning

This is going to partition /dev/sdc with:

  • 1MB MBR / BIOS legacy boot
  • 512MB EFI boot
  • 1GB bpool, unencrypted pool for /boot
  • rest of the disk for zpool, the rest of the data

     sgdisk --zap-all /dev/sdc
     sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
     sgdisk     -n2:1M:+512M   -t2:EF00 /dev/sdc
     sgdisk     -n3:0:+1G      -t3:BF01 /dev/sdc
     sgdisk     -n4:0:0        -t4:BF00 /dev/sdc
    

That will look something like this:

    root@curie:/home/anarcat# sgdisk -p /dev/sdc
    Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
    Model: ESD-S1C         
    Sector size (logical/physical): 512/512 bytes
    Disk identifier (GUID): [REDACTED]
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 1953525134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)

    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      1953525134   930.0 GiB   BF00

Unfortunately, we can't be sure of the sector size here, because the USB controller is probably lying to us about it. Normally, this smartctl command should tell us the sector size as well:

root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black Mobile
Device Model:     WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Above is the output for the builtin HDD. But the SSD enclosed in that USB controller doesn't support SMART commands, so we can't trust that it really has 512-byte sectors.

This matters because we need to set the ashift value correctly. We're going to assume the SSD drive has the common 4KB sector size, which means ashift=12.
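
Another way to double-check what the kernel believes the sector sizes to be (keeping in mind that the enclosure can lie here too):

lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdc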

Note here that we are not creating a separate partition for swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. But since this is "just" a workstation, we're betting that we will not suffer from this problem, after hearing a report from another Debian developer running this setup on their workstation successfully.

We do not recommend this setup though. In fact, if I were to redo this partition scheme, I would probably use LUKS encryption and setup a dedicated swap partition, as I had problems with ZFS encryption as well.

Creating pools

ZFS pools are somewhat like "volume groups" if you are familiar with LVM, except they obviously also do things like RAID-10. (Even though LVM can technically also do RAID, people typically use mdadm instead.)

In any case, the guide suggests creating two different pools here: one, in cleartext, for boot, and a separate, encrypted one, for the rest. Technically, the boot partition is required because the Grub bootloader only supports readonly ZFS pools, from what I understand. But I'm a little out of my depth here and just following the guide.

Boot pool creation

This creates the boot pool in readonly mode with features that grub supports:

    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off \
        -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool /dev/sdc3

I haven't investigated all those settings and just trust the upstream guide on the above.

Main pool creation

This is a more typical pool creation.

    zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool /dev/sdc4

Breaking this down:

  • -o ashift=12: mentioned above, 4k sector size
  • -O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
  • -O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
  • -O dnodesize=auto: related to extended attributes, less compatibility with other implementations
  • -O compression=zstd: enable zstd compression, can be disabled/enabled per dataset with zfs set compression=off rpool/example
  • -O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
  • -O canmount=off: do not make the pool mount automatically with mount -a?
  • -O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now

Those settings are all available in zfsprops(8). Other flags are defined in zpool-create(8). The reasoning behind them is also explained in the upstream guide and some also in [the Debian wiki][]. Those flags were actually not used:

  • -O normalization=formD: normalize file names on comparisons (not storage); implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout). And this cannot be changed after the filesystem is created. Bad, bad, bad.

[the Debian wiki]: https://wiki.debian.org/ZFS#Advanced_Topics

Side note about single-disk pools

Also note that we're living dangerously here: single-disk ZFS pools are rumoured to be more dangerous than not running ZFS at all. The choice quote from this article is:

[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.

Compared with (say) ext4, where a single disk error can often be recovered, this is pretty bad. But we are ready to live with it, with the idea that we'll have hourly offline snapshots that we can easily recover from. It's a trade-off. Also, we're running this on an NVMe/M.2 drive, which typically just blinks out of existence completely rather than "bit rotting" the way a HDD would.

Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.

Creating mount points

Next we create the actual filesystems, known as "datasets" which are the things that get mounted on mountpoint and hold the actual files.

  • this creates two containers, for ROOT and BOOT

     zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
     zfs create -o canmount=off -o mountpoint=none bpool/BOOT
    

    Note that it's unclear to me why those datasets are necessary, but they seem common practice, also used in this FreeBSD example. The OpenZFS guide mentions the Solaris upgrades and Ubuntu's zsys that use that container for upgrades and rollbacks. This blog post seems to explain a bit the layout behind the installer.

  • this creates the actual boot and root filesystems:

     zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
     zfs mount rpool/ROOT/debian &&
     zfs create -o mountpoint=/boot bpool/BOOT/debian
    

    I guess the debian name here is because we could technically have multiple operating systems with the same underlying datasets.

  • then the main datasets:

     zfs create                                 rpool/home &&
     zfs create -o mountpoint=/root             rpool/home/root &&
     chmod 700 /mnt/root &&
     zfs create                                 rpool/var
    
  • exclude temporary files from snapshots:

     zfs create -o com.sun:auto-snapshot=false  rpool/var/cache &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp &&
     chmod 1777 /mnt/var/tmp
    
  • and skip automatic snapshots in Docker:

     zfs create -o canmount=off                 rpool/var/lib &&
     zfs create -o com.sun:auto-snapshot=false  rpool/var/lib/docker
    

    Notice here a peculiarity: we must create rpool/var/lib in order to create rpool/var/lib/docker, otherwise we get this error:

     cannot create 'rpool/var/lib/docker': parent does not exist
    

    ... and no, just creating /mnt/var/lib doesn't fix that problem. In fact, it makes things even more confusing because an existing directory shadows a mountpoint, which is the opposite of how things normally work.

    Also note that you will probably need to change the storage driver in Docker; see the zfs-driver documentation for details, but, basically, I did:

    echo '{ "storage-driver": "zfs" }' > /etc/docker/daemon.json
    

    Note that podman has the same problem (and similar solution):

    printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
    
  • make a tmpfs for /run:

     mkdir /mnt/run &&
     mount -t tmpfs tmpfs /mnt/run &&
     mkdir /mnt/run/lock
    

We don't create a /srv, as that's the HDD stuff.
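Given the mountpoint-shadowing surprise above, it's worth sanity-checking the resulting layout before copying anything over, with a plain zfs list:

zfs list -o name,canmount,mountpoint -r rpool bpool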

Also mount the EFI partition:

mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/

At this point, everything should be mounted in /mnt. It should look like this:

root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem            Size  Used Avail Use% Mounted on
rpool/ROOT/debian     899G  384K  899G   1% /mnt
bpool/BOOT/debian     832M  123M  709M  15% /mnt/boot
rpool/home            899G  256K  899G   1% /mnt/home
rpool/home/root       899G  256K  899G   1% /mnt/root
rpool/var             899G  384K  899G   1% /mnt/var
rpool/var/cache       899G  256K  899G   1% /mnt/var/cache
rpool/var/tmp         899G  256K  899G   1% /mnt/var/tmp
rpool/var/lib/docker  899G  256K  899G   1% /mnt/var/lib/docker
/dev/sdc2             511M  4.0K  511M   1% /mnt/boot/efi

Now that we have everything setup and mounted, let's copy all files over.

Copying files

This copies over each mounted filesystem in the list:

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

You can check that the list is correct with:

mount -l -t ext4,btrfs,vfat | awk '{print $3}'

Note that we skip /srv as it's on a different disk.

On the first run, we had:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
     16,831,437 100%  184.14MB/s    0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
 28,019,293,280  94%   47.63MB/s    0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,267,990  98%   50.71MB/s    0:10:40 (xfr#736577, to-chk=0/867732)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
 24,456,268,098  98%   68.03MB/s    0:05:42 (xfr#159867, ir-chk=6875/172377) 
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125  93%   74.82MB/s    0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978  96%   76.82MB/s    1:05:15 (xfr#2256473, to-chk=0/2993950)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Note the failure to transfer that supysonic file? It turns out they had a weird filename in their source tree, since then removed, but still it showed how the utf8only feature might not be such a bad idea. At this point, the procedure was restarted all the way back to "Creating pools", after unmounting all ZFS filesystems (umount /mnt/run /mnt/boot/efi && umount -t zfs -a) and destroying the pool, which, surprisingly, doesn't require any confirmation (zpool destroy rpool).
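Spelled out, that reset sequence was roughly the following (the text above only mentions destroying rpool, but bpool presumably needs the same treatment before re-creating both pools):

umount /mnt/run /mnt/boot/efi &&
umount -t zfs -a &&
zpool destroy rpool   # no confirmation asked!
zpool destroy bpool   # presumably needed as well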

The second run was cleaner:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/110)  
syncing / to /mnt/...
 28,019,033,070  97%   42.03MB/s    0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,807,102  98%   44.84MB/s    0:12:04 (xfr#736580, to-chk=0/867723)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
 24,043,086,450  96%   62.03MB/s    0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507  96%   67.09MB/s    1:14:43 (xfr#2256845, to-chk=0/2994364)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Also note the transfer speed: we seem capped at 76MB/s, or 608Mbit/s. This is not as fast as I was expecting: the USB connection seems to be at around 5Gbps:

anarcat@curie:~$ lsusb -tv | head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.

So it shouldn't cap at that speed. It's possible the USB adapter is failing to give me the full speed, though. It's not the M.2 SSD drive either, as that has a ~500MB/s bandwidth, according to its spec.
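One way to rule the USB path in or out would be a raw sequential read straight off the device, bypassing the filesystem and page cache; something like:

dd if=/dev/sdc of=/dev/null bs=4M count=1024 iflag=direct status=progress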

At this point, we're about ready to do the final configuration. We drop to single user mode and do the rest of the procedure. That used to be shutdown now, but it seems like the systemd switch broke that, so now you can reboot into grub and pick the "recovery" option. Alternatively, you might try systemctl rescue, as I found out.

I also wanted to copy the drive over to another new NVMe drive, but that failed: it looks like the USB controller I have doesn't work with older, non-NVMe drives.

Boot configuration

Now we need to enter the new system to rebuild the boot loader and initrd and so on.

First, we bind-mount the pseudo-filesystems and chroot into the ZFS disk:

mount --rbind /dev  /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys  /mnt/sys &&
chroot /mnt /bin/bash

Next we add an extra service that imports the bpool on boot, to make sure it survives a zpool.cache destruction:

cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache

[Install]
WantedBy=zfs-import.target
EOF

Enable the service:

systemctl enable zfs-import-bpool.service

I had to trim down /etc/fstab and /etc/crypttab to only contain references to the legacy filesystems (/srv is still BTRFS!).

If we don't already have a tmpfs defined in /etc/fstab:

ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount

Rebuild the boot loader with support for ZFS, and also work around GRUB's missing zpool-features support:

grub-probe /boot | grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub

For good measure, make sure the right disk is configured here; for example, you might want to tag both drives in a RAID array:

dpkg-reconfigure grub-pc

Install grub to EFI while you're there:

grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy

Next, filesystem mount ordering: the rationale here in the OpenZFS guide is a little strange, but I don't dare ignore it.

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &

Verify that zed updated the cache by making sure these are not empty:

cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool

Once the files have data, stop zed:

fg

Then press Ctrl-C.

Fix the paths to eliminate /mnt:

sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*

Snapshot initial install:

zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install

Exit chroot:

exit

Finalizing

One last sync was done in rescue mode:

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

Then we unmount all filesystems:

mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a

Reboot, swap the drives, and boot in ZFS. Hurray!

Benchmarks

This is a test that was run in single-user mode using fio and the Ars Technica recommended tests, which are:

  • Single 4KiB random write process:

     fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    
  • 16 parallel 64KiB random write processes:

     fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
    
  • Single 1MiB random write process:

     fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
    

Strangely, that's not exactly what the author, Jim Salter, did in the actual test bench used in his ZFS benchmarking article. The first oddity is that there's no read test at all, which is already pretty strange. It also doesn't include things like dropping caches or repeating runs.

So here's my variation, which I called fio-ars-bench.sh for now. It just batches a bunch of fio tests, one by one, 60 seconds each. It should take about 12 minutes to run, as there are 12 tests in total: three job shapes, each read and write, each with and without fsync.
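I won't reproduce the script here, but a minimal sketch of what such a wrapper could look like follows; the naming scheme and exact flag choices are mine, not necessarily what fio-ars-bench.sh actually does:

#!/bin/sh
# run the three Ars-style job shapes, read and write, async and sync,
# one at a time: 12 runs of 60 seconds each
set -e
for rw in randread randwrite; do
    for fsync in 0 1; do
        for shape in "4k 4g 1" "64k 256m 16" "1m 16g 1"; do
            set -- $shape  # $1=block size, $2=file size, $3=jobs and iodepth
            fio --name="$rw-$1-$2-${3}x-fsync$fsync" --ioengine=posixaio \
                --rw="$rw" --bs="$1" --size="$2" --numjobs="$3" --iodepth="$3" \
                --fsync="$fsync" --runtime=60 --time_based --end_fsync=1
        done
    done
done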

My bias, before building, running and analysing those results is that ZFS should outperform the traditional stack on writes, but possibly not on reads. It's also possible it outperforms it on both, because it's a newer drive. A new test might be possible with a new external USB drive as well, although I doubt I will find the time to do this.

Results

All tests were done on WD Blue SN550 drives, which claim to be able to push 2400MB/s read and 1750MB/s write. An extra drive was bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD Blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s read, 530MB/s write. Benchmarks were also run on that older SSD but discarded, so that the drive difference is not a factor in the test.

In practice, I'm going to assume we'll never reach those numbers because we're not actually NVMe (this is an old workstation!) so the bottleneck isn't the disk itself. For our purposes, it might still give us useful results.

Rescue test, LUKS/LVM/ext4

Those tests were performed with everything shut down, after either entering the system in rescue mode, or by reaching that target with:

systemctl rescue

The network might have been started before or after the test as well:

systemctl start systemd-networkd

So it should be fairly reliable as basically nothing else is running.

Raw numbers, from the job-curie-lvm.log file, converted to MiB/s and manually merged:

test                     read (MiB/s)  read IOPS  write (MiB/s)  write IOPS
rand4k4g1x                      39.27      10052         212.15       54310
rand4k4g1x--fsync=1             39.29      10057           2.73         699
rand64k256m16x                1297.00      20751        1068.57       17097
rand64k256m16x--fsync=1       1290.90      20654         353.82        5661
rand1m16g1x                    315.15        315         563.77         563
rand1m16g1x--fsync=1           345.88        345         157.01         157

Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the 64KB blocks with 16 jobs.

Slowest is the random 4k block sync write, at an abysmal 3MB/s and 700 IOPS. The 1MB read/write tests have lower IOPS, but that is expected.

Rescue test, ZFS

This test was also performed in rescue mode.

Raw numbers, from the job-curie-zfs.log file, converted to MiB/s and manually merged:

test                     read (MiB/s)  read IOPS  write (MiB/s)  write IOPS
rand4k4g1x                      77.20      19763          27.13        6944
rand4k4g1x--fsync=1             76.16      19495           6.53        1673
rand64k256m16x                1882.40      30118          70.58        1129
rand64k256m16x--fsync=1       1865.13      29842          71.98        1151
rand1m16g1x                    921.62        921         102.21         102
rand1m16g1x--fsync=1           908.37        908          64.30          64

Peaks are at 1.8GiB/s read, also in the 64k job like above, but much faster. Writes are, as expected, much slower at ~70MiB/s (compared to 1GiB/s!), but it should be noted that sync writes don't degrade performance compared to async writes (although they are still below LVM's ~350MiB/s).

Conclusions

Really, ZFS has trouble performing in all write conditions. The random 4k sync write test is the only place where ZFS outperforms LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes are much slower, sometimes by an order of magnitude.

And before some ZFS zealot jumps in talking about the SLOG or some other cache that could be added to improve performance, I'll remind you that those numbers are on a bare-bones NVMe drive, pretty much the fastest storage you can find on this machine. Adding another NVMe drive as a cache will probably not improve write performance here.

Still, those are very different results from the tests performed by Salter, which show ZFS beating traditional configurations in all categories but uncached 4k reads (not writes!). That said, those tests are quite unlike what I performed here, where I test writes on a single disk, not a RAID array, which might explain the discrepancy.

Also, note that neither LVM nor ZFS manages to reach the 2400MB/s read and 1750MB/s write performance specification. ZFS does manage to reach 82% of the read performance (1973MB/s) and LVM 64% of the write performance (1120MB/s). LVM hits 57% of the read performance, and ZFS hits barely 6% of the write performance.

Overall, I'm a bit disappointed in the ZFS write performance here, I must say. Maybe I need to tweak the record size or some other ZFS voodoo, but I'll note that the other side needed no such configuration to kick ZFS in the pants...

Real world experience

This section documents not synthetic benchmarks, but actual real-world workloads, comparing before and after I switched my workstation to ZFS.

Docker performance

I had the feeling that running some git hook (which was firing a Docker container) was "slower" somehow. It seems that, at runtime, ZFS backends are significantly slower than their overlayfs/ext4 equivalents:

May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.

Translating this:

  • container setup: ~1 second
  • container runtime: ~1 second
  • container teardown: ~1 second
  • total runtime: 2-3 seconds

Obviously, those timestamps are not quite accurate enough to make precise measurements...

After I switched to ZFS:

mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080 
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142. 
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded. 
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" 
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.

That's double or triple the run time, from 2 seconds to 6 seconds. Most of the time is spent in run time, inside the container. Here's the breakdown:

  • container setup: ~2 seconds
  • container run: ~4 seconds
  • container teardown: ~1 second
  • total run time: about ~6-7 seconds

That's a two- to three-fold increase! Clearly something is going on here that I should tweak. It's possible that code path is less optimized in Docker. I also worry about podman, but apparently it also supports ZFS backends. Possibly it would perform better, but at this stage I wouldn't have a good comparison: maybe it would have performed better on non-ZFS as well...

Interactivity

While doing the offsite backups (below), the system became somewhat "sluggish". I felt everything was slow, and I estimate it introduced ~50ms latency in any input device.

Arguably, those input devices are all USB, and the external drive was connected through USB as well, but I suspect the ZFS drivers are not as well tuned with the scheduler as the regular filesystem drivers...

Recovery procedures

For test purposes, I unmounted all the filesystems during the procedure:

umount /mnt/boot/efi /mnt/run
umount -a -t zfs
zpool export -a

And disconnected the drive, to see how I would recover this system from another Linux system in case of a total motherboard failure.

To recover, plug the device back in, then import the pool with an alternate root so that it doesn't mount over your existing filesystems; then mount the root filesystem and all the others:

zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
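If you don't remember the pool names on the drive, zpool import without any argument scans the attached devices and lists importable pools without actually importing anything:

zpool import                    # scan and list candidate pools
zpool import -l -R /mnt rpool   # then import just the one you want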

Offsite backup

Part of the goal of using ZFS is to simplify and harden backups. I wanted to experiment with shorter recovery times (specifically both the recovery point objective and the recovery time objective) and faster incremental backups.

This is, therefore, part of my backup services.

This section documents how an external NVMe enclosure was set up in a pool to mirror the datasets from my workstation.

The final setup should include syncoid copying datasets to the backup server regularly, but I haven't finished that configuration yet.
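Presumably, that would look much like the local invocations shown below, just with an SSH target; a hypothetical sketch (host and target dataset names invented):

syncoid -r rpool root@backup.example.com:backup/curie/rpool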

Partitioning

The partitioning procedure above used sgdisk, but I couldn't figure out how to copy a partition table over with it, so this uses sfdisk to dump the partition table from the first disk to an external, identical drive:

sfdisk -d /dev/nvme0n1 | sfdisk --no-reread /dev/sda --force
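Note that this clones the GPT verbatim, duplicate GUIDs included; if both disks might ever be attached at the same time, it's probably wise to randomize those on the copy, which sgdisk can do:

sgdisk -G /dev/sda   # randomize the disk and partition GUIDs on the clone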

Pool creation

This is similar to the main pool creation, except we tweaked a few bits after changing the upstream procedure:

zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 \
        -O devices=off \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/boot -R /mnt \
        bpool-tubman /dev/sdb3

The changes from the main boot pool are essentially the pool name and the target device. The main pool creation is:

zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool-tubman /dev/sdb4

First sync

I used syncoid to copy all pools over to the external device. syncoid is part of the sanoid project, and is specifically designed to sync snapshots between pools, typically over SSH links, but it can also operate locally.

The sanoid command had a --readonly argument to simulate changes, but syncoid didn't, so I tried to fix that with an upstream PR.

It seems it would be better to do this by hand, but this was much easier. The full first sync was:

root@curie:/home/anarcat# ./bin/syncoid -r  bpool bpool-tubman

CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.

          NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
                `zfs create bpool-tubman` on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.

INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%            
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================>                                                         ] 53%            
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
 126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
 113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%

root@curie:/home/anarcat# ./bin/syncoid -r  rpool rpool-tubman

CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.

          NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
                `zfs create rpool-tubman` on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.

INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================>  ] 99%            
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
 277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
 627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
 443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%            
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================>                                                                            ] 38%            
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%            
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
 219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%

Funny how the CRITICAL ERROR doesn't actually stop syncoid: it just carries on merrily, even while telling you it's "cowardly refusing to destroy your existing target"... Maybe that's because my pull request broke something, though...

During the transfer, the computer was very sluggish: everything felt like it had an extra ~30-50ms of latency:

anarcat@curie:sanoid$ LANG=C top -b  -n 1 | head -20
top - 13:07:05 up 6 days,  4:01,  1 user,  load average: 16.13, 16.55, 11.83
Tasks: 606 total,   6 running, 598 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us, 72.5 sy,  1.2 ni,  5.0 id,  1.2 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  15898.4 total,   1387.6 free,  13170.0 used,   1340.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1319.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     70 root      20   0       0      0      0 S  83.3   0.0   6:12.67 kswapd0
4024878 root      20   0  282644  96432  10288 S  44.4   0.6   0:11.43 puppet
3896136 root      20   0   35328  16528     48 S  22.2   0.1   2:08.04 mbuffer
3896135 root      20   0   10328    776    168 R  16.7   0.0   1:22.93 zfs
3896138 root      20   0   10588    788    156 R  16.7   0.0   1:49.30 zfs
    350 root       0 -20       0      0      0 R  11.1   0.0   1:03.53 z_rd_int
    351 root       0 -20       0      0      0 S  11.1   0.0   1:04.15 z_rd_int
3896137 root      20   0    4384    352    244 R  11.1   0.0   0:44.73 pv
4034094 anarcat   30  10   20028  13960   2428 S  11.1   0.1   0:00.70 mbsync
4036539 anarcat   20   0    9604   3464   2408 R  11.1   0.0   0:00.04 top
    352 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    353 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    354 root       0 -20       0      0      0 S   5.6   0.0   1:04.01 z_rd_int

I wonder how much of that is due to syncoid, particularly because I often saw mbuffer and pv in there, which are not strictly necessary for those kinds of operations, as far as I understand.

Once that's done, export the pools to disconnect the drive:

zpool export bpool-tubman
zpool export rpool-tubman

Raw disk benchmark

Copied the 512GB SSD/M.2 device to another 1024GB NVMe/M.2 device:

anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 bytes (500 GB, 466 GiB) copied, 1713 s, 292 MB/s
119235+1 records in
119235+1 records out
500107862016 bytes (500 GB, 466 GiB) copied, 1719.93 s, 291 MB/s

... with both drives over USB: whoohoo, 300MB/s!

Monitoring

Your ZFS pools should be monitored regularly. Normally, the zed(8) daemon monitors all ZFS events; it is the thing that will report when a scrub fails, for example. See this configuration guide.

Scrubs should be regularly scheduled to ensure consistency of the pool. This can be done in newer zfsutils-linux versions (bullseye-backports or bookworm) with one of those, depending on the desired frequency:

systemctl enable zfs-scrub-weekly@rpool.timer --now

systemctl enable zfs-scrub-monthly@rpool.timer --now
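On older zfsutils-linux versions that lack those timer units, a plain cron entry can achieve the same; a sketch, with an arbitrary schedule (first Sunday of the month):

# /etc/cron.d/zfs-scrub
24 2 1-7 * * root [ "$(date +\%u)" = 7 ] && /sbin/zpool scrub rpool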

When the scrub runs, if it finds anything, it will send an event which will get picked up by the zed daemon, which will then send a notification; see below for an example.

  • TODO: deploy on curie, if possible (probably not, because there is no RAID)
  • TODO: this should be in Puppet

Scrub warning example

So what happens when problems are found? Here's an example of how I dealt with an error I received.

After setting up another server (tubman) with ZFS, I eventually ended up getting a warning from the ZFS toolchain.

Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman

ZFS has finished a scrub:

   eid: 39536
 class: scrub_finish
  host: tubman
  time: 2022-10-09 00:58:07-0400
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct  9 00:58:07 2022
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb4    ONLINE       0     1     0
            sdc4    ONLINE       0     0     0
        cache
          sda3      ONLINE       0     0     0

errors: No known data errors

This, in itself, is a little worrisome. But it helpfully links to this more detailed documentation (and props for that: the link still works), which explains this is a "minor" problem (something that could have been included in the report itself).

In this case, this happened on a server set up on 2021-04-28, but the disks and server hardware are much older. The server itself (marcos v1) was built around 2011, over 10 years ago now. The hard drive in question is:

root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Some more SMART stats:

root@tubman:~# smartctl -a -qnoserial /dev/sdb | grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12464 (206 202 0)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10966h+55m+23.757s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21107792664
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3201579750

That's over a year of power-on time, which shouldn't be so bad. It has written about 10TB of data (21107792664 LBAs * 512 bytes/LBA), which is between two and three full writes. According to its specification, this device is supposed to support 55 TB/year of writes, so we're far below spec. Note that we are also still far from the "non-recoverable read error per bits" spec (1 per 10E15), as we've basically read 13E12 bits (3201579750 LBAs * 512 bytes/LBA * 8 bits/byte = 13E12 bits).

It's likely this disk was made in 2018, so it is in its fourth year.

Interestingly, /dev/sdc is also a Seagate drive, but of a different series:

root@tubman:~# smartctl -qnoserial  -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

It has seen many more reads than the other disk, which is also interesting:

root@tubman:~# smartctl -a -qnoserial /dev/sdc | grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       36240
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       33994h+10m+52.118s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       30730174438
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       51894566538

That's almost 4 years of Head_Flying_Hours, and over 4 years (4 years and 48 days) of Power_On_Hours. The copyright date on that drive's specs goes back to 2016, so it's a much older drive.

SMART self-test succeeded.

Remaining issues

  • TODO: move send/receive backups to offsite host, see also zfs for alternatives to syncoid/sanoid there
  • TODO: setup backup cron job (or timer?)
  • TODO: swap still not setup on curie, see zfs
  • TODO: document this somewhere: bpool and rpool are both pools and datasets. that's pretty confusing, but also very useful because it allows for pool-wide recursive snapshots, which are used for the backup system

fio improvements

I really want to improve my experience with fio. Right now, I'm just cargo-culting stuff from other folks and I don't really like it. stressant is a good example of my struggles, in the sense that it doesn't really work that well for disk tests.

I would love to have just a single .fio job file that lists multiple jobs to run serially. For example, this file describes the above workload pretty well:

[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1

# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1

[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1

# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16

[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16

# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1

[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1

... except the jobs are actually started in parallel, even though they are stonewall'd, as far as I can tell by the reports. I sent a mail to the fio mailing list for clarification.

It looks like the jobs are started in parallel, but actually (and correctly) run serially. It seems like this might just be a matter of reporting the right timestamps in the end, although it does feel like starting all the processes (even if they're not doing any work yet) could skew the results.

Hangs during procedure

During the procedure, it happened a few times where any ZFS command would completely hang. It seems that using an external USB drive to sync stuff didn't work so well: sometimes it would reconnect under a different device (from sdc to sdd, for example), and this would greatly confuse ZFS.

Here, for example, is sdd reappearing out of the blue:

May 19 11:22:53 curie kernel: [  699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [  699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [  699.922433] scsi 4:0:0:0: Direct-Access     ROG      ESD-S1C          0    PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [  699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [  699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [  699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [  699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [  699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [  699.961602]  sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [  699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk

The next time I ran a ZFS command (say zpool list), the command completely hung (D state) and this came up in the logs:

May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648] 
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync        state:D stack:    0 pid:  997 ppid:     2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670]  schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675]  ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678]  io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689]  __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702]  __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816]  zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929]  dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032]  spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354]  ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366]  thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376]  ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379]  kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382]  ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386]  ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412]  ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420]  schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424]  __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435]  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537]  spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644]  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool           state:D stack:    0 pid:11815 ppid:  3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995]  io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004]  cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118]  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223]  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325]  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430]  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

Here's another example, where you see the USB controller bleeping out and back into existence:

mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel:  __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel:  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel:  spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel:  zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool           state:D stack:    0 pid:11815 ppid:  2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel:  cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel:  ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel:  txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel:  txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel:  spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

I understand those are rather extreme conditions: I would fully expect the pool to stop working if the underlying drives disappear. What doesn't seem acceptable is that a command would completely hang like this.

References

See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

...
@kushal November 16, 2022 - 12:53 • 11 days ago
Johnnycanencrypt 0.11.0 released

A couple of days ago I released Johnnycanencrypt v0.11.0. It is a Python module for OpenPGP written in Rust.

The most interesting update is for Linux users: we now have pre-built wheels for Python 3.8, 3.9, 3.10 & 3.11. You can just install the module via python3 -m pip install johnnycanencrypt. You can also do the same on Intel Macs (for Python 3.10 and 3.11). But I failed to build for Apple Silicon systems; I will work on it in the coming weeks.

To find the Yubikey card version, we can call the get_card_version function, written in Rust.

>>> rjce.get_card_version()
(4, 3, 1)

Or directly use the function get_card_touch_policies to get a list of policies for the current Yubikey.

>>> import johnnycanencrypt as jce
>>> jce.get_card_touch_policies()
[TouchMode.On, TouchMode.Off, TouchMode.Fixed]

We can now also set the touch policy for a given Yubikey. For example:

rjce.set_keyslot_touch_policy(
    b"12345678", rjce.KeySlot.Authentication, rjce.TouchMode.On
)

It will be nice to see what people build on top of this. I also missed blogging about the v0.10.0 release; you should read that changelog too.

...
@kushal November 15, 2022 - 16:47 • 12 days ago
Tor 0.4.7.11 is ready

Today I built and pushed the Tor RPM(s) for 0.4.7.11. Please make sure that you upgrade your relays and bridges. I have the updated package for Fedora 37 too (which was also released today).

You can learn more about Tor's RPM repository at https://support.torproject.org/rpm/.

If you have any queries, feel free to find us in the #tor or #tor-relay channels on OFTC.

...
@anarcat November 8, 2022 - 16:48 • 19 days ago
Using the bell as modern notification

Computer terminals have traditionally had an actual bell that would ring when a certain control character (the bell character, typically control-g, or \a in a C escape sequence) would come in the input stream.

That feature actually predates computers altogether, and was present in Baudot code, "an early character encoding for telegraphy invented by Émile Baudot in the 1870s", itself superseding Morse code.

Modern terminal emulators have, of course, kept that feature: if you run this command in a terminal right now:

printf '\a'

... you may hear some annoying beep. Or not. It actually depends on a lot of factors, including which terminal emulator you're using, how it's configured, whether you have headphones on, or speakers connected, or, if you're really old school, a PC speaker, even.
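
The bell character can, of course, be emitted from any language that can write to the terminal. As a minimal illustration (my own sketch, not something from the tools discussed here), the Python equivalent of that printf call:

import sys

# emit the BEL control character (ASCII 7, "\a"); the terminal
# emulator decides whether to beep, flash, or set an urgency hint
sys.stdout.write("\a")
sys.stdout.flush()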

I have this theory that it typically does the exact opposite of what you want, regardless of how you have configured it. That is, if you want it to make noises, it won't, and if you want it to stay silent, it will make brutal, annoying noises in moments you would least expect. I suspect this is a law in computer science, but I'm too lazy to come up with a formal definition.

Yet something can be done with this.

Making the bell useful and silent

Many terminal emulators have this feature where they can silence the bell somehow. It can be turned into a "visual bell" which basically flashes the screen when a bell arrives. Or that can also be disabled and the bell is just completely ignored.

What I did instead is turn the bell into an "urgency hint" (part of the ICCCM standard). In xterm, this is done with this X resource entry (typically in ~/.Xresources):

XTerm*bellIsUrgent:  true
XTerm*visualBell: false

Interestingly, this doesn't clearly say "bell is muted", but that's effectively what it does. Or maybe it works because I have muted "System Sounds" in Pulseaudio. Who knows. I do have this in my startup scripts though:

xset b off

... which, according to the xset(1) manpage, means

If the dash or 'off' are given, the bell will be turned off.

Interestingly, you have the option of setting the bell "volume", "pitch, in hertz, and [...] duration in milliseconds. Note that not all hardware can vary the bell characteristics." In any case, I think that's the magic trick to turn the darn thing off.

Now this should send urgency hints to your window manager:

sleep 3 ; printf '\a'

Try it: run the command, switch to another desktop, then wait 3 seconds. You should see the previous desktop show up in red or something.

In the i3 window manager I am currently using, this is the default, although I did set the colors (client.urgent and urgent_workspace in bar.colors).

Other window managers or desktop environments may require different configurations.

Sending a bell...

Now that, on its own, will only be useful when something actually sends a bell. One place I found a trick like this, long ago, is this post (dead link, archived), which has various instructions for different tools. I'll recopy some of them here since the original site is dead, but credit goes to the Netbuz blog.

(Note that the blog post also features an Awesome WM configuration for urgency hints.)

Mutt

set beep=yes
set beep_new=yes

Irssi

/set beep_when_window_active ON
/set beep_when_away ON
/set beep_msg_level MSGS DCC DCCMSGS HILIGHT

It was also recommending this setting, but it appears to be deprecated and gives a warning in modern irssi versions:

/set bell_beeps ON

GNU Screen

This is an important piece of the puzzle, because by default, terminal multiplexers have their own opinion of what to do with the bell as well:

# disabled: we want to propagate bell to clients, which should handle
# it in their own terminal settings. this `vbell off` is also the
# upstream and tmux's default
#
# see also: http://netbuz.org/blog/2011/11/x-bells-and-urgency-hints/
vbell off
# propagate bell from other windows up to the terminal emulator as well
bell_msg 'Bell in window %n^G'

The bell_msg bit is an extra from me: it uses the bell message that pops up when screen detects a bell in another window to resend the bell control character up to the running terminal.

This makes it so a bell in any multiplexed window will also propagate to the parent terminal, which is not the default.

Tmux

Untested, but that is apparently how you do it:

# listen to alerts from all windows
set -g bell-action any
# notice bell in windows
set -g monitor-bell on
# only propagate bell, don't warn user, as it hangs tmux for a second
set -g visual-bell off
# send bell *and* notify when activity (if monitor-activity)
set -g visual-activity both

Note that this config goes beyond what we have in GNU screen, in that inactivity or activity will trigger a bell as well. This might be useful for cases where you don't have the prompt hack (below), but it could also be very noisy. It will only generate noise when monitor-activity is enabled, though.

bash and shell prompts

Now the icing on the cake is to actually send the bell when a command completes. This is what I use this for the most, actually.

I was previously using undistract-me for this, actually. That was nice: it would send me a nice desktop notification when a command was running more than a certain amount of time, and if the window was unfocused. But configuring this was a nightmare: because it uses a complex PROMPT_COMMAND in bash, it would conflict with my (already existing and way too) complex bash prompt, and lead to odd behaviors. It would also not work for remote commands, of course, as it wouldn't have access to my local D-BUS to send notifications (thankfully!).

So instead, what I do now is systematically print a bell whenever a command terminates, in all my shells. I have this in my /root/.bashrc on all my servers, deployed in Puppet:

PROMPT_COMMAND='printf "\a"'

Or you can just put it directly in the shell prompt, with something like:

PS1='\[\a\]'"$PS1"

(I have the equivalent in my own .bashrc, although that thing is much more complex, featuring multi-command pipeline exit status, colors, terminal title setting, and more, which should probably warrant its own blog post.)

This sounds a little bonkers and really noisy, but remember that I turned off the audible bell. And urgency hints are going to show up only if the window is unfocused. So it's actually really nice and not really distracting.

Or, to reuse the undistract-me concept: it allows me to not lose focus too much when I'm waiting for a long process to complete.

That idea actually came from ahf, so kudos to him on that nice hack.

Caveats

This is not set up on all the machines I administer for work, that said. I'm afraid that would be too disruptive for people who do not have that configuration. This implies that, most of the time, I don't get notifications for commands that run for a long time on remote servers. I can, however, simply run a command with a trailing printf '\a' to get a notification.
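
That trailing printf could even be wrapped in a tiny helper. As a sketch of the idea, assuming a hypothetical beep.py helper script (in Python, so it's not tied to any particular shell):

import subprocess
import sys

if len(sys.argv) < 2:
    sys.exit("usage: beep.py command [arguments...]")

# run whatever command was passed on our own command line,
# e.g.: beep.py apt full-upgrade
result = subprocess.run(sys.argv[1:])

# ring the terminal bell when the command completes, which the
# terminal emulator can turn into an urgency hint as described above
sys.stdout.write("\a")
sys.stdout.flush()

sys.exit(result.returncode)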

This might not work in Wayland, your window manager, your desktop environment, your Linux console, or your telegraphy session.

...
@blog November 4, 2022 - 00:00 • 23 days ago
You’re Invited: State of the Onion 2022

It is time for another State of the Onion! This is our annual virtual event where we share updates from the Tor Project's different teams, highlighting their work during the year and what we are excited about for the upcoming year. Part of the stream is also dedicated to our community, so they can present their amazing work. You can check out our previous State of the Onion streams (2020 and 2021) on our YouTube channel.

This year we will be doing things a bit differently than we have in the past. Instead of having all the presentations happening on the same day, we are organizing two streams on two different days, one for Tor Project’s teams and another for our community. So please make sure to save the date for both events!

The State of the Onion will be streamed over our YouTube account.

Join the conversation on social media using the hashtag: #StateOfTheOnion2022 or post your questions and comments in the YouTube chat.

Nov 9 Program - State of the Onion: The Tor Project’s teams.

  • Opening (Isabela, Executive Director): Host presenting opening remarks.
  • Community Team (Gus, Community Team Lead): User support, translations, digital security trainings, and relay operator organizing are just some of the things that our Community Team will be sharing.
  • Censorship Circumvention (Meskio, Anti-Censorship Team Lead): Advances in anti-censorship at Tor: how we are reacting to the attempts to block Tor, and what we are doing to be prepared for the next ones that will come.
  • Onions Everywhere (Raya, Education and Communities Coordinator): As part of the Community Team, we started an initiative to provide tools and support for organizations adopting onion services. We will talk about this work and the tools we have created, as well as our plans to continue it in 2023.
  • UX & Design Team (Duncan, UX & Design Team Lead): A recap of recent work from the team in 2022, including our ethical user research program, product updates and design highlights, plus some shiny things we have up our sleeve for next year.
  • Applications & Tor Browser (Richard, Applications Team Lead): The Tor Browser 12.0 release and looking to 12.5.
  • VPN Project (Micah, Director of Engineering): Meet our new Director of Engineering! We will share updates on the progress of our VPN project.
  • Break (Isabela): Why is supporting Tor important?
  • Network Health (Georg, Network Health Lead): Highlights of the year and upcoming projects.
  • Tor network, little "t" tor (Alex, Network Team Lead): Highlights of the year and upcoming projects.
  • Arti (Nick M., Network Team and Tor Project co-founder): Updates on Arti, a new Tor client implementation in Rust.
  • Metrics (Hiro, Metrics Lead): Highlights of the year and upcoming projects.
  • Shadow (Jim Newsome, Shadow Simulation Developer): Intro to Shadow and the tech advancements we've made, the "push button simulation" setup, and how we're using it to evaluate changes to tor.
  • TPA (Anarcat, Tor Project Admins Team Lead): A walk through sysadmin magic land: who we are, what we do, what we did last year, and what's coming.
  • Year End Fundraising (Al, Fundraising Director): The Tor Project is a 501(c)(3) nonprofit--let's talk about how you can power privacy online with your support!
  • Closing (Isabela): Closing remarks.

Nov 16 Program - State of the Onion for the Tor community.

  • Opening (Gaba, Tor Project's Project Manager): Host presenting opening remarks.
  • OONI - Open Observatory of Network Interference (Maria Xynou): OONI will share highlights from 2022, as well as upcoming plans for 2023.
  • Guardian Project (Nathan Freitas and Fabiola Maurice): We will share updates on all the different apps and mobile work we do: Orbot, Onion Browser, OnionShare mobile and mobile developer support, integration, SDKs. We will also share about our work with the Latin American community.
  • Calyx Institute (Nick Merrill): Calyx Institute will present on the progress of OnionShare for Android and the work on integrating Tor into CalyxOS.
  • Superbloom (Susan & Ngoc): Simply Secure is rebranding as Superbloom. Superbloom leverages design as a transformative practice to shift power in the tech ecosystem, because everyone deserves technology they can trust.
  • Ricochet Refresh and Gosling (Richard, Engineer): Presentation on updates to Ricochet-Refresh and Gosling from this year.
  • Quiet (Holmes Wilson): Quiet is a peer-to-peer Slack alternative for desktop and mobile built on Tor and IPFS, which lets communities control their own data without running their own servers. We'll describe how Quiet works and demo group chat, file transfer, GIFs, and our fully-p2p Android app.
  • Privacy Accelerator and DemHack (Olga Nemirovskaya): Privacy Accelerator is an online accelerator for commercial and non-commercial projects in the field of privacy, access to information, and legal tech. DemHack is a hackathon organized in collaboration with Tor and other members of the community.
  • Roskomsvoboda (Sarkis Darbinyan): A talk about how the trial over blocking Tor in Russia was conducted, as well as the current situation concerning Tor in Russia.
  • Tails (Intrigeri and Boyska): Tails is a portable operating system that protects against surveillance and censorship.
  • Closing (Gaba, Tor Project's Project Manager): Host presenting closing remarks.
...
@blog November 3, 2022 - 00:00 • 24 days ago
Transparency, Openness, and Our 2020-2021 Financials

Every year, as required by U.S. federal law for 501(c)(3) nonprofits, the Tor Project completes a Form 990, and as required by contractual obligations and state regulations, an independent audit of our financial statements. After completing standard audits for 2020-2021,* our federal tax filings (Form 990) and audited financial statements are both now available. We upload all of our tax documents and publish a blog post about these documents in order to be transparent.

Transparency for a privacy project is not a contradiction: privacy is about choice, and we choose to be transparent in order to build trust and a stronger community. In this post, we aim to be very clear about where the Tor Project’s money comes from and what we do with it. This is how we operate in all aspects of our work: we show you all of our projects, in source code, and in periodic project and team reports, and in collaborations with researchers who help assess and improve Tor. Transparency also means being clear about our values, promises, and priorities as laid out in our social contract.

This year’s version of the financial transparency post is a bit different than past iterations: we hope that by expanding each subsection of this post and adding more detail, you will have an even better understanding of the Tor Project’s funding, and that you will be able to read the audited financial statements and Form 990 on your own should you have more questions.

*Note: Our fiscal years run from July through June instead of following the calendar year. We made this change in 2017 so that our fundraising season (December) falls in the middle of our fiscal year, allowing us to better plan our budgets.

Fiscal Year 2020-2021 Summary

The audit and tax filing forms we discuss in this post cover July 1, 2020 to June 30, 2021.

It’s no surprise that the Tor Project faced many uncertainties in our funding sources at the beginning of 2020 when the pandemic changed the world. Directly prior to the beginning of the 2020-2021 fiscal year, we saw a sharp decrease in individual donations.

Despite this difficulty, throughout the end of 2020, you helped us break fundraising records with the 2020 Bug Smash Fund and the 2020 year-end campaign. Then, in the spring of 2021, we held an auction of a generative art piece we called Dreaming at Dusk, created by artist Itzel Yard (ixshells) and derived from the private key of the first onion service, Dusk. PleasrDAO won this auction and raised approximately $1.7M in unrestricted funds for the Tor Project.

We ended the fiscal year in a strong position, thanks to your support. Below, we’ll talk more about the details of the Revenue and Expenses throughout the fiscal year.

Revenue & Support

The Tor Project’s Revenue and Support in fiscal year 2020-2021, as listed in the audited financial statement, was $7,825,375.

Screenshot showing the Revenue and Support section of the Tor Project's 2020-2021 audited financial statements

In our Form 990, our Revenue is listed as $7,412,081.

Screenshot showing the Revenue and Support section of the Tor Project's 2020-2021 IRS Form 990

Why is the "revenue" category different on the Form 990 and the audited financials? This is not an error. The different revenue totals reflect differences in rules about how you must report revenue to the IRS and how you must report revenue in audited financial statements. The audited financial statements include and exclude revenue categories that the Form 990 does not, per tax reporting rules: the audited financial statements include in-kind contributions ($797,597), and exclude a (now forgiven) Paycheck Protection Program loan from the U.S. Small Business Administration ($384,303).
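
In concrete numbers, that reconciliation works out exactly: $7,412,081 (Form 990 revenue) + $797,597 (in-kind contributions) - $384,303 (PPP loan) = $7,825,375 (Revenue and Support in the audited financial statements).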

A brief aside: let’s talk about in-kind contributions--the value of which is included in the audited financial statements, and excluded in the Form 990--and how we derive that value. In-kind contributions are donated services or goods, like translation completed by volunteers, website hosting, donated hardware, and contributed patches. This year, we counted $797,597 in donated services: that’s thousands of estimated hours of software development, 1,364,631 words translated, and cloud hosting services for 23 servers. Right now, this estimation does not include relay operator time and cost--but it should. We are working to better estimate both the hours of software development and how to represent the contributions made by relay operators. Stay tuned for the coming years! Clearly, Tor would not be possible without you. Thank you.

For the purposes of the following subsections of this blog post, we’ll be looking at the Revenue total following the Form 990, and breaking this total into the following categories:

  • U.S. government (38.17% of total revenue)
  • Individual donations (36.22% of total revenue)
  • Private foundations (16.15% of total revenue)
  • Non-U.S. government (5.07% of total revenue)
  • Corporate (2.89% of total revenue)
  • Other (1.51% of total revenue)

Pie chart showing segments of revenue for the Tor Project

U.S. Government Support

During this fiscal year, about 38% of our revenue came from U.S. government funds. This is 15% down from the previous fiscal year, and in line with our long-term goal to diversify our funding in order to rely less on one single source. Over time the percentage of funds from different sources will fluctuate--in the current fiscal year, this percentage will go up based on the timing of new projects.

We get a lot of questions (and see a lot of FUD) about how the U.S. government funds the Tor Project, so we want to make this as clear as possible and show you where to find this information in the future in these publicly-available documents.

Let’s talk specifically about which parts of the U.S. government support Tor, and what kind of projects they fund. Below, you can see a screenshot from the Tor Project’s fiscal year 2020-2021 Form 990 on page 39, where we’ve listed all of our U.S. government funders. You will find text like this in our recent Form 990s:

Screenshot of the Tor Project's Form 990 Funding Sources section

Now, we’ll break down which projects are funded by each entity and link you to places in GitLab where we organize the work associated with this funding.

U.S. State Department Bureau of Democracy, Human Rights, and Labor ($1,532,843)

National Science Foundation + Georgetown University ($224,617)

Institute of Museum and Library Science + New York University ($96,569)

Defense Advanced Research Projects Agency + Georgetown University ($570,497)

U.S. Small Business Administration ($384,303)

  • We received a (now forgiven) loan from the U.S. Small Business Administration through the Paycheck Protection Program (PPP). PPP loans were designed to provide a direct incentive for small businesses to keep their workers on payroll during the first waves of economic impact caused by the COVID-19 pandemic. As of March 18, 2021, the loan has been forgiven.

National Institutes of Health + University of Pittsburgh ($20,421)

  • This funding passed through the Tor Project to Library Freedom Project to deliver a virtual health literacy and privacy series of four webinars for public library workers.

For even more about how government funding works for the Tor Project, consider reading our previous financial transparency posts, as well as Roger’s thorough comments on these posts.

Individual Donations

The next largest category of contributions, at 36% of total revenue, was individual donations. In the 2020-2021 fiscal year, supporters gave $2,684,390 to the Tor Project in the form of one-time gifts and monthly donations. These gifts came in all different forms (including ten different cryptocurrencies, which are then converted to USD), and together they made this the greatest individual fundraising year in our history.

Individual contributions come in many forms: some people donate $5 to the Tor Project one time, some donate $100 every month, and some make large gifts annually. The common thread is that individual donations are unrestricted funds, and a very important kind of support. Unrestricted funds allow us to respond to censorship events, develop our tools in a more agile way, and ensure we have reserves to keep Tor strong in case of emergencies.

One reason this category is much higher than previous years is that it includes the results of an NFT auction we held in mid-May 2021. Specifically, we commissioned artist Itzel Yard (ixshells) to create a generative art piece we called Dreaming at Dusk derived from the private key of the first onion service, Dusk. We then held an auction of the piece on Foundation. This auction resulted in a final bid of 500 Ethereum (ETH), roughly $2M USD at the time of the auction, with the proceeds going towards our work to improve and promote Tor. We also donated a portion of the proceeds to the Munduruku Wakoborũn Women's Association, a grassroots indigenous organization in Pará, Brazil.

Private Foundations + Corporations

Of the revenue in fiscal year 2020-2021, about 16% came from foundations and other nonprofits, and then another 3% from corporations.

Some of these contributions are in the form of restricted grants, which means we pick a project that is on our roadmap, we propose it to a funder, they agree that this project is important, and we are funded to complete these projects. Some examples in this category include Zcash Community Grants’ support of our first year of Arti, Digital Defenders’ support to write Onion Guides for easily setting up onion services (see the results of this project in English / Spanish / Portuguese), and the Bertha Foundation’s support for a fellow within the Tor Project who worked with environmental organizations and climate and Indigenous activists to strengthen their digital security.

Also in this category are unrestricted funds, like support from Media Democracy Fund, Craig Newmark Philanthropies, and Ford Foundation. These unrestricted funds are not tied to a specific project.

Unrestricted funds also include contributions from corporations, and this is where you will find membership dues from our members; in the 2020-2021 Form 990, you'll see contributions from the members we had during this time. We haven't listed every single entity you will see in our Form 990 in this blog post, but we hope you now have a better understanding of what you might find there.

Non-U.S. Governments

Sida is a government agency working on behalf of the Swedish parliament and government, with the mission to reduce poverty in the world. Sida supports the Tor Project so that we can carry out user research, localization, user support, usability improvements, training for communities, and training for trainers. This support is how we are able to meet our users and to carry out the user experience research we do to improve all Tor tools.

Other

$111,919 of our revenue is from and for the Privacy Enhancing Technologies Symposium. Tor supports the yearly PETS conference by providing structure and financial continuity.

Expenses

OK, we’ve told you how we get funding (and which documents to look at to learn more). Now what do we do with that money? Based on the Form 990, our expenses were $3,987,743 in fiscal year 2020-2021.

Screenshot showing the Tor Project's Form 990 expenses

Like I discussed in the Revenue section above, the expenses listed in our Form 990 and audited financial statements are different. Based on the audited financial statements, our expenses (page 4) were $4,782,017.

Screenshot showing the Tor Project's audited financial statements expenses

Why the difference? The Form 990 does not include interest we’ve earned or the in-kind contributions, while the audited financial statements do. For the rest of this blog post, as above, we’ll be looking at the Expenses total following the Form 990.

We break our expenses into three main categories:

  • Administration: costs associated with organizational administration, like salary for our Executive Director, office supplies, business insurance costs;
  • Fundraising: costs associated with the fundraising program, like salary for fundraising staff, tools we use for fundraising, bank fees, postal mail supplies, swag; and
  • Program services: costs associated with making Tor and supporting the people who use it, including application, network, UX, metrics, and community staff salaries; contractor salaries; and IT costs. If you want to learn more about the work considered “Program services,” you should read our list of sponsored projects on our wiki.

In the 2020-2021 fiscal year, 87.2% of our expenses were associated with program services. That means that a very significant portion of our budget goes directly into building Tor and making it better. Next comes fundraising at 7.3% and administration at 5.4%.

Pie chart showing the breakdown of each category of expenses

Ultimately, like Roger has written in many past versions of this blog post, it’s very important to remember the big picture: Tor's budget is modest considering global impact. And it’s also critical to remember that our annual revenue is utterly dwarfed by what our adversaries spend to make the world a more dangerous and less free place.

In closing, we are extremely grateful to all of our donors and supporters. You make this work possible, and we hope this expanded version of our financial transparency post sheds more light on how the Tor Project raises money and how we spend it. Remember that beyond making a donation, there are other ways to get involved, including volunteering and running a Tor relay!

...
@blog November 3, 2022 - 00:00 • 24 days ago
New Release: Tor Browser 11.5.7 (Windows, macOS, Linux)
...
@anarcat November 2, 2022 - 17:40 • 25 days ago
A typical yak shaving session

Someone recently asked what yak shaving means and, because I am a professional at this pastime, I figured I would share my most recent excursion in the field.

As a reminder, "yak shaving" describes an (anti?) pattern by which you engage in more and more (possibly useless) tasks that lead you further and further away from your original objective.

The path I took through the yak herd is this:

  1. I wondered if I could use my home network to experiment with another VPN software (e.g. Wireguard instead of IPsec)

  2. then I tried Tailscale because I heard good things about it, and they have an interesting approach to opensource

  3. I wasn't happy with that, so I tried an IPv6 tunnel

  4. that broke after a few minutes, so I went on to try deploying Wireguard with Puppet, which involved reviewing about 4 different Puppet modules

  5. while I was there, I figured I might as well share those findings with the community, so I published that as a blog post

  6. someone else mentioned that Nebula (from Slack) is a thing, but after investigation, it turned out not to be well packaged in Debian, so I didn't test it, but added it to the blog post

  7. now that I had found the right Puppet module, I tried to deploy it with Puppet's g10k, which requires me to input a checksum

  8. I got lazy and figured that if I put in the wrong checksum, it would tell me what the right checksum was, but it didn't: it silently succeeded instead of failing, which seemed really bad

  9. then I looked upstream for such a bug report and saw that the Debian package was many versions behind and, because I'm on the Golang packaging team, I figured I would just do the upgrade myself

  10. then there were problems with the Debian-specific patch trying to disable network tests, so I rewrote the patch

  11. ... but ended up realizing basically all tests require the network, so I just disabled the build-time tests

  12. ... but then tried to re-add them to Debian CI instead, which didn't work

At that point, I had given up, after shaving a 12th yak. Thankfully, a kind soul provided a working test suite and I was able to roll back all those parentheses and:

  1. test the g10k package and confirm it works (and checks the checksums)

  2. upload the package to the Debian archive

  3. deploy the package in my Puppet manifests

  4. deploy a first tunnel

You'll also notice the work is not complete at all. I still need to:

  • make a full mesh between all nodes, probably with exported resources

  • have IP addresses in DNS so I don't need to remember them

  • hook up Prometheus into Puppet to monitor all nodes

  • deploy this at work (torproject.org), replacing the IPsec module I was originally trying to publish

Also notice the 8th yak, above, which might be a security issue. I wasn't able to confirm it, because g10k does some pretty aggressive caching, and I could only "reproduce" it in the sense that the checksum wasn't checked if the module already exists in the cache. So it might just be that I had actually already deployed the module before adding the checksum... but I still had that distressing sentiment:

<anarcat> there's a huge yak breathing down my neck with "CVE" written in large red letters on the side
<anarcat> i'm trying to ignore it, it stinks like hell

Hopefully it's nothing to worry about. Right? Riiight.

Oh. And obviously, writing this blog post is the sugar on top, the one last yak that is self-documented here.

...
@blog October 31, 2022 - 00:00 • 27 days ago
New Alpha Release: Tor Browser 12.0a4 (Android, Windows, macOS, Linux)

Tor Browser 12.0a4 is now available from the Tor Browser download page and also from our distribution directory.

Tor Browser 12.0a4 updates Firefox on Android, Windows, macOS, and Linux to 102.4.0esr.

This version includes important security updates to Firefox and GeckoView. There were no Android-specific security updates to backport from the Firefox 106 release.

What's new?

Tor Browser 12.0a4 includes a number of changes that require testing and feedback:

Multi-locale bundles (Desktop)

This is the first multi-locale release of Tor Browser Alpha for desktop. All supported languages are now included in a single bundle, and can be changed without requiring additional downloads via the Language menu in General settings.

What to test: Tor Browser Alpha should default to your system language on first launch if it matches a language we support. Alpha testers are also encouraged to test changing language within about:preferences#general, and to report any new bugs with localization in general.

tor-launcher migration (Desktop)

Parts of the code that power tor-launcher – which starts tor within Tor Browser – have been refactored. Although this work doesn't include any changes to the user experience, those who run non-standard Tor Browser setups are encouraged to test 12.0a4 on their systems.

What to test: Alpha testers who run non-standard Tor Browser setups (including, but not limited to, those who use system tor in conjunction with Tor Browser) should test starting and connecting to Tor, and report any unexpected error messages they encounter. All of the previously supported environment variables should still behave the same way as in the stable series.

Onion Auth fixes (Desktop)

12.0a4 includes two fixes to Onion Service client authorization:

  1. A fix to the auth window itself, which was broken in Alpha due to a regression caused by the esr102 transition: tor-browser#41344
  2. Another fix to a longstanding issue with Onion Auth failing on subdomains, which has also been backported to 11.5.5: tor-browser#40465

What to test: Accessing client authorized Onion Services on both top-level and subdomains.

Always prioritize .onion sites (Android)

Android users can now enable automatic Onion-Location redirects by switching "Prioritize .onion sites" within Privacy and Security settings.

What to test: Enable "Prioritize .onion sites" within settings, visit a website that supports Onion-Location, and verify that you were redirected to the website's .onion address.

Unified Español locale (Desktop and Android)

Previous versions of Tor Browser Alpha were available in both "es" and "es-AR" (Español Argentina) locales. In Tor Browser 12.0a4 these have been unified into a single Spanish locale instead.

What to test: Alpha testers who use the "es-AR" locale should be automatically switched to "es-ES" after updating.

The full changelog since Tor Browser 12.0a3 is:

...
@anarcat October 28, 2022 - 17:01 • 30 days ago
Debating VPN options

In my home lab(s), I have a handful of machines spread around a few points of presence, with mostly residential/commercial cable/DSL uplinks, which means, generally, NAT. This makes monitoring those devices kind of impossible. While I do punch holes for SSH, using jump hosts gets old quick, so I'm considering adding a virtual private network (a "VPN", not a VPN service) so that all machines can be reachable from everywhere.

I see three ways this can work:

  1. a home-made Wireguard VPN, deployed with Puppet
  2. a Wireguard VPN overlay, with Tailscale or equivalent
  3. IPv6, native or with tunnels

So which one will it be?

Wireguard Puppet modules

As is (unfortunately) typical with Puppet, I found multiple different modules to manage Wireguard.

  • halyard: score 3.1, 1,807 downloads, released 2022-10-14, 0 stars, 0 watchers, 0 forks, MIT license, no docs. Notes: requires firewall and Configvault_Write modules?
  • voxpupuli: score 5.0, 4,201 downloads, released 2022-10-01, 2 stars, 23 watchers, 7 forks, AGPLv3 license, good docs, contrib 1/9, issues 1/4, PRs 1/61. Notes: optionally configures ferm, uses systemd-networkd, recommends the systemd module with manage_systemd set to true, purges unknown keys
  • abaranov: score 4.7, 17,017 downloads, released 2021-08-20, 9 stars, 3 watchers, 38 forks, MIT license, okay docs, contrib 1/17, issues 4/7, PRs 4/28. Notes: requires pre-generated private keys
  • arrnorets: score 3.1, 16,646 downloads, released 2020-12-28, 1 star, 2 watchers, 1 fork, Apache-2 license, okay docs, contrib 1, issues 0, PRs 0. Notes: requires pre-generated private keys?

The voxpupuli module seems to be the most promising. The abaranov module is more popular and has more contributors, but it has more open issues and PRs.

More critically, the voxpupuli module was written after the abaranov author didn't respond to a PR from the voxpupuli author trying to add more automation (namely private key management).

It looks like setting up a wireguard network would be as simple as this on node A:

wireguard::interface { 'wg0':
  source_addresses => ['2003:4f8:c17:4cf::1', '149.9.255.4'],
  public_key       => $facts['wireguard_pubkeys']['nodeB'],
  endpoint         => 'nodeB.example.com:53668',
  addresses        => [{'Address' => '192.168.123.6/30',},{'Address' => 'fe80::beef:1/64'},],
}

This configuration comes from this pull request I sent to the module to document how to use that fact.

Note that the addresses used here are examples that shouldn't be reused and do not conform to RFC5737 ("IPv4 Address Blocks Reserved for Documentation": 192.0.2.0/24 (TEST-NET-1), 198.51.100.0/24 (TEST-NET-2), and 203.0.113.0/24 (TEST-NET-3)) or RFC3849 ("IPv6 Address Prefix Reserved for Documentation": 2001:DB8::/32), but that's another story.

(To avoid bootstrapping problems, the resubmit-facts configuration could be used so that other nodes' facts are more immediately available.)

One problem with the above approach is that you explicitly need to take care of routing, network topology, and addressing. This can get complicated quickly, especially if you have lots of devices, behind NAT, in multiple locations (which is basically my life at home, unfortunately).

Concretely, basic Wireguard only supports one peer behind NAT. There are some workarounds for this, but they generally imply a relay server of some sort, or some custom registry; it's kind of a mess. And this is where overlay networks like Tailscale come in.

Tailscale

Tailscale is basically designed to deal with this problem. It's not fully opensource, but pretty close, and they have an interesting philosophy behind that. The client is opensource, and there is an opensource version of the server side, called headscale. They have recently (late 2022) hired the main headscale developer while promising to keep supporting it, which is pretty amazing.

Tailscale provides an overlay network based on Wireguard, where each peer basically has a peer-to-peer encrypted connection, with automatic key rotation. They also ship a multitude of applications and features on top of that, like file sharing, keyless SSH access, and so on. The authentication layer is based on an existing SSO provider: you don't register with Tailscale with a new account, you log in with Google, Microsoft, or GitHub (which, really, is still Microsoft).

The Headscale server ships with many features out of the box:

  • Full "base" support of Tailscale's features
  • Configurable DNS
    • Split DNS
    • MagicDNS (each user gets a name)
  • Node registration
    • Single-Sign-On (via Open ID Connect)
    • Pre authenticated key
  • Taildrop (File Sharing)
  • Access control lists
  • Support for multiple IP ranges in the tailnet
  • Dual stack (IPv4 and IPv6)
  • Routing advertising (including exit nodes)
  • Ephemeral nodes
  • Embedded DERP server (AKA NAT-to-NAT traversal)

Neither project (client or server) is in Debian (RFP 972439 for the client, none filed yet for the server), which makes deploying this for my use case rather problematic. Their install instructions are basically a curl | bash but they also provide packages for various platforms. Their Debian install instructions are surprisingly good, and check most of the third party checklist we're trying to establish. (It's missing a pin.)

There's also a Puppet module for tailscale, naturally.

What I find a little disturbing with Tailscale is that you not only need to trust Tailscale with authorizing your devices: you also basically delegate that trust to the SSO provider. So, in my case, GitHub (or anyone who compromises my account there) can penetrate the VPN. A little scary.

Tailscale is also kind of an "all or nothing" thing. They have MagicDNS, file transfers, all sorts of things, but those things require you to hook up your resolver with Tailscale. In fact, Tailscale kind of assumes you will use their nameservers, and has gone to great lengths to figure out how to do that. And naturally, here, it doesn't seem to work reliably; my resolv.conf somehow gets replaced and the magic resolution of the ts.net domain fails.

(I wonder why we can't opt in to just resolving the ts.net domain publicly. I don't care if someone can enumerate the private IP addresses or machines in use in my VPN; at least, I don't care as much as I care about fighting with resolv.conf everywhere.)

Because I mostly have access to the routers on the networks I'm on, I don't think I'll be using tailscale in the long term. But it's pretty impressive stuff: in the time it took me to even review the Puppet modules to configure Wireguard (which is what I'll probably end up doing), I was up and running with Tailscale (but with a broken DNS, naturally).

(And yes, basic Wireguard won't bring me DNS either, but at least I won't have to trust Tailscale's Debian packages, and Tailscale, and Microsoft, and GitHub with this thing.)

IPv6

IPv6 is actually what is supposed to solve this. Not NAT port forwarding crap, just real IPs everywhere.

The problem is: even though IPv6 adoption is still growing, it's kind of reaching a plateau at around 40% world-wide, with Canada lagging behind at 34%. It doesn't help that major ISPs in Canada (e.g. Bell Canada, Videotron) don't care at all about IPv6 (e.g. Videotron in beta since 2011). So we can't rely on those companies to do the right thing here.

The typical solution here is often to use a tunnel like HE's tunnelbroker.net. It's kind of tricky to configure, but once it's done, it works. You get end-to-end connectivity as long as everyone on the network is on IPv6.

And that's really where the problem lies: as soon as one of your nodes can't set up such a tunnel, you're kind of stuck, and the whole approach breaks down. IPv6 tunnels also don't give you the kind of security a VPN provides, naturally.

The other downside of a tunnel is you don't really get peer-to-peer connectivity: you go through the tunnel. So you can expect higher latencies and possibly lower bandwidth as well. Also, HE.net doesn't currently charge for this service (and they've been doing this for a long time), but this could change in the future (just like Tailscale, that said).

Concretely, the latency difference is rather minimal. For Google:

--- ipv6.l.google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,8ms
RTT[ms]: min = 13, median = 14, p(90) = 14, max = 15

--- google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,0ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14

In the case of GitHub, latency is actually lower, interestingly:

--- ipv6.github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 134,6ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14

--- github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 293,1ms
RTT[ms]: min = 29, median = 29, p(90) = 29, max = 30

That is because HE.net peers directly with my ISP and with Fastly (which is behind GitHub.com's IPv6, apparently?), so it's only 6 hops away. Over IPv4, meanwhile, the ping goes through New York before landing in AWS's Ashburn, Virginia datacenters, for a whopping 13 hops...

I managed to set up a HE.net tunnel at home, because I also need IPv6 for other reasons (namely debugging at work). My first attempt at setting this up in the office failed, but now that I found the openwrt.org guide, it worked... for a while, and I was able to produce the above, encouraging, mini benchmarks.

Unfortunately, a few minutes later, IPv6 just went down again. And the problem with that is that many programs (and especially OpenSSH) do not respect the Happy Eyeballs protocol (RFC 8305), which means various mysterious "hangs" at random times on random applications. It's kind of a terrible user experience, on top of breaking the one thing it's supposed to do, of course, which is to give me transparent access to all the nodes I maintain.
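
(For what it's worth, applications can implement this themselves. As a minimal sketch, not tied to any of the tools above: Python's asyncio exposes a happy_eyeballs_delay parameter that staggers connection attempts across address families, here with example.com standing in for a real host.)

import asyncio

async def main():
    # with happy_eyeballs_delay set, asyncio tries the next address
    # shortly after the first attempt stalls, instead of hanging on
    # a broken IPv6 (or IPv4) path, following RFC 8305
    reader, writer = await asyncio.open_connection(
        "example.com", 80, happy_eyeballs_delay=0.25
    )
    print("connected to", writer.get_extra_info("peername"))
    writer.close()
    await writer.wait_closed()

asyncio.run(main())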

Even worse, it would still be a problem for other remote nodes I might set up, where I might not have access to the router to set up the tunnel. It's also not absolutely clear what happens if you set up the same tunnel in two places... Presumably, something is smart enough to distribute only a part of the /48 block selectively, but I don't really feel like going that far, considering how flaky the setup is already.

Other options

If this post sounds a little biased towards IPv6 and Wireguard, it's because it is. I would like everyone to migrate to IPv6 already, and Wireguard seems like a simple and sound system.

I'm aware of many other options to make VPNs. So before anyone jumps in and says "but what about...", do know that I have personally experimented with:

  • tinc: nice, automatic meshing, used for the Montreal mesh, serious design flaws in the crypto that make it generally unsafe to use; supposedly, v1.1 (or 2.0?) will fix this, but that's been promised for over a decade by now

  • ipsec, specifically strongswan: hard to configure (especially configure correctly!), harder even to debug, otherwise really nice because transparent (e.g. no need for special subnets), used at work, but also considering a replacement there because it's a major barrier to entry to train new staff

  • OpenVPN: mostly used as a client for VPN services like Riseup VPN or Mullvad, mostly relevant for client-server configurations, not really peer-to-peer, shared secrets or TLS, kind of a hassle to maintain, see also SoftEther for an alternative implementation

All of those solutions have significant problems and I do not wish to use any of those for this project.

Also note that Tailscale is only one of many projects layered over Wireguard to do that kind of thing; see this LWN review for others (basically NetBird, Firezone, and Netmaker).

Future work

Those are options that came up after writing this post, and might warrant further examination in the future.

  • Meshbird, a "distributed private networking" with little information about how it actually works other than "encrypted with strong AES-256"

  • Nebula, "A scalable overlay networking tool with a focus on performance, simplicity and security", written by Slack people to replace IPsec, docs, runs as an overlay for Slack's 50k node network, only packaged in Debian experimental, lagging behind upstream (1.4.0, from May 2021 vs upstream's 1.6.1 from September 2022), requires a central CA, Golang, I'm in "wait and see" mode for now

  • n2n: "layer two VPN", seems packaged in Debian but inactive

  • ouroboros: "peer-to-peer packet network prototype", sounds and seems complicated

  • QuickTUN is interesting because it's just a small wrapper around NaCL, and it's in Debian... but maybe too obscure for my own good

  • unetd: Wireguard-based full mesh networking from OpenWRT, not in Debian

  • vpncloud: "high performance peer-to-peer mesh VPN over UDP supporting strong encryption, NAT traversal and a simple configuration", sounds interesting, not in Debian

  • Yggdrasil: actually a pretty good match for my use case, but I didn't think of it when starting the experiments here; packaged in Debian, with the Golang version planned, Puppet module; major caveat: nodes exposed publicly inside the global mesh unless configured otherwise (firewall suggested), requires port forwards, alpha status

Conclusion

Right now, I'm going to deploy Wireguard tunnels with Puppet. It seems like kind of a pain in the back, but it's something I will be able to reuse for work, possibly completely replacing strongswan.

I have another Puppet module for IPsec which I was planning to publish, but now I'm thinking I should just abort that and replace everything with Wireguard, assuming we still need VPNs at work in the future. (I have a number of reasons to believe we might not need any in the near future anyways...)

...
@blog October 27, 2022 - 00:00 • 1 months ago
New Release: Tor Browser 11.5.6 (Android, Windows, macOS, Linux)

Tor Browser 11.5.6 is now available from the Tor Browser download page and also from our distribution directory.

This is an emergency release resolving an issue with Tor Browser 11.5.5's integration of the Snowflake pluggable transport. Users of 11.5.5 will be unable to connect to the Tor Network via the built-in Snowflake bridge until they update to 11.5.6.

If you applied the following workaround to add a working Snowflake bridge manually, you may revert back to using the "Built-in" Snowflake bridge option after updating to 11.5.6.

The full changelog since Tor Browser 11.5.5 is:

...
@blog October 25, 2022 - 00:00 • 1 months ago
New Release: Tor Browser 11.5.5 (Android, Windows, macOS, Linux)

Tor Browser 11.5.5 is now available from the Tor Browser download page and also from our distribution directory.

Tor Browser 11.5.5 backports the following security updates from Firefox ESR 102.4 to Firefox ESR 91.13 on Windows, macOS and Linux:

Tor Browser 11.5.5 updates GeckoView on Android to 102.4.0esr and includes important security updates. There were no Android-specific security updates to backport from the Firefox 106 release.

The full changelog since Tor Browser 11.5.4 is:

...
@blog October 21, 2022 - 00:00 • 1 months ago
Global Encryption Day: Demand End-to-End Encryption in DMs

Today, October 21, 2022, is the second Global Encryption Day, organized by the Global Encryption Coalition, of which the Tor Project is a member. Global Encryption Day is an opportunity for businesses, civil society organizations, technologists, and millions of Internet users worldwide to highlight why encryption matters and to advocate for its advancement and protection.

At the Tor Project, we’re proud to help millions of people take back their right to privacy, to freely access and share information, and to more easily circumvent internet censorship--and encryption makes this possible.

Encryption allows us to provide these tools: for example, Tor uses three layers of encryption in the Tor circuit; each relay decrypts one layer before passing the request on to the next relay. Encryption is used in many other ways as well! Without encryption, millions of people would lose their access to the safe and uncensored internet.
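
To make the layering idea concrete, here is a toy sketch in Python, using the third-party cryptography library's Fernet. This is purely illustrative: the real Tor protocol negotiates per-circuit keys with each relay and does not use Fernet.

from cryptography.fernet import Fernet

# three pretend relays (guard, middle, exit), each with its own key
guard, middle, exit_relay = (Fernet(Fernet.generate_key()) for _ in range(3))

# the client wraps the message so that the guard's layer is outermost
cell = b"GET /"
for relay in (exit_relay, middle, guard):
    cell = relay.encrypt(cell)

# each relay along the circuit peels off exactly one layer
for relay in (guard, middle, exit_relay):
    cell = relay.decrypt(cell)

assert cell == b"GET /"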

Global Encryption Day is also a day for all of us to highlight services that should be encrypted--but are not yet.

The context of our online lives changes from day to day. Conversations that seem safe one day may contain illegal content the next. In the U.S., after the reversal of Roe v. Wade in June 2022, platforms have been forced to turn over messages between users that took place before June. For example, Facebook recently handed over to police direct messages, sent in April, between a mom and her teenage daughter. Without end-to-end encryption, our online lives can be hacked, shared with law enforcement, or even accessed by creepy company employees.

Private and secure spaces for online communication and support are more important right now than ever.

That's why this Global Encryption Day, we're calling on platforms to do what's right: protect our privacy. We're demanding that Big Tech finally bring the protection of end-to-end encryption to DMs in their products--not as a hidden "added feature"--but as the default setting. Companies like Meta, Twitter, Apple, Google, Slack, Discord, and all services with chat features must add end-to-end encryption to their messaging services immediately.

Big Tech needs to make DMs safe now. We've signed an open letter to MAKE DMS SAFE organized by Fight for the Future, and we encourage you to do the same. Demand that Facebook, Twitter, Google, Apple--and any company with a messaging platform--protect our private messages and implement default end-to-end encryption immediately.

...
@ooni October 20, 2022 - 00:00 • 1 months ago
Survey: Help shape OONI’s strategic priorities
OONI will celebrate its 10th anniversary on 5th December 2022! This is a time for celebration, but also a time for reflection: What worked well? What should we improve? What should we do differently going forward? The OONI community has been at the heart of our work over the past decade, and so we invite our community to help shape OONI’s strategic priorities for the future. Please take a few minutes to complete our survey: https://ooni. ...
@blog October 13, 2022 - 00:00 • 2 months ago
Resistance, change, & freedom: powered by privacy

Every year, the Tor Project (a 501(c)(3) nonprofit) holds a fundraiser where we ask for your support. (If you’re a Tor fan, you probably know that by now!) And every year, we unveil new gifts and a slogan that highlight our values and the importance of Tor.

This year, we’re highlighting three positive forces in the world that are powered by privacy: resistance, change, and freedom.

We’re living in a time of rising authoritarianism. Restrictive internet censorship and blackouts are normal tools used by governments to disrupt organizing and dissent. Companies continuously surveil their workers with digital spyware. Schools restrict, monitor, and catalog student activity and behavior. Data brokers sell extremely precise browsing and location data–and governments easily circumvent warrants by scooping that data up for a bargain. After the 2022 reversal of Roe v. Wade, U.S. courts have already successfully used digital evidence, like web searches, to prosecute people seeking abortions. All of these violations of our privacy make it difficult to exercise our human rights to assemble, research, speak, and protest.

At the Tor Project, our mission is to advance human rights and freedoms with open source privacy technology, because protecting privacy is key in the work for a better world.

We’re a small nonprofit facing global adversaries. Compared to the billion dollar surveillance capitalism industry and global governments, we’re a small force. But just because we’re small doesn’t mean we can’t grow, and it doesn’t mean we can’t be successful.

Your support is key to Tor's success, and the best way to power Tor into the future is through a monthly commitment.

Looking at leadership organizations like EFF and Wikimedia Foundation, you can see that recurring donations–by individuals and by companies–are what allow communities to take on ambitious projects (like, say, offering privacy online to everyone who needs it!). Why does a recurring commitment matter so much? Because reliable resources mean that we can plan for the future with confidence. When we bring new folks in to work on critical areas at Tor, we want to make sure they are here for the long term. Stability like this allows us to invest in solving the big problems.

There’s a unique strength to Tor: this community knows the power of your support to change the course of history. In moments of need, you step up in creative ways to support Tor’s work, and millions of people can access Tor because of your drive to advance internet freedom.

Screenshot of a Tweet reading "To help -> go to snowflake.torproject.org," with two images. The left image shows a wide shot of a protest, with people holding signs. The right image shows a close shot of one cardboard protest sign that reads, "To help -> go to snowflake.torproject.org"

You can see examples of your success right now: from spinning up more than 80,000 new Snowflakes to keep Iranians online during the #زن_زندگی_آزادی / #MahsaAmini protests to adding thousands of new bridges when Russia began censoring the Tor Project’s website.

In today’s world, Tor needs to be available for those who need it at a moment’s notice. Circumstances change and governments make decisions that upend our lives. Your commitment to Tor ensures that is the case.

Now through the end of the year, your donation to the Tor Project today will be matched, 1:1. That means that every donation, up to $100,000, will be doubled. Now is an excellent time to give because your donation counts twice. Be sure to check out https://donate.torproject.org for this year’s new t-shirt, and a brand-new offering: a sturdy canvas tote bag.

Image of a black t-shirt and black canvas tote with a purple background. The t-shirt and tote read, "Powered by privacy; RESISTANCE, CHANGE, FREEDOM" in lime green and pink text."

Ways to make a difference:

1:1 Match provided by Friends of Tor

Now, take a moment to meet the Friends of Tor, the generous donors who are matching your donations, up to $100,000:

Aspiration logo Aspiration connects nonprofit organizations, foundations and activists with free and open software solutions and technology skills that help them better carry out their missions. We want those working for social and racial justice to be able to find and use the best tools and practices available, so that they maximize their effectiveness and impact and, in turn, change the world. We also work with free and open source projects and communities in both support and partnership roles, advising and contributing on matters of strategy, sustainability, governance, community health, equity and diversity. We design and facilitate unique and collaborative nonprofit and FLOSS technology convenings, and have run almost 700 in over 50 countries as well as online over the past 16 years.


Portrait of Jon Callas

Jon Callas is a cryptographer, software engineer, user experience designer, and entrepreneur. Jon is the co-author of many crypto and security systems including OpenPGP, DKIM, ZRTP, Skein, and Threefish. Jon has co-founded several startups including PGP, Silent Circle, and Blackphone. Jon has worked on security, user experience, and encryption for Apple, Kroll-O'Gara, Counterpane, and Entrust. Before coming to the EFF, Jon was a technologist in the ACLU's Speech, Privacy, and Technology Project on issues including surveillance, encryption, machine learning, end-user security, and privacy. Jon is fond of Leica cameras, Morgan sports cars, and Birman cats. Jon's photographs have been used by Wired, CBS News, and The Guggenheim Museum.


Portrait of Craig Newmark
Craig Newmark is a Web pioneer, philanthropist, and leading advocate. Most commonly known for founding the online classified ads service craigslist, Newmark works to support and connect people and drive broad civic engagement. In 2016, he founded Craig Newmark Philanthropies to advance people and grassroots organizations that are “getting stuff done” in areas that include trustworthy journalism & the information ecosystem, voter protection, women in technology, and veterans & military families. At its core, all of Newmark’s philanthropic work helps to strengthen American democracy by supporting the values that the country aspires to – fairness, opportunity, and respect.


Portrait of Wendy Seltzer
Wendy Seltzer is Strategy Lead and Counsel to the World Wide Web Consortium (W3C) at MIT, improving the Web's security, availability, and interoperability through standards. As a Fellow with Harvard's Berkman Klein Center for Internet & Society, Wendy founded the Lumen Project (formerly Chilling Effects Clearinghouse), the web's pioneering transparency report to measure the impact of legal takedown demands online. She seeks to improve technology policy in support of user-driven innovation and secure communication.


We would also like to thank anonymous donors who collaborated to create this fund. This spot is dedicated to them as a small recognition of their support. As the Tor community knows, anonymity loves company, so why not join our anonymous donors and make a contribution, too? With their matching donation, your contribution has double the impact.

...
@blog October 12, 2022 - 00:00 • 2 months ago
New Release: Tor Browser 11.5.4 (Android, Windows, macOS, Linux)

Tor Browser 11.5.4 is now available from the Tor Browser download page and also from our distribution directory.

Our Android release follows our alpha branch to the ESR 102 release train. Please see the 12.0a2 blog post for details.

Tor Browser 11.5.4 backports the following security updates from Firefox ESR 102.3 to Firefox ESR 91.13 on Windows, macOS and Linux:

Tor Browser 11.5.4 updates GeckoView on Android to 102.3.0esr. We also backport the following Android-specific security updates from Firefox 104 and 105:

The full changelog since Tor Browser 11.5.3 is:

...
@kushal October 10, 2022 - 17:30 • 2 months ago
Introducing pyarti, python module for the Tor Project
python3 -m pip install pyarti

pyarti is a Python module written in Rust using Arti from the Tor Project. Right now pyarti is at an early stage: you can create a SOCKS5 proxy object and route any Python connection through it. The following example creates a default proxy at port 9150 and then verifies that the connection works. Finally, we fetch a web page and print the text.

from pyarti import OnionProxy
import httpx
from httpx_socks import SyncProxyTransport

URL = "https://httpbin.org/get"

# Start Arti and expose it as a local SOCKS5 proxy (port 9150 by default).
p = OnionProxy()
# Verify that connections through the proxy work, blocking until ready.
assert p.verify(blocking=True)
# Now we can route an ordinary httpx client through the proxy.
transport = SyncProxyTransport.from_url("socks5://127.0.0.1:9150")
with httpx.Client(transport=transport) as client:
    res = client.get(URL)
    assert res.status_code == 200
    print(res.text)

In the coming months I am hoping to add a more detailed API to the module. For now, you can read the documentation.

...
@blog October 4, 2022 - 00:00 • 2 months ago
Arti 1.0.1 is released: Bugfixes and groundwork

Arti is our ongoing project to create a next-generation Tor client in Rust. Last month, we released Arti 1.0.0. Now we're announcing the next release in its series, Arti 1.0.1.

Over the last month, our team's time has been filled with company meetings, vacations, COVID recovery, groundwork for anticensorship features, and follow-up from Arti 1.0.0. Thus, this has been a fairly small release, but we believe it's worth upgrading for.

It fixes a few annoying bugs (including one that would cause a busy-loop), tightens log security, exposes an API for building circuits manually, and contains some preparatory work for anticensorship support, which we hope to deliver in early November.

You can find a more complete list of changes in our CHANGELOG.

For more information on using Arti, see our top-level README, and the documentation for the arti binary.

Thanks to everybody who has helped with this release, including Alexander Færøy, Trinity Pointard, and Yuan Lyu.

Also, our deep thanks to Zcash Community Grants for funding the development of Arti!

...
@blog October 3, 2022 - 00:00 • 2 months ago
The Role of the Tor Project Board and Conflicts of Interest

Over the last couple of weeks, friends of the Tor Project have been raising questions about how the Tor Project thinks about conflicts of interest involving its board members, in light of the reporting from Motherboard about Team Cymru. I understand why folks would have questions, and so I want to write a bit about how the board of directors interacts with the Tor Project, and how our conflict of interest process works.

The Role of the Board

First off, a word about non-profit boards of directors. Although every non-profit is unique in its own way, the purpose of a board of an organization like The Tor Project, with a substantial staff and community, is not to set day-to-day policy or make engineering decisions for the organization. The board's primary role is a fiduciary one: to ensure that Tor is meeting its obligations under its bylaws and charter, and to exercise “hire/fire” power over the executive director. Although staff members may consult board members with relevant expertise on strategic decisions, and board members are selected in part for their background in the space, the board is separate from the maintenance of and decision-making on Tor's code, and a board seat doesn't come with any special privileges over the Tor network. Board members may be consulted on technical decisions, but they don't make them; the Tor Project's staff and volunteers do. The Tor Project also has a social contract which everyone at Tor, including board members, has to comply with.

When we invite a person to join the Board, we are looking at the overall individual: their experience, expertise, character, and other qualities. We are not looking at them as representatives of another organization. But because Board members have fiduciary duties, they are required to agree to a conflict of interest policy. That policy defines a conflict as “...the signee has an economic interest in, or acts as an officer or a director of, any outside entity whose financial interests would reasonably appear to be affected by the signee's relationship with The Tor Project, Inc. The signee should also disclose any personal, business, or volunteer affiliations that may give rise to a real or apparent conflict of interest.”

Handling Conflicts of Interest

Like most conflict processes under United States law, non-profit conflicts rely on individuals to assess their own interests and the degree to which they might diverge. The onus is often on individual board members, who know the extent of their obligations, to raise questions about conflicts to the rest of the board, or to recuse themselves from decisions.

It also means that conflicts, and perceived conflicts, change over time. In the case of Rob Thomas's work with Team Cymru, the Tor Project staff and volunteers expressed concerns to me at the end of 2021, spurring internal conversations. I believe it is important to listen to the community, and so I worked to facilitate discussions and surface questions that we could try to address. During these conversations, it became clear that although Team Cymru may offer services that run counter to the mission of Tor, there was no indication that Rob Thomas's role in the provision of those services created any direct risk to Tor users, which was our primary concern. This was also discussed by the Board in March and the Board came to the same conclusion.

But of course, not actively endangering our users is a low bar. It is reasonable to raise questions about the inherent disconnect between the business model of Team Cymru and the mission of Tor, which is private and anonymous internet access for all. Rob Thomas's reasons for choosing to resign from the board are his own, but it has become clearer over the months since our initial conversation how Team Cymru's work is at odds with the Tor Project's mission.

What's Next

We at Tor, meaning me, the board, staff, and volunteers, will continue these conversations to identify how to do better based on what we have learned here.

I have been working with the board to see where things can be done better in general. One of these initiatives is changing the Tor Project's board recruitment process. Historically, recruitment for board slots has been ad hoc, with current board members or project staff suggesting potential new candidates. This selection process has limited the pool of who has joined the board, and has meant that the board does not always reflect the diversity of experiences or perspectives of Tor users. For the first time, we are running an open call for our board seats. Although this may seem unrelated to the idea of conflicts, we believe that more formalized processes create healthier boards that are able to work through potential conflict issues from a number of different angles.

Finally, let's talk about infrastructure for a moment. Our community has, rightly so, also raised concerns regarding the Tor Project's usage of Team Cymru infrastructure. Team Cymru has donated hardware and significant amounts of bandwidth to Tor over the years. These donations mostly supported web mirrors and internal projects like build and simulation machines.

As with all hardware that the Tor Project uses, we cannot guarantee perfect security when there is physical access, so we operate from a position of mistrust and rely on cryptographically verifiable reproducibility of our code to keep our users safe. As we would with machines hosted anywhere, the machines hosted at Team Cymru were cleanly installed using full disk encryption. This means that the setup with Team Cymru was no different from that of any other provider, and the level of risk to our users was the same as with other providers.
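To make that last point concrete: reproducible builds let anyone check a published binary against an independent rebuild, so trust doesn't rest on any single build host. Here is a minimal sketch in Python of that check (this is not the Tor Project's actual verification tooling, and the artifact paths are hypothetical), assuming you have two builds of the same release produced on different machines:

import hashlib

def sha256sum(path):
    # Hash the file in chunks so large artifacts don't have to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical paths: the same release built independently by two operators.
ours = sha256sum("builds/local/tor-browser.tar.xz")
theirs = sha256sum("builds/independent/tor-browser.tar.xz")

# If the build is reproducible, the two digests match bit for bit, so a
# compromised build host cannot slip in a modified binary undetected.
assert ours == theirs, "builds differ: do not trust this binary"
print("verified reproducible:", ours)

Because the check depends only on the published artifact, it does not matter whose hardware produced a given build, Team Cymru's or anyone else's.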

But given the discussion of conflicts above, it's not tenable to continue to accept Team Cymru's donations of infrastructure. We have been planning to move things out since early 2022. It is not a simple or cheap task to move everything to another location, so this process is going to take some time. We've already moved the web mirrors away, and we are working on the next steps of this plan to completely move all services away from Team Cymru infrastructure. We thank the community for its patience with this process.

...