Planet Tor

@kushal July 5, 2022 - 15:41 • 11 hours ago
Using sigstore-python to sign and verify your software release

Sigstore allows software developers to quickly sign and verify the software they release. Many of the bigger projects sign their releases with hardware-based OpenPGP keys, but the steps end-users must follow to correctly verify those signatures are long, and people make mistakes. Also, not every project has access to hardware smartcards, air-gapped private keys, etc. Sigstore solves these problems (or at least makes them much easier) for most developers. It relies on a few big, well-known OIDC providers (right now only 3), through which one can sign and verify any data or software.

For this blog post, I will use the Python tool called sigstore-python.

The first step is to create a virtual environment and then install the tool.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ python -m pip install -r install/requirements.txt
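(The requirements.txt path above assumes a checkout of the sigstore-python repository. The tool is also published on PyPI, so installing it directly should work as well:)

$ python -m pip install sigstore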

Next, we create a file called message.txt with some data. In a real release, this could be our source code tarball.

$ echo "Kushal loves Python!" > message.txt

Signing the data

The next step is to actually sign the file.

$ python -m sigstore sign message.txt 
Waiting for browser interaction...
Using ephemeral certificate:
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

Transparency log entry created at index: 2844439
Signature written to file message.txt.sig
Certificate written to file message.txt.crt

The command will open the default browser, where we can choose one of the following 3 OIDC providers.

[screenshot: OIDC provider selection]

This will also create message.txt.crt & message.txt.sig files in the same directory.

We can use the openssl command to see the contents of the certificate file.

$ openssl x509 -in message.txt.crt -noout -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            3a:c4:2d:19:20:f0:bf:85:37:a6:01:0f:49:d1:b6:39:20:06:fd:77
        Signature Algorithm: ecdsa-with-SHA384
        Issuer: O = sigstore.dev, CN = sigstore-intermediate
        Validity
            Not Before: Jul  5 14:45:23 2022 GMT
            Not After : Jul  5 14:55:23 2022 GMT
        Subject: 
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (384 bit)
                pub:
                    04:12:aa:88:fd:c7:1f:9e:62:78:46:2a:48:63:d3:
                    b6:92:8b:51:a4:eb:59:18:fb:18:a0:13:54:ac:d0:
                    a4:d8:20:ab:a3:f3:5e:f5:86:aa:34:9b:30:db:59:
                    1b:5c:3d:29:b1:5a:40:ff:55:2c:26:fc:42:58:95:
                    53:d6:23:e5:66:90:3c:32:8c:82:b7:fc:fd:f8:28:
                    2b:53:2d:5c:cb:df:2f:17:d0:f3:bc:26:d2:42:3d:
                    c0:b1:55:61:50:ff:18
                ASN1 OID: secp384r1
                NIST CURVE: P-384
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage: 
                Code Signing
            X509v3 Subject Key Identifier: 
                6C:F0:C0:63:B8:3D:BB:08:90:C3:03:45:FF:55:92:43:7D:47:19:38
            X509v3 Authority Key Identifier: 
                DF:D3:E9:CF:56:24:11:96:F9:A8:D8:E9:28:55:A2:C6:2E:18:64:3F
            X509v3 Subject Alternative Name: critical
                email:mail@kushaldas.in
            1.3.6.1.4.1.57264.1.1: 
                https://github.com/login/oauth
            CT Precertificate SCTs: 
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 08:60:92:F0:28:52:FF:68:45:D1:D1:6B:27:84:9C:45:
                                67:18:AC:16:3D:C3:38:D2:6D:E6:BC:22:06:36:6F:72
                    Timestamp : Jul  5 14:45:23.112 2022 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:46:02:21:00:AB:A6:ED:59:3E:B7:C4:79:11:6A:92:
                                29:92:BF:54:45:6A:B6:1F:6F:1C:63:7C:D9:89:26:D4:
                                6B:EF:E3:3E:9F:02:21:00:AD:87:A7:BA:BA:7C:61:D2:
                                53:34:E0:D0:C4:BF:6A:6E:28:B4:02:82:AA:F8:FD:0B:
                                FB:3A:CD:B9:33:3D:F4:36
    Signature Algorithm: ecdsa-with-SHA384
    Signature Value:
        30:65:02:30:17:89:76:ef:a1:0e:97:5b:a3:fe:c0:34:13:36:
        3f:6f:2a:ba:e9:cd:bd:f2:74:d9:8c:13:2a:88:c9:96:b2:72:
        de:34:44:95:41:f8:b0:69:5b:f0:86:a7:05:cf:81:7f:02:31:
        00:d8:3a:12:89:39:4b:2c:ad:ff:5a:23:85:d9:c0:73:f0:b1:
        db:5c:65:f9:5d:ee:7a:bb:b8:08:01:44:7a:2e:9f:ba:2b:4b:
        df:6a:93:08:e9:44:2c:23:88:66:2c:f7:8f
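If you would rather inspect the certificate programmatically, here is a minimal Python sketch using the pyca/cryptography library (which sigstore-python itself builds on, so it should already be in the virtual environment). It prints the validity window and the signer identity seen in the dump above:

from cryptography import x509

# Load the ephemeral signing certificate written by `sigstore sign`.
with open("message.txt.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

# Sigstore certificates are short-lived; the window should be about 10 minutes.
print("Not before:", cert.not_valid_before)
print("Not after :", cert.not_valid_after)

# The signer's identity lives in the Subject Alternative Name extension.
san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
print("Signer:", san.value.get_values_for_type(x509.RFC822Name))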

Verifying the signature

We can now verify the signature; just make sure that the certificate & signature files are in the same directory as message.txt.

$ python -m sigstore verify message.txt 
OK: message.txt
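The verify subcommand picked up message.txt.crt and message.txt.sig automatically. If the certificate and signature live elsewhere, the --certificate and --signature flags (used with the cosign example below) let us point to them explicitly:

$ python -m sigstore verify --certificate message.txt.crt --signature message.txt.sig message.txt
OK: message.txt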

Now, to test this with a real software release, we will download the cosign RPM package and the related certificate & signature files. The certificate in this case is base64-encoded, so we decode that file first.

$ curl -sOL https://github.com/sigstore/cosign/releases/download/v1.9.0/cosign-1.9.0.x86_64.rpm
$ curl -sOL https://github.com/sigstore/cosign/releases/download/v1.9.0/cosign-1.9.0.x86_64.rpm-keyless.sig
$ curl -sOL https://github.com/sigstore/cosign/releases/download/v1.9.0/cosign-1.9.0.x86_64.rpm-keyless.pem
$ base64 -d cosign-1.9.0.x86_64.rpm-keyless.pem > cosign-1.9.0.x86_64.rpm.pem

Now let us verify the downloaded RPM package, also checking the signer's email address and the OIDC issuer URL. We enable debug logging so that we can see what actually happens during verification.

$ SIGSTORE_LOGLEVEL=debug python -m sigstore verify --certificate cosign-1.9.0.x86_64.rpm.pem --signature cosign-1.9.0.x86_64.rpm-keyless.sig --cert-email keyless@projectsigstore.iam.gserviceaccount.com --cert-oidc-issuer https://accounts.google.com  cosign-1.9.0.x86_64.rpm

DEBUG:sigstore._cli:parsed arguments Namespace(subcommand='verify', certificate=PosixPath('cosign-1.9.0.x86_64.rpm.pem'), signature=PosixPath('cosign-1.9.0.x86_64.rpm-keyless.sig'), cert_email='keyless@projectsigstore.iam.gserviceaccount.com', cert_oidc_issuer='https://accounts.google.com', rekor_url='https://rekor.sigstore.dev', staging=False, files=[PosixPath('cosign-1.9.0.x86_64.rpm')])
DEBUG:sigstore._cli:Using certificate from: cosign-1.9.0.x86_64.rpm.pem
DEBUG:sigstore._cli:Using signature from: cosign-1.9.0.x86_64.rpm-keyless.sig
DEBUG:sigstore._cli:Verifying contents from: cosign-1.9.0.x86_64.rpm
DEBUG:sigstore._verify:Successfully verified signing certificate validity...
DEBUG:sigstore._verify:Successfully verified signature...
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): rekor.sigstore.dev:443
DEBUG:urllib3.connectionpool:https://rekor.sigstore.dev:443 "POST /api/v1/index/retrieve/ HTTP/1.1" 200 69
DEBUG:urllib3.connectionpool:https://rekor.sigstore.dev:443 "GET /api/v1/log/entries/9ee91f2c5444e4ff77a3a18885f46fa2b6f7e629450904d67b5920333327b90d HTTP/1.1" 200 None
DEBUG:sigstore._verify:Successfully verified Rekor entry...
OK: cosign-1.9.0.x86_64.rpm

Oh, one more important thing. The maintainers of the tool are great about feedback. I had some trouble initially (a few weeks ago). They sat down with me to understand the problem and solved the issue I had. You can talk to the team (and other users, including me) in the Slack room.

...
@blog July 2, 2022 - 00:00 • 4 days ago
New Alpha Release: Tor Browser 11.5a13 (Android, Windows, macOS, Linux)

Tor Browser 11.5a13 is now available from the Tor Browser download page and also from our distribution directory.

Tor Browser 11.5a13 updates Firefox on Windows, macOS, and Linux to 91.10.0esr.

We also take the opportunity to update various other components of Tor Browser:

  • NoScript 11.4.6
  • Tor Launcher 0.2.36
  • Tor 0.4.7.8

This version includes important security updates to Firefox.

The full changelog since Tor Browser 11.5a12 is:

...
@blog July 2, 2022 - 00:00 • 4 days ago
New Release: Tor Browser 11.0.15 (Android, Windows, macOS, Linux)
...
@blog June 24, 2022 - 00:00 • 12 days ago
Arti 0.5.0 is released: Robustness and API improvements

Arti is our ongoing project to create a working embeddable Tor client in Rust. It’s not ready to replace the main Tor implementation in C, but we believe that it’s the future.

Right now, our focus is on making Arti production-quality, by stress-testing the code, hunting for likely bugs, and adding missing features that we know from experience that users will need. We're going to try not to break backward compatibility too much, but we'll do so when we think it's a good idea.

What's new in 0.5.0?

For a complete list of changes, have a look at the CHANGELOG.

This release adds cryptographic acceleration, a useful set of toplevel build features, reachable-address filtering, detection for failed directory downloads, and numerous cleanups.

There are also a bunch of smaller features, bugfixes, and infrastructure improvements; again, see the CHANGELOG for a more complete list.

Note that for the first time ever, we managed to go a whole release cycle without breaking source-compatibility in our arti-client APIs: for this reason, the latest version of arti-client is 0.4.1.

And what's next?

In the short term, we're working toward feature parity with Tor in netflow resistance and congestion control. We already have some backend code, but a complete implementation will take a while.

Beyond that, between now and our 1.0.0 milestone in September, we're aiming to make Arti a production-quality Tor client for direct internet access. (Onion services aren't funded yet, but we hope to change that soon.)

To do so, we need to bring Arti up to par with the C tor implementation in terms of CPU usage and security features. You can follow our progress on our 1.0.0 milestone.

We still plan to continue regular releases between now and then.

Here's how to try it out

We rely on users and volunteers to find problems in our software and suggest directions for its improvement. Although Arti isn't yet ready for production use, you can test it as a SOCKS proxy (if you're willing to compile from source) and as an embeddable library (if you don't mind a little API instability).

Assuming you've installed Arti (with cargo install arti, or directly from a cloned repository), you can use it to start a simple SOCKS proxy for making connections via Tor with:

$ arti proxy -p 9150

and use it more or less as you would use the C Tor implementation!

(It doesn't support onion services yet. If compilation doesn't work, make sure you have development files for libsqlite installed on your platform.)
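To check that your traffic really goes through Tor, you can point any SOCKS-aware tool at the proxy; for example, with curl (the check.torproject.org endpoint is one convenient way to verify, but any site will do):

$ curl --socks5-hostname 127.0.0.1:9150 https://check.torproject.org/api/ip
{"IsTor":true,"IP":"..."}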

If you want to build a program with Arti, you probably want to start with the arti-client crate. Be sure to check out the examples too.

For more information, check out the README file. (For now, it assumes that you're comfortable building Rust programs from the command line.) Our CONTRIBUTING file has more information on installing development tools, and on using Arti inside of Tor Browser. (If you want to try that, please be aware that Arti doesn't support onion services yet.)

When you find bugs, please report them on our bugtracker. You can request an account or report a bug anonymously.

And if this documentation doesn't make sense, please ask questions! The questions you ask today might help improve the documentation tomorrow.

Call for feedback

Our priority for the coming months is to make Arti a production-quality Tor client, for the purposes of direct connections to the internet. (Onion services will come later.) We know some of the steps we'll need to take to get there, but not all of them: we need to know what's missing for your use-cases.

Whether you're a user or a developer, please give Arti a try, and let us know what you think. The sooner we learn what you need, the better our chances of getting it into an early milestone.

Acknowledgments

Thanks to everybody who has contributed to this release, including 0x4ndy, Alex Xu, Arturo Marquez, Dimitris Apostolou, Michael McCune, Neel Chauhan, Orhun Parmaksız, Steven Murdoch, and Trinity Pointard.

And thanks, of course, to Zcash Community Grants (formerly Zcash Open Major Grants (ZOMG)) for funding this project!

...
@kushal June 18, 2022 - 07:16 • 18 days ago
Tor 0.4.7.8 is ready

Last night I built and pushed the Tor RPM(s) for 0.4.7.8. This is a security update, so please make sure that you upgrade your relays and bridges.

You can learn more about Tor's RPM repository at https://support.torproject.org/rpm/

If you have any queries, feel free to find us in the #tor channel on OFTC.

...
@anarcat June 17, 2022 - 15:34 • 18 days ago
Matrix notes

I have some concerns about Matrix (the protocol, not the movie that came out recently, although I do have concerns about that as well). I've been watching the project for a long time, and it seems like a promising alternative to many protocols like IRC, XMPP, and Signal.

This review may sound a bit negative, because it focuses on those concerns. I am the operator of an IRC network and people keep asking me to bridge it with Matrix. I have myself considered just giving up on IRC and converting to Matrix. This article is a living document exploring my research of that problem space. The TL;DR is that no, I'm not setting up a bridge just yet, and I'm still on IRC.

This article was written over the course of the last three months, but I have been watching the Matrix project for years (my logs seem to say 2016 at least). The article is rather long. It will likely take you half an hour to read, so copy this over to your ebook reader, your tablet, or dead trees, and lean back and relax as I show you around the Matrix. Or, alternatively, just jump to a section that interests you, most likely the conclusion.

Introduction to Matrix

Matrix is an "open standard for interoperable, decentralised, real-time communication over IP. It can be used to power Instant Messaging, VoIP/WebRTC signalling, Internet of Things communication - or anywhere you need a standard HTTP API for publishing and subscribing to data whilst tracking the conversation history".

It's also (when compared with XMPP) "an eventually consistent global JSON database with an HTTP API and pubsub semantics - whilst XMPP can be thought of as a message passing protocol."

According to their FAQ, the project started in 2014 and has about 20,000 servers and millions of users. Matrix works over HTTPS, but on a special port: 8448.

Security and privacy

I have some concerns about the security promises of Matrix. It's advertised as "secure" with "E2E [end-to-end] encryption", but how does it actually work?

Data retention defaults

One of my main concerns with Matrix is data retention, which is a key part of security in a threat model where (for example) a hostile state actor wants to surveil your communications and can seize your devices.

On IRC, servers don't actually keep messages all that long: they pass them along to other servers and clients as fast as they can, only keep them in memory, and move on to the next message. There are no concerns about data retention on messages (and their metadata) other than at the network layer. (I'm ignoring the issues with user registration, which is a separate, if valid, concern.) Obviously, a hostile server could log everything passing through it, but IRC federations are normally tightly controlled. So, if you trust your IRC operators, you should be fairly safe. Obviously, clients can (and often do, even if OTR is configured!) log all messages, but this is generally not the default. Irssi, for example, does not log by default. IRC bouncers are more likely to log to disk, of course, to be able to do what they do.

Compare this to Matrix: when you send a message to a Matrix homeserver, that server first stores it in its internal SQL database. Then it will transmit that message to all clients connected to that server and room, and to all other servers that have clients connected to that room. Those remote servers, in turn, will keep a copy of that message and all its metadata in their own database, by default forever. In encrypted rooms, the messages themselves are encrypted, but their metadata is not.

There is a mechanism to expire entries in Synapse, but it is not enabled by default. So one should generally assume that a message sent on Matrix is never expired.
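(For reference, enabling that mechanism is a small change in Synapse's homeserver.yaml. A sketch based on my reading of the Synapse documentation, so double-check the option names against the current docs:

retention:
  enabled: true
  default_policy:
    min_lifetime: 1d
    max_lifetime: 1y

That would purge messages older than a year from that server's database; it does nothing, of course, about copies held by other servers in the room.)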

GDPR in the federation

But even if that setting were enabled by default, how do you control it? This is a fundamental problem of the federation: if any user is allowed to join a room (which is the default), those users' servers will log all content and metadata from that room. That includes private, one-on-one conversations, since those are essentially rooms as well.

In the context of the GDPR, this is really tricky: who is the responsible party (known as the "data controller") here? It's basically any yahoo who fires up a home server and joins a room.

In a federated network, one has to wonder whether GDPR enforcement is even possible at all. But in Matrix in particular, if you want to enforce your right to be forgotten in a given room, you would have to:

  1. enumerate all the users that ever joined the room while you were there
  2. discover all their home servers
  3. start a GDPR procedure against all those servers

I recognize this is a hard problem to solve while still keeping an open ecosystem. But I believe that Matrix should have much stricter defaults towards data retention than right now. Message expiry should be enforced by default, for example. (Note that there are also redaction policies that could be used to implement part of the GDPR automatically, see the privacy policy discussion below on that.)

Also keep in mind that, in the brave new peer-to-peer world that Matrix is heading towards, the boundary between server and client is likely to be fuzzier, which would make applying the GDPR even more difficult.

Update: this comment links to this post (in German) which apparently studied the question and concluded that Matrix is not GDPR-compliant.

In fact, maybe Synapse should be designed so that there's no configurable flag to turn off data retention. A bit like how most system loggers in UNIX (e.g. syslog) come with a log retention system that typically rotates logs after a few weeks or months. Historically, this was designed to keep hard drives from filling up, but it also has the added benefit of limiting the amount of personal information kept on disk in this modern day. (Arguably, syslog doesn't rotate logs on its own, but, say, Debian GNU/Linux, as an installed system, does have well-defined log retention policies for installed packages, and those can be discussed.) And "no expiry" is definitely a bug.

Matrix.org privacy policy

When I first looked at Matrix, five years ago, Element.io was called Riot.im and had a rather dubious privacy policy:

We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service.

[...]

This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.

When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here).

They also included a "free to snitch" clause:

If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.

Those are really broad terms, above and beyond what is typically expected legally.

Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers.

Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since.

Notable points of the new privacy policies:

  • 2.3.1.1: the "federation" section actually outlines that "Federated homeservers and Matrix clients which respect the Matrix protocol are expected to honour these controls and redaction/erasure requests, but other federated homeservers are outside of the span of control of Element, and we cannot guarantee how this data will be processed"
  • 2.6: users under the age of 16 should not use the matrix.org service
  • 2.10: Upcloud, Mythic Beast, Amazon, and CloudFlare possibly have access to your data (it's nice to at least mention this in the privacy policy: many providers don't even bother admitting to this kind of delegation)
  • Element 2.2.1: mentions many more third parties (Twilio, Stripe, Quaderno, LinkedIn, Twitter, Google, Outplay, PipeDrive, HubSpot, Posthog, Sentry, and Matomo (phew!)) used when you are paying Matrix.org for hosting

I'm not super happy with all the trackers they have on the Element platform, but then again you don't have to use that service. Your favorite homeserver (assuming you are not on Matrix.org) probably has their own Element deployment, hopefully without all that garbage.

Overall, this is all a huge improvement over the previous privacy policy, so hats off to the Matrix people for figuring out a reasonable policy in such a tricky context. I particularly like this bit:

We will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.

It's great they implemented those mechanisms and, after all, if there's a hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyway, even with services typically seen as more secure, like Signal.

As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC which checks all the boxes in the geekfeminism wiki.

Metadata handling

Overall, privacy protections in Matrix mostly concern message contents, not metadata. In other words, who's talking with whom, when, and from where is not well protected. Compared to a tool like Signal, which goes to great lengths to anonymize that data with features like private contact discovery, disappearing messages, sealed senders, and private groups, Matrix is definitely behind. (Note: there is an issue open about message lifetimes in Element since 2020, but it's not even at the MSC stage yet.)

This is a known issue (opened in 2019) in Synapse, but it is not just an implementation issue, it's a flaw in the protocol itself. Home servers keep join/leave events for all rooms, which gives cleartext information about who is talking to whom. Synapse logs may also contain personally identifiable information that home server admins might not be aware of in the first place. Those log rotation policies are separate from the server-level retention policy, which may be confusing for a novice sysadmin.

Combine this with the federation: even if you trust your home server to do the right thing, the second you join a public room with third-party home servers, those ideas kind of get thrown out because those servers can do whatever they want with that information. Again, a problem that is hard to solve in any federation.

To be fair, IRC doesn't have a great story here either: any client knows not only who's talking to who in a room, but also typically their client IP address. Servers can (and often do) obfuscate this, but often that obfuscation is trivial to reverse. Some servers do provide "cloaks" (sometimes automatically), but that's kind of a "slap-on" solution that actually moves the problem elsewhere: now the server knows a little more about the user.

Overall, I would worry much more about a Matrix home server seizure than an IRC or Signal server seizure. Signal does get subpoenas, and they can only give out a tiny bit of information about their users: their phone number, their registration date, and their last connection date. Matrix carries a lot more information in its database.

Amplification attacks on URL previews

I (still!) run an Icecast server and sometimes share links to it on IRC which, obviously, also ends up on (more than one!) Matrix home servers because some people connect to IRC using Matrix. This, in turn, means that Matrix will connect to that URL to generate a link preview.

I feel this outlines a security issue, especially because those sockets would be kept open seemingly forever. I tried to warn the Matrix security team but somehow, I don't think this issue was taken very seriously. Here's the disclosure timeline:

  • January 18: contacted Matrix security
  • January 19: response: already reported as a bug
  • January 20: response: can't reproduce
  • January 31: timeout added, considered solved
  • January 31: I respond that I believe the security issue is underestimated, ask for clearance to disclose
  • February 1: response: asking for two weeks delay after the next release (1.53.0) including another patch, presumably in two weeks' time
  • February 22: Matrix 1.53.0 released
  • April 14: I notice the release, ask for clearance again
  • April 14: response: referred to the public disclosure

There are a couple of problems here:

  1. the bug was publicly disclosed in September 2020, and not considered a security issue until I notified them, and even then, I had to insist

  2. no clear disclosure policy timeline was proposed or seems established in the project (there is a security disclosure policy but it doesn't include any predefined timeline)

  3. I wasn't informed of the disclosure

  4. the actual solution is a size limit (10MB, already implemented), a time limit (30 seconds, implemented in PR 11784), and a content type allow list (HTML, "media" or JSON, implemented in PR 11936), and I'm not sure it's adequate

  5. (pure vanity:) I did not make it to their Hall of fame

I'm not sure those solutions are adequate because they all seem to assume a single home server will pull that one URL for a little while then stop. But in a federated network, many (possibly thousands of) home servers may be connected in a single room at once. If an attacker drops a link into such a room, all those servers would connect to that link all at once. This is an amplification attack: a small amount of traffic will generate a lot more traffic to a single target. It doesn't matter that there are size or time limits: the amplification is what matters here.
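To make the amplification concrete (illustrative numbers, not measurements): a single chat message of a hundred bytes or so, dropped into a room spanning 1,000 home servers, can trigger 1,000 fetches of up to 10MB each, so roughly 10GB of traffic aimed at one target. That is an amplification factor on the order of 10^8; the per-server size and time limits cap each individual fetch, but not the multiplier.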

It should also be noted that clients that generate link previews amplify even more, because they are more numerous than servers. And of course, the default Matrix client (Element) generates link previews as well.

That said, this is possibly not a problem specific to Matrix: any federated service that generates link previews may suffer from this.

I'm honestly not sure what the solution is here. Maybe moderation? Maybe link previews are just evil? All I know is there was this weird bug in my Icecast server and I tried to ring the bell about it, and it feels it was swept under the rug. Somehow I feel this is bound to blow up again in the future, even with the current mitigation.

Moderation

In Matrix, as elsewhere, moderation is a hard problem. There is a detailed moderation guide and much of this problem space is actively being worked on in Matrix right now. A fundamental problem with moderating a federated space is that a user banned from a room can rejoin the room from another server. This is why spam is such a problem in Email, and why IRC networks stopped federating ages ago (see the IRC history for that fascinating story).

The mjolnir bot

The mjolnir moderation bot is designed to help with some of those things. It can kick and ban users, and redact all of a user's messages (as opposed to one by one), all of this across multiple rooms. It can also subscribe to a federated block list published by matrix.org to block known abusers (users or servers). Bans are pretty flexible and can operate at the user, room, or server level.

Matrix people suggest making the bot an admin of your channels, because you can't take admin rights back from a user once they have been given.

The command-line tool

There's also a new command line tool designed to do things like:

  • send system notifications to users (all users, users from a list, or a specific user)
  • delete sessions/devices not seen for X days
  • purge the remote media cache
  • select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
  • purge the history of these rooms
  • shutdown rooms

This tool and Mjolnir are based on the admin API built into Synapse.
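To give a flavor of that API, listing rooms looks something like this (the endpoint is from the Synapse admin API documentation; the access token must belong to a server admin, and 8008 is Synapse's default HTTP listener):

$ curl -s -H "Authorization: Bearer <admin-access-token>" \
    http://localhost:8008/_synapse/admin/v1/rooms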

Rate limiting

Synapse has pretty good built-in rate-limiting which blocks repeated login, registration, joining, or messaging attempts. It may also end up throttling servers on the federation based on those settings.

Fundamental federation problems

Because users joining a room may come from another server, room moderators are at the mercy of the registration and moderation policies of those servers. Matrix is like IRC's +R mode ("only registered users can join") by default, except that anyone can register their own homeserver, which makes this limited.

Server admins can block IP addresses and home servers, but those tools are not easily available to room admins. There is an API (m.room.server_acl in /devtools) but it is not reliable (thanks Austin Huang for the clarification).

Matrix has the concept of guest accounts, but it is not used very much, and virtually no client or homeserver supports it. This contrasts with the way IRC works: by default, anyone can join an IRC network even without authentication. Some channels require registration, but in general you are free to join and look around (until you get blocked, of course).

I have seen anecdotal evidence (CW: Twitter, nitter link) that "moderating bridges is hell", and I can imagine why. Moderation is already hard enough in one federation; when you bridge a room with another network, you inherit all the problems from that network, but without all the abuse-control tools from the original network's API...

Room admins

Matrix, in particular, has the problem that room administrators (who have the power to redact messages, ban users, and promote other users) are bound to their Matrix ID, which is, in turn, bound to their home server. This implies that a home server administrator could (1) impersonate a given user and (2) use that to hijack the room. So in practice, the home server is the trust anchor for rooms, not the users themselves.

That said, if server B's administrator hijacks user joe on server B, they will hijack that room only on that specific server. This will not (necessarily) affect users on the other servers, as servers can refuse parts of the updates or ban the compromised account (or server).

It does seem like a major flaw that room credentials are bound to Matrix identifiers, as opposed to the E2E encryption credentials. In an encrypted room, even with fully verified members, a compromised or hostile home server can still take over the room by impersonating an admin. That admin (or even a newly minted user) can then send events or listen in on the conversations.

This is even more frustrating when you consider that Matrix events are actually signed and therefore have some authentication attached to them, acting like some sort of Merkle tree (as it contains a link to previous events). That signature, however, is made from the homeserver PKI keys, not the client's E2E keys, which makes E2E feel like it has been "bolted on" later.

Availability

While Matrix has a strong advantage over Signal in that it's decentralized (so anyone can run their own homeserver), I couldn't find an easy way to run a "multi-primary" setup, or even a "redundant" setup (even with a single primary backend), short of going full-on "replicate PostgreSQL and Redis data", which is typically not for the faint of heart.

How this works in IRC

On IRC, it's quite easy to set up redundant nodes. All you need is:

  1. a new machine (with its own public address and an open port)

  2. a shared secret (or certificate) between that machine and an existing one on the network

  3. a connect {} block on both servers

That's it: the node will join the network and people can connect to it as usual and share the same user/namespace as the rest of the network. The servers take care of synchronizing state: you do not need to worry about replicating a database server.
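For illustration, a connect block in a charybdis/solanum-style ircd.conf looks roughly like this (the exact syntax varies from one IRC daemon to another, so treat this as a sketch with placeholder values):

connect "hub.example.net" {
    host = "203.0.113.1";
    send_password = "sharedsecret";
    accept_password = "sharedsecret";
    port = 6697;
    class = "server";
};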

(Now, experienced IRC people will know there's a catch here: IRC doesn't have authentication built in, and relies on "services", which are basically bots that authenticate users (I'm simplifying, don't nitpick). If that service goes down, the network still works, but people can't authenticate, and attackers can start doing nasty things like stealing people's identities if they get knocked offline. But still: basic functionality works: you can talk in rooms and with users that are on the reachable network.)

User identities

Matrix is more complicated. Each "home server" has its own identity namespace: a specific user (say @anarcat:matrix.org) is bound to that specific home server. If that server goes down, that user is completely disconnected. They could register a new account elsewhere and reconnect, but then they basically lose all their configuration: contacts and joined channels are all lost.

(Also notice how the Matrix IDs don't look like a typical user address like an email in XMPP. They at least did their homework and got the allocation for the scheme.)

Rooms

Users talk to each other in "rooms", even in one-to-one communications. (Rooms are also used for other things like "spaces"; they're basically used for everything, think "everything is a file" kind of tool.) For rooms, home servers act more like IRC nodes in that they keep a local state of the chat room and synchronize it with other servers. Users can keep talking inside a room if the server that originally hosts the room goes down. Rooms can have a local, server-specific "alias" so that, say, #room:matrix.org is also visible as #room:example.com on the example.com home server. Both addresses refer to the same underlying room.

(Finding this in the Element settings is not obvious though, because those "aliases" are actually called "local addresses" there. So to create such an alias in Element, you need to go to the room settings' "General" section, click "Show more" under "Local address", then add the alias name (e.g. foo), and then that room will be available on your example.com homeserver as #foo:example.com.)

So a room doesn't belong to a server; it belongs to the federation, and anyone can join the room from any server (if the room is public, or by invitation otherwise). You can create a room on server A and, when a user from server B joins, the room will be replicated on server B as well. If server A fails, server B will keep relaying traffic to connected users and servers.

A room is therefore not fundamentally addressed by the above alias; instead, it has an internal Matrix ID, which is basically a random string. It has a server name attached to it, but that was done just to avoid collisions. That can get a little confusing. For example, the #fractal:gnome.org room is an alias on the gnome.org server, but the room ID is !hwiGbsdSTZIwSRfybq:matrix.org. That's because the room was created on matrix.org, but the preferred branding is gnome.org now.

As an aside, rooms, by default, live forever, even after the last user quits. There's an admin API to delete rooms and a tombstone event to redirect to another one, but neither has a GUI yet. The latter is part of MSC1501 ("Room version upgrades"), which allows a room admin to close a room, with a message and a pointer to another room.

Spaces

Discovering rooms can be tricky: there is a per-server room directory, but Matrix.org people are trying to deprecate it in favor of "Spaces". Room directories were ripe for abuse: anyone can create a room, so anyone can show up in there. It's possible to restrict who can add aliases, but in any case directories were seen as too limited.

In contrast, a "Space" is basically a room that's an index of other rooms (including other spaces), so existing moderation and administration mechanism that work in rooms can (somewhat) work in spaces as well. This enables a room directory that works across federation, regardless on which server they were originally created.

New users can be added to a space or room automatically in Synapse. (Existing users can be told about the space with a server notice.) This gives admins a way to pre-populate a list of rooms on a server, which is useful to build clusters of related home servers, providing some sort of redundancy, at the room -- not user -- level.

Home servers

So while you can work around a home server going down at the room level, there's no such thing at the home server level for user identities. So if you want those identities to be stable in the long term, you need to think about high availability. One limitation is that the domain name (e.g. matrix.example.com) must never change in the future, as renaming home servers is not supported.

The documentation used to say you could "run a hot spare" but that has been removed. Last I heard, it was not possible to run a high-availability setup where multiple, separate locations could replace each other automatically. You can have high performance setups where the load gets distributed among workers, but those are based on a shared database (Redis and PostgreSQL) backend.

So my guess is that it would be possible to create a "warm" spare of a Matrix home server with regular PostgreSQL replication, but that is not documented in the Synapse manual. This sort of setup would also not be useful for dealing with networking issues or denial-of-service attacks, as you will not be able to spread the load over multiple network locations easily. Redis and PostgreSQL heroes are welcome to provide their multi-primary solution in the comments. In the meantime, I'll just point out that this is handled somewhat more gracefully in IRC, thanks to the possibility of delegating the authentication layer.

Delegations

If you do not want to run a Matrix server yourself, it's possible to delegate the entire thing to another server. There's a server discovery API which uses the .well-known pattern (or SRV records, but that's "not recommended" and a bit confusing) to delegate that service to another server. Be warned that the server still needs to be explicitly configured for your domain. You can't just put:

{ "m.server": "matrix.org:443" }

... on https://example.com/.well-known/matrix/server and start using @you:example.com as a Matrix ID. That's because Matrix doesn't support "virtual hosting" and you'd still be connecting to rooms and people with your matrix.org identity, not example.com as you would normally expect. This is also why you cannot rename your home server.

The server discovery API is what allows servers to find each other. Clients, on the other hand, use the client-server discovery API: this is what allows a given client to find your home server when you type your Matrix ID on login.
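As an illustration, the server-side half of that discovery is easy to reproduce by hand. A minimal Python sketch (example.com is a placeholder; a real deployment would actually serve this file):

import json
import urllib.request

# Fetch the delegation document for a Matrix server name.
url = "https://example.com/.well-known/matrix/server"
with urllib.request.urlopen(url) as resp:
    print(json.load(resp))  # e.g. {"m.server": "matrix.org:443"}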

Performance

The high availability discussion brushed over the performance of Matrix itself, but let's now dig into that.

Horizontal scalability

There were serious scalability issues with the main Matrix server, Synapse, in the past, so the Matrix team has been working hard to improve its design. Since Synapse 1.22, the home server can horizontally scale to multiple workers (see this blog post for details), which can make it easier to scale large servers.

Other implementations

There are other home server implementations that are promising from a performance standpoint (dendrite, written in Go, entered beta in late 2020; conduit, written in Rust, in beta; others), but none of them are feature-complete, so there's a trade-off to be made there. Synapse is also adding features fast, so it's an open question whether the others will ever catch up. (I have heard that Dendrite might actually surpass Synapse in features within a few years, which would put Synapse in a more "LTS" situation.)

Latency

Matrix can feel slow sometimes. For example, joining the "Matrix HQ" room in Element (from matrix.debian.social) takes a few minutes and then fails. That is because the home server has to sync the entire room state when you join the room. There was promising work on this announced in the lengthy 2021 retrospective, and some of that work landed (partial sync) in the 1.53 release already. Other improvements coming include sliding sync, lazy loading over federation, and fast room joins. So that's actually something that could be fixed in the fairly short term.

But in general, communication in Matrix doesn't feel as "snappy" as on IRC or even Signal. It's hard to quantify this without instrumenting a full latency test bed (for example with the tools I used in the terminal emulators latency tests), but even just typing in a web browser feels slower than typing in an xterm or Emacs for me.

Even in conversations, I "feel" people don't immediately respond as fast. In fact, this could be an interesting double-blind experiment to make: have people guess whether they are talking to a person on Matrix, XMPP, or IRC, for example. My theory would be that people could notice that Matrix users are slower, if only because of the TCP round-trip time each message has to take.

Transport

Some courageous person actually ran tests of various messaging platforms on a congested network. His evaluation was basically:

  • Briar: uses Tor, so unusable except locally
  • Matrix: "struggled to send and receive messages", joining a room takes forever as it has to sync all history, "took 20-30 seconds for my messages to be sent and another 20 seconds for further responses"
  • XMPP: "worked in real-time, full encryption, with nearly zero lag"

So that was interesting. I suspect IRC would have also fared better, but that's just a feeling.

Other improvements to the transport layer include support for websocket and the CoAP proxy work from 2019 (targeting 100bps links), but both seem stalled at the time of writing. The Matrix people have also announced the pinecone p2p overlay network which aims at solving large, internet-scale routing problems. See also this talk at FOSDEM 2022.

Usability

Onboarding and workflow

The workflow for joining a room, when you use Element web, is not great:

  1. click on a link in a web browser
  2. land on (say) https://matrix.to/#/#matrix-dev:matrix.org
  3. it offers "Element", yeah that sounds great, let's click "Continue"
  4. land on https://app.element.io/#/room%2F%23matrix-dev%3Amatrix.org and then you need to register, aaargh

As you might have guessed by now, there is a specification to solve this, but web browsers need to adopt it as well, so that's far from actually being solved. At least browsers generally know about the matrix: scheme, it's just not exactly clear what they should do with it, especially when the handler is just another web page (e.g. Element web).

In general, when compared with tools like Signal or WhatsApp, Matrix doesn't fare so well in terms of user discovery. I probably have some of my normal contacts that have a Matrix account as well, but there's really no way to know. It's kind of creepy when Signal tells you "this person is on Signal!" but it's also pretty cool that it works, and they actually implemented it pretty well.

Registration is also less obvious: in Signal, the app confirms your phone number automatically. It's frictionless and quick. In Matrix, you need to learn about home servers, pick one, register (with a password! aargh!), and then set up encryption keys (not the default), etc. It's a lot more friction.

And look, I understand: giving away your phone number is a huge trade-off. I don't like it either. But it solves a real problem and makes encryption accessible to a ton more people. Matrix does have "identity servers" that can serve that purpose, but I don't feel confident sharing my phone number there. It doesn't help that the identity servers don't have private contact discovery: giving them your phone number is a more serious security compromise than with Signal.

There's a catch-22 here too: because no one feels like giving away their phone numbers, no one does, and everyone assumes that stuff doesn't work anyways. Like it or not, Signal forcing people to divulge their phone number actually gives them critical mass that means actually a lot of my relatives are on Signal and I don't have to install crap like WhatsApp to talk with them.

5 minute clients evaluation

Throughout all my tests I evaluated a handful of Matrix clients, mostly from Flathub because almost none of them are packaged in Debian.

Right now I'm using Element, the flagship client from Matrix.org, in a web browser window, with the PopUp Window extension. This makes it look almost like a native app, and opens links in my main browser window (instead of a new tab in that separate window), which is nice. But I'm tired of buying memory to feed my web browser, so this indirection has to stop. Furthermore, I'm often getting completely logged off from Element, which means re-logging in, recovering my security keys, and reconfiguring my settings. That is extremely annoying.

Coming from Irssi, Element is really "GUI-y" (pronounced "gooey"). Lots of clickety happening. To mark conversations as read, in particular, I need to click-click-click on all the tabs that have some activity. There's no "jump to latest message" or "mark all as read" functionality as far as I could tell. In Irssi the former is built-in (alt-a) and I made a custom /READ command for the latter:

/ALIAS READ script exec \$_->activity(0) for Irssi::windows

And yes, that's a Perl script in my IRC client. I am not aware of any Matrix client that does stuff like that, except maybe Weechat, if we can call it a Matrix client, or Irssi itself, now that it has a Matrix plugin (!).

As for other clients, I have looked through the Matrix Client Matrix (confusing right?) to try to figure out which one to try, and, even after selecting Linux as a filter, the chart is just too wide to figure out anything. So I tried those, kind of randomly:

  • Fractal
  • Mirage
  • Nheko
  • Quaternion

Unfortunately, I lost my notes on those; I don't actually remember which one did what. I still have a session open with Mirage, so I guess that means it's the one I preferred, but I remember they were also all very GUI-y.

Maybe I need to look at weechat-matrix or gomuks. At least Weechat is scriptable so I could continue playing the power-user. Right now my strategy with messaging (and that includes microblogging like Twitter or Mastodon) is that everything goes through my IRC client, so Weechat could actually fit well in there. Going with gomuks, on the other hand, would mean running it in parallel with Irssi or ... ditching IRC, which is a leap I'm not quite ready to take just yet.

Oh, and basically none of those clients (except Nheko and Element) support VoIP, which is still kind of a second-class citizen in Matrix. Matrix does not support large multimedia rooms, for example: Jitsi was used for FOSDEM instead of the native videoconferencing system.

Bots

This falls a little outside the "usability" section, but I didn't know where else to put it... There are a few Matrix bots out there, and you will likely be able to replace your existing bots with Matrix bots. It's true that IRC has a long and impressive history with lots of bots doing various things, but given how young Matrix is, there's already a good variety:

  • maubot: generic bot with tons of usual plugins like sed, dice, karma, xkcd, echo, rss, reminder, translate, react, exec, gitlab/github webhook receivers, weather, etc
  • opsdroid: framework to implement "chat ops" in Matrix, connects with Matrix, GitHub, GitLab, Shell commands, Slack, etc
  • matrix-nio: another framework, used to build lots more bots like:
    • hemppa: generic bot with various functionality like weather, RSS feeds, calendars, cron jobs, OpenStreetmaps lookups, URL title snarfing, wolfram alpha, astronomy pic of the day, Mastodon bridge, room bridging, oh dear
    • devops: ping, curl, etc
    • podbot: play podcast episodes from AntennaPod
    • cody: Python, Ruby, Javascript REPL
    • eno: generic bot, "personal assistant"
  • mjolnir: moderation bot
  • hookshot: bridge with GitLab/GitHub
  • matrix-monitor-bot: latency monitor

One thing I haven't found an equivalent for is Debian's MeetBot. There's an archive bot but it doesn't have topics or a meeting chair, or HTML logs.

Working on Matrix

As a developer, I find Matrix kind of intimidating. The specification is huge. The official specification itself looks somewhat digestible: it's only 6 APIs, so that looks, at first, kind of reasonable. But whenever you start asking complicated questions about Matrix, you quickly fall into the Matrix Spec Change specification (which, yes, is a separate specification). And there are literally hundreds of MSCs flying around. It's hard to tell what's been adopted and what hasn't, and even harder to figure out whether your specific client has implemented it.

(One trendy answer to this problem is to "rewrite it in Rust": the Matrix people are working on implementing a lot of those specifications in a matrix-rust-sdk that's designed to take the implementation details away from users.)

Just looking at the latest weekly Matrix report, you find that three new MSCs were proposed, just last week! There's even a graph that shows the number of MSCs progressing steadily, at 600+ proposals total, with the majority (300+) "new". I would guess the "merged" ones are at about 150.

That's a lot of text which includes stuff like 3D worlds which, frankly, I don't think you should be working on when you have such important security and usability problems. (The internet as a whole, arguably, doesn't fare much better. RFC600 is a really obscure discussion about "INTERFACING AN ILLINOIS PLASMA TERMINAL TO THE ARPANET". Maybe that's how many MSCs will end up as well, left forgotten in the pits of history.)

And that's the thing: maybe the Matrix people have a different objective than I have. They want to connect everything to everything, and make Matrix a generic transport for all sorts of applications, including virtual reality, collaborative editors, and so on.

I just want secure, simple messaging. Possibly with good file transfers, and video calls. That it works with existing stuff is good, and it should be federated to remove the "Signal point of failure". So I'm a bit worried with the direction all those MSCs are taking, especially when you consider that clients other than Element are still struggling to keep up with basic features like end-to-end encryption or room discovery, never mind voice or spaces...

Conclusion

Overall, Matrix is somewhere around where XMPP was a few years ago. It has a ton of features, pretty good clients, and a large community. It seems to have gained some of the momentum that XMPP has lost. It may have the most potential to replace Signal if something bad were to happen to it (like, I don't know, getting banned or going nuts with cryptocurrency)...

But it's really not there yet, and I don't see Matrix trying to get there either, which is a bit worrisome.

Looking back at history

I'm also worried that we are repeating the errors of the past. The history of federated services is really fascinating: IRC, FTP, HTTP, and SMTP were all created in the early days of the internet, and all are still around (except, arguably, FTP, which was removed from major browsers recently). All of them had to face serious challenges in growing their federation.

IRC had numerous conflicts and forks, both at the technical level and at the political level. The history of IRC is really something that anyone working on a federated system should study in detail, because they are bound to make the same mistakes if they are not familiar with it. The "short" version is:

  • 1988: Finnish researcher publishes the first IRC source code
  • 1989: 40 servers worldwide, mostly universities
  • 1990: EFnet ("Eris-free network") fork, which blocks the open relay named Eris; followers of Eris form A-net, which promptly dissolves itself, leaving only EFnet
  • 1992: Undernet fork, which offered authentication ("services"), routing improvements and timestamp-based channel synchronisation
  • 1994: DALnet fork, from Undernet, again on a technical disagreement
  • 1995: Freenode founded
  • 1996: IRCnet forks from EFnet, following a flame war of historical proportion, splitting the network between Europe and the Americas
  • 1997: Quakenet founded
  • 1999: (XMPP founded)
  • 2001: 6 million users, OFTC founded
  • 2002: DALnet peaks at 136,000 users
  • 2003: IRC as a whole peaks at 10 million users, EFnet peaks at 141,000 users
  • 2004: (Facebook founded), Undernet peaks at 159,000 users
  • 2005: Quakenet peaks at 242,000 users, IRCnet peaks at 136,000 (Youtube founded)
  • 2006: (Twitter founded)
  • 2009: (WhatsApp, Pinterest founded)
  • 2010: (TextSecure AKA Signal, Instagram founded)
  • 2011: (Snapchat founded)
  • ~2013: Freenode peaks at ~100,000 users
  • 2016: IRCv3 standardisation effort started (TikTok founded)
  • 2021: Freenode self-destructs, Libera chat founded
  • 2022: Libera peaks at 50,000 users, OFTC peaks at 30,000 users

(The numbers were taken from the Wikipedia page and Netsplit.de. Note that I also include other networks' launches in parentheses for context.)

Pretty dramatic, don't you think? Eventually, somehow, IRC became irrelevant for most people: few people are even aware of it now. With less than a million active users, it's smaller than Mastodon, XMPP, or Matrix at this point.1 If I were to venture a guess, I'd say that infighting, the lack of a standardization body, and a somewhat annoying protocol meant the network could not grow. It's also possible that the decentralised yet centralised structure of IRC networks limited their reliability and growth.

But large social media companies have also taken over the space: observe how IRC numbers peak around the time the wave of large social media companies emerge, especially Facebook (2.9B users!!) and Twitter (400M users).

Where the federated services are in history

Right now, Matrix, and Mastodon (and email!) are at the "pre-EFnet" stage: anyone can join the federation. Mastodon has started working on a global block list of fascist servers which is interesting, but it's still an open federation. Right now, Matrix is totally open, but matrix.org publishes a (federated) block list of hostile servers (#matrix-org-coc-bl:matrix.org, yes, of course it's a room).

Interestingly, Email is also in that stage, where there are block lists of spammers, and it's a race between those blockers and spammers. Large email providers, obviously, are getting closer to the EFnet stage: you could consider they only accept email from themselves or between themselves. It's getting increasingly hard to deliver mail to Outlook and Gmail for example, partly because of bias against small providers, but also because they are including more and more machine-learning tools to sort through email and those systems are, fundamentally, unknowable. It's not quite the same as splitting the federation the way EFnet did, but the effect is similar.

HTTP has somehow managed to live in a parallel universe, as it's technically still completely federated: anyone can start a web server if they have a public IP address and anyone can connect to it. The catch, of course, is how you find the darn thing. Which is how Google became one of the most powerful corporations on earth, and how they became the gatekeepers of human knowledge online.

I have only briefly mentioned XMPP here, and my XMPP fans will undoubtedly comment on that, but I think it's somewhere in the middle of all of this. It was co-opted by Facebook and Google, and both corporations have abandoned it to its fate. I remember fondly the days where I could do instant messaging with my contacts who had a Gmail account. Those days are gone, and I don't talk to anyone over Jabber anymore, unfortunately. And this is a threat that Matrix still has to face.

It's also the threat email is currently facing. On the one hand, corporations like Facebook want to destroy it completely, and have mostly succeeded: many people only have an email account to register for things, and talk to their friends over Instagram or (lately) TikTok (which, I know, is not Facebook, but they started that fire).

On the other hand, you have corporations like Microsoft and Google who still use and provide email services (because, frankly, you still do need email for stuff, just like fax is still around), but they are more and more isolated in their own silos. At this point, it's only a matter of time before they reach critical mass and decide that the risk of allowing external mail in is not worth the cost. They'll simply flip the switch and work on an allow-list principle. Then we'll have closed the loop and email will be dead, just like IRC is "dead" now.

I wonder which path Matrix will take. Could it liberate us from these vicious cycles?

Update: this generated some discussions on lobste.rs.


  1. According to Wikipedia, there are currently about 500 distinct IRC networks operating, on about 1,000 servers, serving over 250,000 users. In contrast, Mastodon seems to be around 5 million users, Matrix.org claimed at FOSDEM 2021 to have about 28 million globally visible accounts, and Signal lays claim to over 40 million souls. XMPP claims to have "millions" of users on the xmpp.org homepage but the FAQ says they don't actually know. On the proprietary silo side of the fence, this page says

    • Facebook: 2.9 billion users
    • WhatsApp: 2B
    • Instagram: 1.4B
    • TikTok: 1B
    • Snapchat: 500M
    • Pinterest: 480M
    • Twitter: 397M

    Notable omission from that list: Youtube, with its mind-boggling 2.6 billion users...

    Those are not the kind of numbers you just "need to convince a brother or sister" to grow the network...

...
@ooni June 17, 2022 - 00:00 • 19 days ago
Measuring DoT/DoH Blocking Using OONI Probe: A Preliminary Study
When you enter a URL such as https://example.com/, under the hood, your web browser resolves the example.com domain to one or more IP addresses using the Domain Name System (DNS), a set of federated servers and protocols providing this name-to-IP-address mapping. For example, as of 2022-06-16, example.com resolves to the 93.184.216.34 (IPv4) and 2606:2800:220:1:248:1893:25c8:1946 (IPv6) addresses. Once it knows the IP addresses for the domain, the browser then uses them to fetch the requested webpage. ...
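
For illustration, the same lookup can be reproduced from the command line: dig speaks plain DNS, while kdig (from knot-dnsutils) can speak DoT. A hedged sketch; the resolver address is just an example:

$ dig +short example.com A
93.184.216.34
$ kdig +tls @8.8.8.8 example.com A
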
@blog June 16, 2022 - 00:00 • 20 days ago
The Tor Project 2020-2021 Annual Report

Hello Tor community,

On behalf of the team, I am thrilled to share with you the Tor Project's latest Annual Report!

In the last year, the Tor community has mobilized against increasing global censorship, more than doubled the number of bridges on the Tor network (!!), and responded to tens of thousands of user support requests. We've also worked hard to make Tor faster, improve the health of our network, and connect with our community through training and events.

One element of this year's work that inspires me, and shows the power of the Tor community, is the response to the internet censorship in Russia and Ukraine. The entire Tor community immediately jumped into action to keep people online. Seeing this passion in action, while keeping tens of thousands of Russians connected to the open internet, has been inspiring. This anonymous user shared their story about Tor, which underlines the importance of our anti-censorship efforts:

"Tor helped me a lot. Here in Russia, blocking on the Internet is extremely common... Tor helps me bypass blocking and get more privacy. For example, many wonderful websites, such as foreign services or the websites of the Russian opposition, have been blocked. I have been using Tor for many years... without it, many very important sources of useful information would be inaccessible, or accessible with great difficulty."

I hope you take a moment to read more about mobilizing against internet censorship in this year's annual report (page 4), alongside other achievements this year. Thank you for making Tor and the freedom it provides online possible.

Beyond the programmatic accomplishments, I hope you also take a look at page 12, where we share the Tor Project's expenses and revenue for the 2020-2021 financial year, based on our audited 990 tax returns. We are proud to highlight that 87% of our expenses are related to programmatic costs. That means that a significant majority of our expenses are directly related to building Tor, improving Tor, and ensuring that Tor is accessible to everyone.

Please take a look at our Annual Report, let us know what you think about what we've shared, and while you're at it, check out the whole reports section of our website for previous years' reports!

...
@ooni June 16, 2022 - 00:00 • 20 days ago
A Quick Look at QUIC Censorship
This blog post was originally published by the Open Technology Fund to disseminate Kathrin Elmenhorst’s QUIC-and-HTTP/3 censorship research as part of her ICFP fellowship with OONI. Last year, the new network protocol QUIC was introduced. QUIC is a general-purpose transport layer network protocol with the goal of reducing latency compared to existing protocols. Since the introduction of QUIC, we have seen rising volumes of QUIC-based web traffic in the form of HTTP/3. ...
@kushal June 15, 2022 - 10:03 • 21 days ago
Story of a space

In my case, the story went on for around 2 hours. Yesterday I was trying to implement something from a given SPEC, and tried to match my output (from Rust) with the output of the Python code written by Dr. Fett.

The problem

I had to take an ordered array (in Python it is a simple list), say ["kushal", "das"], JSON-encode it, and then compute base64urlencode(sha256sum(JSON_ENCODED_STRING)) in Rust using serde_json.

Easy, isn’t it?

Let us see what the encoded string looks like:

"[\"6Ij7tM-a5iVPGboS5tmvVA\",\"John\"]"

And the final checksum is otcxXDbO4_xzK0DXL_DO2INIFXIJdO1PqVM-n_gt-3g.

But, the reference implementation in Python has the checksum as fUMdn88aaoyKTHrvZd6AuLmPraGhPJ0zF5r_JhxCVZs.

It took me a few hours to notice the space after the comma in the JSON-encoded string from Python:

"[\"6Ij7tM-a5iVPGboS5tmvVA\", \"John\"]"

This is due to the following line from the JSON spec:

Insignificant whitespace is allowed before or after any of the six structural characters.

Yes, I know I should have known better and read the original spec properly. But in this case, I learned it the hard way.
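
For the curious, the difference is easy to reproduce: Python's json.dumps defaults to ", " as the item separator, while passing compact separators matches what serde_json produces. A quick sketch:

$ python3 -c 'import json; print(json.dumps(["kushal", "das"]))'
["kushal", "das"]
$ python3 -c 'import json; print(json.dumps(["kushal", "das"], separators=(",", ":")))'
["kushal","das"]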

...
@ooni June 15, 2022 - 00:00 • 21 days ago
OONI’s submission for the OHCHR report on internet shutdowns and human rights
Currently, the 50th session of the UN Human Rights Council is taking place. In response to the UN High Commissioner for Human Rights’s call for submissions in support of the OHCHR report on internet shutdowns and human rights to the 50th session of the Human Rights Council in June 2022, the OONI team provided a submission with relevant information on the occurrence of mandated disruptions of access to social media and messaging platforms over the past 5 years based on empirical OONI network measurement data. ...
@blog June 14, 2022 - 00:00 • 22 days ago
Volunteer as an alpha tester

Tor Browser receives hundreds of changes a year: from updates to Firefox – the underlying browser on which Tor Browser is based – to entirely new features designed to help protect at-risk and censored users. However, each change made to Tor Browser has the potential to introduce new and sometimes elusive bugs.

In order to find and fix these bugs before they reach the majority of our users, we apply updates to an early version of Tor Browser known as Tor Browser Alpha before releasing them more widely. Then, as a small nonprofit, we rely on a community of volunteer testers to try out our alphas before their general release in order to keep Tor Browser available on so many platforms.

Volunteering to become an alpha tester is one of the most accessible and effective ways you can help at-risk users stay connected to Tor. By spending a few minutes testing each new alpha release and reporting bugs back to our developers, you'll help provide a more stable Tor Browser for millions of users around the world.

It's also important for us to reach as diverse a group of alpha testers as possible, including those on different platforms, in different countries, who speak different languages and use Tor Browser in different ways. For example, users who are based in countries in which the Tor Network is blocked (such as Belarus, China, Russia and Turkmenistan) are invaluable in helping us test Connection Assist – a new feature designed to detect and circumvent censorship of the Tor Network automatically. Even disparities in bandwidth can be enough for one user to experience a bug while another won't.

In addition, alpha testers have the opportunity to preview new features early, like the experimental redesign of Connection Settings (formerly Tor Network Settings) that we're currently testing. You can help shape the direction of major new features like this by providing feedback on their user experience as we iterate on them between each alpha release.

How to volunteer as an alpha tester


Should you find a bug in Tor Browser Alpha:

  1. Check to see whether it occurs in the general release of Tor Browser. If it does, then it's an issue with Tor Browser itself, and you should follow these steps to get in touch.
  2. If this bug does not occur in the general release of Tor Browser, report it to the Tor Browser Alpha Feedback category on the Tor Forum.

As a small thank you for helping us keep Tor Browser stable and bug-free, alpha testers can collect special badges by reporting bugs on the Tor Forum too.

Alpha tester forum badges

Lastly, it's important to understand that Tor Browser Alpha is an unstable version of Tor Browser and should not be used for activities that could put you at risk. Instead, please limit your use of the alpha to previewing new features, testing their performance, and providing feedback before their release.

Happy testing!

...
@blog June 7, 2022 - 00:00 • 29 days ago
New Release: Tor Browser 11.0.14 (Android, Windows, macOS, Linux)

Tor Browser 11.0.14 is now available from the Tor Browser download page and also from our distribution directory.

This version includes important security updates to Firefox.

Tor Browser 11.0.14 updates Firefox on Windows, macOS, and Linux to 91.10.0esr.

We use the opportunity as well to update various other components of Tor Browser:

  • NoScript 11.4.6

The full changelog since Tor Browser 11.0.13 is:

  • All Platforms
  • Windows + OS X + Linux
    • Update Firefox to 91.10.0esr
  • Build System
...
@kushal May 31, 2022 - 17:08 • 1 months ago
Tor sysadmin 101 workshop for new relay operators

Tor log

On 4th June, at 19:00 UTC, we are doing an online workshop to help out new relay operators. If you have ever wanted to help the Tor Project, or are just curious about what is required to become a relay/bridge operator, you should join the workshop.

The workshop is specially geared towards folks who are new to the land of Internet-facing services. You will get to chat with many other operators and people from the Tor Project, and ask about any doubts you have.

Register for the event, and share the news at your local groups/lists. Ask your friends to join :)

...
@blog May 31, 2022 - 00:00 • 1 months ago
New Alpha Release: Tor Browser 11.5a12 (Android, Windows, macOS, Linux)

Tor Browser 11.5a12 is now available from the Tor Browser download page and also from our distribution directory.

This version includes important security updates to Firefox.

The full changelog since Tor Browser 11.5a11 is:

...
@blog May 31, 2022 - 00:00 • 1 months ago
Arti 0.4.0 is released: Robustness and API improvements

Arti is our ongoing project to create a working embeddable Tor client in Rust. It’s not ready to replace the main Tor implementation in C, but we believe that it’s the future.

Right now, our focus is on making Arti production-quality, by stress-testing the code, hunting for likely bugs, and adding missing features that we know from experience that users will need. We're going to try not to break backward compatibility too much, but we'll do so when we think it's a good idea.

What's new in 0.4.0?

For a complete list of changes, have a look at the CHANGELOG.

This release wraps up our changes to the configuration logic, detects several kinds of unsafe filesystem configuration, and has a refactored directory manager to help us tolerate far more kinds of bootstrap failures.

There are also a bunch of smaller features, bugfixes, and infrastructure improvements; again, see the CHANGELOG for a more complete list.

And what's next?

In the short term, we're working toward feature parity with Tor in netflow resistance and congestion control. We hope to have made some progress by our next release, but we might not be finished by then.

Beyond that, between now and our 1.0.0 milestone in September, we're aiming to make Arti a production-quality Tor client for direct internet access. (Onion services aren't funded yet, but we hope to change that soon.)

To do so, we need to bring Arti up to par with the C tor implementation in terms of its CPU usage and security features. You can follow our progress on our 1.0.0 milestone.

We still plan to continue regular releases between now and then.

Here's how to try it out

We rely on users and volunteers to find problems in our software and suggest directions for its improvement. Although Arti isn't yet ready for production use, you can test it as a SOCKS proxy (if you're willing to compile from source) and as an embeddable library (if you don't mind a little API instability).

Assuming you've installed Arti (with cargo install arti, or directly from a cloned repository), you can use it to start a simple SOCKS proxy for making connections via Tor with:

$ arti proxy -p 9150

and use it more or less as you would use the C Tor implementation!

(It doesn't support onion services yet. If compilation doesn't work, make sure you have development files for libsqlite installed on your platform.)
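
A quick way to check that traffic actually flows through Tor is to point curl at the proxy; the --socks5-hostname flag makes curl resolve hostnames through the proxy too. A hedged example (output abbreviated):

$ curl --socks5-hostname 127.0.0.1:9150 https://check.torproject.org/api/ip
{"IsTor":true,"IP":"..."}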

If you want to build a program with Arti, you probably want to start with the arti-client crate. Be sure to check out the examples too.

For more information, check out the README file. (For now, it assumes that you're comfortable building Rust programs from the command line). Our CONTRIBUTING file has more information on installing development tools, and on using Arti inside of Tor Browser. (If you want to try that, please be aware that Arti doesn't support onion services yet.)

When you find bugs, please report them on our bugtracker. You can request an account or report a bug anonymously.

And if this documentation doesn't make sense, please ask questions! The questions you ask today might help improve the documentation tomorrow.

Call for feedback

Our priority for the coming months is to make Arti a production-quality Tor client, for the purposes of direct connections to the internet. (Onion services will come later.) We know some of the steps we'll need to take to get there, but not all of them: we need to know what's missing for your use-cases.

Whether you're a user or a developer, please give Arti a try, and let us know what you think. The sooner we learn what you need, the better our chances of getting it into an early milestone.

Acknowledgments

Thanks to everybody who has contributed to this release, including Alex Xu, Dimitris Apostolou, Jim Newsome, Michael Mccune, and Trinity Pointard.

And thanks, of course, to Zcash Community Grants (formerly Zcash Open Major Grants (ZOMG)) for funding this project!

...
@blog May 29, 2022 - 00:00 • 1 months ago
Projects from Google Summer of Code 2022

We are happy that the Tor Project is again participating in Google Summer of Code. Mentorship programs are very important for open source projects, as they give us one more way of bringing new contributors to our projects. Starting in June this year, we will have the following mentors and their projects working with a contributor from Google Summer of Code:

Tor Weather: Improving the Tor network by Sarthik Gupta

This project will bring back Tor Weather, an email notification service that relay operators can subscribe to, choosing which notifications they want to receive about their relay.

OONI Probe Network Experiments: Detecting TLS Middleboxes and Port-based Blocking

This project is about researching and developing new network experiments to be integrated inside the OONI Probe CLI. The contributor will write network measurements code and unit/integration tests for OONI's measurement engine.

Tor Project: OONI Explorer & Design System Improvements

This project will improve OONI Explorer. Specifically, it will enrich the OONI Explorer country pages with data coming from new experiments and make use of new backend API calls to display richer information about blocking. Moreover, this work will entail modernizing the code base, improving the user interface, fixing usability issues, and writing more unit/integration tests.

The contributors will start working with the Tor Project and OONI in the next few weeks.

...
@blog May 26, 2022 - 00:00 • 1 months ago
New Alpha Release: Tor Browser 11.5a10 (Android)

Tor Browser 11.5a10 is now available from the Tor Browser download page and also from our distribution directory.

Tor Browser 11.5a10 updates Firefox to 99.0b3 and includes bugfixes and stability improvements.

We also use the opportunity to update various components of Tor Browser, bumping Tor to 0.4.7.7, NoScript to 11.4.5, OpenSSL to 1.1.1o, while switching to Go 1.18.1 for building our Go-related projects.

As usual, please report any issues you find in this alpha version, so we can fix them for the upcoming major stable release (11.5).

The full changelog since Tor Browser 11.5a7 is:

...
@blog May 23, 2022 - 00:00 • 1 months ago
New Release: Tor Browser 11.0.13

Tor Browser 11.0.13 is now available from the Tor Browser download page and also from our distribution directory.

This version includes important security updates to Firefox.

We also updated Tor to 0.4.7.7 (the first stable Tor release with support for congestion control).

Note: the Android version 11.0.13 will be available later during the week.

The full changelog since Tor Browser 11.0.12 is:

...
@blog May 20, 2022 - 00:00 • 2 months ago
New Alpha Release: Tor Browser 11.5a11 (Windows/macOS/Linux)

Tor Browser 11.5a11 is now available from the Tor Browser download page and also from our distribution directory.

This version includes important security updates to Firefox.

Tor Browser 11.5a11 updates Firefox on Windows, macOS, and Linux to 91.9.0esr.

We use the opportunity as well to update various other components of Tor Browser:

  • NoScript 11.4.5
  • Tor 0.4.7.7
  • OpenSSL 1.1.1o
  • Tor Launcher 0.2.35

The full changelog since Tor Browser 11.5a9 is:

...
@kushal May 16, 2022 - 15:46 • 2 months ago
OAuth Security Workshop 2022

Last week I attended the OAuth Security Workshop in Trondheim, Norway. It was a 3-day, single-track conference, where the first half of each day consisted of pre-selected talks and the second half of unconference talks and side meetings. This was also my first proper conference since COVID emerged in the world.

osw starting

Back to the starting line

After many years, I felt the excitement of being a total newbie in something, and of suddenly being able to meet all the people behind the ideas. I reached the conference hotel in the afternoon of day 0 and met the organizers in the lobby area. That chat went on for a long time, and as more and more people kept checking into the hotel, I realized that it was a kind of reunion for many of the participants. Though a few of them had met at a conference in California just a week earlier, they were all excited to meet again.

To understand how welcoming any community is, just notice how it behaves towards new folks. I think the Python community stands high in this regard, and I am very happy to say that the whole OAuth/OIDC/identity-related community is excellent in this regard too. Even though I kept introducing myself as the new person in this identity land, not even once did I feel unwelcome. I attended OpenID-related working group meetings during the conference, had multiple hallway chats, and talked to people while walking around the beautiful city. Everyone was happy to explain things to me in detail, even though most of them have already spent 5-15+ years in the identity world.

The talks & meetings

What happens in Trondheim, stays in Trondheim.

I generally do not attend many talks at conferences, as they get recorded. But here the conference was single-track, and there were no recordings.

The first talk was related to formal verification, and this was the first time I saw that (scary, in my mind) math on the big screen. But full credit to the speakers, as they explained things in such a way that even an average programmer like me could understand each step. After this talk, we jumped into the world of OAuth/OpenID. One funny thing: whenever someone mentioned an RFC number, we found its authors inside the meeting room.

In the second half, we had the GNAP master class from Justin Richer. Once again, the speaker explained deep technical details so straightforwardly that everyone in the room could understand them.

The evening before, a few people had mentioned that many RFC numbers would be thrown around during heated technical discussions, though in the end there were not enough of them to scare me too much :)

rfc count

I also managed to meet Roland for the first time. We had longer chats about the status of Python in the identity ecosystem and also about Identity Python. I took some notes about how we can improve the usage of Python in this, and I will most probably start writing about those in the coming weeks.

In multiple talks, researchers and people from the industry pointed out the mistakes made in this space from a security point of view. Even though, for many things, we have clear instructions in the specs, there is no guarantee that implementors will follow them properly, thus causing security gaps.

At the end of day 1, we had a special Organ concert at the beautiful Trondheim Cathedral. On day 2, we had a special talk, “The Viking Kings of Norway”.

If you let me talk about my experience at the conference, I don't think I will stop before 2 hours have passed. There was so much excitement and new information, and the whole feeling of going back to my starting days, when I knew nothing much. Every discussion was full of learning opportunities (all discussions are anyway, but being a newbie brings a different level of excitement). The only sad part was leaving Anwesha & Py back in Stockholm; this was the first time I was staying away from them since moving to Sweden.

surprise

Just before the conference ended, Aaron Parecki gave me a surprise gift. I spent time with it during the whole flight back to Stockholm.

This conference had the best food experience of my life for a conference: from breakfasts and lunches to big snack tables, dinners, and restaurant food. At least 4 people said in front of me during the conference, “oh, it feels like we are only eating and sometimes talking”.

Another thing I really loved to see is that the two primary conference organizers are university roommates who are continuing their friendship and journey in a very beautiful way. Standing outside the hotel after midnight, talking about random things in life, and watching two longtime friends excited about similar things felt so nice.

Trondheim

I also want to thank the whole organizing team, including the local organizers; Steinar and the rest of the team did a superb job.

...
@anarcat May 13, 2022 - 20:19 • 2 months ago
NVMe/SSD disk failure

Yesterday, my workstation (curie) was hung when I came into the office. After a "skinny elephant", the box rebooted, but it couldn't find the primary disk (in the BIOS). Instead, it booted on the secondary HDD, still running an old Fedora 27 install which somehow survived to this day, possibly because BTRFS is incomprehensible.

Somehow, I blindly accepted the Fedora prompt asking me to upgrade to Fedora 28, not realizing that:

  1. Fedora is now at release 36, not 28
  2. major upgrades take about an hour...
  3. ... and happen at boot time, blocking the entire machine (I'll remember this next time I laugh at Windows and Mac OS users stuck on updates on boot)
  4. you can't skip more than one major upgrade

Which means that upgrading to the latest release would take over 4 hours. Thankfully, it's mostly automated and seems to work pretty well (which is not exactly the case for Debian). It still seems like a lot of wasted time -- it would probably be better to just reinstall the machine at this point -- and not what I had planned to do that morning at all.

In any case, after waiting all that time, the machine booted (in Fedora) again, and now it could detect the SSD disk. The BIOS could find the disk too, so after I reinstalled grub (from Fedora) and fixed the boot order, it rebooted, but secureboot failed, so I turned that off (!?), and I was back in Debian.

I did an emergency backup with ddrescue from the running system, which probably doesn't really work as a backup (because the filesystem is likely to be corrupt), but it was fast enough (20 minutes) and gave me some peace of mind. My offsite backups have been down for a while, and since I treat my workstations as "cattle" (not "pets"), I don't have a solid recovery scenario for those situations other than "just reinstall and run Puppet", which takes a while.
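
For reference, the invocation looks roughly like this; this is a reconstruction with illustrative device and file names, not the exact command I ran:

$ ddrescue /dev/nvme0n1 /mnt/backup/curie.img /mnt/backup/curie.map

The map file is what lets ddrescue resume and retry bad sectors later.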

Now I'm wondering what the next step is: probably replace the disk anyway (the new one is bigger: 1TB instead of 500GB), or keep the new one as a hot backup somehow. Too bad I don't have a snapshotting filesystem on there... (Technically, I have LVM, but LVM snapshots are heavy and slow, and can't atomically cover the entire machine.)

It's kind of scary how this thing failed: totally dropped off the bus, just not in the BIOS at all. I prefer the way spinning rust fails: clickety sounds, tons of warnings beforehand, partial recovery possible. With this new flashy junk, you just lose everything all at once. Not fun.

...
@anarcat May 13, 2022 - 20:04 • 2 months ago
BTRFS notes

I'm not a fan of BTRFS. This page serves as a reminder of why, but also a cheat sheet to figure out basic tasks in a BTRFS environment because those are not obvious to me, even after repeatedly having to deal with them.

Content warning: there might be mentions of ZFS.

Stability concerns

I'm worried about BTRFS stability, which has been historically ... changing. RAID-5 and RAID-6 are still marked unstable, for example. It's kind of a lucky guess whether your current kernel will behave properly with your planned workload. For example, in Linux 4.9, RAID-1 and RAID-10 were marked as "mostly OK" with a note that says:

Needs to be able to create two copies always. Can get stuck in irreversible read-only mode if only one copy can be made.

Even as of now, RAID-1 and RAID-10 has this note:

The simple redundancy RAID levels utilize different mirrors in a way that does not achieve the maximum performance. The logic can be improved so the reads will spread over the mirrors evenly or based on device congestion.

Granted, that's not a stability concern anymore, just performance. A reviewer of a draft of this article actually claimed that BTRFS only reads from one of the drives, which hopefully is inaccurate, but goes to show how confusing all this is.

There are other warnings in the Debian wiki that are quite scary. Even the legendary Arch wiki has a warning on top of their BTRFS page, still.

Even if those issues are now fixed, it can be hard to tell when they were fixed. There is a changelog by feature but it explicitly warns that it doesn't know "which kernel version it is considered mature enough for production use", so it's also useless for this.

It would have been much better if BTRFS had been released into the world only once those bugs were completely fixed. Or if, at least, features had been announced when they were stable, not just "we merged to mainline, good luck". Even now, we get mixed messages in the official BTRFS documentation, which says "The Btrfs code base is stable" (main page) while at the same time clearly listing unstable parts in the status page (currently RAID56).

There are much harsher BTRFS critics than me out there so I will stop here, but let's just say that I feel a little uncomfortable trusting server data with full RAID arrays to BTRFS. But surely, for a workstation, things should just work smoothly... Right? Well, let's see the snags I hit.

My BTRFS test setup

Before I go any further, I should probably clarify how I am testing BTRFS in the first place.

The reason I tried BTRFS is that I was ... let's just say "strongly encouraged" by the LWN editors to install Fedora for the terminal emulators series. That, in turn, meant the setup was done with BTRFS, because that was somewhat the default in Fedora 27 (or did I want to experiment? I don't remember, it's been too long already).

So Fedora was set up on my 1TB HDD and, with encryption, the partition table looks like this:

NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0    0 931,5G  0 disk  
├─sda1                   8:1    0   200M  0 part  /boot/efi
├─sda2                   8:2    0     1G  0 part  /boot
├─sda3                   8:3    0   7,8G  0 part  
│ └─fedora_swap        253:5    0   7.8G  0 crypt [SWAP]
└─sda4                   8:4    0 922,5G  0 part  
  └─fedora_crypt       253:4    0 922,5G  0 crypt /

(This might not entirely be accurate: I rebuilt this from the Debian side of things.)

This is pretty straightforward, except for the swap partition: normally, I just treat swap like any other filesystem and create it as a logical volume. This is just speculation at this point, but I bet it was set up this way because swap file support was only added to BTRFS in Linux 5.0.
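
For the record, on such a kernel, a swap file on BTRFS needs copy-on-write disabled before the file gets any data. The procedure looks roughly like this (an untested sketch):

# truncate -s 0 /swapfile
# chattr +C /swapfile
# fallocate -l 8G /swapfile
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile

Note that chattr +C only works on an empty file, hence the truncate dance.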

I fully expect BTRFS experts to yell at me now because this is an old setup and BTRFS is so much better now, but that's exactly the point here. That setup is not that old (2018? old? really?), and migrating to a new partition scheme isn't exactly practical right now. But let's move on to more practical considerations.

No builtin encryption

BTRFS aims at replacing the entire mdadm, LVM, and ext4 stack with a single entity, and adding new features like deduplication, checksums and so on.

Yet there is one feature it is critically missing: encryption. See, my typical stack is actually mdadm, LUKS, and then LVM and ext4. This is convenient because I have only a single volume to decrypt.

If I were to use BTRFS on servers, I'd need to have one LUKS volume per disk. For a simple RAID-1 array, that's not too bad: one extra key. But for large RAID-10 arrays, this gets really unwieldy.

The obvious BTRFS alternative, ZFS, supports encryption out of the box and manages it above the disks, so you only have one passphrase to enter. The main downside of ZFS encryption is that it happens above the "pool" level, so you can typically see filesystem names (and possibly snapshots, depending on how things are built), which is not the case with a more traditional stack.
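
For example, creating an encrypted ZFS dataset is a one-liner (a sketch, with a made-up pool/dataset name):

# zfs create -o encryption=on -o keyformat=passphrase rpool/private

It prompts for a passphrase, and child datasets inherit the encryption settings.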

Subvolumes, filesystems, and devices

I find BTRFS's architecture to be utterly confusing. In the traditional LVM stack (which is itself kind of confusing if you're new to that stuff), you have those layers:

  • disks: let's say /dev/nvme0n1 and nvme1n1
  • RAID arrays with mdadm: let's say the above disks are joined in a RAID-1 array in /dev/md1
  • volume groups or VG with LVM: the above RAID device (technically a "physical volume" or PV) is assigned into a VG, let's call it vg_tbbuild05 (multiple PVs can be added to a single VG which is why there is that abstraction)
  • LVM logical volumes: out of that volume group, "virtual partitions" or "logical volumes" are created; that is where your filesystem lives
  • filesystem, typically with ext4: that's your normal filesystem, which treats the logical volume as just another block device

A typical server setup would look like this:

NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme0n1                   259:0    0   1.7T  0 disk  
├─nvme0n1p1               259:1    0     8M  0 part  
├─nvme0n1p2               259:2    0   512M  0 part  
│ └─md0                     9:0    0   511M  0 raid1 /boot
├─nvme0n1p3               259:3    0   1.7T  0 part  
│ └─md1                     9:1    0   1.7T  0 raid1 
│   └─crypt_dev_md1       253:0    0   1.7T  0 crypt 
│     ├─vg_tbbuild05-root 253:1    0    30G  0 lvm   /
│     ├─vg_tbbuild05-swap 253:2    0 125.7G  0 lvm   [SWAP]
│     └─vg_tbbuild05-srv  253:3    0   1.5T  0 lvm   /srv
└─nvme0n1p4               259:4    0     1M  0 part

I stripped the other nvme1n1 disk because it's basically the same.

Now, if we look at my BTRFS-enabled workstation, which doesn't even have RAID, we have the following:

  • disk: /dev/sda with, again, /dev/sda4 being where BTRFS lives
  • filesystem: fedora_crypt, which is, confusingly, kind of like a volume group. It's where everything lives. I think.
  • subvolumes: home, root, /, etc. Those are actually the things that get mounted. You'd think you'd mount a filesystem, but no, you mount a subvolume. That is backwards.

It looks something like this to lsblk:

NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0    0 931,5G  0 disk  
├─sda1                   8:1    0   200M  0 part  /boot/efi
├─sda2                   8:2    0     1G  0 part  /boot
├─sda3                   8:3    0   7,8G  0 part  [SWAP]
└─sda4                   8:4    0 922,5G  0 part  
  └─fedora_crypt       253:4    0 922,5G  0 crypt /srv

Notice how we don't see all the BTRFS volumes here? Maybe it's because I'm mounting this from the Debian side, but lsblk definitely gets confused here. I frankly don't quite understand what's going on, even after repeatedly looking through the rather dismal documentation. But that's what I gather from the following commands:

root@curie:/home/anarcat# btrfs filesystem show
Label: 'fedora'  uuid: 5abb9def-c725-44ef-a45e-d72657803f37
    Total devices 1 FS bytes used 883.29GiB
    devid    1 size 922.47GiB used 916.47GiB path /dev/mapper/fedora_crypt

root@curie:/home/anarcat# btrfs subvolume list /srv
ID 257 gen 108092 top level 5 path home
ID 258 gen 108094 top level 5 path root
ID 263 gen 108020 top level 258 path root/var/lib/machines

I only got to that point through trial and error. Notice how I use an existing mountpoint to list the related subvolumes. If I try to use the filesystem path, the one that's listed in filesystem show, I fail:

root@curie:/home/anarcat# btrfs subvolume list /dev/mapper/fedora_crypt 
ERROR: not a btrfs filesystem: /dev/mapper/fedora_crypt
ERROR: can't access '/dev/mapper/fedora_crypt'

Maybe I just need to use the label? Nope:

root@curie:/home/anarcat# btrfs subvolume list fedora
ERROR: cannot access 'fedora': No such file or directory
ERROR: can't access 'fedora'

This is really confusing. I don't even know if I understand this right, and I've been staring at this all afternoon. Hopefully, the lazyweb will correct me eventually.
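
The pattern, as far as I can tell, is that every btrfs subcommand wants a mounted path, never a device or a label. For example, to see which subvolume gets mounted by default (output reconstructed, so take it with a grain of salt):

root@curie:/home/anarcat# btrfs subvolume get-default /srv
ID 5 (FS_TREE)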

(As an aside, why are they called "subvolumes"? If something is a "sub" of "something else", that "something else" must exist, right? But no, BTRFS doesn't have "volumes", it only has "subvolumes". Go figure. Presumably the filesystem still holds "files", though; at least empirically, it doesn't seem like it has lost anything so far.)

In any case, at least I can refer to this section in the future, the next time I fumble around the btrfs commandline, as I surely will. I will possibly even update this section as I get better at it, or based on my reader's judicious feedback.

Mounting BTRFS subvolumes

So how did I even get to that point? I have this in my /etc/fstab, on the Debian side of things:

UUID=5abb9def-c725-44ef-a45e-d72657803f37   /srv    btrfs  defaults 0   2

This thankfully ignores all the subvolume nonsense because it relies on the UUID. mount tells me that's actually the "root" (? /?) subvolume:

root@curie:/home/anarcat# mount | grep /srv
/dev/mapper/fedora_crypt on /srv type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)

Let's see if I can mount the other volumes I have on there. Remember that subvolume list showed I had home, root, and var/lib/machines. Let's try root:

mount -o subvol=root /dev/mapper/fedora_crypt /mnt

Interestingly, root is not the same as /, it's a different subvolume! It seems to be the Fedora root (/, really) filesystem. No idea what is happening here. I also have a home subvolume, let's mount it too, for good measure:

mount -o subvol=home /dev/mapper/fedora_crypt /mnt/home
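
If I wanted those mounts to survive a reboot, the same subvol option should presumably work from fstab as well (an untested sketch):

UUID=5abb9def-c725-44ef-a45e-d72657803f37   /mnt/home   btrfs  defaults,subvol=home 0   2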

Note that lsblk doesn't notice those two new mountpoints, and that's normal: it only lists block devices, and subvolumes (rather inconveniently, I'd say) do not show up as devices:

root@curie:/home/anarcat# lsblk 
NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0    0 931,5G  0 disk  
├─sda1                   8:1    0   200M  0 part  
├─sda2                   8:2    0     1G  0 part  
├─sda3                   8:3    0   7,8G  0 part  
└─sda4                   8:4    0 922,5G  0 part  
  └─fedora_crypt       253:4    0 922,5G  0 crypt /srv

This is really, really confusing. Maybe I did something wrong in the setup. Maybe it's because I'm mounting it from outside Fedora. Either way, it just doesn't feel right.
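
For what it's worth, findmnt does show the subvolume mounts, unlike lsblk (output reconstructed and trimmed, so it may not be exactly what you'd see):

root@curie:/home/anarcat# findmnt -nt btrfs
/srv      /dev/mapper/fedora_crypt        btrfs rw,relatime,space_cache,subvolid=5,subvol=/
/mnt      /dev/mapper/fedora_crypt[/root] btrfs rw,relatime,space_cache,subvolid=258,subvol=/root
/mnt/home /dev/mapper/fedora_crypt[/home] btrfs rw,relatime,space_cache,subvolid=257,subvol=/home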

No disk usage per volume

If you want to see what's taking up space in one of those subvolumes, tough luck:

root@curie:/home/anarcat# df -h  /srv /mnt /mnt/home
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/fedora_crypt  923G  886G   31G  97% /srv
/dev/mapper/fedora_crypt  923G  886G   31G  97% /mnt
/dev/mapper/fedora_crypt  923G  886G   31G  97% /mnt/home

(Notice, in passing, that it looks like the same filesystem is mounted in different places. In that sense, you'd expect /srv and /mnt (and /mnt/home?!) to be exactly the same, but no: they are entirely different directory structures, which I will not call "filesystems" here because everyone's head will explode in sparks of confusion.)

Yes, disk space is shared (that's the Size and Avail columns, which makes sense). But nope, no cookie for you: they all show the same Used column, so you need to actually walk the entire filesystem to figure out how much each subvolume takes.

(For future reference, that's basically:

root@curie:/home/anarcat# time du -schx /mnt/home /mnt /srv
124M    /mnt/home
7.5G    /mnt
875G    /srv
883G    total

real    2m49.080s
user    0m3.664s
sys 0m19.013s

And yes, that was painfully slow.)

ZFS actually has some oddities in that regard, but at least it tells me how much disk each volume (and snapshot) takes:

root@tubman:~# time df -t zfs -h
Filesystem         Size  Used Avail Use% Mounted on
rpool/ROOT/debian  3.5T  1.4G  3.5T   1% /
rpool/var/tmp      3.5T  384K  3.5T   1% /var/tmp
rpool/var/spool    3.5T  256K  3.5T   1% /var/spool
rpool/var/log      3.5T  2.0G  3.5T   1% /var/log
rpool/home/root    3.5T  2.2G  3.5T   1% /root
rpool/home         3.5T  256K  3.5T   1% /home
rpool/srv          3.5T   80G  3.5T   3% /srv
rpool/var/cache    3.5T  114M  3.5T   1% /var/cache
bpool/BOOT/debian  571M   90M  481M  16% /boot

real    0m0.003s
user    0m0.002s
sys 0m0.000s

That's 56360 times faster, by the way.

But yes, that's not fair: those in the know will know there's a different command to do what df does with BTRFS filesystems, the btrfs filesystem usage command:

root@curie:/home/anarcat# time btrfs filesystem usage /srv
Overall:
    Device size:         922.47GiB
    Device allocated:        916.47GiB
    Device unallocated:        6.00GiB
    Device missing:          0.00B
    Used:            884.97GiB
    Free (estimated):         30.84GiB  (min: 27.84GiB)
    Free (statfs, df):        30.84GiB
    Data ratio:               1.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)
    Multiple profiles:              no

Data,single: Size:906.45GiB, Used:881.61GiB (97.26%)
   /dev/mapper/fedora_crypt  906.45GiB

Metadata,DUP: Size:5.00GiB, Used:1.68GiB (33.58%)
   /dev/mapper/fedora_crypt   10.00GiB

System,DUP: Size:8.00MiB, Used:128.00KiB (1.56%)
   /dev/mapper/fedora_crypt   16.00MiB

Unallocated:
   /dev/mapper/fedora_crypt    6.00GiB

real    0m0,004s
user    0m0,000s
sys 0m0,004s

Almost as fast as ZFS's df! Good job. But wait: that doesn't actually tell me usage per subvolume. Notice it's filesystem usage, not subvolume usage, which unhelpfully refuses to exist. That command only shows that one filesystem's internal statistics, which are pretty opaque. You can also appreciate that it's wasting 6GB of "unallocated" disk space there: I probably did something Very Wrong and should be punished by Hacker News. I also wonder why it has 1.68GB of "metadata" used...
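
Those in the know will also point at quotas: enabling qgroups apparently gives per-subvolume accounting, at some performance cost. I haven't tried it, but it would look something like this:

root@curie:/home/anarcat# btrfs quota enable /srv
root@curie:/home/anarcat# btrfs qgroup show /srv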

At this point, I just really want to throw that thing out of the window and restart from scratch. I don't really feel like learning the BTRFS internals, as they seem oblique and completely bizarre to me. It feels a little like the state of PHP now: it's actually pretty solid, but built upon so many layers of cruft that I still feel it corrupts my brain every time I have to deal with it (needle or haystack first? anyone?)...

Conclusion

I find BTRFS utterly confusing and I'm worried about its reliability. I think a lot of work is needed on usability and coherence before I even consider running this anywhere else than a lab, and that's really too bad, because there are really nice features in BTRFS that would greatly help my workflow. (I want to use filesystem snapshots as high-performance, high frequency backups.)

So now I'm experimenting with OpenZFS. It's so much simpler, it just works, and it's rock solid. After this 8-minute read, I had a good understanding of how ZFS worked. Here's the 30-second overview:

  • vdev: a RAID array
  • zpool: a volume group of vdevs
  • datasets: normal filesystems (or block devices, if you want to use another filesystem on top of ZFS)

There are also other special volumes, like caches and logs, that you can (really easily, compared to LVM caching) use to tweak your setup. You might also want to look at recordsize or ashift to tune the filesystem to better fit your workload (or to deal with drives lying about their sector size; I'm looking at you, Samsung), but that's it.
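
In other words, building a mirrored pool with a filesystem and a snapshot looks something like this (a sketch, with made-up device and dataset names):

# zpool create tank mirror /dev/nvme0n1 /dev/nvme1n1
# zfs create -o mountpoint=/srv tank/srv
# zfs snapshot tank/srv@before-upgrade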

Running ZFS on Linux currently involves building kernel modules from scratch on every host, which I think is pretty bad. But I was able to set up a ZFS-only server using this excellent documentation without too many problems.

I'm hoping that some day the copyright issues will be resolved so we can at least ship binary packages, but the politics (e.g. convincing Debian that this is the right thing to do) and the logistics (e.g. DKMS auto-builders? is that even a thing? how about signed DKMS packages? fun-fun-fun!) seem really impractical. Who knows, maybe hell will freeze over (again) and Oracle will fix the CDDL. I personally think that we should just completely ignore this problem (which wasn't even supposed to be a problem) and ship binary packages directly, but I'm a pragmatist and do not always fit well with the free software fundamentalists.

All of this to say that, short term, we don't have a reliable, advanced filesystem/logical disk manager in Linux. And that's really too bad.

...
@ooni May 13, 2022 - 00:00 • 2 months ago
Job Opening: OONI Community Coordinator
Are you passionate about defending human rights on the internet? Are you extremely organized and enjoy supporting communities around the world? We have a job opening for you! The OONI team (a non-profit fighting internet censorship) is looking for a dedicated community coordinator to help grow and support the OONI community around the world. The application deadline is Sunday, 12th June 2022. Job description By joining our team, you will support our partners and global community of human rights defenders to measure and fight internet censorship. ...