A difficult 23.11 nixos upgrade story

2024-02-06 - Debugging, diffing configurations, reading change logs
Tag: nixos


Back in December I upgraded my nixos servers from the 23.05 release to 23.11. I had to debug a strange issue where my servers were no longer reachable after rebooting into the new version.

The problem

I am using LUKS encryption for the root filesystem, and am used to the comfort of unlocking the partition thanks to an SSH server embedded in the initrd. This setup has the security flaw that the initrd could be replaced by a malicious party, but this is not something I am concerned about for personal stuff so please ignore it.

The following configuration made it work on nixos 23.05:

{ config, pkgs, ... }:
{
        # SSH server inside the initrd, used to unlock the LUKS root remotely
        boot.initrd.network = {
                enable = true;
                ssh = {
                        enable = true;
                        port = 22;
                        authorizedKeys = [ "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AABCDLOJV3913FRYgCVA2plFB8W8sF9LfbzXZOrxqaOrrwco" ];
                        hostKeys = [ "/etc/ssh/ssh_host_rsa_key" "/etc/ssh/ssh_host_ed25519_key" ];
                };
        };
}

What happened

Being a good sysadmin I read the release notes and caught:

The boot.initrd.network.udhcp.enable option allows control over DHCP during Stage 1 regardless of what networking.useDHCP is set to.

I thought nothing of it… But I should have!

Behind this message is the fact that if you did not set networking.useDHCP = true; globally, your initrd in nixos 23.11 will no longer do a DHCP lookup. This is a behavioral change I find baffling because it worked perfectly in 23.05! My configuration did use DHCP, but enabled it explicitly on the interfaces that need it, not globally. As a networking engineer I loathe useless traffic on my networks, and this includes DHCP requests from devices that do not need them.

Nixos 23.11 needs a boot.initrd.network.udhcpc.enable = true; in order to boot correctly again. Finding this new setting was not too hard - a few minutes of head scratching and intuition did the trick - but as usual I am on the lookout for a learning opportunity.
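As a minimal sketch of what this looks like in practice, assuming DHCP is disabled globally and enabled on a single interface: the interface name eth0 is a placeholder and not taken from my actual setup, and the initrd block would normally be merged into the existing boot.initrd.network definition rather than declared a second time in the same file.

```nix
{ config, pkgs, ... }:
{
  # DHCP stays off globally and is enabled per interface only.
  networking.useDHCP = false;
  # "eth0" is a hypothetical interface name used for illustration.
  networking.interfaces.eth0.useDHCP = true;

  boot.initrd.network = {
    enable = true;
    # With networking.useDHCP not set to true globally, 23.11 needs this
    # set explicitly for the initrd to request a DHCP lease.
    udhcpc.enable = true;
  };
}
```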

Configuration diffs

The first thing I looked for was a way to diff two nixos configurations. I ended up disappointed because I did not find a way to do it either easily or exhaustively! There are quite advanced tools for nix itself, but for nixos the options are rather limited.

The most advanced thing I managed is a diff between configurations that were activated on the same machine. A diff on the build server alone does not work: it has to run on the machine where the configuration is deployed live.

The nixos diffs I managed are limited to installed packages or installed files and their size changes. Nothing seems to let me dive into what is inside the initrd.

nix --extra-experimental-features nix-command profile diff-closures --profile /nix/var/nix/profiles/system


This upgrade experience did not inspire a lot of confidence in me. Nixos is a great project and I wholeheartedly thank all its contributors for their efforts and dedication, but as a sysadmin these are not the kind of defaults that I ever want to see change silently.

I still think nixos has great potential and deserves more recognition.