Wednesday, February 24, 2021

Deploying mutable multi-process Docker containers with the Nix process management framework (or running Hydra in a Docker container)

In a blog post written several months ago, I have shown that the Nix process management framework can also be used to conveniently construct multi-process Docker images.

Although Docker is primarily used for managing single root application process containers, multi-process containers can sometimes be useful to deploy systems that consist of multiple, tightly coupled, processes.

The Docker manual has a section that describes how to construct images for multi-process containers, but IMO the configuration process is a bit tedious and cumbersome.

To make this process more convenient, I have built a wrapper function: createMultiProcessImage around the dockerTools.buildImage function (provided by Nixpkgs) that does the following:

  • It constructs an image that runs a Linux and Docker compatible process manager as an entry point. Currently, it supports supervisord, sysvinit, disnix and s6-rc.
  • The Nix process management framework is used to build a configuration for a system that consists of multiple processes that will be managed by any of the supported process managers.

Although the framework makes the construction of multi-process images convenient, a big drawback of multi-process Docker containers is upgrading them -- for example, for Debian-based containers you can imperatively upgrade packages by connecting to the container:

$ docker exec -it mycontainer /bin/bash

and upgrade the desired packages, such as file:

$ apt install file

The upgrade instruction above is not reproducible -- apt may install file version 5.38 today, and 5.39 tomorrow.

To cope with these kinds of side-effects, Docker works with images that snapshot the outcomes of all the installation steps. Constructing a container from the same image will always provide the same versions of all dependencies.

As a consequence, to perform a reproducible container upgrade, it is required to construct a new image, discard the container and reconstruct the container from the new image version, causing the system as a whole to be terminated, including the processes that have not changed.

For a while, I have been thinking about this limitation and developed a solution that makes it possible to upgrade multi-process containers without stopping and discarding them. The only exception is the process manager.

To make deployments reproducible, it combines the reproducibility properties of Docker and Nix.

In this blog post, I will describe how this solution works and how it can be used.

Creating a function for building mutable Docker images


As explained in an earlier blog post that compares the deployment properties of Nix and Docker, both solutions support reproducible deployment, albeit for different application domains.

Moreover, their reproducibility properties are built around different concepts:

  • Docker containers are reproducible, because they are constructed from images that consist of immutable layers identified by hash codes derived from their contents.
  • Nix package builds are reproducible, because they are stored in isolation in a Nix store and made immutable (the files' permissions are set read-only). In the construction process of the packages, many side effects are mitigated.

    As a result, when the hash code prefix of a package (derived from all build inputs) is the same, then the build output is also (nearly) bit-identical, regardless of the machine on which the package was built.

By taking these reproducibility properties into account, we can create a reproducible deployment process for upgradable containers by using a specific separation of responsibilities.

Deploying the base system


For the deployment of the base system that includes the process manager, we can stick to the traditional Docker deployment workflow based on images (the only unconventional aspect is that we use Nix to build a Docker image, instead of Dockerfiles).

The process manager that the image provides deploys its configuration from a dynamic configuration directory.

To support supervisord, we can invoke the following command as the container's entry point:

supervisord --nodaemon \
  --configuration /etc/supervisor/supervisord.conf \
  --logfile /var/log/supervisord.log \
  --pidfile /var/run/supervisord.pid

The above command starts the supervisord service (in foreground mode), using the supervisord.conf configuration file stored in /etc/supervisor.

The supervisord.conf configuration file has the following structure:

[supervisord]

[include]
files=conf.d/*

The above configuration automatically loads all program definitions stored in the conf.d directory. This directory is writable and initially empty. It can be populated with configuration files generated by the Nix process management framework.
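To illustrate what such a generated program definition could look like, here is a hand-written sketch of a conf.d entry for a webapp process (the store path, file name and options are illustrative and not the framework's actual generated output):

[program:webapp]
command=/nix/store/...-webapp/bin/webapp
autostart=true
autorestart=true
stdout_logfile=/var/log/supervisord/webapp.log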

For the other process managers that the framework supports (sysvinit, disnix and s6-rc), we follow a similar strategy -- we configure the process manager in such a way that the configuration is loaded from a source that can be dynamically updated.

Deploying process instances


Deployment of the process instances is not done in the construction of the image, but by the Nix process management framework and the Nix package manager running in the container.

To allow a processes model deployment to refer to packages in the Nixpkgs collection and to obtain binary substitutes, we must configure a Nix channel, such as the unstable Nixpkgs channel:

$ nix-channel --add https://nixos.org/channels/nixpkgs-unstable
$ nix-channel --update

(As a sidenote: it is also possible to subscribe to a stable Nixpkgs channel or a specific Git revision of Nixpkgs).
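For example, subscribing to a stable channel works as follows (nixos-20.09 was the stable release at the time of writing; the explicit nixpkgs name makes the channel resolvable via <nixpkgs>):

$ nix-channel --add https://nixos.org/channels/nixos-20.09 nixpkgs
$ nix-channel --update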

The processes model (and relevant sub models, such as ids.nix that contains numeric ID assignments) is copied into the Docker image.

We can deploy the processes model for supervisord as follows:

$ nixproc-supervisord-switch

The above command will deploy the processes model that the NIXPROC_PROCESSES environment variable refers to, which defaults to: /etc/nixproc/processes.nix:

  • First, it builds supervisord configuration files from the processes model (this step also includes deploying all required packages and service configuration files)
  • It creates symlinks for each configuration file belonging to a process instance in the writable conf.d directory
  • It instructs supervisord to reload the configuration so that only obsolete processes get deactivated and new services activated, causing unchanged processes to remain untouched.

(For the other process managers, we have equivalent tools: nixproc-sysvinit-switch, nixproc-disnix-switch and nixproc-s6-rc-switch).
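If the processes model resides somewhere other than the default location, the NIXPROC_PROCESSES environment variable mentioned earlier can be set accordingly (the path below is just an example):

$ NIXPROC_PROCESSES=/etc/nixproc/custom-processes.nix nixproc-supervisord-switch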

Initial deployment of the system


Because only the process manager is deployed as part of the image (with an initially empty configuration), the system is not yet usable when we start a container.

To solve this problem, we must perform an initial deployment of the system on first startup.

I used my lessons learned from the chainloading techniques in s6 (in the previous blog post) and developed a hacky generated bootstrap script (/bin/bootstrap) that serves as the container's entry point:

cat > /bin/bootstrap <<EOF
#! ${pkgs.stdenv.shell} -e

# Configure Nix channels
nix-channel --add ${channelURL}
nix-channel --update

# Deploy the processes model (in a child process)
nixproc-${input.processManager}-switch &

# Overwrite the bootstrap script, so that it simply just
# starts the process manager the next time we start the
# container
cat > /bin/bootstrap <<EOR
#! ${pkgs.stdenv.shell} -e
exec ${cmd}
EOR

# Chain load the actual process manager
exec ${cmd}
EOF
chmod 755 /bin/bootstrap

The generated bootstrap script does the following:

  • First, a Nix channel is configured and updated so that we can install packages from the Nixpkgs collection and obtain substitutes.
  • The next step is deploying the processes model by running the nixproc-*-switch tool for a supported process manager. This process is started in the background (as a child process) -- we can use this trick to force the managing bash shell to load our desired process supervisor as soon as possible.

    Ultimately, we want the process manager to become responsible for supervising any other process running in the container.
  • After the deployment process is started in the background, the bootstrap script is overwritten by a simpler script that becomes our real entry point -- the process manager that we want to use, such as supervisord.

    Overriding the bootstrap script makes sure that the next time we start the container, it will start instantly without attempting to deploy the system again.
  • Finally, the bootstrap script "execs" into the real process manager, becoming the new PID 1 process. When the deployment of the system is done (the nixproc-*-switch process that still runs in the background), the process manager becomes responsible for reaping it.

With the above script, the workflow of deploying an upgradable/mutable multi-process container is the same as deploying an ordinary container from a Docker image -- the only (minor) difference is that the first time that we start the container, it may take some time before the services become available, because the multi-process system needs to be deployed by Nix and the Nix process management framework.

A simple usage scenario


Similar to my previous blog posts about the Nix process management framework, I will use the trivial web application system to demonstrate how the functionality of the framework can be used.

The web application system consists of one or more webapp processes (with an embedded HTTP server) that only return static HTML pages displaying their identities.

An Nginx reverse proxy forwards incoming requests to the appropriate webapp instance -- each webapp service can be reached by using its unique virtual host value.

To construct a mutable multi-process Docker image with Nix, we can write the following Nix expression (default.nix):

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  createMutableMultiProcessImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix" {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  contents = [ pkgs.mc ];
  exprFile = ./processes.nix;
  idResourcesFile = ./idresources.nix;
  idsFile = ./ids.nix;
  processManager = "supervisord"; # sysvinit, disnix, s6-rc are also valid options
}

The above Nix expression invokes the createMutableMultiProcessImage function that constructs a Docker image that provides a base system with a process manager, and a bootstrap script that deploys the multi-process system:

  • The name, tag, and contents parameters specify the image name, tag and the packages that need to be included in the image.
  • The exprFile parameter refers to a processes model that captures the configurations of the process instances that need to be deployed.
  • The idResourcesFile parameter refers to an ID resources model that specifies from which resource pools unique IDs need to be selected.
  • The idsFile parameter refers to an IDs model that contains the unique ID assignments for each process instance. Unique IDs resemble TCP/UDP port assignments, user IDs (UIDs) and group IDs (GIDs).
  • We can use the processManager parameter to select the process manager we want to use. In the above example it is supervisord, but other options are also possible.

We can use the following processes model (processes.nix) to deploy a small version of our example system:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

The above Nix expression configures two process instances, one webapp process that returns a static HTML page with its identity and an Nginx reverse proxy that forwards connections to it.

A notable difference between the expression shown above and the processes models of the same system shown in my previous blog posts, is that this expression does not contain any references to files on the local filesystem, with the exception of the ID assignments expression (ids.nix).

We obtain all required functionality from the Nix process management framework by invoking builtins.fetchGit. Eliminating local references is required to allow the processes model to be copied into the container and deployed from within the container.
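As a sidenote: builtins.fetchGit can also pin the framework to an exact revision by adding a rev attribute, which makes the deployment even more reproducible (the hash below is a placeholder that should be replaced by a real commit hash):

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
    rev = "0000000000000000000000000000000000000000"; # placeholder commit hash
  };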

We can build a Docker image as follows:

$ nix-build

load the image into Docker:

$ docker load -i result

and create and start a Docker container:

$ docker run -it --name webapps --network host multiprocess:test
unpacking channels...
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
created 1 symlinks in user environment
2021-02-21 15:29:29,878 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-02-21 15:29:29,878 WARN No file matches via include "/etc/supervisor/conf.d/*"
2021-02-21 15:29:29,897 INFO RPC interface 'supervisor' initialized
2021-02-21 15:29:29,897 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2021-02-21 15:29:29,898 INFO supervisord started with pid 1
these derivations will be built:
  /nix/store/011g52sj25k5k04zx9zdszdxfv6wy1dw-credentials.drv
  /nix/store/1i9g728k7lda0z3mn1d4bfw07v5gzkrv-credentials.drv
  /nix/store/fs8fwfhalmgxf8y1c47d0zzq4f89fz0g-nginx.conf.drv
  /nix/store/vxpm2m6444fcy9r2p06dmpw2zxlfw0v4-nginx-foregroundproxy.sh.drv
  /nix/store/4v3lxnpapf5f8297gdjz6kdra8g7k4sc-nginx.conf.drv
  /nix/store/mdldv8gwvcd5fkchncp90hmz3p9rcd99-builder.pl.drv
  /nix/store/r7qjyr8vr3kh1lydrnzx6nwh62spksx5-nginx.drv
  /nix/store/h69khss5dqvx4svsc39l363wilcf2jjm-webapp.drv
  /nix/store/kcqbrhkc5gva3r8r0fnqjcfhcw4w5il5-webapp.conf.drv
  /nix/store/xfc1zbr92pyisf8lw35qybbn0g4f46sc-webapp.drv
  /nix/store/fjx5kndv24pia1yi2b7b2bznamfm8q0k-supervisord.d.drv
these paths will be fetched (78.80 MiB download, 347.06 MiB unpacked):
...

As may be noticed by looking at the output, on first startup the Nix process management framework is invoked to deploy the system with Nix.

After the system has been deployed, we should be able to connect to the webapp process via the Nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5000
  </body>
</html>

When it is desired to upgrade the system, we can change the system's configuration by connecting to the container instance:

$ docker exec -it webapps /bin/bash

In the container, we can edit the processes.nix configuration file:

$ mcedit /etc/nixproc/processes.nix

and make changes to the configuration of the system. For example, we can change the processes model to include a second webapp process:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  webapp2 = rec {
    port = ids.webappPorts.webapp2 or 0;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp webapp2 ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

In the above processes model, a new process instance named: webapp2 was added that listens on its own unique port and can be reached with the webapp2.local virtual host value.

By running the following command, the system in the container gets upgraded:

$ nixproc-supervisord-switch

resulting in two webapp process instances running in the container:

$ supervisorctl 
nginx                            RUNNING   pid 847, uptime 0:00:08
webapp                           RUNNING   pid 459, uptime 0:05:54
webapp2                          RUNNING   pid 846, uptime 0:00:08
supervisor>

The first instance: webapp was left untouched, because its configuration was not changed.

The second instance: webapp2 can be reached as follows:

$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5001
  </body>
</html>

After upgrading the system, the new configuration should also get reactivated after a container restart.
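For example, we can restart the container and, after the services have come up again, verify that the second webapp instance is still reachable:

$ docker restart webapps
$ curl -H 'Host: webapp2.local' http://localhost:8080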

A more interesting example: Hydra


As explained earlier, to create upgradable containers we require a fully functional Nix installation in a container. This observation made me think about a more interesting example than the trivial web application system.

A prominent example of a system that requires Nix and is composed of multiple tightly integrated processes is Hydra: the Nix-based continuous integration service.

To make it possible to deploy a minimal Hydra service in a container, I have packaged all its relevant components for the Nix process management framework.

The processes model looks as follows:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  nix-processmgmt-services = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt-services.git;
    ref = "master";
  };

  constructors = import "${nix-processmgmt-services}/services-agnostic/constructors.nix" {
    inherit nix-processmgmt pkgs stateDir runtimeDir logDir tmpDir cacheDir forceDisableUserChange processManager;
  };

  instanceSuffix = "";
  hydraUser = hydraInstanceName;
  hydraInstanceName = "hydra${instanceSuffix}";
  hydraQueueRunnerUser = "hydra-queue-runner${instanceSuffix}";
  hydraServerUser = "hydra-www${instanceSuffix}";
in
rec {
  nix-daemon = {
    pkg = constructors.nix-daemon;
  };

  postgresql = rec {
    port = 5432;
    postgresqlUsername = "postgresql";
    postgresqlPassword = "postgresql";
    socketFile = "${runtimeDir}/postgresql/.s.PGSQL.${toString port}";

    pkg = constructors.simplePostgresql {
      inherit port;
      authentication = ''
        # TYPE  DATABASE   USER   ADDRESS    METHOD
        local   hydra      all               ident map=hydra-users
      '';
      identMap = ''
        # MAPNAME       SYSTEM-USERNAME          PG-USERNAME
        hydra-users     ${hydraUser}             ${hydraUser}
        hydra-users     ${hydraQueueRunnerUser}  ${hydraUser}
        hydra-users     ${hydraServerUser}       ${hydraUser}
        hydra-users     root                     ${hydraUser}
        # The postgres user is used to create the pg_trgm extension for the hydra database
        hydra-users     postgresql               postgresql
      '';
    };
  };

  hydra-server = rec {
    port = 3000;
    hydraDatabase = hydraInstanceName;
    hydraGroup = hydraInstanceName;
    baseDir = "${stateDir}/lib/${hydraInstanceName}";
    inherit hydraUser instanceSuffix;

    pkg = constructors.hydra-server {
      postgresqlDBMS = postgresql;
      user = hydraServerUser;
      inherit nix-daemon port instanceSuffix hydraInstanceName hydraDatabase hydraUser hydraGroup baseDir;
    };
  };

  hydra-evaluator = {
    pkg = constructors.hydra-evaluator {
      inherit nix-daemon hydra-server;
    };
  };

  hydra-queue-runner = {
    pkg = constructors.hydra-queue-runner {
      inherit nix-daemon hydra-server;
      user = hydraQueueRunnerUser;
    };
  };

  apache = {
    pkg = constructors.reverseProxyApache {
      dependency = hydra-server;
      serverAdmin = "admin@localhost";
    };
  };
}

In the above processes model, each process instance represents a component of a Hydra installation:

  • The nix-daemon process is a service that comes with the Nix package manager to facilitate multi-user package installations. The nix-daemon carries out builds on behalf of a user.

    Hydra requires it to perform builds as an unprivileged Hydra user and uses the Nix protocol to more efficiently orchestrate large builds.
  • Hydra uses a PostgreSQL database backend to store data about projects and builds.

    The postgresql process refers to the PostgreSQL database management system (DBMS) that is configured in such a way that the Hydra components are authorized to manage and modify the Hydra database.
  • hydra-server is the front-end of the Hydra service that provides a web user interface. The initialization procedure of this service is responsible for initializing the Hydra database.
  • The hydra-evaluator regularly updates the repository checkouts and evaluates the Nix expressions to decide which packages need to be built.
  • The hydra-queue-runner builds all jobs that were evaluated by the hydra-evaluator.
  • The apache server is used as a reverse proxy server forwarding requests to the hydra-server.

With the following commands, we can build the image, load it into Docker, and deploy a container that runs Hydra:

$ nix-build hydra-image.nix
$ docker load -i result
$ docker run -it --name hydra-test --network host hydra:test
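The contents of hydra-image.nix are not shown in this post. A minimal sketch, assuming that it follows the same structure as the default.nix expression shown earlier for the web application example, could look as follows:

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  createMutableMultiProcessImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix" {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "hydra";
  tag = "test";
  exprFile = ./processes.nix; # the Hydra processes model shown above
  processManager = "supervisord";
}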

After deploying the system, we can connect to the container:

$ docker exec -it hydra-test /bin/bash

and observe that all processes are running and managed by supervisord:

$ supervisorctl
apache                           RUNNING   pid 1192, uptime 0:00:42
hydra-evaluator                  RUNNING   pid 1297, uptime 0:00:38
hydra-queue-runner               RUNNING   pid 1296, uptime 0:00:38
hydra-server                     RUNNING   pid 1188, uptime 0:00:42
nix-daemon                       RUNNING   pid 1186, uptime 0:00:42
postgresql                       RUNNING   pid 1187, uptime 0:00:42
supervisor>

With the following commands, we can create our initial admin user:

$ su - hydra
$ hydra-create-user sander --password secret --role admin
creating new user `sander'

We can connect to the Hydra front-end in a web browser by opening http://localhost (this works because the container uses host networking):


and configure a jobset to build a project, such as libprocreact:


Another nice bonus of supporting multiple process managers is that if we build Hydra's Nix process management configuration for Disnix, we can also visualize the deployment architecture of the system with disnix-visualize:


The above diagram displays the following properties:

  • The outer box indicates that we are deploying to a single machine: localhost
  • The inner box indicates that all components are managed as processes
  • The ovals correspond to process instances in the processes model and the arrows denote dependency relationships.

    For example, the apache reverse proxy has a dependency on hydra-server, meaning that the latter process instance should be deployed first, otherwise the reverse proxy is not able to forward requests to it.

Building a Nix-enabled container image


As explained in the previous section, mutable Docker images require a fully functional Nix package manager in the container.

Since this may also be an interesting sub use case, I have created a convenience function: createNixImage that can be used to build an image whose only purpose is to provide a working Nix installation:

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  createNixImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-nix-image.nix" {
    inherit pkgs;
  };
in
createNixImage {
  name = "foobar";
  tag = "test";
  contents = [ pkgs.mc ];
}

The above Nix expression builds a Docker image with a working Nix setup and a custom package: the Midnight Commander.

Conclusions


In this blog post, I have described a new function in the Nix process management framework: createMutableMultiProcessImage that creates reproducible mutable multi-process container images, by combining the reproducibility properties of Docker and Nix. With the exception of the process manager, process instances in a container can be upgraded without bringing the entire container down.

With this new functionality, the deployment workflow of a multi-process container configuration has become very similar to how physical and virtual machines are managed with NixOS -- you can edit a declarative specification of a system and run a single command-line instruction to deploy the new configuration.

Moreover, this new functionality allows us to deploy a complex, tightly coupled multi-process system, such as Hydra: the Nix-based continuous integration service. In the Hydra example case, we are using Nix for three deployment aspects: constructing the Docker image, deploying the multi-process system configuration and building the projects that are configured in Hydra.

A big drawback of mutable multi-process images is that no sharing is possible between multiple multi-process containers. Because the images are not built from common layers, the Nix store is private to each container and all packages are deployed in the writable custom layer, which may lead to substantial disk and RAM overhead per container instance.

Deploying the processes model to a container instance can probably be made more convenient by using Nix flakes -- a new Nix feature that is still experimental. With flakes we can easily deploy an arbitrary number of Nix expressions to a container and pin the deployment to a specific version of Nixpkgs.

Another interesting observation is the word: mutable. I am not completely sure if it is appropriate -- both the layers of a Docker image, as well as the Nix store paths are immutable and never change after they have been built. For both solutions, immutability is an important ingredient in making sure that a deployment is reproducible.

I have decided to still call these deployments mutable, because I am looking at the problem from a Docker perspective -- the writable layer of the container (that is mounted on top of the immutable layers of an image) is modified each time that we upgrade a system.

Future work


Although I am quite happy with the ability to create mutable multi-process containers, there is still quite a bit of work that needs to be done to make the Nix process management framework more usable.

Most importantly, trying to deploy Hydra revealed all kinds of regressions in the framework. To cope with all these breaking changes, a structured testing approach is required. Currently, such an approach is completely absent.

I could also (in theory) automate the still missing parts of Hydra. For example, I have not automated the process that updates the garbage collector roots, which needs to run in a timely manner. To solve this, I need to use a cron service or systemd timer units, which is beyond the scope of my experiment.

Availability


The createMutableMultiProcessImage function is part of the experimental Nix process management framework GitHub repository that is still under heavy development.

Because the number of services that can be deployed with the framework has grown considerably, I have moved all non-essential services (not required for testing) into a separate repository. The Hydra constructor functions can be found in this repository as well.

Monday, February 1, 2021

Developing an s6-rc backend for the Nix process management framework

One of my major blog topics last year was my experimental Nix process management framework, that is still under heavy development.

As explained in many of my earlier blog posts, one of its major objectives is to facilitate high-level deployment specifications of running processes that can be translated to configurations for all kinds of process managers and deployment solutions.

The backends that I have implemented so far, were picked for the following reasons:

  • Multiple operating systems support. The most common process management service was chosen for each operating system: On Linux, sysvinit (because this used to be the most common solution) and systemd (because it is used by many conventional Linux distributions today), bsdrc on FreeBSD, launchd for macOS, and cygrunsrv for Cygwin.
  • Supporting unprivileged user deployments. To supervise processes without requiring a service that runs on PID 1, that also works for unprivileged users, supervisord is very convenient because it was specifically designed for this purpose.
  • Docker was selected because it is a very popular solution for managing services, and process management is one of its sub responsibilities.
  • Universal process management. Disnix was selected because it can be used as a primitive process management solution that works on any operating system supported by the Nix package manager. Moreover, the Disnix services model is a super set of the processes model used by the process management framework.

Not long after writing my blog post about the process manager-agnostic abstraction layer, somebody opened an issue on GitHub with the suggestion to also support s6-rc. Although I was already aware that more process/service management solutions exist, s6-rc was a solution that I did not know about.

Recently, I have implemented the suggested s6-rc backend. Although deploying s6-rc services now works quite conveniently, getting to know s6-rc and its companion tools was somewhat challenging for me.

In this blog post, I will elaborate about my learning experiences and explain how the s6-rc backend was implemented.

The s6 tool suite


s6-rc is a software project published on skarnet.org and part of a bigger tool ecosystem. s6-rc is a companion tool of s6: skarnet.org's small & secure supervision software suite.

On Linux and many other UNIX-like systems, the initialization process (typically /sbin/init) is a highly critical program:

  • It is the first program loaded by the kernel and responsible for setting the remainder of the boot procedure in motion. This procedure is responsible for mounting additional file systems, loading device drivers, and starting essential system services, such as SSH and logging services.
  • The PID 1 process supervises all processes that were directly loaded by it, as well as indirect child processes that get orphaned -- when this happens they get automatically adopted by the process that runs as PID 1.

    As explained in an earlier blog post, traditional UNIX services that daemonize on their own, deliberately orphan themselves so that they remain running in the background.
  • When a child process terminates, the parent process must take notice or the terminated process will stay behind as a zombie process.

    Because the PID 1 process is the common ancestor of all other processes, it is required to automatically reap all relevant zombie processes that become a child of it.
  • The PID 1 process runs with root privileges and, as a result, has full access to the system. When the security of the PID 1 process gets compromised, the entire system is at risk.
  • If the PID 1 process crashes, the kernel crashes (and hence the entire system) with a kernel panic.

There are many kinds of programs that you can use as a system's PID 1. For example, you can directly use a shell, such as bash, but it is far more common to use an init system, such as sysvinit or systemd.

According to the author of s6, an init system is made out of four parts:

  1. /sbin/init: the first userspace program that is run by the kernel at boot time (not counting an initramfs).
  2. pid 1: the program that will run as process 1 for most of the lifetime of the machine. This is not necessarily the same executable as /sbin/init, because /sbin/init can exec into something else.
  3. a process supervisor.
  4. a service manager.

In the s6 tool eco-system, most of these parts are implemented by separate tools:

  • The first userspace program: s6-linux-init takes care of the coordination of the initialization process. It does a variety of one-time boot things: for example, it traps the ctrl-alt-del keyboard combination, it starts the shutdown daemon (that is responsible for eventually shutting down the system), and runs the initial boot script (rc.init).

    (As a sidenote: this is almost true -- the /sbin/init process is a wrapper script that "execs" into s6-linux-init with the appropriate parameters).
  • When the initialization is done, s6-linux-init execs into a process called s6-svscan provided by the s6 toolset. s6-svscan's task is to supervise an entire process supervision tree, which I will explain later.
  • Starting and stopping services is done by a separate service manager started from the rc.init script. s6-rc is the most prominent option (that we will use in this blog post), but also other tools can be used.

Many conventional init systems implement most (or sometimes all) of these aspects in a single executable.

In particular, the s6 author is highly critical of systemd: the init system that is widely used by many conventional Linux distributions today -- he dedicated an entire page with criticisms about it.

The author of s6 advocates a number of design principles for his tool eco-system (that systemd violates in many ways):

  • The Unix philosophy: do one job and do it well.
  • Doing less instead of more (preventing feature creep).
  • Keeping tight quality control over every tool by opening up repository access only to small teams (or rather a single person).
  • Integration support: he is against the bazaar approach on project level, but in favor of the bazaar approach on an eco-system level in which everybody can write their own tools that integrate with existing tools.

The concepts implemented by the s6 tool suite were not completely "invented" from scratch. daemontools is what the author considers the ancestor of s6 (if you look at the web page then you will notice that the concept of a "supervision tree" was pioneered there and that some of the tools listed resemble the same tools in the s6 tool suite), and runit its cousin (that is also heavily inspired by daemontools).

A basic usage scenario of s6 and s6-rc


Although it is possible to use Linux distributions in which the init system, supervisor and service manager are all provided by skarnet tools, a subset of s6 and s6-rc can also be used on any Linux distribution and other supported operating systems, such as the BSDs.

Root privileges are not required to experiment with these tools.

For example, with the following command we can use the Nix package manager to deploy the s6 supervision toolset in a development shell session:

$ nix-shell -p s6

In this development shell session, we can start the s6-svscan service as follows:

$ mkdir -p $HOME/var/run/service
$ s6-svscan $HOME/var/run/service

s6-svscan is a service that supervises an entire process supervision tree, including processes that may accidentally become a child of it, such as orphaned processes.

The directory parameter is a scan directory that maintains the configurations of the processes that are currently supervised. So far, no supervised processes have been deployed yet.

We can actually deploy services by using the s6-rc toolset.

For example, I can easily configure my trivial example system used in previous blog posts that consists of one or multiple web application processes (with an embedded HTTP server) returning static HTML pages and an Nginx reverse proxy that forwards requests to one of the web application processes based on the appropriate virtual host header.

Contrary to the other process management solutions that I have investigated earlier, s6-rc does not have an elaborate configuration language. It does not implement a parser (for very good reasons as explained by the author, because it introduces extra complexity and bugs).

Instead, you have to create directories with text files, in which each file represents a configuration property.

With the following command, I can spawn a development shell with all the required utilities to work with s6-rc:

$ nix-shell -p s6 s6-rc execline

The following shell commands create an s6-rc service configuration directory and a configuration for a single webapp process instance:

$ mkdir -p sv/webapp
$ cd sv/webapp

$ echo "longrun" > type

$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
exec $HOME/webapp/bin/webapp
EOF

The above shell script creates a configuration directory for a service named: webapp with the following properties:

  • It creates a service with type: longrun. A long run service deploys a process that runs in the foreground that will get supervised by s6.
  • The run file refers to an executable that starts the service. For s6-rc services it is common practice to implement wrapper scripts using execline: a non-interactive scripting language.

    The execline script shown above loads an environment variable config file with the following content: PORT=5000 (the creation of this file is shown below). This environment variable is used to configure the TCP port number to which the service should bind and then "execs" into a new process that runs the webapp process.

    (As a sidenote: although it is a common habit to use execline for writing wrapper scripts, this is not a hard requirement -- any executable implemented in any language can be used. For example, we could also write the above run wrapper script as a bash script).
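The environment variable file that the run script consumes can be created as follows:

$ cat > $HOME/envfile <<EOF
PORT=5000
EOF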

We can also configure the Nginx reverse proxy service in a similar way:

$ mkdir -p ../nginx
$ cd ../nginx

$ echo "longrun" > type
$ echo "webapp" > dependencies

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/nginx/logs $HOME/var/cache/nginx }
exec $(type -p nginx) "-p" "$HOME/var/nginx" "-c" "$HOME/nginx/nginx.conf" "-g" "daemon off;"
EOF

The above shell script creates a configuration directory for a service named: nginx with the following properties:

  • It again creates a service of type: longrun because Nginx should be started as a foreground process.
  • It declares the webapp service (that we have configured earlier) a dependency, ensuring that webapp is started before nginx. This dependency relationship is important to prevent Nginx from forwarding requests to a non-existent service.
  • The run script first creates all mandatory state directories and finally execs into the Nginx process, with a configuration file using the above state directories, and turning off daemon mode so that it runs in the foreground.

In addition to configuring the above services, we also want to deploy the system as a whole. This can be done by creating bundles that encapsulate collections of services:

mkdir -p ../default
cd ../default

echo "bundle" > type

cat > contents <<EOF
webapp
nginx
EOF

The above shell instructions create a bundle named: default referring to both the webapp and nginx reverse proxy service that we have configured earlier.

Our s6-rc configuration directory structure looks as follows:

$ find ./sv
./sv
./sv/default
./sv/default/contents
./sv/default/type
./sv/nginx
./sv/nginx/dependencies
./sv/nginx/run
./sv/nginx/type
./sv/webapp
./sv/webapp/run
./sv/webapp/type

If we want to deploy the service directory structure shown above, we first need to compile it into a configuration database. This can be done with the following command:

$ mkdir -p $HOME/etc/s6/rc
$ s6-rc-compile $HOME/etc/s6/rc/compiled-1 $HOME/sv

The above command compiles a configuration database in: $HOME/etc/s6/rc/compiled-1 from the service configuration directory: $HOME/sv.

With the following command we can initialize the s6-rc system with our compiled configuration database:

$ s6-rc-init -c $HOME/etc/s6/rc/compiled-1 -l $HOME/var/run/s6-rc \
  $HOME/var/run/service

The above command generates a "live directory" in: $HOME/var/run/s6-rc containing the state of s6-rc.

With the following command, we can start all services in the: default bundle:

$ s6-rc -l $HOME/var/run/s6-rc -u change default

The above command deploys a running system with the following process tree:


As can be seen in the diagram above, the entire process tree is supervised by s6-svscan (the program that we have started first). Every longrun service deployed by s6-rc is supervised by a process named: s6-supervise.
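To check which services are currently active, we can also query s6-rc's live state (the -a flag restricts the output to active services):

$ s6-rc -l $HOME/var/run/s6-rc -a list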

Managing service logging


Another important property of s6 and s6-rc is the way it handles logging. By default, all output that the supervised processes produce on the standard output and standard error are captured by s6-svscan and written to a single log stream (in our case, it will be redirected to the terminal).

When it is desired to capture the output of a service into its own dedicated log file, you need to configure the service in such a way that it writes all relevant information to a pipe. A companion logging service is required to capture the data that is sent over the pipe.

The following command-line instructions modify the webapp service (that we have created earlier) to let it send its output to another service:

$ cd sv
$ mv webapp webapp-srv
$ cd webapp-srv

$ echo "webapp-log" > producer-for
$ cat > run <<EOF
#!$(type -p execlineb) -P

envfile $HOME/envfile
fdmove -c 2 1
exec $HOME/webapp/bin/webapp
EOF

In the script above, we have changed the webapp service configuration as follows:

  • We rename the service from: webapp to webapp-srv. Using a suffix such as -srv is a commonly used convention for s6-rc services that also have a log companion service.
  • With the producer-for property, we specify that the webapp-srv is a service that produces output for another service named: webapp-log. We will configure this service later.
  • We create a new run script that adds the following command: fdmove -c 2 1.

    The purpose of this added instruction is to redirect all output that is sent over the standard error (file descriptor: 2) to the standard output (file descriptor: 1). This redirection makes it possible that all data can be captured by the log companion service.

We can configure the log companion service: webapp-log with the following command-line instructions:

$ mkdir ../webapp-log
$ cd ../webapp-log

$ echo "longrun" > type
$ echo "webapp-srv" > consumer-for
$ echo "webapp" > pipeline-name
$ echo 3 > notification-fd

$ cat > run <<EOF
#!$(type -p execlineb) -P

foreground { mkdir -p $HOME/var/log/s6-log/webapp }
exec -c s6-log -d3 $HOME/var/log/s6-log/webapp
EOF

The service configuration created above does the following:

  • We create a service named: webapp-log that is a long running service.
  • We declare the service to be a consumer for the webapp-srv (earlier, we have already declared the companion service: webapp-srv to be a producer for this logging service).
  • We configure a pipeline name: webapp, causing s6-rc to automatically generate a bundle named: webapp that contains all services involved in the pipeline.

    This generated bundle allows us to always manage the service and logging companion as a single deployment unit.
  • The s6-log service supports readiness notifications. File descriptor: 3 is configured to receive that notification.
  • The run script creates the log directory in which the output should be stored and starts the s6-log service to capture the output and store the data in the corresponding log directory.

    The -d3 parameter instructs it to send a readiness notification over file descriptor 3.

After modifying the configuration files in such a way that each longrun service has a logging companion, we need to compile a new database that provides s6-rc our new configuration:

$ s6-rc-compile $HOME/etc/s6/rc/compiled-2 $HOME/sv

The above command creates a database with a new filename in: $HOME/etc/s6/rc/compiled-2. We are required to give it a new name -- the old configuration database (compiled-1) must be retained to make the upgrade process work.

With the following command, we can upgrade our running configuration:

$ s6-rc-update -l $HOME/var/run/s6-rc $HOME/etc/s6/rc/compiled-2

The result is the following process supervision tree:


As you may observe by looking at the diagram above, every service has a companion s6-log service that is responsible for capturing and storing its output.

The log files of the services can be found in $HOME/var/log/s6-log/webapp and $HOME/var/log/s6-log/nginx.
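s6-log stores the most recent log data in a file named: current inside each log directory, so the output of the webapp service can, for example, be inspected as follows:

$ tail -f $HOME/var/log/s6-log/webapp/current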

One shot services


In addition to longrun services that are useful for managing system services, more aspects need to be automated in a boot process, such as mounting file systems.

These kinds of tasks can be automated with oneshot services, that execute an up script on startup, and optionally, a down script on shutdown.

The following service configuration can be used to mount the kernel's /proc filesystem:

mkdir -p ../mount-proc
cd ../mount-proc

echo "oneshot" > type

cat > up <<EOF
#!$(type -p execlineb) -P
foreground { mount -t proc proc /proc }
EOF
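Optionally, we can also provide a down script that reverts the work of the up script on shutdown. A sketch of such a counterpart for the above service could look as follows (unmounting /proc is usually not necessary in practice):

cat > down <<EOF
#!$(type -p execlineb) -P
foreground { umount /proc }
EOF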

Chain loading


The execline scripts shown in this blog post resemble shell scripts in many ways. One particular aspect that sets execline scripts apart from shell scripts is that all commands make intensive use of a concept called chain loading.

Every instruction in an execline script executes a task, may imperatively modify the environment (e.g. by changing environment variables, or changing the current working directory etc.) and then "execs" into a new chain loading task.

The last parameter of each command-line instruction refers to the command-line instruction that it needs to "exec" into -- typically this command-line instruction is put on the next line.

The execline package, as well as many packages in the s6 ecosystem, contain many programs that support chain loading.

It is also possible to implement custom chain loaders that follow the same protocol.
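To illustrate the idea, the following contrived execlineb script chains a number of these programs together; each command performs its task and then "execs" into the remainder of the script (the final program: env simply prints the resulting environment). The execlineb path in the shebang is an assumption and may differ on your system:

#!/usr/bin/execlineb -P

# run a command in a child process, wait for it, then continue with the rest
foreground { mkdir -p /tmp/example }

# set an environment variable for everything that follows
export GREETING hello

# change the current working directory
cd /tmp/example

# the program that the chain finally "execs" into
env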

Developing s6-rc function abstractions for the Nix process management framework


In the Nix process management framework, I have added function abstractions for each s6-rc service type: longrun, oneshot and bundle.

For example, with the following Nix expression we can generate an s6-rc longrun configuration for the webapp process:

{createLongRunService, writeTextFile, execline, webapp}:

let
  envFile = writeTextFile {
    name = "envfile";
    text = ''
      PORT=5000
    '';
  };
in
createLongRunService {
  name = "webapp";
  run = writeTextFile {
    name = "run";
    executable = true;
    text = ''
      #!${execline}/bin/execlineb -P

      envfile ${envFile}
      fdmove -c 2 1
      exec ${webapp}/bin/webapp
    '';
  };
  autoGenerateLogService = true;
}

Evaluating the Nix expression above does the following:

  • It generates a service directory that corresponds to the: name parameter with a longrun type property file.
  • It generates a run execline script, that uses a generated envFile for configuring the service's port number, redirects the standard error to the standard output and starts the webapp process (that runs in the foreground).
  • The autoGenerateLogService parameter is a concept I introduced myself, to conveniently configure a companion log service, because this is a very common operation -- I cannot think of any scenario in which you do not want to have a dedicated log file for a long running service.

    Enabling this option causes the service to automatically become a producer for the log companion service (having the same name with a -log suffix) and automatically configures a logging companion service that consumes from it.

In addition to constructing long run services from Nix expressions, there are also abstraction functions to create one shots: createOneShotService and bundles: createServiceBundle.

The function that generates a log companion service can also be directly invoked with: createLogServiceForLongRunService, if desired.
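For completeness, a bundle that composes the webapp service and its Nginx reverse proxy could be generated with an expression along the following lines (a sketch: the exact parameter names of createServiceBundle are assumptions on my part):

{createServiceBundle, webapp, nginx}:

createServiceBundle {
  name = "default";
  services = [ webapp nginx ]; # assumed parameter name for the bundle contents
}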

Generating an s6-rc service configuration from a process manager-agnostic configuration


The following Nix expression is a process manager-agnostic configuration for the webapp service, that can be translated to a configuration for any supported process manager in the Nix process management framework:

{createManagedProcess, tmpDir}:
{port, instanceSuffix ? "", instanceName ? "webapp${instanceSuffix}"}:

let
  webapp = import ../../webapp;
in
createManagedProcess {
  name = instanceName;
  description = "Simple web application";
  inherit instanceName;

  process = "${webapp}/bin/webapp";
  daemonArgs = [ "-D" ];

  environment = {
    PORT = port;
  };

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

The Nix expression above specifies the following high-level configuration concepts:

  • The name and description attributes are just meta data. The description property is ignored by the s6-rc generator, because s6-rc has no equivalent configuration property for capturing a description.
  • A process manager-agnostic configuration can specify how the service can be started both as a foreground process and as a process that daemonizes itself.

    In the above example, the process attribute specifies that the same executable needs to be invoked for both a foregroundProcess and daemon. The daemonArgs parameter specifies the command-line arguments that need to be propagated to the executable to let it daemonize itself.

    s6-rc has a preference for managing foreground processes, because these can be more reliably managed. When a foregroundProcess executable can be inferred, the generator will automatically compose a longrun service making it possible for s6 to supervise it.

    If only a daemon can be inferred, the generator will compose a oneshot service that starts the daemon with the up script, and on shutdown, terminates the daemon by dereferencing the PID file in the down script.
  • The environment attribute set parameter is automatically translated to an envfile that the generated run script consumes.
  • Similar to the sysvinit backend, it is also possible to override the generated arguments for the s6-rc backend, if desired.

As already explained in the blog post that covers the framework's concepts, the Nix expression above needs to be complemented with a constructors expression that composes the common parameters of every process configuration and a processes model that constructs process instances that need to be deployed.

The following processes model can be used to deploy a webapp process and an nginx reverse proxy instance that connects to it:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginx = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

With the following command-line instruction, we can automatically create a scan directory and start s6-svscan:

$ nixproc-s6-svscan --state-dir $HOME/var

The --state-dir parameter causes the scan directory to be created in the user's home directory, making unprivileged deployments possible.

With the following command, we can deploy the entire system, that will get supervised by the s6-svscan service that we just started:

$ nixproc-s6-rc-switch --state-dir $HOME/var \
  --force-disable-user-change processes.nix

The --force-disable-user-change parameter prevents the deployment system from creating users and groups and changing user privileges, allowing the deployment as an unprivileged user to succeed.

The result is a running system that allows us to connect to the webapp service via the Nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5000
  </body>
</html>

Constructing multi-process Docker images supervised by s6


Another feature of the Nix process management framework is constructing multi-process Docker images in which multiple process instances are supervised by a process manager of choice.

s6 can also be used as a supervisor in a container. To accomplish this, we can use s6-linux-init as an entry point.

The following attribute generates a skeleton configuration directory:

let
  skelDir = pkgs.stdenv.mkDerivation {
    name = "s6-skel-dir";
    buildCommand = ''
      mkdir -p $out
      cd $out

      cat > rc.init <<EOF
      #! ${pkgs.stdenv.shell} -e
      rl="\$1"
      shift

      # Stage 1
      s6-rc-init -c /etc/s6/rc/compiled /run/service
      
      # Stage 2
      s6-rc -v2 -up change default
      EOF
      
      chmod 755 rc.init
      
      cat > rc.shutdown <<EOF
      #! ${pkgs.stdenv.shell} -e
      
      exec s6-rc -v2 -bDa change
      EOF

      chmod 755 rc.shutdown
      
      cat > rc.shutdown.final <<EOF
      #! ${pkgs.stdenv.shell} -e
      # Empty
      EOF
      chmod 755 rc.shutdown.final
    '';
  };

The skeleton directory generated by the above sub expression contains three configuration files:

  • rc.init is the script that the init system starts, right after starting the supervisor: s6-svscan. It is responsible for initializing the s6-rc system and starting all services in the default bundle.
  • rc.shutdown script is executed on shutdown and stops all previously started services by s6-rc.
  • rc.shutdown.final runs at the very end of the shutdown procedure, after all processes have been killed and all file systems have been unmounted. In the above expression, it does nothing.

In the initialization process of the image (the runAsRoot parameter of dockerTools.buildImage), we need to execute a number of dynamic initialization steps.

First, we must initialize s6-linux-init to read its configuration files from /etc/s6/current using the skeleton directory (that we have configured in the sub expression shown earlier) as its initial contents (the -f parameter) and run the init system in container mode (the -C parameter):

mkdir -p /etc/s6
s6-linux-init-maker -c /etc/s6/current -p /bin -m 0022 -f ${skelDir} -N -C -B /etc/s6/current
mv /etc/s6/current/bin/* /bin
rmdir /etc/s6/current/bin

s6-linux-init-maker generates a /bin/init script that we can use as the container's entry point.

I want the logging services to run as an unprivileged user (s6-log), which requires me to create the user and the corresponding group first:

groupadd -g 2 s6-log
useradd -u 2 -d /dev/null -g s6-log s6-log

We must also compile a database from the s6-rc configuration files, by running the following command-line instructions:

mkdir -p /etc/s6/rc
s6-rc-compile /etc/s6/rc/compiled ${profile}/etc/s6/sv

As can be seen in the rc.init script that we have generated earlier, the compiled database: /etc/s6/rc/compiled is propagated to s6-rc-init as a command-line parameter.
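
For reference, the source directory that s6-rc-compile consumes (${profile}/etc/s6/sv) contains one sub directory per service, following the s6-rc source format. A rough sketch of what the definitions for the webapp process and its logging companion might look like (the exact contents generated by the framework may differ):

webapp-srv/
  type            # contains the word: longrun
  run             # script that starts the webapp process in the foreground
  dependencies    # newline-separated names of services that must be up first
  producer-for    # contains: webapp-log (stdout gets piped to the logging companion)
webapp-log/
  type            # contains the word: longrun
  run             # typically an s6-log invocation that writes to a log directory
  consumer-for    # contains: webapp-srv
  pipeline-name   # contains: webapp (name of the bundle for the whole pipeline)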

With the following Nix expression, we can build an s6-rc managed multi-process Docker image that deploys all the process instances in the processes model that we have written earlier:

let
  pkgs = import <nixpkgs> {};
  system = builtins.currentSystem;

  createMultiProcessImage = import ../../nixproc/create-multi-process-image/create-multi-process-image-universal.nix {
    inherit pkgs system;
    inherit (pkgs) dockerTools stdenv;
  };
in
createMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  exprFile = ./processes.nix;
  stateDir = "/var";
  processManager = "s6-rc";
}

With the following command, we can build the image:

$ nix-build

and load the image into Docker with the following command:

$ docker load -i result
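
After loading the image, we can construct and start a container from it and connect to the web application in the same way as before (the published port and container name are arbitrary choices):

$ docker run -d --name multiprocess -p 8080:8080 multiprocess:test
$ curl -H 'Host: webapp.local' http://localhost:8080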

Discussion


With the addition of the s6-rc backend in the Nix process management framework, we have a modern alternative to systemd at our disposal.

We can easily let services be managed by s6-rc using the same agnostic high-level deployment configurations that can also be used to target other process management backends, including systemd.

What I particularly like about the s6 tool ecosystem (and this also applies to some extent to its ancestor: daemontools, and its cousin project: runit) is the idea of constructing the entire system's initialization process and its sub concerns (process supervision, logging and service management) from separate tools, each having a clear, fixed scope.

This kind of design reminds me of microkernels -- in a microkernel design, the kernel is basically split into multiple collaborating processes each having their own responsibilities (e.g. file systems, drivers).

The microkernel is the only process that has full access to the system and typically only has very few responsibilities (e.g. memory management, task scheduling, interrupt handling).

When a process, such as a driver, crashes, this failure should not tear the entire system down. Systems can even recover from problems by restarting crashed processes.

Furthermore, these non-kernel processes typically have very few privileges. If a process' security gets compromised (such as a leaky driver), the system as a whole will not be affected.

Aside from a number of functional differences compared to systemd, there are also some non-functional differences.

systemd can only be used on Linux with glibc as the system's libc, whereas s6 can also be used on other operating systems (e.g. the BSDs) and with different libc implementations, such as musl.

Moreover, the supervisor service (s6-svscan) can also be used as a user-level supervisor that does not need to run as PID 1. Although systemd supports user sessions (allowing service deployments by unprivileged users), it still requires systemd to be the init system running as the system's PID 1.
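
To illustrate the former: any unprivileged user can start their own supervision tree by pointing s6-svscan at a scan directory that they own -- a minimal sketch, independent of the framework:

$ mkdir -p $HOME/var/run/service
$ s6-svscan $HOME/var/run/service &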

Improvement suggestions


Although the s6 ecosystem provides useful tools and has all kinds of powerful features, I also have a number of improvement suggestions. They are mostly usability related:

  • I have noticed that the command-line tools have very brief help pages -- they only enumerate the available options, but they do not provide any additional information explaining what these options do.

    I have also noticed that there are no official manpages, but there is a third-party initiative that seems to provide them.

    The "official" source of reference is the set of HTML pages. For me personally, it is not always convenient to access HTML pages on limited machines with no Internet connection and/or only terminal access.
  • Although each individual tool is well documented (albeit in HTML), I had quite a few difficulties figuring out how to use them together -- because every tool has a very specific purpose, you typically need to combine them in interesting ways to do something meaningful.

    For example, I could not find any clear documentation on skarnet describing typical combined usage scenarios, such as how to use s6-rc on a conventional Linux distribution that already has a different service management solution.

    Fortunately, I discovered a Linux distribution that turned out to be immensely helpful: Artix Linux. Artix Linux provides s6 as one of its supported process management solutions. I ended up installing Artix Linux in a virtual machine and reading their documentation.

    This lack of clarity seems somewhat analogous to common criticisms of microkernels: one of Linus Torvalds' criticisms is that in microkernel designs, the individual pieces are simplified, but the coordination of the entire system is more difficult.
  • Updating existing service configurations is difficult and cumbersome. Each time I want to change something (e.g. add a new service), I need to compile a new database, make sure that the newly compiled database co-exists with the previous database, and then run s6-rc-update.

    It is very easy to make mistakes. For example, I ended up overwriting the previous database several times. When this happens, the upgrade process gets stuck.

    systemd, on the other hand, allows you to put a new service configuration file in the configuration directory, such as: /etc/systemd/system. We can conveniently reload the configuration with a single command-line instruction:

    $ systemctl daemon-reload
        
    I believe that the updating process can still be somewhat simplified in s6-rc. Fortunately, I have managed to hide that complexity in the nixproc-s6-rc-deploy tool (the manual procedure is sketched after this list).
  • It was also difficult to find out all the available configuration properties for s6-rc services -- I ended up looking at the examples and studying the documentation pages for s6-rc-compile, s6-supervise and service directories.

    I think that it could be very helpful to write a dedicated documentation page that describes all configurable properties of s6-rc services.
  • I believe it is also very common that, for each longrun service (with a -srv suffix), you want a companion logging service (with a -log suffix).

    As a matter of fact, I can hardly think of a situation in which you do not want this. Maybe it would help to introduce a convenience property that automatically generates such logging companion services.
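
To sketch the manual database update procedure mentioned in the list above (the paths are examples; the live state directory defaults to /run/s6-rc): a new database is compiled into a directory that co-exists with the old one, after which s6-rc-update switches the live state over to it:

$ s6-rc-compile /etc/s6/rc/compiled-v2 ${profile}/etc/s6/sv
$ s6-rc-update -l /run/s6-rc /etc/s6/rc/compiled-v2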

Availability


The s6-rc backend described in this blog post is part of the current development version of the Nix process management framework, which is still under heavy development.

The framework can be obtained from my GitHub page.

Thursday, December 31, 2020

Annual blog reflection over 2020

In my previous blog post that I wrote yesterday, I celebrated my blog's 10th anniversary and did a reflection over the last decade. However, I did not elaborate much about 2020.

Because 2020 is a year for the history books, I have decided to also do an annual reflection over the last year (similar to my previous annual blog reflections).

A summary of blog posts written in 2020


Nearly all of the blog posts that I have written this year were in service of only two major goals: developing the Nix process management framework and implementing service container support in Disnix.

Both of them took a substantial amount of development effort. Much more than I initially anticipated.

Investigating process management


I already started working on this topic last year. In November 2019, I wrote a blog post about packaging sysvinit scripts and a Nix-based functional organization for configuring process instances, that could potentially also be applied to other process management solutions, such as systemd and supervisord.

After building the first version of the framework (which was already a substantial leap towards my full objective), I thought it would not take much more time to finish all the details that I had originally planned. It turns out that I heavily underestimated the complexity.

To test my framework, I needed a simple test program that could daemonize on its own, which was (and still is) a common practice for running services on Linux and many other UNIX-like operating systems.

I thought writing such a tool that daemonizes would be easy, but after some research, I discovered that it is actually quite complicated to do it properly. I wrote a blog post about my findings.

It took me roughly three months to finish the first implementation of the process manager-agnostic abstraction layer that makes it possible to write a high-level specification of a running process, that could universally target all kinds of process managers, such as sysvinit, systemd, supervisord and launchd.

After completing the abstraction layer, I also discovered that a sufficiently high-level deployment specification of running processes could also target other kinds of deployment solutions.

I have developed two additional backends for the Nix process management framework: one that uses Docker and another that uses Disnix. Neither of these solutions technically qualifies as a process manager, but they can still be used as such by only using a limited subset of their features.

To be able to develop the Docker backend, I needed to dive deep into the underlying concepts of Docker. I wrote a blog post about the relevant Docker deployment concepts, and also gave a presentation about it at Mendix.

While implementing more examples, I also realized that to more securely run long-running services, they typically need to run as unprivileged users. To get predictable results, these unprivileged users require stable user IDs and group IDs.

Several years ago, I had already worked on a port assigner tool that could assign unique TCP/UDP port numbers to services, so that multiple instances can co-exist.

I have extended the port assigner tool to assign arbitrary numeric IDs to generically solve this problem. It turns out that implementing this tool was much more difficult than expected -- the Dynamic Disnix toolset was originally developed under very high time pressure and had a substantial amount of technical debt.

In order to implement the numeric ID assigner tool, I needed to revise the model parsing libraries, which broke the implementations of some of the deployment planning algorithms.

To fix them, I was forced to study how these algorithms worked again. I wrote a blog post about the graph-based deployment planning algorithms and a new implementation that should be more maintainable. Retrospectively, I wish I had done my homework better at the time I wrote the original implementation.

In September, I gave a talk about the Nix process management framework at NixCon 2020, that was held online.

I pretty much reached all the objectives that I initially set for the Nix process management framework, but there is still some leftover work to bring it to an acceptable usability level -- to be able to more easily add new backends (somebody suggested s6 as an option), the transformation layer needs to be standardized.

Moreover, I still need to develop a test strategy for services so that you can be (reasonably) sure that they work with a variety of process managers and under a variety of conditions (e.g. unprivileged user deployments).

Exposing services as containers in Disnix


Disnix is a Nix-based distributed service deployment tool. Services can basically be any kind of deployment unit whose life-cycle can be managed by a companion tool called Dysnomia.

There is one practical problem though: in order to deploy a service-oriented system with Disnix, it typically requires the presence of already deployed containers (not to be confused with Linux containers), which are environments in which services are managed by another service.

Some examples of container providers and corresponding services are:

  • The MySQL DBMS (as a container) and multiple hosted MySQL databases (as services)
  • Apache Tomcat (as a container) and multiple hosted Java web applications (as services)
  • systemd (as a container) and multiple hosted systemd unit configuration files (as services)

Disnix deploys the services (as described above), but not the containers. These need to be deployed by other means first.
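
To give an impression of what this looks like (a minimal, hypothetical sketch, not a complete model; customPkgs and the ./pkgs path are placeholders): in a Disnix services model, each service declares the type of container that should host it:

{ system, pkgs, distribution, invDistribution }:

let
  # hypothetical package set providing the service packages
  customPkgs = import ./pkgs { inherit pkgs system; };
in
rec {
  testdb = {
    name = "testdb";
    pkg = customPkgs.testdb; # a package that builds the database schema
    type = "mysql-database"; # gets deployed to a machine that provides a MySQL container
  };
}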

In the past, I have been working on solutions that manage the underlying infrastructure of services as well (I typically used to call this problem domain: infrastructure deployment). For example, NixOps can deploy a network of NixOS machines that also expose container services that can be used by Disnix. It is also possible to deploy the containers as services, in a separate deployment layer managed by Disnix.

When the Nix process management framework became more usable, I wanted to make the deployment of container providers also a more accessible feature. I heavily revised Disnix with a new feature that makes it possible to expose services as container providers, making it possible to deploy both the container services and application services from a single deployment model.

To make this feature work reliably, I was again forced to revise the model transformation pipeline. This time I concluded that the lack of references in the Nix expression language was an impediment.

Another nice benefit of combining the Nix process management framework and Disnix is that you can more easily deploy a heterogeneous system locally.

I have released a new version of Disnix: version 0.10, that provides all these new features.

The Monitoring playground


Besides working on the two major topics shown above, the only other thing I did was a Mendix crafting project in which I developed a monitoring playground, allowing me to locally experiment with alerting scripts, including the visualization and testing.

Some thoughts


From a blogging perspective, I am happy with what I have accomplished this year -- not only have I managed to reach my usual level of productivity again (last year was somewhat disappointing), I also managed to develop a working Nix process management framework (making it possible to use all kinds of process managers) and to use Disnix to deploy both container and application services. Both of these features have been on my wish list for many years.

In the Nix community, having the ability to use process managers other than systemd is something we have been discussing since late 2014.

However, there are also two major things that kept me mentally occupied in the last year.

Open source work


Many blog posts are about the open source work I do. Some of my open source work is done as part of my daytime job as a software engineer -- sometimes we can write a new useful feature, make an extension to an existing project that may come in handy, or do something as part of an investigation/learning project.

However, the majority of my open source work is done in my spare time -- in many cases, my motivation is not as altruistic as people may think: typically I need something to solve my own problems or there is some technical concept that I would like to explore. However, I still do a substantial amount of work to help other people or for the "greater good".

Open source projects are typically quite satisfying to work on, but they also have negative aspects (typically negligible in the early stages of a project). Sadly, as projects get more popular and gain more exposure, the negativity attached to them also grows.

For example, although I got quite a few positive reactions on my work on the Nix process management framework (especially at NixCon 2020), I know that not everybody is or will be happy about it.

I have worked with people in the past who consider this kind of work a complete waste of time -- in their opinion, we already have Kubernetes, which has already solved all relevant service management problems (some people even think it is a solution to all problems).

I have to admit that, while Kubernetes can be used to solve similar kinds of problems (and what is not supported as a first-class feature can still be scripted in ad-hoc ways), there is still much to think about:

  • As explained in my blog post about Docker concepts, the Nix store typically supports much more efficient sharing of common dependencies between packages than layered Docker images, resulting in much lower disk space consumption and RAM consumption of processes that have common dependencies.
  • Docker containers support the deployment of so-called microservices because of a common denominator: processes. Almost all modern operating systems and programming languages have a notion of processes.

    As a consequence, lots of systems nowadays typically get constructed in such a way that they can be easily decomposed into processes (translating to container instances), imposing substantial overhead on each process instance (because these containers typically need to embed common sub services).

    Services can also be more efficiently deployed (in terms of storage and RAM usage) as units managed by a common runtime (e.g. multiple Java web applications managed by Apache Tomcat or multiple PHP applications managed by the Apache HTTP server).

    The latter form of reuse is now slowly disappearing, because it does not fit nicely in a container model. In Disnix, this form of reuse is a first-class concept.
  • Microservices managed by Docker (somewhat) support technology diversity, because of the fact that all major programming languages support the concept of processes.

    However, one particular kind of technology that you cannot choose is the operating system -- Docker/Kubernetes relies on non-standardized Linux-only concepts.

    I have also been thinking about the option to pick your operating system as well: if you need security, pick OpenBSD; if you want performance, pick Linux, etc. The Nix process management framework allows you to also target process managers on operating systems other than Linux, such as BSD rc scripts and Apple's launchd.

I personally believe that these goals are still important, and that keeps me motivated to work on it.

Furthermore, I also believe that it is important to have multiple implementations of tools that solve the same or similar kinds of problems -- in the open source world, there are a lot of "battles" between communities about which technology should be the standard for a certain problem.

My favourite example of such a battle is the system's process manager -- many Linux distributions nowadays have adopted systemd, but this is not without any controversy, such as in the Debian project.

It took them many years to come to the decision to adopt it, and still there are people who want to discuss "init system diversity". Likewise, there are people who find the systemd-adoption decision unacceptable, and have forked Debian into Devuan, providing a Debian-based distribution without systemd.

With the Nix process management framework the fact that systemd exists (and may not be everybody's first choice) is not a big deal -- you can actually switch to other solutions, if desired. A battle between service managers is not required. A sufficiently high-level specification of a well understood problem allows you to target multiple solutions.

Another problem I face is that these two projects are not the only projects I have been working on or maintain. There are many other projects I have been working on in the past.

Sadly, I am also a very bad multitasker. If there are problems reported with my other projects, and the fix is straightforward, or there is a straightforward pull request, then it is typically not a big deal to respond.

However, I also learned that for some of the problems other people face, there is no quick fix. Sometimes I get pull requests that partially solve a problem, or in other cases fix a specific problem but break other features. These pull requests cannot always be directly accepted and also need a substantial amount of my time to review.

For certain kinds of reported problems, I need to work on a fundamental revision that requires a substantial amount of development effort -- however, it is impossible to pick up such a task while working on another "major project".

Alternatively, I need to make the decision to abandon what I am currently working on and make the switch. However, this option also does not have my preference because I know it will significantly delay my original goal.

I have noticed that lots of people get dissatisfied and frustrated, including myself. Moreover, I also consider it a bad thing to feel pressure on the things I am working on in my spare time.

So what to do about it? Maybe I can write a separate blog post on this subject.

Anyway, I was not planning to abandon or stop anything. Eventually, I will pick up these other problems as well -- my strategy for now, is to do it when I am ready. People simply have to wait (so if you are reading this and waiting for something: yes, I will pick it up eventually, just be patient).

The COVID-19 crisis


The other subject that kept me (and pretty much everybody in the world) busy is the COVID-19 crisis.

I still remember the beginning of 2020 -- for me personally, it started out very well. I visited some friends that I have not seen in a long time, and then FOSDEM came, the Free and Open Source Developers European Meeting.

Already in January, I heard on the news about this new virus that was rapidly spreading in the Wuhan region. At that time, nobody in the Netherlands (or in Europe) was really worried yet. Even to questions, such as: "what will happen when it reaches Europe?", people typically responded with: "ah yes, well influenza has an impact on people too, it will not be worse!".

A few weeks later, it started to spread to countries close to Europe. The first problematic country I heard about was Iran, and a couple of weeks later it reached Italy. In Italy, it spread so rapidly that within only a few weeks, the intensive care capacity was completely drained, forcing medical personnel to make choices about who could be helped and who could not.

By then, it sounded pretty serious to me. Furthermore, I was already quite sure that it was only a matter of time before it would reach the Netherlands. And indeed, at the end of February, the first COVID-19 case was reported. Apparently this person contracted the virus in Italy.

Then the spreading went quickly -- every day, more and more COVID-19 cases were reported and this number grew exponentially. Similar to other countries, we also slowly ran into capacity problems in hospitals (materials, equipment, personnel, intensive care capacity etc.). In particular, the intensive care capacity reached a very critical level. Fortunately, there were hospitals in Germany willing to help us out.

In March, a country-wide lockdown was announced -- basically all group activities were forbidden, schools and non-essential shops were closed, and everybody who is capable of working from home should work from home. As a consequence, since March, I have been permanently working from home.

As with pretty much everybody in the world, COVID-19 had negative consequences for me as well. Fortunately, I do not have much to complain about -- I did not become unemployed, I did not get sick, and nobody in my direct surroundings ran into any serious problems.

The biggest implication of the COVID-19 pandemic for me is social contacts -- despite the lockdown I still regularly meet up with family and frequent acquaintances, but I have barely met any new people. For example, at Mendix, I typically came in contact with all kinds of people in the company, especially those that do not work in the same team.

Moreover, I also learned that quite a few of my contacts got isolated because all group activities were canceled -- for example, I have not had any music rehearsals in a while, so I have not seen or spoken to any of my friends there.

Same thing with conferences and meet ups -- because most of them were canceled or turned into online events, it is very difficult to have good interactions with new people.

I also did not do any traveling -- my summer holiday was basically a staycation. Fortunately, in the summer, we have managed to minimize the amount of infections, making it possible to open up public places. I visited some touristic places in the Netherlands, that are normally crowded by people from abroad. That by itself was quite interesting -- I normally tend to neglect national touristic sites.

Although the COVID-19 pandemic brought all kinds of negative things, there were also a couple of things that I consider a good thing:

  • At Mendix, we have an open office space that typically tends to be very crowded and noisy. It is not that I cannot work in such an environment, but I also realize that I do appreciate silence, especially for programming tasks that require concentration. At home, it is quiet, I have much fewer distractions and I also typically feel much less tired after a busy work day.
  • I also typically used to neglect home improvements a lot. The COVID-19 crisis helped me to finally prioritize some non-urgent home improvements tasks -- for example, on the attic, where my musical instruments are stored, I finally took the time to organize everything in such a way that I can rehearse conveniently.
  • Besides the fact that rehearsals and concerts were cancelled, I actually practiced a lot -- I even studied many advanced solo pieces that I have not looked at in years. Playing music became a standard activity between my programming tasks, to clear my mind. Normally, I would use this time to talk to people at the coffee machine in the office.
  • During busy times, I also tended to neglect housekeeping tasks a lot. I still remember (many years ago), when I had just moved into my first house, that doing the dishes was already a problem (I had no dishwasher at that time). When working from home, it is not a problem to keep everything tidy.
  • It is also much easier to maintain healthy daily habits. In the first lockdown (that was in spring), cycling/walking/running was a daily routine that I could maintain with ease.

In the Netherlands, we have managed to overcome the first lockdown in just a matter of weeks by social distancing. Sadly, after the restrictions were relaxed we got sloppy and at the end of the summer the infection rate started to grow. We also ran into all kinds of problems to mitigate the infections -- limited test capacity, people who got tired of all the countermeasures not following the rules, illegal parties etc.

For a couple of weeks now, we have been in our second lockdown with a comparable level of strictness -- again, the schools and non-essential shops are closed, etc. The second lockdown feels a lot worse than the first -- now it is winter, people are no longer motivated (the number of people revolting in the Netherlands has grown substantially, including people spreading the news that everything is a hoax and/or one big scheme organized by left-wing politicians), and it is already taking much longer than the first.

Fortunately, there is a tiny light at the end of the tunnel. In Europe, one vaccine (the Pfizer vaccine) has been approved and more submissions are pending (with good results). By Monday, the authorities will start to vaccinate people in the Netherlands.

If we can keep the infection rates and the mutations under control (such as the mutation that appeared in England) then we will eventually build up the required group immunity to finally get the situation under control (this probably is going to take many more months, but at least it is a start).

Conclusion


This elaborate reflection blog post (which is considerably longer than all my previous yearly reflections combined) reflects on 2020, a year that will probably not go unnoticed in the history books.

I hope everybody remains in good health and stays motivated to do what is needed to get the virus under control.

Moreover, when the crisis is over, I also hope we can retain the positive things learned in this crisis, such as making it more of a habit to allow people to work (at least partially) from home. The open source-related complaints in this blog post are just minor inconveniences compared to the COVID-19 crisis and the impact it has on many people in the world.

The final thing I would like to say is:


HAPPY NEW YEAR!!!!!!!!!!!!