Tuesday, June 1, 2021

An unconventional method for creating backups and exchanging files


I have written many blog posts about software deployment and configuration management. For example, a couple of years ago, I discussed a very basic configuration management process for small organizations, in which I explained that one of the worst things that could happen is that a machine breaks down and everything it provides gets lost.

Fortunately, good configuration management practices and deployment tools (such as Nix) can help you to restore a machine's configuration with relative ease.

Another problem is managing a machine's data, which in many ways is even more important and complicated -- software packages can typically be obtained from a variety of sources, but data is typically unique (and therefore more valuable).

Even if a machine stays operational, the data that it stores can still be at risk -- it may get deleted by accident, or corrupted (for example, by the user or by a hardware problem).

It also does not matter whether a machine is used for business (for example, storing data for information systems) or personal use (for example, documents, pictures, and audio files). In both cases, data is valuable, and as a result, needs to be protected from loss and corruption.

In addition to recovery, the availability of data is often also very important -- many users (including me) own multiple devices (e.g. a desktop PC, a laptop and a phone) and want access to the same data from multiple places.

Because of the importance of data, I sometimes get questions from non-technical users that want to know how I manage my personal data (such as documents, images and audio files) and what tools I would recommend.

Similar to most computer users, I too have faced my own share of reliability problems -- of all the desktop computers I owned, I ended up with a completely broken hard drive three times, and a completely broken laptop once. Furthermore, I have also worked with all kinds of external media (e.g. floppy disks, CD-ROMs etc.) each having their own share of reliability problems.

To cope with data availability and loss, I came up with a custom script that I have been conveniently using to create backups and synchronize my data between the machines that I use.

In this blog post, I will explain how this script works.

About storage media


To cope with the potential loss of data, I have always made it a habit to transfer data to external media. I have worked with a variety of them, each having their advantages and disadvantages:

  • In the old days, I used floppy disks. Most people who are, at the time of reading this blog post, in their early twenties or younger probably have no clue what I am talking about (for those people, perhaps the 'Save icon' used in many desktop applications looks familiar).

    Roughly 25 years ago, floppy disks were a common means to exchange data between computers.

    Although they were common, they had many drawbacks. Probably the biggest drawback was their limited storage capacity -- I used to own 5.25 inch disks that (on PCs) were capable of storing ~360 KiB (if both sides were used), and the sturdier 3.5 inch disks providing double density (720 KiB) and high density (1.44 MiB) capacities.

    Furthermore, floppy disks were also quite slow and could be easily damaged, for example, by touching the magnetic surface.
  • When I switched from the Commodore Amiga to the PC, I also used tapes for a while in addition to floppy disks. They provided a substantial amount of storage capacity (~500 MiB in 1996). As of 2019 (and this probably still applies today), tapes are still considered very cheap and reliable media for archiving data.

    What I found impractical about tapes is that they are difficult to use as random-access storage -- data on a tape is stored sequentially. As a consequence, it is typically very slow to find files or to "update" existing files. Typically, a backup tool needs to scan the tape from beginning to end or maintain a database of known storage locations.

    Many of my personal files (such as documents) are regularly updated and older versions do not have to be retained. Instead, they should be removed to clear up storage space. With tapes this is very difficult to do.
  • When writable CDs/DVDs became affordable, I used them as backup media for a while. Similar to tapes, they also have a substantial storage capacity. Furthermore, they are very fast and convenient to read.

    A similar disadvantage is that they are not a very convenient medium for updating files. Although it is possible to write multi-session discs, in which files can be added, overwritten, or made invisible (essentially a "soft delete"), it remained inconvenient because you cannot reclaim the storage space that a deleted file used to occupy.

    I also learned the hard way that writable discs (and in particular rewritable discs) are not very reliable for long term storage -- I have discarded many old writable discs (10 years or older) that can no longer be read.

Nowadays, I use a variety of USB storage devices (such as memory sticks and hard drives) as backup media. They are relatively cheap, fast, have more than enough storage capacity, and I can use them as random-access storage -- it is no problem at all to update and delete existing data.

To cope with the potential breakage of USB storage media, I always make sure that I have at least two copies of my important data.

About data availability


As already explained in the introduction, I have multiple devices for which I want the same data to be available. For example, on both my desktop PC and company laptop, I want to have access to my music and research papers collection.

A possible solution is to use a shared storage medium, such as a network drive. The advantage of this approach is that there is a single source of truth and I only need to maintain a single data collection -- when I add a new document it will immediately be available to both devices.

Although a network drive may be a possible solution, it is not a good fit for my use cases -- I typically use laptops for traveling. When I am not at home, I can no longer access my data stored on the network drive.

Another solution is to transfer all required files to the hard drive on my laptop. Doing a bulk transfer for the first time is typically not a big problem (in particular, if you use orthodox file managers), but keeping collections of files up-to-date between machines is in my experience quite tedious to do by hand.

Automating data synchronization


For both backing up files and synchronizing them to other machines, I need to regularly compare and update files in directories. In the former case, I need to sync data between local directories, and in the latter case, between a local directory and a directory on a remote machine.

Each time I want to make updates to my files, I want to inspect what has changed and see which files require updating before actually doing it, so that I do not waste time or risk modifying the wrong files.

Initially, I started to investigate how to implement a synchronization tool myself, but quite quickly I realized that there is already a tool available that is quite suitable for the job: rsync.

rsync is designed to efficiently transfer and synchronize files between drives and machines across networks by comparing the modification times and sizes of files.

The only thing that I consider a drawback is that it is not fully optimized to conveniently automate my personal workflow -- to accomplish what I want, I need to memorize all the relevant rsync command-line options and run multiple command-line instructions.
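
To illustrate, a manual backup round with plain rsync roughly boils down to the following two commands (a sketch only, assuming a local Documents folder and a mounted backup drive -- the exact options that gitlike-rsync passes to rsync may differ):

$ # preview the changes without modifying anything on the backup drive
$ rsync -av --delete --dry-run /home/sander/Documents/ /media/MyBackupDrive/Documents/

$ # if the preview looks good, repeat the command without --dry-run
$ rsync -av --delete /home/sander/Documents/ /media/MyBackupDrive/Documents/

Remembering these options, and keeping the source and destination in the right order for every directory and machine combination, is exactly the kind of repetition that I wanted to automate.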

To alleviate this problem, I have created a custom script that evolved into a tool that I have named: gitlike-rsync.

Usage


gitlike-rsync is a tool that facilitates synchronisation of file collections between directories on local or remote machines using rsync and a workflow that is similar to managing Git projects.

Making backups


For example, if we have a data directory that we want to back up to another partition (for example, one on an external USB drive), we can open the directory:

$ cd /home/sander/Documents

and configure a destination directory, such as a directory on a backup drive (/media/MyBackupDrive/Documents):

$ gitlike-rsync destination-add /media/MyBackupDrive/Documents

By running the following command-line instruction, we can create a backup of the Documents folder:

$ gitlike-rsync push
sending incremental file list
.d..tp..... ./
>f+++++++++ bye.txt
>f+++++++++ hello.txt

sent 112 bytes  received 25 bytes  274.00 bytes/sec
total size is 10  speedup is 0.07 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..tp..... ./
>f+++++++++ bye.txt
              4 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/3)
>f+++++++++ hello.txt
              6 100%    5.86kB/s    0:00:00 (xfr#2, to-chk=0/3)

sent 202 bytes  received 57 bytes  518.00 bytes/sec
total size is 10  speedup is 0.04

The output above shows me the following:

  • When no additional command-line parameters have been provided, the script will first do a dry run and show the user what it intends to do. In the above example, it shows me that it wants to transfer the contents of the Documents folder that consists of only two files: hello.txt and bye.txt.
  • After providing my confirmation, the files in the destination directory will be updated -- the backup drive that is mounted on /media/MyBackupDrive.

I can conveniently make updates in my documents folder and update my backups.

For example, I can add a new file to the Documents folder named: greeting.txt, and run the push command again:

$ gitlike-rsync push
sending incremental file list
.d..t...... ./
>f+++++++++ greeting.txt

sent 129 bytes  received 22 bytes  302.00 bytes/sec
total size is 19  speedup is 0.13 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..t...... ./
>f+++++++++ greeting.txt
              9 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/4)

sent 182 bytes  received 38 bytes  440.00 bytes/sec
total size is 19  speedup is 0.09

In the above output, only the greeting.txt file is transferred to the backup partition, leaving the other files untouched, because they have not changed.

Restoring files from a backup


In addition to the push command, gitlike-rsync also supports a pull command that can be used to sync data from the configured destination folders. The pull command can be used as a means to restore data from a backup partition.

For example, if I accidentally delete a file from the Documents folder:

$ rm hello.txt

and run the pull command:

$ gitlike-rsync pull
sending incremental file list
.d..t...... ./
>f+++++++++ hello.txt

sent 137 bytes  received 22 bytes  318.00 bytes/sec
total size is 19  speedup is 0.12 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..t...... ./
>f+++++++++ hello.txt
              6 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/4)

sent 183 bytes  received 38 bytes  442.00 bytes/sec
total size is 19  speedup is 0.09

the script is able to detect that hello.txt was removed and restore it from the backup partition.

Synchronizing files between machines in a network


In addition to local directories, which are useful for backups, the gitlike-rsync script can also be used in a similar way to exchange files between machines, such as my desktop PC and office laptop.

With the following command-line instruction, I can automatically clone the Documents folder from my desktop PC to the Documents folder on my office laptop:

$ gitlike-rsync clone sander@desktop-pc:/home/sander/Documents

The above command connects to my desktop PC over SSH and retrieves the content of the Documents/ folder. It will also automatically configure the destination directory to synchronize with the Documents folder on the desktop PC.
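
Under the hood, such a transfer presumably boils down to rsync running over SSH, roughly comparable to the following manual invocation (a sketch only; the actual options that gitlike-rsync uses may differ):

$ rsync -av sander@desktop-pc:/home/sander/Documents/ /home/sander/Documents/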

When new documents have been added on the desktop PC, I just have to run the following command on my office laptop to update it:

$ gitlike-rsync pull

I can also modify the contents of the Documents folder on my office laptop and synchronize the changed files to my desktop PC with a push:

$ gitlike-rsync push

About versioning


As explained in the beginning of this blog post, in addition to the recovery of failing machines and equipment, another important reason to create backups is to protect yourself against accidental modifications.

Although gitlike-rsync can detect and display file changes, it does not do versioning of any kind. This feature is deliberately left unimplemented, for very good reasons.

For most of my personal files (e.g. images, audio, video) I do not need any versioning. As soon as they are organized, they are not supposed to be changed.

However, for certain kinds of files I do need versioning, such as software development projects. Whenever I need versioning, my answer is very simple: I use the "ordinary" Git, even for projects that are private and not supposed to be shared on a public hosting service, such as GitHub.

As seasoned Git users probably already know, you can turn any local directory into a Git repository by running:

$ git init

The above command creates a local .git folder that tracks and stores changes locally.

When cloning a repository from a public hosting service, such as GitHub, a remote named origin is automatically configured, making it possible to push changes to and pull changes from GitHub.

It is also possible to synchronize Git changes between arbitrary computers using a private SSH connection. I can, for example, configure a remote for a private repository, as follows:

$ git remote add origin sander@desktop-pc:/home/sander/Development/private-project

The above command configures the Git project that is stored in the /home/sander/Development/private-project directory on my desktop PC as a remote.

I can pull changes from the remote repository, by running:

$ git pull origin

and push locally stored changes, by running:

$ git push origin

As you may have already noticed, the above workflow is very similar to the document exchange workflow shown earlier in this blog post.

What about backing up private Git repositories? To do this, I typically create tarballs of the Git project directories and sync them to my backup media with gitlike-rsync. The presence of the .git folder suffices to retain a project's history.
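
A minimal sketch of that backup step, assuming a private project stored in a Development directory that has been configured with a backup destination (similar to the Documents example shown earlier), could look as follows:

$ cd /home/sander/Development
$ tar -czf private-project.tar.gz private-project/
$ gitlike-rsync push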

Conclusion


In this blog post, I have described gitlike-rsync, a simple opinionated wrapper script for exchanging files between local directories (for backups) and remote directories (for data exchange between machines).

As its name implies, it builds heavily on top of rsync for efficient data exchange, and on the concepts of Git as an inspiration for the workflow.

I have been conveniently using this script for over ten years, and it works extremely well for my own use cases and a variety of operating systems (Linux, Windows, macOS and FreeBSD).

My solution is obviously not rocket science -- my contribution is only the workflow automation. The "true credits" should go to the developers of rsync and Git.

I also have to thank the COVID-19 crisis that allowed me to finally find the time to polish the script, document it and give it a name. In the Netherlands, as of today, there are still many restrictions, but the situation is slowly getting better.

Availability


I have added the gitlike-rsync script described in this blog post to my custom-scripts repository that can be obtained from my GitHub page.

Monday, April 26, 2021

A test framework for the Nix process management framework

As already explained in many previous blog posts, the Nix process management framework adds new ideas to earlier service management concepts explored in Nixpkgs and NixOS:

  • It makes it possible to deploy services on any operating system that can work with the Nix package manager, including conventional Linux distributions, macOS and FreeBSD. It also works on NixOS, but NixOS is not a requirement.
  • It allows you to construct multiple instances of the same service, by using constructor functions that identify conflicting configuration parameters. These constructor functions can be invoked in such a way that these configuration properties no longer conflict.
  • We can target multiple process managers from the same high-level deployment specifications. These high-level specifications are automatically translated to parameters for a target-specific configuration function for a specific process manager.

    It is also possible to override or augment the generated parameters, to work with configuration properties that are not universally supported.
  • There is a configuration option that conveniently allows you to disable user changes, making it possible to deploy services as an unprivileged user.

Although the above features are interesting, one particular challenge is that the framework cannot guarantee that all possible variations will work after writing a high-level process configuration. The framework facilitates code reuse, but it is not a write once, run anywhere approach.

To make it possible to validate multiple service variants, I have developed a test framework built on top of the NixOS test driver, which makes it possible to deploy and test a network of NixOS QEMU virtual machines with very little storage and RAM overhead.

In this blog post, I will describe how the test framework can be used.

Automating tests


Before developing the test framework, I was mostly testing all my packaged services manually. Because a manual test process is tedious and time consuming, I did not have any test coverage for anything but the most trivial example services. As a result, I frequently ran into many configuration breakages.

Typically, when I want to test a process instance, or a system that is composed of multiple collaborative processes, I perform the following steps:

  • First, I need to deploy the system for a specific process manager and configuration profile, e.g. for a privileged or unprivileged user, in an isolated environment, such as a virtual machine or container.
  • Then I need to wait for all process instances to become available. Readiness checks are critical and typically more complicated than expected -- for most services, there is a time window between a successful invocation of a process and its availability to carry out its primary task, such as accepting network connections. Executing tests before a service is ready typically results in errors (a simple ad-hoc check is sketched after this list).

    Although there are process managers that can deal with this problem (e.g. systemd has the sd_notify protocol, and s6 has its own protocol and an sd_notify wrapper), the lack of a standardized and widely adopted protocol still requires me to implement readiness checks manually.

    (As a sidenote: the only readiness check protocol that is standardized is for traditional System V services that daemonize on their own. The calling parent process should terminate almost immediately, but only after the spawned daemon child process has notified it that it is ready.

    As described in an earlier blog post, this notification aspect is more complicated to implement than I thought. Moreover, not all traditional System V daemons follow this protocol.)
  • When all process instances are ready, I can check whether they properly carry out their tasks, and whether the integration of these processes works as expected.
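
To illustrate what such a readiness check amounts to, the following ad-hoc shell loop polls a TCP service until it accepts connections (a sketch only, assuming a web application listening on port 5000, as in the example below -- the test framework described in this blog post relies on the NixOS test driver's wait_for_open_port function instead):

$ # keep polling until the webapp instance responds on its TCP port
$ while ! curl --fail --silent http://localhost:5000/ >/dev/null; do sleep 1; done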

An example


I have developed a Nix function: testService that automates the above process using the NixOS test driver -- I can use this function to create a test suite for systems that are made out of running processes, such as the webapps example described in my previous blog posts about the Nix process management framework.

The example system consists of a number of webapp processes with an embedded HTTP server returning HTML pages displaying their identities. Nginx reverse proxies forward incoming connections to the appropriate webapp processes by using their corresponding virtual host header values:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, libDir ? "${stateDir}/lib"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  sharedConstructors = import ../../../examples/services-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir cacheDir libDir tmpDir forceDisableUserChange processManager;
  };

  constructors = import ../../../examples/webapps-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
    webappMode = null;
  };
in
rec {
  webapp1 = rec {
    port = 5000;
    dnsName = "webapp1.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
  };

  webapp2 = rec {
    port = 5001;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
  };

  webapp3 = rec {
    port = 5002;
    dnsName = "webapp3.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "3";
    };
  };

  webapp4 = rec {
    port = 5003;
    dnsName = "webapp4.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "4";
    };
  };

  nginx = rec {
    port = if forceDisableUserChange then 8080 else 80;
    webapps = [ webapp1 webapp2 webapp3 webapp4 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
    } {};
  };

  webapp5 = rec {
    port = 5004;
    dnsName = "webapp5.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "5";
    };
  };

  webapp6 = rec {
    port = 5005;
    dnsName = "webapp6.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "6";
    };
  };

  nginx2 = rec {
    port = if forceDisableUserChange then 8081 else 81;
    webapps = [ webapp5 webapp6 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
      instanceSuffix = "2";
    } {};
  };
}

The processes model shown above (processes-advanced.nix) defines the following process instances:

  • There are six webapp process instances, each running an embedded HTTP service, returning HTML pages with their identities. The dnsName property specifies the DNS domain name value that should be used as a virtual host header to make the forwarding from the reverse proxies work.
  • There are two nginx reverse proxy instances. The former: nginx forwards incoming connections to the first four webapp instances. The latter: nginx2 forwards incoming connections to webapp5 and webapp6.

With the following command, I can connect to webapp2 through the first nginx reverse proxy:

$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5001
  </body>
</html>

Creating a test suite


I can create a test suite for the web application system as follows:

{ pkgs, testService, processManagers, profiles }:

testService {
  exprFile = ./processes.nix;

  readiness = {instanceName, instance, ...}:
    ''
      machine.wait_for_open_port(${toString instance.port})
    '';

  tests = {instanceName, instance, ...}:
    pkgs.lib.optionalString (instanceName == "nginx" || instanceName == "nginx2")
      (pkgs.lib.concatMapStrings (webapp: ''
        machine.succeed(
            "curl --fail -H 'Host: ${webapp.dnsName}' http://localhost:${toString instance.port} | grep ': ${toString webapp.port}'"
        )
      '') instance.webapps);

  inherit processManagers profiles;
}

The Nix expression above invokes testService with the following parameters:

  • processManagers refers to a list of names of all the process managers that should be tested.
  • profiles refers to a list of configuration profiles that should be tested. Currently, it supports privileged for privileged deployments, and unprivileged for unprivileged deployments in an unprivileged user's home directory, without changing user permissions.
  • The exprFile parameter refers to the processes model of the system: processes-advanced.nix shown earlier.
  • The readiness parameter refers to a function that does a readiness check for each process instance. In the above example, it checks whether each service is actually listening on the required TCP port.
  • The tests parameter refers to a function that executes tests for each process instance. In the above example, it ignores all but the nginx instances, because explicitly testing a webapp instance is a redundant operation.

    For each nginx instance, it checks whether all webapp instances can be reached from it, by running the curl command.

The readiness and tests functions take the following parameters: instanceName identifies the process instance in the processes model, and instance refers to the attribute set containing its configuration.

Furthermore, they can refer to global process model configuration parameters:

  • stateDir: The directory in which state files are stored (typically /var for privileged deployments)
  • runtimeDir: The directory in which runtime files are stored (typically /var/run for privileged deployments).
  • forceDisableUserChange: Indicates whether to disable user changes (for unprivileged deployments) or not.

In addition to writing tests that work on instance level, it is also possible to write tests on system level, with the following parameters (not shown in the example):

  • initialTests: instructions that run right after deploying the system, but before the readiness checks and instance-level tests.
  • postTests: instructions that run after the instance-level tests.

The above functions also accept the same global configuration parameters, as well as a processes parameter that refers to the entire processes model.

We can also configure other properties useful for testing:

  • systemPackages: installs additional packages into the system profile of the test virtual machine.
  • nixosConfig: defines a NixOS module with configuration properties that will be added to the NixOS configuration of the test machine.
  • extraParams: propagates additional parameters to the processes model.

Composing test functions


The Nix expression above is not self-contained. It is a function definition that needs to be invoked with all required parameters including all the process managers and profiles that we want to test for.

We can compose tests in the following Nix expression:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, processManagers ? [ "supervisord" "sysvinit" "systemd" "disnix" "s6-rc" ]
, profiles ? [ "privileged" "unprivileged" ]
}:

let
  testService = import ../../nixproc/test-driver/universal.nix {
    inherit system;
  };
in
{

  nginx-reverse-proxy-hostbased = import ./nginx-reverse-proxy-hostbased {
    inherit pkgs processManagers profiles testService;
  };

  docker = import ./docker {
    inherit pkgs processManagers profiles testService;
  };

  ...
}

The above partial Nix expression (default.nix) invokes the function defined in the previous Nix expression that resides in the nginx-reverse-proxy-hostbased directory and propagates all required parameters. It also composes other test cases, such as docker.

The parameters of the composition expression allow you to globally configure all the desired service variants:

  • processManagers allows you to select the process managers you want to test for.
  • profiles allows you to select the configuration profiles.

With the following command, we can test our system as a privileged user, using systemd as a process manager:

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.systemd

We can also run the same test, but this time as an unprivileged user:

$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.systemd

In addition to systemd, any configured process manager that works on NixOS can be used. The following command runs a privileged test of the same service for sysvinit:

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.sysvinit

Results


With the test driver in place, I have managed to expand my repository of example services, provided test coverage for them and fixed quite a few bugs in the framework caused by regressions.

Below is a screenshot of Hydra: the Nix-based continuous integration service showing an overview of test results for all kinds of variants of a service:


So far, the following services work multi-instance, with multiple process managers, and (optionally) as an unprivileged user:

  • Apache HTTP server. In the services repository, there are multiple constructors for deploying an Apache HTTP server: to deploy static web applications or dynamic web applications with PHP, and to use it as a reverse proxy (via HTTP and AJP) with HTTP basic authentication optionally enabled.
  • Apache Tomcat.
  • Nginx. For Nginx we also have multiple constructors. One to deploy a configuration for serving static web apps, and two for setting up reverse proxies using paths or virtual hosts to forward incoming requests to the appropriate services.

    The reverse proxy constructors can also generate configurations that will cache the responses of incoming requests.
  • MySQL/MariaDB.
  • PostgreSQL.
  • InfluxDB.
  • MongoDB.
  • OpenSSH.
  • svnserve.
  • xinetd.
  • fcron. By default, the fcron user and group are hardwired into the executable. To facilitate unprivileged user deployments, we automatically create a package build override to propagate the --with-run-non-privileged configuration flag so that it can run as unprivileged user. Similarly, for multiple instances we create an override to use a different user and group that does not conflict with the primary instance.
  • supervisord.
  • s6-svscan.

The following service also works with multiple instances and multiple process managers, but not as an unprivileged user:


The following services work with multiple process managers, but not multi-instance or as an unprivileged user:

  • D-Bus
  • Disnix
  • nix-daemon
  • Hydra

In theory, the above services could be adjusted to work as an unprivileged user, but doing so is not very useful -- for example, the nix-daemon's purpose is to facilitate multi-user package deployments. As an unprivileged user, you only want to facilitate package deployments for yourself.

Moreover, the multi-instance aspect is IMO also not very useful to explore for these services. For example, I cannot think of a useful scenario in which two Hydra instances run next to each other.

Discussion


The test framework described in this blog post is an important feature addition to the Nix process management framework -- it allowed me to package more services and fix quite a few bugs caused by regressions.

I can now finally show that it is doable to package services and make them work under nearly all possible conditions that the framework supports (e.g. multiple instances, multiple process managers, and unprivileged user installations).

The only limitation of the test framework is that it is not operating system agnostic -- the NixOS test driver (which serves as its foundation) only works, as its name implies, with NixOS, which itself is a Linux distribution. As a result, we cannot automatically test bsdrc scripts, launchd daemons, and cygrunsrv services.

In theory, it is also possible to make a more generalized test driver that works with multiple operating systems. The NixOS test driver is a combination of ideas (e.g. a shared Nix store between the host and guest system, an API to control QEMU, and an API to manage services). We could also dissect these ideas and run them on conventional QEMU VMs running different operating systems (with the Nix package manager).

Although making a more generalized test driver is interesting, it is beyond the scope of the Nix process management framework (which is about managing process instances, not entire systems).

Another drawback is that while it is possible to test all possible service variants on Linux, it may be very expensive to do so.

However, full process manager coverage is often not required to get a reasonable level of confidence. For many services, it typically suffices to implement the following strategy:

  • Pick two process managers: one that prefers foreground processes (e.g. supervisord) and one that prefers daemons (e.g. sysvinit). This is the most significant difference (from a configuration perspective) between all these different process managers.
  • If a service supports multiple configuration variants, and multiple instances, then create a processes model that concurrently deploys all these variants.

Implementing the above strategy only requires you to test four variants, providing a high degree of certainty that it will work with all other process managers as well.
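
For the nginx example shown earlier, following this strategy roughly amounts to building the following four test attributes (a sketch -- the exact attribute names depend on how the composition expression is configured):

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.supervisord
$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.supervisord
$ nix-build -A nginx-reverse-proxy-hostbased.privileged.sysvinit
$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.sysvinit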

Future work


Most of the interesting functionality required to work with the Nix process management framework is now implemented. I still need to make more changes to improve its robustness and to "dog food" it on as many of my own use cases as possible.

Moreover, the docker backend still requires a bit more work to make it more usable.

Eventually, I intend to work on an RFC that proposes upstreaming the interesting bits of the framework into Nixpkgs.

Availability


The Nix process management framework repository as well as the example services repository can be obtained from my GitHub page.

Friday, March 12, 2021

Using the Nix process management framework as an infrastructure deployment solution for Disnix

As explained in many previous blog posts, I have developed Disnix as a solution for automating the deployment of service-oriented systems -- it deploys heterogeneous systems, that consist of many different kinds of components (such as web applications, web services, databases and processes) to networks of machines.

The deployment models for Disnix are typically not fully self-contained. Foremost, a precondition that must be met before a service-oriented system can be deployed is that all target machines in the network require the presence of the Nix package manager, Disnix, and a remote connectivity service (e.g. SSH).

For multi-user Disnix installations, in which the user does not have super-user privileges, the Disnix service is required to carry out deployment operations on behalf of a user.

Moreover, the services in the services model typically need to be managed by other services, called containers in Disnix terminology (not to be confused with Linux containers).

Examples of container services are:

  • The MySQL DBMS container can manage multiple databases deployed by Disnix.
  • The Apache Tomcat servlet container can manage multiple Java web applications deployed by Disnix.
  • systemd can act as a container that manages multiple systemd units deployed by Disnix.

Managing the life-cycles of services in containers (such as activating or deactivating them) is done by a companion tool called Dysnomia.

In addition to Disnix, these container services also typically need to be deployed in advance to the target machines in the network.

The problem domain that Disnix works in is called service deployment, whereas the deployment of machines (bare metal or virtual machines) and the container services is called infrastructure deployment.

Disnix can be complemented with a variety of infrastructure deployment solutions:

  • NixOps can deploy networks of NixOS machines, both physical and virtual machines (in the cloud), such as Amazon EC2.

    As part of a NixOS configuration, the Disnix service can be deployed that facilitates multi-user installations. The Dysnomia NixOS module can expose all relevant container services installed by NixOS as container deployment targets.
  • disnixos-deploy-network is a tool that is included with the DisnixOS extension toolset. Since services in Disnix can be any kind of deployment unit, it is also possible to deploy an entire NixOS configuration as a service. This tool is mostly developed for demonstration purposes.

    A limitation of this tool is that it cannot instantiate virtual machines and bootstrap Disnix.
  • Disnix itself. The above solutions are all based on NixOS, a Linux-based software distribution that is fully managed by the Nix package manager.

    Although NixOS is very powerful, it has two drawbacks for Disnix:

    • NixOS uses the NixOS module system for configuring system aspects. It is very powerful, but you can only deploy one instance of each system service -- Disnix can also work with multiple container instances of the same type on a machine.
    • Services in NixOS cannot be deployed to other kinds of software distributions, such as conventional Linux distributions, or to other operating systems, such as macOS and FreeBSD.

    To overcome these limitations, Disnix can also be used as a container deployment solution on any operating system that is capable of running Nix and Disnix. Services deployed by Disnix can automatically be exposed as container providers.

    Similar to disnixos-deploy-network, a limitation of this approach is that it cannot be used to bootstrap Disnix.

Last year, I also added a major new feature to Disnix that makes it possible to deploy both application and container services in the same Disnix deployment models, minimizing the infrastructure deployment problem -- the only requirement is to have machines with Nix, Disnix, and a remote connectivity service (such as SSH) pre-installed on them.

Although this integrated feature is quite convenient, in particular for test setups, a separated infrastructure deployment process (that includes container services) still makes sense in many scenarios:

  • The infrastructure parts and service parts can be managed by different people with different specializations. For example, configuring and tuning an application server is a different responsibility than developing a Java web application.
  • The service parts typically change more frequently than the infrastructure parts. As a result, they typically have different kinds of update cycles.
  • The infrastructure components can typically be reused between projects (e.g. many systems use a database backend such as PostgreSQL or MySQL), whereas the service components are typically very project specific.

I also realized that my other project: the Nix process management framework can serve as a partial infrastructure deployment solution -- it can be used to bootstrap Disnix and deploy container services.

Moreover, it can also deploy multiple instances of container services and can be used on any operating system that the Nix process management framework supports, including conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

Deploying and exposing the Disnix service with the Nix process management framework


As explained earlier, to allow Disnix to deploy services to a remote machine, a machine needs to have Disnix installed (and run the Disnix service for a multi-user installation), and be remotely connectible, e.g. through SSH.

I have packaged all required services as constructor functions for the Nix process management framework.

The following process model captures the configuration of a basic multi-user Disnix installation:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-bare.nix then (import ./ids-bare.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model (processes.nix) captures three process instances:

  • sshd is the OpenSSH server that makes it possible to remotely connect to the machine by using the SSH protocol.
  • dbus-daemon runs a D-Bus system daemon, that is a requirement for the Disnix service. The disnix-service is propagated as a parameter, so that its service directory gets added to the D-Bus system daemon configuration.
  • disnix-service is a service that executes deployment operations on behalf of an authorized unprivileged user. The disnix-service has a dependency on the dbus-daemon service, making sure that the latter gets activated first.

We can deploy the above configuration on a machine that has the Nix process management framework already installed.

For example, to deploy the configuration on a machine that uses supervisord, we can run:

$ nixproc-supervisord-switch processes.nix

Resulting in a system that consists of the following running processes:

$ supervisorctl 
dbus-daemon                      RUNNING   pid 2374, uptime 0:00:34
disnix-service                   RUNNING   pid 2397, uptime 0:00:33
sshd                             RUNNING   pid 2375, uptime 0:00:34

As may be noticed, the above supervised services correspond to the processes in the processes model.

On the coordinator machine, we can write a bootstrap infrastructure model (infra-bootstrap.nix) that only contains connectivity settings:

{
  test1.properties.hostname = "192.168.2.1";
}

and use the bootstrap model to capture the full infrastructure model of the system:

$ disnix-capture-infra infra-bootstrap.nix

resulting in the following configuration:

{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}
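
The captured configuration can be stored in a file so that it can be passed to subsequent deployment commands. Assuming that the tool prints the configuration to the standard output (as shown above), this can be done with a simple redirection:

$ disnix-capture-infra infra-bootstrap.nix > infrastructure.nix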

Despite the fact that we have not configured any containers explicitly, the above configuration (infrastructure.nix) already exposes a number of container services:

  • The echo, fileset and process container services are built-in container providers that any Dysnomia installation includes.

    The process container can be used to automatically deploy services that daemonize. Services that daemonize themselves do not require the presence of any external service.
  • The supervisord-program container refers to the process supervisor that manages the services deployed by the Nix process management framework. It can also be used as a container for processes deployed by Disnix.

With the above infrastructure model, we can deploy any system that depends on the above container services, such as the trivial Disnix proxy example:

{ system, distribution, invDistribution, pkgs
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "supervisord"
, nix-processmgmt ? ../../../nix-processmgmt
}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit system pkgs stateDir logDir runtimeDir tmpDir forceDisableUserChange processManager nix-processmgmt;
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  processType = import "${nix-processmgmt}/nixproc/derive-dysnomia-process-type.nix" {
    inherit processManager;
  };
in
rec {
  hello_world_server = rec {
    name = "hello_world_server";
    port = ids.ports.hello_world_server or 0;
    pkg = customPkgs.hello_world_server { inherit port; };
    type = processType;
    requiresUniqueIdsFor = [ "ports" ];
  };

  hello_world_client = {
    name = "hello_world_client";
    pkg = customPkgs.hello_world_client;
    dependsOn = {
      inherit hello_world_server;
    };
    type = "package";
  };
}

The services model shown above (services.nix) captures two services:

  • The hello_world_server service is a simple service that listens on a TCP port for a "hello" message and responds with a "Hello world!" message.
  • The hello_world_client service is a package providing a client executable that automatically connects to the hello_world_server.

With the following distribution model (distribution.nix), we can map all the services to our deployment machine (that runs the Disnix service managed by the Nix process management framework):

{infrastructure}:

{
  hello_world_client = [ infrastructure.test1 ];
  hello_world_server = [ infrastructure.test1 ];
}

and deploy the system by running the following command:

$ disnix-env -s services-without-proxy.nix \
  -i infrastructure.nix \
  -d distribution.nix \
  --extra-params '{ processManager = "supervisord"; }'

The last parameter: --extra-params configures the services model (that indirectly invokes the createManagedProcess abstraction function from the Nix process management framework) in such a way that supervisord configuration files are generated.

(As a sidenote: without the --extra-params parameter, the process instances will be built for the disnix process manager generating configuration files that can be deployed to the process container, expecting programs to daemonize on their own and leave a PID file behind with the daemon's process ID. Although this approach is convenient for experiments, because no external service is required, it is not as reliable as managing supervised processes).

The result of the above deployment operation is that the hello-world-server service is deployed as a service that is also managed by supervisord:

$ supervisorctl 
dbus-daemon                      RUNNING   pid 2374, uptime 0:09:39
disnix-service                   RUNNING   pid 2397, uptime 0:09:38
hello-world-server               RUNNING   pid 2574, uptime 0:00:06
sshd                             RUNNING   pid 2375, uptime 0:09:39

and we can use the hello-world-client executable on the target machine to connect to the service:

$ /nix/var/nix/profiles/disnix/default/bin/hello-world-client 
Trying 192.168.2.1...
Connected to 192.168.2.1.
Escape character is '^]'.
hello
Hello world!

Deploying container providers and exposing them


With Disnix, it is also possible to deploy systems that are composed of different kinds of components, such as web services and databases.

For example, the Java variant of the ridiculous Staff Tracker example consists of the following services:


The services in the diagram above have the following purpose:

  • The StaffTracker service is the front-end web application that shows an overview of staff members and their locations.
  • The StaffService service is a web service with a SOAP interface that provides read and write access to the staff records. The staff records are stored in the staff database.
  • The RoomService service provides read access to the room records, which are stored in a separate rooms database.
  • The ZipcodeService service provides read access to zip codes, which are stored in a separate zipcodes database.
  • The GeolocationService infers the location of a staff member from its IP address using the GeoIP service.

To deploy the system shown above, we need a target machine that provides Apache Tomcat (for managing the web application front-end and web services) and MySQL (for managing the databases) as container provider services:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.simpleAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The processes model above is an extension of the previous processes model, adding two container provider services:

  • tomcat is the Apache Tomcat server. The constructor function: simpleAppservingTomcat composes a configuration for a supported process manager, such as supervisord.

    Moreover, it bundles a Dysnomia container configuration file, and a Dysnomia module: tomcat-webapplication that can be used to manage the life-cycles of Java web applications embedded in the servlet container.
  • mysql is the MySQL DBMS server. The constructor function also creates a process manager configuration file, and bundles a Dysnomia container configuration file and module that manages the life-cycles of databases.
  • The container services above are propagated as containerProviders to the disnix-service. This function parameter is used to update the search paths for container configuration and modules, so that services can be deployed to these containers by Disnix.

After deploying the above processes model, we should see the following infrastructure model after capturing it:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the tomcat-webapplication and mysql-database containers (with their relevant configuration properties) were added to the infrastructure model.

With the following command we can deploy the example system's services to the containers in the network:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

resulting in a fully functional system:


Deploying multiple container provider instances


As explained in the introduction, a limitation of the NixOS module system is that it is only possible to construct one instance of a service on a machine.

Process instances in a processes model deployed by the Nix process management framework, as well as services in a Disnix services model, are instantiated from functions. These functions make it possible to deploy multiple instances of the same service to the same machine by making conflicting properties configurable.

The following processes model was modified from the previous example to deploy two MySQL servers and two Apache Tomcat servers to the same machine:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql-multi-instance.nix then (import ./ids-tomcat-mysql-multi-instance.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-primary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-primary";
    httpPort = 8080;
    httpsPort = 8443;
    serverPort = 8005;
    ajpPort = 8009;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-secondary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-secondary";
    httpPort = 8081;
    httpsPort = 8444;
    serverPort = 8006;
    ajpPort = 8010;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-primary = containerProviderConstructors.mysql {
    instanceSuffix = "-primary";
    port = 3306;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-secondary = containerProviderConstructors.mysql {
    instanceSuffix = "-secondary";
    port = 3307;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat-primary tomcat-secondary mysql-primary mysql-secondary ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

In the above processes model, we made the following changes:

  • We have configured two Apache Tomcat instances: tomcat-primary and tomcat-secondary. Both instances can co-exist because they have been configured in such a way that they listen on unique TCP ports and have a unique instance name composed from the instanceSuffix (a quick check is sketched after this list).
  • We have configured two MySQL instances: mysql-primary and mysql-secondary. Similar to Apache Tomcat, they can both co-exist because they listen on unique TCP ports (3306 and 3307) and have a unique instance name.
  • Both the primary and secondary instances of the above services are propagated to the disnix-service (with the containerProviders parameter) making it possible for a client to discover them.
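
Once the above processes model has been deployed, a quick sanity check (a sketch, assuming the bundled Tomcat example applications are served by both instances) is to verify that each Apache Tomcat instance responds on its own HTTP port:

$ curl --silent --head http://localhost:8080/ # tomcat-primary
$ curl --silent --head http://localhost:8081/ # tomcat-secondary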

After deploying the above processes model, we can run the following command to discover the machine's configuration:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication-primary = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat-primary";
      };
      tomcat-webapplication-secondary = {
        "tomcatPort" = "8081";
        "catalinaBaseDir" = "/var/tomcat-secondary";
      };
      mysql-database-primary = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-primary/mysqld.sock";
      };
      mysql-database-secondary = {
        "mysqlPort" = "3307";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-secondary/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the infrastructure model contains two Apache Tomcat instances and two MySQL instances.

With the following distribution model (distribution.nix), we can divide each database and web application over the two container instances:

{infrastructure}:

{
  GeolocationService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  RoomService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  StaffService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  StaffTracker = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  ZipcodeService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  rooms = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
  staff = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-secondary";
      }
    ];
  };
  zipcodes = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
}

Compared to the previous distribution model, the above model uses a more verbose notation for mapping services.

As explained in an earlier blog post, in deployments in which only a single container is deployed, services are automapped to the container that has the same name as the service's type. When multiple instances exist, we need to manually specify the container where the service needs to be deployed to.
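
As a reminder, the concise notation for a single-container deployment simply maps each service to a list of target machines and derives the container from the service's type. A hypothetical fragment (a sketch, not part of the example code) could look as follows:

{infrastructure}:

{
  GeolocationService = [ infrastructure.test1 ];
  rooms = [ infrastructure.test1 ];
}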

After deploying the system with the following command:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

we will get a running system with the following deployment architecture:


Using the Disnix web service for executing remote deployment operations


By default, Disnix uses SSH to communicate with the target machines in the network. Disnix has a modular architecture and can also communicate with target machines by other means, for example via NixOps, the backdoor client, D-Bus, or by directly executing tasks on a local machine.

There is also an external package, DisnixWebService, that remotely exposes all deployment operations through a web service with a SOAP API.

To use the DisnixWebService, we must deploy a Java servlet container (such as Apache Tomcat) with the DisnixWebService application, configured in such a way that it can connect to the disnix-service over the D-Bus system bus.

The following processes model is an extension of the non-multi containers Staff Tracker example, with an Apache Tomcat service that bundles the DisnixWebService:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.disnixAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    enableAJP = true;
    inherit dbus-daemon;

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  apache = {
    pkg = constructors.basicAuthReverseProxyApache {
      dependency = tomcat;
      serverAdmin = "admin@localhost";
      targetProtocol = "ajp";
      portPropertyName = "ajpPort";

      authName = "DisnixWebService";
      authUserFile = pkgs.stdenv.mkDerivation {
        name = "htpasswd";
        buildInputs = [ pkgs.apacheHttpd ];
        buildCommand = ''
          htpasswd -cb ./htpasswd admin secret
          mv htpasswd $out
        '';
      };
      requireUser = "admin";
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
      authorizedUsers = [ tomcat.name ];
      dysnomiaProperties = {
        targetEPR = "http://$(hostname)/DisnixWebService/services/DisnixWebService";
      };
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model contains the following changes:

  • The Apache Tomcat process instance is constructed with the containerProviderConstructors.disnixAppservingTomcat constructor function, which automatically deploys the DisnixWebService and provides the required configuration settings so that it can communicate with the disnix-service over the D-Bus system bus.

    Because the DisnixWebService requires the presence of the D-Bus system daemon, it is configured as a dependency of Apache Tomcat, ensuring that it is started before Apache Tomcat.
  • Connecting to the Apache Tomcat server (including the DisnixWebService) requires no authentication. To secure the web applications and the DisnixWebService, I have configured an Apache reverse proxy that forwards connections to Apache Tomcat using the AJP protocol.

    Moreover, the reverse proxy protects incoming requests by using HTTP basic authentication requiring a username and password.

We can use the following bootstrap infrastructure model to discover the machine's configuration:

{
  test1.properties.targetEPR = "http://192.168.2.1/DisnixWebService/services/DisnixWebService";
}

The difference between this bootstrap infrastructure model and the previous one is that it uses a different connection property (targetEPR) that refers to the URL of the DisnixWebService.

By default, Disnix uses the disnix-ssh-client to communicate to target machines. To use a different client, we must set the following environment variables:

$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ export DISNIX_TARGET_PROPERTY=targetEPR

The above environment variables instruct Disnix to use the disnix-soap-client executable and the targetEPR property from the infrastructure model as a connection string.

To authenticate ourselves, we must set the following environment variables with a username and password:

$ export DISNIX_SOAP_CLIENT_USERNAME=admin
$ export DISNIX_SOAP_CLIENT_PASSWORD=secret

The following command makes it possible to discover the machine's configuration using the disnix-soap-client and DisnixWebService:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
      "targetEPR" = "http://192.168.2.1/DisnixWebService/services/DisnixWebService";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
        "ajpPort" = "8009";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

After capturing the full infrastructure model, we can deploy the system with disnix-env if desired, using the disnix-soap-client to carry out all necessary remote deployment operations.

Miscellaneous: using Docker containers as light-weight virtual machines


As explained earlier in this blog post, the Nix process management framework is only a partial infrastructure deployment solution -- you still need to somehow obtain physical or virtual machines with a software distribution running the Nix package manager.

In a blog post written some time ago, I have explained that Docker containers are not virtual machines or even light-weight virtual machines.

In my previous blog post, I have shown that we can also deploy mutable Docker multi-process containers in which process instances can be upgraded without stopping the container.

The deployment workflow for upgrading mutable containers is very machine-like -- NixOS has a similar workflow that consists of updating the machine configuration (/etc/nixos/configuration.nix) and running a single command-line instruction to upgrade the machine (nixos-rebuild switch).
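
For reference, the NixOS counterpart of this workflow roughly consists of the following two steps (a generic sketch):

$ $EDITOR /etc/nixos/configuration.nix
$ nixos-rebuild switch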

We can actually start using containers as VMs by adding another ingredient to the mix -- we can also assign static IP addresses to Docker containers.

With the following Nix expression, we can create a Docker image for a mutable container, using any of the processes models shown previously as the "machine's configuration":

let
  pkgs = import <nixpkgs> {};

  createMutableMultiProcessImage = import ../nix-processmgmt/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "disnix";
  tag = "test";
  contents = [ pkgs.mc pkgs.disnix ];
  exprFile = ./processes.nix;
  interactive = true;
  manpages = true;
  processManager = "supervisord";
}

The exprFile parameter in the above Nix expression refers to a previously shown processes model, and processManager specifies the desired process manager to use, such as supervisord.

With the following command, we can build the image with Nix and load it into Docker:

$ nix-build
$ docker load -i result

With the following command, we can create a network to which our containers (with IP addresses) should belong:

$ docker network create --subnet=192.168.2.0/24 disnixnetwork

The above command creates the disnixnetwork network with the 192.168.2.0/24 subnet, leaving an 8-bit block for host IP addresses.
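
We can verify the result with the standard docker network inspect command, which shows (among other properties) the subnet that was allocated:

$ docker network inspect disnixnetwork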

We can create and start a Docker container named: containervm using our previously built image, and assign it an IP address:

$ docker run --network disnixnetwork --ip 192.168.2.1 \
  --name containervm disnix:test

By default, Disnix uses SSH to connect to remote machines. With the following commands we can create a public-private key pair and copy the public key to the container:

$ ssh-keygen -t ed25519 -f id_test -N ""

$ docker exec containervm mkdir -m0700 -p /root/.ssh
$ docker cp id_test.pub containervm:/root/.ssh/authorized_keys
$ docker exec containervm chmod 600 /root/.ssh/authorized_keys
$ docker exec containervm chown root:root /root/.ssh/authorized_keys

On the coordinator machine that carries out the deployment, we must add the private key to the SSH agent and configure the disnix-ssh-client to connect to the disnix-service:

$ ssh-add id_test
$ export DISNIX_REMOTE_CLIENT=disnix-client

By executing all these steps, containervm can be (mostly) used as if it were a virtual machine, including connecting to it with an IP address over SSH.
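
For example, capturing the container's configuration over SSH should work in the same way as shown earlier in this blog post (a sketch that assumes a bootstrap infrastructure model referring to the container's static IP address):

$ cat > infra-bootstrap.nix <<EOF
{
  test1.properties.hostname = "192.168.2.1";
}
EOF
$ disnix-capture-infra infra-bootstrap.nix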

Conclusion


In this blog post, I have described how the Nix process management framework can be used as a partial infrastructure deployment solution for Disnix. It can be used both for deploying the disnix-service (to facilitate multi-user installations) as well as deploying container providers: services that manage the life-cycles of services deployed by Disnix.

Moreover, the Nix process management framework makes it possible to do these deployments on all kinds of software distributions that can use the Nix package manager, including NixOS, conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

If I had developed this solution a couple of years ago, it would probably have saved me many hours of preparation work for the first demo in my NixCon 2015 talk, in which I wanted to demonstrate that it is possible to deploy services to a heterogeneous network that consists of a NixOS, Ubuntu and Windows machine. Back then, I had to do all the infrastructure deployment tasks manually.

I also have to admit (although this statement is mostly based on my personal preferences, not facts) that I find the functional style that the framework uses far more intuitive than the NixOS module system for certain service configuration aspects, especially for configuring container services and exposing them with Disnix and Dysnomia:

  • Because every process instance is constructed from a constructor function that makes all instance parameters explicit, you are guarded against common configuration errors such as undeclared dependencies.

    For example, the DisnixWebService-enabled Apache Tomcat service requires access to the dbus-daemon service that provides the system bus. Not having this service in the processes model causes a missing function parameter error.
  • Function parameters in the processes model make it more clear that a process depends on another process and what that relationship may be. For example, with the containerProviders parameter it becomes IMO really clear that the disnix-service uses them as potential deployment targets for services deployed by Disnix.

    In comparison, the implementations of the Disnix and Dysnomia NixOS modules are far more complicated and monolithic -- the Dysnomia module has to figure out all potential container services deployed as part of a NixOS configuration, derive their properties, convert them to Dysnomia configuration files, and adjust the systemd configuration of the disnix-service for proper activation ordering.

    The wants parameter (used for activation ordering) is just a list of strings, with no guarantee that it contains valid references to services that have already been deployed.

Availability


The constructor functions for the services as well as the deployment examples described in this blog post can be found in the Nix process management services repository.

Future work


Slowly more and more of my personal use cases are getting supported by the Nix process management framework.

Moreover, the services repository is steadily growing. To ensure that all the services that I have packaged so far do not break, I really need to focus my work on a service test solution.

Wednesday, February 24, 2021

Deploying mutable multi-process Docker containers with the Nix process management framework (or running Hydra in a Docker container)

In a blog post written several months ago, I have shown that the Nix process management framework can also be used to conveniently construct multi-process Docker images.

Although Docker is primarily used for managing single root application process containers, multi-process containers can sometimes be useful to deploy systems that consist of multiple, tightly coupled processes.

The Docker manual has a section that describes how to construct images for multi-process containers, but IMO the configuration process is a bit tedious and cumbersome.

To make this process more convenient, I have built a wrapper function: createMultiProcessImage around the dockerTools.buildImage function (provided by Nixpkgs) that does the following:

  • It constructs an image that runs a Linux and Docker compatible process manager as an entry point. Currently, it supports supervisord, sysvinit, disnix and s6-rc.
  • The Nix process management framework is used to build a configuration for a system that consists of multiple processes, that will be managed by any of the supported process managers.

Although the framework makes the construction of multi-process images convenient, a big drawback of multi-process Docker containers is upgrading them -- for example, for Debian-based containers you can imperatively upgrade packages by connecting to the container:

$ docker exec -it mycontainer /bin/bash

and upgrade the desired packages, such as file:

$ apt install file

The upgrade instruction above is not reproducible -- apt may install file version 5.38 today, and 5.39 tomorrow.

To cope with these kinds of side-effects, Docker works with images that snapshot the outcomes of all the installation steps. Constructing a container from the same image will always provide the same versions of all dependencies.

As a consequence, to perform a reproducible container upgrade, it is required to construct a new image, discard the container and reconstruct the container from the new image version, causing the system as a whole to be terminated, including the processes that have not changed.

For a while, I have been thinking about this limitation and developed a solution that makes it possible to upgrade multi-process containers without stopping and discarding them. The only exception is the process manager.

To make deployments reproducible, it combines the reproducibility properties of Docker and Nix.

In this blog post, I will describe how this solution works and how it can be used.

Creating a function for building mutable Docker images


As explained in an earlier blog post that compares the deployment properties of Nix and Docker, both solutions support reproducible deployment, albeit for different application domains.

Moreover, their reproducibility properties are built around different concepts:

  • Docker containers are reproducible, because they are constructed from images that consist of immutable layers identified by hash codes derived from their contents.
  • Nix package builds are reproducible, because they are stored in isolation in a Nix store and made immutable (the files' permissions are set read-only). In the construction process of the packages, many side effects are mitigated.

    As a result, when the hash code prefix of a package (derived from all build inputs) is the same, then the build output is also (nearly) bit-identical, regardless of the machine on which the package was built.

By taking these reproducibility properties into account, we can create a reproducible deployment process for upgradable containers by using a specific separation of responsibilities.

Deploying the base system


For the deployment of the base system that includes the process manager, we can stick to the traditional Docker deployment workflow based on images (the only unconventional aspect is that we use Nix to build a Docker image, instead of Dockerfiles).

The process manager that the image provides deploys its configuration from a dynamic configuration directory.

To support supervisord, we can invoke the following command as the container's entry point:

supervisord --nodaemon \
  --configuration /etc/supervisor/supervisord.conf \
  --logfile /var/log/supervisord.log \
  --pidfile /var/run/supervisord.pid

The above command starts the supervisord service (in foreground mode), using the supervisord.conf configuration file stored in /etc/supervisor.

The supervisord.conf configuration file has the following structure:

[supervisord]

[include]
files=conf.d/*

The above configuration automatically loads all program definitions stored in the conf.d directory. This directory is writable and initially empty. It can be populated with configuration files generated by the Nix process management framework.
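
To give an impression of what such a generated configuration file may roughly look like, consider the following hypothetical program definition (the store path is a placeholder; the real files are generated by the framework):

; hypothetical conf.d/webapp.conf sketch
[program:webapp]
command=/nix/store/<hash>-webapp/bin/webapp
autostart=true
autorestart=true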

For the other process managers that the framework supports (sysvinit, disnix and s6-rc), we follow a similar strategy -- we configure the process manager in such a way that the configuration is loaded from a source that can be dynamically updated.

Deploying process instances


Deployment of the process instances is not done during the construction of the image, but by the Nix process management framework and the Nix package manager running inside the container.

To allow a processes model deployment to refer to packages in the Nixpkgs collection and install binary substitutes, we must configure a Nix channel, such as the unstable Nixpkgs channel:

$ nix-channel --add https://nixos.org/channels/nixpkgs-unstable
$ nix-channel --update

(As a sidenote: it is also possible to subscribe to a stable Nixpkgs channel or a specific Git revision of Nixpkgs).
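
For example, subscribing to a stable channel could be done as follows (assuming that the 20.09 stable release is desired):

$ nix-channel --add https://nixos.org/channels/nixos-20.09 nixpkgs
$ nix-channel --update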

The processes model (and relevant sub models, such as ids.nix that contains the numeric ID assignments) is copied into the Docker image.

We can deploy the processes model for supervisord as follows:

$ nixproc-supervisord-switch

The above command will deploy the processes model referenced by the NIXPROC_PROCESSES environment variable, which defaults to /etc/nixproc/processes.nix:

  • First, it builds supervisord configuration files from the processes model (this step also includes deploying all required packages and service configuration files)
  • It creates symlinks for each configuration file belonging to a process instance in the writable conf.d directory
  • It instructs supervisord to reload the configuration so that only obsolete processes get deactivated and new services activated, causing unchanged processes to remain untouched.

(For the other process managers, we have equivalent tools: nixproc-sysvinit-switch, nixproc-disnix-switch and nixproc-s6-rc-switch).
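
The reload step (the last step in the list above) is roughly comparable to manually running the following supervisorctl commands; the nixproc-supervisord-switch tool performs the equivalent operations for us:

$ supervisorctl reread # detect added, changed and removed program definitions in conf.d
$ supervisorctl update # (re)start only the affected programs, leaving the others untouched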

Initial deployment of the system


Because only the process manager is deployed as part of the image (with an initially empty configuration), the system is not yet usable when we start a container.

To solve this problem, we must perform an initial deployment of the system on first startup.

I used my lessons learned from the chainloading techniques in s6 (described in the previous blog post) and developed a hacky generated bootstrap script (/bin/bootstrap) that serves as the container's entry point:

cat > /bin/bootstrap <<EOF
#! ${pkgs.stdenv.shell} -e

# Configure Nix channels
nix-channel --add ${channelURL}
nix-channel --update

# Deploy the processes model (in a child process)
nixproc-${input.processManager}-switch &

# Overwrite the bootstrap script, so that it simply just
# starts the process manager the next time we start the
# container
cat > /bin/bootstrap <<EOR
#! ${pkgs.stdenv.shell} -e
exec ${cmd}
EOR

# Chain load the actual process manager
exec ${cmd}
EOF
chmod 755 /bin/bootstrap

The generated bootstrap script does the following:

  • First, a Nix channel is configured and updated so that we can install packages from the Nixpkgs collection and obtain substitutes.
  • The next step is deploying the processes model by running the nixproc-*-switch tool for a supported process manager. This process is started in the background (as a child process) -- we can use this trick to force the managing bash shell to load our desired process supervisor as soon as possible.

    Ultimately, we want the process manager to become responsible for supervising any other process running in the container.
  • After the deployment process is started in the background, the bootstrap script is overridden by a bootstrap script that becomes our real entry point -- the process manager that we want to use, such as supervisord.

    Overriding the bootstrap script makes sure that the next time we start the container, it will start instantly without attempting to deploy the system again.
  • Finally, the bootstrap script "execs" into the real process manager, becoming the new PID 1 process. When the deployment of the system is done (the nixproc-*-switch process that still runs in the background), the process manager becomes responsible for reaping it.

With the above script, the workflow of deploying an upgradable/mutable multi-process container is the same as deploying an ordinary container from a Docker image -- the only (minor) difference is that the first time that we start the container, it may take some time before the services become available, because the multi-process system needs to be deployed by Nix and the Nix process management framework.

A simple usage scenario


Similar to my previous blog posts about the Nix process management framework, I will use the trivial web application system to demonstrate how the functionality of the framework can be used.

The web application system consists of one or more webapp processes (with an embedded HTTP server) that only return static HTML pages displaying their identities.

An Nginx reverse proxy forwards incoming requests to the appropriate webapp instance -- each webapp service can be reached by using its unique virtual host value.

To construct a mutable multi-process Docker image with Nix, we can write the following Nix expression (default.nix):

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  createMutableMultiProcessImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix" {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "multiprocess";
  tag = "test";
  contents = [ pkgs.mc ];
  exprFile = ./processes.nix;
  idResourcesFile = ./idresources.nix;
  idsFile = ./ids.nix;
  processManager = "supervisord"; # sysvinit, disnix, s6-rc are also valid options
}

The above Nix expression invokes the createMutableMultiProcessImage function that constructs a Docker image that provides a base system with a process manager, and a bootstrap script that deploys the multi-process system:

  • The name, tag, and contents parameters specify the image name, tag and the packages that need to be included in the image.
  • The exprFile parameter refers to a processes model that captures the configurations of the process instances that need to be deployed.
  • The idResourcesFile parameter refers to an ID resources model that specifies from which resource pools unique IDs need to be selected (a hypothetical sketch of such a model is shown after this list).
  • The idsFile parameter refers to an IDs model that contains the unique ID assignments for each process instance. Examples of unique IDs are TCP/UDP port assignments, user IDs (UIDs) and group IDs (GIDs).
  • We can use the processManager parameter to select the process manager we want to use. In the above example it is supervisord, but other options are also possible.

We can use the following processes model (processes.nix) to deploy a small version of our example system:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

The above Nix expression configures two process instances, one webapp process that returns a static HTML page with its identity and an Nginx reverse proxy that forwards connections to it.

A notable difference between the expression shown above and the processes models of the same system shown in my previous blog posts is that this expression does not contain any references to files on the local filesystem, with the exception of the ID assignments expression (ids.nix).

We obtain all required functionality from the Nix process management framework by invoking builtins.fetchGit. Eliminating local references is required to allow the processes model to be copied into the container and deployed from within the container.

We can build a Docker image as follows:

$ nix-build

load the image into Docker:

$ docker load -i result

and create and start a Docker container:

$ docker run -it --name webapps --network host multiprocess:test
unpacking channels...
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
created 1 symlinks in user environment
2021-02-21 15:29:29,878 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2021-02-21 15:29:29,878 WARN No file matches via include "/etc/supervisor/conf.d/*"
2021-02-21 15:29:29,897 INFO RPC interface 'supervisor' initialized
2021-02-21 15:29:29,897 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2021-02-21 15:29:29,898 INFO supervisord started with pid 1
these derivations will be built:
  /nix/store/011g52sj25k5k04zx9zdszdxfv6wy1dw-credentials.drv
  /nix/store/1i9g728k7lda0z3mn1d4bfw07v5gzkrv-credentials.drv
  /nix/store/fs8fwfhalmgxf8y1c47d0zzq4f89fz0g-nginx.conf.drv
  /nix/store/vxpm2m6444fcy9r2p06dmpw2zxlfw0v4-nginx-foregroundproxy.sh.drv
  /nix/store/4v3lxnpapf5f8297gdjz6kdra8g7k4sc-nginx.conf.drv
  /nix/store/mdldv8gwvcd5fkchncp90hmz3p9rcd99-builder.pl.drv
  /nix/store/r7qjyr8vr3kh1lydrnzx6nwh62spksx5-nginx.drv
  /nix/store/h69khss5dqvx4svsc39l363wilcf2jjm-webapp.drv
  /nix/store/kcqbrhkc5gva3r8r0fnqjcfhcw4w5il5-webapp.conf.drv
  /nix/store/xfc1zbr92pyisf8lw35qybbn0g4f46sc-webapp.drv
  /nix/store/fjx5kndv24pia1yi2b7b2bznamfm8q0k-supervisord.d.drv
these paths will be fetched (78.80 MiB download, 347.06 MiB unpacked):
...

As may be noticed by looking at the output, on first startup the Nix process management framework is invoked to deploy the system with Nix.

After the system has been deployed, we should be able to connect to the webapp process via the Nginx reverse proxy:

$ curl -H 'Host: webapp.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5000
  </body>
</html>

When it is desired to upgrade the system, we can change the system's configuration by connecting to the container instance:

$ docker exec -it webapps /bin/bash

In the container, we can edit the processes.nix configuration file:

$ mcedit /etc/nixproc/processes.nix

and make changes to the configuration of the system. For example, we can change the processes model to include a second webapp process:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  sharedConstructors = import "${nix-processmgmt}/examples/services-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir cacheDir tmpDir forceDisableUserChange processManager ids;
  };

  constructors = import "${nix-processmgmt}/examples/webapps-agnostic/constructors/constructors.nix" {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager ids;
  };
in
rec {
  webapp = rec {
    port = ids.webappPorts.webapp or 0;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  webapp2 = rec {
    port = ids.webappPorts.webapp2 or 0;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };

    requiresUniqueIdsFor = [ "webappPorts" "uids" "gids" ];
  };

  nginx = rec {
    port = ids.nginxPorts.nginx or 0;

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      webapps = [ webapp webapp2 ];
      inherit port;
    } {};

    requiresUniqueIdsFor = [ "nginxPorts" "uids" "gids" ];
  };
}

In the above processes model, a new process instance named: webapp2 was added, which listens on a unique port and can be reached with the webapp2.local virtual host value.

By running the following command, the system in the container gets upgraded:

$ nixproc-supervisord-switch

resulting in two webapp process instances running in the container:

$ supervisorctl 
nginx                            RUNNING   pid 847, uptime 0:00:08
webapp                           RUNNING   pid 459, uptime 0:05:54
webapp2                          RUNNING   pid 846, uptime 0:00:08
supervisor>

The first instance: webapp was left untouched, because its configuration was not changed.

The second instance: webapp2 can be reached as follows:

$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5001
  </body>
</html>

After upgrading the system, the new configuration should also get reactivated after a container restart.

A more interesting example: Hydra


As explained earlier, to create upgradable containers we require a fully functional Nix installation in a container. This observation made me think about a more interesting example than the trivial web application system.

A prominent example of a system that requires Nix and is composed of multiple tightly integrated processes is Hydra: the Nix-based continuous integration service.

To make it possible to deploy a minimal Hydra service in a container, I have packaged all its relevant components for the Nix process management framework.

The processes model looks as follows:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  nix-processmgmt-services = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt-services.git;
    ref = "master";
  };

  constructors = import "${nix-processmgmt-services}/services-agnostic/constructors.nix" {
    inherit nix-processmgmt pkgs stateDir runtimeDir logDir tmpDir cacheDir forceDisableUserChange processManager;
  };

  instanceSuffix = "";
  hydraUser = hydraInstanceName;
  hydraInstanceName = "hydra${instanceSuffix}";
  hydraQueueRunnerUser = "hydra-queue-runner${instanceSuffix}";
  hydraServerUser = "hydra-www${instanceSuffix}";
in
rec {
  nix-daemon = {
    pkg = constructors.nix-daemon;
  };

  postgresql = rec {
    port = 5432;
    postgresqlUsername = "postgresql";
    postgresqlPassword = "postgresql";
    socketFile = "${runtimeDir}/postgresql/.s.PGSQL.${toString port}";

    pkg = constructors.simplePostgresql {
      inherit port;
      authentication = ''
        # TYPE  DATABASE   USER   ADDRESS    METHOD
        local   hydra      all               ident map=hydra-users
      '';
      identMap = ''
        # MAPNAME       SYSTEM-USERNAME          PG-USERNAME
        hydra-users     ${hydraUser}             ${hydraUser}
        hydra-users     ${hydraQueueRunnerUser}  ${hydraUser}
        hydra-users     ${hydraServerUser}       ${hydraUser}
        hydra-users     root                     ${hydraUser}
        # The postgres user is used to create the pg_trgm extension for the hydra database
        hydra-users     postgresql               postgresql
      '';
    };
  };

  hydra-server = rec {
    port = 3000;
    hydraDatabase = hydraInstanceName;
    hydraGroup = hydraInstanceName;
    baseDir = "${stateDir}/lib/${hydraInstanceName}";
    inherit hydraUser instanceSuffix;

    pkg = constructors.hydra-server {
      postgresqlDBMS = postgresql;
      user = hydraServerUser;
      inherit nix-daemon port instanceSuffix hydraInstanceName hydraDatabase hydraUser hydraGroup baseDir;
    };
  };

  hydra-evaluator = {
    pkg = constructors.hydra-evaluator {
      inherit nix-daemon hydra-server;
    };
  };

  hydra-queue-runner = {
    pkg = constructors.hydra-queue-runner {
      inherit nix-daemon hydra-server;
      user = hydraQueueRunnerUser;
    };
  };

  apache = {
    pkg = constructors.reverseProxyApache {
      dependency = hydra-server;
      serverAdmin = "admin@localhost";
    };
  };
}

In the above processes model, each process instance represents a component of a Hydra installation:

  • The nix-daemon process is a service that comes with the Nix package manager to facilitate multi-user package installations. The nix-daemon carries out builds on behalf of a user.

    Hydra requires it to perform builds as an unprivileged Hydra user and uses the Nix protocol to more efficiently orchestrate large builds.
  • Hydra uses a PostgreSQL database backend to store data about projects and builds.

    The postgresql process refers to the PostgreSQL database management system (DBMS) that is configured in such a way that the Hydra components are authorized to manage and modify the Hydra database.
  • hydra-server is the front-end of the Hydra service that provides a web user interface. The initialization procedure of this service is responsible for initializing the Hydra database.
  • The hydra-evaluator regularly updates the repository checkouts and evaluates the Nix expressions to decide which packages need to be built.
  • The hydra-queue-runner builds all jobs that were evaluated by the hydra-evaluator.
  • The apache server is used as a reverse proxy server forwarding requests to the hydra-server.

With the following commands, we can build the image, load it into Docker, and deploy a container that runs Hydra:

$ nix-build hydra-image.nix
$ docker load -i result
$ docker run -it --name hydra-test --network host hydra:test

After deploying the system, we can connect to the container:

$ docker exec -it hydra-test /bin/bash

and observe that all processes are running and managed by supervisord:

$ supervisorctl
apache                           RUNNING   pid 1192, uptime 0:00:42
hydra-evaluator                  RUNNING   pid 1297, uptime 0:00:38
hydra-queue-runner               RUNNING   pid 1296, uptime 0:00:38
hydra-server                     RUNNING   pid 1188, uptime 0:00:42
nix-daemon                       RUNNING   pid 1186, uptime 0:00:42
postgresql                       RUNNING   pid 1187, uptime 0:00:42
supervisor>

With the following commands, we can create our initial admin user:

$ su - hydra
$ hydra-create-user sander --password secret --role admin
creating new user `sander'

We can connect to the Hydra front-end in a web browser by opening http://localhost (this works because the container uses host networking):


and configure a jobset to build a project, such as libprocreact:


Another nice bonus feature of having multiple process managers supported is that if we build Hydra's Nix process management configuration for Disnix, we can also visualize the deployment architecture of the system with disnix-visualize:


The above diagram displays the following properties:

  • The outer box indicates that we are deploying to a single machine: localhost
  • The inner box indicates that all components are managed as processes
  • The ovals correspond to process instances in the processes model and the arrows denote dependency relationships.

    For example, the apache reverse proxy has a dependency on hydra-server, meaning that the latter process instance should be deployed first, otherwise the reverse proxy is not able to forward requests to it.
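
The diagram itself can be generated on the coordinator machine and rendered with Graphviz. A rough sketch of the invocation (assuming that disnix-visualize uses the last deployed manifest when no manifest file is given):

$ disnix-visualize > architecture.dot
$ dot -Tpng architecture.dot > architecture.png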

Building a Nix-enabled container image


As explained in the previous section, mutable Docker images require a fully functional Nix package manager in the container.

Since this may also be an interesting sub use case, I have created a convenience function: createNixImage that can be used to build an image whose only purpose is to provide a working Nix installation:

let
  pkgs = import <nixpkgs> {};

  nix-processmgmt = builtins.fetchGit {
    url = https://github.com/svanderburg/nix-processmgmt.git;
    ref = "master";
  };

  createNixImage = import "${nix-processmgmt}/nixproc/create-image-from-steps/create-nix-image.nix" {
    inherit pkgs;
  };
in
createNixImage {
  name = "foobar";
  tag = "test";
  contents = [ pkgs.mc ];
}

The above Nix expression builds a Docker image with a working Nix setup and a custom package: the Midnight Commander.
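
Building and loading this image follows the same pattern as the other examples in this blog post:

$ nix-build
$ docker load -i result

after which containers can be created from the foobar:test image in the usual way.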

Conclusions


In this blog post, I have described a new function in the Nix process management framework: createMutableMultiProcessImage that creates reproducible mutable multi-process container images, by combining the reproducibility properties of Docker and Nix. With the exception of the process manager, process instances in a container can be upgraded without bringing the entire container down.

With this new functionality, the deployment workflow of a multi-process container configuration has become very similar to how physical and virtual machines are managed with NixOS -- you can edit a declarative specification of a system and run a single command-line instruction to deploy the new configuration.

Moreover, this new functionality allows us to deploy a complex, tightly coupled multi-process system, such as Hydra: the Nix-based continuous integration service. In the Hydra example case, we are using Nix for three deployment aspects: constructing the Docker image, deploying the multi-process system configuration and building the projects that are configured in Hydra.

A big drawback of mutable multi-process images is that no sharing is possible between multiple multi-process containers. Because the images are not built from common layers, the Nix store is private to each container and all packages are deployed in the writable custom layer, which may lead to substantial disk and RAM overhead per container instance.

Deploying the processes model to a container instance can probably be made more convenient by using Nix flakes -- a new Nix feature that is still experimental. With flakes we can easily deploy an arbitrary number of Nix expressions to a container and pin the deployment to a specific version of Nixpkgs.

Another interesting observation is the word: mutable. I am not completely sure if it is appropriate -- both the layers of a Docker image, as well as the Nix store paths are immutable and never change after they have been built. For both solutions, immutability is an important ingredient in making sure that a deployment is reproducible.

I have decided to still call these deployments mutable, because I am looking at the problem from a Docker perspective -- the writable layer of the container (that is mounted on top of the immutable layers of an image) is modified each time that we upgrade a system.

Future work


Although I am quite happy with the ability to create mutable multi-process containers, there is still quite a bit of work that needs to be done to make the Nix process management framework more usable.

Most importantly, trying to deploy Hydra revealed all kinds of regressions in the framework. To cope with all these breaking changes, a structured testing approach is required. Currently, such an approach is completely absent.

I could also (in theory) automate the still missing parts of Hydra. For example, I have not automated the process that updates the garbage collector roots, which needs to run in a timely manner. To solve this, I need to use a cron service or systemd timer units, which is beyond the scope of my experiment.

Availability


The createMutableMultiProcessImage function is part of the experimental Nix process management framework GitHub repository that is still under heavy development.

Because the number of services that can be deployed with the framework has grown considerably, I have moved all non-essential services (those not required for testing) into a separate repository. The Hydra constructor functions can be found in this repository as well.