Friday, October 30, 2015

Deploying prebuilt binary software with the Nix package manager

As described in a number of older blog posts, Nix is primarily a source-based package manager -- it constructs packages from source code by executing their build procedures in isolated environments in which only specified dependencies can be found.

As an optimization, it provides transparent binary deployment -- if a package that has been built from the same inputs exists elsewhere, it can be downloaded from that location instead of being built from source, which improves the efficiency of deployment processes.

Because Nix is a source-based package manager, the documentation mainly describes how to build packages from source code. Moreover, the Nix expressions are written in such a way that they can be included in the Nixpkgs collection, a repository containing build recipes for more than 2500 packages.

Although the manual contains some basic packaging instructions, I noticed that a few practical bits were missing. For example, how to package software privately, outside the Nixpkgs tree, is not clearly described, which makes experimentation a bit less convenient, in particular for newbies.

Despite being a source package manager, Nix can also be used to deploy binary software packages (i.e. software for which no source code and build scripts have been provided). Unfortunately, getting prebuilt binaries to run properly is quite tricky. Furthermore, apart from some references, there are no examples in the manual describing how to do this either.

Since I have been receiving quite a few questions about this lately, I have decided to write a blog post about it, covering two examples that should be relatively simple to repeat.

Why prebuilt binaries will typically not work


Prebuilt binaries deployed by Nix typically do not work out of the box. For example, if we want to deploy a simple binary package such as pngout (only containing a set of ELF executables) we may initially think that copying the executable into the Nix store suffices:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "pngout-20130221";

  src = fetchurl {
    url = http://static.jonof.id.au/dl/kenutils/pngout-20130221-linux.tar.gz;
    sha256 = "1qdzmgx7si9zr7wjdj8fgf5dqmmqw4zg19ypg0pdz7521ns5xbvi";
  };

  installPhase = ''
    mkdir -p $out/bin
    cp x86_64/pngout $out/bin
  '';
}

However, when we build the above package:

$ nix-build pngout.nix

and attempt to run the executable, we stumble upon the following error:

$ ./result/bin/pngout
bash: ./result/bin/pngout: No such file or directory

The above error is quite strange -- the corresponding file resides in exactly the specified location, yet it appears that it cannot be found!

The actual problem is not that the executable itself is missing, but that one of its dependencies cannot be found. Every ELF executable that uses shared libraries consults the dynamic linker/loader (which typically resides in /lib/ld-linux.so.2 on x86 Linux platforms and /lib64/ld-linux-x86-64.so.2 on x86-64 Linux platforms) to load the shared libraries it needs. This path is hardwired into the ELF executable, as can be observed by running:

$ readelf -l ./result/bin/pngout 

Elf file type is EXEC (Executable file)
Entry point 0x401160
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000001593c 0x000000000001593c  R E    200000
  LOAD           0x0000000000015940 0x0000000000615940 0x0000000000615940
                 0x00000000000005b4 0x00000000014f9018  RW     200000
  DYNAMIC        0x0000000000015968 0x0000000000615968 0x0000000000615968
                 0x00000000000001b0 0x00000000000001b0  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000000014e5c 0x0000000000414e5c 0x0000000000414e5c
                 0x00000000000001fc 0x00000000000001fc  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8

In NixOS, most parts of the system, including the dynamic linker, are stored in a special-purpose directory called the Nix store (i.e. /nix/store). As a consequence, the dynamic linker cannot be found at the hardwired location, because it resides elsewhere.

Another reason why most binaries will not work is that they must know where to find their required shared libraries. In most conventional Linux distributions these reside in global directories (e.g. /lib and /usr/lib). In NixOS, these folders do not exist. Instead, every package is stored in isolation in a separate folder in the Nix store.

Why compilation from source works


In contrast to prebuilt ELF binaries, binaries produced by a source build in a Nix build environment typically work out of the box (i.e. they often do not require any special modifications in the build procedure). So why is that?

The "secret" is that the linker (that gets invoked by the compiler) has been wrapped in the Nix build environment -- if we invoke ld, then we actually end up using a wrapper (ld-wrapper) that does a number of additional things besides the tasks the linker normally carries out.

Whenever we supply a library to link to, the wrapper appends an -rpath parameter providing its location. Furthermore, it appends the path to the dynamic linker/loader (-dynamic-linker) so that the resulting executable can load the shared libraries on startup.

For example, when producing an executable, the compiler may invoke the following command that links a library to a piece of object code:

$ ld test.o -lz -o test

In reality, ld has been wrapped and executes something like this:

$ ld test.o -lz \
  -rpath /nix/store/31w31mc8i...-zlib-1.2.8/lib \
  -dynamic-linker \
    /nix/store/hd6km3hscb...-glibc-2.21/lib/ld-linux-x86-64.so.2 \
  ...
  -o test

As may be observed, the wrapper transparently appends the path to zlib as an RPATH parameter and provides the path to the dynamic linker.

The RPATH attribute is basically a colon-separated string of paths in which the dynamic linker looks for an executable's shared dependencies. The RPATH is hardwired into an ELF binary.

Consider the following simple C program (test.c) that displays the version of the zlib library that it links against:

#include <stdio.h>
#include <zlib.h>

int main()
{
    printf("zlib version is: %s\n", ZLIB_VERSION);
    return 0;
}

With the following Nix expression we can compile an executable from it and link it against the zlib library:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "test";
  buildInputs = [ zlib ];
  buildCommand = ''
    gcc ${./test.c} -lz -o test
    mkdir -p $out/bin
    cp test $out/bin
  '';
}

When we build the above package:

$ nix-build test.nix

and inspect the program headers of the ELF binary, we can observe that the dynamic linker (program interpreter) corresponds to an instance residing in the Nix store:

$ readelf -l ./result/bin/test 

Elf file type is EXEC (Executable file)
Entry point 0x400680
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x0000000000000050 0x0000000000000050  R      1
      [Requesting program interpreter: /nix/store/hd6km3hs...-glibc-2.21/lib/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000096c 0x000000000000096c  R E    200000
  LOAD           0x0000000000000970 0x0000000000600970 0x0000000000600970
                 0x0000000000000260 0x0000000000000268  RW     200000
  DYNAMIC        0x0000000000000988 0x0000000000600988 0x0000000000600988
                 0x0000000000000200 0x0000000000000200  RW     8
  NOTE           0x0000000000000288 0x0000000000400288 0x0000000000400288
                 0x0000000000000020 0x0000000000000020  R      4
  GNU_EH_FRAME   0x0000000000000840 0x0000000000400840 0x0000000000400840
                 0x0000000000000034 0x0000000000000034  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8
  PAX_FLAGS      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000         8

Furthermore, if we inspect the dynamic section of the binary, we will see that an RPATH attribute has been hardwired into it providing a collection of library paths (including the path to zlib):

$ readelf -d ./result/bin/test 

Dynamic section at offset 0x988 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [
/nix/store/8w39iz6sp...-test/lib64:
/nix/store/8w39iz6sp...-test/lib:
/nix/store/i9nn1fkcy...-gcc-4.9.3/libexec/gcc/x86_64-unknown-linux-gnu/4.9.3:
/nix/store/31w31mc8i...-zlib-1.2.8/lib:
/nix/store/hd6km3hsc...-glibc-2.21/lib:
/nix/store/i9nn1fkcy...-gcc-4.9.3/lib]
 0x000000000000001d (RUNPATH)            Library runpath: [
/nix/store/8w39iz6sp...-test/lib64:
/nix/store/8w39iz6sp...-test/lib:
/nix/store/i9nn1fkcy...-gcc-4.9.3/libexec/gcc/x86_64-unknown-linux-gnu/4.9.3:
/nix/store/31w31mc8i...-zlib-1.2.8/lib:
/nix/store/hd6km3hsc...-glibc-2.21/lib:
/nix/store/i9nn1fkcy...-gcc-4.9.3/lib]
 0x000000000000000c (INIT)               0x400620
 0x000000000000000d (FINI)               0x400814
 0x0000000000000019 (INIT_ARRAY)         0x600970
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x600978
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000004 (HASH)               0x4002a8
 0x0000000000000005 (STRTAB)             0x400380
 0x0000000000000006 (SYMTAB)             0x4002d8
 0x000000000000000a (STRSZ)              528 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x600b90
 0x0000000000000002 (PLTRELSZ)           72 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x4005d8
 0x0000000000000007 (RELA)               0x4005c0
 0x0000000000000008 (RELASZ)             24 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x4005a0
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400590
 0x0000000000000000 (NULL)               0x0

As a result, the program works as expected:

$ ./result/bin/test 
zlib version is: 1.2.8

Patching existing ELF binaries


To summarize, the reason why ELF binaries produced in a Nix build environment work is that they refer to the correct path of the dynamic linker and have an RPATH value that refers to the paths of the shared libraries they need.

Fortunately, we can accomplish the same thing with prebuilt binaries by using the PatchELF tool. With PatchELF we can patch existing ELF binaries to have a different dynamic linker and RPATH.

Running the following instruction in a Nix expression allows us to change the dynamic linker of the pngout executable shown earlier:

$ patchelf --set-interpreter \
    ${stdenv.glibc}/lib/ld-linux-x86-64.so.2 $out/bin/pngout

By inspecting the dynamic section of a binary, we can find out what shared libraries it requires:

$ readelf -d ./result/bin/pngout

Dynamic section at offset 0x15968 contains 22 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x400ea8
 0x000000000000000d (FINI)               0x413a78
 0x0000000000000004 (HASH)               0x400260
 0x000000006ffffef5 (GNU_HASH)           0x4003b8
 0x0000000000000005 (STRTAB)             0x400850
 0x0000000000000006 (SYMTAB)             0x4003e8
 0x000000000000000a (STRSZ)              379 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x615b20
 0x0000000000000002 (PLTRELSZ)           984 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400ad0
 0x0000000000000007 (RELA)               0x400a70
 0x0000000000000008 (RELASZ)             96 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x400a30
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0x4009cc
 0x0000000000000000 (NULL)               0x0

According to the information listed above, two libraries are required (libm.so.6 and libc.so.6), which can be provided by the glibc package. We can change the executable's RPATH in the Nix expression as follows:

$ patchelf --set-rpath ${stdenv.glibc}/lib $out/bin/pngout

We can write a revised Nix expression for pngout (taking patching into account) that looks as follows:

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "pngout-20130221";

  src = fetchurl {
    url = http://static.jonof.id.au/dl/kenutils/pngout-20130221-linux.tar.gz;
    sha256 = "1qdzmgx7si9zr7wjdj8fgf5dqmmqw4zg19ypg0pdz7521ns5xbvi";
  };

  installPhase = ''
    mkdir -p $out/bin
    cp x86_64/pngout $out/bin
    patchelf --set-interpreter \
        ${stdenv.glibc}/lib/ld-linux-x86-64.so.2 $out/bin/pngout
    patchelf --set-rpath ${stdenv.glibc}/lib $out/bin/pngout
  '';
}

When we build the expression:

$ nix-build pngout.nix

and try to run the executable:

$ ./result/bin/pngout 
PNGOUT [In:{PNG,JPG,GIF,TGA,PCX,BMP}] (Out:PNG) (options...)
by Ken Silverman (http://advsys.net/ken)
Linux port by Jonathon Fowler (http://www.jonof.id.au/pngout)

We will see that the executable works as expected!
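
We can also verify the fields we have just patched -- PatchELF provides read-only counterparts of the options used in the expression:

$ patchelf --print-interpreter ./result/bin/pngout
$ patchelf --print-rpath ./result/bin/pngout

The first command should print the path to the glibc dynamic linker and the second the glibc lib folder that we configured in the Nix expression (the exact store paths differ per system).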

A more complex example: Quake 4 demo


The pngout example shown earlier is quite simple, as it is just a tarball containing a single executable that must be installed and patched. Now that we are familiar with some basic concepts -- how should we approach a more complex prebuilt package, such as a computer game like the Quake 4 demo?

When we download the Quake 4 demo installer for Linux, we actually get an installer based on the Loki setup tools: a self-extracting shell script that executes an installer program.

Unfortunately, we cannot use this installer program in NixOS for two reasons. First, the installer executes (prebuilt) executables that will not work. Second, to use the full potential of NixOS, it is better to deploy packages with Nix in isolation in the Nix store.

Fortunately, running the installer with the --help parameter reveals that it is also possible to extract its contents without running the installer:

$ bash ./quake4-linux-1.0-demo.x86.run --noexec --keep

After executing the above command-line instruction, we can find the extracted files in the ./quake4-linux-1.0-demo folder in the current working directory.

The next step is figuring out where the game files reside and which binaries need to be patched. A rough inspection of the extracted folder:

$ cd quake4-linux-1.0-demo
$ ls
bin
Docs
License.txt
openurl.sh
q4base
q4icon.bmp
README
setup.data
setup.sh
version.info

reveals that the installer files (e.g. ./setup.data) and the game files are intermixed with each other. Some files are required to run the game, but others, such as the setup files (e.g. the ones residing in setup.data/), are unnecessary.

Running the following command helps me to figure out which ELF binaries we may have to patch:

$ file $(find . -type f)         
./Docs/QUAKE4_demo_readme.txt:     Little-endian UTF-16 Unicode text, with CRLF line terminators
./bin/Linux/x86/libstdc++.so.5:    ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
./bin/Linux/x86/quake4-demo:       POSIX shell script, ASCII text executable
./bin/Linux/x86/quake4.x86:        ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.0.30, stripped
./bin/Linux/x86/quake4-demoded:    POSIX shell script, ASCII text executable
./bin/Linux/x86/libgcc_s.so.1:     ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
./bin/Linux/x86/q4ded.x86:         ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.0.30, stripped
./README:                          ASCII text
./version.info:                    ASCII text
./q4base/game100.pk4:              Zip archive data, at least v2.0 to extract
./q4base/mapcycle.scriptcfg:       ASCII text, with CRLF line terminators
./q4base/game000.pk4:              Zip archive data, at least v1.0 to extract
./License.txt:                     ISO-8859 text, with very long lines
./openurl.sh:                      POSIX shell script, ASCII text executable
./q4icon.bmp:                      PC bitmap, Windows 3.x format, 48 x 48 x 24
...

As we can see in the output, the ./bin/Linux/x86 sub folder contains a number of ELF executables and shared libraries that most likely require patching.

As with the previous example (pngout), we can use readelf to inspect what libraries the ELF executables require. The first executable q4ded.x86 has the following dynamic section:

$ cd ./bin/Linux/x86
$ readelf -d q4ded.x86 

Dynamic section at offset 0x366220 contains 25 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
 0x00000001 (NEEDED)                     Shared library: [libdl.so.2]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so.5]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
...

According to the above information, the executable requires a couple of libraries that seem to be stored in the same package (in the same folder to be precise): libstdc++.so.5 and libgcc_s.so.1.

Furthermore, it also requires a number of libraries that are not in the same folder. These missing libraries must be provided by external packages. I know from experience that the remaining libraries (libpthread.so.0, libdl.so.2, libm.so.6 and libc.so.6) are provided by the glibc package.
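
If we are unsure which package provides a particular library, a quick (and admittedly crude) way to double-check is to build the candidate package and list its library folder. The following sketch assumes an unmodified <nixpkgs> channel; the exact contents may differ per version:

$ ls $(nix-build '<nixpkgs>' -A glibc --no-out-link)/lib

The listing should reveal that glibc indeed ships libpthread.so.0, libdl.so.2, libm.so.6 and libc.so.6.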

The other ELF executable has the following library references:

$ readelf -d ./quake4.x86 

Dynamic section at offset 0x3779ec contains 29 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libSDL-1.2.so.0]
 0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
 0x00000001 (NEEDED)                     Shared library: [libdl.so.2]
 0x00000001 (NEEDED)                     Shared library: [libstdc++.so.5]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000001 (NEEDED)                     Shared library: [libX11.so.6]
 0x00000001 (NEEDED)                     Shared library: [libXext.so.6]
...

This executable has a number of dependencies that are identical to those of the previous executable. Additionally, it requires libSDL-1.2.so.0, which can be provided by SDL, libX11.so.6 by libX11, and libXext.so.6 by libXext.

Besides the executables, the shared libraries bundled with the package may also have dependencies on shared libraries. We need to inspect and fix these as well.

Inspecting the dynamic section of libgcc_s.so.1 reveals the following:

$ readelf -d ./libgcc_s.so.1 

Dynamic section at offset 0x7190 contains 23 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
...

The above library has a dependency on libc.so.6, which can be provided by glibc.

The remaining library (libstdc++.so.5) has the following dependencies:

$ readelf -d ./libstdc++.so.5 

Dynamic section at offset 0xadd8c contains 25 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
...

It depends on libgcc_s.so.1 residing in the same folder. As with the previous binaries, libm.so.6 and libc.so.6 can be provided by glibc.

With the gathered information so far, we can write the following Nix expression that we can use as a first attempt to run the game:

with import <nixpkgs> { system = "i686-linux"; };

stdenv.mkDerivation {
  name = "quake4-demo-1.0";
  src = fetchurl {
    url = ftp://ftp.idsoftware.com/idstuff/quake4/demo/quake4-linux-1.0-demo.x86.run;
    sha256 = "0wxw2iw84x92qxjbl2kp5rn52p6k8kr67p4qrimlkl9dna69xrk9";
  };
  buildCommand = ''
    # Extract files from the installer
    cp $src quake4-linux-1.0-demo.x86.run
    bash ./quake4-linux-1.0-demo.x86.run --noexec --keep
    
    # Move extracted files into the Nix store
    mkdir -p $out/libexec
    mv quake4-linux-1.0-demo $out/libexec
    cd $out/libexec/quake4-linux-1.0-demo
    
    # Remove obsolete setup files
    rm -rf setup.data
    
    # Patch ELF binaries
    cd bin/Linux/x86
    patchelf --set-interpreter ${stdenv.cc.libc}/lib/ld-linux.so.2 ./quake4.x86
    patchelf --set-rpath $(pwd):${stdenv.cc.libc}/lib:${SDL}/lib:${xlibs.libX11}/lib:${xlibs.libXext}/lib ./quake4.x86
    chmod +x ./quake4.x86
    
    patchelf --set-interpreter ${stdenv.cc.libc}/lib/ld-linux.so.2 ./q4ded.x86
    patchelf --set-rpath $(pwd):${stdenv.cc.libc}/lib ./q4ded.x86
    chmod +x ./q4ded.x86
    
    patchelf --set-rpath ${stdenv.cc.libc}/lib ./libgcc_s.so.1
    patchelf --set-rpath $(pwd):${stdenv.cc.libc}/lib ./libstdc++.so.5
  '';
}

In the above Nix expression, we do the following:

  • We import the Nixpkgs collection so that we can provide the external dependencies that the package needs. Because the executables are 32-bit x86 binaries, we need to refer to packages built for the i686-linux architecture.
  • We download the Quake 4 demo installer from Id Software's FTP server.
  • We automate the steps we have done earlier -- we extract the files from the installer, move them into the Nix store, prune the obsolete setup files, and finally patch the ELF executables and libraries with the paths to the dependencies that we have discovered in our investigation.

We should now be able to build the package:

$ nix-build quake4demo.nix

and investigate whether the executables can be started:

$ ./result/libexec/quake4-linux-1.0-demo/bin/Linux/x86/quake4.x86

Unfortunately, it does not seem to work:

...
no 'q4base' directory in executable path /nix/store/0kfgsjryycsk5kfv97phj8ypv66n6caz-quake4-demo-1.0/libexec/quake4-linux-1.0-demo/bin/Linux/x86, skipping
no 'q4base' directory in current durectory /home/sander/quake4, skipping

According to the output, it cannot find the q4base/ folder. Running the same command with strace reveals why:

$ strace -f ./result/libexec/quake4-linux-1.0-demo/bin/Linux/x86/quake4.x86
...
stat64("/nix/store/0kfgsjryycsk5kfv97phj8ypv66n6caz-quake4-demo-1.0/libexec/quake4-linux-1.0-demo/bin/Linux/x86/q4base", 0xffd7b230) = -1 ENOENT (No such file or directory)
write(1, "no 'q4base' directory in executa"..., 155no 'q4base' directory in executable path /nix/store/0kfgsjryycsk5kfv97phj8ypv66n6caz-quake4-demo-1.0/libexec/quake4-linux-1.0-demo/bin/Linux/x86, skipping
) = 155
...

It seems that the program searches relative to the current working directory. The missing q4base/ folder apparently resides in the base directory of the extracted folder.

By changing the current working directory and invoking the executable again, the q4base/ directory can be found:

$ cd result/libexec/quake4-linux-1.0-demo
$ ./bin/Linux/x86/quake4.x86
...
--------------- R_InitOpenGL ----------------
Initializing SDL subsystem
Loading GL driver 'libGL.so.1' through SDL
libGL error: unable to load driver: i965_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: i965
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  33
  Current serial number in output stream:  34

Despite fixing the problem, we have run into another one! Apparently the OpenGL driver cannot be loaded. Running the same command again with the following environment variable (source):

$ export LIBGL_DEBUG=verbose

shows us what is causing it:

--------------- R_InitOpenGL ----------------
Initializing SDL subsystem
Loading GL driver 'libGL.so.1' through SDL
libGL: OpenDriver: trying /run/opengl-driver-32/lib/dri/tls/i965_dri.so
libGL: OpenDriver: trying /run/opengl-driver-32/lib/dri/i965_dri.so
libGL: dlopen /run/opengl-driver-32/lib/dri/i965_dri.so failed (/nix/store/0kfgsjryycsk5kfv97phj8ypv66n6caz-quake4-demo-1.0/libexec/quake4-linux-1.0-demo/bin/Linux/x86/libgcc_s.so.1: version `GCC_3.4' not found (required by /run/opengl-driver-32/lib/dri/i965_dri.so))
libGL error: unable to load driver: i965_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: i965
libGL: OpenDriver: trying /run/opengl-driver-32/lib/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /run/opengl-driver-32/lib/dri/swrast_dri.so
libGL: dlopen /run/opengl-driver-32/lib/dri/swrast_dri.so failed (/nix/store/0kfgsjryycsk5kfv97phj8ypv66n6caz-quake4-demo-1.0/libexec/quake4-linux-1.0-demo/bin/Linux/x86/libgcc_s.so.1: version `GCC_3.4' not found (required by /run/opengl-driver-32/lib/dri/swrast_dri.so))
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  33
  Current serial number in output stream:  34

Apparently, the libgcc_s.so.1 library bundled with the game is conflicting with Mesa3D. According to this GitHub issue, replacing the conflicting version with the host system GCC's version fixes it.

In our situation, we can accomplish this by appending the path to the host system's GCC library folder to the RPATH of the binaries referring to it and by removing the conflicting library from the package.

Moreover, we can address the annoying issue with the missing q4base/ folder by creating wrapper scripts that change the current working folder and invoke the executable.

The revised expression, taking these aspects into account, looks as follows:

with import <nixpkgs> { system = "i686-linux"; };

stdenv.mkDerivation {
  name = "quake4-demo-1.0";
  src = fetchurl {
    url = ftp://ftp.idsoftware.com/idstuff/quake4/demo/quake4-linux-1.0-demo.x86.run;
    sha256 = "0wxw2iw84x92qxjbl2kp5rn52p6k8kr67p4qrimlkl9dna69xrk9";
  };
  buildCommand = ''
    # Extract files from the installer
    cp $src quake4-linux-1.0-demo.x86.run
    bash ./quake4-linux-1.0-demo.x86.run --noexec --keep
    
    # Move extracted files into the Nix store
    mkdir -p $out/libexec
    mv quake4-linux-1.0-demo $out/libexec
    cd $out/libexec/quake4-linux-1.0-demo
    
    # Remove obsolete setup files
    rm -rf setup.data
    
    # Patch ELF binaries
    cd bin/Linux/x86
    patchelf --set-interpreter ${stdenv.cc.libc}/lib/ld-linux.so.2 ./quake4.x86
    patchelf --set-rpath $(pwd):${stdenv.cc.cc}/lib:${stdenv.cc.libc}/lib:${SDL}/lib:${xlibs.libX11}/lib:${xlibs.libXext}/lib ./quake4.x86
    chmod +x ./quake4.x86
    
    patchelf --set-interpreter ${stdenv.cc.libc}/lib/ld-linux.so.2 ./q4ded.x86
    patchelf --set-rpath $(pwd):${stdenv.cc.cc}/lib:${stdenv.cc.libc}/lib ./q4ded.x86
    chmod +x ./q4ded.x86
    
    patchelf --set-rpath $(pwd):${stdenv.cc.libc}/lib ./libstdc++.so.5
    
    # Remove libgcc_s.so.1 that conflicts with Mesa3D's libGL.so
    rm ./libgcc_s.so.1
    
    # Create wrappers for the executables
    mkdir -p $out/bin
    cat > $out/bin/q4ded <<EOF
    #! ${stdenv.shell} -e
    cd $out/libexec/quake4-linux-1.0-demo
    ./bin/Linux/x86/q4ded.x86 "\$@"
    EOF
    chmod +x $out/bin/q4ded
    
    cat > $out/bin/quake4 <<EOF
    #! ${stdenv.shell} -e
    cd $out/libexec/quake4-linux-1.0-demo
    ./bin/Linux/x86/quake4.x86 "\$@"
    EOF
    chmod +x $out/bin/quake4
  '';
}

We can install the revised package in our Nix profile as follows:

$ nix-env -f quake4demo.nix -i quake4-demo

and conveniently run it from the command-line:

$ quake4


Happy playing!

(As a sidenote: besides creating a wrapper script, it is also possible to create a Freedesktop-compliant .desktop entry file, so that the game can be launched from the KDE/GNOME applications menu, but I leave this as an open exercise to the reader!)
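
For readers who want a head start on that exercise, a minimal sketch of such an entry, generated as an extra step at the end of the buildCommand shown above, could look as follows (the file name and Categories value are merely illustrative choices):

# Create a Freedesktop .desktop entry for the Quake 4 wrapper
mkdir -p $out/share/applications
cat > $out/share/applications/quake4-demo.desktop <<EOF
[Desktop Entry]
Type=Application
Name=Quake 4 demo
Exec=$out/bin/quake4
Icon=$out/libexec/quake4-linux-1.0-demo/q4icon.bmp
Categories=Game;
EOF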

Conclusion


In this blog post, I have explained that prebuilt binaries do not work out of the box in NixOS. The main reason is that they cannot find their dependencies in their "usual locations", because these do not exist in NixOS. As a solution, it is possible to patch binaries with a tool called PatchELF to provide them with the correct location of the dynamic linker and the paths to the libraries they need.

Moreover, I have shown two example packaging approaches (a simple and a more complex one) that should be relatively easy to repeat as an exercise.

Although source deployments typically work out of the box with few or no modifications, getting prebuilt binaries to work is often a journey that requires patching, wrapping, and experimentation. In this blog post I have described a few tricks that can be applied to make prebuilt packages work.

The approach described in this blog post is not the only solution to get prebuilt binaries to work in NixOS. An alternative approach is composing FHS-compatible chroot environments from Nix packages. This solution simulates an environment in which dependencies can be found in their common FHS locations. As a result, we do not require any modifications to a binary.
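
To give an impression of this alternative, the sketch below uses the buildFHSUserEnv function from Nixpkgs (there is also a chroot-based variant that requires super user rights); the attribute names are an assumption based on the Nixpkgs version available at the time of writing and may differ in other releases. Building it yields a wrapper script that drops us into a shell in which the dynamic linker and the libraries appear in their usual FHS locations, so that the unpatched pngout executable can be started without modification:

with import <nixpkgs> {};

buildFHSUserEnv {
  name = "pngout-env";
  # Packages whose libraries should show up in the simulated /lib, /usr/lib etc.
  targetPkgs = pkgs: [ pkgs.glibc ];
  # Enter an interactive shell inside the simulated environment
  runScript = "bash";
}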

Although FHS chroot environments are conceptually nice, I would still prefer the patching approach described in this blog post unless there is no other way to make a package work properly -- it has less overhead, does not require any special privileges (e.g. super user rights), we can use the distribution mechanisms of Nix to their full extent, and we can also install a package as an unprivileged user.

Steam is a notable exception for which we do use an FHS-compatible chroot environment, because it is a deployment tool itself and therefore conflicts with Nix's deployment properties.

As a final practical note: if you want to repeat the Quake 4 demo packaging process, please check the following:

  • To enable hardware accelerated OpenGL for 32-bit applications in a 64-bit NixOS, add the following property to /etc/nixos/configuration.nix:

    hardware.opengl.driSupport32Bit = true;
    
  • Id Software's FTP server seems to be quite slow to download from. You can also obtain the demo from a different download site (e.g. Fileplanet) and run the following command to get it imported into the Nix store:

    $ nix-prefetch-url file:///home/sander/quake4-linux-1.0-demo.x86.run
    

Tuesday, October 13, 2015

Setting up a basic software configuration management process in a small organization

I have been responsible for many things in my past and current career. Besides research and development, I have also been responsible for software and systems configuration management in small/medium sized companies, such as my current employer.

I have observed that in organizations like these, configuration management (CM) is typically nobody's (full) responsibility. People prefer to stick to their primary responsibilities and typically carry out change activities in an ad-hoc and unstructured way.

Not properly implementing changes has a number of serious implications. For example, some problems I have encountered are:

  • Delays. There are many factors that will unnecessarily increase the time it takes to implement a change. Many of my big delays were caused by the fact that I always had to search for all the relevant installation artifacts, such as documentation, installation discs, and so on. I have also encountered many times that artifacts were missing, requiring me to obtain copies elsewhere.
  • Errors. Any change could potentially lead to errors for many kinds of reasons. For example, implementing a set of changes in the wrong order could break a system. Also, the components of which a system consists may have complex dependencies on other components that have to be met. Quite often, it is not fully clear what the dependencies of a component or system are, especially when documentation is incomplete or lacking.

    Moreover, after having solved an error, you need to remember many arbitrary things, such as workarounds, that tend to become forgotten knowledge over time.
  • Disruptions. When implementing changes, a system may be partially or fully unavailable until all changes have been implemented. Preferably this time window should be as short as possible. Unfortunately, the inconsistency time window most likely becomes quite big when the configuration management process is not optimal or subject to errors.

It does not matter if an organization is small or big, but these problems cost valuable time and money. To alleviate these problems, it is IMO unavoidable to have a structured way of carrying out changes so that a system maintains its integrity.

Big organizations typically have information systems, people and management procedures to support structured configuration management, because failures are typically too costly for them. There are also standards available (such as the IEEE 828-2012 Standard for Configuration Management in Systems and Software Engineering) that they may use as a reference for implementing a configuration management process.

However, in small organizations people typically refrain from thinking about a process at all while they keep suffering from the consequences, because they think they are too small for it. Furthermore, they find it too costly to invest in people or an information system supporting configuration management procedures. Consulting a standard is something that is generally considered a leap too far.

In this blog post, I will describe a very basic software configuration management process I have implemented at my current employer.

The IEEE Standard for Configuration Management


As crazy as this may sound, I have used the IEEE 828-2012 standard as a reference for my implementation. The reason why I consulted this standard, besides the fact that using an existing and reasonably well-established reference is good, is that I was already familiar with it because of my previous background as a PhD researcher in software deployment.

The IEEE standard defines a framework of so-called "lower-level processes" from which a configuration management process can be derived. The most important lower-level processes are:

  • CM identification, which concerns identifying, naming, describing, versioning, storing and retrieving configuration items (CIs) and their baselines.
  • CM change control is about planning, requesting, approval and verification of change activities.
  • CM status accounting is about identifying the status of CIs and change requests.
  • CM auditing concerns identifying, tracing and reporting discrepancies with regards to the CIs and their baselines.
  • CM release management is about planning, defining a format for distribution, delivery, and archival of configuration items.

All the other lower-level processes have references to the lower-level processes listed above. CM planning is basically about defining a plan for how to carry out the above activities. CM management is about actually installing the tools involved, executing the process, monitoring its progress and status, and revising/optimizing the plan if any problem occurs.

The remaining two lower-level processes concern outside involvement -- Supplier configuration item control concerns CIs that are provided by external parties. Interface control concerns the implications of configuration changes that concern external parties. I did not take the traits of these lower-level processes into account in the implementation.

Implementing a configuration management process


Implementing a configuration management process (according to the IEEE standard) starts by developing a configuration management plan. The standard mentions many planning aspects, such as identifying the information needs, reporting needs, the reporting frequency, and the information needed to manage CM activities. However, from my perspective, in a small organization many of these aspects are difficult to answer in advance, in particular the information needs.

As a rule of thumb, I think that when you do not exactly know what is needed, consider that the worst thing could happen -- the office burns down and everything gets lost. What does it take to reproduce the entire configuration from scratch?

This is how I have implemented the main lower-level processes:

CM identification


A configuration item is any structural unit that is distinguishable and configurable. In our situation, the most important kind of configuration item is a machine configuration (e.g. a physical machine or a virtual machine hosted in an IaaS environment, such as Amazon EC2), or a container configuration (such as an Apache Tomcat container or PaaS service, such as Elastic Beanstalk).

Machines/containers belong to an environment. Some examples of environments that we currently maintain are: production, containing the configurations of the production machines of our service; test, containing the configurations of the test environment; and internal, containing the configurations of our internal IT infrastructure, such as our internal Hydra build cluster and other peripherals, such as routers and printers.

Machines run a number of applications that may have complex installation procedures. Moreover, identical/similar application configurations may have to be deployed to multiple machines.

For storing the configurations of the CIs, I have set up a Git repository that follows a specific directory structure of three levels:

<environment>/<machine | container>/<application>

Each directory contains all artifacts (e.g. keys, data files, configuration files, scripts, documents etc.) required to reproduce a CI's configuration from scratch. Moreover, each directory has a README.md markdown file:

  • The top-level README.md describes which environments are available and what their purposes are.
  • The environment-level README.md describes which machines/containers are part of it, a brief description of their purpose, and a picture showing how they are connected. I typically use Dia to draw them, because the tool is simple and free.
  • The machine-level README.md describes the purpose of the machine and the activities that must be carried out to reproduce its configuration from scratch.
  • The application-level README.md captures the steps that must be executed to reproduce an application's configuration.
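
To make this a bit more concrete, a hypothetical repository following this convention (the machine and application names below are just examples) could look like this:

README.md
production/
  README.md
  webserver/
    README.md
    apache/
      README.md
      httpd.conf
test/
  README.md
internal/
  README.md
  buildserver/
    README.md
    hydra/
      README.md
      hydra.conf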

When storing artifacts and writing README.md files, I try to avoid duplication as much as possible, because it makes it harder to keep the repository consistent and maintainable:

  • README.md files are not supposed to be tool manuals. I mention the steps that must be executed and what their purposes are, but I avoid explaining how a tool works. That is the purpose of the tool's manual.
  • When there are common files used among machines, applications or environments, I do not duplicate them. Instead, I use a _common/ folder that I put one directory level higher. For example, the _common/ folder in the internal/ directory contains shared artifacts that are supposed to be reused among all machines belonging to our internal IT infrastructure.
  • I also capture common configuration steps in a separate README.md and refer to it, instead of duplicating the same steps in multiple README.md files.

Because I use Git, versioning, storage and retrieval of configurations are implicit. I do not have to invent something myself or think too much about it. For example, I do not have to manually assign version numbers to CIs, because Git already computes them for each change. Moreover, because I use textual representations of most of the artifacts, I can also easily compare versions of the configurations.
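
For example, finding out what has changed in a particular machine's configuration only requires ordinary Git commands (the path shown refers to the hypothetical layout sketched earlier):

$ git log --oneline -- production/webserver
$ git diff HEAD~1 HEAD -- production/webserver/apache/httpd.conf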

Furthermore, besides capturing and storing all the prerequisites to reproduce a configuration, I also try to automate this process as much as possible. For most of the automation aspects, I use tools from the Nix project, such as the Nix package manager for individual packages, NixOS for system configurations, Disnix for distributed services, and NixOps for networks of machines.

Tools in the Nix project are driven by declarative specifications -- a specification captures the structure of a system, e.g. its components and their dependencies. From this specification the entire deployment process will be derived, such as building the components from source code, distributing them to the right machines in the network, and activating them in the right order.

Using a declarative deployment approach saves me from having to write down the activities to carry out, because they are implicit. Also, there is no need to describe the structure of the system separately, because it is already captured in the deployment specification.
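
For instance, the following simplified NixOS configuration sketch declares a machine that runs an SSH daemon and an Apache web server -- the activities needed to realize this configuration are derived from it automatically:

{ config, pkgs, ... }:

{
  # Declare what the machine should provide; NixOS derives the deployment steps
  services.openssh.enable = true;

  services.httpd.enable = true;
  services.httpd.adminAddr = "admin@example.org";

  # Packages that should be available system-wide
  environment.systemPackages = [ pkgs.git ];
}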

Unfortunately, not all machines' deployment processes can be fully automated with Nix deployment tools (e.g. non-Linux machines and special-purpose peripherals, such as routers), still requiring me to carry out some configuration activities manually.

CM change control


Implementing changes may cause disruptions costing time and money. That is why the right people must be informed and approval is needed. Big organizations typically have sophisticated management procedures including request and approval forms, but in a small organization it typically suffices to notify people informally before implementing a change.

Besides notifying people, I also take the following things into account while implementing changes:

  • Some configuration changes require validation, including review and testing, before they can actually be implemented in production. I typically keep the master Git branch in a releasable state, meaning that it is ready to be deployed into production. Any changes that require explicit validation go into a different branch first.

    Moreover, when using tools from the Nix project it is relatively easy to reliably test changes first by deploying a system into a test environment, or by spawning virtual NixOS machines in which integration tests can be executed.
  • Sometimes you need to make an analysis of the impact and costs that a change would bring. Up-to-date and consistent documentation of the CIs including their dependencies makes this process more reliable.

    Furthermore, with tools from the Nix project you can also make better estimations by executing a dry-run deployment process -- the dry run shows what activities will be carried out without actually executing them or bringing the system down.
  • After a change has been deployed, we also need to validate whether the new configuration is correct. Typically, this requires testing.

    Tools from the Nix project support congruent deployment, meaning that if the deployment has succeeded, the actual configuration is guaranteed to match the deployment specification for the static parts of a system, giving better guarantees about its validity.
  • Also you have to pick the right moment to implement potentially disrupting changes. For example, it is typically a bad idea to do this while your service is under peak load.

CM status accounting


It is also highly desirable to know what the status of the CIs and the change activities are. The IEEE standard goes quite far in this. For example, the overall state of the system may converge into a certain direction (e.g. in terms of features complete, error ratios etc.), which you continuously want to measure and report about. I think that in a small organization these kinds of status indicators are typically too complex to define and measure, in particular in the early stages of a development process.

However, I think the most important status indicator that you never want to lose track of is the following: does everything (still) work?

There are two facilities that help me out a lot in keeping a system in working state:

  • Automating deployment with tools from the Nix project ensures that the static parts of a deployed system are congruent with the deployment configuration specification and that upgrades are atomic -- a deployment is either in the old configuration or in the new configuration, but never in an inconsistent mix of the two. Consequently, we have fewer broken configurations as a result of (re)deploying a system.
  • We must also observe a system's runtime behavior and take action before things grow out of hand, for example, when a machine runs out of system resources.

    Using a monitoring service, such as Zabbix or Datadog, helps me a lot in accomplishing this. They can also be used to configure alarms that warn you when things become critical.

CM auditing


Another important aspect is the integrity of the configurations repository. How can we be sure that what is stored inside the repository matches the actual configurations and that the configuration procedures still work?

Fortunately, because we use tools from the Nix project, there is relatively little audit work we need to do. With Nix-related tools the deployment process is fully automated. As a consequence, we need to adapt the deployment specification when we intend to make changes. Moreover, since the deployment specifications of Nix-related tools are congruent, we know that the static parts of a system are guaranteed to match the actual configurations if the (re)deployment process succeeded.

However, for non-NixOS machines and other peripherals, we must still manually check once in a while whether the intended configuration matches the actual one. I made it a habit to go through them once a month and to adjust the documentation if any discrepancies were found.

CM release management


When updating a configuration file, we must also release the new corresponding configuration items. The IEEE standard describes many concerns, such as approval procedures, requirements on the distribution mechanism and so on.

For me, most of these concerns are unimportant, especially in a small organization. The only thing that matters to me is that a release process is fully automated, reliable, and reproducible. Fortunately, the deployment tools from the Nix project support these properties quite well.
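
For example, with NixOps, releasing a new configuration of a network of machines boils down to a couple of commands (the file and deployment names below are hypothetical):

$ nixops create ./network.nix ./network-production.nix -d production
$ nixops deploy -d production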

Discussion


In this blog post, I have described a basic configuration management process that I have implemented in a small organization.

Some people will probably argue that defining a CM process in a small organization looks crazy. Some people think they do not need a process and that it is too much of an investment. Following an IEEE standard is generally considered a leap too far.

In my opinion, however, the barrier to implementing a CM process is actually not that high. From my experience, the biggest investment is setting up a configuration management repository. Although big organizations typically have sophisticated information systems, I have also shown that, using a simple filesystem layout and a collection of free and open source tools (e.g. Git, Dia, Nix), a simple variant of such a repository can be set up with relatively little effort.

I also observed that automating CM tasks helps a lot, in particular using a declarative and congruent deployment approach, such as Nix. With a declarative approach, configuration activities are implicit (they are a consequence of applying a change in the deployment specification) and do not have to be documented. Furthermore, because Nix's deployment models are congruent, the static aspects of a configuration are guaranteed to match the deployment specifications. Moreover, the deployment model serves as the documentation, because it captures the structure of a system.

So how beneficial is setting up a CM process in a small organization? I observed many benefits. For example, a major benefit is that I can carry out many CM tasks much faster. I no longer have to waste much of my time looking for configuration artifacts and documentation. Also, because the steps to carry out are documented or automated, there are fewer things I need to (re)discover or solve while implementing a change.

Another benefit is that I can more reliably estimate the impact of implementing changes, because the CIs and their relationships are known. More knowledge also leads to fewer errors.

Although a simple CM approach provides benefits and many aspects can be automated, it always requires discipline from all people involved. For example, when errors are discovered and configurations must be modified in a stressful situation, it is very tempting to bypass updating the documentation.

Moreover, communication is also an important aspect. For example, when notifying people of a potentially disrupting change, clear communication is required. Typically, also non-technical stakeholders must be informed. Eventually, you have to start developing formalized procedures to properly handle decision processes.

Finally, the CM approach described in this blog post is obviously too limited if a company grows. If an organization gets bigger, a more sophisticated and more formalized CM approach will be required.

Sunday, August 30, 2015

Deploying target-specific services with Disnix

As explained in a previous blog post, Disnix's purpose is to be a distributed service deployment tool -- it deploys systems that are composed of distributable components (called services) that may have dependencies on each other into networks of machines having various characteristics.

The definition of a service in a Disnix context is not very strict. Basically, a service can take almost any form, such as a web service, a web application, a UNIX process, or even an entire NixOS configuration.

Apart from the fact that we can deploy various kinds of services, they have another important characteristic from a deployment perspective. By default, services are target-agnostic, which means that they always have the same form regardless of which machine in the network they are deployed to. In most cases this is considered a good thing.

However, there are also situations in which we want to deploy services that are built and configured specifically for a target machine. In this blog post, I will elaborate on this problem and describe how target-specific services can be deployed with Disnix.

Target-agnostic service deployment


Why are services target-agnostic by default in Disnix?

This property actually stems from the way "ordinary packages" are built with the Nix package manager which is used as a basis for Disnix.

As explained earlier, Nix package builds are influenced by its declared inputs only, such as the source code, build scripts and other kinds of dependencies, e.g. a compiler and libraries. Nix has means to ensure that undeclared dependencies cannot influence a build and that dependencies never collide with each other.

As a result, builds are reliable and reproducible. For example, it does not matter where the build of a package is performed. If the inputs are the same, then the corresponding outcome will be the same as well. (As a sidenote: there are some caveats, but in general there are no observable side-effects). Also, it provides better guarantees that, for example, if I have built and tested a program on my machine, it will work on a different machine as well.

Moreover, since it does not matter where a package has been built, we can, for example, also download a package built from identical inputs from a remote location instead of building it ourselves, improving the efficiency of deployment processes.

In Disnix, Nix's concept of building packages has been extended to services in a distributed setting. The major difference between a package and a service is that services take an additional class of dependencies into account. Besides the intra-dependencies that Nix manages, services may also have inter-dependencies on services that may be deployed to remote machines in a network. Disnix can be used to configure services in such a way that a service knows how to reach its inter-dependencies and that the system is activated and deactivated in the right order.

As a consequence, Disnix does not take a machine's properties into account when deploying a service to a target machine in the network, unless those properties are explicitly provided as dependencies of the service.

In many cases, this is a good thing. For example, the following image shows a particular deployment scenario of the ridiculous StaffTracker example (described in some of my research publications and earlier blog posts):


The above image describes a deployment scenario in which we have deployed services (denoted by the ovals) to two machines in a network (denoted by the grey boxes). The arrows denote inter-dependency relationships.

One of the things we could do is changing the location of the StaffTracker web application front-end service, by changing the following line in the distribution model:

StaffTracker = [ infrastructure.test2 ];

to:

StaffTracker = [ infrastructure.test1 ];

Redeploying the system yields the following deployment architecture:


Performing the redeployment procedure is actually quite efficient. Since the intra-dependencies and inter-dependencies of the StaffTracker service have not changed, we do not have to rebuild and reconfigure the StaffTracker service. We can simply take the existing build result from the coordinator machine (that has been previously distributed to machine test1) and distribute it to test2.

Also, because the build result is the same, we have better guarantees that if the service worked on machine test1, it should work on machine test2 as well.

(As a sidenote: there is actually a situation in which a service will get rebuilt when moving it from one machine to another while its intra-dependencies and inter-dependencies have not changed.

Disnix also supports heterogeneous service deployment, meaning that target machines may have different CPU architectures and operating systems. For example, if test2 were a Linux machine and test1 a Mac OS X machine, Disnix would attempt to rebuild the service for the new platform.

However, if all machines have the same CPU architecture and operating system, this will not happen.)

Deploying target-specific services


Target-agnostic services are generally considered good because they improve reproducibility and efficiency when moving a service from one machine to another. However, in some situations you may need to configure a service for a target machine specifically.

An example of a deployment scenario in which we need to deploy target-specific services, is when we want to deploy a collection of Node.js web applications and an nginx reverse proxy in which each web application should be reached by its own unique DNS domain name (e.g. http://webapp1.local, http://webapp2.local etc.).

We could model the nginx reverse proxy and each web application as (target-agnostic) distributable services, and deploy them in a network with Disnix as follows:


We can declare the web applications to be inter-dependencies of the nginx service and generate its configuration accordingly.
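
For illustration, a minimal sketch of what this single target-agnostic nginx service could look like in the services model -- the web application and wrapper names mirror the ones used later in this post, and the details are elided:

{pkgs, system, distribution}:

let
  customPkgs = ...
in
rec {
  webapp1 = ...

  webapp2 = ...

  webapp3 = ...

  webapp4 = ...

  nginx = {
    name = "nginx";
    pkg = customPkgs.nginx-wrapper;
    # The reverse proxy depends on every web application in the network
    dependsOn = {
      inherit webapp1 webapp2 webapp3 webapp4;
    };
    type = "wrapper";
  };
}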

Although this approach works, it has a downside: in the above deployment architecture, the test1 machine has to handle all network traffic, including the requests that must be forwarded to the web applications deployed to test2. This makes the system poorly scalable, because only one machine is responsible for handling all the network load.

We can also deploy two redundant instances of the nginx service by specifying the following attribute in the distribution model:

nginx = [ infrastructure.test1 infrastructure.test2 ];

The above modification yields the following deployment architecture:


The above deployment architecture is more scalable -- requests meant for the web applications deployed to test1 can now be handled by the nginx server on test1, while the nginx server on test2 handles the requests meant for the web applications deployed to test2.

Unfortunately, there is also an undesired side effect. As all the nginx services have the same form regardless of the machines to which they have been deployed, they have inter-dependencies on all web applications in the entire network, including the ones that are not running on the same machine.

This property makes upgrading the system very inefficient. For example, if we update the webapp3 service (deployed to machine test2), the nginx configurations on all the other machines must be updated as well, causing the nginx services on all machines to be upgraded, because each of them has an inter-dependency on the upgraded web application.

In a two-machine scenario with four web applications this inefficiency may still be acceptable, but in a big environment with tens of web applications and tens of machines, we will most likely suffer from many (hundreds of) unnecessary redeployment activities, bringing the system down for an unnecessarily long time.

A more efficient deployment architecture would be the following:


We deploy two target-specific nginx services that only have inter-dependencies on the web applications deployed to the same machine. In this scenario, upgrading webapp3 does not affect the configurations of any of the services deployed to the test1 machine.

How to specify these target-specific nginx services?

A dumb way to do it is to define a service for each target in the Disnix services model:

{pkgs, system, distribution}:

let
  customPkgs = ...
in
rec {
  ...

  nginx-wrapper-test1 = rec {
    name = "nginx-wrapper-test1";
    pkg = customPkgs.nginx-wrapper;
    dependsOn = {
      inherit webapp1 webapp2;
    };
    type = "wrapper";
  };

  nginx-wrapper-test2 = rec {
    name = "nginx-wrapper-test2";
    pkg = customPkgs.nginx-wrapper;
    dependsOn = {
      inherit webapp3 webapp4;
    };
    type = "wrapper";
  };
}

And then to distribute them to the appropriate target machines in the Disnix distribution model:

{infrastructure}:

{
  ...
  nginx-wrapper-test1 = [ infrastructure.test1 ];
  nginx-wrapper-test2 = [ infrastructure.test2 ];
}

Manually specifying target-specific services is quite tedious and laborious, especially if you have tens of services and tens of machines. We would have to specify a configuration for every machine and service combination, resulting in hundreds of target-specific service configurations.

Furthermore, there is a bit of repetition. Both the distribution model and the services model reflect mappings from services to target machines.

A better approach would be to generate target-specific services. An example of such an approach is to specify the mappings of these services in the distribution model first:

{infrastructure}:

let
  inherit (builtins) listToAttrs attrNames getAttr;
in
{
  webapp1 = [ infrastructure.test1 ];
  webapp2 = [ infrastructure.test1 ];
  webapp3 = [ infrastructure.test2 ];
  webapp4 = [ infrastructure.test2 ];
} //

# To each target, distribute a reverse proxy

listToAttrs (map (targetName: {
  name = "nginx-wrapper-${targetName}";
  value = [ (getAttr targetName infrastructure) ];
}) (attrNames infrastructure))

In the above distribution model, we statically map all the target-agnostic web application services, and for each target machine in the infrastructure model we generate a mapping of the target-specific nginx service to its target machine.

We can generate the target-specific nginx service configurations in the services model as follows:

{system, pkgs, distribution, invDistribution}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit pkgs system;
  };
in
{
  webapp1 = ...
  
  webapp2 = ...
  
  webapp3 = ...
  
  webapp4 = ...
} //

# Generate nginx proxy per target host

builtins.listToAttrs (map (targetName:
  let
    serviceName = "nginx-wrapper-${targetName}";
    servicesToTarget = (builtins.getAttr targetName invDistribution).services;
  in
  { name = serviceName;
    value = {
      name = serviceName;
      pkg = customPkgs.nginx-wrapper;
      # The reverse proxy depends on all services distributed to the same
      # machine, except itself (of course)
      dependsOn = builtins.removeAttrs servicesToTarget [ serviceName ];
      type = "wrapper";
    };
  }
) (builtins.attrNames invDistribution))

To generate the nginx services, we iterate over a so-called inverse distribution model, which maps targets to services and is computed from the distribution model (that maps services to one or more machines in the network).

The inverse distribution model is basically just the infrastructure model in which each target attribute set has been augmented with a services attribute containing the properties of the services that have been deployed to it. The services attribute refers to an attribute set in which each key is the name of the service and each value the service configuration properties defined in the services model:

{
  test1 = {
    services = {
      nginx-wrapper-test1 = ...
      webapp1 = ...
      webapp2 = ...
    };
    hostname = "test1";
  };
  
  test2 = {
    services = {
      nginx-wrapper-test2 = ...
      webapp3 = ...
      webapp4 = ...
    };
    hostname = "test2";
  };
}

For example, if we refer to invDistribution.test1.services we get all the configurations of the services that are deployed to machine test1. If we remove the reference to the nginx reverse proxy, we can pass this entire attribute set as inter-dependencies to configure the reverse proxy on machine test1. (We remove the reverse proxy as a dependency because it is meaningless to let it refer to itself; moreover, this would cause infinite recursion.)
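
As a small self-contained illustration of the removeAttrs trick used above (with hypothetical empty placeholder values), the following expression shows what remains after removing the reverse proxy from the services deployed to test1:

let
  # Stand-in for invDistribution.test1.services with placeholder values
  servicesToTarget = {
    nginx-wrapper-test1 = {};
    webapp1 = {};
    webapp2 = {};
  };
in
builtins.removeAttrs servicesToTarget [ "nginx-wrapper-test1" ]

Evaluating this expression (for example with nix-instantiate --eval --strict) yields an attribute set containing only webapp1 and webapp2, which is exactly what we pass as inter-dependencies to the reverse proxy on test1.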

With this approach we can also easily scale up the environment. By simply adding more machines to the infrastructure model and additional web application service mappings to the distribution model, the service configurations in the services model get adjusted automatically, without requiring us to think about specifying inter-dependencies at all.
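
For example (using the hypothetical names test3 and webapp5), scaling up could amount to adding a new target machine to the infrastructure model:

test3 = {
  hostname = "test3";
};

and mapping an additional web application to it in the distribution model:

webapp5 = [ infrastructure.test3 ];

Apart from defining the webapp5 service itself in the services model, no further work is needed -- the generators shown earlier automatically produce an nginx-wrapper-test3 service that only has webapp5 as an inter-dependency.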

Conclusion


To make target-specific service deployment possible, you need to explicitly define service configurations for specific target machines in the Disnix services model and distribute them to the right targets.

Unfortunately, manually specifying target-specific services is quite tedious, inefficient and laborious, in particular in big environments. A better solution would be to generate the configurations of target-specific services.

To make generation more convenient, you may have to refer to the infrastructure model and you need to know which services are deployed to each target.

I have integrated the inverse distribution generation feature into the latest development version of Disnix and it will become part of the next Disnix release.

Moreover, I have developed yet another example package, called the Disnix virtual hosts example, to demonstrate how it can be used.

Wednesday, July 29, 2015

Assigning port numbers to (micro)services in Disnix deployment models

I have been working on many Disnix-related aspects for the last few months. For example, in my previous blog post I announced a new Disnix release supporting experimental state management.

Although I am quite happy with the most recent feature addition, another major concern that the basic Disnix toolset does not solve is coping with the dynamism of the services and the environment in which a system has been deployed.

Static modeling of services and the environment has the following consequences:

  • We must write an infrastructure model reflecting all relevant properties of all target machines (a minimal sketch of such a model follows after this list). Although writing such a configuration file for a new environment is doable, it is quite tedious and error prone to keep it up to date and in sync with the machines' actual configurations -- whenever a machine's properties or the network change, the infrastructure model must be updated accordingly.

    (As a sidenote: when using the DisnixOS extension, a NixOS network model is used instead of an infrastructure model, from which the machines' configurations can be deployed automatically, making the consistency problem obsolete. However, the problem persists if we need to deploy to a network of non-NixOS machines.)
  • We must manually specify the distribution of services to machines. This problem typically becomes complicated if services have specific technical requirements on the host on which they need to run (e.g. operating system, CPU architecture, or infrastructure components such as an application server).

    Moreover, a distribution could also be subject to non-functional requirements. For example, a service providing access to privacy-sensitive data should not be deployed to a machine that is publicly accessible from the internet.

    Because requirements may be complicated, it is typically costly to repeat the deployment planning process whenever the network configuration changes, especially if the process is not automated.
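
To give an impression of the first point, a minimal (hypothetical) infrastructure model for two machines could look as follows -- real models typically capture many more machine properties:

{
  test1 = {
    hostname = "test1";
    system = "x86_64-linux";
  };

  test2 = {
    hostname = "test2";
    system = "x86_64-linux";
  };
}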

To cope with the issues listed above, I have developed a prototype extension called Dynamic Disnix and wrote a paper about it. The extension toolset provides the following:

  • A discovery service that captures the properties of the machines in the network from which an infrastructure model is generated.
  • A framework allowing someone to automate deployment planning processes using a couple of algorithms described in the literature.

Besides the dynamism of the infrastructure model and distribution models, I also observed that the services model (capturing the components of which a system consists) may be too static in certain kinds of situations.

Microservices


Lately, I have noticed that a new paradigm named microservice architectures is gaining a lot of popularity. In many ways this new trend reminds me of the service-oriented architecture days -- everybody was talking about it and had success stories, but nobody had a full understanding of it, nor an idea of what it exactly was supposed to mean.

However, if I would restrict myself to some of their practical properties, microservices (like "ordinary" services in a SOA-context) are software components and one important trait (according to Clemens Szyperski's Component Software book) is that a software component:

is a unit of independent deployment

Another important property of microservices is that they interact with each other by sending messages over the HTTP protocol. In practice, many people accomplish this by running processes with an embedded HTTP server (as opposed to using application servers or external web servers).

Deploying Microservices with Disnix


Although Disnix was originally developed to deploy a "traditional" service-oriented system case-study (consisting of "real" web services using SOAP as communication protocol), it has been made flexible enough to deploy all kinds of components. Likewise, Disnix can also deploy components that qualify themselves as microservices.

However, when deploying microservices (running embedded HTTP servers) there is one practical issue -- every microservice must listen on its own unique TCP port on a machine. Currently, meeting this requirement is completely the responsibility of the person composing the Disnix deployment models.

In some cases, this problem is more complicated than expected. For example, manually assigning a unique TCP port to every service for the initial deployment is straightforward, but it may also be desirable to move a service from one machine to another. It could happen that a previously assigned TCP port conflicts with another service after the move, breaking the deployment of the system.

The port assignment problem


So far, I take the following aspects into account when assigning ports:

  • Each service must listen on a port that is unique to the machine the service runs on. In some cases, it may also be desirable to assign a port that is unique to the entire network (instead of a single machine) so that it can be uniformly accessed regardless of its location.
  • The assigned ports must be within a certain range so that (for example) they do not collide with system services.
  • Once a port number has been assigned to a service, it must remain reserved until it gets undeployed.

    The alternative would be to reassign all port numbers to all services for each change in the network, but that can be quite costly in case of an upgrade. For example, if we upgrade a network running 100 microservices, all 100 of them may need to be deactivated and activated to make them listen on their newly assigned ports.

Dynamically configuring ports in Disnix models


Since it is quite tedious and error prone to maintain port assignments in Disnix models, I have developed a utility to automate the process. To dynamically assign ports to services, they must be annotated with the portAssign property in the services model (which can be changed to any other property through a command-line parameter):

{distribution, system, pkgs}:

let
  portsConfiguration = if builtins.pathExists ./ports.nix
    then import ./ports.nix else {};
  ...
in
rec {
  roomservice = rec {
    name = "roomservice";
    pkg = customPkgs.roomservicewrapper { inherit port; };
    dependsOn = {
      inherit rooms;
    };
    type = "process";
    portAssign = "private";
    port = portsConfiguration.ports.roomservice or 0;
  };

  ...

  stafftracker = rec {
    name = "stafftracker";
    pkg = customPkgs.stafftrackerwrapper { inherit port; };
    dependsOn = {
      inherit roomservice staffservice zipcodeservice;
    };
    type = "process";
    portAssign = "shared";
    port = portsConfiguration.ports.stafftracker or 0;
    baseURL = "/";
  };
}

In the above example, I have annotated the roomservice component with a private port assignment, meaning that we want to assign a TCP port that is unique to the machine, and the stafftracker component with a shared port assignment, meaning that we want to assign a TCP port that is unique to the entire network.
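
As an aside, the assigned port values are passed to the wrapper packages (e.g. pkg = customPkgs.roomservicewrapper { inherit port; }). A rough sketch of what such a wrapper function in the custom packages composition might look like -- the file name and configuration format are hypothetical, and the real wrappers are more elaborate:

{ pkgs, system }:

{
  roomservicewrapper = { port }:
    pkgs.stdenv.mkDerivation {
      name = "roomservicewrapper";
      buildCommand = ''
        mkdir -p $out/etc
        # Record the assigned TCP port so that the wrapped process
        # knows on which port it must listen
        echo "PORT=${toString port}" > $out/etc/roomservice.conf
      '';
    };
}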

By running the following command we can assign port numbers:

$ dydisnix-port-assign -s services.nix -i infrastructure.nix \
    -d distribution.nix > ports.nix

The above command generates a port assignment configuration Nix expression (named: ports.nix) that contains port reservations for each service and port assignment configurations for the network and each individual machine:

{
  ports = {
    roomservice = 8001;
    ...
    zipcodeservice = 3003;
  };
  portConfiguration = {
    globalConfig = {
      lastPort = 3003;
      minPort = 3000;
      maxPort = 4000;
      servicesToPorts = {
        stafftracker = 3002;
      };
    };
    targetConfigs = {
      test2 = {
        lastPort = 8001;
        minPort = 8000;
        maxPort = 9000;
        servicesToPorts = {
          roomservice = 8001;
        };
      };
    };
  };
}

The above configuration attribute set contains three properties:

  • The ports attribute contains the actual port numbers that have been assigned to each service. The services defined in the services model (shown earlier) refer to the port values defined here.
  • The portConfiguration attribute contains port configuration settings for the network and each target machine. The globalConfig attribute defines a TCP port range with ports that must be unique to the network. Besides the port range it also stores the last assigned TCP port number and all global port reservations.
  • The targetConfigs attribute contains port configuration settings and reservations for each target machine.

We can also run the port assign command-utility again with an existing port assignment configuration as a parameter:

$ dydisnix-port-assign -s services.nix -i infrastructure.nix \
    -d distribution.nix -p ports.nix > ports2.nix

The above command-line invocation reassigns TCP ports, taking the previous port reservations into account so that these will be reused where possible (e.g. only new services get a port number assigned). Furthermore, it also clears all port reservations of the services that have been undeployed. The new port assignment configuration is stored in a file called ports2.nix.

Conclusion


In this blog post, I have identified another deployment planning problem that manifests itself when deploying microservices that all have to listen on a unique TCP port. I have developed a utility to automate this process.

Besides assigning port numbers, there are many other kinds of problems that need a solution while deploying microservices. For example, you might also want to restrict their privileges (e.g. by running all of them as separate unprivileged users). It is also possible to take care of that with Dysnomia.

Availability


The dydisnix-port-assign utility is part of the Dynamic Disnix toolset that can be obtained from my GitHub page. Unfortunately, the Dynamic Disnix toolset is still a prototype with no end-user documentation or a release, so you have to be brave to use it.

Moreover, I have created yet another Disnix example package (a Node.js variant of the ridiculous StaffTracker example) to demonstrate how "microservices" can be deployed. This particular variant uses Node.js as implementation platform and exposes the data sets through REST APIs. All components are microservices using Node.js' embedded HTTP server listening on their own unique TCP ports.

I have also modified the TCP proxy example to use port assignment configurations generated by the tool described in this blog post.

Wednesday, July 8, 2015

Deploying state with Disnix


A couple of months ago, I announced a new Disnix release after a long period of only little development activity.

As I have explained earlier, Disnix's main purpose is to automatically deploy service-oriented systems into heterogeneous networks of machines running various kinds of operating systems.

In addition to automating deployment, it has a couple of interesting non-functional properties as well. For example, it supports reliable deployment, because components implementing services are stored alongside existing versions and older versions are never automatically removed. As a result, we can always roll back to the previous configuration in case of a failure.

However, there is one major unaddressed concern when using Disnix to deploy a service-oriented system. Like the Nix package manager -- which serves as the basis of Disnix -- Disnix does not manage state.

The absence of state management has a number of implications. For example, when deploying a database, it gets created on first startup, often with a schema and initial data set. However, the structure and contents of a database typically evolves over time. When updating a deployment configuration that (for example) moves a database from one machine to another, the changes that have been made since its initial deployment are not migrated.

So far, state management in combination with Disnix has always been a problem that must be solved manually or by using an external solution. For a single machine, manual state management is often tedious but still doable. For large networks of machines, however, it may become a problem that is too big to handle.

A few years ago, I rushed out a prototype tool called Dysnomia to address state management problems in conjunction with Disnix and wrote a research paper about it. In the last few months, I have integrated the majority of the concepts of this prototype into the master versions of Dysnomia and Disnix.

Executing state management activities


When deploying a service-oriented system with Disnix, a number of deployment activities are executed. For the build and distribution activities, Disnix consults the Nix package manager.

After all services have been transferred, Disnix activates them and deactivates the ones that have become obsolete. Disnix consults Dysnomia to execute these activities through a plugin system that delegates the execution of these steps to an appropriate module for a given service type, such as a process, source code repository or a database.

Deployment activities carried out by Dysnomia require two mandatory parameters. The first parameter is a container specification capturing the properties of a container that hosts one or more mutable components. For example, a MySQL DBMS instance can be specified as follows:

type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret

The above specification states that we have a container of type mysql-database that can be reached using the listed credentials. The type attribute allows Dysnomia to invoke the module that executes the required deployment steps for MySQL.

The second parameter refers to a logical representation of the initial state of a mutable component. For example, a MySQL database is represented as a script that generates its schema:

create table author
( AUTHOR_ID  INTEGER       NOT NULL,
  FirstName  VARCHAR(255)  NOT NULL,
  LastName   VARCHAR(255)  NOT NULL,
  PRIMARY KEY(AUTHOR_ID)
);

create table books
( ISBN       VARCHAR(255)  NOT NULL,
  Title      VARCHAR(255)  NOT NULL,
  AUTHOR_ID  VARCHAR(255)  NOT NULL,
  PRIMARY KEY(ISBN),
  FOREIGN KEY(AUTHOR_ID) references author(AUTHOR_ID)
    on update cascade on delete cascade
);

A MySQL database can be activated in a MySQL DBMS by running the following command-line instruction with the two configuration files shown earlier as parameters:

$ dysnomia --operation activate \
  --component ~/testdb \
  --container ~/mysql-production

The above command first checks whether a MySQL database named testdb exists. If it does not exist, it gets created and the initial schema is imported. If a database with the given name already exists, the command does nothing.

With the latest Dysnomia, it is also possible to run snapshot operations:

$ dysnomia --operation snapshot \
  --component ~/testdb \
  --container ~/mysql-production

The above command invokes the mysqldump utility to take a snapshot of the testdb in a portable and consistent manner and stores the output in a so-called Dysnomia snapshot store.

When running the following command-line instruction, the contents of the snapshot store are displayed for the MySQL container and testdb component:

$ dysnomia-snapshots --query-all --container mysql-database --component testdb
mysql-production/testdb/9b0c3562b57dafd00e480c6b3a67d29146179775b67dfff5aa7a138b2699b241
mysql-production/testdb/1df326254d596dd31d9d9db30ea178d05eb220ae51d093a2cbffeaa13f45b21c
mysql-production/testdb/330232eda02b77c3629a4623b498855c168986e0a214ec44f38e7e0447a3f7ef

As may be observed, the dysnomia-snapshots utility outputs three relative paths that correspond to three snapshots. The paths reflect a number of properties, such as the container name and component name. The last path component is a SHA256 hash that reflects the snapshot's contents (it is computed from the actual dump).

Each container type follows its own naming convention to reflect a snapshot's contents. While MySQL and most of the other Dysnomia modules use output hashes, other naming conventions are used as well. For example, the Subversion module uses the revision id of the repository.

A naming convention that reflects a snapshot's contents has all kinds of benefits. For example, if the MySQL database does not change and we run the snapshot operation again, Dysnomia discovers that a snapshot with the same output hash already exists, preventing it from storing the same snapshot twice and improving storage efficiency.

The absolute versions of the snapshot paths can be retrieved with the following command:

$ dysnomia-snapshots --resolve mysql-database/testdb/330232eda02b77c3629a4623b498855c...
/var/state/dysnomia/snapshots/mysql-production/testdb/330232eda02b77c3629a4623b498855...

Besides snapshotting, it is also possible to restore state with Dysnomia:

$ dysnomia --operation restore \
  --component ~/testdb \
  --container ~/mysql-production

The above command restores the latest snapshot generation. If no snapshot exists in the store, it does nothing.

Finally, it is also possible to clean things up. Similar to the Nix package manager, old components are never deleted automatically, but must be explicitly garbage collected. For example, deactivating the MySQL database can be done as follows:

$ dysnomia --operation deactivate \
  --component ~/testdb \
  --container ~/mysql-production

The above command does not delete the MySQL database. Instead, it simply marks it as garbage, but otherwise keeps it. Actually deleting the database can be done by invoking the garbage collect operation:

$ dysnomia --operation collect-garbage \
  --component ~/testdb \
  --container ~/mysql-production

The above command first checks whether the database has been marked as garbage. If this is the case (because it has been deactivated) it is dropped. Otherwise, this command does nothing (because we do not want to delete stuff that is actually in use).

Besides the physical state of components, all generations of snapshots in the store are also kept by default. They can be removed by running the snapshot garbage collector:

$ dysnomia-snapshots --gc --keep 3

The above command states that all but the last 3 snapshot generations should be removed from the snapshot store.

Managing state of service-oriented systems


With the new snapshotting facilities provided by Dysnomia, we have extended Disnix to support state deployment of service-oriented systems.

By default, the new version of Disnix does not manage state and its behaviour remains exactly the same as the previous version, i.e. it only manages the static parts of the system. To allow Disnix to manage the state of services, they must be explicitly annotated as such in the services model:

staff = {
  name = "staff";
  pkg = customPkgs.staff;
  dependsOn = {};
  type = "mysql-database";
  deployState = true;
};

Adding the deployState attribute to a service and setting it to true causes Disnix to manage its state as well. For example, we may want to move the staff database to another machine in the network.
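
A minimal sketch of such a change, assuming the database was previously mapped to test1, is to adjust its mapping in the distribution model:

staff = [ infrastructure.test2 ]; # was: [ infrastructure.test1 ]

If we then redeploy the system by running the following command: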

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix executes the data migration phase after the configuration has been successfully activated. In this phase, Disnix snapshots the state of the annotated services on the target machines, transfers the snapshots to the new targets (through the coordinator machine), and finally restores their state.

In addition to data migration, Disnix can also be used as a backup tool by running the following command:

$ disnix-snapshot

The above command captures the state of all annotated services in the previously deployed configuration and transfers their snapshots to the coordinator machine's snapshot store.

Likewise, the snapshots can be restored as follows:

$ disnix-restore

By default, the above command only restores the state of the services that are in the last configuration, but not in the configuration before. However, it may also be desirable to force the state of all annotated services in the current configuration to be restored. This can be done as follows:

$ disnix-restore --no-upgrade

Finally, the snapshots that are taken on the target machines are not deleted automatically. Disnix can also automatically clean the snapshot stores of a network of machines:

$ disnix-clean-snapshots --keep 3 infrastructure.nix

The above command deletes all but the last three snapshot generations from all machines defined in the infrastructure model.

Discussion


The extended implementations of Dysnomia and Disnix implement the majority of concepts described in my earlier blog post and the corresponding paper. However, there are a number of things that are different:

  • The prototype implementation stores snapshots in the /dysnomia folder (analogous to the Nix store that resides in /nix/store), which is a non-FHS compliant directory. Nix has a number of very good reasons to deviate from the FHS and requires packages to be addressed by their absolute paths across machines so that they can be uniformly accessed by a dynamic linker.

    However, such a level of strictness is not required for addressing snapshots. In the current implementation, snapshots are stored in /var/state/dysnomia which is FHS-compliant. Furthermore, snapshots are identified by their relative paths to the store folder. The snapshot store's location can be changed by setting the DYSNOMIA_STATEDIR environment variable, allowing someone to have multiple snapshot stores.
  • In the prototype, the semantics of the deactivate operation also imply deleting the state of a mutable component in a container. As this is a dangerous and destructive operation, the current implementation separates the actual delete operation into a garbage collect operation that must be invoked explicitly.
  • In both the prototype and the current implementation, a Dysnomia plugin can choose its own naming convention to identify snapshots. In the prototype, the naming must reflect both the contents and the order in which the snapshots have been taken. As a general fallback, I proposed using timestamps.

    However, timestamps are unreliable in a distributed setting because the machines' clocks may not be in sync. In the current implementation, I use output hashes as a general fallback. As hashes cannot reflect the order in their names, Dysnomia provides a generations folder containing symlinks to snapshots whose names reflect the order in which they have been taken.

The paper also describes two concepts that are still unimplemented in the current master version:

  • The incremental snapshot operation is unimplemented. Although this feature may sound attractive, I could only properly do this with Subversion repositories and MySQL databases with binary logging enabled.
  • To upgrade a service-oriented system (that includes moving state) atomically, access to the system in the transition and data migration phases must be blocked/queued. However, when moving large data sets, this time window could be incredibly big.

    As an optimization, I proposed an improved upgrade process in which incremental snapshots are transferred inside the locking time window, while full snapshots are transferred before the locking starts. Although it may sound conceptually nice, it is difficult to properly apply it in practice. I may still integrate it some day, but currently I don't need it. :)

Finally, there are some practical notes when using Dysnomia's state management facilities. Its biggest drawback is that the way state is managed (by consulting tools that store dumps on the filesystem) is typically expensive in terms of time (because it may take a long time writing a dump to disk) and storage. For very large databases, the costs may actually be too high.

As described in the previous blog post and the corresponding paper, there are alternative ways of doing state management:

  • Filesystem-level snapshotting is typically faster since files only need to be copied. However, its biggest drawback is that physical state may be inconsistent (because of unfinished write operations) and non-portable. Moreover, it may be difficult to manage individual chunks of state. NixOps, for example, supports partition-level state management of EBS volumes.
  • Database replication engines can also typically capture and transfer state much more efficiently.

Because Dysnomia's way of managing state has some expensive drawbacks, it has not been enabled by default in Disnix. Moreover, this was also the main reason why I did not integrate the features of the Dysnomia prototype sooner.

The reason why I have proceeded anyway is that I have to manage a big environment of small databases, whose sizes are only several megabytes each. For such an environment, Dysnomia's snapshotting facilities work fine.

Availability


The state management facilities described in this blog post are part of Dysnomia and Disnix version 0.4. I also want to announce their immediate availability! Visit the Disnix homepage for more information!

As with the previous release, Disnix still remains a tool that should be considered an advanced prototype, despite the fact that I am using it on a daily basis to eat my own dogfood. :)