Monday, November 11, 2019

A Nix-based functional organization for managing processes

The Nix expression language and the Nix packages repository follow a number of unorthodox, but simple conventions that provide all kinds of benefits, such as the ability to conveniently construct multiple variants of packages and store them safely in isolation without any conflicts.

The scope of the Nix package manager, however, is limited to package deployment only. Other tools in the Nix project extend deployment to other kinds of domains, such as machine level deployment (NixOS), networks of machines (NixOps) and service-oriented systems (Disnix).

In addition to packages, there is also a category of systems (such as systems following the microservices paradigm) that are composed of running processes.

Recently, I have been automating deployments of several kinds of systems that are composed of running processes and I have investigated how we can map the most common Nix packaging conventions to construct specifications that we can use to automate the deployment of these kinds of systems.

Some common Nix packaging conventions


The Nix package manager implements a so-called purely functional deployment model. In Nix, packages are constructed in the Nix expression language from pure functions, in which side effects, such as reliance on undeclared dependencies residing in global directories (e.g. /lib and /bin), are eliminated as much as possible.

The function parameters of a build function refer to all required inputs to construct the package, such as the build instructions, the source code, environment variables and all required build-time dependencies, such as compilers, build tools and libraries.

A big advantage of eliminating side effects (or more realistically: significantly reducing them) is reproducible deployment -- when we build the same package with the same inputs on a different machine, we should get a (nearly) bit-identical result.

Strong reproducibility guarantees, for example, make it possible to optimize package deployments by only building a package from source code once and then downloading binary substitutes from remote servers that can be trusted.

In addition to the fact that packages are constructed by executing pure functions (with some caveats), the Nixpkgs repository -- which contains a large set of well known free and open source packages -- follows a number of conventions. One such convention is that most package build recipes reside in separate files and that each recipe declares a function.

An example of such a build recipe is:

{ stdenv, fetchurl, pkgconfig, glib, gpm, file, e2fsprogs
, perl, zip, unzip, gettext, slang, libssh2, openssl }:

stdenv.mkDerivation rec {
  pname = "mc";
  version = "4.8.23";

  src = fetchurl {
    url = "http://www.midnight-commander.org/downloads/${pname}-${version}.tar.xz";
    sha256 = "077z7phzq3m1sxyz7li77lyzv4rjmmh3wp2vy86pnc4387kpqzyx";
  };

  buildInputs = [
    pkgconfig perl glib slang zip unzip file gettext libssh2 openssl
  ];

  configureFlags = [ "--enable-vfs-smb" ];

  meta = {
    description = "File Manager and User Shell for the GNU Project";
    homepage = http://www.midnight-commander.org;
    maintainers = [ stdenv.lib.maintainers.sander ];
    platforms = with stdenv.lib.platforms; linux ++ darwin;
  };
}

The Nix expression shown above (pkgs/tools/misc/mc/default.nix) describes how to build Midnight Commander from source code, and declares its inputs:

  • The first line declares a function in which the function arguments refer to all dependencies required to build Midnight Commander: stdenv refers to an environment that provides standard UNIX utilities, such as cat and ls, and basic build utilities, such as gcc and make. fetchurl is a utility function that can be used to download artifacts from remote locations and that can verify the integrity of the downloaded artifact.

    The remainder of the function arguments refer to packages that need to be provided as build-time dependencies, such as tools and libraries.
  • In the function body, we invoke the stdenv.mkDerivation function to construct a Nix package from source code.

    By default, if no build instructions are provided, it automatically downloads and unpacks the tarball specified by the src parameter, executes the standard GNU Autotools/GNU Make build procedure (./configure; make; make install), and uses buildInputs to instruct the configure script to find the dependencies it needs.

A function definition that describes a package build recipe is not very useful on its own -- to be able to build a package, it needs to be invoked with the appropriate parameters.

A Nix package is composed in a top-level Nix expression (pkgs/top-level/all-packages.nix) that declares one big data structure: an attribute set, in which every attribute name refers to a possible variant of a package (typically only one) and every value to a function invocation that builds the package, with the desired versions or variants of the dependencies that the package may need:

{ system ? builtins.currentSystem }:

rec {
  stdenv = ...
  fetchurl = ...
  pkgconfig = ...
  glib = ...

  ...

  openssl = import ../development/libraries/openssl {
    inherit stdenv fetchurl zlib ...;
  };

  mc = import ../tools/misc/mc {
    inherit stdenv fetchurl pkgconfig glib gpm file e2fsprogs perl;
    inherit zip unzip gettext slang libssh2 openssl;
  };
}

The last attribute (mc) in the attribute set shown above builds a specific variant of Midnight Commander by passing the dependencies that it needs as parameters. It uses the inherit language construct to bind the parameters that are declared in the same lexical scope.
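
The inherit construct is merely shorthand for binding an attribute to the variable with the same name. Written out in full, the mc composition would look roughly as follows (a sketch, with most parameters elided):

mc = import ../tools/misc/mc {
  stdenv = stdenv;
  fetchurl = fetchurl;
  pkgconfig = pkgconfig;
  ...
};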

All the dependencies that Midnight Commander needs are declared in the same attribute set and composed in a similar way.

(As a sidenote: in the above example, we explicitly propagate all function parameters, which is quite verbose and tedious. In Nixpkgs, it is also possible to use a convenience function called: callPackage, which automatically passes the attributes whose names match the function arguments as parameters.)
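
For example, assuming the composition attribute set exposes the callPackage convenience function, the mc attribute can be reduced to a one-liner; arguments only need to be specified in the attribute set argument when we want to override them:

mc = callPackage ../tools/misc/mc { };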

With the composition expression above, we can build the package by running the following command-line instruction:

$ nix-build all-packages.nix -A mc
/nix/store/wp3r8qv4k510...-mc-4.8.23

The Nix package manager will first deploy all build-time dependencies that Midnight Commander needs, and will then build Midnight Commander from source code. The build result is stored in the Nix store (/nix/store/...-mc-4.8.23), in which all build artifacts reside in isolation in their own directories.

We can start Midnight Commander by providing the full path to the mc executable:

$ /nix/store/wp3r8qv4k510...-mc-4.8.23/bin/mc

The prefix of every artifact in the Nix store is a SHA256 hash code derived from all inputs provided to the build function. The SHA256 hash prefix makes it possible to safely store multiple versions and variants of the same package next to each other, because they never share the same name.

If Nix computes a hash that already exists in the Nix store, then the build result is already known, and Nix will not perform the same build again.

Because the Midnight Commander build recipe is a function, we can also adjust the function parameters to build different variants of the same package. For example, by changing the openssl parameter, we can build a Midnight Commander variant that uses a specific version of OpenSSL that differs from the default version:

{ system ? builtins.currentSystem }:

rec {
  stdenv = ...
  fetchurl = ...
  pkgconfig = ...
  glib = ...

  ...

  openssl_1_1_0 = import ../development/libraries/openssl/1.1.0.nix {
    inherit stdenv fetchurl zlib ...;
  };

  mc_alternative = import ../tools/misc/mc {
    inherit stdenv fetchurl pkgconfig glib gpm file e2fsprogs perl;
    inherit zip unzip gettext slang libssh2;
    openssl = openssl_1_1_0; # Use a different OpenSSL version
  };
}

We can build our alternative Midnight Commander variant as follows:

$ nix-build all-packages.nix -A mc_alternative
/nix/store/0g0wm23y85nc0y...-mc-4.8.23

As may be noticed, we get a different Nix store path, because we build Midnight Commander with different build inputs.
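
Because of the hash prefixes, both variants now safely coexist in the Nix store (store paths abbreviated, as in the examples above):

$ ls /nix/store | grep -- -mc-4.8.23
0g0wm23y85nc0y...-mc-4.8.23
wp3r8qv4k510...-mc-4.8.23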

Although the purely functional model provides all kinds of nice benefits (such as reproducibility, the ability to conveniently construct multiple variants of a package, and to store them in isolation without any conflicts), it also has a big inconvenience from a user point of view -- as a user, it is very impractical to remember the SHA256 hash prefix of a package to start a program.

As a solution, Nix also makes it possible to construct user environments (probably better known as Nix profiles), by using the nix-env tool or using the buildEnv {} function in Nixpkgs.

User environments are symlink trees that blend the content of a set of packages into a single directory in the Nix store so that they can be accessed from one single location. By adding the bin/ sub folder of a user environment to the PATH environment variable, it becomes possible for a user to start a command-line executable without specifying a full path.

For example, with the nix-env tool we can install the Midnight Commander in a Nix profile:

$ nix-env -f all-packages.nix -iA mc

and then start it as follows:

$ mc

The above command works if the Nix profile is in the PATH environment variable of the user.
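
The buildEnv function mentioned earlier makes it possible to construct such a symlink tree declaratively. A minimal sketch (assuming pkgs refers to an instance of Nixpkgs that provides mc):

{ pkgs ? import <nixpkgs> {} }:

pkgs.buildEnv {
  name = "my-profile";
  paths = [ pkgs.mc ]; # Packages whose content gets blended into the symlink tree
}

Building this expression with nix-build produces a symlink tree whose bin/ sub folder contains mc.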

Mapping packaging conventions to process management


There are four important packaging conventions that the Nix package manager and the Nixpkgs repository follow that I want to emphasize:

  • Invoking the derivation function (typically through stdenv.mkDerivation or an abstraction built around it) builds a package from its build inputs.
  • Every package build recipe defines a function in which the function parameters refer to all possible build inputs. We can use these functions to compose all kinds of variants of a package.
  • Invoking a package build recipe function constructs a particular variant of a package and stores the result in the Nix store.
  • Nix profiles blend the content of a collection of packages into one directory and make them accessible from a single location.

(As a sidenote: There is some discussion in the Nix community about these concepts. For example, one of the (self-)criticisms is that the Nix expression language, which is specifically designed as a DSL for package management, has no package concept in the language.

Despite this oddity, I personally think that functions are a simple and powerful concept. The only thing that is a bit of a poor decision is that the mechanism that executes a build is called a derivation, which sounds a bit abstract).

Process management is quite different from package management -- we need to have an executable deployed first (typically done by a package manager, such as Nix), but in addition, we also need to manage the life-cycle of a process, such as starting and stopping it. These facilities are not Nix's responsibility. Instead, we need to work with a process manager that provides them.

Furthermore, systems composed of running processes have a kind of dependency relationship that Nix does not manage -- they may also communicate with other processes (e.g. via a network connection or UNIX domain sockets).

As a consequence, they require the presence of other processes in order to work. This means that processes need to be activated in the right order or, alternatively, the communication between two dependent processes needs to be queued until both are available.

If these dependency requirements are not met, then a system may not work. For example, a web application process is useless if the database backend is not available.

In order to fully automate the deployment of systems that are composed of running processes, we can do package management with Nix first and then we need to:

  • Integrate with a process manager, by generating artifacts that a process manager can work with, such as scripts and/or configuration files.
  • Make it possible to specify the process dependencies so that they can be managed (by a process manager or by other means) and activated in the right order.

Generating sysvinit scripts


There are a variety of means to manage processes. A simple (and by today's standards maybe old-fashioned and perhaps controversial) way to manage processes is by using sysvinit scripts (also known as LSB Init compliant scripts).

A sysvinit script implements a set of activities and a standardized interface allowing us to manage the lifecycle of a specific process, or a group of processes.

For example, on a traditional Linux distribution, we can start a process, such as the Nginx web server, with the following command:

$ /etc/init.d/nginx start

and stop it as follows:

$ /etc/init.d/nginx stop

A sysvinit script is straightforward to implement and follows a number of conventions:

#!/bin/bash

### BEGIN INIT INFO
# Provides:      nginx
# Default-Start: 3 4 5
# Default-Stop:  0 1 2 6
# Should-Start:  webapp
# Should-Stop:   webapp
# Description:   Nginx
### END INIT INFO

. /lib/lsb/init-functions

case "$1" in
  start)
    log_info_msg "Starting Nginx..."
    mkdir -p /var/nginx/logs
    start_daemon /usr/bin/nginx -c /etc/nginx.conf -p /var/nginx 
    evaluate_retval
    ;;

  stop)
    log_info_msg "Stopping Nginx..."
    killproc /usr/bin/nginx
    evaluate_retval
    ;;

  reload)
    log_info_msg "Reloading Nginx..."
    killproc /usr/bin/nginx -HUP
    evaluate_retval
    ;;

  restart)
    $0 stop
    sleep 1
    $0 start
    ;;

  status)
    statusproc /usr/bin/nginx
    ;;

  *)
    echo "Usage: $0 {start|stop|reload|restart|status}"
    exit 1
    ;;
esac

  • A sysvinit script typically starts by providing some metadata, such as a description, in which runlevels it needs to be started and stopped, and which dependencies the script has.

    In classic Linux distributions, meta information is typically ignored, but more sophisticated process managers, such as systemd, can use it to automatically configure the activation/deactivation ordering.
  • The body defines a case statement that executes a requested activity.
  • Activities use a special construct (in the example above it is: evaluate_retval) to display the status of an instruction, typically whether a process has started or stopped successfully or not, using appropriate colors (e.g. red in case of a failure, green in case of success).
  • sysvinit scripts typically define a number of commonly used activities: start starts a process, stop stops a process, reload sends a HUP signal to the process to let it reload its configuration (if applicable), restart restarts the process, status reports whether the process is running, and a fallback activity displays the usage to the end user to show which activities can be executed.

sysvinit scripts use a number of utility functions that are defined by the Linux Standard Base (LSB):

  • start_daemon is a utility function that is typically used for starting a process. It expects the process to daemonize -- a process that daemonizes forks another process that keeps running in the background, and then terminates immediately.

    Controlling a daemonized process is a bit tricky -- when the shell spawns a process, it can tell you its process id (PID), so that it can be controlled, but it cannot tell you the PID of the process that gets daemonized by the invoked process, because that is beyond the shell's control.

    As a solution, most programs that daemonize will write a PID file (e.g. /var/run/nginx.pid) that can be used to determine the PID of the daemon so that it can be controlled.

    To do proper housekeeping, the start_daemon function will check whether such a PID file already exists, and will only start the process when it needs to.
  • Stopping a process, or sending it a different kind of signal, is typically done with the killproc function.

    This function will search for the corresponding PID file of the process (by default, a PID file that has the same name as the executable, or a specified PID file) and uses its content to terminate the daemon. As a fallback, if no PID file exists, it will scan the entire process table and kill the process with the same name.
  • We can determine the status of a process (e.g. whether it is running or not), with the statusproc function that also consults the corresponding PID file or scans the process table if needed.

Most common system software, such as nginx, the Apache HTTP server, MySQL and PostgreSQL, has the ability to daemonize. Unfortunately, application services (such as microservices) that are implemented with technologies such as Python, Node.js or Java Spring Boot do not have this ability out of the box.

Fortunately, we can use an external utility, such as libslack's daemon command, to let these foreground-only processes daemonize. Although it is possible to conveniently daemonize external processes, this functionality is not part of the LSB standard.

For example, the following command automatically daemonizes a foreground process, such as a simple Node.js web application, and creates a PID file so that it can be controlled by the sysvinit utility functions:

$ daemon -U -i /home/sander/webapp/app.js

In addition to being started and stopped manually, sysvinit scripts are typically also started on startup and stopped on shutdown, or when a user switches between runlevels. This is controlled by symlinks with specific prefixes that reside in rc?.d directories:

/etc/
  init.d/
    webapp
    nginx
  rc0.d/
    K98nginx -> ../init.d/nginx
    K99webapp -> ../init.d/webapp
  rc1.d/
    K98nginx -> ../init.d/nginx
    K99webapp -> ../init.d/webapp
  rc2.d/
    K98nginx -> ../init.d/nginx
    K99webapp -> ../init.d/webapp
  rc3.d/
    S00webapp -> ../init.d/webapp
    S01nginx -> ../init.d/nginx
  rc4.d/
    S00webapp -> ../init.d/webapp
    S01nginx -> ../init.d/nginx
  rc5.d/
    S00webapp -> ../init.d/webapp
    S01nginx -> ../init.d/nginx
  rc6.d/
    K98nginx -> ../init.d/nginx
    K99webapp -> ../init.d/webapp

In the above directory listing, every rc?.d directory contains symlinks to scripts in the init.d directory.

The first character of each symlink file indicates whether an init.d script should be started (S) or stopped (K). The two numeric digits that follow indicate the order in which the scripts need to be started and stopped.

Each runlevel has a specific purpose as described in the LSB standard. In the above situation, when we boot the system in multi-user mode on the console (run level 3), first our Node.js web application will be started, followed by nginx. On a reboot (when we enter runlevel 6) nginx and then the web application will be stopped. Basically, the stop order is the reverse of the start order.

To conveniently automate the deployment of sysvinit scripts, I have created a utility function called: createSystemVInitScript that makes it possible to generate sysvinit scripts with the Nix package manager.

We can create a Nix expression that generates a sysvinit script for nginx, such as:

{createSystemVInitScript, nginx}:

let
  configFile = ./nginx.conf;
  stateDir = "/var";
in
createSystemVInitScript {
  name = "nginx";
  description = "Nginx";
  activities = {
    start = ''
      mkdir -p ${stateDir}/logs
      log_info_msg "Starting Nginx..."
      loadproc ${nginx}/bin/nginx -c ${configFile} -p ${stateDir}
      evaluate_retval
    '';
    stop = ''
      log_info_msg "Stopping Nginx..."
      killproc ${nginx}/bin/nginx
      evaluate_retval
    '';
    reload = ''
      log_info_msg "Reloading Nginx..."
      killproc ${nginx}/bin/nginx -HUP
      evaluate_retval
    '';
    restart = ''
      $0 stop
      sleep 1
      $0 start
    '';
    status = "statusproc ${nginx}/bin/nginx";
  };
  runlevels = [ 3 4 5 ];
}

The above expression defines a function in which the function parameters refer to all dependencies that we need to construct the sysvinit script to manage an Nginx server: createSystemVInitScript is the utility function that creates sysvinit scripts, and nginx is the package that provides Nginx.

In the body, we invoke the createSystemVInitScript function to construct a sysvinit script:

  • The name corresponds to the name of the sysvinit script and the description to the description displayed in the metadata header.
  • The activities parameter refers to an attribute set in which every name refers to an activity and every value to the shell commands that need to be executed for this activity.

    We can use this parameter to specify the start, stop, reload, restart and status activities for nginx. The function abstraction will automatically configure the fallback activity that displays the usage to the end-user including the activities that the script supports.
  • The runlevels parameter indicates in which runlevels the init.d script should be started. For these runlevels, the function will create start symlinks. An implication is that for the runlevels that are not specified (0, 1, 2, and 6), the function will automatically create stop symlinks.

As explained earlier, sysvinit scripts follow a number of conventions. One such convention is that most activities display a description, then execute a command, and finally display the status of that command, such as:

log_info_msg "Starting Nginx..."
loadproc ${nginx}/bin/nginx -c ${configFile} -p ${stateDir}
evaluate_retval

The createSystemVInitScript function also has a notion of instructions, which are automatically translated into activities that display a task description (derived from the general description) and the status. Using the instructions parameter allows us to simplify the above expression to:

{createSystemVInitScript, nginx}:

let
  configFile = ./nginx.conf;
  stateDir = "/var";
in
createSystemVInitScript {
  name = "nginx";
  description = "Nginx";
  instructions = {
    start = {
      activity = "Starting";
      instruction = ''
        mkdir -p ${stateDir}/logs
        loadproc ${nginx}/bin/nginx -c ${configFile} -p ${stateDir}
      '';
    };
    stop = {
      activity = "Stopping";
      instruction = "killproc ${nginx}/bin/nginx";
    };
    reload = {
      activity = "Reloading";
      instruction = "killproc ${nginx}/bin/nginx -HUP";
    };
  };
  activities = {
    status = "statusproc ${nginx}/bin/nginx";
  };
  runlevels = [ 3 4 5 ];
}

In the above expression, the start, stop and reload activities have been simplified by defining them as instructions allowing us to write less repetitive boilerplate code.

We can reduce the amount of boilerplate code even further -- the kinds of activities that we need to implement for managing processes are mostly the same. When we want to manage a process, we typically want start, stop, restart and status activities and, if applicable, a reload activity if the process knows how to handle the HUP signal.

Instead of specifying activities or instructions, it is also possible to specify which process we want to manage, and what kind of parameters the process should take:

{createSystemVInitScript, nginx}:

let
  configFile = ./nginx.conf;
  stateDir = "/var";
in
createSystemVInitScript {
  name = "nginx";
  description = "Nginx";
  initialize = ''
    mkdir -p ${stateDir}/logs
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" stateDir ];
  runlevels = [ 3 4 5 ];
}

From the process and args parameters, the createSystemVInitScript function automatically derives all relevant activities that we need to manage the process. It is also still possible to augment or override the generated activities by means of the instructions or activities parameters.
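
For example, the following sketch (a hypothetical variant of the expression above) keeps the derived start and stop activities, but overrides the generated status activity:

createSystemVInitScript {
  name = "nginx";
  description = "Nginx";
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" stateDir ];
  runlevels = [ 3 4 5 ];

  # Overrides only the generated status activity;
  # the other derived activities remain untouched
  activities = {
    status = "statusproc ${nginx}/bin/nginx";
  };
}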

Besides processes that already have the ability to daemonize, it is also possible to automatically daemonize foreground processes with this function abstraction. This is particularly useful to generate a sysvinit script for the Node.js web application service, which lacks this ability:

{createSystemVInitScript}:

let
  webapp = (import ./webapp {}).package;
in
createSystemVInitScript {
  name = "webapp";
  process = "${webapp}/lib/node_modules/webapp/app.js";
  processIsDaemon = false;
  runlevels = [ 3 4 5 ];
  environment = {
    PORT = 5000;
  };
}

In the above Nix expression, we set the parameter: processIsDaemon to false (the default value is: true) to indicate that the process is not a daemon, but a foreground process. The createSystemVInitScript function will generate a start activity that invokes the daemon command to daemonize it.
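
Under these settings, the generated start activity will roughly resemble the following case branch (a sketch; the actual store path and invocation are computed by the generator):

start)
  log_info_msg "Starting webapp..."
  PORT=5000 daemon -U -i /nix/store/...-webapp/lib/node_modules/webapp/app.js
  evaluate_retval
  ;;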

Another interesting feature is that we can specify process dependency relationships. For example, an nginx server can act as a reverse proxy for the Node.js web application.

To reliably activate the entire system, we must make sure that the web application process is deployed before Nginx is deployed. If we activate the system in the opposite order, then the reverse proxy may redirect users to a non-existent web application, causing them to see 502 Bad Gateway errors.

We can use the dependencies parameter with references to other sysvinit scripts to indicate that this sysvinit script has dependencies. For example, we can revise the Nginx sysvinit script expression as follows:

{createSystemVInitScript, nginx, webapp}:

let
  configFile = ./nginx.conf;
  stateDir = "/var";
in
createSystemVInitScript {
  name = "nginx";
  description = "Nginx";
  initialize = ''
    mkdir -p ${stateDir}/logs
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" stateDir ];
  runlevels = [ 3 4 5 ];
  dependencies = [ webapp ];
}

In the above example, we pass the webapp sysvinit script as a dependency (through the dependencies parameter). Adding it as a dependency causes the generator to compute a start sequence number for the nginx script that is higher than that of the webapp script, and a stop sequence number that is lower.

The different sequence numbers ensure that webapp is started before nginx, and that nginx is stopped before webapp.
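
Assuming no other scripts participate, the resulting rc.d symlinks would encode this ordering as follows:

rc3.d/
  S00webapp -> ../init.d/webapp
  S01nginx -> ../init.d/nginx
rc6.d/
  K98nginx -> ../init.d/nginx
  K99webapp -> ../init.d/webapp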

Configuring managed processes


So far, composing sysvinit scripts is still very similar to composing ordinary Nix packages. We can also extend the four Nix packaging conventions described in the introduction to create a process management discipline.

Similar to the convention in which every package resides in a separate file and defines a function in which the function parameters refer to all package dependencies, we can extend this convention for processes to also include relevant parameters that configure a service.

For example, we can write a Nix expression for the web application process as follows:

{createSystemVInitScript, port ? 5000}:

let
  webapp = (import /home/sander/webapp {}).package;
in
createSystemVInitScript {
  name = "webapp";
  process = "${webapp}/lib/node_modules/webapp/app.js";
  processIsDaemon = false;
  runlevels = [ 3 4 5 ];
  environment = {
    PORT = port;
  };
}

In the above expression, the port function parameter allows us to configure the TCP port that the web application listens on (it defaults to 5000).
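
A consumer of this expression can then construct an instance that listens on a non-default port, for example (a hypothetical composition):

webapp = import ./webapp.nix {
  inherit createSystemVInitScript;
  port = 6000; # Overrides the default value: 5000
};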

We can also make the configuration of nginx configurable. For example, we can create a function abstraction that creates a configuration for nginx to let it act as a reverse proxy for the web application process shown earlier:

{createSystemVInitScript, stdenv, writeTextFile, nginx
, runtimeDir, stateDir, logDir, port ? 80, webapps ? []}:

let
  nginxStateDir = "${stateDir}/nginx";
in
import ./nginx.nix {
  inherit createSystemVInitScript nginx;
  stateDir = nginxStateDir;

  dependencies = map (webapp: webapp.pkg) webapps;

  configFile = writeTextFile {
    name = "nginx.conf";
    text = ''
      error_log ${nginxStateDir}/logs/error.log;
      pid ${runtimeDir}/nginx.pid;

      events {
        worker_connections 190000;
      }

      http {
        ${stdenv.lib.concatMapStrings (dependency: ''
          upstream webapp${toString dependency.port} {
            server localhost:${toString dependency.port};
          }
        '') webapps}

        ${stdenv.lib.concatMapStrings (dependency: ''
          server {
            listen ${toString port};
            server_name ${dependency.dnsName};

            location / {
              proxy_pass  http://webapp${toString dependency.port};
            }
          }
        '') webapps}
      }
    '';
  };
}

The above Nix expression's function header defines, in addition to the package dependencies, process configuration parameters that make it possible to configure the TCP port that Nginx listens on (port 80 by default) and the web applications to which it should forward requests, based on their virtual host properties.

In the body, these properties are used to generate an nginx.conf file that defines a virtual host for each web application process and forwards incoming requests to the appropriate instance, connecting to it on the port number that the webapp instance configuration provides.
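
To illustrate, for a single webapp instance listening on TCP port 5000 with DNS name webapp.local (and the default state directories), the generated nginx.conf would roughly look as follows:

error_log /var/nginx/logs/error.log;
pid /var/run/nginx.pid;

events {
  worker_connections 190000;
}

http {
  upstream webapp5000 {
    server localhost:5000;
  }

  server {
    listen 80;
    server_name webapp.local;

    location / {
      proxy_pass  http://webapp5000;
    }
  }
}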

Similar to ordinary Nix expressions, Nix expressions for processes also need to be composed, by passing the appropriate function parameters. This can be done in a process composition expression that has the following structure:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
}:

let
  createSystemVInitScript = import ./create-sysvinit-script.nix {
    inherit (pkgs) stdenv writeTextFile daemon;
    inherit runtimeDir tmpDir;

    createCredentials = import ./create-credentials.nix {
      inherit (pkgs) stdenv;
    };

    initFunctions = import ./init-functions.nix {
      basePackages = [
        pkgs.coreutils
        pkgs.gnused
        pkgs.inetutils
        pkgs.gnugrep
        pkgs.sysvinit
      ];
      inherit (pkgs) stdenv;
      inherit runtimeDir;
    };
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = import ./webapp.nix {
      inherit createSystemVInitScript port;
    };
  };

  nginxReverseProxy = rec {
    port = 80;

    pkg = import ./nginx-reverse-proxy.nix {
      inherit createSystemVInitScript;
      inherit stateDir logDir runtimeDir port;
      inherit (pkgs) stdenv writeTextFile nginx;
      webapps = [ webapp ];
    };
  };
}

The above expression (processes.nix) has the following structure:

  • The expression defines a function in which the function parameters allow common properties that apply to all processes to be configured: pkgs refers to Nixpkgs, which contains a big collection of free and open source packages, system refers to the system architecture to build packages for, and stateDir to the directory where processes should store their state (which is /var according to the LSB standard).

    The remaining parameters specify the runtime, log and temp directories that are typically sub directories in the state directory.
  • In the let block, we compose our createSystemVInitScript function using the relevant state directory parameters, base packages and utility functions.
  • In the body, we construct an attribute set in which every name represents a process name and every value an attribute set that contains process properties.
  • One reserved process property of a process attribute set is the pkg property that refers to a package providing the sysvinit script.
  • The remaining process properties can be freely chosen and can be consumed by any process that has a dependency on it.

    For example, the nginxReverseProxy service uses the port and dnsName properties of the webapp process to configure nginx to forward requests to the provided DNS host name (webapp.local) to the web application process listening on the specified TCP port (5000).

Using the above composition Nix expression for processes and the following command-line instruction, we can build the sysvinit script for the web application process:

$ nix-build processes.nix -A webapp

We can start the web application process by using the generated sysvinit script, as follows:

$ ./result/etc/rc.d/init.d/webapp start

and stop it as follows:

$ ./result/etc/rc.d/init.d/webapp stop

We can also build the nginx reverse proxy in a similar way, but to properly activate it, we must make sure that the webapp process is activated first.

To reliably manage a set of processes and activate them in the right order, we can also generate a Nix profile that contains all init.d scripts and rc.d symlinks for stopping and starting:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
}:

let
  buildSystemVInitEnv = import ./build-sysvinit-env.nix {
    inherit (pkgs) buildEnv;
  };
in
buildSystemVInitEnv {
  processes = import ./processes.nix {
    inherit pkgs system;
  };
}

The above expression imports the process composition shown earlier, and invokes the buildSystemVInitEnv to compose a Nix profile out of it. We can build this environment as follows:

$ nix-build profile.nix

Visually, the content of the Nix profile can be presented as follows:

[Diagram: the webapp and nginxReverseProxy processes, connected by an arrow that denotes the process dependency]

In the above diagram the ovals denote processes and the arrows denote process dependency relationships. The arrow indicates that the webapp process needs to be activated before the nginxReverseProxy.

We can use the system's rc script to manage starting and stopping the processes when runlevels are switched. The runlevel symlinks also make it possible to start the processes on startup and to stop them on shutdown or reboot (runlevels 0 and 6).

In addition to the system's rc script, we can also directly control the processes in a Nix profile -- I have created a utility script called: rcswitch that makes it possible to manually start all processes in a profile:

$ rcswitch ./result/etc/rc.d/rc3.d

We can also use the rcswitch command to do an upgrade from one set of processes to another:

$ rcswitch ./result/etc/rc.d/rc3.d ./oldresult/etc/rc.d/rc3.d

The above command checks which of the sysvinit scripts exist in both profiles and will only deactivate obsolete processes and activate new processes.

With the rcrunactivity command it is possible to run arbitrary activities on all processes in a profile. For example, the following command will show all statuses:

$ rcrunactivity status ./result/etc/rc.d/rc3.d

Deploying services as an unprivileged user


The process composition expression shown earlier is also a Nix function that takes various kinds of state properties as parameters.

By default, it has been configured in such a way that it facilitates production deployments. For example, it stores the state of all services in the global /var directory. Only the super user has the permissions to alter the structure of the global /var directory.

It is also possible to change these configuration parameters in such a way that it becomes possible to do process deployment as an unprivileged user.

For example, by changing the port number of the nginxReverseProxy process to a value higher than 1024, such as 8080 (an unprivileged user is not allowed to bind services to ports below 1024), and changing the stateDir parameter to a directory in the user's home directory, we can deploy our web application service and Nginx reverse proxy as an unprivileged user:

$ nix-build processes.nix --argstr stateDir /home/sander/var \
  -A nginxReverseProxy

By overriding the stateDir parameter, the resulting Nginx process has been configured to store all state in /home/sander/var as opposed to the global /var that cannot be modified by an unprivileged user.

As an unprivileged user, I should be able to start the Nginx reverse proxy as follows:

$ ./result/etc/rc.d/init.d/nginx start

The above Nginx instance can be reached by opening: http://localhost:8080 in a web browser.

Creating multiple process instances


So far, we have only been deploying single instances of processes. For the Nginx reverse proxy example, it may also be desired to deploy multiple instances of the webapp process so that we can manage forwardings for multiple virtual domains.

We can adjust the Nix expression for the webapp to make it possible to create multiple process instances:

{createSystemVInitScript}:
{port, instanceSuffix ? ""}:

let
  webapp = (import ./webapp {}).package;
  instanceName = "webapp${instanceSuffix}";
in
createSystemVInitScript {
  name = instanceName;
  inherit instanceName;
  process = "${webapp}/lib/node_modules/webapp/app.js";
  processIsDaemon = false;
  runlevels = [ 3 4 5 ];
  environment = {
    PORT = port;
  };
}

The above Nix expression is a modified webapp build recipe that facilitates instantiation:

  • We have split the Nix expression into two nested functions. The outer function header (the first line) defines all dependencies and configurable properties that apply to all service instances.
  • The inner function header allows all instance-specific properties to be configured so that multiple instances can co-exist. An example of such a property is the port parameter -- only one service can bind to a specific TCP port. Configuring each instance to bind to a different port allows two instances to co-exist.

    The instanceSuffix parameter makes it possible to give each webapp process a unique name (e.g. by providing a numeric value).

    From the package name and instance suffix, a unique instanceName is composed. Propagating the instanceName to the createSystemVInitScript function instructs the daemon command to create a unique PID file (not a PID file that corresponds to the executable name) for each daemon process so that multiple instances can be controlled independently, as the sketch below illustrates.
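
A minimal sketch of such a two-step composition (shown here outside of the constructors convention that is introduced below):

webapp1 = (import ./webapp.nix {
  inherit createSystemVInitScript;
}) {
  port = 5000;
  instanceSuffix = "1";
};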

Although this may sound like a very uncommon use case, it is also possible to change the Nix expression for the Nginx reverse proxy to support multiple instances.

For system services, such as web servers and database servers, it is typically very uncommon to run multiple instances at the same time. Despite being uncommon, it is actually possible and quite useful for development and/or experimentation purposes:

{ createSystemVInitScript, stdenv, writeTextFile, nginx
, runtimeDir, stateDir, logDir}:

{port ? 80, webapps ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxStateDir = "${stateDir}/${instanceName}";
in
import ./nginx.nix {
  inherit createSystemVInitScript nginx instanceSuffix;
  stateDir = nginxStateDir;

  dependencies = map (webapp: webapp.pkg) webapps;

  configFile = writeTextFile {
    name = "nginx.conf";
    text = ''
      error_log ${nginxStateDir}/logs/error.log;
      pid ${runtimeDir}/${instanceName}.pid;

      events {
        worker_connections 190000;
      }

      http {
        ${stdenv.lib.concatMapStrings (dependency: ''
          upstream webapp${toString dependency.port} {
            server localhost:${toString dependency.port};
          }
        '') webapps}

        ${stdenv.lib.concatMapStrings (dependency: ''
          server {
            listen ${toString port};
            server_name ${dependency.dnsName};

            location / {
              proxy_pass  http://webapp${toString dependency.port};
            }
          }
        '') webapps}
      }
    '';
  };
}

The code fragment above shows a revised Nginx expression that supports instantiation:

  • Again, the Nix expression defines a nested function in which the outer function header refers to configuration properties for all services, whereas the inner function header refers to all conflicting parameters that need to be changed so that multiple instances can co-exist.
  • The port parameter makes the TCP port that Nginx binds to configurable. For two instances to co-exist, they both need to bind to different, unoccupied ports.
  • As with the previous example, the instanceSuffix parameter makes it possible to compose unique names for each Nginx instance. The instanceName variable that is composed from it is used to create and configure a dedicated state directory, and a unique PID file that does not conflict with other Nginx instances.

This new convention of nested functions for instantiatable services means that we have to compose these expressions twice. First, we need to pass all parameters that configure properties that apply to all service instances. This can be done in a Nix expression that has the following structure:

{ pkgs
, system
, stateDir
, logDir
, runtimeDir
, tmpDir
}:

let
  createSystemVInitScript = import ./create-sysvinit-script.nix {
    inherit (pkgs) stdenv writeTextFile daemon;
    inherit runtimeDir tmpDir;

    createCredentials = import ./create-credentials.nix {
      inherit (pkgs) stdenv;
    };

    initFunctions = import ./init-functions.nix {
      basePackages = [
        pkgs.coreutils
        pkgs.gnused
        pkgs.inetutils
        pkgs.gnugrep
        pkgs.sysvinit
      ];
      inherit (pkgs) stdenv;
      inherit runtimeDir;
    };
  };
in
{
  webapp = import ./webapp.nix {
    inherit createSystemVInitScript;
  };

  nginxReverseProxy = import ./nginx-reverse-proxy.nix {
    inherit createSystemVInitScript stateDir logDir runtimeDir;
    inherit (pkgs) stdenv writeTextFile nginx;
  };
}

The above Nix expression is something we could call a constructors expression (constructors.nix) that returns an attribute set in which each member refers to a function that allows us to compose a specific process instance.

By using the constructors expression shown above, we can create a processes composition expression that works with multiple instances:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/home/sbu"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs system stateDir runtimeDir logDir tmpDir;
  };
in
rec {
  webapp1 = rec {
    port = 5000;
    dnsName = "webapp1.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
  };

  webapp2 = rec {
    port = 5001;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
  };

  webapp3 = rec {
    port = 5002;
    dnsName = "webapp3.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "3";
    };
  };

  webapp4 = rec {
    port = 5003;
    dnsName = "webapp4.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "4";
    };
  };

  nginxReverseProxy = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxy {
      webapps = [ webapp1 webapp2 webapp3 webapp4 ];
      inherit port;
    };
  };

  webapp5 = rec {
    port = 6002;
    dnsName = "webapp5.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "5";
    };
  };

  webapp6 = rec {
    port = 6003;
    dnsName = "webapp6.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "6";
    };
  };

  nginxReverseProxy2 = rec {
    port = 8081;

    pkg = constructors.nginxReverseProxy {
      webapps = [ webapp5 webapp6 ];
      inherit port;
      instanceSuffix = "2";
    };
  };
}

In the above expression, we import the constructors expression, as shown earlier. In the body, we construct multiple instances of these processes by using the constructors functions:

  • We compose six web application instances (webapp1, webapp2, ..., webapp6), each of them listening on a unique TCP port.
  • We compose two Nginx instances (nginxReverseProxy, nginxReverseProxy2). The first instance listens on TCP port 8080 and redirects the user to any of the first three web application processes, based on the virtual host name. The other Nginx instance listens on TCP port 8081, redirecting the user to the remaining web apps based on the virtual host name.

We can represent the above composition expression visually, as follows:

[Diagram: the six webapp instances and the two Nginx reverse proxies, with arrows denoting which webapp instances each Nginx instance forwards requests to]

As with the previous examples, we can deploy each process instance individually:

$ nix-build processes.nix -A webapp3
$ ./result/etc/rc.d/init.d/webapp3 start

Or the whole set as a Nix profile:

$ nix-build profile.nix
$ rcswitch ./result/etc/rc.d/rc3.d

Again, the rcswitch command will make sure that all processes are activated in the right order. This means that the webapp processes are activated first, followed by the Nginx reverse proxies.

Managing user accounts/state with Dysnomia


Most of the deployment of the processes can be automated in a stateless way -- Nix can deploy the executable as a Nix package and the sysvinit script can manage the lifecycle.

There is another concern that we may also want to address. For security and safety reasons, it is typically not recommended to run processes, such as essential system services, as the root user.

In order to run a process as an unprivileged user, an unprivileged group and user account must be created first by some means. Furthermore, when undeploying a process, we probably also want to remove the dedicated user and group.

User account management is a feature that the Nix package manager does not support -- Nix only works with files stored in the Nix store and cannot/will not (by design) change any files on the host system, such as /etc/passwd where the user accounts are stored.

I have created a deployment tool for state management (Dysnomia) that can be used for this purpose. It facilitates a plugin system that can manage deployment activities for components that Nix does not support: activating, deactivating, taking snapshots, restoring snapshots etc.

I have created a Dysnomia plugin called: sysvinit-script that can activate or deactivate a process by invoking a sysvinit script, and it can also create or discard users and groups from a declarative configuration file that is included with a sysvinit script.

We can revise a process Nix expression to start a process as an unprivileged user:

{createSystemVInitScript}:
{port, instanceSuffix ? ""}:

let
  webapp = (import ./webapp {}).package;
  instanceName = "webapp${instanceSuffix}";
in
createSystemVInitScript {
  name = instanceName;
  inherit instanceName;
  process = "${webapp}/lib/node_modules/webapp/app.js";
  processIsDaemon = false;
  runlevels = [ 3 4 5 ];
  environment = {
    PORT = port;
  };
  user = instanceName;

  credentials = {
    groups = {
      "${instanceName}" = {};
    };
    users = {
      "${instanceName}" = {
        group = instanceName;
        description = "Webapp";
      };
    };
  };
}

The above Nix expression is a revised webapp Nix expression that facilitates user switching:

  • The user parameter specifies that we want to run the process as an unprivileged user. Because this process can also be instantiated, we have to make sure that it gets a unique name. To facilitate that, we create a user with the same username as the instance name.
  • The credentials parameter refers to a specification that instructs the sysvinit-script Dysnomia plugin to create an unprivileged user and group on activation, and discard them on deactivation.

For production purposes (e.g. when we deploy processes as the root user), switching to unprivileged users is useful, but for development purposes, such as running a set of processes as an unprivileged user, we cannot switch users because we may not have the permissions to do so.

For convenience purposes, it is also possible to globally disable user switching, which we can do as follows:

{ pkgs
, stateDir
, logDir
, runtimeDir
, tmpDir
, forceDisableUserChange
}:

let
  createSystemVInitScript = import ./create-sysvinit-script.nix {
    inherit (pkgs) stdenv writeTextFile daemon;
    inherit runtimeDir tmpDir forceDisableUserChange;

    createCredentials = import ./create-credentials.nix {
      inherit (pkgs) stdenv;
    };

    initFunctions = import ./init-functions.nix {
      basePackages = [
        pkgs.coreutils
        pkgs.gnused
        pkgs.inetutils
        pkgs.gnugrep
        pkgs.sysvinit
      ];
      inherit (pkgs) stdenv;
      inherit runtimeDir;
    };
  };
in
{
  ...
}

In the above example, the forceDisableUserChange parameter can be used to globally disable user switching for all sysvinit scripts composed in the expression. It invokes a feature of the createSystemVInitScript function to ignore any user settings that might have been propagated to it.

With the following command we can deploy a process that does not switch users, despite having user settings configured in the process Nix expressions:

$ nix-build processes.nix --arg forceDisableUserChange true

Distributed process deployment with Disnix


As explained earlier, I have adopted four common Nix packaging conventions and extended them to suit the needs of process management.

This is not the only solution that I have implemented that builds on these four conventions -- the other solution is Disnix, which extends Nix's packaging principles to (distributed) service-oriented systems.

Disnix extends Nix expressions for ordinary packages with another category of dependencies: inter-dependencies that model dependencies on services that may have been deployed to remote machines in a network and require a network connection to work.

In Disnix, a service expression is a nested function in which the outer function header specifies all intra-dependencies (local dependencies, such as build tools and libraries), and the inner function header refers to inter-dependencies.

It is also possible to combine the concepts of process deployment described in this blog post with the service-oriented system concepts of Disnix, such as inter-dependencies -- the example with Nginx reverse proxies and web application processes can also be extended to work in a network of machines.

Besides deploying a set of processes (that may have dependencies on each other) to a single machine, it is also possible to deploy the web application processes to different machines in the network than the machine where the Nginx reverse proxy is deployed to.

We can configure the reverse proxy in such a way that it forwards requests to the machines where the web application processes have been deployed:

{ createSystemVInitScript, stdenv, writeTextFile, nginx
, runtimeDir, stateDir, logDir
}:

{port ? 80, instanceSuffix ? ""}:

interDeps:

let
  instanceName = "nginx${instanceSuffix}";
  nginxStateDir = "${stateDir}/${instanceName}";
in
import ./nginx.nix {
  inherit createSystemVInitScript nginx instanceSuffix;
  stateDir = nginxStateDir;

  dependencies = map (dependencyName:
    let
      dependency = builtins.getAttr dependencyName interDeps;
    in
    dependency.pkg
  ) (builtins.attrNames interDeps);

  configFile = writeTextFile {
    name = "nginx.conf";
    text = ''
      error_log ${nginxStateDir}/logs/error.log;
      pid ${runtimeDir}/${instanceName}.pid;

      events {
        worker_connections 190000;
      }

      http {
        ${stdenv.lib.concatMapStrings (dependencyName:
          let
            dependency = builtins.getAttr dependencyName interDeps;
          in
          ''
            upstream webapp${toString dependency.port} {
              server ${dependency.target.properties.hostname}:${toString dependency.port};
            }
          '') (builtins.attrNames interDeps)}

        ${stdenv.lib.concatMapStrings (dependencyName:
          let
            dependency = builtins.getAttr dependencyName interDeps;
          in
          ''
            server {
              listen ${toString port};
              server_name ${dependency.dnsName};

              location / {
                proxy_pass  http://webapp${toString dependency.port};
              }
            }
          '') (builtins.attrNames interDeps)}
      }
    '';
  };
}

The above Nix expression is a revised Nginx configuration that also works with inter-dependencies:

  • The above Nix expression defines three nested functions. The purpose of the outermost function (the first line) is to configure all local dependencies that are common to all process instances. The middle function defines all process instance parameters that are potentially conflicting and need to be configured with unique values so that multiple instances can co-exist. The third (innermost) function refers to the inter-dependencies of this process: services that may reside on a different machine in the network and need to be reached over a network connection.
  • The inter-dependency function header (interDeps:) takes an arbitrary number of dependencies. These inter-dependencies refer to all web application process instances that the Nginx reverse proxy should redirect to.
  • In the body, we generate an nginx.conf that uses the inter-dependencies to set up the forwardings.

    Compared to the previous Nginx reverse proxy example, it uses the dependency.target.properties.hostname property, which refers to the hostname of the machine where the web application process is deployed, instead of forwarding to localhost. This makes it possible to connect to a web application process that may have been deployed to another machine.
  • The inter-dependencies are also passed to the dependencies function parameter of the Nginx function. This will ensure that if Nginx and a web application process are distributed to the same machine by Disnix, they will also get activated in the right order by the system's rc script on startup.

As with the previous examples, we need to compose the above Disnix expression multiple times. The composition of the constructors can be done in the constructors expression (as shown in the previous examples).

The processes' instance dependencies and inter-dependencies can be configured in the Disnix services model, which shares many similarities with the process composition expression shown earlier. As a matter of fact, a Disnix services model is a superset of it:

{ pkgs, distribution, invDistribution, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? true
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange;
  };
in
rec {
  webapp = rec {
    name = "webapp";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
    };
    type = "sysvinit-script";
  };

  nginxReverseProxy = rec {
    name = "nginxReverseProxy";
    port = 8080;
    pkg = constructors.nginxReverseProxy {
      inherit port;
    };
    dependsOn = {
      inherit webapp;
    };
    type = "sysvinit-script";
  };
}

The above Disnix services model defines two services (representing processes) that have an inter-dependency on each other, as specified with the dependsOn property of each service.

The sysvinit-script type property instructs Disnix to deploy the services as processes managed by a sysvinit script. In a Disnix context, services have no specific form or meaning, and can basically represent anything. The type property is used to tell Disnix what kind of service we are dealing with.

To properly configure remote dependencies, we also need to know which target machines we can deploy to and what their properties are. This is what we can use an infrastructure model for.

For example, a simple infrastructure model of two machines could be:

{
  test1.properties.hostname = "test1";
  test2.properties.hostname = "test2";
}

We must also tell Disnix to which target machines we want to distribute the services. This can be done in a distribution model:

{infrastructure}:

{
  webapp = [ infrastructure.test1 ];
  nginxReverseProxy = [ infrastructure.test2 ];
}

In the above distribution model we distribute the webapp process to the first target machine and the nginxReverseProxy to the second machine. Because both services are deployed to different machines in the network, the nginxReverseProxy uses a network link to forward incoming requests to the web application.

By running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will deploy the processes to the target machines defined in the distribution model.

The result is the following deployment architecture:


As may be noticed by looking at the above diagram, the process dependency manifests itself as a network link managed as an inter-dependency by Disnix.

Conclusion


In this blog post, I have described a Nix-based functional organization for managing processes based on four simple Nix packaging conventions. This approach offers the following benefits:

  • Integration with many process managers that manage the lifecycle of a process (in this particular blog post: using sysvinit scripts).
  • The ability to relocate state to other locations, which is useful to facilitate unprivileged user deployments.
  • The ability to create multiple instances of processes, by making conflicting properties configurable.
  • Disabling user switching, which is useful to facilitate unprivileged user deployments.
  • It can be used on any Linux system that has the Nix package manager installed. It can be used on NixOS, but NixOS is not a requirement.

Related work


Integrating process management with Nix package deployment is not a new subject, nor is this the first time it has been done.

Many years ago, there was the "trace" Subversion repository (named after the research project TraCE: Transparent Configuration Environments, funded by NWO/Jacquard) in which all Nix-related development was done before the transition to GitHub (before 2012).

In the trace repository, there was also a services project that could be used to generate sysvinit-like scripts that could be used on any Linux distribution, and several non-Linux systems as well, such as FreeBSD.

Eelco Dolstra's PhD thesis Chapter 9 describes a distributed deployment prototype that extends the init script approach to networks of machines. The prototype facilitates the distribution of init scripts to remote machines and also facilitates heterogeneous operating systems deployment -- an init script can be built for multiple operating systems, such as Linux and FreeBSD.

Although the prototype shares some concepts with Disnix and the process management support described in this blog post, it also lacks many features -- it has no notion of process dependencies or inter-dependencies, no ability to separate services/processes from infrastructure, and no means to specify distribution mappings between processes and target machines, including the deployment of redundant instances.

Originally, NixOS used to work with the generated scripts from the services sub project in the trace repository, but quite quickly adopted Upstart as its init system. Gradually, the init scripts and Upstart jobs got integrated, and the init scripts were eventually replaced by Upstart jobs completely. As a result, it was no longer possible to run services independently of NixOS.

NixOS is a Linux distribution whose static aspects are fully managed by Nix, including user packages, configuration files, the Linux kernel, and kernel modules. NixOS machine configurations are deployed from a single declarative specification.

Although NixOS is an extension of Nix deployment principles to machine-level deployment, a major conceptual difference between NixOS and the Nix packages repository is that NixOS generates a big data structure made out of all potential configuration options that NixOS provides. It uses this (very big) generated data structure as an input for an activation script that will initialize all dynamic system parts, such as populating the state directories (e.g. /var) and loading systemd jobs.

In early incarnations of NixOS, the organization of the repository was quite monolithic -- there was one NixOS file that defined the configuration options for all possible system configuration aspects, one file that defined all the system user accounts, and one file that defined all global configuration files in /etc. Whenever a new system service had to be added, all these global configuration files needed to be modified.

Some time later (mid 2009), the NixOS module system was introduced, which makes it possible to isolate all related configuration aspects of, for example, a system service into a separate module. Despite the fact that configuration aspects are isolated, the NixOS module system has the ability (through a concept called fixed points) to refer to properties of the entire configuration. The NixOS module system merges all configuration aspects of all modules into a single configuration data structure.

The NixOS module system is quite powerful. In many ways, it is much more powerful than the process management approach described in this blog post. The NixOS module system allows you to refer, override and adjust any system configuration aspect in any module.

For example, a system service, such as the OpenSSH server, can automatically configure the firewall module in such a way that it will open the SSH port (port 22). With the functional approach described in this blog post, everything has to be made explicit and must be propagated through function arguments. This is probably more memory efficient, but a lot less flexible, and more tedious to write.

There are also certain things that NixOS and the NixOS module system cannot do. For example, with NixOS it is not possible to create multiple instances of system services, which the process management conventions described in this blog post do support.

NixOS has another drawback -- evaluating system configurations requires all possible NixOS configuration options to be evaluated. There are actually quite a few of them.

As a result, evaluating a NixOS configuration is quite slow and memory consuming. For single systems, this is typically not a big problem, but for networked NixOS/NixOps configurations, this may be a problem -- for example, I have an old laptop with 4 GiB of RAM that can no longer deploy a test network of three VirtualBox machines using the latest stable NixOS release (19.09), because the Nix evaluator runs out of memory.

Furthermore, NixOS system services can only be used when you install NixOS as your system's software distribution. It is currently not possible to install Nix on a conventional Linux distribution and use NixOS' system services (systemd services) independently of the entire operating system.

The lack of being able to deploy system services independently is not a limitation of the NixOS module system itself -- there is, for example, an external project called nix-darwin that uses the NixOS module system to generate launchd services, which can be run on top of macOS.

The idea of having a separate function header for creating instances of processes is also not entirely new -- a couple of years ago, I revised the internal deployment model of Disnix to support multiple container instances.

In a Disnix-context, containers can represent anything that can host multiple service instances, such as a process manager, application container, or database management system. I was already using the convention to have a separate function header that makes it possible to create multiple instances of services. In this blog post, I have extended this formalism specifically for managing processes.

Discussion


In this blog post, I have picked sysvinit scripts for process management. The reason why I have picked an old-fashioned solution is not that I consider this to be the best process management facility, or that systemd, the init system that NixOS uses, is a bad solution.

My first reason to choose sysvinit scripts is because it is more universally supported than systemd.

The second reason is that I want to emphasize the value that a functional organization can provide, independent of the process management solution.

Using sysvinit scripts for managing processes has all kinds of drawbacks, and IMO there is a legitimate reason why alternatives exist, such as systemd (but also other solutions).

For example, controlling daemonized processes is difficult and fragile -- the convention that daemons should follow is to create PID files, but there is no hard guarantee that daemons will comply and that nothing will go wrong. As a result, a daemonized process may escape the control of the process manager. systemd, for example, puts all processes that it needs to control in a cgroup, and as a result, they cannot escape systemd's control.

Furthermore, you may also want to use the more advanced features of the Linux kernel, such as namespaces and cgroups, to prevent processes from interfering with other processes on the system and to restrict their use of the available system resources. Namespaces and cgroups are first-class features in systemd.

If you do not like sysvinit scripts: the functional organization described in this blog post is not specifically designed for sysvinit -- it is process manager agnostic. For example, I have also implemented a function called createSystemdService that makes it possible to construct systemd services.

The following Nix expression composes a systemd service for the web application process, shown earlier:

{stdenv, createSystemdService}:
{port, instanceSuffix ? ""}:

let
  webapp = (import ./webapp {}).package;
  instanceName = "webapp${instanceSuffix}";
in
createSystemdService {
  name = instanceName;

  environment = {
    PORT = port;
  };

  Unit = {
    Description = "Example web application";
    Documentation = http://example.com;
  };

  Service = {
    ExecStart = "${webapp}/lib/node_modules/webapp/app.js";
  };
}

I also tried supervisord -- we can write the following Nix expression to compose a supervisord program configuration file of the web application process:

{stdenv, createSupervisordProgram}:
{port, instanceSuffix ? ""}:

let
  webapp = (import ./webapp {}).package;
  instanceName = "webapp${instanceSuffix}";
in
createSupervisordProgram {
  name = instanceName;

  command = "${webapp}/lib/node_modules/webapp/app.js";
  environment = {
    PORT = port;
  };
}

Switching process managers retains our ability to benefit from the facilities that the functional configuration framework provides -- we can use it to manage process dependencies, configure state directories, disable user management and, when we use Disnix, manage inter-dependencies and combine processes with services that are not processes.

Despite the fact that sysvinit scripts are primitive, there are also a number of advantages that I see over more "modern" alternatives, such as systemd:

  • Systemd and supervisord require the presence of a daemon that manages processes (i.e. the systemd and supervisord daemons). sysvinit scripts are self-contained from a process management perspective -- the Nix package manager provides the package dependencies that the sysvinit scripts need (e.g. basic shell utilities, sysvinit commands), but other than that, they do not require anything else.
  • We can also easily deploy sysvinit scripts to any Linux distribution that has the Nix package manager installed. There are no additional requirements. Systemd services, for example, require the presence of the systemd daemon. Furthermore, we would also have to interfere with the host system's systemd instance, which may also be used to manage essential system services.
  • We can also easily use sysvinit scripts to deploy processes as an unprivileged user to a machine that has a single-user Nix installation -- the sysvinit script infrastructure does not require any tools or daemons that require super user privileges.

Acknowledgements


I have borrowed the init-functions script from the LFS Bootscripts package of the Linux from Scratch project to get an implementation of the utility functions that the LSB standard describes.

Availability and future work


The functionality described in this blog post is still a work in progress and only a first milestone in a bigger objective.

The sysvinit functionality resides in an experimental branch of the Nix low-level experiments repository. The sysvinit-script Dysnomia plugin resides in an experimental branch of the Dysnomia repository.

In the next blog post, I will introduce another interesting concept that we can integrate into the functional process management framework.

Tuesday, October 8, 2019

On motivation and purpose

In this blog post, I will discuss an important and recurring non-technical subject that is common in our field of expertise: motivation. Pretty much everybody in the industry that I know (including myself) has motivational problems once in a while. There are a variety of reasons why people get motivated or demotivated. In my experience, lack of motivation is one of the more important reasons why people quit and change jobs.

I will elaborate about one of the ingredients that is important to me: purpose.

A common misunderstanding: technology


What I have noticed is that quite a few people think that software developers are generally motivated by technology -- for instance, they will be motivated if they can use the latest computer models and the latest and greatest software technologies to develop software. At the time of writing this blog post, I see many companies that have vacancies advertising with technologies, such as: Node.js, Angular, React, JavaScript, Python, Go, Docker, Kubernetes etc.

While it is true that my skills are stronger with one particular class of technology than another, and that I actually do have preferences, such as which kind of tool is best for deployment, technology alone is not something that gives me motivation.

Instead: development is done for a purpose. Typically, we develop software systems to accomplish certain goals for a certain audience. As developers, we want to reach these goals and offer good quality -- technology is in service of reaching these goals. This sometimes also means that I have to work with technology that is not my primary choice or something I am not familiar with.

In addition to using technology, we may have to do things that are not technical at all. Typically, developing software systems is team work. To work effectively as a team, communication is also very important. For example:

  • You may need to write good documentation so that ideas, requirements and other technical considerations are clear among team members. For this you also need proper writing skills and mastery of the English language, if your native language is different.
  • You may need to give trainings or knowledge transfer presentations to end users or fellow team members for which you need good presentation skills.

Examples


I have a lot of interesting anecdotes about unorthodox choices that I made in the past that I can rationalize by explaining their purpose. Some of them are:

  • I did a PhD, which has a number of implications for your career -- as a PhD student you are (fortunately!) employed in the Netherlands, so you will get a salary and additional benefits, such as a pension. The biggest disadvantage is that you only get a temporary employment contract and your salary is much lower than in an industry job in the same field. Without a permanent employment contract, for example, it is very difficult to get a mortgage to buy a house.

    Because of this "disadvantage", quite a few people think that the PhD degree is the reason that motivated me (because supposedly it provides you with better job perspectives, which is generally not the case in software technology).

    My motivation was actually much different: the subject of my PhD research was software deployment. Before I started my PhD research, I already knew how difficult it was to construct software systems from source code, how to package components and how to deploy service-oriented systems that are distributed and facilitate technology diversity. To have the possibility to dedicate yourself to a subject for a few years and construct tools to automate and improve such deployment processes was the real reason why I wanted to do this.

    (As a sidenote: although constructing tools was my primary motivation and the main objective of research in software engineering, I also struggled a bit with motivation on a few occasions. In a blog post that I wrote in my final year, I explained what got me demotivated).

    I learned a lot of things while I was a PhD student. In addition to technology, I considerably improved my writing and presentation skills, I started this blog, and I did quite a lot of traveling alone, which gives you all kinds of interesting experiences. All of these learning experiences were in service of reaching my main goal.
  • During my PhD, I was also a visiting researcher at Philips Healthcare. At Philips Healthcare, I was applying my research and tooling to medical systems. Most of the technology stack used at Philips consisted of Microsoft technologies: Windows as an operating system, Internet Information Services as web server, SQL Server as DBMS, .NET/C# as an implementation language, and so on.

    I was (and still am) a Linux/free software/open source person. As a result, I was always avoiding these technologies as much as possible -- what I basically did not like about them is that they were all proprietary and strongly tied to the Windows operating system. Furthermore, I did not like Internet Information Services because of its bad security reputation.

    At the same time, the deployment tool I was working on (Disnix), was also designed to facilitate technology diversity, including technologies that I did not like per se. As part of my work at Philips, I managed to automate the build processes of C#/.NET projects with the Nix package manager and I created Dysnomia plugins so that services implemented with Microsoft technology could be deployed with Disnix.

    I also learned quite a few things about the .NET packaging internals, such as strong names and the global assembly cache. Because I wanted to facilitate technology diversity, I was motivated to learn these concepts.
  • At Conference Compass, I developed a variety of Nix functions to build mobile applications (native Android, native iOS and Titanium). I was highly motivated for native Android, because of two reasons: I have an Android device myself and I consider it quite valuable to automate the build processes of complex Android apps including company applications.

    The iOS and Titanium build functions were less interesting. In particular, what I disliked the most about iOS is that I do not have such a device myself, and I really do not like the fact that app delivery to iOS devices (e.g. iPhone, iPad) relies on one single distribution channel: Apple. It is not even possible to deploy an app that you have developed yourself to a device that you own, without obtaining a certificate and provisioning profile from Apple!

    Still, I considered a conference app to be quite valuable. Our audience uses both iOS and Android devices. This means that iOS cannot be avoided, because that would disqualify a significant chunk of our audience.

    Furthermore, I also wanted to support technology diversity again -- to be able to build apps with the Nix package manager for any mobile platform is useful. The domain and technology diversity of Nix motivated me to also learn about these areas that I initially did not find interesting.
  • For my Nix and Disnix related work, I have developed several small utility libraries, e.g. for concurrency and data exchange, and I explored underlying concepts, such as layered build function abstractions. The primary reason to do these things is not that I was directly interested in the concepts themselves, but that they significantly contribute to the quality of the deployment tools -- they make the infrastructure faster, more robust and easier to maintain.

What I typically do


To get motivated, I basically need to know my purpose, and then define and align goals. This is typically easier said than done and it requires quite a bit of exploration.

Basically, I have adopted the following habits whenever I am new to some organization:

  • Learn about the company: IMO it is important to know (at least from a high-level perspective) the kinds of products and/or services a company offers, because the work you do is primarily focused on improving business value. For example, when I had the feeling that I had learned enough about the Mendix product and service, I wrote a small article about it on my blog, and I am grateful to Mendix that I am allowed to do this.
  • Learn about the domain: in addition to the company product and/or service, it is also important to know in what domain it is active. You will get a better understanding about the customers, what they want and what kind of challenges you might face in reaching them.

    For example, at Philips I learned a lot about medical regulations, at Conference Compass I learned the true value of having digital versions of traditional paper programs (that cannot change after they have been printed) and at Mendix it is interesting to continuously think about what kinds of value can low-code development offer (in terms of speed, quality and target audience, such as non-technical developers).
  • Learn about your team and their contributions. In large organizations with big/complex services or products, you typically work on a particular component (or stack of components), not the product as a whole. For me, it was also interesting to see what my team's contribution is and what value it offers to end users.

    To fully answer that question, I wrote a simple tutorial page that explains how end users use our team's product -- it helped a lot to understand what my changes will contribute to and I noticed that it has been particularly helpful for on-boarding new team members.
  • Define and align goals. Typically, after learning about the company and the teams' products and/or services, you will probably see opportunities. For example, you may notice that there is something that can be improved with technology that you are familiar with. It is good to remember them and work with the team to address them. Be proactive.
  • Keep learning. In addition to opportunities, you may also probably experience confusion or realize that there are things that you do not know yet. I always try to allocate time to learn new things, both technical (e.g. new programming languages, frameworks, technologies) and non-technical (e.g. communication, the domain). From my experience, in software engineering there is only one constant and that constant is change.
  • Try to be pragmatic. This is an important personal lesson for me: since you are working in a team and every person is different (with different opinions and priorities), you must sometimes accept that you cannot always (directly) accomplish everything you want and that things will not always work out the way you intended.

What organizations can do


In addition to the things you can do as an individual, you also need support from the organization. I highly value the following traits:

  • Transparency. It is very useful to inform teams about the impact of the work they do: both positive and negative. I have seen in the past that it is quite tempting, for example, to cover things up after a failure. I personally believe it is important for developers to know about the strengths and weaknesses of the software they work on, so that they can make meaningful contributions to make something a success.
  • Opportunities to get in touch with end users and other relevant stakeholders. In many organizations, developers are rarely in touch with end users and that is typically for a good reason: they should not get distracted from their work.

    Although I consider preventing distraction a good thing, I personally believe it would not be a bad thing to get in touch with end users sometimes: it gives a developer direct insights in how well the product works and what a customer needs or struggles with.

    At Mendix, we sometimes have clients that will visit us to explain what they do with Mendix and what kinds of challenges they face -- everybody in the company is invited and has the opportunity to ask questions. This will not happen on a very frequent basis, but having these sessions once in a while is something I consider to be very valuable.
  • Offer time to learn and explore. To reach their goals, developers quite frequently need to expand their skill set, e.g. by learning new technologies or simply exploring the possibilities. They should be offered the time to do this.
  • Taking technical debt seriously. Technical debt -- a phenomenon that hinders the evolution of a software system because certain kinds of work (e.g. testing, documentation, refactoring) were postponed or skipped (i.e. shortcuts were taken) -- should also be taken seriously.

    When a system has a lot of technical debt, making changes and improving quality can be enormously (and unnecessarily) difficult and time consuming. In extreme cases, even a subtle change takes too much time. As a result, it becomes quite easy to lose track of the original purpose, which easily causes developers to lose their motivation.
  • Taking feedback from developers seriously. Developers typically raise concerns (such as quality issues) that may not always look urgent -- as a result, it is very tempting for organizations to always give priority to new features over quality improvements. This may cause quality to degrade significantly over time.

    Whilst developers are typically not against developing new features, they are very concerned about poor quality and high technical debt. If the latter grows out of hand too much, it is quite easy for developers to lose track of the original purpose of the product or service and lose motivation.

Conclusion


In this blog post, I have shared my experiences with motivation in relation to purpose. Although it may sound conceptually simple, learning about the purpose of a product or service and aligning goals is actually quite difficult -- it requires you to learn about an organization, a product or service, the people in the company, your team, and about yourself. It typically is quite a journey with many interesting and, sometimes, a few less interesting steps.

Finally, purpose is not the only factor that motivates or demotivates me as a developer, but it is an important one.

Sunday, September 8, 2019

Some personal conventions for implementing domain models in C/C++ applications

I have written two data exchange libraries -- not so long ago, I created libnixxml, which can be used to work with XML data following the so-called NixXML convention. It is useful to facilitate integration with tools in the Nix ecosystem, while still having meaningful XML data formats that can be used independently.

Many years ago, I wrote libiff that makes it possible to parse Interchange File Format (IFF) files that use so-called "chunks" to structure and organize binary files.

The goal of these two data exchange libraries is not to only facilitate data interchange -- in addition, they have also been designed to assist the user in constructing domain models in the C (or C++) programming language.

With the term: domain model, I am basically referring to an organization of data structures, e.g. structs, classes and abstract data structures, such as hash tables, lists, maps and trees, that have a strong connection to a (well-defined) problem domain (expressible in a natural language). Deriving such an organization is an important ingredient in object oriented design, but not restricted to object orientation only.

In addition to implementing a domain model in a C or C++ application with an understandable mapping to the problem domain, I also typically want the implementation to provide one or more of the following non-functional and cross-functional properties:

  • The data integrity should be maintained as much as possible. It should be difficult, for example, to mutate properties of an object in such a way that they have representations that cannot be interpreted by the program. For example, if an object requires the presence of another object, then it should be difficult to construct objects that have dangling references.
  • We may want to read a representation of the domain model from an external source, such as a file, and construct a domain model from it. Because external sources cannot be trusted, we also want this process to be safe.
  • In addition to reading, we may also want to write a data representation of a domain model to an external source, such as a file or the standard output. We also want to write a file in such a way that it can be safely consumed again.
  • We may want to check the integrity of the data model and have decent error reporting in case an inconsistency is found.
  • It should not take too much effort to maintain and adjust the implementation of a domain model.

To implement the above properties, I have slowly adopted a number of conventions that I will describe in this blog post.

In addition to C and C++, these conventions also have some relevance to other programming languages, although most "higher level" languages, such as Java and Python, already have many facilities in their standard APIs to implement the above properties, whereas in C and C++ this is mostly the implementer's responsibility and requires a developer to think more consciously.

Constructing objects


When constructing objects, two concerns stand out the most for me -- first, when constructing an object, I want to make sure that it never has inconsistent properties. The solution to facilitate that is probably obvious: create a constructor function that takes all mandatory properties as parameters and uses them to configure the object's members accordingly.

My second concern is memory allocation -- in C and C++, objects (instances of a struct or class) can be allocated both on the stack and on the heap. Each approach has its own advantages and disadvantages.

For example, working with stack memory is generally faster and data gets automatically discarded when the scope of a block terminates. A disadvantage is that the sizes of the data structures must be known at compile time, and some platforms impose a limit on how much data can be allocated on the stack.

Heap memory can be dynamically allocated (e.g. the amount of memory to allocate does not need to be known at compile time), but it is slower to allocate, and it is the implementer's responsibility to free up the allocated data when it is no longer needed.

What I generally do is the following: for simple data structures (that do not contain too many fields, or members referring to data structures that require heap memory), I provide an initializer function that can be used on an object allocated on the stack to initialize its members.

Whenever a data structure is more complex, i.e. when it has many fields or members that require heap memory, I will create a constructor function that allocates the right amount of heap memory in addition to initializing its members.
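
For example, a minimal sketch of both conventions could look as follows (the Point and Person types are hypothetical examples, not taken from any real code base):

#include <stdlib.h>
#include <string.h>

typedef struct
{
    int x;
    int y;
}
Point;

/* Initializer: configures the members of a (typically stack-allocated) object */
void init_point(Point *point, int x, int y)
{
    point->x = x;
    point->y = y;
}

typedef struct
{
    char *name; /* requires heap memory */
    int age;
}
Person;

/* Constructor: allocates heap memory and initializes all mandatory members */
Person *create_person(const char *name, int age)
{
    Person *person = (Person*)malloc(sizeof(Person));

    if(person != NULL)
    {
        person->name = strdup(name); /* copy the string, so that the object owns it */
        person->age = age;
    }

    return person;
}

A consumer can then write: Point point; init_point(&point, 1, 2); for stack allocation, or: Person *person = create_person("Alice", 30); for heap allocation. (strdup() is a POSIX function; on platforms that lack it, a malloc() and strcpy() combination accomplishes the same.)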

Destructing objects


When an object is constructed, it typically has resources allocated that need to be freed up later. An obvious resource is heap memory -- as described earlier, when heap memory was previously allocated (e.g. for the data structure itself, but also for some of its members), it also needs to be freed up at a later point in time. Not freeing up memory causes memory leaks, eventually causing a program to run out of memory.

Another kind of resource -- one that is IMO often overlooked -- are file descriptors. Whenever a file has been opened, it also needs to be explicitly closed to allow the operating system to assign it to another process. Some operating systems have a very limited number of file descriptors that can be allocated, resulting in problems if a program runs for longer periods of time.

To maintain consistency and keep an API understandable, I always create a destructor function when a constructor function exists -- in some cases (in particular with objects that have no members that require heap memory), it is very tempting to just tell (or simply expect) the API consumer to call free() explicitly, because that is essentially the only thing that is required. To avoid confusion, I always define a destructor explicitly.
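
The destructor counterpart of the hypothetical create_person() function shown earlier could, for example, look like this:

/* Frees the members first, then the object itself. Accepting a NULL
   pointer (and doing nothing) keeps cleanup code in error paths simple. */
void delete_person(Person *person)
{
    if(person != NULL)
    {
        free(person->name);
        free(person);
    }
}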

Parsing an object from external source


As suggested in the introduction, I (quite frequently) do not only want to construct an object in memory, but I also want it to be constructed from a definition originating from an external resource, such as a file on disk. As a rule of thumb (for integrity and security reasons), external input cannot be trusted -- as a result, it needs to be reliably parsed and checked, for which the data interchange libraries I developed provide a solution.

There is a common pitfall that I have encountered quite frequently in the process of constructing an object: uninitialized members. I typically assign default values to primitive members (e.g. integers) and NULL pointers to members that have a pointer type. The most important reason why I want all member fields to be initialized is to prevent them from remaining garbage, leading to unpredictable results if they are used by accident. In C and C++, memory allocated with malloc() or new is not automatically cleared (e.g. to zero bytes).

By using NULL pointers, I can later check whether all mandatory properties have been set and raise an error if this is not the case.

A really tricky case with NULL pointers are pointers referring to data structures that encapsulate data collections, such as arrays, lists or tables. In some cases, it is fine that the input file does not define any data elements -- the result should then be an empty data collection. However, following the strategy of assigning a NULL pointer by default introduces a problem: in locations where a data collection is expected, the program will typically crash with a segmentation fault, because it attempts to dereference a NULL pointer.

When assigning NULL pointers, I always ask myself what kind of meaning NULL has. If I cannot provide an explanation, then I make sure that the value is initialized with something other than NULL. In practice, this means that when members refer to data collections, I construct an empty data collection instead of assigning a NULL pointer. For data elements (e.g. strings), assigning NULL pointers to check whether they have been set is fine.

Finally, I also have the habit of making it possible to read from any file descriptor. In UNIX and UNIX-like operating systems everything is a file, and a generic file descriptor interface makes it possible to consume data from any resource that exposes itself as a file, such as a network connection.
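
A minimal sketch of these conventions, continuing the hypothetical Person example: an empty-collection constructor for collection members, and a parse function that reads from any file descriptor, relies on NULL defaults and checks mandatory properties (the input format -- the first line is the name -- is made up purely for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char **values;
    unsigned int length;
}
StringArray;

/* Constructs an empty collection, so that consumers never need to
   check for NULL before iterating over it */
StringArray *create_empty_string_array(void)
{
    StringArray *array = (StringArray*)malloc(sizeof(StringArray));

    if(array != NULL)
    {
        array->values = NULL;
        array->length = 0;
    }

    return array;
}

/* Parses a Person from any file descriptor */
Person *parse_person(FILE *file)
{
    char buffer[128];

    /* calloc() clears the allocated memory: primitive members become 0
       and pointer members become NULL */
    Person *person = (Person*)calloc(1, sizeof(Person));

    if(person == NULL)
        return NULL;

    if(fgets(buffer, sizeof(buffer), file) != NULL)
    {
        buffer[strcspn(buffer, "\n")] = '\0'; /* strip the newline */
        person->name = strdup(buffer);
    }

    if(person->name == NULL) /* a mandatory property was never set */
    {
        delete_person(person);
        return NULL;
    }

    return person;
}

Because the parse function accepts a FILE pointer, the same code can consume data from a regular file, the standard input, or a network connection exposed as a file.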

Serializing/exporting objects to an external resource


In addition to retrieving and parsing objects from external resources, it is often desirable to do the opposite as well: serializing/exporting objects to an external resource, such as a file on disk.

Data that is consumed from an external source cannot be trusted, but if the output is not generated properly, the output most likely cannot be trusted either and cannot reliably be consumed again.

For example, when generating JSON data with strings, a string that contains a double quote: " needs to be properly escaped, which is very easily overlooked when using basic string manipulation operations. The data exchange libraries provide convenience functions to reliably print and escape values.

We may also want to pretty print the output, e.g. adding indention, so that it can also be read by humans. Typically I add facilities for pretty printing to the functions that generate output.

Similar to the NULL pointer assignment "dilemma" for empty data collections, we also face the dilemma of printing an empty data collection or no elements at all. Typically, I would pick the option to print an empty data structure instead of omitting it, but I have no hard requirements for either of these choices.

As with reading and parsing data from external sources, I also typically facilitate writing to file descriptors so that it is possible to write data to any kind of file, such as the standard output or a remote network resource.
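
A simplified sketch of these output conventions for the hypothetical Person type (a real implementation would also escape control characters):

#include <stdio.h>

/* Prints a string as a JSON string literal, escaping double quotes
   and backslashes that would otherwise corrupt the output */
static void print_json_string(FILE *file, const char *value)
{
    fputc('"', file);

    while(*value != '\0')
    {
        if(*value == '"' || *value == '\\')
            fputc('\\', file);

        fputc(*value, file);
        value++;
    }

    fputc('"', file);
}

/* Prints a Person as JSON to any file descriptor, with a configurable
   indentation level to support pretty printing */
void print_person_json(FILE *file, const Person *person, int indent)
{
    fprintf(file, "%*s{\n", indent, "");
    fprintf(file, "%*s\"name\": ", indent + 2, "");
    print_json_string(file, person->name);
    fprintf(file, ",\n%*s\"age\": %d\n", indent + 2, person->age);
    fprintf(file, "%*s}\n", indent, "");
}

For example, print_person_json(stdout, person, 0) writes the object to the standard output.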

Checking the integrity of objects


Generally, I use constructor functions or mutation functions to prevent breaking the integrity of objects, but it is not always possible to fully avoid problems, for example, while parsing data from external resources. In such scenarios, I also typically implement functionality that checks the integrity of an object.

One of the primary responsibilities of a checking function is to examine the validity of all data elements -- for example, to check whether a mandatory field has been set (i.e. it is not NULL) and whether it has the right format.

In addition to checking validity of all data elements, I typically also recursively traverse the data structure members and check their validity. When an error has been encountered in an abstract data structure, I will typically indicate which element (e.g. the array index number, or hash table key) is the problem, so that it can be more easily diagnosed by the end user.

When all fields of an object are considered valid, I may also want to check whether the object's relationships are valid. For example, an object should not have a dangling reference to a non-existent object, which could result in segmentation faults caused by dereferencing NULL pointers.

Instead of invoking a check function explicitly, it is also possible to make a check function an integral part of a parse or constructor function, but I prefer to keep a check function separate, for the following reasons:

  • We do not need to perform a check if we are certain that the operations that we carry out have not changed any data in the wrong way.
  • We may want to perform checks in various stages of a program, such as after parsing, after construction or after certain critical updates.
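
A sketch of such check functions for the hypothetical types used earlier, reporting all problems that were found rather than aborting on the first one:

#include <stdbool.h>
#include <stdio.h>

/* Checks the validity of all data elements of a Person */
bool check_person(const Person *person)
{
    bool valid = true;

    if(person->name == NULL)
    {
        fprintf(stderr, "person.name is mandatory, but was not set!\n");
        valid = false;
    }

    if(person->age < 0)
    {
        fprintf(stderr, "person.age has an invalid value: %d\n", person->age);
        valid = false;
    }

    return valid;
}

/* For collections, indicate which element is the problem */
bool check_string_array(const StringArray *array, const char *name)
{
    bool valid = true;
    unsigned int i;

    for(i = 0; i < array->length; i++)
    {
        if(array->values[i] == NULL)
        {
            fprintf(stderr, "%s[%u] is invalid!\n", name, i);
            valid = false;
        }
    }

    return valid;
}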

Comparing objects


Another important concern is the ability to compare objects for equality and/or ordering. I also typically implement a comparison function for each data structure.

In theory, recursively comparing a structure of objects could become quite expensive, especially if there are many nested data structures with many data elements. As an optimization, it may be possible to maintain integrity hashes and only check values if these hashes change, but so far I have never run into any situation in which performance is really a bottleneck.
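
A sketch of equality comparison functions for the hypothetical types, assuming both objects have already passed the check function (so that name is never NULL):

#include <stdbool.h>
#include <string.h>

/* Recursively compares two collections for equality */
bool compare_string_arrays(const StringArray *left, const StringArray *right)
{
    unsigned int i;

    if(left->length != right->length)
        return false;

    for(i = 0; i < left->length; i++)
    {
        if(strcmp(left->values[i], right->values[i]) != 0)
            return false;
    }

    return true;
}

/* Compares two Person objects for equality, member by member */
bool compare_persons(const Person *left, const Person *right)
{
    return left->age == right->age
        && strcmp(left->name, right->name) == 0;
}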

Naming


When developing data structures and functions, I also try to follow a consistent naming convention. For example, I may use: create_<ds_name> for a function creating a data structure and delete_<ds_name> for a function deleting it.

Furthermore, I try to give meaningful names to data structures that have a correspondence with the problem domain.

Modularity


Although not mandatory in C or C++, I typically use one header file and one implementation file per data structure and the functions that are related to it -- similarly, I follow the same convention for abstract data structure usages.

My main motivation to do this is to keep things understandable -- a module with many responsibilities is typically more difficult to maintain and harder to read.

Furthermore, I try to make all functions that do not need to be exposed publicly static.
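
For the hypothetical Person type used throughout this section, the module convention could result in a person.h header that exposes only the public interface (helper functions, such as print_json_string(), stay static inside person.c):

/* person.h -- hypothetical module header: one data structure and
   its related functions per header/implementation file pair */

#ifndef PERSON_H
#define PERSON_H

#include <stdio.h>
#include <stdbool.h>

typedef struct
{
    char *name;
    int age;
}
Person;

Person *create_person(const char *name, int age);
void delete_person(Person *person);
Person *parse_person(FILE *file);
void print_person_json(FILE *file, const Person *person, int indent);
bool check_person(const Person *person);
bool compare_persons(const Person *left, const Person *right);

#endif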

Discussion


The conventions described in this blog post work particularly well for my own projects -- I have been able to considerably improve the reliability and maintainability of my programs and the error reporting.

However, they are not guaranteed to be the "silver bullet" for all coding problems. Some limitations that I see are:

  • Finding a well-defined description of a domain and implementing a corresponding domain model sounds conceptually simple, but is typically much harder than expected. It typically takes me several iterations to get it (mostly) right.
  • The conventions only make sense for programs/code areas that are primarily data driven. Workflows that are primarily computationally driven often have different kinds of requirements, e.g. for performance reasons, and most likely require a different organization.
  • The conventions are not there to facilitate high performance (but they also do not always necessarily work against it). For example, splitting up data structures and corresponding functions into modules makes it impossible to apply certain compiler optimizations that would be possible if the code were not divided into separate compilation units. Integrity, security, and maintenance are properties I consider to have a higher priority than performance.