Saturday, February 15, 2020

A declarative process manager-agnostic deployment framework based on Nix tooling

In a previous blog post, written two months ago, I introduced a new experimental Nix-based process framework that provides the following features:

  • It uses the Nix expression language for configuring running process instances, including their dependencies. The configuration process is based on only a few simple concepts: function definitions to define constructors that generate process manager configurations, function invocations to compose running process instances, and Nix profiles to make collections of process configurations accessible from a single location.
  • The Nix package manager delivers all packages and configuration files and isolates them in the Nix store, so that they never conflict with other running processes and packages.
  • It identifies process dependencies, so that a process manager can ensure that processes are activated and deactivated in the right order.
  • The ability to deploy multiple instances of the same process, by making conflicting resources configurable.
  • Deploying processes/services as an unprivileged user.
  • Advanced concepts and features, such as namespaces and cgroups, are not required.

Another objective of the framework is that it should work with a variety of process managers on a variety of operating systems.

In my previous blog post, I deliberately used sysvinit scripts (also known as LSB Init compliant scripts) to manage the life-cycle of running processes as a starting point, because they are universally supported on Linux and self-contained -- sysvinit scripts only require the right packages to be installed, and they do not rely on external programs that manage the processes' life-cycle. Moreover, sysvinit scripts can also be conveniently used as an unprivileged user.

I have also developed a Nix function that can be used to more conveniently generate sysvinit scripts. Traditionally, these scripts are written by hand and basically require that the implementer writes the same boilerplate code over and over again, such as the activities that start and stop the process.

The sysvinit script generator function can also be used to directly specify the implementation of all activities that manage the life-cycle of a process, such as:

{createSystemVInitScript, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemVInitScript {
  name = instanceName;
  description = "Nginx";
  activities = {
    start = ''
      mkdir -p ${nginxLogDir}
      log_info_msg "Starting Nginx..."
      loadproc ${nginx}/bin/nginx -c ${configFile} -p ${stateDir}
      evaluate_retval
    '';
    stop = ''
      log_info_msg "Stopping Nginx..."
      killproc ${nginx}/bin/nginx
      evaluate_retval
    '';
    reload = ''
      log_info_msg "Reloading Nginx..."
      killproc ${nginx}/bin/nginx -HUP
      evaluate_retval
    '';
    restart = ''
      $0 stop
      sleep 1
      $0 start
    '';
    status = "statusproc ${nginx}/bin/nginx";
  };
  runlevels = [ 3 4 5 ];

  inherit dependencies instanceName;
}

In the above Nix expression, we specify five activities to manage the life-cycle of Nginx, a free/open source web server:

  • The start activity initializes the state of Nginx and starts the process (as a daemon that runs in the background).
  • stop stops the Nginx daemon.
  • reload instructs Nginx to reload its configuration.
  • restart restarts the process.
  • status shows whether the process is running or not.

Besides directly implementing activities, the Nix function invocation shown above can also be used on a much higher level -- typically, sysvinit scripts follow the same conventions. Nearly all sysvinit scripts implement the activities described above to manage the life-cycle of a process, and these typically need to be re-implemented over and over again.

We can also generate the implementations of these activities automatically from a high level specification, such as:

{createSystemVInitScript, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemVInitScript {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" stateDir ];
  runlevels = [ 3 4 5 ];

  inherit dependencies instanceName;
}

You could basically say that the above createSystemVInitScript function invocation makes the configuration process of a sysvinit script "more declarative" -- you do not need to specify the activities that need to be executed to manage processes, but instead, you specify the relevant characteristics of a running process.

From this high level specification, the implementations for all required activities will be derived, using conventions that are commonly used to write sysvinit scripts.

After completing the initial version of the process management framework that works with sysvinit scripts, I have also been investigating other process managers. I discovered that their configuration processes have many things in common with the sysvinit approach. As a result, I have decided to explore these declarative deployment concepts a bit further.

In this blog post, I will describe a declarative process manager-agnostic deployment approach that we can integrate into the experimental Nix-based process management framework.

Writing declarative deployment specifications for managed running processes


As explained in the introduction, I have also been experimenting with other process managers than sysvinit. For example, instead of generating a sysvinit script that manages the life-cycle of a process, such as the Nginx server, we can also generate a supervisord configuration file to define Nginx as a program that can be managed with supervisord:

{createSupervisordProgram, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSupervisordProgram {
  name = instanceName;
  command = "mkdir -p ${nginxLogDir}; "+
    "${nginx}/bin/nginx -c ${configFile} -p ${stateDir}";
  inherit dependencies;
}

Invoking the above function will generate a supervisord program configuration file, instead of a sysvinit script.

With the following Nix expression, we can generate a systemd unit file so that Nginx's life-cycle can be managed by systemd:

{createSystemdService, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemdService {
  name = instanceName;
  Unit = {
    Description = "Nginx";
  };
  Service = {
    ExecStartPre = "+mkdir -p ${nginxLogDir}";
    ExecStart = "${nginx}/bin/nginx -c ${configFile} -p ${stateDir}";
    Type = "simple";
  };

  inherit dependencies;
}

What you may notice when comparing the above two Nix expressions with the last sysvinit example (that captures process characteristics instead of activities) is that they all contain very similar properties. Their main difference is a slightly different organization and naming convention, because each abstraction function is tailored towards the configuration conventions that each target process manager uses.

As discussed in my previous blog post about declarative programming and deployment, declarativity is a spectrum -- the above specifications are (somewhat) declarative because they do not capture the activities to manage the life-cycle of the process (the how). Instead, they specify what process we want to run. The process manager derives and executes all activities to bring that process into a running state.

sysvinit scripts themselves are not declarative, because they specify all activities (i.e. shell commands) that need to be executed to accomplish that goal. supervisord configurations and systemd service configuration files are (somewhat) declarative, because they capture process characteristics -- the process manager derives and executes all required activities to bring the process into a running state.

Despite the fact that I am not specifying any process management activities, these Nix expressions could still be considered somewhat of a "how specification", because each configuration is tailored towards a specific process manager. A process manager, such as sysvinit, is a means to accomplish something else: getting a running process whose life-cycle can be conveniently managed.

If I were to revise the above specifications to only express what kind of running process I want, disregarding the process manager, then I could simply write:

{createManagedProcess, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" "${stateDir}/${instanceName}" ];

  inherit dependencies instanceName;
}

The above Nix expression simply states that we want to run a managed Nginx process (using certain command-line arguments) and that, before starting the process, we want to initialize the state by creating the log directory if it does not exist yet.

I can translate the above specification to all kinds of configuration artifacts that can be used by a variety of process managers to accomplish the same outcome. I have developed six kinds of generators allowing me to target the following process managers:

  • sysvinit scripts, also known as LSB Init compliant scripts
  • BSD rc scripts
  • supervisord
  • systemd
  • launchd
  • cygrunsrv


Translating the properties of the process manager-agnostic configuration to process manager-specific properties is quite straightforward for most concepts -- in many cases, there is a direct mapping between a property in the process manager-agnostic configuration and a process manager-specific property.

For example, when we intend to target supervisord, then we can translate the process and args parameters to a command invocation. For systemd, we can translate process and args to the ExecStart property that refers to a command-line instruction that starts the process.
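
The mapping described above can be sketched in shell. This is only an illustration, not the framework's generator (which is written in Nix): the helper names to_supervisord and to_systemd and the /path/to/nginx path are made up for this example, while the command setting of a supervisord program section and the ExecStart setting of a systemd service section are the real target properties.

```shell
#!/bin/bash

# Hypothetical helper: print a supervisord program section.
# usage: to_supervisord <name> <process> [args...]
to_supervisord()
{
    local name="$1"
    shift
    # "$*" joins the process and its arguments with single spaces;
    # arguments that contain spaces would need extra quoting
    printf '[program:%s]\ncommand=%s\n' "$name" "$*"
}

# Hypothetical helper: print the [Service] section of a systemd unit.
# usage: to_systemd <process> [args...]
to_systemd()
{
    printf '[Service]\nExecStart=%s\n' "$*"
}

# The same process and args end up in two different properties
to_supervisord nginx /path/to/nginx -c nginx.conf
to_systemd /path/to/nginx -c nginx.conf
```

Both helpers consume the same process and args values; only the name and location of the resulting property differ per process manager.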

Although the process manager-agnostic abstraction function supports enough features to get some well known system services working (e.g. Nginx, Apache HTTP service, PostgreSQL, MySQL etc.), it does not facilitate all possible features of each process manager -- it will provide a reasonable set of common features to get a process running and to impose some restrictions on it.

It is still possible to work around the feature limitations of process manager-agnostic deployment specifications. We can also influence the generation process by defining overrides to get process manager-specific properties supported:

{createManagedProcess, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" "${stateDir}/${instanceName}" ];

  inherit dependencies instanceName;

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

In the above example, we have added an override specifically for sysvinit to tell the init system that the process should be started in runlevels 3, 4 and 5 (which implies that the process should be stopped in the remaining runlevels: 0, 1, 2, and 6). The other process managers that I have worked with do not have a notion of runlevels.

Similarly, we can use an override to, for example, use systemd-specific features to run a process in a Linux namespace etc.
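
To illustrate what the sysvinit runlevels override amounts to, here is a minimal shell sketch. The helper runlevel_links is hypothetical, and the rcN.d directory names and the sequence number 20 are assumptions that differ between distributions. It lists a start (S) entry for every requested runlevel and a stop (K) entry for all remaining ones:

```shell
#!/bin/bash

# Hypothetical helper: list the rc directory entries for a script that
# should start in the given runlevels and stop in all other runlevels.
# usage: runlevel_links <name> <runlevel>...
runlevel_links()
{
    local name="$1" level
    shift

    for level in 0 1 2 3 4 5 6
    do
        # " $* " is the list of requested runlevels padded with spaces,
        # so that each runlevel can be matched as a whole word
        case " $* " in
            *" $level "*) echo "rc$level.d/S20$name" ;;
            *)            echo "rc$level.d/K20$name" ;;
        esac
    done
}

runlevel_links nginx 3 4 5
```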

Simulating process manager-agnostic concepts with no direct equivalents


For some process manager-agnostic concepts, process managers do not always have direct equivalents. In such cases, there is still the possibility to apply non-trivial simulation strategies.

Foreground processes or daemons


What all deployment specifications shown in this blog post have in common is that their main objective is to bring a process into a running state. How these processes are expected to behave differs among process managers.

sysvinit and BSD rc scripts expect processes to daemonize -- on invocation, a process spawns another process that keeps running in the background (the daemon process). After the initialization of the daemon process is done, the parent process terminates. If processes do not daemonize, the startup process execution blocks indefinitely.

Daemons introduce another complexity from a process management perspective -- when invoking an executable from a shell session in background mode, the shell can tell you its process ID, so that it can be stopped when it is no longer necessary.

With daemons, an invoked process forks another child process (or, when it is supposed to behave really well, it double forks) that becomes the daemon process. The daemon process gets adopted by the init system, and thus remains in the background even if the shell session ends.

The shell that invokes the executable does not know the PID of the resulting daemon process, because that value is only propagated to the daemon's parent process, not the calling shell session. To still be able to control it, a well-behaving daemon typically writes its process ID to a so-called PID file, so that it can be reliably terminated by a shell command when it is no longer required.

sysvinit and BSD rc scripts extensively use PID files to control daemons. By using a process' PID file, the managing sysvinit/BSD rc script can tell you whether a process is running or not and reliably terminate a process instance.
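
As a rough illustration of how a script can use a PID file, consider the following shell sketch. The helper is_running is hypothetical and not taken from any init system; real sysvinit function libraries provide more elaborate equivalents. It relies on kill -0, which delivers no signal and only checks whether the process exists and may be signalled:

```shell
#!/bin/bash

# Hypothetical helper: succeed if the process recorded in a PID file is
# still running.
# usage: is_running <pidFile>
is_running()
{
    local pidFile="$1" pid

    # No PID file: the daemon was never started or has cleaned up after itself
    [ -f "$pidFile" ] || return 1

    pid=$(cat "$pidFile")

    # kill -0 sends no signal; it fails if the PID does not exist
    # or if we are not allowed to signal it
    kill -0 "$pid" 2>/dev/null
}
```

A check like this cannot tell a stale PID file apart from a live one when the recorded PID has been reused by an unrelated process.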

"More modern" process managers, such as launchd, supervisord, and cygrunsrv, do not work with processes that daemonize -- instead, these process managers are daemons themselves that invoke processes that work in "foreground mode".

One of the advantages of this approach is that services can be more reliably controlled -- because their PIDs are directly propagated to the controlling daemon from the fork() library call, it is no longer required to work with PID files, which may not always work reliably (for example: a process might abruptly terminate and never clean up its PID file, giving the system the false impression that it is still running).

systemd improves process control even further by using Linux cgroups -- although foreground processes may be controlled more reliably than daemons, they can still fork other processes (e.g. a web service that creates processes per connection). When the controlling parent process terminates and does not properly terminate its own child processes, they may keep running in the background indefinitely. With cgroups it is possible for the process manager to retain control over all processes spawned by a service and terminate them when a service is no longer needed.

systemd has another unique advantage over the other process managers -- it can work both with foreground processes and daemons, although foreground processes seem to have the preference according to the documentation, because they are much easier to control and develop.

Many common system services, such as OpenSSH, MySQL or Nginx, have the ability to both run as a foreground process and as a daemon, typically by providing a command-line parameter or defining a property in a configuration file.

To provide an optimal user experience for all supported process managers, it is typically a good thing in the process manager-agnostic deployment specification to specify both how a process can be used as a foreground process and as a daemon:

{createManagedProcess, nginx, stateDir, runtimeDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-p" "${stateDir}/${instanceName}" "-c" configFile ];
  foregroundProcessExtraArgs = [ "-g" "daemon off;" ];
  daemonExtraArgs = [ "-g" "pid ${runtimeDir}/${instanceName}.pid;" ];

  inherit dependencies instanceName;

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

In the above example, we have revised the Nginx expression to specify how the process can be started both as a foreground process and as a daemon. The only thing that needs to be configured differently is one global directive in the Nginx configuration file -- by default, Nginx runs as a daemon, but by adding the daemon off; directive to the configuration we can run it in foreground mode.

When we run Nginx as a daemon, we configure a PID file that refers to the instance name so that multiple instances can co-exist.

To make this conveniently configurable, the above expression does the following:

  • The process parameter specifies the process that needs to be started both in foreground mode and as a daemon. The args parameter specifies common command-line arguments that both the foreground and daemon process will use.
  • The foregroundProcessExtraArgs parameter specifies additional command-line arguments that are only used when the process is started in foreground mode. In the above example, it is used to provide Nginx the global directive that disables the daemon setting.
  • The daemonExtraArgs parameter specifies additional command-line arguments that are only used when the process is started as a daemon. In the above example, it is used to provide Nginx a global directive with a PID file path that uniquely identifies the process instance.

For custom software and services implemented in a language other than C, e.g. Node.js, Java or Python, it is far less common that they have the ability to daemonize -- they can typically only be used as foreground processes.

Nonetheless, we can still daemonize foreground-only processes, by using an external tool, such as libslack's daemon command:

$ daemon -U -i myforegroundprocess

The above command daemonizes the foreground process and creates a PID file for it, so that it can be managed by the sysvinit/BSD rc utility scripts.

The opposite kind of "simulation" is also possible -- if a process can only be used as a daemon, then we can use a proxy process to make it appear as a foreground process:

export _TOP_PID=$$

# Handle SIGTERM and SIGINT signals and forward them to the daemon process
_term()
{
    trap "exit 0" TERM
    kill -TERM "$pid"
    kill $_TOP_PID
}

_interrupt()
{
    kill -INT "$pid"
}

trap _term SIGTERM
trap _interrupt SIGINT

# Start process in the background as a daemon
${executable} "$@"

# Wait for the PID file to become available.
# Useful to work with daemons that don't behave well enough.
count=1

while [ ! -f "${_pidFile}" ]
do
    if [ $count -eq 10 ]
    then
        echo "It does not seem that there is any PID file! Giving up!"
        exit 1
    fi

    echo "Waiting for ${_pidFile} to become available..."
    sleep 1

    ((count++))
done

# Determine the daemon's PID by using the PID file
pid=$(cat "${_pidFile}")

# Wait in the background for the PID to terminate
${if stdenv.isDarwin then ''
  lsof -p $pid +r 3 &>/dev/null &
'' else if stdenv.isLinux || stdenv.isCygwin then ''
  tail --pid=$pid -f /dev/null &
'' else if stdenv.isBSD || stdenv.isSunOS then ''
  pwait $pid &
'' else
  throw "Don't know how to wait for process completion on system: ${stdenv.system}"}

# Wait for the blocker process to complete.
# We use wait, so that bash can still
# handle the SIGTERM and SIGINT signals that may be sent to it by
# a process manager
blocker_pid=$!
wait $blocker_pid

The idea of the proxy script shown above is that it runs as a foreground process as long as the daemon process is running and relays any relevant incoming signals (e.g. terminate and interrupt) to the daemon process.

Implementing this proxy was a bit tricky:

  • In the beginning of the script we configure signal handlers for the TERM and INT signals so that the process manager can terminate the daemon process.
  • We must start the daemon and wait for it to become available. Although the parent process of a well-behaving daemon should only terminate when the initialization is done, this turns out not to be a hard guarantee -- to make the process a bit more robust, we deliberately wait for the PID file to become available before we attempt to wait for the termination of the daemon.
  • Then we wait for the PID to terminate. The bash shell has an internal wait command that can be used to wait for a background process to terminate, but this only works with child processes of the shell itself. Daemons are adopted by the init system and run in a new session, so they cannot be monitored by the shell by using the wait command.

    From this Stack Overflow article, I learned that we can use the tail command of GNU Coreutils, or lsof on macOS/Darwin, and pwait on BSDs and Solaris/SunOS to monitor processes that are not children of the shell.
  • When a command is being executed by a shell script (e.g. in this particular case: tail, lsof or pwait), the shell script can no longer respond to signals until the command completes. To still allow the script to respond to signals while it is waiting for the daemon process to terminate, we must run the previous command in background mode, and we use the wait instruction to block the script. While a wait command is running, the shell can respond to signals.

The generator function will automatically pick the best solution for the selected target process manager -- this means that when our target process manager is sysvinit or BSD rc scripts, the generator automatically picks the configuration settings to run the process as a daemon. For the remaining process managers, the generator will pick the configuration settings that run it as a foreground process.

If a desired process model is not supported, then the generator will automatically simulate it. For instance, if we have a foreground-only process specification, then the generator will automatically configure a sysvinit script to call the daemon executable to daemonize it.

A similar process happens when a daemon-only process specification is deployed for a process manager that cannot work with it, such as supervisord.
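
The selection and simulation rules described above can be summarized in a small shell sketch. The function names and the process manager identifiers (in particular bsdrc) are assumptions made for this illustration; the real generator is implemented in Nix and may decide differently in details:

```shell
#!/bin/bash

# Hypothetical helper: print the process model that a process manager expects.
# usage: select_process_model <processManager>
select_process_model()
{
    case "$1" in
        sysvinit|bsdrc)
            echo daemon
            ;;
        supervisord|systemd|launchd|cygrunsrv)
            echo foreground
            ;;
        *)
            echo "Unknown process manager: $1" >&2
            return 1
            ;;
    esac
}

# Hypothetical helper: decide how to run a process that supports the given
# modes (both, foreground or daemon) under the given process manager.
# usage: plan_invocation <processManager> <supportedModes>
plan_invocation()
{
    local wanted
    wanted=$(select_process_model "$1") || return 1

    if [ "$2" = "both" ] || [ "$2" = "$wanted" ]
    then
        echo "run as $wanted"
    elif [ "$wanted" = "daemon" ]
    then
        # Foreground-only process, but a daemon is expected
        echo "daemonize with the daemon command"
    else
        # Daemon-only process, but a foreground process is expected
        echo "wrap in a foreground proxy script"
    fi
}

plan_invocation sysvinit foreground
plan_invocation supervisord daemon
```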

State initialization


Another important aspect in process deployment is state initialization. Most system services require the presence of state directories in which they can store their PID, log and temp files. If these directories do not exist, the service may not work and refuse to start.

To cope with this problem, I typically make processes self-initializing -- before starting the process, I check whether the state has been initialized (e.g. check if the state directories exist) and re-initialize the initial state if needed.

With most process managers, state initialization is easy to facilitate. For sysvinit and BSD rc scripts, we just use the generator to first execute the shell commands to initialize the state before the process gets started.

Supervisord allows you to execute multiple shell commands in a single command directive -- we can just execute a script that initializes the state before we execute the process that we want to manage.

systemd has an ExecStartPre directive that can be used to specify shell commands to execute before the main process starts.

Apple launchd and cygrunsrv, however, do not have a generic shell execution mechanism or some facility allowing you to execute things before a process starts. Nonetheless, we can still ensure that the state is going to be initialized by creating a wrapper script -- first the wrapper script does the state initialization and then executes the main process.

If a state initialization procedure was specified and the target process manager does not support scripting, then the generator function will transparently wrap the main process into a wrapper script that supports state initialization.
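
A minimal sketch of such a wrapper generator in shell could look as follows. The helper create_wrapper is hypothetical; the framework itself generates its wrappers from Nix. The generated script runs the initialization commands and then uses exec, so that the main process replaces the wrapper and keeps the PID that the process manager is tracking:

```shell
#!/bin/bash

# Hypothetical helper: write a wrapper script that first runs the
# initialization commands and then executes the main process.
# usage: create_wrapper <wrapperPath> <initializeCommands> <process> [args...]
create_wrapper()
{
    local wrapper="$1" initialize="$2"
    shift 2

    {
        echo '#!/bin/bash -e'
        echo "$initialize"
        # exec replaces the wrapper with the main process;
        # %q quotes every argument so that it survives re-parsing by bash
        printf 'exec'
        printf ' %q' "$@"
        printf '\n'
    } > "$wrapper"

    chmod +x "$wrapper"
}
```

For example, create_wrapper ./nginx-wrapper "mkdir -p /home/sander/var/nginx/logs" /path/to/nginx -g "daemon off;" would produce a script that creates the log directory and then starts Nginx in foreground mode (the paths here are placeholders).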

Process dependencies


Another important generic concept is process dependency management. For example, Nginx can act as a reverse proxy for another web application process. To provide a functional Nginx service, we must be sure that the web application process gets activated as well, and that the web application is activated before Nginx.

If the web application process is activated after Nginx or missing completely, then Nginx is (temporarily) unable to redirect incoming requests to the web application process causing end-users to see bad gateway errors.

The process managers that I have experimented with all have a different notion of process dependencies.

sysvinit scripts can optionally declare dependencies in their comment sections. Tools that know how to interpret these dependency specifications can use them to decide the right activation order. Systems using sysvinit typically ignore this specification. Instead, they work with sequence numbers in file names -- the entries in each runlevel configuration directory have a prefix (S or K) followed by two numeric digits that define the start or stop order.

supervisord does not work with dependency specifications, but every program can optionally provide a priority setting that can be used to order the activation and deactivation of programs -- lower priority numbers have precedence over high priority numbers.

From dependency specifications in a process management expression, the generator function can automatically derive sequence numbers for process managers that require it.
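
One way to derive such sequence numbers is a topological sort of the dependency graph. The following shell sketch uses the standard tsort utility for this; the helper derive_sequence_numbers is hypothetical and does not reflect how the Nix-based generator is implemented:

```shell
#!/bin/bash

# Hypothetical helper: read "dependency dependent" pairs from the standard
# input and print every process with a two-digit start sequence number.
derive_sequence_numbers()
{
    local seq=1 name

    # tsort prints the names in an order in which every dependency
    # comes before the processes that depend on it
    tsort | while read -r name
    do
        printf 'S%02d%s\n' "$seq" "$name"
        seq=$((seq + 1))
    done
}

# webapp must be started before nginxReverseProxy:
# prints S01webapp followed by S02nginxReverseProxy
derive_sequence_numbers <<EOF
webapp nginxReverseProxy
EOF
```

The same numbers could serve as supervisord priority values, since lower priorities are activated first.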

Similar to sysvinit scripts, BSD rc scripts can also declare dependencies in their comment sections. Contrary to sysvinit scripts, BSD rc scripts can use the rcorder tool to parse these dependencies from the comments section and automatically derive the order in which the BSD rc scripts need to be activated.

cygrunsrv also allows you to directly specify process dependencies. The Windows service manager makes sure that the services get activated in the right order and that all process dependencies are activated first. The only limitation is that cygrunsrv only allows up to 16 dependencies to be specified per service.

To simulate process dependencies with systemd, we can use two properties. The Wants property can be used to tell systemd that another service needs to be activated first. The After property can be used to specify the ordering.
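
As a small illustration, the following shell sketch prints the [Unit] settings for a list of dependencies. The helper systemd_dependencies is hypothetical, and it assumes that every dependency is deployed as a unit named after the process with a .service suffix:

```shell
#!/bin/bash

# Hypothetical helper: print the [Unit] settings for a service that
# depends on the processes given as arguments.
# usage: systemd_dependencies <dependency>...
systemd_dependencies()
{
    local units=() dep

    # Without dependencies there is nothing to declare
    [ $# -gt 0 ] || return 0

    for dep in "$@"
    do
        units+=("$dep.service")
    done

    echo "[Unit]"
    # Wants activates the dependencies; After orders them before this unit
    echo "Wants=${units[*]}"
    echo "After=${units[*]}"
}

systemd_dependencies webapp
```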

Sadly, it seems that launchd has no notion of process dependencies at all -- processes can be activated by certain events, e.g. when a kernel module was loaded or through socket activation, but it does not seem to have the ability to configure process dependencies or the activation ordering. When our target process manager is launchd, then we simply have to inform the user that proper activation ordering cannot be guaranteed.

Changing user privileges


Another general concept that has subtle differences in each process manager is changing user privileges. Typically, for the deployment of system services, you do not want to run these services as the root user (that has full access to the filesystem), but as an unprivileged user.

sysvinit and BSD rc scripts have to change users through the su command. The su command can be used to change the user ID (UID), and will automatically adopt the primary group ID (GID) of the corresponding user.

Supervisord and cygrunsrv can also only change user IDs (UIDs), and will adopt the primary group ID (GID) of the corresponding user.

Systemd and launchd can change both the user IDs and group IDs of the processes that they invoke.

Because only changing UIDs is universally supported amongst process managers, I did not add a configuration property that allows you to change GIDs in a process manager-agnostic way.

Deploying process manager-agnostic configurations


With a processes Nix expression, we can define which process instances we want to run (and how they can be constructed from source code and their dependencies):

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginxReverseProxy = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxy {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

In the above Nix expression, we compose two running process instances:

  • webapp is a trivial web application process that will simply return a static HTML page by using the HTTP protocol.
  • nginxReverseProxy is an Nginx server configured as a reverse proxy server. It will forward incoming HTTP requests to the appropriate web application instance, based on the virtual host name. If the virtual host name is webapp.local, then Nginx forwards the request to the webapp instance.

To generate the configuration artifacts for the process instances, we refer to a separate constructors Nix expression. Each constructor will call the createManagedProcess function abstraction (as shown earlier) to construct a process configuration in a process manager-agnostic way.

With the following command-line instruction, we can generate sysvinit scripts for the webapp and Nginx processes declared in the processes expression, and run them as an unprivileged user with the state files managed in our home directory:

$ nixproc-build --process-manager sysvinit \
  --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

By adjusting the --process-manager parameter we can also generate artefacts for a different process manager. For example, the following command will generate systemd unit config files instead of sysvinit scripts:

$ nixproc-build --process-manager systemd \
  --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

The following command will automatically build and deploy all processes, using sysvinit as a process manager:

$ nixproc-sysvinit-switch --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

We can also run a life-cycle management activity on all previously deployed processes. For example, to retrieve the statuses of all processes, we can run:

$ nixproc-sysvinit-runactivity status

We can also traverse the processes in reverse dependency order. This is particularly useful to reliably stop all processes, without breaking any process dependencies:

$ nixproc-sysvinit-runactivity -r stop
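
To illustrate what reverse dependency order means, the following minimal sketch computes an activation order and its reverse in plain Nix. The deps attribute set and the visit and reverseList functions are hypothetical names made up for this illustration. This is not how the nixproc tools are implemented, and the sketch assumes that the dependency graph contains no cycles:

```nix
let
  # Hypothetical dependency table: each process lists the processes it depends on
  deps = {
    webapp = [ ];
    nginxReverseProxy = [ "webapp" ];
  };

  # Depth-first traversal: a process is appended only after all of its dependencies
  visit = acc: name:
    if builtins.elem name acc then acc
    else (builtins.foldl' visit acc deps.${name}) ++ [ name ];

  activationOrder = builtins.foldl' visit [ ] (builtins.attrNames deps);

  # Reverses a list by prepending each element to the accumulator
  reverseList = list: builtins.foldl' (acc: x: [ x ] ++ acc) [ ] list;
in
{
  start = activationOrder;
  stop = reverseList activationOrder;
}
```

Evaluating this expression with nix-instantiate --eval --strict places webapp before nginxReverseProxy in start, and uses the opposite order in stop, so that the reverse proxy is stopped before the web application that it depends on.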

Similarly, there are command-line tools to use the other supported process managers. For example, to deploy systemd units instead of sysvinit scripts, you can run:

$ nixproc-systemd-switch processes.nix

Distributed process manager-agnostic deployment with Disnix


As shown in the previous process management framework blog post, it is also possible to deploy processes to machines in a network and have inter-dependencies between processes. These kinds of deployments can be managed by Disnix.

Compared to the previous blog post (in which we could only deploy sysvinit scripts), we can now also use any process manager that the framework supports. The Dysnomia toolset provides plugins that support all process managers that this framework supports:

{ pkgs, distribution, invDistribution, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "sysvinit"
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };

  processType =
    if processManager == "sysvinit" then "sysvinit-script"
    else if processManager == "systemd" then "systemd-unit"
    else if processManager == "supervisord" then "supervisord-program"
    else if processManager == "bsdrc" then "bsdrc-script"
    else if processManager == "cygrunsrv" then "cygrunsrv-service"
    else throw "Unknown process manager: ${processManager}";
in
rec {
  webapp = rec {
    name = "webapp";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
    };
    type = processType;
  };

  nginxReverseProxy = rec {
    name = "nginxReverseProxy";
    port = 8080;
    pkg = constructors.nginxReverseProxy {
      inherit port;
    };
    dependsOn = {
      inherit webapp;
    };
    type = processType;
  };
}

In the above expression, we have extended the previously shown processes expression into a Disnix service expression, in which every attribute in the attribute set represents a service that can be distributed to a target machine in the network.

The type attribute of each service indicates which Dysnomia plugin needs to manage its life-cycle. We can automatically select the appropriate plugin for our desired process manager by deriving it from the processManager parameter.
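
The same selection can also be written as an attribute set lookup, which may be easier to extend when more process managers are added. This is only a sketch of an alternative formulation, not the code used in the example above. The names processTypes and processTypeFor are made up for illustration:

```nix
let
  # Maps each process manager to the Dysnomia plugin type used in the services model above
  processTypes = {
    sysvinit = "sysvinit-script";
    systemd = "systemd-unit";
    supervisord = "supervisord-program";
    bsdrc = "bsdrc-script";
    cygrunsrv = "cygrunsrv-service";
  };

  # Looks up the plugin type and fails with a clear message for unknown process managers
  processTypeFor = processManager:
    processTypes.${processManager}
      or (throw "Unknown process manager: ${processManager}");
in
map processTypeFor [ "sysvinit" "systemd" ]
```

Evaluated with nix-instantiate --eval --strict, this yields the list [ "sysvinit-script" "systemd-unit" ].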

The above Disnix expression has a drawback -- in a heterogeneous network of machines (that run multiple operating systems and/or process managers), we need to compose all desired variants of each service with configuration files for each process manager that we want to use.

It is also possible to have target-agnostic services, by delegating the translation steps to the corresponding target machines. Instead of directly generating a configuration file for a process manager, we generate a JSON specification containing all parameters that are passed to createManagedProcess. We can use this JSON file to build the corresponding configuration artefacts on the target machine:

{ pkgs, distribution, invDistribution, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? null
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    name = "webapp";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
    };
    type = "managed-process";
  };

  nginxReverseProxy = rec {
    name = "nginxReverseProxy";
    port = 8080;
    pkg = constructors.nginxReverseProxy {
      inherit port;
    };
    dependsOn = {
      inherit webapp;
    };
    type = "managed-process";
  };
}

In the above services model, we have set the processManager parameter to null, causing the generator to print JSON representations of the function parameters passed to createManagedProcess.

The managed-process type refers to a Dysnomia plugin that consumes the JSON specification and invokes the createManagedProcess function to convert the JSON configuration to a configuration file used by the preferred process manager.
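
To give a rough impression of what serializing function parameters to JSON looks like, the sketch below uses Nix's builtins.toJSON on a small parameter set. The parameter names and values are illustrative assumptions (including the placeholder path), and the exact JSON layout that the framework generates may well differ:

```nix
let
  # Illustrative parameters only; not necessarily the exact set that the framework emits
  params = {
    name = "webapp";
    description = "Simple web application";
    process = "/path/to/webapp/bin/webapp"; # placeholder path
    daemonArgs = [ "-D" ];
    environment = {
      PORT = 5000;
    };
  };
in
# Produces a JSON string that a tool on the target machine could parse again
builtins.toJSON params
```

Because the result only contains plain data (strings, numbers, lists and attribute sets), it carries no commitment to a particular process manager. That decision can be postponed until the specification arrives on the target machine.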

In the infrastructure model, we can configure the preferred process manager for each target machine:

{
  test1 = {
    properties = {
      hostname = "test1";
    };
    containers = {
      managed-process = {
        processManager = "sysvinit";
      };
    };
  };

  test2 = {
    properties = {
      hostname = "test2";
    };
    containers = {
      managed-process = {
        processManager = "systemd";
      };
    };
  };
}

In the above infrastructure model, the managed-process container on the first machine (test1) has been configured to use sysvinit scripts to manage processes. On the second test machine (test2), the managed-process container is configured to use systemd to manage processes.

If we distribute the services in the services model to targets in the infrastructure model as follows:

{infrastructure}:

{
  webapp = [ infrastructure.test1 ];
  nginxReverseProxy = [ infrastructure.test2 ];
}

and then deploy the system as follows:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Then the webapp process will be distributed to the test1 machine in the network and will be managed with a sysvinit script.

The nginxReverseProxy will be deployed to the test2 machine and managed as a systemd job. The Nginx reverse proxy forwards incoming connections to the webapp.local domain name to the web application process hosted on the first machine.
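
The outcome described above can be reproduced with a small, self-contained Nix expression that combines simplified copies of the infrastructure and distribution models and reports which process manager ends up managing each service. This is only an illustration of the mapping. Disnix resolves it internally, and the expression below is not part of Disnix or the framework:

```nix
let
  # Simplified copy of the infrastructure model shown earlier
  infrastructure = {
    test1 = {
      properties.hostname = "test1";
      containers.managed-process.processManager = "sysvinit";
    };
    test2 = {
      properties.hostname = "test2";
      containers.managed-process.processManager = "systemd";
    };
  };

  # Copy of the distribution model shown earlier
  distribution = {
    webapp = [ infrastructure.test1 ];
    nginxReverseProxy = [ infrastructure.test2 ];
  };
in
# For every service, list the target hostname and the process manager of its container
builtins.mapAttrs (serviceName: targets:
  map (target: {
    inherit (target.properties) hostname;
    inherit (target.containers.managed-process) processManager;
  }) targets
) distribution
```

Evaluating it with nix-instantiate --eval --strict shows webapp mapped to test1 with sysvinit, and nginxReverseProxy mapped to test2 with systemd.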

Discussion


In this blog post, I have introduced a process manager-agnostic function abstraction making it possible to target all kinds of process managers on a variety of operating systems.

By using a single set of declarative specifications, we can:

  • Target six different process managers on four different kinds of operating systems.
  • Implement various kinds of deployment scenarios: production deployments, test deployments as an unprivileged user.
  • Construct multiple instances of processes.

In a distributed context, the advantage is that we can uniformly target all supported process managers and operating systems in a heterogeneous environment from a single declarative specification.

This is particularly useful to facilitate technology diversity -- for example, one of the key selling points of Microservices is that "any technology" can be used to implement them. In many cases, technology diversity is "restricted" to frameworks, programming languages, and storage technologies.

One particular aspect that is rarely changed is the choice of operating systems, because of the limitations of deployment tools -- most deployment solutions for Microservices are container-based and heavily rely on Linux-only concepts, such as Namespaces and cgroups.

With this process management framework and the recent Dysnomia plugin additions for Disnix, it is possible to target all kinds of operating systems that support the Nix package manager, making the operating system component selectable as well. This, for example, allows you to also pick the best operating system to implement a certain requirement -- for example, when performance is important you might pick Linux, and when there is a strong emphasis on security, you could pick OpenBSD to host a mission-critical component.

Limitations


The following table summarizes the differences between the process manager solutions that I have investigated:

Property                       sysvinit          bsdrc             supervisord       systemd               launchd         cygrunsrv
Process type                   daemon            daemon            foreground        foreground, daemon    foreground      foreground
Process control method         PID files         PID files         Process PID       cgroups               Process PID     Process PID
Scripting support              yes               yes               yes               yes                   no              no
Process dependency management  Numeric ordering  Dependency-based  Numeric ordering  Dependency-based      None            Dependency-based
                                                                                     + dependency loading                  + dependency loading
User changing capabilities     user              user              user and group    user and group        user and group  user
Unprivileged user deployments  yes*              yes*              yes               yes*                  no              no
Operating system support       Linux             FreeBSD           Many UNIX-like:   Linux (+glibc) only   macOS (Darwin)  Windows (Cygwin)
                                                 OpenBSD           Linux
                                                 NetBSD            macOS
                                                                   FreeBSD
                                                                   Solaris

Although we can facilitate lifecycle management from a common specification with a variety of process managers, only the most important common features are supported.

Not every concept can be implemented in a process manager-agnostic way. For example, we cannot generically do any isolation of resources (except for packages, because we use Nix). It is difficult to generalize these concepts because they are not standardized, e.g. the POSIX standard does not describe namespaces and cgroups (or similar concepts).

Furthermore, most process managers (with the exception of supervisord) are operating system specific. As a result, it still matters what process manager is picked.

Related work


Process manager-agnostic deployment is not an entirely new idea. Dysnomia has had a target-agnostic 'process' plugin for quite a while, which translates a simple deployment specification (consisting of key-value pairs) to a systemd unit configuration file or sysvinit script.

The features of Dysnomia's process plugin are much more limited compared to the createManagedProcess abstraction function described in this blog post. It does not support any process managers other than sysvinit and systemd, and it can only work with foreground processes.

Furthermore, target-agnostic configurations cannot be easily extended -- it is possible to (ab)use the templating mechanism, but it has no first-class override facilities.

I also found a project called pleaserun, which also aims to generate configuration files for a variety of process managers (both my approach and pleaserun support sysvinit scripts, systemd and launchd).

It seems to use template files to generate the configuration artefacts, and it does not seem to have a generic extension mechanism. Furthermore, it provides no framework to configure the location of shared resources, automatically install package dependencies or to compose multiple instances of processes.

Some remaining thoughts


Although the Nix package manager (not the NixOS distribution) should be portable amongst a variety of UNIX-like systems, it turns out that the only two operating systems that are well supported are Linux and macOS. Nix was reported to work on a variety of other UNIX-like systems in the past, but recently it seems that many things are broken.

To make Nix work on FreeBSD 12.1, I have used the latest stable Nix package manager version with patches from this repository. It turns out that there is still a patch missing to work around a bug in FreeBSD that incorrectly kills all processes in a process group. Fortunately, when we run Nix as an unprivileged user, this bug does not seem to cause any serious problems.

Recent versions of Nixpkgs turn out to be horribly broken on FreeBSD -- the FreeBSD stdenv does not seem to work at all. I tried switching back to stdenv-native (a stdenv environment that impurely uses the host system's compiler and core executables), but that also no longer seems to work in the last three major releases -- the Nix expression evaluation breaks in several places. Due to the intense amount of changes and assumptions that the stdenv infrastructure currently makes, it was as good as impossible for me to fix the infrastructure.

As another workaround, I reverted to a very old version of Nixpkgs (version 17.03 to be precise) that still has a working stdenv-native environment. With some tiny adjustments (e.g. adding some shell aliases for some GNU variants of certain shell executables to stdenv-native), I have managed to get some basic Nix packages working, including Nginx on FreeBSD.

Surprisingly, running Nix on Cygwin was less painful than FreeBSD (because of all the GNUisms that Cygwin provides). Similar to FreeBSD, recent versions of Nixpkgs also appear to be broken, including the Cygwin stdenv environment. By reverting back to release-18.03 (that still has a somewhat working stdenv for Cygwin), I have managed to build a working Nginx version.

As a future improvement to Nixpkgs, I would like to propose a testing solution for stdenv-native. Although I understand that it is difficult to dedicate manpower to maintain all unconventional Nix/Nixpkgs ports, stdenv-native is something that we can also conveniently test on Linux and prevent from breaking in the future.

Availability


The latest version of my experimental Nix-based process framework, that includes the process manager-agnostic configuration function described in this blog post, can be obtained from my GitHub page.

In addition, the repository also contains some example cases, including the web application system described in this blog post, and a set of common system services: MySQL, Apache HTTP server, PostgreSQL and Apache Tomcat.

2 comments:

  1. On lobste.rs, somebody asked how this solution deals with logging.

    The answer is that this abstraction function does not deal with that at all, but leaves that responsibility up to the process manager that it intends to target.

    In theory, we could also generalize the redirection of standard output and standard error to files (these properties are supported by systemd, launchd, supervisord and cygrunsrv), but this will only work for foreground processes.

    Processes that daemonize properly detach themselves from the terminal and close the standard file descriptors: stdin, stderr and stdout.

    When they need to log information, they need to facilitate this themselves, for example, by writing to the syslog or by opening a log file themselves.

    Because it is not generalizable between foreground processes and daemons, I have not included it in the framework as a generic concept.

    However, it is still possible to use any unsupported property of a process manager by defining process manager specific overrides.
