Saturday, February 15, 2020

A declarative process manager-agnostic deployment framework based on Nix tooling

In a previous blog post, written two months ago, I introduced a new experimental Nix-based process framework that provides the following features:

  • It uses the Nix expression language for configuring running process instances, including their dependencies. The configuration process is based on only a few simple concepts: function definitions to define constructors that generate process manager configurations, function invocations to compose running process instances, and Nix profiles to make collections of process configurations accessible from a single location.
  • The Nix package manager delivers all packages and configuration files and isolates them in the Nix store, so that they never conflict with other running processes and packages.
  • It identifies process dependencies, so that a process manager can ensure that processes are activated and deactivated in the right order.
  • The ability to deploy multiple instances of the same process, by making conflicting resources configurable.
  • Deploying processes/services as an unprivileged user.
  • Advanced concepts and features, such as namespaces and cgroups, are not required.

Another objective of the framework is that it should work with a variety of process managers on a variety of operating systems.

In my previous blog post, I deliberately used sysvinit scripts (also known as LSB Init compliant scripts) to manage the life-cycle of running processes as a starting point, because they are universally supported on Linux and self-contained -- sysvinit scripts only require the right packages to be installed; they do not rely on external programs that manage the processes' life-cycle. Moreover, sysvinit scripts can also be conveniently used as an unprivileged user.

I have also developed a Nix function that can be used to more conveniently generate sysvinit scripts. Traditionally, these scripts are written by hand, basically requiring the implementer to write the same boilerplate code over and over again, such as the activities that start and stop the process.

The sysvinit script generator function can also be used to directly specify the implementation of all activities that manage the life-cycle of a process, such as:

{createSystemVInitScript, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemVInitScript {
  name = instanceName;
  description = "Nginx";
  activities = {
    start = ''
      mkdir -p ${nginxLogDir}
      log_info_msg "Starting Nginx..."
      loadproc ${nginx}/bin/nginx -c ${configFile} -p ${stateDir}
      evaluate_retval
    '';
    stop = ''
      log_info_msg "Stopping Nginx..."
      killproc ${nginx}/bin/nginx
      evaluate_retval
    '';
    reload = ''
      log_info_msg "Reloading Nginx..."
      killproc ${nginx}/bin/nginx -HUP
      evaluate_retval
    '';
    restart = ''
      $0 stop
      sleep 1
      $0 start
    '';
    status = "statusproc ${nginx}/bin/nginx";
  };
  runlevels = [ 3 4 5 ];

  inherit dependencies instanceName;
}

In the above Nix expression, we specify five activities to manage the life-cycle of Nginx, a free/open source web server:

  • The start activity initializes the state of Nginx and starts the process (as a daemon that runs in the background).
  • stop stops the Nginx daemon.
  • reload instructs Nginx to reload its configuration.
  • restart restarts the process.
  • status shows whether the process is running or not.

Besides directly implementing activities, the Nix function invocation shown above can also be used at a much higher level -- sysvinit scripts typically follow the same conventions. Nearly all sysvinit scripts implement the activities described above to manage the life-cycle of a process, and these implementations typically need to be written over and over again.

We can also generate the implementations of these activities automatically from a high level specification, such as:

{createSystemVInitScript, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemVInitScript {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile "-p" stateDir ];
  runlevels = [ 3 4 5 ];

  inherit dependencies instanceName;
}

You could basically say that the above createSystemVInitScript function invocation makes the configuration process of a sysvinit script "more declarative" -- you do not need to specify the activities that need to be executed to manage processes, but instead, you specify the relevant characteristics of a running process.

From this high level specification, the implementations for all required activities will be derived, using conventions that are commonly used to write sysvinit scripts.

After completing the initial version of the process management framework that works with sysvinit scripts, I have also been investigating other process managers. I discovered that their configuration processes have many things in common with the sysvinit approach. As a result, I have decided to explore these declarative deployment concepts a bit further.

In this blog post, I will describe a declarative process manager-agnostic deployment approach that we can integrate into the experimental Nix-based process management framework.

Writing declarative deployment specifications for managed running processes


As explained in the introduction, I have also been experimenting with process managers other than sysvinit. For example, instead of generating a sysvinit script that manages the life-cycle of a process, such as the Nginx server, we can also generate a supervisord configuration file to define Nginx as a program that can be managed with supervisord:

{createSupervisordProgram, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSupervisordProgram {
  name = instanceName;
  command = "mkdir -p ${nginxLogDir}; "+
    "${nginx}/bin/nginx -c ${configFile} -p ${stateDir}";
  inherit dependencies;
}

Invoking the above function will generate a supervisord program configuration file, instead of a sysvinit script.
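For illustration, a supervisord program configuration is an INI-style section. A sketch of what the generated file could roughly look like (the store paths are placeholders and the exact output depends on the generator):

[program:nginx]
command=mkdir -p /var/nginx/logs; /nix/store/...-nginx/bin/nginx -c /nix/store/...-nginx.conf -p /var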

With the following Nix expression, we can generate a systemd unit file so that Nginx's life-cycle can be managed by systemd:

{createSystemdService, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createSystemdService {
  name = instanceName;
  Unit = {
    Description = "Nginx";
  };
  Service = {
    ExecStartPre = "+mkdir -p ${nginxLogDir}";
    ExecStart = "${nginx}/bin/nginx -c ${configFile} -p ${stateDir}";
    Type = "simple";
  };

  inherit dependencies;
}

What you will probably notice when comparing the above two Nix expressions with the last sysvinit example (that captures process characteristics instead of activities) is that they all contain very similar properties. Their main difference is a slightly different organization and naming convention, because each abstraction function is tailored towards the configuration conventions of its target process manager.
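To make the correspondence concrete: the systemd invocation above could roughly translate into a unit file along these lines (a sketch with placeholder store paths):

[Unit]
Description=Nginx

[Service]
ExecStartPre=+mkdir -p /var/nginx/logs
ExecStart=/nix/store/...-nginx/bin/nginx -c /nix/store/...-nginx.conf -p /var
Type=simple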

As discussed in my previous blog post about declarative programming and deployment, declarativity is a spectrum -- the above specifications are (somewhat) declarative because they do not capture the activities to manage the life-cycle of the process (the how). Instead, they specify what process we want to run. The process manager derives and executes all activities to bring that process in a running state.

sysvinit scripts themselves are not declarative, because they specify all activities (i.e. shell commands) that need to be executed to accomplish that goal. supervisord configurations and systemd service configuration files are (somewhat) declarative, because they capture process characteristics -- the process manager derives and executes all required activities to bring the process in a running state.

Despite the fact that I am not specifying any process management activities, these Nix expressions could still be considered somewhat of a "how specification", because each configuration is tailored towards a specific process manager. A process manager, such as sysvinit, is a means to accomplish something else: getting a running process whose life-cycle can be conveniently managed.

If I would revise the above specifications to only express what kind of running process I want, disregarding the process manager, then I could simply write:

{createManagedProcess, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile" -p" "${stateDir}/${instanceName}" ];

  inherit dependencies instanceName;
}

The above Nix expression simply states that we want to run a managed Nginx process (using certain command-line arguments) and that, before starting the process, we want to initialize the state by creating the log directory, if it does not exist yet.

I can translate the above specification to all kinds of configuration artifacts that can be used by a variety of process managers to accomplish the same outcome. I have developed six kinds of generators, allowing me to target the following process managers:

  • sysvinit scripts
  • BSD rc scripts
  • supervisord
  • systemd
  • launchd
  • cygrunsrv

Translating the properties of the process manager-agnostic configuration to process manager-specific properties is quite straightforward for most concepts -- in many cases, there is a direct mapping from a property in the process manager-agnostic configuration to a process manager-specific property.

For example, when we intend to target supervisord, then we can translate the process and args parameters to a command invocation. For systemd, we can translate process and args to the ExecStart property that refers to a command-line instruction that starts the process.

Although the process manager-agnostic abstraction function supports enough features to get some well-known system services working (e.g. Nginx, the Apache HTTP server, PostgreSQL, MySQL etc.), it does not facilitate all possible features of each process manager -- it provides a reasonable set of common features to get a process running and to impose some restrictions on it.

It is still possible to work around the feature limitations of process manager-agnostic deployment specifications. We can influence the generation process by defining overrides to get process manager-specific properties supported:

{createManagedProcess, nginx, stateDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-c" configFile" -p" "${stateDir}/${instanceName}" ];

  inherit dependencies instanceName;

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

In the above example, we have added an override specifically for sysvinit to tell the init system that the process should be started in runlevels 3, 4 and 5 (which implies that the process should be stopped in the remaining runlevels: 0, 1, 2, and 6). The other process managers that I have worked with have no notion of runlevels.

Similarly, we can use an override to, for example, use systemd-specific features to run a process in a Linux namespace etc.
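For example, a hypothetical override could enable a systemd-specific sandboxing feature. The nesting below mirrors the createSystemdService parameters shown earlier, but the exact attribute layout is an assumption:

overrides = {
  systemd = {
    Service = {
      # systemd-specific: give the process its own private /tmp namespace
      PrivateTmp = true;
    };
  };
};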

Simulating process manager-agnostic concepts with no direct equivalents


For some process manager-agnostic concepts, process managers do not always have direct equivalents. In such cases, there is still the possibility to apply non-trivial simulation strategies.

Foreground processes or daemons


What all deployment specifications shown in this blog post have in common is that their main objective is to bring a process in a running state. How these processes are expected to behave is different among process managers.

sysvinit and BSD rc scripts expect processes to daemonize -- on invocation, a process spawns another process that keeps running in the background (the daemon process). After the initialization of the daemon process is done, the parent process terminates. If a process does not daemonize, the execution of the startup script blocks indefinitely.

Daemons introduce another complexity from a process management perspective -- when invoking an executable from a shell session in background mode, the shell can tell you its process ID, so that it can be stopped when it is no longer necessary.

With daemons, an invoked process forks another child process (or, when it is supposed to really behave well: it double forks) that becomes the daemon process. The daemon process gets adopted by the init system, and thus remains in the background even if the shell session ends.

The shell that invokes the executable does not know the PID of the resulting daemon process, because that value is only propagated to the daemon's parent process, not to the calling shell session. To still be able to control it, a well-behaving daemon typically writes its process ID to a so-called PID file, so that it can be reliably terminated by a shell command when it is no longer required.

sysvinit and BSD rc scripts extensively use PID files to control daemons. By using a process' PID file, the managing sysvinit/BSD rc script can tell you whether a process is running or not and reliably terminate a process instance.
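For example, assuming a daemon that writes its PID to /var/run/nginx.pid (a hypothetical path), a script can check and terminate it as follows:

$ kill -0 "$(cat /var/run/nginx.pid)" && echo "still running"
$ kill "$(cat /var/run/nginx.pid)"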

"More modern" process managers, such as launchd, supervisord, and cygrunsrv, do not work with processes that daemonize -- instead, these process managers are daemons themselves that invoke processes that work in "foreground mode".

One of the advantages of this approach is that services can be controlled more reliably -- because their PIDs are directly propagated to the controlling daemon from the fork() library call, it is no longer required to work with PID files, which may not always work reliably (for example: a process might abruptly terminate and never clean up its PID file, giving the system the false impression that it is still running).

systemd improves process control even further by using Linux cgroups -- although foreground processes may be controlled more reliably than daemons, they can still fork other processes (e.g. a web service that creates a process per connection). When the controlling parent process terminates and does not properly terminate its own child processes, they may keep running in the background indefinitely. With cgroups it is possible for the process manager to retain control over all processes spawned by a service and terminate them when the service is no longer needed.

systemd has another unique advantage over the other process managers -- it can work both with foreground processes and daemons, although foreground processes seem to have the preference according to the documentation, because they are much easier to control and develop.

Many common system services, such as OpenSSH, MySQL or Nginx, have the ability to both run as a foreground process and as a daemon, typically by providing a command-line parameter or defining a property in a configuration file.

To provide an optimal user experience for all supported process managers, it is typically a good thing in the process manager-agnostic deployment specification to specify both how a process can be used as a foreground process and as a daemon:

{createManagedProcess, nginx, stateDir, runtimeDir}:
{configFile, dependencies ? [], instanceSuffix ? ""}:

let
  instanceName = "nginx${instanceSuffix}";
  nginxLogDir = "${stateDir}/${instanceName}/logs";
in
createManagedProcess {
  name = instanceName;
  description = "Nginx";
  initialize = ''
    mkdir -p ${nginxLogDir}
  '';
  process = "${nginx}/bin/nginx";
  args = [ "-p" "${stateDir}/${instanceName}" "-c" configFile ];
  foregroundProcessExtraArgs = [ "-g" "daemon off;" ];
  daemonExtraArgs = [ "-g" "pid ${runtimeDir}/${instanceName}.pid;" ];

  inherit dependencies instanceName;

  overrides = {
    sysvinit = {
      runlevels = [ 3 4 5 ];
    };
  };
}

In the above example, we have revised the Nginx expression to specify both how the process can be started as a foreground process and as a daemon. The only thing that needs to be configured differently is one global directive in the Nginx configuration file -- by default, Nginx runs as a daemon, but by adding the daemon off; directive to the configuration we can run it in foreground mode.

When we run Nginx as a daemon, we configure a PID file that refers to the instance name, so that multiple instances can co-exist.

To make this conveniently configurable, the above expression does the following:

  • The process parameter specifies the process that needs to be started both in foreground mode and as a daemon. The args parameter specifies common command-line arguments that both the foreground and daemon process will use.
  • The foregroundProcessExtraArgs parameter specifies additional command-line arguments that are only used when the process is started in foreground mode. In the above example, it is used to provide Nginx the global directive that disables the daemon setting.
  • The daemonExtraArgs parameter specifies additional command-line arguments that are only used when the process is started as a daemon. In the above example, it is used to provide Nginx a global directive with a PID file path that uniquely identifies the process instance.

For custom software and services implemented in languages other than C, e.g. Node.js, Java or Python, it is far less common that they have the ability to daemonize -- they can typically only be used as foreground processes.

Nonetheless, we can still daemonize foreground-only processes, by using an external tool, such as libslack's daemon command:

$ daemon -U -i myforegroundprocess

The above command daemonizes the foreground process and creates a PID file for it, so that it can be managed by the sysvinit/BSD rc utility scripts.

The opposite kind of "simulation" is also possible -- if a process can only be used as a daemon, then we can use a proxy process to make it appear as a foreground process:

export _TOP_PID=$$

# Handle the SIGTERM and SIGINT signals and forward them to the daemon process
_term()
{
    trap "exit 0" TERM # Exit cleanly when the TERM signal sent below arrives
    kill -TERM "$pid"  # Forward the terminate signal to the daemon
    kill $_TOP_PID     # Terminate this proxy script itself
}

_interrupt()
{
    kill -INT "$pid"
}

trap _term SIGTERM
trap _interrupt SIGINT

# Start process in the background as a daemon
${executable} "$@"

# Wait for the PID file to become available.
# Useful to work with daemons that don't behave well enough.
count=0

while [ ! -f "${_pidFile}" ]
do
    if [ $count -eq 10 ]
    then
        echo "It does not seem that there isn't any pid file! Giving up!"
        exit 1
    fi

    echo "Waiting for ${_pidFile} to become available..."
    sleep 1

    count=$((count + 1))
done

# Determine the daemon's PID by using the PID file
pid=$(cat ${_pidFile})

# Wait in the background for the PID to terminate
${if stdenv.isDarwin then ''
  lsof -p $pid +r 3 &>/dev/null &
'' else if stdenv.isLinux || stdenv.isCygwin then ''
  tail --pid=$pid -f /dev/null &
'' else if stdenv.isBSD || stdenv.isSunOS then ''
  pwait $pid &
'' else
  throw "Don't know how to wait for process completion on system: ${stdenv.system}"}

# Wait for the blocker process to complete.
# We use wait, so that bash can still
# handle the SIGTERM and SIGINT signals that may be sent to it by
# a process manager
blocker_pid=$!
wait $blocker_pid

The idea of the proxy script shown above is that it runs as a foreground process as long as the daemon process is running and relays any relevant incoming signals (e.g. a terminate and interrupt) to the daemon process.

Implementing this proxy was a bit tricky:

  • In the beginning of the script we configure signal handlers for the TERM and INT signals so that the process manager can terminate the daemon process.
  • We must start the daemon and wait for it to become available. Although the parent process of a well-behaving daemon should only terminate when the initialization is done, this turns out not to be a hard guarantee -- to make the process a bit more robust, we deliberately wait for the PID file to become available before we attempt to wait for the termination of the daemon.
  • Then we wait for the PID to terminate. The bash shell has an internal wait command that can be used to wait for a background process to terminate, but this only works with processes in the same process group as the shell. Daemons are in a new session (with different process groups), so they cannot be monitored by the shell by using the wait command.

    From this Stackoverflow article, I learned that we can use the tail command of GNU Coreutils, or lsof on macOS/Darwin, and pwait on BSDs and Solaris/SunOS to monitor processes in other process groups.
  • When a command is being executed by a shell script (e.g. in this particular case: tail, lsof or pwait), the shell script can no longer respond to signals until the command completes. To still allow the script to respond to signals while it is waiting for the daemon process to terminate, we must run the previous command in background mode, and we use the wait instruction to block the script. While a wait command is running, the shell can respond to signals.

The generator function will automatically pick the best solution for the selected target process manager -- this means that when our target process manager is sysvinit or BSD rc scripts, the generator automatically picks the configuration settings to run the process as a daemon. For the remaining process managers, the generator will pick the configuration settings that run it as a foreground process.

If a desired process model is not supported, then the generator will automatically simulate it. For instance, if we have a foreground-only process specification, then the generator will automatically configure a sysvinit script to call the daemon executable to daemonize it.

A similar process happens when a daemon-only process specification is deployed for a process manager that cannot work with it, such as supervisord.

State initialization


Another important aspect in process deployment is state initialization. Most system services require the presence of state directories in which they can store their PID, log and temp files. If these directories do not exist, the service may not work and refuse to start.

To cope with this problem, I typically make processes self-initializing -- before starting the process, I check whether the state has been initialized (e.g. check if the state directories exist) and initialize the state first if needed.

With most process managers, state initialization is easy to facilitate. For sysvinit and BSD rc scripts, we just use the generator to first execute the shell commands to initialize the state before the process gets started.

Supervisord allows you to execute multiple shell commands in a single command directive -- we can just execute a script that initializes the state before we execute the process that we want to manage.

systemd has an ExecStartPre directive that can be used to specify shell commands to execute before the main process starts.

Apple launchd and cygrunsrv, however, do not have a generic shell execution mechanism or some facility allowing you to execute things before a process starts. Nonetheless, we can still ensure that the state is going to be initialized by creating a wrapper script -- first the wrapper script does the state initialization and then executes the main process.

If a state initialization procedure was specified and the target process manager does not support scripting, then the generator function will transparently wrap the main process into a wrapper script that supports state initialization.
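A minimal sketch of such a generated wrapper (with hypothetical paths) could look like this:

#!/bin/sh
# Initialize the state first...
mkdir -p /var/nginx/logs
# ...then replace this shell with the actual service process
exec /nix/store/...-nginx/bin/nginx -c /nix/store/...-nginx.conf -p /var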

Process dependencies


Another important generic concept is process dependency management. For example, Nginx can act as a reverse proxy for another web application process. To provide a functional Nginx service, we must be sure that the web application process gets activated as well, and that the web application is activated before Nginx.

If the web application process is activated after Nginx or missing completely, then Nginx is (temporarily) unable to redirect incoming requests to the web application process causing end-users to see bad gateway errors.

The process managers that I have experimented with all have a different notion of process dependencies.

sysvinit scripts can optionally declare dependencies in their comment sections. Tools that know how to interpret these dependency specifications can use them to decide the right activation order. Systems using sysvinit typically ignore this specification, however. Instead, they work with sequence numbers in the script file names -- each runlevel configuration directory contains symlinks whose names consist of a prefix (S or K) followed by two numeric digits that define the start or stop order.
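For example, a runlevel directory typically contains symlinks such as the following (the sequence numbers are hypothetical):

/etc/rc3.d/S20nginx -> ../init.d/nginx  # started in runlevel 3, order 20
/etc/rc0.d/K80nginx -> ../init.d/nginx  # stopped in runlevel 0, order 80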

supervisord does not work with dependency specifications, but every program can optionally provide a priority setting that can be used to order the activation and deactivation of programs -- lower priority numbers have precedence over higher priority numbers.

From dependency specifications in a process management expression, the generator function can automatically derive sequence numbers for process managers that require it.

Similar to sysvinit scripts, BSD rc scripts can also declare dependencies in their comment sections. Contrary to sysvinit scripts, BSD rc scripts can use the rcorder tool to parse these dependencies from the comment sections and automatically derive the order in which the BSD rc scripts need to be activated.
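These dependency annotations are plain comments that rcorder understands. For example, the Nginx rc script could declare (service names hypothetical):

#!/bin/sh
# PROVIDE: nginx
# REQUIRE: webapp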

cygrunsrv also allows you to directly specify process dependencies. The Windows service manager makes sure that the services get activated in the right order and that all process dependencies are activated first. The only limitation is that cygrunsrv only allows up to 16 dependencies to be specified per service.

To simulate process dependencies with systemd, we can use two properties. The Wants property can be used to tell systemd that another service needs to be activated first. The After property can be used to specify the ordering.
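For example, to express that the Nginx unit depends on the web application process, the generated unit could contain (assuming the dependency is deployed as webapp.service):

[Unit]
Wants=webapp.service
After=webapp.service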

Sadly, it seems that launchd has no notion of process dependencies at all -- processes can be activated by certain events, e.g. when a kernel module was loaded or through socket activation, but it does not seem to have the ability to configure process dependencies or the activation ordering. When our target process manager is launchd, then we simply have to inform the user that proper activation ordering cannot be guaranteed.

Changing user privileges


Another general concept, that has subtle differences in each process manager, is changing user privileges. Typically for the deployment of system services, you do not want to run these services as root user (that has full access to the filesystem), but as an unprivileged user.

sysvinit and BSD rc scripts have to change users through the su command. The su command can be used to change the user ID (UID), and will automatically adopt the primary group ID (GID) of the corresponding user.
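For example, a generated sysvinit script could start the daemon under an unprivileged user roughly as follows (user name and paths hypothetical):

su nginx -c '/nix/store/...-nginx/bin/nginx -c /nix/store/...-nginx.conf -p /var'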

Supervisord and cygrunsrv can also only change user IDs (UIDs), and will adopt the primary group ID (GID) of the corresponding user.

Systemd and launchd can change both the user IDs and group IDs of the processes that they invoke.

Because only changing UIDs is universally supported amongst process managers, I did not add a configuration property that allows you to change GIDs in a process manager-agnostic way.

Deploying process manager-agnostic configurations


With a processes Nix expression, we can define which process instances we want to run (and how they can be constructed from source code and their dependencies):

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    port = 5000;
    dnsName = "webapp.local";

    pkg = constructors.webapp {
      inherit port;
    };
  };

  nginxReverseProxy = rec {
    port = 8080;

    pkg = constructors.nginxReverseProxy {
      webapps = [ webapp ];
      inherit port;
    } {};
  };
}

In the above Nix expression, we compose two running process instances:

  • webapp is a trivial web application process that will simply return a static HTML page by using the HTTP protocol.
  • nginxReverseProxy is an Nginx server configured as a reverse proxy server. It will forward incoming HTTP requests to the appropriate web application instance, based on the virtual host name. If a virtual host name is webapp.local, then Nginx forwards the request to the webapp instance.

To generate the configuration artifacts for the process instances, we refer to a separate constructors Nix expression. Each constructor will call the createManagedProcess function abstraction (as shown earlier) to construct a process configuration in a process manager-agnostic way.

With the following command-line instruction, we can generate sysvinit scripts for the webapp and Nginx processes declared in the processes expression, and run them as an unprivileged user with the state files managed in our home directory:

$ nixproc-build --process-manager sysvinit \
  --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

By adjusting the --process-manager parameter we can also generate artefacts for a different process manager. For example, the following command will generate systemd unit config files instead of sysvinit scripts:

$ nixproc-build --process-manager systemd \
  --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

The following command will automatically build and deploy all processes, using sysvinit as a process manager:

$ nixproc-sysvinit-switch --state-dir /home/sander/var \
  --force-disable-user-change processes.nix

We can also run a life-cycle management activity on all previously deployed processes. For example, to retrieve the statuses of all processes, we can run:

$ nixproc-sysvinit-runactivity status

We can also traverse the processes in reverse dependency order. This is particularly useful to reliably stop all processes, without breaking any process dependencies:

$ nixproc-sysvinit-runactivity -r stop

Similarly, there are command-line tools to use the other supported process managers. For example, to deploy systemd units instead of sysvinit scripts, you can run:

$ nixproc-systemd-switch processes.nix

Distributed process manager-agnostic deployment with Disnix


As shown in the previous process management framework blog post, it is also possible to deploy processes to machines in a network and have inter-dependencies between processes. These kinds of deployments can be managed by Disnix.

Compared to the previous blog post (in which we could only deploy sysvinit scripts), we can now also use any process manager that the framework supports. The Dysnomia toolset provides plugins that support all process managers that this framework supports:

{ pkgs, distribution, invDistribution, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "sysvinit"
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };

  processType =
    if processManager == "sysvinit" then "sysvinit-script"
    else if processManager == "systemd" then "systemd-unit"
    else if processManager == "supervisord" then "supervisord-program"
    else if processManager == "bsdrc" then "bsdrc-script"
    else if processManager == "cygrunsrv" then "cygrunsrv-service"
    else throw "Unknown process manager: ${processManager}";
in
rec {
  webapp = rec {
    name = "webapp";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
    };
    type = processType;
  };

  nginxReverseProxy = rec {
    name = "nginxReverseProxy";
    port = 8080;
    pkg = constructors.nginxReverseProxy {
      inherit port;
    };
    dependsOn = {
      inherit webapp;
    };
    type = processType;
  };
}

In the above expression, we have extended the previously shown processes expression into a Disnix service expression, in which every attribute in the attribute set represents a service that can be distributed to a target machine in the network.

The type attribute of each service indicates which Dysnomia plugin needs to manage its life-cycle. We can automatically select the appropriate plugin for our desired process manager by deriving it from the processManager parameter.

The above Disnix expression has a drawback -- in a heterogeneous network of machines (that run multiple operating systems and/or process managers), we need to compose all desired variants of each service with configuration files for each process manager that we want to use.

It is also possible to have target-agnostic services, by delegating the translation steps to the corresponding target machines. Instead of directly generating a configuration file for a process manager, we generate a JSON specification containing all parameters that are passed to createManagedProcess. We can use this JSON file to build the corresponding configuration artefacts on the target machine:

{ pkgs, distribution, invDistribution, system
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? null
}:

let
  constructors = import ./constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir;
    inherit forceDisableUserChange processManager;
  };
in
rec {
  webapp = rec {
    name = "webapp";
    port = 5000;
    dnsName = "webapp.local";
    pkg = constructors.webapp {
      inherit port;
    };
    type = "managed-process";
  };

  nginxReverseProxy = rec {
    name = "nginxReverseProxy";
    port = 8080;
    pkg = constructors.nginxReverseProxy {
      inherit port;
    };
    dependsOn = {
      inherit webapp;
    };
    type = "managed-process";
  };
}

In the above services model, we have set the processManager parameter to null, causing the generator to emit JSON representations of the function parameters passed to createManagedProcess.

The managed-process type refers to a Dysnomia plugin that consumes the JSON specification and invokes the createManagedProcess function to convert the JSON configuration to a configuration file used by the preferred process manager.
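Such a JSON specification is essentially a serialization of the createManagedProcess parameters. A sketch of what it could look like (the exact field names are an assumption, and the store paths are placeholders):

{
  "name": "nginx",
  "description": "Nginx",
  "initialize": "mkdir -p /var/nginx/logs",
  "process": "/nix/store/...-nginx/bin/nginx",
  "args": [ "-c", "/nix/store/...-nginx.conf", "-p", "/var" ]
}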

In the infrastructure model, we can configure the preferred process manager for each target machine:

{
  test1 = {
    properties = {
      hostname = "test1";
    };
    containers = {
      managed-process = {
        processManager = "sysvinit";
      };
    };
  };

  test2 = {
    properties = {
      hostname = "test2";
    };
    containers = {
      managed-process = {
        processManager = "systemd";
      };
    };
  };
}

In the above infrastructure model, the managed-process container on the first machine (test1) has been configured to use sysvinit scripts to manage processes. On the second test machine (test2), the managed-process container is configured to use systemd to manage processes.

If we distribute the services in the services model to targets in the infrastructure model as follows:

{infrastructure}:

{
  webapp = [ infrastructure.test1 ];
  nginxReverseProxy = [ infrastructure.test2 ];
}

and then deploy the system as follows:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Then the webapp process will be distributed to the test1 machine in the network and will be managed with a sysvinit script.

The nginxReverseProxy will be deployed to the test2 machine and managed as a systemd job. The Nginx reverse proxy forwards incoming connections to the webapp.local domain name to the web application process hosted on the first machine.

Discussion


In this blog post, I have introduced a process manager-agnostic function abstraction making it possible to target all kinds of process managers on a variety of operating systems.

By using a single set of declarative specifications, we can:

  • Target six different process managers on four different kinds of operating systems.
  • Implement various kinds of deployment scenarios: production deployments, test deployments as an unprivileged user.
  • Construct multiple instances of processes.

In a distributed context, the advantage is that we can uniformly target all supported process managers and operating systems in a heterogeneous environment from a single declarative specification.

This is particularly useful to facilitate technology diversity -- for example, one of the key selling points of Microservices is that "any technology" can be used to implement them. In many cases, technology diversity is "restricted" to frameworks, programming languages, and storage technologies.

One particular aspect that is rarely changed is the choice of operating systems, because of the limitations of deployment tools -- most deployment solutions for Microservices are container-based and heavily rely on Linux-only concepts, such as Namespaces and cgroups.

With this process management framework and the recent Dysnomia plugin additions for Disnix, it is possible to target all kinds of operating systems that support the Nix package manager, making the operating system a selectable component as well. This, for example, allows you to pick the best operating system to implement a certain requirement -- when performance is important you might pick Linux, and when there is a strong emphasis on security, you could pick OpenBSD to host a mission critical component.

Limitations


The following table summarizes the differences between the process manager solutions that I have investigated:

sysvinit:
  Process type: daemon
  Process control method: PID files
  Scripting support: yes
  Process dependency management: Numeric ordering
  User changing capabilities: user
  Unprivileged user deployments: yes*
  Operating system support: Linux

bsdrc:
  Process type: daemon
  Process control method: PID files
  Scripting support: yes
  Process dependency management: Dependency-based
  User changing capabilities: user
  Unprivileged user deployments: yes*
  Operating system support: FreeBSD, OpenBSD, NetBSD

supervisord:
  Process type: foreground
  Process control method: Process PID
  Scripting support: yes
  Process dependency management: Numeric ordering
  User changing capabilities: user and group
  Unprivileged user deployments: yes
  Operating system support: many UNIX-like systems: Linux, macOS, FreeBSD, Solaris

systemd:
  Process type: foreground, daemon
  Process control method: cgroups
  Scripting support: yes
  Process dependency management: Dependency-based + dependency loading
  User changing capabilities: user and group
  Unprivileged user deployments: yes*
  Operating system support: Linux (+glibc) only

launchd:
  Process type: foreground
  Process control method: Process PID
  Scripting support: no
  Process dependency management: None
  User changing capabilities: user and group
  Unprivileged user deployments: no
  Operating system support: macOS (Darwin)

cygrunsrv:
  Process type: foreground
  Process control method: Process PID
  Scripting support: no
  Process dependency management: Dependency-based + dependency loading
  User changing capabilities: user
  Unprivileged user deployments: no
  Operating system support: Windows (Cygwin)

Although we can facilitate lifecycle management from a common specification with a variety of process managers, only the most important common features are supported.

Not every concept can be supported in a process manager-agnostic way. For example, we cannot generically isolate any resources (except for packages, because we use Nix). It is difficult to generalize such concepts, because they are not standardized -- e.g. the POSIX standard does not describe namespaces and cgroups (or similar concepts).

Furthermore, most process managers (with the exception of supervisord) are operating system specific. As a result, it still matters what process manager is picked.

Related work


Process manager-agnostic deployment is not entirely a new idea. Dysnomia has had a target-agnostic 'process' plugin for quite a while, which translates a simple deployment specification (consisting of key-value pairs) to a systemd unit configuration file or sysvinit script.

The features of Dysnomia's process plugin are much more limited compared to the createManagedProcess abstraction function described in this blog post. It does not support any process managers other than sysvinit and systemd, and it can only work with foreground processes.

Furthermore, target-agnostic configurations cannot be easily extended -- it is possible to (ab)use the templating mechanism, but it has no first-class override facilities.

I also found a project called pleaserun that has the objective to generate configuration files for a variety of process managers (both my approach and pleaserun support sysvinit scripts, systemd and launchd).

It seems to use template files to generate the configuration artefacts, and it does not seem to have a generic extension mechanism. Furthermore, it provides no framework to configure the location of shared resources, automatically install package dependencies or to compose multiple instances of processes.

Some remaining thoughts


Although the Nix package manager (not the NixOS distribution) should be portable amongst a variety of UNIX-like systems, it turns out that the only two operating systems that are well supported are Linux and macOS. Nix was reported to work on a variety of other UNIX-like systems in the past, but recently many things seem to have broken.

To make Nix work on FreeBSD 12.1, I have used the latest stable Nix package manager version with patches from this repository. It turns out that there is still a patch missing to work around a bug in FreeBSD that incorrectly kills all processes in a process group. Fortunately, when we run Nix as an unprivileged user, this bug does not seem to cause any serious problems.

Recent versions of Nixpkgs turn out to be horribly broken on FreeBSD -- the FreeBSD stdenv does not seem to work at all. I tried switching back to stdenv-native (a stdenv environment that impurely uses the host system's compiler and core executables), but that also no longer seems to work in the last three major releases -- the Nix expression evaluation breaks in several places. Due to the intense amount of changes and assumptions that the stdenv infrastructure currently makes, it was as good as impossible for me to fix the infrastructure.

As another workaround, I reverted to a very old version of Nixpkgs (version 17.03 to be precise) that still has a working stdenv-native environment. With some tiny adjustments (e.g. adding some shell aliases for the GNU variants of certain shell executables to stdenv-native), I have managed to get some basic Nix packages working, including Nginx on FreeBSD.

Surprisingly, running Nix on Cygwin was less painful than FreeBSD (because of all the GNUisms that Cygwin provides). Similar to FreeBSD, recent versions of Nixpkgs also appear to be broken, including the Cygwin stdenv environment. By reverting back to release-18.03 (that still has a somewhat working stdenv for Cygwin), I have managed to build a working Nginx version.

As a future improvement to Nixpkgs, I would like to propose a testing solution for stdenv-native. Although I understand that it is difficult to dedicate manpower to maintaining all unconventional Nix/Nixpkgs ports, stdenv-native is something that we can also conveniently test on Linux and prevent from breaking in the future.

Availability


The latest version of my experimental Nix-based process framework, that includes the process manager-agnostic configuration function described in this blog post, can be obtained from my GitHub page.

In addition, the repository also contains some example cases, including the web application system described in this blog post, and a set of common system services: MySQL, Apache HTTP server, PostgreSQL and Apache Tomcat.

Monday, January 6, 2020

Writing a well-behaving daemon in the C programming language

Slightly over one month ago, I wrote a blog post about a new experimental Nix-based process management framework that I have been developing. For this framework, I need to experiment with processes that run in the foreground (i.e. they block the shell of the user that invokes it as long as it is running), and daemons -- processes that run in the background and are not directly controlled by the user.

Daemons are (still) a common practice in the UNIX world (although this is changing nowadays with process managers, such as systemd and launchd) to make system services available to end users, such as web servers, the secure shell, and FTP.

To make experimentation more convenient, I wanted to write a very simple service that can run both in the foreground and as a daemon. Initially, I thought writing a daemon would be straightforward, but this turned out to be much more difficult than I initially anticipated.

I have learned that daemonizing a process is quite simple, but writing a well-behaving daemon is quite complicated. I have been studying a number of sources on how to properly write one, and none of them provided me with all the information that I needed. As a result, I have decided to do some investigation myself and write a blog post about my findings.

The basics


As I have stated earlier, the basics of writing a daemon in the C programming language are simple. For example, I can write a very trivial service whose only purpose is to print Hello! on the terminal every second until it receives a terminate or interrupt signal:

#include <stdio.h>
#include <unistd.h>
#include <signal.h>

#define TRUE 1
#define FALSE 0

volatile int terminated = FALSE;

static void handle_termination(int signum)
{
    terminated = TRUE;
}

static void init_service(void)
{
    signal(SIGINT, handle_termination);
    signal(SIGTERM, handle_termination);
}

static void run_main_loop(void)
{
    while(!terminated)
    {
        fprintf(stderr, "Hello!\n");
        sleep(1);
    }
}

The following trivial main method allows us to let the service run in "foreground mode":

int main()
{
    init_service();
    run_main_loop();
    return 0; 
}

The above main method initializes the service (that configures the signal handlers) and invokes the main loop (as defined in the previous code example). The main loop keeps running until it receives a terminate (SIGTERM) or interrupt (SIGINT) signal that unblocks the main loop.

When we run the above program in a shell session, we should observe:

$ ./simpleservice
Hello!
Hello!
Hello!

We will see that the service prints Hello! every second until it gets terminated. Moreover, we will notice that the shell is blocked from receiving user input until we terminate the process. Furthermore, if we terminate the shell (for example by sending it a TERM signal from another shell session), the service gets terminated as well.

We can easily change the main method, shown earlier, to turn our trivial service (that runs in foreground mode) into a daemon:

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main()
{
    pid_t pid = fork();

    if(pid == -1)
    {
        fprintf(stderr, "Can't fork daemon process!\n");
        return 1;
    }
    else if(pid == 0)
        run_main_loop();

    return 0;
}

The above code forks a child process, the child process executes the main loop, and the parent process terminates immediately.

When running the above program on the terminal, we should see that the ./simpleservice command returns almost immediately and a daemon process keeps running in the background. Stopping our shell session (e.g. with the exit command or killing it by sending a TERM signal to it), does not cause the daemon process to be stopped.

This behaviour can be easily explained -- because the shell only waits for the completion of the process that it invokes (the parent process), it will no longer block indefinitely, because it terminates directly after forking the child process.

The daemon process keeps running (even if we end our shell session), because it gets orphaned from the parent and adopted by the process that runs at PID 1 -- the init system.

Writing a well-behaving daemon


The above code fragments probably look very trivial. Is this really sufficient to create a daemon? You could probably already guess that the answer is: no.

To learn more about properly writing a daemon, I studied various sources. The first source I consulted was the Linux Daemon HOWTO, but that document turned out to be a bit outdated (to be precise: it was last updated in 2004). This document basically shows how to implement a very minimalistic version of a well-behaving daemon. It does much more than just forking a child process, for reasons that I will explain later in this blog post.

After some additional searching, I stumbled on systemd's recommendations for writing a traditional SysV daemon (this information can also be found by opening the following manual page: man 7 daemon). systemd's daemon manual page specifies even more steps. Contrary to the Linux Daemon HOWTO, it does not provide any code examples.

Despite the fact that the HOWTO implements more requirements than just a simple fork, it still looked quite simple. Implementing all systemd recommendations, however, turned out to be much more complicated than I expected.

It also made me realize: why is all this stuff needed? None of the sources that I had studied so far explain why all these additional steps need to be implemented.

After some thinking, I believe I understand why: a well-behaving daemon needs to be fully detached from user control, be controllable from an external program, and act safely and predictably.

In the following sections I will explain what I believe is the rationale for each step described in the systemd daemon manual page. Moreover, I will describe the means that I used to implement each requirement:

Closing all file descriptors, except the standard ones: stdin, stdout, stderr


Closing all but the standard file descriptors is a good practice, because the daemon process inherits all open files from the calling process (e.g. the shell session from which the daemon is invoked).

Not closing any additional open file descriptors may cause these file descriptors to remain open for an indefinite amount of time, making it impossible to cleanly unmount the partition where these files may have been stored. Moreover, it also keeps file descriptors unnecessarily allocated.

The daemon manual page describes two strategies to implement closing these non-standard file descriptors. On Linux, it is possible to iterate over the content of /proc/self/fd. A portable, but less efficient, way is to iterate from file descriptor 3 to the limit returned by getrlimit for RLIMIT_NOFILE.

I ended up implementing this step with the following function:

#include <unistd.h>
#include <sys/time.h>
#include <sys/resource.h>

static int close_non_standard_file_descriptors(void)
{
    unsigned int i;
    struct rlimit rlim;

    /* getrlimit() returns 0 on success; the limit itself is in rlim.rlim_cur */
    if(getrlimit(RLIMIT_NOFILE, &rlim) == -1)
        return FALSE;

    for(i = 3; i < rlim.rlim_cur; i++)
        close(i);

    return TRUE;
}
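The Linux-only /proc/self/fd strategy mentioned above could be implemented along these lines (a sketch that assumes the TRUE/FALSE macros defined earlier):

#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

static int close_non_standard_file_descriptors_proc(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *entry;

    if(dir == NULL)
        return FALSE;

    while((entry = readdir(dir)) != NULL)
    {
        int fd = atoi(entry->d_name); /* "." and ".." yield 0, skipped below */

        /* Keep stdin/stdout/stderr and the directory's own file descriptor */
        if(fd > 2 && fd != dirfd(dir))
            close(fd);
    }

    closedir(dir);
    return TRUE;
}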

Resetting all signal handlers to their defaults


Similar to file descriptors, the daemon process also inherits the signal handler configuration of the caller process. If signal handlers have been altered, then the daemon process may behave in a non-standard and unpredictable way.

For example, the TERM signal handler could have been overridden so that the daemon no longer cleanly shuts down when it receives a TERM signal. As a countermeasure, the signal handlers must be reset to their default behaviour.

The systemd daemon manual page suggests iterating over all signals up to the limit of _NSIG and resetting them to SIG_DFL.

I did some investigation and it seems that this method is not standardized by e.g. POSIX -- _NSIG is a constant that glibc defines, and there is no guarantee that other libc implementations will provide the same constant.

I ended up implementing the following function:

#include <signal.h>

static int reset_signal_handlers_to_default(void)
{
#if defined _NSIG
    unsigned int i;

    for(i = 1; i < _NSIG; i++)
    {
         if(i != SIGKILL && i != SIGSTOP)
             signal(i, SIG_DFL);
    }
#endif
    return TRUE;
}

The above implementation iterates from the first signal number up until the maximum signal number. It will ignore SIGKILL and SIGSTOP because they cannot be overridden.

Unfortunately, this implementation will not work with libc implementations that lack the _NSIG constant. I am really curious if somebody could suggest me a standards compliant way to reset all signal handlers.

Resetting the signal mask


It is also possible to completely block certain signals by adjusting the signal mask. The signal mask also gets inherited by the daemon from the calling process. To make a daemon act predictably, e.g. it should do a proper shutdown when it receives the TERM signal, it is a good thing to reset the signal mask to the default configuration.

I ended up implementing this requirement with the following function:

#include <signal.h>

static int clear_signal_mask(void)
{
    sigset_t set;

    return((sigemptyset(&set) == 0)
      && (sigprocmask(SIG_SETMASK, &set, NULL) == 0));
}

Sanitizing the environment block


Another property that a daemon process inherits from the caller are the environment variables. Some environment variables might negatively affect the behaviour of the daemon. Furthermore, environment variables may also contain privacy-sensitive information that could get exposed if the security of a daemon gets compromised.

As a countermeasure, it would be good to sanitize the environment block, for example by removing environment variables with clearenv() or by using a whitelisting approach.

For my trivial example case, I did not need to sanitize the environment block because no environment variables are used.
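For daemons that do need it, a minimal sketch could look like this (it assumes the TRUE/FALSE macros from earlier; note that clearenv() is a glibc extension, not POSIX):

#include <stdlib.h>

static int sanitize_environment(void)
{
    /* Wipe the entire environment block */
    if(clearenv() != 0)
        return FALSE;

    /* Re-add only variables that the daemon genuinely needs
     * (the PATH value below is just an example)
     */
    return setenv("PATH", "/usr/bin:/bin", 1) == 0;
}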

Forking a background process


After closing all non-standard file descriptors, and resetting the signal handlers to their default behaviour, we can fork a background process. The primary reason to fork a background process, as explained earlier, is to get it orphaned from the parent so that it gets adopted by PID 1, the init system, and stays in the background.

We must actually fork twice, as I will explain later. First, I fork a child process that I will call the helper process. The helper process does some more housekeeping work and forks another child process, which becomes our daemon process.
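
Schematically, the double fork boils down to the following sketch (error handling and all other housekeeping steps are omitted; run_daemon() is a hypothetical function that initializes the daemon and runs its main loop):

pid_t pid = fork();

if(pid == 0)
{
    /* Helper process: detach from the controlling terminal... */
    setsid();

    /* ...and fork again to create the real daemon process */
    if(fork() == 0)
    {
        run_daemon(); /* Hypothetical: initialize and run the main loop */
        _exit(0);
    }

    /* Exit the helper process, so that the daemon gets adopted by PID 1 */
    exit(0);
}

/* The parent process continues here, e.g. waiting for a notification message */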

Detaching from the terminal


The child process is still attached to the terminal of the caller process: it can still read input from the terminal and send output to it. To completely detach it from the terminal (and any user interaction), we must start a new session:

if(setsid() == -1)
{
    /* Do some error handling */
}

and then we must fork again, so that the daemon can never acquire a controlling terminal again. The second fork creates the real daemon process. The helper process should terminate, so that the newly created daemon process gets adopted by the init system (that runs as PID 1):

if(fork_daemon_process(pipefd[1],
  pid_file,
  data,
  initialize_daemon,
  run_main_loop) == -1)
{
    /* Do some error handling */
}

/*
 * Exit the helper process,
 * so that the daemon process gets adopted by PID 1
 */
exit(0);

Connecting /dev/null to standard input, output and error in the daemon process


Since we have detached from the terminal, we should connect /dev/null to the standard file descriptors in the daemon process, because these file descriptors are still connected to the terminal from which we have detached.

I implemented this requirement with the following function:

#include <fcntl.h>
#include <unistd.h>

#define NULL_DEV_FILE "/dev/null"

static int attach_standard_file_descriptors_to_null(void)
{
    int null_fd_read, null_fd_write;

    return(((null_fd_read = open(NULL_DEV_FILE, O_RDONLY)) != -1)
      && (dup2(null_fd_read, STDIN_FILENO) != -1)
      && ((null_fd_write = open(NULL_DEV_FILE, O_WRONLY)) != -1)
      && (dup2(null_fd_write, STDOUT_FILENO) != -1)
      && (dup2(null_fd_write, STDERR_FILENO) != -1));
}

Resetting the umask to 0 in the daemon process


The umask (a per-process setting that masks the permission bits of newly created files and directories) may have been adjusted by the calling process, causing directories and files created by the daemon to have unpredictable file permissions.

As a countermeasure, we should reset the umask to 0 with the following function call:

umask(0);

Changing current working directory to / in the daemon process


The daemon process also inherits the current working directory of the caller process. It may happen that the current working directory refers to an external drive or partition, with the result that the partition can no longer be cleanly unmounted while the daemon is running.

To prevent this from happening, we should change the current working directory to the root directory, because the root filesystem is the only one that is guaranteed to stay mounted while the system is running:

if(chdir("/") == -1)
{
    /* Do some error handling */
}

Creating a PID file in the daemon process


Because a program that daemonizes forks another process and terminates immediately, there is no way for the caller (e.g. the shell) to know what the process ID (PID) of the daemon process is. The caller only knows the PID of the parent process, which terminates right after setting up the daemon.

A common practice to expose the PID of the daemon process is to write a PID file that contains its process ID (PID). A PID file can be used to reliably terminate the service when it is no longer needed.

According to the systemd recommendations, a PID file must be created in a race-free fashion, e.g. when a daemon has already been started, it should not attempt to create another PID file with the same name.

I ended up implementing this requirement as follows:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

static int create_pid_file(const char *pid_file)
{
    pid_t my_pid = getpid();
    char my_pid_str[16];
    int fd;

    /* pid_t is not guaranteed to be an int, so cast it explicitly */
    snprintf(my_pid_str, sizeof(my_pid_str), "%d", (int)my_pid);

    if((fd = open(pid_file, O_CREAT | O_EXCL | O_WRONLY, S_IRUSR | S_IWUSR)) == -1)
        return FALSE;

    if(write(fd, my_pid_str, strlen(my_pid_str)) == -1)
    {
        close(fd); /* Do not leak the file descriptor on a failed write */
        return FALSE;
    }

    close(fd);

    return TRUE;
}

In the above implementation, the O_EXCL flag, in combination with O_CREAT, makes open() fail when the PID file already exists, e.g. one generated by a previous instance. As a result, the initialization of the daemon fails if a PID file is already present.

Dropping privileges in the daemon process, if applicable


Since daemons are long-running processes that are typically started by the super user (root), they are also a potential security risk. If a process is started as root, the daemon process also runs with root privileges, and a compromised daemon gives an attacker full access to the entire filesystem.

For this reason, it is typically a good idea to drop privileges in the daemon process. There are a variety of restrictions you can impose, such as changing the ownership of the process to an unprivileged user. Note that the group must be dropped before the user: once setuid() has succeeded, the process is no longer allowed to change its group:

if(setgid(100) == 0 && setuid(1000) == 0)
{
    /* Execute some code with restrictive user permissions */
    ...
}
else
{
    fprintf(stderr, "Cannot change user permissions!\n");
    exit(1);
}
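
Instead of hard-coding numeric IDs, such as GID 100 and UID 1000 in the example above, we could also resolve the IDs of an unprivileged user by name. A sketch, assuming a hypothetical user account named daemonuser:

#include <pwd.h>

/* Look up the unprivileged account in the user database */
struct passwd *pwd = getpwnam("daemonuser");

if(pwd == NULL || setgid(pwd->pw_gid) == -1 || setuid(pwd->pw_uid) == -1)
{
    fprintf(stderr, "Cannot change user permissions!\n");
    exit(1);
}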

In my trivial example case, I had no such requirement.

Notifying the parent process when the initialization of the daemon is complete


Another practical problem you may run into with daemons is that you do not know (for sure) when they are ready to be used. Because the parent process terminates immediately and delegates most of the work, including the initialization steps, to the daemon process (that runs in the background), you may already attempt to use the service before its initialization is done. For example, if the daemon provides a network service, then connections may still fail right after starting it, simply because the daemon is not listening yet.

Furthermore, there is no way to know for sure how long it takes before all the daemon's services become available. This is particularly inconvenient for scripting.

For me personally, notification was the most complicated requirement to implement.

systemd's daemon manual page suggests using an unnamed pipe. I ended up with an implementation that looks as follows:

Before doing any forking, I create a pipe and pass the corresponding file descriptors to the utility function that creates the helper process, as described earlier:

int pipefd[2];

if(pipe(pipefd) == -1)
    return STATUS_CANNOT_CREATE_PIPE;
else
{
    if(fork_helper_process(pipefd, pid_file, data, initialize_daemon, run_main_loop) == -1)
        return STATUS_CANNOT_FORK_HELPER_PROCESS;
    else
    {
        /* Wait for a notification message from the helper or daemon process */
    }
}

The helper and daemon process will use the write end of the pipe to send notification messages. I ended up using it as follows:

static pid_t fork_helper_process(int pipefd[2],
  const char *pid_file,
  void *data,
  int (*initialize_daemon) (void *data),
  int (*run_main_loop) (void *data))
{
    pid_t pid = fork();

    if(pid == 0)
    {
        close(pipefd[0]); /* Close unneeded read-end */

        if(setsid() == -1)
        {
            notify_parent_process(pipefd[1], STATUS_CANNOT_SET_SID);
            exit(STATUS_CANNOT_SET_SID);
        }

        /* Fork again, so that the terminal can not be acquired again */
        if(fork_daemon_process(pipefd[1], pid_file, data, initialize_daemon, run_main_loop) == -1)
        {
            notify_parent_process(pipefd[1], STATUS_CANNOT_FORK_DAEMON_PROCESS);
            exit(STATUS_CANNOT_FORK_DAEMON_PROCESS);
        }

        exit(0); /* Exit the helper process, so that the daemon process gets adopted by PID 1 */
    }

    return pid;
}

If something fails, or the entire initialization process completes successfully, the helper and daemon processes invoke the notify_parent_process() function to send a message over the write end of the pipe to notify the parent. In case of an error, the helper or daemon process also terminates with the same exit status.

I implemented the notification function as follows:

static void notify_parent_process(int writefd, DaemonStatus message)
{
    char byte = (char)message;

    /* Repeat the write if nothing was written yet */
    while(write(writefd, &byte, 1) == 0);

    close(writefd);
}

The above function simply sends a message (of only one byte in size) over the pipe and then closes the write end. The possible messages are encoded in the following enumeration:

typedef enum
{
    STATUS_INIT_SUCCESS                  = 0x0,
    STATUS_CANNOT_ATTACH_STD_FDS_TO_NULL = 0x1,
    STATUS_CANNOT_CHDIR                  = 0x2,
    ...
    STATUS_CANNOT_SET_SID                = 0xc,
    STATUS_CANNOT_FORK_DAEMON_PROCESS    = 0xd,
    STATUS_UNKNOWN_DAEMON_ERROR          = 0xe
}
DaemonStatus;

The parent process will not terminate immediately, but waits for a notification message from the helper or daemon processes:

DaemonStatus exit_status;

close(pipefd[1]); /* Close unneeded write end */
exit_status = wait_for_notification_message(pipefd[0]);
close(pipefd[0]);
return exit_status;

When the parent receives a notification message, it simply propagates the value as an exit status, which is 0 if everything succeeds and non-zero when the process fails somewhere. A non-zero exit status corresponds to a value in the enumeration (shown earlier), allowing us to trace the origin of the error.

The function that waits for the notification of the daemon process is implemented as follows:

static DaemonStatus wait_for_notification_message(int readfd)
{
    char buf[BUFFER_SIZE];
    ssize_t bytes_read = read(readfd, buf, 1);

    if(bytes_read == -1)
        return STATUS_CANNOT_READ_FROM_PIPE;
    else if(bytes_read == 0)
        return STATUS_UNKNOWN_DAEMON_ERROR;
    else
        return buf[0];
}

The above function reads from the pipe, blocks as long as no data was sent and the write end of the pipe was not closed, and returns the byte that it has received. When the write end gets closed before any data was sent (bytes_read is 0), we know that the daemon process has terminated without reporting its status, which translates to STATUS_UNKNOWN_DAEMON_ERROR.

Exiting the parent process after the daemon initialization is done


This requirement overlaps with the previous requirement and can be met by calling exit() after the notification message was sent (and/or the write end of the pipe was closed).
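
For example, a hypothetical top-level daemonize() function that bundles all the steps described in this blog post could be used as follows, with the parent propagating the received status value as its exit status:

int status = daemonize(pid_file, data, initialize_daemon, run_main_loop);

if(status != STATUS_INIT_SUCCESS)
    fprintf(stderr, "Cannot initialize the daemon! Status: %d\n", status);

/* The parent exits only after the daemon has reported its status */
exit(status);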

Discussion


I really did not expect that writing a well-behaving daemon (that follows systemd's recommendations) would be so difficult. I ended up writing 206 LOC to implement all the functionality listed above. Maybe I could reduce this amount a bit with some clever programming tricks, but my objective was to keep the code clear, decompose it into functions, and make it understandable.

There are solutions that alleviate the burden of creating a daemon. A prominent example is BSD's daemon() function (which is also included with glibc). It is a single function call that can be used to automatically daemonize a process. Unfortunately, it does not seem to meet all requirements that systemd specifies.
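
For comparison, using daemon() is trivial. A minimal sketch: passing 0 for both parameters makes it change the current working directory to / and redirect the standard file descriptors to /dev/null:

#include <unistd.h>

if(daemon(0, 0) == -1)
{
    /* Do some error handling */
}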

I also looked at many Stack Overflow posts, and although they correctly cite systemd's daemon manual page with the requirements for a well-behaving daemon, none of the solutions that I could find fully meet all of them -- in particular, I could not find any good examples that implement a protocol that notifies the parent process when the daemon process was successfully initialized.

Because none of these Stack Overflow posts provide what I need, I have decided not to use any of them as an example, but to start from scratch and look up all the relevant pieces myself.

One aspect that still puzzles me is how to "properly" iterate over all signal handlers. The solution hinted at by systemd is non-standard, because it requires a glibc-specific constant. Some sources say that there is no standardized equivalent, so I am still curious whether there is a recipe that can reset all signal handlers to their default behaviour in a standards-compliant way.

In the introduction section, I mentioned that daemons are still a common practice in UNIX-like systems, such as Linux, but that this is changing. IMO this is for a good reason -- services typically need to reimplement the same kind of functionality over and over again. Furthermore, I have noticed that not all daemons meet all requirements and some behave incorrectly. For example, there is no guarantee that a daemon correctly writes a PID file containing the PID of the daemon process.

For these reasons, systemd's daemon manual page also describes "new style daemons", which are considerably easier to implement and require less boilerplate code. Apple has similar recommendations for launchd.

With "new style daemons", processes just spawn in foreground mode, and the process manager (e.g. systemd, launchd or supervisord) takes care of all "housekeeping tasks" -- the process manager makes sure that it runs in the background, drops user privileges etc.

Furthermore, because the process manager directly invokes the daemon process (and as a result knows its PID), controlling a daemon is also less fragile -- the requirement that a PID file needs to be properly created is also dropped.
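
To illustrate how much simpler this is: a new-style daemon just runs in the foreground, and a process manager configuration, such as the hypothetical systemd unit file shown below, takes care of the rest:

[Unit]
Description=Example web application

[Service]
ExecStart=/path/to/webapp
Type=simple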

Availability


The daemonize infrastructure described in this blog post is used by the example webapp that can be found in my experimental Nix process framework repository.