
Friday, March 12, 2021

Using the Nix process management framework as an infrastructure deployment solution for Disnix

As explained in many previous blog posts, I have developed Disnix as a solution for automating the deployment of service-oriented systems -- it deploys heterogeneous systems, that consist of many different kinds of components (such as web applications, web services, databases and processes) to networks of machines.

The deployment models for Disnix are typically not fully self-contained. Most importantly, before a service-oriented system can be deployed, all target machines in the network must have the Nix package manager, Disnix, and a remote connectivity service (e.g. SSH) installed.

For multi-user Disnix installations, in which the user does not have super-user privileges, the Disnix service is required to carry out deployment operations on behalf of a user.

Moreover, the services in the services model typically need to be managed by other services, called containers in Disnix terminology (not to be confused with Linux containers).

Examples of container services are:

  • The MySQL DBMS container can manage multiple databases deployed by Disnix.
  • The Apache Tomcat servlet container can manage multiple Java web applications deployed by Disnix.
  • systemd can act as a container that manages multiple systemd units deployed by Disnix.

Managing the life-cycles of services in containers (such as activating or deactivating them) is done by a companion tool called Dysnomia.

In addition to Disnix itself, these container services typically need to be deployed to the target machines in the network in advance.

The problem domain that Disnix works in is called service deployment, whereas the deployment of machines (bare metal or virtual machines) and the container services is called infrastructure deployment.

Disnix can be complemented with a variety of infrastructure deployment solutions:

  • NixOps can deploy networks of NixOS machines, both physical machines and virtual machines in the cloud (e.g. Amazon EC2).

    As part of a NixOS configuration, the Disnix service can be deployed, facilitating multi-user installations. The Dysnomia NixOS module can expose all relevant container services installed by NixOS as container deployment targets.
  • disnixos-deploy-network is a tool that is included with the DisnixOS extension toolset. Since services in Disnix can be any kind of deployment unit, it is also possible to deploy an entire NixOS configuration as a service. This tool is mostly developed for demonstration purposes.

    A limitation of this tool is that it cannot instantiate virtual machines and bootstrap Disnix.
  • Disnix itself. The above solutions are all based on NixOS, a Linux distribution that is fully managed by the Nix package manager.

    Although NixOS is very powerful, it has two drawbacks for Disnix:

    • NixOS uses the NixOS module system for configuring system aspects. It is very powerful, but it can only deploy one instance of each system service -- Disnix can also work with multiple container instances of the same type on a machine.
    • Services in NixOS cannot be deployed to other kinds of software distributions: conventional Linux distributions, or other operating systems, such as macOS and FreeBSD.

    To overcome these limitations, Disnix can also be used as a container deployment solution on any operating system that is capable of running Nix and Disnix. Services deployed by Disnix can automatically be exposed as container providers.

    Similar to disnixos-deploy-network, a limitation of this approach is that it cannot be used to bootstrap Disnix.

Last year, I also added a major new feature to Disnix that makes it possible to deploy both application and container services in the same Disnix deployment models, minimizing the infrastructure deployment problem -- the only requirement is to have machines with Nix, Disnix, and a remote connectivity service (such as SSH) pre-installed on them.

Although this integrated feature is quite convenient, in particular for test setups, a separate infrastructure deployment process (that includes the container services) still makes sense in many scenarios:

  • The infrastructure parts and service parts can be managed by different people with different specializations. For example, configuring and tuning an application server is a different responsibility than developing a Java web application.
  • The service parts typically change more frequently than the infrastructure parts. As a result, they typically have different kinds of update cycles.
  • The infrastructure components can typically be reused between projects (e.g. many systems use a database backend such as PostgreSQL or MySQL), whereas the service components are typically very project specific.

I also realized that my other project, the Nix process management framework, can serve as a partial infrastructure deployment solution -- it can be used to bootstrap Disnix and deploy container services.

Moreover, it can deploy multiple instances of container services and can be used on any operating system that the Nix process management framework supports, including conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

Deploying and exposing the Disnix service with the Nix process management framework


As explained earlier, to allow Disnix to deploy services to a remote machine, that machine needs to have Disnix installed (and run the Disnix service for a multi-user installation), and must be reachable remotely, e.g. through SSH.

I have packaged all required services as constructor functions for the Nix process management framework.

The following processes model captures the configuration of a basic multi-user Disnix installation:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-bare.nix then (import ./ids-bare.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model (processes.nix) captures three process instances:

  • sshd is the OpenSSH server that makes it possible to remotely connect to the machine by using the SSH protocol.
  • dbus-daemon runs a D-Bus system daemon, which is a requirement for the Disnix service. The disnix-service is propagated as a parameter, so that its service directory gets added to the D-Bus system daemon configuration.
  • disnix-service is a service that executes deployment operations on behalf of an authorized unprivileged user. The disnix-service has a dependency on the dbus-daemon, making sure that the latter gets activated first.

We can deploy the above configuration on a machine that has the Nix process management framework already installed.

For example, to deploy the configuration on a machine that uses supervisord, we can run:

$ nixproc-supervisord-switch processes.nix

Resulting in a system that consists of the following running processes:

$ supervisorctl 
dbus-daemon                      RUNNING   pid 2374, uptime 0:00:34
disnix-service                   RUNNING   pid 2397, uptime 0:00:33
sshd                             RUNNING   pid 2375, uptime 0:00:34

As may be noticed, the above supervised services correspond to the processes in the processes model.

On the coordinator machine, we can write a bootstrap infrastructure model (infra-bootstrap.nix) that only contains connectivity settings:

{
  test1.properties.hostname = "192.168.2.1";
}

and use the bootstrap model to capture the full infrastructure model of the system:

$ disnix-capture-infra infra-bootstrap.nix

resulting in the following configuration:

{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
    };
    "system" = "x86_64-linux";
  };
}
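
To reuse this captured configuration in the deployment steps that follow, we can redirect the tool's output to a file (a small convenience step; the captured model shown above is assumed to be written verbatim to standard output):

$ disnix-capture-infra infra-bootstrap.nix > infrastructure.nix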

Despite the fact that we have not configured any containers explicitly, the above configuration (infrastructure.nix) already exposes a number of container services:

  • The echo, fileset and process container services are built-in container providers that any Dysnomia installation includes.

    The process container can be used to automatically deploy services that daemonize. Services that daemonize themselves do not require the presence of any external service.
  • The supervisord-program container refers to the process supervisor that manages the services deployed by the Nix process management framework. It can also be used as a container for processes deployed by Disnix.

With the above infrastructure model, we can deploy any system that depends on the above container services, such as the trivial Disnix proxy example:

{ system, distribution, invDistribution, pkgs
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager ? "supervisord"
, nix-processmgmt ? ../../../nix-processmgmt
}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit system pkgs stateDir logDir runtimeDir tmpDir forceDisableUserChange processManager nix-processmgmt;
  };

  ids = if builtins.pathExists ./ids.nix then (import ./ids.nix).ids else {};

  processType = import "${nix-processmgmt}/nixproc/derive-dysnomia-process-type.nix" {
    inherit processManager;
  };
in
rec {
  hello_world_server = rec {
    name = "hello_world_server";
    port = ids.ports.hello_world_server or 0;
    pkg = customPkgs.hello_world_server { inherit port; };
    type = processType;
    requiresUniqueIdsFor = [ "ports" ];
  };

  hello_world_client = {
    name = "hello_world_client";
    pkg = customPkgs.hello_world_client;
    dependsOn = {
      inherit hello_world_server;
    };
    type = "package";
  };
}

The services model shown above (services-without-proxy.nix) captures two services:

  • The hello_world_server service is a simple service that listens on a TCP port for a "hello" message and responds with a "Hello world!" message.
  • The hello_world_client service is a package providing a client executable that automatically connects to the hello_world_server.

With the following distribution model (distribution.nix), we can map all the services to our deployment machine (that runs the Disnix service managed by the Nix process management framework):

{infrastructure}:

{
  hello_world_client = [ infrastructure.test1 ];
  hello_world_server = [ infrastructure.test1 ];
}

and deploy the system by running the following command:

$ disnix-env -s services-without-proxy.nix \
  -i infrastructure.nix \
  -d distribution.nix \
  --extra-params '{ processManager = "supervisord"; }'

The last parameter, --extra-params, configures the services model (which indirectly invokes the createManagedProcess abstraction function from the Nix process management framework) in such a way that supervisord configuration files are generated.

(As a sidenote: without the --extra-params parameter, the process instances are built for the disnix process manager, generating configuration files that can be deployed to the process container. That container expects programs to daemonize on their own and leave behind a PID file with the daemon's process ID. Although this approach is convenient for experiments, because no external service is required, it is not as reliable as managing supervised processes.)
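
For comparison, the same deployment without the --extra-params flag would simply be:

$ disnix-env -s services-without-proxy.nix \
  -i infrastructure.nix \
  -d distribution.nix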

The result of the supervisord-based deployment operation is that the hello-world-server service is deployed as a service that is also managed by supervisord:

$ supervisorctl 
dbus-daemon                      RUNNING   pid 2374, uptime 0:09:39
disnix-service                   RUNNING   pid 2397, uptime 0:09:38
hello-world-server               RUNNING   pid 2574, uptime 0:00:06
sshd                             RUNNING   pid 2375, uptime 0:09:39

and we can use the hello-world-client executable on the target machine to connect to the service:

$ /nix/var/nix/profiles/disnix/default/bin/hello-world-client 
Trying 192.168.2.1...
Connected to 192.168.2.1.
Escape character is '^]'.
hello
Hello world!

Deploying container providers and exposing them


With Disnix, it is also possible to deploy systems that are composed of different kinds of components, such as web services and databases.

For example, the Java variant of the ridiculous Staff Tracker example consists of the following services:


The services in the diagram above have the following purpose:

  • The StaffTracker service is the front-end web application that shows an overview of staff members and their locations.
  • The StaffService service is a web service with a SOAP interface that provides read and write access to the staff records. The staff records are stored in the staff database.
  • The RoomService service provides read access to the room records, which are stored in a separate rooms database.
  • The ZipcodeService service provides read access to zip codes, which are stored in a separate zipcodes database.
  • The GeolocationService infers the location of a staff member from their IP address using the GeoIP service.

To deploy the system shown above, we need a target machine that provides Apache Tomcat (for managing the web application front-end and web services) and MySQL (for managing the databases) as container provider services:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.simpleAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The processes model above is an extension of the previous processes model, adding two container provider services:

  • tomcat is the Apache Tomcat server. The constructor function simpleAppservingTomcat composes a configuration for a supported process manager, such as supervisord.

    Moreover, it bundles a Dysnomia container configuration file and a Dysnomia module (tomcat-webapplication) that can be used to manage the life-cycles of Java web applications deployed to the servlet container.
  • mysql is the MySQL DBMS server. The constructor function also creates a process manager configuration file, and bundles a Dysnomia container configuration file and module that manages the life-cycles of databases.
  • The container services above are propagated as containerProviders to the disnix-service. This function parameter is used to update the search paths for container configuration and modules, so that services can be deployed to these containers by Disnix.
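
Deploying this processes model works the same way as before. For example, when using the supervisord backend (assuming the Nix process management framework is already installed on the target machine), we can run:

$ nixproc-supervisord-switch processes.nix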

After deploying the above processes model, we should see the following infrastructure model after capturing it:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the tomcat-webapplication and mysql-database containers (with their relevant configuration properties) were added to the infrastructure model.

With the following command we can deploy the example system's services to the containers in the network:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

resulting in a fully functional system:


Deploying multiple container provider instances


As explained in the introduction, a limitation of the NixOS module system is that it is only possible to construct one instance of a service on a machine.

Process instances in a processes model deployed by the Nix process management framework, as well as services in a Disnix services model, are instantiated from constructor functions. Because these functions make conflicting properties configurable, it is possible to deploy multiple instances of the same service to the same machine.

The following processes model was modified from the previous example to deploy two MySQL servers and two Apache Tomcat servers to the same machine:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql-multi-instance.nix then (import ./ids-tomcat-mysql-multi-instance.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-primary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-primary";
    httpPort = 8080;
    httpsPort = 8443;
    serverPort = 8005;
    ajpPort = 8009;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat-secondary = containerProviderConstructors.simpleAppservingTomcat {
    instanceSuffix = "-secondary";
    httpPort = 8081;
    httpsPort = 8444;
    serverPort = 8006;
    ajpPort = 8010;
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-primary = containerProviderConstructors.mysql {
    instanceSuffix = "-primary";
    port = 3306;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql-secondary = containerProviderConstructors.mysql {
    instanceSuffix = "-secondary";
    port = 3307;
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat-primary tomcat-secondary mysql-primary mysql-secondary ];
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

In the above processes model, we made the following changes:

  • We have configured two Apache Tomcat instances: tomcat-primary and tomcat-secondary. Both instances can co-exist because they have been configured in such a way that they listen on unique TCP ports and have a unique instance name composed from the instanceSuffix.
  • We have configured two MySQL instances: mysql-primary and mysql-secondary. Similar to Apache Tomcat, they can both co-exist because they listen on unique TCP ports (3306 and 3307) and have a unique instance name.
  • Both the primary and secondary instances of the above services are propagated to the disnix-service (with the containerProviders parameter), making it possible for a client to discover them.

After deploying the above processes model, we can run the following command to discover the machine's configuration:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication-primary = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat-primary";
      };
      tomcat-webapplication-secondary = {
        "tomcatPort" = "8081";
        "catalinaBaseDir" = "/var/tomcat-secondary";
      };
      mysql-database-primary = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-primary/mysqld.sock";
      };
      mysql-database-secondary = {
        "mysqlPort" = "3307";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld-secondary/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

As may be observed, the infrastructure model contains two Apache Tomcat instances and two MySQL instances.

With the following distribution model (distribution.nix), we can divide each database and web application over the two container instances:

{infrastructure}:

{
  GeolocationService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  RoomService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  StaffService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  StaffTracker = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-secondary";
      }
    ];
  };
  ZipcodeService = {
    targets = [
      { target = infrastructure.test1;
        container = "tomcat-webapplication-primary";
      }
    ];
  };
  rooms = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
  staff = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-secondary";
      }
    ];
  };
  zipcodes = {
    targets = [
      { target = infrastructure.test1;
        container = "mysql-database-primary";
      }
    ];
  };
}

Compared to the previous distribution model, the above model uses a more verbose notation for mapping services.

As explained in an earlier blog post, in deployments in which only a single instance of each container is deployed, services are automatically mapped to the container that has the same name as the service's type. When multiple instances exist, we need to specify explicitly to which container each service should be deployed.

After deploying the system with the following command:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

we will get a running system with the following deployment architecture:


Using the Disnix web service for executing remote deployment operations


By default, Disnix uses SSH to communicate with target machines in the network. Disnix has a modular architecture and is also capable of communicating with target machines by other means, for example via NixOps, the backdoor client, D-Bus, or by directly executing tasks on a local machine.

There is also an external package: DisnixWebService, which exposes all deployment operations remotely through a web service with a SOAP API.

To use the DisnixWebService, we must deploy a Java servlet container (such as Apache Tomcat) with the DisnixWebService application, configured in such a way that it can connect to the disnix-service over the D-Bus system bus.

The following processes model extends the earlier (single container instance) Staff Tracker example with an Apache Tomcat service that bundles the DisnixWebService:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, spoolDir ? "${stateDir}/spool"
, cacheDir ? "${stateDir}/cache"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  ids = if builtins.pathExists ./ids-tomcat-mysql.nix then (import ./ids-tomcat-mysql.nix).ids else {};

  constructors = import ../../services-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };

  containerProviderConstructors = import ../../service-containers-agnostic/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir cacheDir spoolDir forceDisableUserChange processManager ids;
  };
in
rec {
  sshd = {
    pkg = constructors.sshd {
      extraSSHDConfig = ''
        UsePAM yes
      '';
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  dbus-daemon = {
    pkg = constructors.dbus-daemon {
      services = [ disnix-service ];
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  tomcat = containerProviderConstructors.disnixAppservingTomcat {
    commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
    webapps = [
      pkgs.tomcat9.webapps # Include the Tomcat example and management applications
    ];
    enableAJP = true;
    inherit dbus-daemon;

    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  apache = {
    pkg = constructors.basicAuthReverseProxyApache {
      dependency = tomcat;
      serverAdmin = "admin@localhost";
      targetProtocol = "ajp";
      portPropertyName = "ajpPort";

      authName = "DisnixWebService";
      authUserFile = pkgs.stdenv.mkDerivation {
        name = "htpasswd";
        buildInputs = [ pkgs.apacheHttpd ];
        buildCommand = ''
          htpasswd -cb ./htpasswd admin secret
          mv htpasswd $out
        '';
      };
      requireUser = "admin";
    };

    requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  mysql = containerProviderConstructors.mysql {
    properties.requiresUniqueIdsFor = [ "uids" "gids" ];
  };

  disnix-service = {
    pkg = constructors.disnix-service {
      inherit dbus-daemon;
      containerProviders = [ tomcat mysql ];
      authorizedUsers = [ tomcat.name ];
      dysnomiaProperties = {
        targetEPR = "http://$(hostname)/DisnixWebService/services/DisnixWebService";
      };
    };

    requiresUniqueIdsFor = [ "gids" ];
  };
}

The above processes model contains the following changes:

  • The Apache Tomcat process instance is constructed with the containerProviderConstructors.disnixAppservingTomcat constructor function, which automatically deploys the DisnixWebService and provides the required configuration settings so that it can communicate with the disnix-service over the D-Bus system bus.

    Because the DisnixWebService requires the presence of the D-Bus system daemon, the dbus-daemon is configured as a dependency of Apache Tomcat, ensuring that it is started before Apache Tomcat.
  • By default, connecting to the Apache Tomcat server (and thus the DisnixWebService) requires no authentication. To secure the web applications and the DisnixWebService, I have configured an Apache reverse proxy that forwards connections to Apache Tomcat using the AJP protocol.

    Moreover, the reverse proxy protects incoming requests with HTTP basic authentication, requiring a username and password.

We can use the following bootstrap infrastructure model to discover the machine's configuration:

{
  test1.properties.targetEPR = "http://192.168.2.1/DisnixWebService/services/DisnixWebService";
}

The difference between this bootstrap infrastructure model and the previous one is that it uses a different connection property (targetEPR) that refers to the URL of the DisnixWebService.

By default, Disnix uses the disnix-ssh-client to communicate with target machines. To use a different client, we must set the following environment variables:

$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ export DISNIX_TARGET_PROPERTY=targetEPR

The above environment variables instruct Disnix to use the disnix-soap-client executable and the targetEPR property from the infrastructure model as a connection string.

To authenticate ourselves, we must set the following environment variables with a username and password:

$ export DISNIX_SOAP_CLIENT_USERNAME=admin
$ export DISNIX_SOAP_CLIENT_PASSWORD=secret

The following command makes it possible to discover the machine's configuration using the disnix-soap-client and DisnixWebService:

$ disnix-capture-infra infra-bootstrap.nix
{
  "test1" = {
    properties = {
      "hostname" = "192.168.2.1";
      "system" = "x86_64-linux";
      "targetEPR" = "http://192.168.2.1/DisnixWebService/services/DisnixWebService";
    };
    containers = {
      echo = {
      };
      fileset = {
      };
      process = {
      };
      supervisord-program = {
        "supervisordTargetDir" = "/etc/supervisor/conf.d";
      };
      wrapper = {
      };
      tomcat-webapplication = {
        "tomcatPort" = "8080";
        "catalinaBaseDir" = "/var/tomcat";
        "ajpPort" = "8009";
      };
      mysql-database = {
        "mysqlPort" = "3306";
        "mysqlUsername" = "root";
        "mysqlPassword" = "";
        "mysqlSocket" = "/var/run/mysqld/mysqld.sock";
      };
    };
    "system" = "x86_64-linux";
  };
}

After capturing the full infrastructure model, we can deploy the system with disnix-env if desired, using the disnix-soap-client to carry out all necessary remote deployment operations.
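
For example, a sketch of such a deployment invocation, assuming the environment variables configured above are still set:

$ export DISNIX_CLIENT_INTERFACE=disnix-soap-client
$ export DISNIX_TARGET_PROPERTY=targetEPR
$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix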

Miscellaneous: using Docker containers as light-weight virtual machines


As explained earlier in this blog post, the Nix process management framework is only a partial infrastructure deployment solution -- you still need to somehow obtain physical or virtual machines with a software distribution running the Nix package manager.

In a blog post written some time ago, I have explained that Docker containers are not virtual machines or even light-weight virtual machines.

In my previous blog post, I have shown that we can also deploy mutable Docker multi-process containers in which process instances can be upgraded without stopping the container.

The deployment workflow for upgrading mutable containers is very machine-like -- NixOS has a similar workflow that consists of updating the machine configuration (/etc/nixos/configuration.nix) and running a single command-line instruction to upgrade the machine (nixos-rebuild switch).

We can actually start using containers as VMs by adding another ingredient into the mix -- we can also assign static IP addresses to Docker containers.

With the following Nix expression, we can create a Docker image for a mutable container, using any of the processes models shown previously as the "machine's configuration":

let
  pkgs = import <nixpkgs> {};

  createMutableMultiProcessImage = import ../nix-processmgmt/nixproc/create-image-from-steps/create-mutable-multi-process-image-universal.nix {
    inherit pkgs;
  };
in
createMutableMultiProcessImage {
  name = "disnix";
  tag = "test";
  contents = [ pkgs.mc pkgs.disnix ];
  exprFile = ./processes.nix;
  interactive = true;
  manpages = true;
  processManager = "supervisord";
}

The exprFile in the above Nix expression refers to a previously shown processes model, and processManager specifies the desired process manager to use, such as supervisord.

With the following command, we can build the image with Nix and load it into Docker:

$ nix-build
$ docker load -i result

With the following command, we can create a network to which our containers (with IP addresses) should belong:

$ docker network create --subnet=192.168.2.0/24 disnixnetwork

The above command creates a subnet with the prefix 192.168.2.0/24, leaving an 8-bit block for host IP addresses.

We can create and start a Docker container named containervm using our previously built image, and assign it an IP address:

$ docker run --network disnixnetwork --ip 192.168.2.1 \
  --name containervm disnix:test

By default, Disnix uses SSH to connect to remote machines. With the following commands we can create a public-private key pair and copy the public key to the container:

$ ssh-keygen -t ed25519 -f id_test -N ""

$ docker exec containervm mkdir -m0700 -p /root/.ssh
$ docker cp id_test.pub containervm:/root/.ssh/authorized_keys
$ docker exec containervm chmod 600 /root/.ssh/authorized_keys
$ docker exec containervm chown root:root /root/.ssh/authorized_keys

On the coordinator machine, which carries out the deployment, we must add the private key to the SSH agent and configure the disnix-ssh-client to connect to the disnix-service:

$ ssh-add id_test
$ export DISNIX_REMOTE_CLIENT=disnix-client

After executing all these steps, containervm can (mostly) be used as if it were a virtual machine, including connecting to it over SSH by its IP address.
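
For example, assuming the private key has been added to the SSH agent and the sshd service from the processes model is running inside the container, we should be able to open an interactive session:

$ ssh root@192.168.2.1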

Conclusion


In this blog post, I have described how the Nix process management framework can be used as a partial infrastructure deployment solution for Disnix. It can be used both for deploying the disnix-service (to facilitate multi-user installations) as well as deploying container providers: services that manage the life-cycles of services deployed by Disnix.

Moreover, the Nix process management framework makes it possible to do these deployments on all kinds of software distributions that can use the Nix package manager, including NixOS, conventional Linux distributions and other operating systems, such as macOS and FreeBSD.

If I had developed this solution a couple of years ago, it would probably have saved me many hours of preparation work for the first demo in my NixCon 2015 talk, in which I wanted to demonstrate that it is possible to deploy services to a heterogeneous network that consists of a NixOS, an Ubuntu and a Windows machine. Back then, I had to do all the infrastructure deployment tasks manually.

I also have to admit (although this statement is mostly based on my personal preferences, not facts) that I find the functional style that the framework uses far more intuitive than the NixOS module system for certain service configuration aspects, especially for configuring container services and exposing them with Disnix and Dysnomia:

  • Because every process instance is constructed from a constructor function that makes all instance parameters explicit, you are guarded against common configuration errors such as undeclared dependencies.

    For example, the DisnixWebService-enabled Apache Tomcat service requires access to the dbus-daemon providing the system bus. Not having this service in the processes model causes a missing function parameter error.
  • Function parameters in the processes model make it more clear that a process depends on another process and what that relationship may be. For example, with the containerProviders parameter it becomes IMO really clear that the disnix-service uses them as potential deployment targets for services deployed by Disnix.

    In comparison, the implementations of the Disnix and Dysnomia NixOS modules are far more complicated and monolithic -- the Dysnomia module has to figure out all potential container services deployed as part of a NixOS configuration and their properties, convert them to Dysnomia configuration files, and adjust the systemd configuration of the disnix-service for proper activation ordering.

    The wants parameter (used for activation ordering) is just a list of strings, so there is no guarantee that it contains valid references to services that have already been deployed.

Availability


The constructor functions for the services as well as the deployment examples described in this blog post can be found in the Nix process management services repository.

Future work


Slowly more and more of my personal use cases are getting supported by the Nix process management framework.

Moreover, the services repository is steadily growing. To ensure that all the services that I have packaged so far do not break, I really need to focus my work on a service test solution.

Sunday, November 29, 2020

Constructing a simple alerting system with well-known open source projects


Some time ago, I was experimenting with all kinds of monitoring and alerting technologies. For example, with the following technologies, I can develop a simple alerting system with relative ease:

  • Telegraf is an agent that can be used to gather measurements and transfer the corresponding data to all kinds of storage solutions.
  • InfluxDB is a time series database platform that can store, manage and analyze timestamped data.
  • Kapacitor is a real-time streaming data processing engine that can be used for a variety of purposes. I can use Kapacitor to analyze measurements and see if a threshold has been exceeded so that an alert can be triggered.
  • Alerta is a monitoring system that can store and de-duplicate alerts, and arrange blackouts.
  • Grafana is a multi-platform open source analytics and interactive visualization web application.

These technologies appear to be quite straightforward to use. However, as I was learning more about them, I discovered a number of oddities that may have big implications.

Furthermore, testing and making incremental changes also turns out to be much more challenging than expected, making it very hard to diagnose and fix problems.

In this blog post, I will describe how I built a simple monitoring and alerting system, and elaborate about my learning experiences.

Building the alerting system


As described in the introduction, I can combine several technologies to create an alerting system. I will explain them more in detail in the upcoming sections.

Telegraf


Telegraf is a pluggable agent that gathers measurements from a variety of inputs (such as system metrics, platform metrics, database metrics etc.) and sends them to a variety of outputs, typically storage solutions (database management systems such as InfluxDB, PostgreSQL or MongoDB). Telegraf has a large plugin ecosystem that provides all kinds of integrations.

In this blog post, I will use InfluxDB as the output storage backend. For the inputs, I will restrict myself to capturing a subset of system metrics only.

With the following telegraf.conf configuration file, I can capture a variety of system metrics every 10 seconds:

[agent]
  interval = "10s"

[[outputs.influxdb]]
  urls = [ "http://test1:8086" ]
  database = "sysmetricsdb"
  username = "sysmetricsdb"
  password = "sysmetricsdb"

[[inputs.system]]
  # no configuration

[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = true

[[inputs.mem]]
  # no configuration

With the above configuration file, I can collect the following metrics:
  • System metrics, such as the hostname and system load.
  • CPU metrics, such as how much the CPU cores on a machine are utilized, including the total CPU activity.
  • Memory (RAM) metrics.

The data will be stored in an InfluxDB database named sysmetricsdb, hosted on a remote machine with the host name test1.

InfluxDB


As explained earlier, InfluxDB is a time series platform that can store, manage and analyze timestamped data. In many ways, InfluxDB resembles relational databases, but there are also some notable differences.

The query language that InfluxDB uses is called InfluxQL (which shares many similarities with SQL).

For example, with the following query I can retrieve the first three data points from the cpu measurement, which contains the CPU-related measurements collected by Telegraf:

> precision rfc3339
> select * from "cpu" limit 3

providing me the following result set:

name: cpu
time                 cpu       host  usage_active       usage_guest usage_guest_nice usage_idle        usage_iowait        usage_irq usage_nice usage_softirq       usage_steal usage_system      usage_user
----                 ---       ----  ------------       ----------- ---------------- ----------        ------------        --------- ---------- -------------       ----------- ------------      ----------
2020-11-16T15:36:00Z cpu-total test2 10.665258711721098 0           0                89.3347412882789  0.10559662090813073 0         0          0.10559662090813073 0           8.658922914466714 1.79514255543822
2020-11-16T15:36:00Z cpu0      test2 10.665258711721098 0           0                89.3347412882789  0.10559662090813073 0         0          0.10559662090813073 0           8.658922914466714 1.79514255543822
2020-11-16T15:36:10Z cpu-total test2 0.1055966209080346 0           0                99.89440337909197 0                   0         0          0.10559662090813073 0           0                 0

As you may notice by looking at the output above, every data point has a timestamp and a number of fields capturing CPU metrics:

  • cpu identifies the CPU core.
  • host contains the host name of the machine.
  • The remainder of the fields contain all kinds of CPU metrics, e.g. how much CPU time is consumed by the system (usage_system), the user (usage_user), by waiting for IO (usage_iowait) etc.
  • The usage_active field contains the total CPU activity percentage, which is going to be useful to develop an alert that will warn us if there is too much CPU activity for a long period of time.

Aside from the fact that all data is timestamp based, data in InfluxDB has another notable difference compared to relational databases: an InfluxDB database is schemaless. You can add an arbitrary number of fields and tags to a data point without having to adjust the database structure (and migrating existing data to the new database structure).

Fields and tags can contain arbitrary data, such as numeric values or strings. Tags are also indexed so that you can search for these values more efficiently. Furthermore, tags can be used to group data.

For example, the cpu measurement collection has the following tags:

> SHOW TAG KEYS ON "sysmetricsdb" FROM "cpu";
name: cpu
tagKey
------
cpu
host

As shown in the above output, cpu and host are tags in the cpu measurement.

We can use these tags to search for all data points related to a CPU core and/or host machine. Moreover, we can use these tags for grouping, allowing us to compute aggregate values, such as the mean value per CPU core and host.
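
For example, the following query (a sketch; the time range is arbitrary) computes the mean CPU activity percentage per host and CPU core over the last hour:

> SELECT mean("usage_active") FROM "cpu" WHERE time > now() - 1h GROUP BY "host", "cpu"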

Beyond storing and retrieving data, InfluxDB has many useful additional features:

  • You can automatically downsample data by running continuous queries that generate and store the sampled data in the background.
  • You can configure retention policies so that data is not stored indefinitely. For example, you can configure a retention policy to drop raw data after a certain amount of time, but retain the corresponding sampled data.
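
For example, the following commands sketch such a setup (the durations and the longterm policy name are illustrative assumptions): raw data in the default autogen retention policy is dropped after two weeks, while a continuous query stores 5-minute means of the CPU activity in a policy that retains them for a year:

$ influx -database sysmetricsdb -host test1 \
  -execute 'ALTER RETENTION POLICY "autogen" ON "sysmetricsdb" DURATION 2w'
$ influx -database sysmetricsdb -host test1 \
  -execute 'CREATE RETENTION POLICY "longterm" ON "sysmetricsdb" DURATION 52w REPLICATION 1'
$ influx -database sysmetricsdb -host test1 \
  -execute 'CREATE CONTINUOUS QUERY "cpu_5m" ON "sysmetricsdb" BEGIN SELECT mean("usage_active") AS "usage_active" INTO "sysmetricsdb"."longterm"."cpu" FROM "cpu" GROUP BY time(5m), * END'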

InfluxDB has a "open core" development model. The free and open source edition (FOSS) of InfluxDB server (that is MIT licensed) allows you to host multiple databases on a multiple servers.

However, if you also want horizontal scalability and/or high assurance, then you need to switch to the hosted InfluxDB versions -- data in InfluxDB is partitioned into so-called shards of a fixed size (the default shard size is 168 hours).

These shards can be distributed over multiple InfluxDB servers. It is also possible to deploy multiple read replicas of the same shard to multiple InfluxDB servers improving read speed.

Kapacitor


Kapacitor is a real-time streaming data process engine developed by InfluxData -- the same company that also develops InfluxDB and Telegraf.

It can be used for all kinds of purposes. In my example cases, I will only use it to determine whether some threshold has been exceeded and an alert needs to be triggered.

Kapacitor works with custom tasks that are written in a domain-specific language called the TICK script language. There are two kinds of tasks: stream and batch tasks. Both task types have advantages and disadvantages.

We can easily develop an alert that gets triggered if the CPU activity level is high for a relatively long period of time (more than 75% on average over 1 minute).

To implement this alert as a stream job, we can write the following TICK script:

dbrp "sysmetricsdb"."autogen"

stream
    |from()
        .measurement('cpu')
        .groupBy('host', 'cpu')
        .where(lambda: "cpu" != 'cpu-total')
    |window()
        .period(1m)
        .every(1m)
    |mean('usage_active')
    |alert()
        .message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
        .warn(lambda: "mean" > 75.0)
        .crit(lambda: "mean" > 85.0)
        .alerta()
            .resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
            .event('cpu overload')
            .value('{{ index .Fields "mean" }}')

A stream job is built around the following principles:

  • A stream task does not execute queries on an InfluxDB server. Instead, it creates a subscription to InfluxDB -- whenever a data point gets inserted into InfluxDB, it gets forwarded to Kapacitor as well.

    To make subscriptions work, both InfluxDB and Kapacitor need to be able to connect to each other with a public IP address.
  • A stream task defines a pipeline consisting of a number of nodes (connected with the | operator). Each node can consume data points, filter, transform, aggregate, or execute arbitrary operations (such as calling an external service), and produce new data points that can be propagated to the next node in the pipeline.
  • Every node also has property methods (such as .measurement('cpu')) making it possible to configure parameters.

The TICK script example shown above does the following:

  • The from node consumes cpu data points from the InfluxDB subscription, groups them by host and cpu and filters out data points with the cpu-total label, because we are only interested in the CPU consumption per core, not the total amount.
  • The window node states that we should aggregate data points over the last 1 minute and pass the resulting (aggregated) data points to the next node after one minute in time has elapsed. To aggregate data, Kapacitor will buffer data points in memory.
  • The mean node computes the mean value for usage_active for the aggregated data points.
  • The alert node is used to trigger an alert with a specific severity level: WARNING if the mean activity percentage is greater than 75%, and CRITICAL if it is greater than 85%. In all remaining cases, the status is considered OK. The alert is sent to Alerta.

It is also possible to write a similar kind of alerting script as a batch task:

dbrp "sysmetricsdb"."autogen"

batch
    |query('''
        SELECT mean("usage_active")
        FROM "sysmetricsdb"."autogen"."cpu"
        WHERE "cpu" != 'cpu-total'
    ''')
        .period(1m)
        .every(1m)
        .groupBy('host', 'cpu')
    |alert()
        .message('Host: {{ index .Tags "host" }} has high cpu usage: {{ index .Fields "mean" }}')
        .warn(lambda: "mean" > 75.0)
        .crit(lambda: "mean" > 85.0)
        .alerta()
            .resource('{{ index .Tags "host" }}/{{ index .Tags "cpu" }}')
            .event('cpu overload')
            .value('{{ index .Fields "mean" }}')

The above TICK script looks similar to the stream task shown earlier, but instead of using a subscription, the script queries the InfluxDB database (with an InfluxQL query) for data points over the last minute with a query node.

Which approach for writing a CPU alert is best, you may wonder? Each of these two approaches has its pros and cons:

  • Stream tasks offer low latency responses -- when a data point appears, a stream task can immediately respond, whereas a batch task needs to query every minute all the data points to compute the mean percentage over the last minute.
  • Stream tasks maintain a buffer for aggregating the data points making it possible to only send incremental updates to Alerta. Batch tasks are stateless. As a result, they need to update the status of all hosts and CPUs every minute.
  • Processing data points is done synchronously and in sequential order -- if an update round to Alerta takes too long (which is more likely to happen with a batch task), then the next processing run may overlap with the previous, causing all kinds of unpredictable results.

    It may also cause Kapacitor to eventually crash due to growing resource consumption.
  • Batch tasks may also miss data points -- while querying data over a certain time window, it may happen that a new data point gets inserted in that time window (that is being queried). This new data point will not be picked up by Kapacitor.

    A subscription made by a stream task, however, will never miss any data points.
  • Stream tasks can only work with data points that appear from the moment Kapacitor is started -- it cannot work with data points in the past.

    For example, if Kapacitor is restarted and some important event is triggered in the restart time window, Kapacitor will not notice that event, causing the alert to remain in its previous state.

    To work effectively with stream tasks, a continuous data stream is required that frequently reports on the status of a resource. Batch tasks, on the other hand, can work with historical data.
  • The fact that nodes maintain a buffer may also cause the RAM consumption of Kapacitor to grow considerably, if the data volumes are big.

    A batch task on the other hand, does not buffer any data and is more memory efficient.

    Another compelling advantage of batch tasks over stream tasks is that InfluxDB does all the work. The hosted version of InfluxDB can also horizontally scale.
  • Batch tasks can also aggregate data more efficiently (e.g. computing the mean value or sum of values over a certain time period).

I consider neither of these script types the optimal solution. However, for implementing the alerts I tend to have a slight preference for stream tasks, because of their low latency and incremental update properties.

Alerta


As explained in the introduction, Alerta is a monitoring system that can store and de-duplicate alerts, and arrange blackouts.

The Alerta server provides a REST API that can be used to query and modify alerting data and uses MongoDB or PostgreSQL as a storage database.

There is also a variety of Alerta clients: the alerta CLI allows you to control the service from the command line, and there is a web user interface that I will show later in this blog post.
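
For example, with the CLI we can query the alerts that are currently active (the same invocation is used in the test script shown later in this blog post):

$ export ALERTA_ENDPOINT="http://test1"
$ alerta --output json query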

Running experiments


With all the components described above in place, we can start running experiments to see if the CPU alert will work as expected. To gain better insights in the process, I can install Grafana that allows me to visualize the measurements that are stored in InfluxDB.

Configuring a dashboard and panel for visualizing the CPU activity rate was straightforward. I configured a new dashboard, with the following variables:


The above variables allow me to select for each machine in the network, which CPU core's activity percentage I want to visualize.

I have configured the CPU panel as follows:


In the above configuration, I query the usage_active field from the cpu measurement collection, using the dashboard variables cpu and host to filter for the right target machine and CPU core.

I have also configured the field unit to be a percentage value (between 0 and 100).

When running the following command-line instruction on a test machine that runs Telegraf (test2), I can deliberately hog the CPU:

$ dd if=/dev/zero of=/dev/null

The above command reads zero bytes (one-by-one) and discards them by sending them to /dev/null, causing the CPU to remain utilized at a high level:


In the graph shown above, it is clearly visible that CPU core 0 on the test2 machine remains utilized at 100% for several minutes.

(As a sidenote, we can also hog the CPU and consume RAM at the same time with a simple command-line instruction, as sketched below).
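
One such one-liner (my own assumption, not necessarily what was used in the original experiment) reads an endless stream without newlines, forcing tail to buffer everything in memory while keeping a CPU core busy:

$ tail /dev/zero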

If we keep hogging the CPU and wait for at least a minute, the Alerta web interface dashboard will show a CRITICAL alert:


If we stop the dd command, then the TICK script should eventually notice that the mean percentage drops below the WARNING threshold causing the alert to go back into the OK state and disappearing from the Alerta dashboard.

Developing test cases


Being able to trigger an alert with a simple command-line instruction is useful, but not always convenient or effective -- one of the inconveniences is that we always have to wait at least one minute to get feedback.

Moreover, when an alert does not work, it is not always easy to find the root cause. I have encountered the following problems that contribute to a failing alert:

  • Telegraf may not be running and, as a result, not capturing the data points that need to be analyzed by the TICK script.
  • A subscription cannot be established between InfluxDB and Kapacitor. This may happen when Kapacitor cannot be reached through a public IP address.
  • Data points are collected, but they are the wrong kinds of measurements.
  • The TICK script is functionally incorrect.

Fortunately, for stream tasks it is relatively easy to quickly find out whether an alert is functionally correct or not -- we can generate test cases that almost instantly trigger each possible outcome with a minimal amount of data points.

An interesting property of stream tasks is that they have no notion of wall-clock time -- the window node's period of 1 minute may suggest that Kapacitor computes the mean value of the data points every minute, but that is not what it actually does. Instead, Kapacitor only looks at the timestamps of the data points that it receives.

When Kapacitor sees that the timestamps of the data points fit in the 1 minute time window, it keeps buffering. As soon as a data point appears that is outside this time window, the window node relays an aggregated data point to the next node (which computes the mean value, which in turn is consumed by the alert node that decides whether an alert needs to be raised or not).

We can exploit that knowledge to create a very minimal bash test script that triggers every possible outcome: OK, WARNING and CRITICAL:

influxCmd="influx -database sysmetricsdb -host test1"

export ALERTA_ENDPOINT="http://test1"

### Trigger CRITICAL alert

# Force the average CPU consumption to be 100%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100   0000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100  60000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 120000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "critical" ]
then
     echo "Expected severity: critical, but we got: $actualSeverity" >&2
     false
fi
      
### Trigger WARNING alert

# Force the average CPU consumption to be 80%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 180000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 240000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=80 300000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "warning" ]
then
     echo "Expected severity: warning, but we got: $actualSeverity" >&2
     false
fi

### Trigger OK alert

# Force the average CPU consumption to be 0%
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 300000000000"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 360000000000"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=0 420000000000"

sleep 1
actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

if [ "$actualSeverity" != "ok" ]
then
     echo "Expected severity: ok, but we got: $actualSeverity" >&2
     false
fi

The shell script shown above automatically triggers all three possible outcomes of the CPU alert:

  • CRITICAL is triggered by generating data points that force a mean activity percentage of 100%.
  • WARNING is triggered by a mean activity percentage of 80%.
  • OK is triggered by a mean activity percentage of 0%.

It uses the Alerta CLI to connect to the Alerta server to check whether the alert's severity level has the expected value.

We need three data points to trigger each alert type -- the first two data points are on the boundaries of the 1 minute window (0 seconds and 60 seconds), forcing the mean value to become the specified CPU activity percentage.

The third data point is deliberately outside the time window (of 1 minute), forcing the alert node to be triggered with a mean value over the previous two data points.

Although the above test strategy works to quickly validate all possible outcomes, one impractical aspect is that the timestamps in the above example start with 0 (meaning 0 seconds after the epoch: January 1st 1970 00:00 UTC).

If we also want to observe the data points generated by the above script in Grafana, we need to configure the panel to go back in time 50 years.

Fortunately, I can also easily adjust the script to start with a base timestamp that is 1 hour in the past:

offset="$(($(date +%s) - 3600))"
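
The INSERT statements can then add the offset (expressed in seconds) to the relative timestamps, keeping in mind that InfluxDB expects nanosecond-precision timestamps, so the sum needs to be multiplied by 10^9. As a sketch (the script in the repository may implement this slightly differently), the data points that trigger the CRITICAL alert become:

$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $(( (offset + 0) * 1000000000 ))"
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $(( (offset + 60) * 1000000000 ))"
# This data point triggers the alert
$influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $(( (offset + 120) * 1000000000 ))"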

With this tiny adjustment, we should see the following CPU graph (displaying data points from the last hour) after running the test script:


As you can see, the CPU activity level quickly goes from 100%, to 80%, to 0%, using only 9 data points.

Although testing stream tasks (from a functional perspective) is quick and convenient, testing batch tasks in a similar way is difficult. Contrary to the stream task implementation, the query node in the batch task does have a notion of time (because of the WHERE clause that includes the now() expression).

Moreover, the embedded InfluxQL query evaluates the mean values every minute, but the test script does not know exactly when this evaluation is triggered.

The only way I could think of to (somewhat reliably) validate the outcomes is by creating a test script that continuously inserts data points for at least double the time window size (2 minutes) until Alerta reports the right alert status (if it does not after a while, I can conclude that the alert is incorrectly implemented).
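
A sketch of such a polling test script is shown below. It is an illustration of the strategy rather than the exact script from the repository, and it assumes the same $influxCmd variable and Alerta CLI configuration as the stream test script shown earlier:

expectedSeverity="critical"

# Keep inserting 100% CPU samples with current timestamps for at most 2 minutes
for i in $(seq 1 24)
do
    $influxCmd -execute "INSERT cpu,cpu=cpu0,host=test2 usage_active=100 $(( $(date +%s) * 1000000000 ))"
    actualSeverity=$(alerta --output json query | jq -r '.[0].severity')

    if [ "$actualSeverity" = "$expectedSeverity" ]
    then
        exit 0
    fi

    sleep 5
done

echo "The alert never reached severity: $expectedSeverity" >&2
false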

Automating the deployment


As you have probably already guessed, to be able to conveniently experiment with all these services, and to reliably run tests in isolation, some form of deployment automation is an absolute must-have.

Most people who do not know anything about my deployment technology preferences will probably go for Docker or docker-compose, but I have decided to use a variety of solutions from the Nix project.

NixOps is used to automatically deploy a network of NixOS machines -- I have created a logical and physical NixOps configuration that deploys two VirtualBox virtual machines.

With the following command I can create and deploy the virtual machines:

$ nixops create network.nix network-virtualbox.nix -d test
$ nixops deploy -d test

The first machine (test1) is responsible for hosting the entire monitoring infrastructure (InfluxDB, Kapacitor, Alerta, Grafana), and the second machine (test2) runs Telegraf and the load tests.

Disnix (my own deployment tool) is responsible for deploying all services, such as InfluxDB, Kapacitor, Alerta, and the database storage backends. Contrary to docker-compose, Disnix does not work with containers (or other Docker objects, such as networks or volumes), but with arbitrary deployment units that are managed with a plugin system called Dysnomia.

Moreover, Disnix can also be used for distributed deployment in a network of machines.

I have packaged all the services and captured them in a Disnix services model that specifies all deployable services, their types, and their inter-dependencies.

If I combine the services model with the NixOps network models and a distribution model (that maps Telegraf and the test scripts to the test2 machine, and the remainder of the services to the first machine: test1), I can deploy the entire system:

$ export NIXOPS_DEPLOYMENT=test
$ export DISNIXOS_USE_NIXOPS=1

$ disnixos-env -s services.nix \
  -n network.nix \
  -n network-virtualbox.nix \
  -d distribution.nix

The following diagram shows a possible deployment scenario of the system:


The above diagram describes the following properties:

  • The light-grey colored boxes denote machines. In the above diagram, we have two of them: test1 and test2 that correspond to the VirtualBox machines deployed by NixOps.
  • The dark-grey colored boxes denote containers in a Disnix-context (not to be confused with Linux or Docker containers). These are environments that manage other services.

    For example, a container service could be the PostgreSQL DBMS managing a number of PostgreSQL databases or the Apache HTTP server managing web applications.
  • The ovals denote services that could be any kind of deployment unit. In the above example, we have services that are running processes (managed by systemd), databases and web applications.
  • The arrows denote inter-dependencies between services. When a service has an inter-dependency on another service (i.e. the arrow points from the former to the latter), then the latter service needs to be activated first. Moreover, the former service also needs to know how the latter can be reached.
  • Services can also be container providers (as denoted by the arrows in the labels), stating that other services can be embedded inside this service.

    As already explained, the PostgreSQL DBMS is an example of such a service, because it can host multiple PostgreSQL databases.

Although the process components in the diagram above can also be conveniently deployed with Docker-based solutions (i.e. as I have explained in an earlier blog post, containers are somewhat confined and restricted processes), the non-process integrations need to be managed by other means, such as writing extra shell instructions in Dockerfiles.

In addition to deploying the system to machines managed by NixOps, it is also possible to use the NixOS test driver -- the NixOS test driver automatically generates QEMU virtual machines with a shared Nix store, so that no disk images need to be created, making it possible to quickly spawn networks of virtual machines, with very small storage footprints.

I can also create a minimal distribution model that only deploys the services required to run the test scripts -- Telegraf, Grafana and the front-end applications are not required, resulting in a much smaller deployment:


As can be seen in the above diagram, there are far fewer components required.

In this virtual network that runs a minimal system, we can run automated tests for rapid feedback. For example, the following test driver script (implemented in Python) will run my test shell script shown earlier:

test2.succeed("test-cpu-alerts")

With the following command I can automatically run the tests on the terminal:

$ nix-build release.nix -A tests

Availability


The deployment recipes, test scripts and documentation describing the configuration steps are stored in the monitoring playground repository that can be obtained from my GitHub page.

Besides the CPU activity alert described in this blog post, I have also developed a memory alert that triggers if too much RAM is consumed for a longer period of time.

In addition to virtual machines and services, there is also deployment automation in place allowing you to easily deploy Kapacitor TICK scripts and Grafana dashboards.

To deploy the system, you need to use the very latest version of Disnix (version 0.10) that was released very recently.

Acknowledgements


I would like to thank my employer, Mendix, for giving me the opportunity to write this blog post. Mendix allows developers to work two days per month on research projects, making projects like these possible.

Presentation


I have given a presentation about this subject at Mendix. For convenience, I have embedded the slides:

Thursday, October 8, 2020

Transforming Disnix models to graphs and visualizing them

In my previous blog post, I have described a new tool in the Dynamic Disnix toolset that can be used to automatically assign unique numeric IDs to services in a Disnix service model. Unique numeric IDs can represent all kinds of useful resources, such as TCP/UDP port numbers, user IDs (UIDs), and group IDs (GIDs).

Although I am quite happy to have this tool at my disposal, implementing it was much more difficult and time consuming than I expected. Aside from the fact that the problem is not as obvious as it may sound, the main reason is that the Dynamic Disnix toolset was originally developed as a proof of concept implementation for a research paper under very high time pressure. As a result, it has accumulated quite a bit of technical debt, that as of today, is still at a fairly high level (but much better than it was when I completed the PoC).

For the ID assigner tool, I needed to make changes to the foundations of the tools, such as the model parsing libraries. As a consequence, all kinds of related aspects in the toolset started to break, such as the deployment planning algorithm implementations.

Fixing some of these algorithm implementations was much more difficult than I expected -- they were not properly documented, not decomposed into functions, had little to no reuse of common concepts and as a result, were difficult to understand and change. I was forced to re-read the papers that I used as a basis for these algorithms.

To prevent myself from having to go through such a painful process again, I have decided to revise them in such a way that they are better understandable and maintainable.

Dynamically distributing services


The deployment models in the core Disnix toolset are static. For example, the distribution of services to machines in the network is done in a distribution model in which the user has to manually map services in the services model to target machines in the infrastructure model (and optionally to container services hosted on the target machines).

Each time a condition changes, e.g. the system needs to scale up or a machine crashes and the system needs to recover, a new distribution model must be configured and the system must be redeployed. For big complex systems that need to be reconfigured frequently, manually specifying new distribution models becomes very impractical.

As I have already explained in older blog posts, to cope with the limitations of static deployment models (and other static configuration aspects), I have developed Dynamic Disnix, in which various configuration aspects can be automated, including the distribution of services to machines.

A strategy for dynamically distributing services to machines can be specified in a QoS model, that typically consists of two phases:

  • First, a candidate target selection must be made, in which for each service the appropriate candidate target machines are selected.

    Not all machines are capable of hosting a certain service for functional and non-functional reasons -- for example, an i686-linux machine is not capable of running a binary compiled for an x86_64-linux machine.

    A machine can also be exposed to the public internet, and as a result, may not be suitable to host a service that exposes privacy-sensitive information.
  • After the suitable candidate target machines are known for each service, we must decide to which candidate machine each service gets distributed.

    This can be done in many ways. The strategy that we want to use is typically based on all kinds of non-functional requirements.

    For example, we can optimize a system's reliability by minimizing the amount of network links between services, requiring a strategy in which services that depend on each other are mapped to the same machine, as much as possible.

Graph-based optimization problems


In the Dynamic Disnix toolset, I have implemented various kinds of distribution algorithms/strategies for all kinds of purposes.

I did not "invent" most of them -- for some, I got inspiration from papers in the academic literature.

Two of the more advanced deployment planning algorithms are graph-based, to accomplish the following goals:

  • Reliable deployment. Network links are a potential source of unreliability in a distributed system -- connections may fail, become slow, or be interrupted frequently. By minimizing the number of network links between services (by co-locating them on the same machine), their impact can be reduced. To keep deployments from becoming too expensive, this should be done with a minimal number of machines.

    As described in the paper: "Reliable Deployment of Component-based Applications into Distributed Environments" by A. Heydarnoori and F. Mavaddat, this problem can be transformed into a graph problem: the multiway cut problem (which is NP-hard).

    Unless P = NP, it cannot be solved exactly in polynomial time; we can only use an approximation algorithm that comes close to the optimal solution.
  • Fragile deployment. Inspired by the above deployment problem, I also came up with the opposite problem (as my own "invention") -- how can we make every connection between services a true network link (not local), so that we can test a system for robustness, using a minimal number of machines?

    This problem can be modeled as a graph coloring problem (which is NP-hard as well). I used one of the approximation algorithms described in the paper: "New Methods to Color the Vertices of a Graph" by D. Brélaz to implement a solution.

To work with these graph-based algorithms, I originally did not apply any transformations -- because of time pressure, I directly worked with objects from the Disnix models (e.g. services, target machines) and somewhat "glued" these together with generic data structures, such as lists and hash tables.

As a result, when looking at the implementation, it is very hard to get an understanding of the process and how an implementation aspect relates to a concept described in the papers shown above.

In my revised version, I have implemented a general purpose graph library that can be used to solve all kinds of general graph related problems.

Aside from using a general graph library, I have also separated the graph-based generation processes into the following steps:

  • After opening the Disnix input models (such as the services, infrastructure, and distribution models) I transform the models to a graph representing an instance of the problem domain.
  • After the graph has been generated, I apply the approximation algorithm to the graph data structure.
  • Finally, I transform the resolved graph back to a distribution model that should provide our desired distribution outcome.

This new organization provides better separation of concerns, common concepts can be reused (such as graph operations), and as a result, the implementations are much closer to the approximation algorithms described in the papers.

Visualizing the generation process


Another advantage of having a reusable graph implementation is that we can easily extend it to visualize the problem graphs.

When I combine these features together with my earlier work that visualizes services models, and a new tool that visualizes infrastructure models, I can make the entire generation process transparent.

For example, the following services model:

{system, pkgs, distribution, invDistribution}:

let
  customPkgs = import ./pkgs { inherit pkgs system; };
in
rec {
  testService1 = {
    name = "testService1";
    pkg = customPkgs.testService1;
    type = "echo";
  };

  testService2 = {
    name = "testService2";
    pkg = customPkgs.testService2;
    dependsOn = {
      inherit testService1;
    };
    type = "echo";
  };

  testService3 = {
    name = "testService3";
    pkg = customPkgs.testService3;
    dependsOn = {
      inherit testService1 testService2;
    };
    type = "echo";
  };
}

can be visualized as follows:

$ dydisnix-visualize-services -s services.nix
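
Assuming that the command above prints Graphviz dot code on its standard output (the acknowledgements at the end of this post mention that Graphviz is used for the visualizations), the graph can be rendered into an image with the dot tool:

$ dydisnix-visualize-services -s services.nix > services.dot
$ dot -Tpng services.dot > services.png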


The above services model and corresponding visualization capture the following properties:

  • They describe three services (as denoted by ovals).
  • The arrows denote inter-dependency relationships (the dependsOn attribute in the services model).

    When a service has an inter-dependency on another service, the latter service has to be activated first, and the dependent service needs to know how to reach it.

    testService2 depends on testService1 and testService3 depends on both the other two services.

We can also visualize the following infrastructure model:

{
  testtarget1 = {
    properties = {
      hostname = "testtarget1";
    };
    containers = {
      mysql-database = {
        mysqlPort = 3306;
      };
      echo = {};
    };
  };

  testtarget2 = {
    properties = {
      hostname = "testtarget2";
    };
    containers = {
      mysql-database = {
        mysqlPort = 3306;
      };
    };
  };

  testtarget3 = {
    properties = {
      hostname = "testtarget3";
    };
  };
}

with the following command:

$ dydisnix-visualize-infra -i infrastructure.nix

resulting in the following visualization:


The above infrastructure model declares three machines. Each target machine provides a number of container services (such as a MySQL database server, and echo that acts as a testing container).

With the following command, we can generate a problem instance for the graph coloring problem using the above services and infrastructure models as inputs:

$ dydisnix-graphcol -s services.nix -i infrastructure.nix \
  --output-graph

resulting in the following graph:


The graph shown above captures the following properties:

  • Each service translates to a node.
  • When an inter-dependency relationship exists between services, it gets translated to a (bi-directional) link representing a network connection (the rationale is that services with an inter-dependency relationship interact with each other over a network connection).

Each target machine translates to a color, which we can represent with a numeric index -- 0 is testtarget1, 1 is testtarget2 and so on.

The following command generates the resolved problem instance graph in which each vertex has a color assigned:

$ dydisnix-graphcol -s services.nix -i infrastructure.nix \
  --output-resolved-graph

resulting in the following visualization:


(As a sidenote: in the above graph, colors are represented by numbers. In theory, I could also use real colors, but if I also want the graph to remain visually appealing, I need to solve a color picking problem, which is beyond the scope of my refactoring objective).

The resolved graph can be translated back into the following distribution model:

$ dydisnix-graphcol -s services.nix -i infrastructure.nix
{
  "testService2" = [
    "testtarget2"
  ];
  "testService1" = [
    "testtarget1"
  ];
  "testService3" = [
    "testtarget3"
  ];
}

As you may notice, every service is distributed to a separate machine, so that every network link between services is a real network connection between machines.

We can also visualize the problem instance of the multiway cut problem. For this, we also need a distribution model that declares, for each service, which target machines are candidates.

The following distribution model makes all three target machines in the infrastructure model a candidate for every service:

{infrastructure}:

{
  testService1 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
  testService2 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
  testService3 = [ infrastructure.testtarget1 infrastructure.testtarget2 infrastructure.testtarget3 ];
}

With the following command we can generate a problem instance representing a host-application graph:

$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
  -d distribution.nix --output-graph

providing me the following output:


The above problem graph has the following properties:

  • Each service translates to an app node (prefixed with app:) and each candidate target machine to a host node (prefixed with host:).
  • When a network connection between two services exists (implicitly derived from having an inter-dependency relationship), an edge is generated with a weight of 1.
  • When a target machine is a candidate target for a service, then an edge is generated with a weight of n², representing a very large number.

The objective of solving the multiway cut problem is to cut edges in the graph in such a way that each terminal (host node) is disconnected from the other terminals (host nodes), in which the total weight of the cuts is minimized.
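
Stated a bit more formally (my own notation, not taken from the paper): with w(e) denoting the weight of edge e and T the set of terminals, we look for a set of edges C ⊆ E that disconnects every pair of distinct terminals in T, while minimizing the total weight of the cut:

\sum_{e \in C} w(e)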

When applying the approximation algorithm in the paper to the above graph:

$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
  -d distribution.nix --output-resolved-graph

we get the following resolved graph:


that can be transformed back into the following distribution model:

$ dydisnix-multiwaycut -s services.nix -i infrastructure.nix \
  -d distribution.nix
{
  "testService2" = [
    "testtarget1"
  ];
  "testService1" = [
    "testtarget1"
  ];
  "testService3" = [
    "testtarget1"
  ];
}

As you may notice by looking at the resolved graph (in which the terminals: testtarget2 and testtarget3 are disconnected) and the distribution model output, all services are distributed to the same machine: testtarget1 making all connections between the services local connections.

In this particular case, the solution is not only close to the optimal solution, but it is the optimal solution.

Conclusion


In this blog post, I have described how I have revised the deployment planning algorithm implementations in the Dynamic Disnix toolset. Their concerns are now much better separated, and the graph-based algorithms now use a general purpose graph library, that can also be used for generating visualizations of the intermediate steps in the generation process.

This revision was not on my short-term planned features list, but I am happy that I did the work. Retrospectively, I regret that I never took the time to finish things up properly after the submission of the paper. Although Dynamic Disnix's quality is still not where I want it to be, it is quite a step forward in making the toolset more usable.

Sadly, it is almost 10 years ago that I started Dynamic Disnix and there is still no official release -- the technical debt in Dynamic Disnix is one of the important reasons why I never made one. Hopefully, with this step I can do it some day. :-)

The good news is that I made the paper submission deadline and that the paper got accepted for presentation. It brought me to the SEAMS 2011 conference (co-located with ICSE 2011) in Honolulu, Hawaii, allowing me to take pictures such as this one:


Availability


The graph library and new implementations of the deployment planning algorithms described in this blog post are part of the current development version of Dynamic Disnix.

The paper: "A Self-Adaptive Deployment Framework for Service-Oriented Systems" describes the Dynamic Disnix framework (developed 9 years ago) and can be obtained from my publications page.

Acknowledgements


To generate the visualizations I used the Graphviz toolset.