Wednesday, April 20, 2016

Managing the state of mutable components in NixOS configurations with Dysnomia


In an old blog post (and research paper) from a couple of years ago, I described a prototype version of Dysnomia -- a toolset that can be used to deploy so-called "mutable components". In the middle of last year, I integrated the majority of its concepts into the mainstream version of Dysnomia, because I had found some practical use for it.

So far, I have only used Dysnomia in conjunction with Disnix -- Disnix executes all activities required to deploy a service-oriented system, such as:

  • Building services and their intra-dependencies from source code. By default, Disnix performs the builds on the coordinator machine, but can also optionally delegate them to target machines in the network.
  • Distributing services and their intra-dependency closures to the appropriate target machines in the network.
  • Activating newly deployed services, and deactivating obsolete services.
  • Optionally snapshotting, transferring and restoring the state of services (or a subset of services) that have moved from one target machine to another.

For carrying out the building and distribution activities, Disnix invokes the Nix package manager, as it provides a number of powerful features that make deployment of packages more reliable and reproducible.

However, not all activities required to deploy service-oriented systems are supported by Nix and this is where Dysnomia comes in handy -- one of Dysnomia's objectives is to uniformly activate and deactivate mutable components in containers by modifying the latter's state. The other objective is to uniformly support snapshotting and restoring the state of mutable components deployed in a container.

The definitions of mutable components and containers are deliberately left abstract in a Dysnomia context. Basically, they can represent anything, such as:

  • A MySQL database schema component and a MySQL DBMS container.
  • A Java web application component (WAR file) and an Apache Tomcat container.
  • A UNIX process component and a systemd container.
  • Even NixOS configurations can be considered mutable components.

To support many kinds of component and container flavours, Dysnomia has been designed as a plugin system -- each Dysnomia module has a standardized interface (basically a process taking two standard command line parameters) and implements a set of standard deployment activities (e.g. activate, deactivate, snapshot and restore) for each type of container.

Despite the fact that Dysnomia was originally designed for use with Disnix (the package was historically known as Disnix activation scripts), it can also be used as a standalone tool or in combination with other deployment solutions. (As a sidenote: the reason why I picked the name Dysnomia is that, like Nix, it is the name of a moon of a trans-Neptunian object).

Similar to Disnix, when deploying NixOS configurations, all activities to deploy the static parts of a system are carried out by the Nix package manager.

However, in the final step (the activation step) a big generated shell script is executed that is responsible for deploying the dynamic parts of a system, such as updating the GRUB bootloader, reloading systemd units, creating folders that store variable data (e.g. /var), creating user accounts and so on.

In some cases, it may also be desired to deploy mutable components as part of a NixOS system configuration:

  • Some systems are monolithic and cannot be decomposed into services (i.e. distributable units of deployment).
  • Some NixOS modules have scripts to initialize the state of a system service on first startup, such as a database, but do it in their own ad-hoc way, e.g. there is no real formalism behind it.
  • You may also want to use Dysnomia's (primitive) snapshotting facilities for backup purposes.

Recently I did some interesting experiments with Dysnomia at the NixOS level. In this blog post, I will show how Dysnomia can be used in conjunction with NixOS.

Deploying NixOS configurations


As described in earlier blog posts, in NixOS, deployment is driven by a single NixOS configuration file (/etc/nixos/configuration.nix), such as:

{pkgs, ...}:

{
  boot.loader.grub = {
    enable = true;
    device = "/dev/sda";
  };

  fileSystems."/" = {
    device = "/dev/disk/by-label/nixos";
    fsType = "ext4";  
  };

  services = {
    openssh.enable = true;
    
    mysql = {
      enable = true;
      package = pkgs.mysql;
      rootPassword = ../configurations/mysqlpw;
    };
  };
}

The above configuration file states that we want to deploy a system using the GRUB bootloader, having a single root partition, running OpenSSH and MySQL as system services. The configuration can be deployed with a single command-line instruction:

$ nixos-rebuild switch

When running the above command-line instruction, the Nix package manager deploys all required packages and configuration files. After all packages have been successfully deployed, the activation script gets executed. As a result, we have a system running OpenSSH and MySQL.

By modifying the above configuration and adding another service after MySQL:

...

mysql = {
  enable = true;
  package = pkgs.mysql;
  rootPassword = ../configurations/mysqlpw;
};

tomcat = {
  enable = true;
  commonLibs = [ "${pkgs.mysql_jdbc}/share/java/mysql-connector-java.jar" ];
  catalinaOpts = "-Xms64m -Xmx256m";
};

...

and running the same command-line instruction again:

$ nixos-rebuild switch

The NixOS configuration gets upgraded to also run Apache Tomcat as a system service in addition to MySQL and OpenSSH. When upgrading, Nix only builds or downloads the packages that have not been deployed before, making the upgrade process much more efficient than rebuilding the system from scratch.

Managing collections of mutable components


Similar to NixOS configurations (that represent entire system configurations), we need to manage the deployment of mutable components belonging to a system configuration as a whole. I have developed a new tool called dysnomia-containers for this purpose.

The following command-line instruction queries all available containers on a system that serve as potential deployment targets:

$ dysnomia-containers --query-containers
mysql-database
process
tomcat-webapplication
wrapper

The above command-line instruction searches all folders listed in the DYSNOMIA_CONTAINERS_PATH environment variable (which defaults to /etc/dysnomia/containers) for container configuration files and displays their names, such as mysql-database, which corresponds to a MySQL DBMS server, and process and wrapper, which are virtual containers integrating with the host system's service manager, such as systemd.
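
The lookup can be approximated with a few lines of shell. This is only a sketch of the idea, not how dysnomia-containers is actually implemented: list_containers is a hypothetical helper, and it assumes that the search path is colon-separated (like PATH) and that every regular file in a listed folder is a container configuration file named after its container.

```shell
# Hypothetical helper, not part of Dysnomia: print the names of all container
# configuration files found in a colon-separated list of folders.
list_containers() {
    # Assumption: the search path is colon-separated, like PATH.
    search_path=${1:-${DYSNOMIA_CONTAINERS_PATH:-/etc/dysnomia/containers}}

    printf '%s\n' "$search_path" | tr ':' '\n' | while read -r dir; do
        [ -d "$dir" ] || continue
        for file in "$dir"/*; do
            # Each regular file is taken to describe one container.
            if [ -f "$file" ]; then
                basename "$file"
            fi
        done
    done | LC_ALL=C sort -u
}
```

Called without arguments, the helper falls back to the DYSNOMIA_CONTAINERS_PATH environment variable and then to /etc/dysnomia/containers.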

We can also query the available mutable components that we can deploy to the above listed containers:

$ dysnomia-containers --query-available-components
mysql-database/rooms
mysql-database/staff
mysql-database/zipcodes
tomcat-webapplication/GeolocationService
tomcat-webapplication/RoomService
tomcat-webapplication/StaffService
tomcat-webapplication/StaffTracker
tomcat-webapplication/ZipcodeService

The above command-line instruction displays all the available mutable component configurations that reside in directories provided by the DYSNOMIA_COMPONENTS_PATH environment variable, such as three MySQL databases and five Apache Tomcat web applications.

We can deploy all the available mutable components to the available containers, by running:

$ dysnomia-containers --deploy
Activating component: rooms in container: mysql-database
Activating component: staff in container: mysql-database
Activating component: zipcodes in container: mysql-database
Activating component: GeolocationService in container: tomcat-webapplication
Activating component: RoomService in container: tomcat-webapplication
Activating component: StaffService in container: tomcat-webapplication
Activating component: StaffTracker in container: tomcat-webapplication
Activating component: ZipcodeService in container: tomcat-webapplication

Besides displaying the available mutable components and deploying them, we can also query which ones have been deployed already:

$ dysnomia-containers --query-activated-components
mysql-database/rooms
mysql-database/staff
mysql-database/zipcodes
tomcat-webapplication/GeolocationService
tomcat-webapplication/RoomService
tomcat-webapplication/StaffService
tomcat-webapplication/StaffTracker
tomcat-webapplication/ZipcodeService

The dysnomia-containers tool uses the set of available and activated components to make an upgrade more efficient -- when deploying a new system configuration, it will deactivate the components that have been activated that are not available anymore, and activate the available components that have not been activated yet. The components that are both in the old and new configuration remain untouched.

For example, if we run dysnomia-containers --deploy again, nothing is deployed or undeployed, because the configuration has remained identical.
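
The upgrade strategy boils down to two set differences. The following sketch illustrates it with standard UNIX tools; plan_upgrade is a hypothetical helper (not part of Dysnomia) that takes two files listing container/component names, one per line, and only prints the actions instead of carrying them out.

```shell
# Hypothetical helper, not part of Dysnomia: print the actions an upgrade
# would need, given the available and the activated components.
plan_upgrade() {
    available=$1   # file listing components in the new configuration
    activated=$2   # file listing components that are currently activated

    tmpdir=$(mktemp -d)
    # comm requires sorted input; use the same locale for sort and comm.
    LC_ALL=C sort -u "$available" > "$tmpdir/available"
    LC_ALL=C sort -u "$activated" > "$tmpdir/activated"

    # Only in the activated list: obsolete, so deactivate.
    LC_ALL=C comm -13 "$tmpdir/available" "$tmpdir/activated" | sed 's/^/deactivate /'
    # Only in the available list: new, so activate.
    LC_ALL=C comm -23 "$tmpdir/available" "$tmpdir/activated" | sed 's/^/activate /'
    # Components in both lists are left untouched and produce no output.

    rm -rf "$tmpdir"
}
```

When both files list the same components, the helper prints nothing, which corresponds to the second dysnomia-containers --deploy run described above.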

We can also take snapshots of all activated mutable components (for example, for backup purposes):

$ dysnomia-containers --snapshot

After running the above command, the Dysnomia snapshot utility may show you the following output:

$ dysnomia-snapshots --query-all
mysql-database/rooms/faede34f3bf658884020a31ca98f16503da9a90bf3313cc96adc5c2358c0b054
mysql-database/staff/e9af7042064c33379ba9fe9272f61986b5a85de63c57732f067695e499a3a18f
mysql-database/zipcodes/637faa3e79ec6c2db71ac4023e86f29890e54233ea6592680fd88481725d44a3

As may be noticed, for each MySQL database (we have three of them) we have taken a snapshot. (For the Apache Tomcat web applications, no snapshots have been taken because state management for these kinds of components is unsupported).

We can also restore the state from the snapshots that we have just taken:

$ dysnomia-containers --restore

The above command restores the state of all three databases.

Finally, as with services deployed by Disnix, deactivating a mutable component does not imply that its state is removed automatically. Instead, it has been marked as garbage and must be explicitly removed by running:

$ dysnomia-containers --collect-garbage

NixOS integration


To actually make the previously shown deployment activities work, we need to write configuration files for all the containers and mutable components and put them into locations that are reachable from the DYSNOMIA_CONTAINERS_PATH and DYSNOMIA_COMPONENTS_PATH environment variables.

Obviously, they can be written by hand (as demonstrated in my previous blog post about Dysnomia), but this is not always very practical to do on a system-level. Moreover, there is some repetition involved as a NixOS configuration and container configuration files capture common properties.

I have developed a Dysnomia NixOS module to automate Dysnomia's configuration through NixOS. It can be enabled by adding the following property to a NixOS configuration file:

dysnomia.enable = true;

We can specify container properties in a NixOS configuration file as follows:

dysnomia.containers = {
  mysql-database = {
    mysqlUsername = "root";
    mysqlPassword = "secret";
    mysqlPort = 3306;
  };
  tomcat-webapplication = {
    tomcatPort = 8080;
  };
  ...
};

The Dysnomia module generates the corresponding container configuration files having the same names as each attribute name in the dysnomia.containers set and composes their contents from the sub attribute sets by translating them to text files with key=value pairs.
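
To give an impression of the result, the sketch below writes a sample of what the generated mysql-database container file might contain and reads a property back from it. The exact formatting of the generated file (for example, whether values are quoted) is an assumption, and container_property is a hypothetical helper that is not part of Dysnomia.

```shell
# Sample of what a generated mysql-database container file might look like
# (assumption: plain key=value lines without quoting).
sample=$(mktemp)
cat > "$sample" <<'EOF'
mysqlUsername=root
mysqlPassword=secret
mysqlPort=3306
EOF

# Hypothetical helper, not part of Dysnomia: print the value of a property.
container_property() {
    # $1 = container configuration file, $2 = property name
    sed -n "s/^$2=//p" "$1"
}

container_property "$sample" mysqlPort   # prints 3306
```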

Most of the dysnomia.containers properties can be automatically generated by the Dysnomia NixOS module as well, since most of them have already been specified elsewhere in a NixOS configuration. For example, by enabling MySQL in a Dysnomia-enabled NixOS configuration:

services.mysql = {
  enable = true;
  package = pkgs.mysql;
  rootPassword = ../configurations/mysqlpw;
};

The Dysnomia module automatically generates the corresponding container properties as shown previously. The Dysnomia NixOS module integrates with all NixOS features for which Dysnomia provides a plugin.

In addition to containers, we can also specify the available mutable components as part of a NixOS configuration:

dysnomia.components = {
  mysql-database = {
    rooms = pkgs.writeTextFile {
      name = "rooms";
      text = ''
        create table room
        ( Room     VARCHAR(10)    NOT NULL,
          Zipcode  VARCHAR(6)     NOT NULL,
          PRIMARY KEY(Room)
        );
      '';
    };
    staff = ...
    zipcodes = ...
  };

  tomcat-webapplication = {
    ...
  };
};

As can be observed in the above example, the dysnomia.components attribute set captures the available mutable components per container. For the mysql-database container, we have defined three databases: rooms, staff and zipcodes. Each attribute refers to a Nix build function that produces an SQL file representing the initial state of the database on first activation (typically a schema).

Besides MySQL databases, we can use the tomcat-webapplication attribute to automatically deploy Java web applications to the Apache Tomcat servlet container. The corresponding values of each mutable component refer to the result of a Nix build function that produces a Java web application archive (WAR file).

The Dysnomia module automatically composes a directory with symlinks referring to the generated mutable component configurations reachable through the DYSNOMIA_COMPONENTS_PATH environment variable.

Distributed infrastructure state management


In addition to deploying mutable components belonging to a single NixOS configuration, I have mapped the NixOS-level Dysnomia deployment concepts to networks of NixOS machines by extending the DisnixOS toolset (the Disnix extension integrating Disnix' service deployment concepts with NixOS' infrastructure deployment).

It may not have been stated explicitly in any of my previous blog posts, but DisnixOS can also be used to deploy a network of NixOS configurations to target machines in a network. For example, we can compose a networked NixOS configuration that includes the machine configuration shown previously:

{
  test1 = import ./configurations/mysql-tomcat.nix;
  test2 = import ./configurations/empty.nix;
}

The above configuration file is an attribute set defining two machine configurations. The first attribute (test1) refers to our previous NixOS configuration running MySQL and Apache Tomcat as system services.

We can deploy the networked configuration with the following command-line instruction:

$ disnixos-deploy-network network.nix

As a sidenote: although DisnixOS can deploy networks of NixOS configurations, NixOps does a better job in accomplishing this. Moreover, DisnixOS only supports deployment of NixOS configurations to bare-metal servers and cannot instantiate any VMs in the cloud.

Furthermore, what DisnixOS also does differently compared to NixOps, is invoking Dysnomia to activate or deactivate NixOS configurations -- the corresponding NixOS plugin executes the big monolithic NixOS activation script for the activation step and runs nixos-rebuild --rollback switch for the deactivation step.

I have extended Dysnomia's nixos-configuration plugin with state management operations. Snapshotting the state of a NixOS configuration simply means running:

$ dysnomia-containers --snapshot

Likewise, restoring the state of a NixOS configuration can be done with:

$ dysnomia-containers --restore

And removing obsolete state with:

$ dysnomia-containers --collect-garbage

When using Disnix to manage state, we may have mutable components deployed as part of a system configuration and mutable components deployed as services in the same environment. To prevent the snapshots of the services from conflicting with the ones belonging to a machine's system configuration, we set the DYSNOMIA_STATEDIR environment variable to /var/state/dysnomia-nixos for system-level state management and to /var/state/dysnomia for service-level state management, which keeps them apart.
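
On a target machine, this separation could be made explicit with two small wrappers around the snapshot utility. This is merely a sketch: system_snapshots and service_snapshots are hypothetical names, and it assumes that dysnomia-snapshots picks up DYSNOMIA_STATEDIR from its environment.

```shell
# Hypothetical wrappers, not part of Dysnomia or DisnixOS.

# Query the snapshots that belong to the machine's system configuration.
system_snapshots() {
    DYSNOMIA_STATEDIR=/var/state/dysnomia-nixos dysnomia-snapshots "$@"
}

# Query the snapshots that belong to services deployed by Disnix.
service_snapshots() {
    DYSNOMIA_STATEDIR=/var/state/dysnomia dysnomia-snapshots "$@"
}
```

For example, system_snapshots --query-latest would only consult the system-level snapshot store, while service_snapshots --query-latest would only consult the service-level one.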

With these additional operations, we can capture the state of all mutable components that are part of the system configurations in a network:

$ disnixos-snapshot-network network.nix

This yields a snapshot of the test1 machine stored in the Dysnomia snapshot store on the coordinator machine:

$ dysnomia-snapshots --query-latest
nixos-configuration/nixos-system-test1-16.03pre-git/4c4751f10648dfbbf8e25c924391e80913c8a6a600f7b481d73cd88ff3d32730

When inspecting the contents of the NixOS system configuration snapshot, we will observe:

$ cd /var/state/dysnomia/snapshots/$(dysnomia-snapshots --query-latest)
$ find -maxdepth 3 -mindepth 3 -type d
./mysql-database/rooms/faede34f3bf658884020a31ca98f16503da9a90bf3313cc96adc5c2358c0b054
./mysql-database/staff/e9af7042064c33379ba9fe9272f61986b5a85de63c57732f067695e499a3a18f
./mysql-database/zipcodes/637faa3e79ec6c2db71ac4023e86f29890e54233ea6592680fd88481725d44a3

The contents of the NixOS system configuration snapshot consist of all snapshots of the mutable components belonging to its system configuration.

Similar to restoring the state of individual mutable components, we can restore the state of all mutable components that are part of a system configuration in a network of machines:

$ disnixos-restore-network network.nix

And remove their obsolete state, by running:

$ disnixos-delete-network-state network.nix

TL;DR: Discussion


In this blog post, I have described an extension to Dysnomia that makes it possible to manage the state of mutable components belonging to a system configuration, and a NixOS module making it possible to automatically configure Dysnomia from a NixOS configuration file.

This new extension makes it possible to deploy mutable components belonging to systems that cannot be divided into distributable deployment units (or services in a Disnix-context), such as monolithic system configurations.

To summarize: to manage the state of mutable components in a NixOS configuration, we need to provide a number of additional configuration settings. First, we must enable Dysnomia:

dysnomia.enable = true;

Then enable a number of container services, such as MySQL:

services.mysql.enable = true;

(As explained earlier, the Dysnomia module will automatically generate its corresponding container properties).

Finally, we can specify a number of available mutable components that can be deployed automatically, such as a MySQL database:

dysnomia.components = {
  mysql-database = {
    rooms = pkgs.writeTextFile {
      name = "rooms";
      text = ''
        create table room
        ( Room     VARCHAR(10)    NOT NULL,
          Zipcode  VARCHAR(6)     NOT NULL,
          PRIMARY KEY(Room)
        );
      '';
    };
  };
};

After deploying a Dysnomia-enabled NixOS system configuration through:

$ nixos-rebuild switch

We can deploy the mutable components belonging to it, by running:

$ dysnomia-containers --deploy

Unfortunately, managing mutable components on a system-level also has a huge drawback, in particular in distributed environments. Snapshots of entire system configurations are typically too coarse -- whenever the state of any of the mutable components changes, a new system-level composite snapshot is generated that is composed of the snapshots of all mutable components.

Typically, these snapshots contain redundant data that is not shared among snapshot generations (although there are potential solutions to cope with this, I have not implemented any optimizations yet). As explained in my previous Dysnomia-related blog posts, snapshotting individual components can already be quite expensive (such as large databases), and these costs may become significantly larger on a system-level.

Likewise, restoring state on system-level implies that the state of all mutable components will be restored. This is also typically undesired as it may be too destructive and time consuming. Moreover, moving the state from one machine to another when a mutable component gets migrated is also much more expensive.

For more control and more efficient deployment of mutable components, it would typically be better to develop a Disnix service-model so that they can be managed individually.

Because of these drawbacks, I am not prominently advertising DisnixOS' distributed state management features. Moreover, I also did not attempt to integrate these features into NixOps, for the same reasons.

References


The dysnomia-containers tool as well as the distributed infrastructure management facilities have been integrated into the development versions of Dysnomia and DisnixOS, and will become part of the next Disnix release.

I have also added a sub example to the Java version of the Disnix staff tracker example to demonstrate how these features can be used.

As a final note, the Dysnomia NixOS module has not yet been integrated in NixOS. Instead, the module must be imported from a Dysnomia Git clone, by adding the following line to a NixOS configuration file:

imports = [ /home/sander/dysnomia/dysnomia-module.nix ];