Sander van der Burg's blog: March 2015

Friday, March 13, 2015

Disnix 0.3 release announcement

In the previous blog post, I have explained the differences between Disnix (which does service deployment) and NixOps (which does infrastructure deployment), and shown that both tools can be used together to address both concerns in a deployment process.

Furthermore, I raised a couple of questions and intentionally left one unmentioned question unanswered. The unmentioned question that I was referring to is the following: "Is Disnix still alive or is it dead?".

The answer is that Disnix's development was progressing at a very low pace for some time after I left academia -- I made minor changes once in a while, but nothing really interesting happened.

However, for the last few months, I am using it on a daily basis and made many big improvements. Moreover, I have reached a stable point and decided that this is a good moment to announce the next release!

New features

So what is new in this release?

Visualization tool

I have added a new tool to the Disnix toolset named: disnix-visualize that generates Graphviz images to visualize a particular deployment scenario. An example image is shown below:

The above picture shows a deployment scenario of the StaffTracker Java example in which services are divided over two machines in a network and have all kinds of complex dependency relationships as denoted by the arrows.

The tool was already included in the development versions for a while, but has never been part of any release.

Dysnomia

I also made a major change in Disnix's architecture. As explained in my old blog post about Disnix, activating and deactivating services cannot be done generically and I have developed a plugin system to take care of that.

This plugin system package (formerly known as Disnix activation scripts) has now become an independent tool named Dysnomia and can also be applied in different contexts. Furthermore, it can also be used as a standalone tool.

For example, a running MySQL DBMS instance (called a container in Dysnomia terminology) could be specified in a configuration file (such as ~/mysql-production) as follows:

type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret

A database can be encoded as an SQL file (~/test-database/createdb.sql) creating the schema:

create table author
( AUTHOR_ID  INTEGER       NOT NULL,
  FirstName  VARCHAR(255)  NOT NULL,
  LastName   VARCHAR(255)  NOT NULL,
  PRIMARY KEY(AUTHOR_ID)
);

create table books
( ISBN       VARCHAR(255)  NOT NULL,
  Title      VARCHAR(255)  NOT NULL,
  AUTHOR_ID  VARCHAR(255)  NOT NULL,
  PRIMARY KEY(ISBN),
  FOREIGN KEY(AUTHOR_ID) references author(AUTHOR_ID)
    on update cascade on delete cascade
);

We can use the following command-line instruction to let Dysnomia deploy the database to the MySQL DBMS container we have just specified earlier:

$ dysnomia --operation activate --component ~/test-database \
  --container ~/mysql-production

When Disnix has to execute deployment operations, two external tools are consulted -- Nix takes care of all deployment operations of the static parts of a system, and Dysnomia takes care of performing the dynamic activation and deactivation steps.

Concurrent closure transfers

In the previous versions of Disnix, only one closure (of a collection of services and its intra-dependencies) is transferred to a target machine at the time. If a target machine has more network bandwidth than the coordinator, this is usually fine, but in all other cases, it slows the deployment process down.

In the new version, two closures are transferred concurrently by default. The amount of concurrent closure transfers can be adjusted as follows:

$ disnix-env -s services.nix -i infrastructure.nix \
  -d distribution.nix --max-concurrent-transfers 4

The last command-line argument states that 4 closures should be transferred concurrently.

Concurrent service activation and deactivation

In the old Disnix versions, the activation and deactivation steps of a service on a target machine were executed sequentially, i.e. one service on a machine at the time. In all my old testcases these steps were quite cheap/quick, but now that I have encountered systems that are much bigger, I noticed that there is a lot of deployment time that we can save.

In the new implementation, Disnix tries to concurrently activate or deactivate one service per machine. The amount of services that can be concurrently activated or deactivated per machine can be raised in the infrastructure model:

{
  test1 = {
    hostname = "test1";
    numOfCores = 2;
  };
}

In the above infrastructure model, the numOfCores attribute states that two services can be concurrently activated/deactivated on machine test1. If this attribute has been omitted, it defaults to 1.

Multi connection protocol support

By default, Disnix uses an SSH protocol wrapper to connect to the target machines. There is also an extension available, called DisnixWebService, that uses SOAP + MTOM instead.

In the old version, changing the connection protocol means that every target machine should be reached with it. In the new version, you can also specify the target property and client interface in the infrastructure model to support multi connection protocol deployments:

{
  test1 = {
    hostname = "test1";
    targetProperty = "hostname";
    clientInterface = "disnix-ssh-client";
  };
  
  test2 = {
    hostname = "test2";
    targetEPR = http://test2:8080/DisnixWebService/services/DisnixWebService;
    targetProperty = "targetEPR";
    clientInterface = "disnix-soap-client";
  };
}

The above infrastructure model states the following:

To connect to machine: test1, the hostname attribute contains the address and the disnix-ssh-client tool should be invoked to connect it.
To connect to machine: test2, the targetEPR attribute contains the address and the disnix-soap-client tool should be invoked to connect it.

NixOps integration

As described in my previous blog post, Disnix does service deployment and can integrate NixOS' infrastructure deployment features with an extension called DisnixOS.

DisnixOS can now also be used in conjunction with NixOps -- NixOps can be used to instantiate and deploy a network of virtual machines:

$ nixops create ./network.nix ./network-ec2.nix -d ec2
$ nixops deploy -d ec2

and DisnixOS can be used to deploy services to them:

$ export NIXOPS_DEPLOYMENT=ec2
$ disnixos-env -s services.nix -n network.nix -d distribution.nix \
    --use-nixops

Omitted features

There are also a couple of features described in some older blog posts, papers, and my PhD thesis, which have not become part of the new Disnix release.

Dynamic Disnix

This is an extended framework built on top of Disnix supporting self-adaptive redeployment. Although I promised to make it part of the new release a long time ago, it did not happen. However, I did update the prototype to work with the current Disnix implementation, but it still needs refinements, documentation and other small things to make it usable.

Brave people who are eager to try it can pull the Dynamic Disnix repository from my GitHub page.

Snapshot/restore features of Dysnomia

In a paper I wrote about Dysnomia I also proposed state snapshotting/restoring facilities. These have not become part of the released versions of Dysnomia and Disnix yet.

The approach I have described is useful in some scenarios but also has a couple of very big drawbacks. Moreover, it also significantly alters the behavior of Disnix. I need to find a way to properly integrate these features in such a way that they do not break the standard approach. Moreover, these techniques must be selectively applied as well.

Conclusion

In this blog post, I have announced the availability of the next release of Disnix. Perhaps I should give it the codename: "Disnix Forever!" or something :-). Also, the release date (Friday the 13th) seems to be appropriate.

Moreover, the previous release was considered an advanced prototype. Although I am using Disnix on a daily basis now to eat my own dogfood, and the toolset has become much more usable, I would not yet classify this release as something that is very mature yet.

Disnix can be obtained by installing NixOS, through Nixpkgs or from the Disnix release page.

I have also updated the Disnix homepage a bit, which should provide you more information.

Monday, March 9, 2015

On NixOps, Disnix, service deployment and infrastructure deployment

I have written many software deployment related blog posts covering tools that are part of the Nix project. However, there is one tool that I have not elaborated about so far, namely: NixOps, which has become quite popular these days.

Although NixOps is quite popular, its availability also leads to a bit of confusion with another tool. Some Nix users, in particular newbies, are suffering from this. The purpose of this blog post is to clear up that confusion.

NixOps

NixOps is something that is advertised as the "NixOS-based cloud deployment tool". It basically expands NixOS' (a Linux distribution built around the Nix package manager) approach of deploying a complete system configuration from a declarative specification to networks of machines and instantiates and provisions the required machines (e.g. in an IaaS cloud environment, such as Amazon EC2) automatically if requested.

A NixOps deployment process is driven by one or more network models that encapsulate multiple (partial) NixOS configurations. In a standard NixOps workflow, network models are typically split into a logical network model capturing settings that are machine independent and a physical network model capturing machine specific properties.

For example, the following code fragment is a logical network model consisting of three machines capturing the configuration properties of a Trac deployment, a web-based issue tracker system:

{
  network.description = "Trac deployment";

  storage =
    { pkgs, ... }:

    { services.nfs.server.enable = true;
      services.nfs.server.exports = ''
        /repos 192.168.1.0/255.255.255.0(rw,no_root_squash)
      '';
      services.nfs.server.createMountPoints = true;
    };

  postgresql =
    { pkgs, ... }:

    { services.postgresql.enable = true;
      services.postgresql.package = pkgs.postgresql;
      services.postgresql.enableTCPIP = true;
      services.postgresql.authentication = ''
        local all all                trust
        host  all all 127.0.0.1/32   trust
        host  all all ::1/128        trust
        host  all all 192.168.1.0/24 trust
      '';
    };

  webserver =
    { pkgs, ... }:

    { fileSystems = [
        { mountPoint = "/repos";
          device = "storage:/repos";
          fsType = "nfs";
        }
      ];
      services.httpd.enable = true;
      services.httpd.adminAddr = "root@localhost";
      services.httpd.extraSubservices = [
        { serviceType = "trac"; }
      ];
      environment.systemPackages = [
        pkgs.pythonPackages.trac
        pkgs.subversion
      ];
    };
}

The three machines in the above example have the following purpose:

The first machine, named storage, is responsible for storing Subversion source code repositories in the following folder: /repos and makes the corresponding folder available as a NFS mount.
The second machine, named postgresql, runs a PostgreSQL database server storing the tickets.
The third machine, named webserver, runs the Apache HTTP server hosting the Trac web application front-end. Moreover, it mounts the /repos folder as a network file system connecting to the storage machine so that the Trac web application can view the Subversion repositories stored inside it.

The above specification can be considered a logical network model, because it captures the configuration we want to deploy, without any machine specific characteristics. Regardless of what kind of machine we intend deploy, we want these services to be available.

However, a NixOS configuration cannot be deployed without any machine specific settings. These remaining settings can be specified by writing a second model, the physical network model, capturing these:

{
  storage = 
    { pkgs, ...}:
    
    { boot.loader.grub.version = 2;
      boot.loader.grub.device = "/dev/sda";

      fileSystems = [
        { mountPoint = "/";
          label = "root";
        }
      ];

      swapDevices = [
        { label = "swap"; }
      ];

      networking.hostName = "storage";
    };

  postgresql = ...

  webserver = ...
}

The above partial network model specifies the following physical characteristics for the storage machine:

GRUB version 2 should be used as bootloader and should installed on the MBR of the hard drive partition: /dev/sda.
The hard drive partition with label: root should be mounted as root partition.
The hard drive partition with label: swap should be mounted as swap partition.
The hostname of the system should be: 'storage'

By invoking NixOps with the two network models shown earlier as parameters, we can create a NixOps deployment -- an environment containing a set of machines that belong together:

$ nixops create ./network-logical.nix ./network-physical.nix -d test

The above command creates a deployment named: test. We can run the following command to actually deploy the system configurations:

$ nixops deploy -d test

What the above command does is invoking the Nix package manager to build all the machine configurations, then it transfers their corresponding package closures to the target machines and finally activates the NixOS configurations. The end result is a collection of machines running the new configuration, if all previous steps have succeeded.

If we adapt any of the network models, and run the deploy command again, the system is upgraded. In case of an upgrade, only the packages that have been changed are built and transferred, making this phase as efficient as possible.

We can also replace the physical network model shown earlier with the following model:

{
  storage = {
    deployment.targetEnv = "virtualbox";
    deployment.virtualbox.memorySize = 1024;
  };

  postgresql = ...

  webserver = ...
}

The above physical network configuration states that the storage machine is a VirtualBox Virtual Machine (VM) requiring 1024 MiB of RAM.

When we instantiate a new deployment with the above physical network model and deploy it:

$ nixops create ./network-logical.nix ./network-vbox.nix -d vbox
$ nixops deploy -d vbox

NixOps does an extra step before doing the actual deployment of the system configurations -- it first instantiates the VMs by consulting VirtualBox and populates them with a basic NixOS disk image.

Similarly, we can also create a physical network model like this:

let
  region = "us-east-1";
  accessKeyId = "ABCD..."; # symbolic name looked up in ~/.ec2-keys

  ec2 =
    { resources, ... }:
    
    { deployment.targetEnv = "ec2";
      deployment.ec2.accessKeyId = accessKeyId;
      deployment.ec2.region = region;
      deployment.ec2.instanceType = "m1.medium";
      deployment.ec2.keyPair = resources.ec2KeyPairs.my-key-pair;
      deployment.ec2.securityGroups = [ "my-security-group" ];
    };
in
{
  storage = ec2;

  postgresql = ec2;

  webserver = ec2;

  resources.ec2KeyPairs.my-key-pair = {
    inherit region accessKeyId;
  };
}

The above physical network configuration states that the storage machine is a virtual machine residing in the Amazon EC2 cloud.

Running the following commands:

$ nixops create ./network-logical.nix ./network-ec2.nix -d ec2
$ nixops deploy -d ec2

Automatically instantiates the virtual machines in EC2, populates them with basic NixOS AMI images and finally deploys the machines to run our desired Trac deployment.

(In order to make EC2 deployment work, you need to create the security group (e.g. my-security-group) through the Amazon EC2 console first and you must set the AWS_SECRET_ACCESS_KEY environment variable to contain the secret access key that you actually need to connect to the Amazon services).

Besides physical machines, VirtualBox, and Amazon EC2, NixOps also supports the Google Computing Engine (GCE) and Hetzner. Moreover, preliminary Azure support is also available in the development version of NixOps.

With NixOps you can also do multi-cloud deployment -- it is not required to deploy all VMs in the same IaaS environment. For example, you could also deploy the first machine to Amazon EC2, the second to Hetzner and the third to a physical machine.

In addition to deploying system configurations, NixOps can be used to perform many other kinds of system administration tasks that work on machine level.

Disnix

Readers who happen to know me a bit, may probably notice that many NixOps features are quite similar to things I did in the past -- while I was working for Delft University of Technology as a PhD student, I was investigating distributed software deployment techniques and developed a tool named: Disnix that also performs distributed deployment tasks using the Nix package manager as underlying technology.

I have received quite a few questions from people asking me things such as: "What is the difference between Disnix and NixOps?", "Is NixOps the successor of/a replacement for Disnix?", "Are Disnix and NixOps in competition with each other?"

The short answer is: while both tools perform distributed deployment tasks and use the Nix package manager as underlying (local) deployment technology, they are designed for different purposes and address different concerns. Furthermore, they can also be effectively used together to automate deployment processes for certain kinds of systems.

In the next chapters I will try to clarify the differences and explain how they can be used together.

Infrastructure and service deployment

NixOps does something that I call infrastructure deployment -- it manages configurations that work on machine level and deploys entire system configurations as a whole.

What Disnix does is service deployment -- Disnix is primarily designed for deploying service-oriented systems. What "service-oriented system" is exactly supposed to mean has always been an open debate, but a definition I have seen in the literature is "systems composed of platform-independent entities that can be loosely coupled and automatically discovered" etc.

Disnix expects a system to be decomposed into distributable units (called services in Disnix terminology) that can be built and deployed independently to machines in a network. These components can be web services (systems composed of web services typically qualify themselves as service-oriented systems), but this is not a strict requirement. Services in a Disnix-context can be any unit that can be deployed independently, such as web services, UNIX processes, web applications, and databases. Even entire NixOS configurations can be considered a "service" by Disnix, since they are also a unit of deployment, although they are quite big.

Whereas NixOps builds, transfers and activates entire Linux system configurations, Disnix builds, transfers and activates individual services on machines in a network and manages/controls them and their dependencies individually. Moreover, the target machines to which Disnix deploys are neither required to run NixOS nor Linux. They can run any operating system and system distribution capable of running the Nix package manager.

Being able to deploy services to heterogeneous networks of machines is useful for service-oriented systems. Although services might manifest themselves as platform independent entities (e.g. because of their interfaces), they still have an underlying implementation that might be bound to technology that only works on a certain operating system. Furthermore, you might also want to have the ability to experiment with the portability of certain services among operating systems, or effectively use a heterogeneous network of operating systems to use their unique selling points effectively for the appropriate services (e.g. a Linux, Windows, OpenBSD hybrid).

For example, although I have mainly used Disnix to deploy services to Linux machines, I also once did an experiment with deploying services implemented with .NET technology to Windows machines running Windows specific system services (e.g. IIS and SQL server), because our industry partner in our research project was interested in this.

To be able to do service deployment with Disnix one important requirement must be met -- Disnix expects preconfigured machines to be present running a so-called "Disnix service" providing remote access to deployment operations and some other system services. For example, to allow Disnix to deploy Java web applications, a predeployed Servlet container, such as Apache Tomcat, must already be present on the target machine. Also other container services, such as a DBMS, may be required.

Disnix does not automate the deployment of machine configurations (that include the Disnix service and containers) requiring people to deploy a network of machines by other means first and writing an infrastructure model that reflects the machine configurations accordingly.

Combining service and infrastructure deployment

To be able to deploy a service-oriented system into a network of machines using Disnix, we must first deploy a collection of machines running some required system services first. In other words: infrastructure deployment is a prerequisite for doing service deployment.

Currently, there are two Disnix extensions that can be used to integrate service deployment and infrastructure deployment:

DisnixOS is an extension that complements Disnix with NixOS' deployment features to do infrastructure deployment. With this extension you can do tasks such as deploying a network of machines with NixOps first and then do service deployment inside the deployed network with Disnix.

Moreover, with DisnixOS you can also spawn a network of NixOS VMs using the NixOS test driver and run automated tests inside them.

A major difference from a user perspective between Disnix and DisnixOS is that the latter works with network models (i.e. networked NixOS configurations used by NixOps and the NixOS test driver) instead of infrastructure models and does the conversion between these models automatically.

A drawback of DisnixOS is that service deployment is effectively tied to NixOS, which is a Linux distribution. DisnixOS is not very helpful if a service-oriented system must be deployed in a heterogeneous network running multiple kinds of operating systems.
Dynamic Disnix. With this extension, each machine in the network is supposed to publish its configuration and a discovery service running on the coordinator machine generates an infrastructure model from the supplied settings. For each event in the network, e.g. a crashing machine, a newly added machine or a machine upgrade, a new infrastructure model is generated that can be used to trigger a redeployment.

The Dynamic Disnix approach is more powerful and not tied to NixOS specifically. Any infrastructure deployment approach (e.g. Norton Ghost for Windows machines) that includes the deployment of the Disnix service and container services can be used. Unfortunately, the Dynamic Disnix framework is still a rough prototype and needs to become more mature.

Is service deployment useful?

Some people have asked me: "Is service deployment really needed?", since it also possible to deploy services as part of a machine's configuration.

In my opinion it depends on the kinds of systems that you want to deploy and what problems you want to solve. For some kinds of distributed systems, Disnix is not really helpful. For example, if you want to deploy a cluster of DBMSes that are specifically tuned for specific underlying hardware, you cannot really make a decomposition into "distributable units" that can be deployed independently. Same thing with filesystem services, as shown in the Trac example -- doing an NFS mount is a deployment operation, but not a really an independent unit of deployment.

As a sidenote: That does not imply that you cannot do such things with Disnix. With Disnix you could still encapsulate an entire (or partial) machine specific configuration as a service and deploy that, or doing a network mount by deploying a script performing the mount, but that defeats its purpose.

At the same time, service-oriented systems can also be deployed on infrastructure level, but this sometimes leads to a number of inconveniences and issues. Let me illustrate that by giving an example:

The above picture reflects the architecture of one of the toy systems (Staff Tracker Java version) I have created for Disnix for demonstration purposes. The architecture consists of three layers:

Each component in the upper layer is a MySQL database storing certain kinds of information.
The middle layer encapsulates web services (implemented as Java web applications) responsible for retrieving and modifying data stored in the databases. An exception is the GeolocationService, which retrieves data by other means.
The bottom layer contains a Java web application that is consulted by end-users.

Each component in the above architecture is a distributable service and each arrow denotes dependency relationships between them which manifest themselves as HTTP and TCP connections. Because components are distributable, we could, for example, deploy all of them to one single machine, but we can also run each of them on a separate machine.

If we want to deploy the example system on infrastructure level, we may end up composing a networked machine configuration that looks as follows:

The above picture shows a deployment scenario in which the services are divided over two machines a network:

The MySQL databases are hosted inside a MySQL DBMS running on the first machine.
The web application front-end and one of the web services granting access to the databases are deployed inside the Apache Tomcat Servlet container running on the first machine.
The remaining web services are deployed inside an Apache Tomcat container running on the second machine.

When we use NixOps to deploy the above machine configurations, then the entire machine configurations are deployed and activated as a whole, which have a number of implications. For example, the containers have indirectly become dependent on each other as can be seen in the picture below in which I have translated the dependencies from service level to container level:

In principle, Apache Tomcat does not depend on MySQL, so under normal circumstances, these containers can be activated in any order. However, because we host a Java web application that requires a database, suddenly the order in which these services are activated does matter. If we would activate them in the wrong order then the web service (and also indirectly the web application front-end) will not work. (In extreme cases: if a system has been poorly developed, it may even crash and need to be manually reactivated again!)

Moreover, there is another implication -- the web application front-end also depends on services that are deployed to the second machine, and two of these services require access to databases deployed to the first machine. On container level, you could clearly see that this situation leads to two machines having a cyclic dependency on each other. That means that you cannot solve activation problems by translating service-level dependencies to machine-level dependencies.

As a matter of fact: NixOps allows cyclic dependencies between machines and activates their configurations in arbitrary order and is thus incapable of dealing with temporarily or permanent inconsistency issues (because of broken dependencies) while deploying a system as shown in the example.

Another practical issue with deploying such a system on infrastructure level is that it is tedious to do redeployment, for example when a requirement changes. You need to adapt machine configurations as a whole -- you cannot easily specify a different distribution scenario for services to machines.

As a final note, in some organisations (including a company that I have worked for in the past) it is common practice that infrastructure and service deployment are separated. For example, one department is responsible for providing machines and system services and another department (typically a development group) responsible for building the services and deploying them to the provided machines.

Conclusion

In this blog post, I have described NixOps and elaborated on the differences between NixOps and Disnix -- the former tool does infrastructure deployment, while the latter does service deployment. Infrastructure deployment is a prerequisite of doing service deployment and both tools can actually be combined to automate both concerns.

Service deployment is particularly useful for distributed systems that can be decomposed into "distributable units" (such as service-oriented systems), but not all kinds of distributed systems.

Moreover, NixOps is a tool that has been specifically designed to deploy NixOS configurations, while Disnix can deploy services to machines running any operating system capable of running the Nix package manager.

Finally, I have been trying to answer three questions, which I mentioned somewhere in the middle of this blog post. There is another question I have intentionally avoided that obviously needs an answer as well! I will elaborate more on this in the next blog post.

References

More information about Disnix, service and infrastructure deployment, and in particular: integrating deployment concerns can be found in my PhD thesis.

Interestingly enough, during my PhD thesis defence there was also a question about the difference between service and infrastructure deployment. This blog post is a more elaborate version of the answer I gave earlier. :-)