Wednesday, July 29, 2015

Assigning port numbers to (micro)services in Disnix deployment models

I have been working on many Disnix-related aspects for the last few months. For example, in my last blog post I announced a new Disnix release supporting experimental state management.

Although I am quite happy with the most recent feature addition, another major concern that the basic Disnix toolset does not solve is coping with the dynamism of the services and the environment in which a system has been deployed.

Static modeling of services and the environment have the following consequences:

  • We must write an infrastructure model reflecting all relevant properties of all target machines. Although writing such a configuration file for a new environment is doable, it is quite tedious and error-prone to keep it up to date and in sync with the machines' actual configurations -- whenever a machine's property or the network changes, the infrastructure model must be updated accordingly.

    (As a sidenote: when using the DisnixOS extension, a NixOS network model is used instead of an infrastructure model, from which the machines' configurations can be automatically deployed, making the consistency problem obsolete. However, the problem persists if we need to deploy to a network of non-NixOS machines.)
  • We must manually specify the distribution of services to machines. This problem typically becomes complicated if services have specific technical requirements on the host on which they need to run (e.g. operating system, CPU architecture, infrastructure components such as an application server).

    Moreover, a distribution could also be subject to non-functional requirements. For example, a service providing access to privacy-sensitive data should not be deployed to a machine that is publicly accessible from the internet.

    Because requirements may be complicated, it is typically costly to repeat the deployment planning process whenever the network configuration changes, especially if the process is not automated.

To cope with the above listed issues, I have developed a prototype extension called Dynamic Disnix and wrote a paper about it. The extension toolset provides the following:

  • A discovery service that captures the properties of the machines in the network from which an infrastructure model is generated.
  • A framework allowing someone to automate deployment planning processes using a couple of algorithms described in the literature.

Besides the dynamism of the infrastructure and distribution models, I also observed that the services model (capturing the components of which a system consists) may be too static in certain kinds of situations.

Microservices


Lately, I have noticed that a new paradigm named microservice architectures is gaining a lot of popularity. In many ways this new trend reminds me of the early service-oriented architecture (SOA) days -- everybody was talking about it and had success stories, but nobody had a full understanding of it, nor an idea of what it was exactly supposed to mean.

However, if I restrict myself to some of their practical properties, microservices (like "ordinary" services in a SOA context) are software components, and one important trait (according to Clemens Szyperski's Component Software book) is that a software component:

is a unit of independent deployment

Another important property of microservices is that they interact with each other by sending messages over the HTTP protocol. In practice, many people accomplish this by running processes with an embedded HTTP server (as opposed to using application servers or external web servers).

Deploying Microservices with Disnix


Although Disnix was originally developed to deploy a "traditional" service-oriented system case study (consisting of "real" web services using SOAP as a communication protocol), it has been made flexible enough to deploy all kinds of components. Likewise, Disnix can also deploy components that qualify themselves as microservices.

However, when deploying microservices (running embedded HTTP servers) there is one practical issue -- every microservice must listen on its own unique TCP port on a machine. Currently, meeting this requirement is completely the responsibility of the person composing the Disnix deployment models.
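
To illustrate what this responsibility amounts to, the following hypothetical services model fragment (following the same conventions as the full example shown later in this post) hard-codes a port for a single service:

roomservice = rec {
  name = "roomservice";
  pkg = customPkgs.roomservicewrapper { inherit port; };
  dependsOn = {
    inherit rooms;
  };
  type = "process";
  port = 8001; # manually chosen; must not clash with any other service on the same machine
};

Every additional service needs a similarly hand-picked value, and every change to the distribution of services may invalidate these choices.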

In some cases, this problem is more complicated than expected. For example, manually assigning a unique TCP port to every service for the initial deployment is straightforward, but it may also be desired to move a service from one machine to another. It could happen that a previously assigned TCP port conflicts with another service after the move, breaking the deployment of the system.

The port assignment problem


So far, I take the following aspects into account when assigning ports (a minimal sketch of such a selection policy is shown after the list):

  • Each service must listen on a port that is unique to the machine the service runs on. In some cases, it may also be desirable to assign a port that is unique to the entire network (instead of a single machine) so that it can be uniformly accessed regardless of its location.
  • The assigned ports must be within a certain range so that (for example) they do not collide with system services.
  • Once a port number has been assigned to a service, it must remain reserved until it gets undeployed.

    The alternative would be to reassign all port numbers to all services for each change in the network, but that can be quite costly in case of an upgrade. For example, if we upgrade a network running 100 microservices, all 100 of them may need to be deactivated and activated to make them listen on their newly assigned ports.
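
The following stand-alone Nix sketch is not taken from the Dynamic Disnix code base and only describes one possible selection policy, but it illustrates the first two aspects: it picks the lowest port within a given range that has not been reserved yet:

let
  # Hypothetical helper: pick the lowest port in the range
  # [minPort..maxPort] that does not occur in reservedPorts.
  findFreePort = { minPort, maxPort, reservedPorts }:
    let
      candidates = builtins.filter
        (port: !(builtins.elem port reservedPorts))
        (builtins.genList (i: minPort + i) (maxPort - minPort + 1));
    in
    if candidates == []
    then throw "No free port available in the given range"
    else builtins.head candidates;
in
findFreePort {
  minPort = 8000;
  maxPort = 9000;
  reservedPorts = [ 8000 8001 ]; # ports already taken on this machine
}
# evaluates to 8002

The actual dydisnix-port-assign tool (described below) additionally records the port reservations and the last assigned port per range in a generated configuration file, which covers the third aspect.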

Dynamically configuring ports in Disnix models


Since it is quite tedious and error-prone to maintain port assignments in Disnix models, I have developed a utility to automate the process. To dynamically assign ports to services, they must be annotated with the portAssign property in the services model (the property name can be changed to any other name through a command-line parameter):

{distribution, system, pkgs}:

let
  portsConfiguration = if builtins.pathExists ./ports.nix
    then import ./ports.nix else {};
  ...
in
rec {
  roomservice = rec {
    name = "roomservice";
    pkg = customPkgs.roomservicewrapper { inherit port; };
    dependsOn = {
      inherit rooms;
    };
    type = "process";
    portAssign = "private";
    port = portsConfiguration.ports.roomservice or 0;
  };

  ...

  stafftracker = rec {
    name = "stafftracker";
    pkg = customPkgs.stafftrackerwrapper { inherit port; };
    dependsOn = {
      inherit roomservice staffservice zipcodeservice;
    };
    type = "process";
    portAssign = "shared";
    port = portsConfiguration.ports.stafftracker or 0;
    baseURL = "/";
  };
}

In the above example, I have annotated the roomservice component with a private port assignment, meaning that we want to assign a TCP port that is unique to the machine, and the stafftracker component with a shared port assignment, meaning that we want to assign a TCP port that is unique to the entire network.
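
The assignment also depends on where each service is deployed, which is captured in the distribution model. For illustration, such a model could look as follows (the machine names test1 and test2 and the mapping itself are hypothetical, although test2 also appears in the generated configuration shown later):

{infrastructure}:

{
  roomservice = [ infrastructure.test2 ];
  stafftracker = [ infrastructure.test1 ];
  ...
}

With this mapping, roomservice only needs a port that is unique on test2, whereas stafftracker needs a port that no service on any machine in the network uses.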

By running the following command we can assign port numbers:

$ dydisnix-port-assign -s services.nix -i infrastructure.nix \
    -d distribution.nix > ports.nix

The above command generates a port assignment configuration Nix expression (named: ports.nix) that contains port reservations for each service and port assignment configurations for the network and each individual machine:

{
  ports = {
    roomservice = 8001;
    ...
    zipcodeservice = 3003;
  };
  portConfiguration = {
    globalConfig = {
      lastPort = 3003;
      minPort = 3000;
      maxPort = 4000;
      servicesToPorts = {
        stafftracker = 3002;
      };
    };
    targetConfigs = {
      test2 = {
        lastPort = 8001;
        minPort = 8000;
        maxPort = 9000;
        servicesToPorts = {
          roomservice = 8001;
        };
      };
    };
  };
}

The above configuration attribute set contains the following properties:

  • The ports attribute contains the actual port numbers that have been assigned to each service. The services defined in the services model (shown earlier) refer to the port values defined here.
  • The portConfiguration attribute contains port configuration settings for the network and each target machine. The globalConfig attribute defines a TCP port range with ports that must be unique to the network. Besides the port range it also stores the last assigned TCP port number and all global port reservations.
  • The targetConfigs attribute contains port configuration settings and reservations for each target machine.

We can also run the port assignment utility again with an existing port assignment configuration as a parameter:

$ dydisnix-port-assign -s services.nix -i infrastructure.nix \
    -d distribution.nix -p ports.nix > ports2.nix

The above command-line invocation reassigns TCP ports, taking the previous port reservations into account so that these will be reused where possible (e.g. only new services get a port number assigned). Furthermore, it also clears all port reservations of the services that have been undeployed. The new port assignment configuration is stored in a file called ports2.nix.
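
Purely as an illustration (the values below are hypothetical and not actual tool output), if zipcodeservice had been undeployed in the meantime, the regenerated configuration would keep the existing reservations and drop the cleared one, along the following lines:

{
  ports = {
    roomservice = 8001; # reservation reused from ports.nix
    ...
    stafftracker = 3002; # reservation reused from ports.nix
    # zipcodeservice no longer appears: its reservation has been cleared
  };
  portConfiguration = {
    ...
  };
}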

Conclusion


In this blog post, I have identified another deployment planning problem that manifests itself when deploying microservices that all have to listen on their own unique TCP port, and I have developed a utility to automate the port assignment process.

Besides assigning port numbers, there are many other kinds of problems that need a solution while deploying microservices. For example, you might also want to restrict their privileges (e.g. by running all of them as separate unprivileged users). It is also possible to take care of that with Dysnomia.

Availability


The dydisnix-port-assign utility is part of the Dynamic Disnix toolset that can be obtained from my GitHub page. Unfortunately, the Dynamic Disnix toolset is still a prototype with no end-user documentation or a release, so you have to be brave to use it.

Moreover, I have created yet another Disnix example package (a Node.js variant of the ridiculous StaffTracker example) to demonstrate how "microservices" can be deployed. This particular variant uses Node.js as its implementation platform and exposes the data sets through REST APIs. All components are microservices using Node.js' embedded HTTP server, each listening on its own unique TCP port.

I have also modified the TCP proxy example to use port assignment configurations generated by the tool described in this blog post.

Wednesday, July 8, 2015

Deploying state with Disnix


A couple of months ago, I announced a new Disnix release after a long period of little development activity.

As I have explained earlier, Disnix's main purpose is to automatically deploy service-oriented systems into heterogeneous networks of machines running various kinds of operating systems.

In addition to automating deployment, it has a couple of interesting non-functional properties as well. For example, it supports reliable deployment, because components implementing services are stored alongside existing versions and older versions are never automatically removed. As a result, we can always roll back to the previous configuration in case of a failure.

However, there is one major unaddressed concern when using Disnix to deploy a service-oriented system. Like the Nix package manager -- which serves as the basis of Disnix -- Disnix does not manage state.

The absence of state management has a number of implications. For example, when deploying a database, it gets created on first startup, often with a schema and initial data set. However, the structure and contents of a database typically evolves over time. When updating a deployment configuration that (for example) moves a database from one machine to another, the changes that have been made since its initial deployment are not migrated.

So far, state management in combination with Disnix has always been a problem that must be solved manually or by using an external solution. For a single machine, manual state management is often tedious but still doable. For large networks of machines, however, it may become a problem that is too big to handle.

A few years ago, I rushed out a prototype tool called Dysnomia to address state management problems in conjunction with Disnix and wrote a research paper about it. In the last few months, I have integrated the majority of the concepts of this prototype into the master versions of Dysnomia and Disnix.

Executing state management activities


When deploying a service-oriented system with Disnix, a number of deployment activities are executed. For the build and distribution activities, Disnix consults the Nix package manager.

After all services have been transferred, Disnix activates them and deactivates the ones that have become obsolete. Disnix consults Dysnomia to execute these activities through a plugin system that delegates the execution of these steps to an appropriate module for a given service type, such as a process, source code repository or a database.

Deployment activities carried out by Dysnomia require two mandatory parameters. The first parameter is a container specification capturing the properties of a container that hosts one or more mutable components. For example, a MySQL DBMS instance can be specified as follows:

type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret

The above specification states that we have a container of type mysql-database that can be reached using the above listed credentials. The type attribute allows Dysnomia to invoke the module that executes the required deployment steps for MySQL.

The second parameter refers to a logical representation of the initial state of a mutable component. For example, a MySQL database is represented as a script that generates its schema:

create table author
( AUTHOR_ID  INTEGER       NOT NULL,
  FirstName  VARCHAR(255)  NOT NULL,
  LastName   VARCHAR(255)  NOT NULL,
  PRIMARY KEY(AUTHOR_ID)
);

create table books
( ISBN       VARCHAR(255)  NOT NULL,
  Title      VARCHAR(255)  NOT NULL,
  AUTHOR_ID  INTEGER       NOT NULL,
  PRIMARY KEY(ISBN),
  FOREIGN KEY(AUTHOR_ID) references author(AUTHOR_ID)
    on update cascade on delete cascade
);

A MySQL database can be activated in a MySQL DBMS by running the following command-line instruction with the two configuration files shown earlier as parameters:

$ dysnomia --operation activate \
  --component ~/testdb \
  --container ~/mysql-production

The above command first checks whether a MySQL database named testdb exists. If it does not exist, it gets created and the initial schema is imported. If a database with the given name already exists, it does nothing.

With the latest Dysnomia, it is also possible to run snapshot operations:

$ dysnomia --operation snapshot \
  --component ~/testdb \
  --container ~/mysql-production

The above command invokes the mysqldump utility to take a snapshot of the testdb database in a portable and consistent manner and stores the output in a so-called Dysnomia snapshot store.

When running the following command-line instruction, the contents of the snapshot store are displayed for the MySQL container and testdb component:

$ dysnomia-snapshots --query-all --container mysql-database --component testdb
mysql-production/testdb/9b0c3562b57dafd00e480c6b3a67d29146179775b67dfff5aa7a138b2699b241
mysql-production/testdb/1df326254d596dd31d9d9db30ea178d05eb220ae51d093a2cbffeaa13f45b21c
mysql-production/testdb/330232eda02b77c3629a4623b498855c168986e0a214ec44f38e7e0447a3f7ef

As may be observed, the dysnomia-snapshots utility outputs three relative paths that correspond to three snapshots. The paths reflect a number of properties, such as the container name and the component name. The last path component is a SHA256 hash reflecting the snapshot's contents (computed from the actual dump).

Each container type follows its own naming convention to reflect its contents. While the MySQL module and most of the other Dysnomia modules use output hashes, different naming conventions are used as well. For example, the Subversion module uses the revision id of the repository.

A naming convention using an attribute that reflects the snapshot's contents has all kinds of benefits. For example, if the MySQL database does not change and we run the snapshot operation again, it discovers that a snapshot with the same output hash already exists, preventing it from storing the same snapshot twice and improving storage efficiency.

The absolute versions of the snapshot paths can be retrieved with the following command:

$ dysnomia-snapshots --resolve mysql-database/testdb/330232eda02b77c3629a4623b498855c...
/var/state/dysnomia/snapshots/mysql-production/testdb/330232eda02b77c3629a4623b498855...

Besides snapshotting, it is also possible to restore state with Dysnomia:

$ dysnomia --operation restore \
  --component ~/testdb \
  --container ~/mysql-production

The above command restores the latest snapshot generation. If no snapshots exist in the store, it does nothing.

Finally, it is also possible to clean things up. Similar to the Nix package manager, old components are never deleted automatically, but must be explicitly garbage collected. For example, deactivating the MySQL database can be done as follows:

$ dysnomia --operation deactivate \
  --component ~/testdb \
  --container ~/mysql-production

The above command does not delete the MySQL database. Instead, it simply marks it as garbage, but otherwise keeps it. Actually deleting the database can be done by invoking the garbage collect operation:

$ dysnomia --operation collect-garbage \
  --component ~/testdb \
  --container ~/mysql-production

The above command first checks whether the database has been marked as garbage. If this is the case (because it has been deactivated) it is dropped. Otherwise, this command does nothing (because we do not want to delete stuff that is actually in use).

Besides the physical state of components, all generations of snapshots in the store are also kept by default. They can be removed by running the snapshot garbage collector:

$ dysnomia-snapshots --gc --keep 3

The above command states that all but the last 3 snapshot generations should be removed from the snapshot store.

Managing state of service-oriented systems


With the new snapshotting facilities provided by Dysnomia, we have extended Disnix to support state deployment of service-oriented systems.

By default, the new version of Disnix does not manage state and its behaviour remains exactly the same as that of the previous version, i.e. it only manages the static parts of the system. To allow Disnix to manage the state of services, they must be explicitly annotated as such in the services model:

staff = {
  name = "staff";
  pkg = customPkgs.staff;
  dependsOn = {};
  type = "mysql-database";
  deployState = true;
};

Adding the deployState attribute to a service and setting it to true causes Disnix to manage its state as well. For example, we may want to change the target machine of the database in the distribution model.
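
For illustration (the machine names test1 and test2 are hypothetical), such a change could look as follows:

{infrastructure}:

{
  staff = [ infrastructure.test2 ]; # previously: [ infrastructure.test1 ]
  ...
}

When we make such a change and run the following command: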

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix executes the data migration phase after the configuration has been successfully activated. In this phase, Disnix snapshots the state of the annotated services on the target machines, transfers the snapshots to the new targets (through the coordinator machine), and finally restores their state.

In addition to data migration, Disnix can also be used as a backup tool. Running the following command:

$ disnix-snapshot

The above command captures the state of all annotated services in the configuration that has been previously deployed and transfers their snapshots to the coordinator machine's snapshot store.

Likewise, the snapshots can be restored as follows:

$ disnix-restore

By default, the above command only restores the state of the services that are in the last configuration, but not in the configuration before. However, it may also be desirable to force the state of all annotated services in the current configuration to be restored. This can be done as follows:

$ disnix-restore --no-upgrade

Finally, the snapshots that are taken on the target machines are not deleted automatically. Disnix can also automatically clean the snapshot stores of a network of machines:

$ disnix-clean-snapshots --keep 3 infrastructure.nix

The above command deletes all but the last three snapshot generations from all machines defined in the infrastructure model.

Discussion


The extended implementations of Dysnomia and Disnix implement the majority of concepts described in my earlier blog post and the corresponding paper. However, there are a number of things that are different:

  • The prototype implementation stores snapshots in the /dysnomia folder (analogous to the Nix store that resides in /nix/store), which is a non-FHS-compliant directory. Nix has a number of very good reasons to deviate from the FHS and requires packages to be addressed by their absolute paths across machines so that they can be uniformly accessed by a dynamic linker.

    However, such a level of strictness is not required for addressing snapshots. In the current implementation, snapshots are stored in /var/state/dysnomia which is FHS-compliant. Furthermore, snapshots are identified by their relative paths to the store folder. The snapshot store's location can be changed by setting the DYSNOMIA_STATEDIR environment variable, allowing someone to have multiple snapshot stores.
  • In the prototype, the semantics of the deactivate operation also imply deleting the state of a mutable component in a container. As this is a dangerous and destructive operation, the current implementation separates the actual delete operation into a garbage collect operation that must be invoked explicitly.
  • In both the prototype and the current implementation, a Dysnomia plugin can choose its own naming convention to identify snapshots. In the prototype, the naming must reflect both the contents and the order in which the snapshots have been taken. As a general fallback, I proposed using timestamps.

    However, timestamps are unreliable in a distributed setting because the machines' clocks may not be in sync. In the current implementation, I use output hashes as a general fallback. As hashes cannot reflect the order in their names, Dysnomia provides a generations folder containing symlinks to snapshots whose names reflect the order in which they have been taken.

The paper also describes two concepts that are still unimplemented in the current master version:

  • The incremental snapshot operation is unimplemented. Although this feature may sound attractive, I could only implement this properly for Subversion repositories and MySQL databases with binary logging enabled.
  • To upgrade a service-oriented system (that includes moving state) atomically, access to the system in the transition and data migration phases must be blocked/queued. However, when moving large data sets, this time window could be incredibly big.

    As an optimization, I proposed an improved upgrade process in which incremental snapshots are transferred inside the locking time window, while full snapshots are transferred before the locking starts. Although it may sound conceptually nice, it is difficult to properly apply it in practice. I may still integrate it some day, but currently I don't need it. :)

Finally, there are some practical notes when using Dysnomia's state management facilities. Its biggest drawback is that the way state is managed (by consulting tools that store dumps on the filesystem) is typically expensive in terms of time (because it may take a long time to write a dump to disk) and storage. For very large databases, the costs may actually be too high.

As described in the previous blog post and the corresponding paper, there are alternative ways of doing state management:

  • Filesystem-level snapshotting is typically faster since files only need to be copied. However, its biggest drawback is that physical state may be inconsistent (because of unfinished write operations) and non-portable. Moreover, it may be difficult to manage individual chunks of state. NixOps, for example, supports partition-level state management of EBS volumes.
  • Database replication engines can also typically capture and transfer state much more efficiently.

Because Dysnomia's way of managing state has some expensive drawbacks, it has not been enabled by default in Disnix. Moreover, this was also the main reason why I did not integrate the features of the Dysnomia prototype sooner.

The reason why I have proceeded anyway is that I have to manage a big environment of small databases, whose sizes are only several megabytes each. For such an environment, Dysnomia's snapshotting facilities work fine.

Availability


The state management facilities described in this blog post are part of Dysnomia and Disnix version 0.4. I also want to announce their immediate availability! Visit the Disnix homepage for more information!

As with the previous release, Disnix still remains a tool that should be considered an advanced prototype, despite the fact that I am using it on a daily basis to eat my own dogfood. :)