Sander van der Burg's blog: Reconstructing Disnix deployment configurations

In two earlier blog posts, I have described Dynamic Disnix, an experimental framework enabling self-adaptive redeployment on top of Disnix. The purpose of this framework is to redeploy a service-oriented system whenever the conditions of the environment change, so that the system can still meet its functional and non-functional requirements.

An important category of events that change the environment are machines that crash and disappear from the network -- when a disappearing machine used to host a crucial service, a system can no longer meet its functional requirements. Fortunately, Dynamic Disnix is capable of automatically responding to such events by deploying the missing components elsewhere.

Although Dynamic Disnix supports the recovery of missing services, there is one particular kind of failure I did not take into account. In addition to potentially crashing target machines that host the services of which a service-oriented systems consist, the coordinator machine that initiates the deployment process and stores the deployment state could also disappear. When the deployment state gets lost, it is no longer possible to reliably update the system.

In this blog post, I will describe a new addition to the Disnix toolset that can be used to cope with these kinds of failures by reconstructing a coordinator machine's deployment configuration from the meta data stored on the target machines.

The Disnix upgrade workflow

As explained in earlier blog posts, Disnix requires three kinds of deployment models to carry out a deployment process: a services model capturing the components of which a system consists, an infrastructure model describing the available target machines and their properties, and a distribution model mapping services in the services model to target machines in the infrastructure model. By writing instances of these three models and running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will carry out all activities necessary to deploy the system: building the services and its intra-dependencies from source code, distributing the services and its intra-dependencies, and activating all services in the right order.

When changing any of the models and running the same command-line instruction again, Disnix attempts to upgrade the system by only rebuilding the aspects that have changed, and only deactivating the obsolete services and activating new services.

Disnix (as well as other Nix-related tools) attempt to optimize a redeployment process by only executing the steps that are required to reach a new deployment state. In Disnix, the building and distribution steps are optimized due to the fact that every package is stored in isolation the Nix store in which each package has a unique filename with a hash prefix, such as:

/nix/store/acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l-firefox-48.0.2

As explained in a number of earlier blog posts, the hash prefix (acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l...) is derived from all inputs used to build the package including its source code, build script, and libraries that it links to. That, for example, means that if we upgrade a system and nothing to the any of inputs of Firefox changes, we get an identical hash and if such a package build already exists, we do not have to build or transfer the package from an external site again.

The building step in Disnix produces a so-called low-level manifest file that is used by tools executing the remaining deployment activities:

<?xml version="1.0"?>
<manifest version="1">
  <distribution>
    <mapping>
      <profile>/nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1</profile>
      <target>test1</target>
    </mapping>
  </distribution>
  <activation>
    <mapping>
      <dependsOn>
        <dependency>
          <target>test1</target>
          <container>process</container>
          <key>d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d</key>
        </dependency>
      </dependsOn>
      <name>nginx</name>
      <service>/nix/store/aa5hn5n1pg2qbb7i8skr6vkgpnsjhlns-nginx-wrapper</service>
      <target>test1</target>
      <container>wrapper</container>
      <type>wrapper</type>
      <key>da8c3879ccf1b0ae34a952f36b0630d47211d7f9d185a8f2362fa001652a9753</key>
    </mapping>
  </activation>
  <targets>
    <target>
      <properties>
        <hostname>test1</hostname>
      </properties>
      <containers>
        <mongo-database/>
        <process/>
        <wrapper/>
      </containers>
      <system>x86_64-linux</system>
      <numOfCores>1</numOfCores>
      <clientInterface>disnix-ssh-client</clientInterface>
      <targetProperty>hostname</targetProperty>
    </target>
  </targets>
</manifest>

The above manifest file contains the following kinds of information:

The distribution element section maps Nix profiles (containing references to all packages implementing the services deployed to the machine) to target machines in the network. This information is used by the distribution step to transfer packages from the coordinator machine to a target machine.
The activation element section contains elements specifying which service to activate on which machine in the network including other properties relevant to the activation, such as the type plugin that needs to be invoked that takes care of the activation process. This information is used by the activation step.
The targets section contains properties of the machines in the network and is used by all tools that carry out remote deployment steps.
There is also an optional snapshots section (not shown in the code fragment above) that contains the properties of services whose state need to be snapshotted, transferred and restored in case their location changes.

When a Disnix (re)deployment process successfully completes, Disnix stores the above manifest as a Disnix coordinator Nix profile on the coorindator machine for future reference with the purpose to optimize the successive upgrade step -- when redeploying a system Disnix will compare the generated manifest with the previously deployed generated instance and only deactivate services that have become obsolete and activating services that are new, making upgrades more efficient than fresh installations.

Unfortunately, when the coordinator machine storing the manifests gets lost, then also the deployment manifest gets lost. As a result, a system can no longer be reliably upgraded -- without deactivating obsolete services, newly deployed services may conflict with services that are already running on the target machines preventing the system from working properly.

Reconstructible manifests

Recently, I have modified Disnix in such a way that the deployment manifests on the coordinator machine can be reconstructed. Each Nix profile that Disnix distributes to a target machine includes a so-called profile manifest file, e.g. /nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1/manifest. Previously, this file only contained the Nix store paths to the deployed services and was primarily used by the disnix-query tool to display the installed set of services per machines.

In the latest Disnix, I have changed the format of the profile manifest file to contain all required meta data so that the the activation mappings can be reconstructed on the coordinator machine:

stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[{ target = "test2"; container = "process"; _key = "4827dfcde5497466b5d218edcd3326327a4174f2b23fd3c9956e664e2386a080"; } { target = "test2"; container = "process"; _key = "b629e50900fe8637c4d3ddf8e37fc5420f2f08a9ecd476648274da63f9e1ebcc"; } { target = "test1"; container = "process"; _key = "d85ba27c57ba626fa63be2520fee356570626674c5635435d9768cf7da943aa3"; }]

The above code fragment shows a portion of the profile manifest. It has a line-oriented structure in which every 7 lines represent the properties of a deployed service. The first line denotes the name of the service, second line the Nix store path, third line the Dysnomia container, fourth line the Dysnomia type, fifth line the hash code derived of all properties, sixth line whether the attached state must be managed by Disnix and the seventh line an encoding of the inter-dependencies.

The other portions of the deployment manifest can be reconstructed as follows: the distribution section can be derived by querying the Nix store paths of the installed profiles on the target machines, the snapshots section by checking which services have been marked as stateful and the targets section can be directly derived from a provided infrastructure model.

With the augmented data in the profile manifests on the target machines, I have developed a tool named disnix-reconstruct that can reconstruct a deployment manifest from all the meta data the manifests on the target machines provide.

I can now, for example, delete all the deployment manifest generations on the coordinator machine:

$ rm /nix/var/nix/profiles/per-user/sander/disnix-coordinator/*

and reconstruct the latest deployment manifest, by running:

$ disnix-reconstruct infrastructure.nix

The above command resolves the full paths to the Nix profiles on the target machines, then downloads their intra-dependency closures to the coordinator machine, reconstructs the deployment manifest from the profile manifests and finally installs the generated deployment manifest.

If the above command succeeds, then we can reliably upgrade a system again with the usual command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Extending the self-adaptive deployment framework

In addition to reconstructing deployment manifests that have gone missing, disnix-reconstruct offers another benefit -- the self-adaptive redeployment framework described in the two earlier blog posts is capable of responding to various kinds of events, including redeploying services to other machines when a machine crashes and disappears from the network.

However, when a machine disappears from the network and reappears at a later point in time, Disnix no longer knows about its configuration. When such a machine reappears in the network, this could have disastrous results.

Fortunately, by adding disnix-reconstruct to the framework we can solve this issue:

As shown in the above diagram, whenever a change in the infrastructure is detected, we reconstruct the deployment manifest so that Disnix knows which services are deployed to it. Then when the system is being redeployed, the services on the reappearing machines can also be upgraded or undeployed completely, if needed.

The automatic reconstruction feature can be used by providing the --reconstruct parameter to the self adapt tool:

$ dydisnix-self-adapt -s services.nix -i infrastructure.nix -q qos.nix \
  --reconstruct

Conclusion

In this blog post, I have described the latest addition to Disnix: disnix-reconstruct that can be used to reconstruct the deployment manifest on the coordinator machine from meta data stored on the target machines. With this addition, we can still update systems if the coordinator machine gets lost.

Furthermore, we can use this addition in the self-adaptive deployment framework to deal with reappearing machines that already have services deployed to them.

Finally, besides developing disnix-reconstruct, I have reached another stable point. As a result, I have decided to release Disnix 0.7. Consult the Disnix homepage for more information.

Sander van der Burg's blog

Wednesday, March 15, 2017

Reconstructing Disnix deployment configurations

The Disnix upgrade workflow

Reconstructible manifests

Extending the self-adaptive deployment framework

Conclusion

No comments:

Post a Comment