Sander van der Burg's blog: March 2017

Friday, March 31, 2017

Subsituting impure version specifiers in node2nix generated package compositions

In a number of previous blog posts, I have described node2nix, a tool that can be used to automatically integrate NPM packages into the Nix packages ecosystem. The biggest challenge in making this integration possible is the fact that NPM does dependency management in addition to build management -- NPM's dependency management properties conflict with Nix's purity principles.

Dealing with a conflicting dependency manager is quite simple from a conceptual perspective -- you must substitute it by a custom implementation that uses Nix to obtain all required dependencies. The remaining responsibilities (such as build management) are left untouched and still have to be carried out by the guest package manager.

Although conceptually simple, implementing such a substitution approach is much more difficult than expected. For example, in my previous blog posts I have described the following techniques:

Extracting dependencies. In addition to the package we intend to deploy with Nix, we must also include all its dependencies and transitive dependencies in the generation process.
Computing output hashes. In order to make package deployments deterministic, Nix requires that the output hashes of downloads are known in advance. As a result, we must examine all dependencies and compute their corresponding SHA256 output hashes. Some NPM projects have thousands of transitive dependencies that need to be analyzed.
Snapshotting versions. Nix uses SHA256 hash codes (derived from all inputs to build a package) to address specific variants or versions of packages whereas version specifiers in NPM package.json configurations are nominal -- they permit version ranges and references to external artifacts (such as Git repositories and external URLs).

For example, a version range of >= 1.0.3 might resolve to version 1.0.3 today and to version 1.0.4 tomorrow. Translating a version range to a Nix package with a hash code identifier breaks the ability for Nix to guarantee that a package with a specific hash code yields a (nearly) bit identical build.

To ensure reproducibility, we must snapshot the resolved version of these nominal dependency version specifiers (such as a version range) at generation time and generate the corresponding Nix expression for the resulting snapshot.
Simulating shared and private dependencies. In NPM projects, dependencies of a package are stored in the node_modules/ sub folder of the package. Each dependency can have private dependencies by putting them in their corresponding node_modules/ sub folder. Sharing dependencies is also possible by placing the corresponding dependency in any of the parent node_modules/ sub folders.

Moreover, although this is not explicitly advertised as such, NPM implicitly supports cyclic dependencies and is able cope with them because it will refuse to install a dependency in a node_modules/ sub folder if any parent folder already provides it.

When generating Nix expressions, we must replicate the exact same behaviour when it comes to private and shared dependencies. This is particularly important to cope with cyclic dependencies -- the Nix package manager does not allow them and we have to break any potential cycles at generation time.
Simulating "flat module" installations. In NPM versions older than 3.0, every dependency was installed privately by default unless a shared dependency exists that fits within the required version range.

In newer NPM versions, this strategy has been reversed -- every dependency will be shared as much as possible until a conflict has been encountered. This means that we have to move dependencies as high up in the node_modules/ folder hierarchy as possible which is an imperative operation -- in Nix this is a problem, because packages are cannot be changed after they have been built.

To cope with flattening, we must compute the implications of flattening the dependency structure in advance at generation time.

With the above techniques it is possible construct a node_modules/ directory structure having a nearly identical structure that NPM would normally compose with a high degree of accuracy.

Impure version specifiers

Even if it would be possible to reproduce the node_modules/ directory hierarchy with 100% accuracy, there is another problem that remains -- some version specifiers always trigger network communication regardless whether the dependencies have been provided or not, such as:

[
  { "node2nix": "latest" }
, { "nijs": "git+https://github.com/svanderburg/nijs.git#master" }
, { "prom2cb": "github:svanderburg/prom2cb" }
]

When referring to tags or Git branches, NPM is unable to determine to which version a package resolves. As a consequence, it attempts to retrieve the corresponding packages to investigate even when a compatible version in the node_modules/ directory hierarchy already exists.

While performing package builds, Nix takes various precautions to prevent side effects from influencing builds including network connections. As a result, an NPM package deployment will still fail despite the fact that a compatible dependency has already been provided.

In the package builder Nix expression provided by node2nix, I used to substitute these version specifiers in the package.json configuration files by a wildcard: '*'. Wildcards used to work fine for old Node.js 4.x/NPM 2.x installations, but with NPM 3.x flat module installations they became another big source of problems -- in order to make flat module installations work, NPM needs to know to which version a package resolves to determine whether it can be shared on a higher level in the node_modules/ folder hierarchy or not. Wildcards prevent NPM from making these comparisons, and as a result, some package deployments fail that did not use to fail with older versions of NPM.

Pinpointing version specifiers

In the latest node2nix I have solved these issues by implementing a different substitution strategy -- instead of substituting impure version specifiers by wildcards, I pinpoint all the dependencies to the exact version numbers to which these dependencies resolve. Internally, NPM addresses all dependencies by their names and version numbers only (this also has a number of weird implications, because it disregards the origins of these dependencies, but I will not go into detail on that).

I got the inspiration for this pinpointing strategy from the yarn package manager (an alternative to NPM developed by Facebook) -- when deploying a project with yarn, yarn pinpoints the installed dependencies in a so-called yarn.lock file so that package deployments become reproducible when a system is deployed for a second time.

The pinpointing strategy will always prevent NPM from consulting external resources (under the condition that we have provided the package by our substitute dependency manager first) and always provide version numbers for any dependency so that NPM can perform flat module installations. As a result, the accuracy of node2nix with newer versions of NPM has improved quite a bit.

Availability

The pinpointing strategy is part of the latest node2nix that can be obtained from the NPM registry or the Nixpkgs repository.

One month ago, I have given a talk about node2nix at FOSDEM 2017 summarizing the techniques discussed in my blog posts written so far. For convenience, I have embedded the slides into this web page:

Deploying NPM packages with the Nix package manager from Sander van der Burg

Wednesday, March 15, 2017

Reconstructing Disnix deployment configurations

In two earlier blog posts, I have described Dynamic Disnix, an experimental framework enabling self-adaptive redeployment on top of Disnix. The purpose of this framework is to redeploy a service-oriented system whenever the conditions of the environment change, so that the system can still meet its functional and non-functional requirements.

An important category of events that change the environment are machines that crash and disappear from the network -- when a disappearing machine used to host a crucial service, a system can no longer meet its functional requirements. Fortunately, Dynamic Disnix is capable of automatically responding to such events by deploying the missing components elsewhere.

Although Dynamic Disnix supports the recovery of missing services, there is one particular kind of failure I did not take into account. In addition to potentially crashing target machines that host the services of which a service-oriented systems consist, the coordinator machine that initiates the deployment process and stores the deployment state could also disappear. When the deployment state gets lost, it is no longer possible to reliably update the system.

In this blog post, I will describe a new addition to the Disnix toolset that can be used to cope with these kinds of failures by reconstructing a coordinator machine's deployment configuration from the meta data stored on the target machines.

The Disnix upgrade workflow

As explained in earlier blog posts, Disnix requires three kinds of deployment models to carry out a deployment process: a services model capturing the components of which a system consists, an infrastructure model describing the available target machines and their properties, and a distribution model mapping services in the services model to target machines in the infrastructure model. By writing instances of these three models and running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will carry out all activities necessary to deploy the system: building the services and its intra-dependencies from source code, distributing the services and its intra-dependencies, and activating all services in the right order.

When changing any of the models and running the same command-line instruction again, Disnix attempts to upgrade the system by only rebuilding the aspects that have changed, and only deactivating the obsolete services and activating new services.

Disnix (as well as other Nix-related tools) attempt to optimize a redeployment process by only executing the steps that are required to reach a new deployment state. In Disnix, the building and distribution steps are optimized due to the fact that every package is stored in isolation the Nix store in which each package has a unique filename with a hash prefix, such as:

/nix/store/acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l-firefox-48.0.2

As explained in a number of earlier blog posts, the hash prefix (acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l...) is derived from all inputs used to build the package including its source code, build script, and libraries that it links to. That, for example, means that if we upgrade a system and nothing to the any of inputs of Firefox changes, we get an identical hash and if such a package build already exists, we do not have to build or transfer the package from an external site again.

The building step in Disnix produces a so-called low-level manifest file that is used by tools executing the remaining deployment activities:

<?xml version="1.0"?>
<manifest version="1">
  <distribution>
    <mapping>
      <profile>/nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1</profile>
      <target>test1</target>
    </mapping>
  </distribution>
  <activation>
    <mapping>
      <dependsOn>
        <dependency>
          <target>test1</target>
          <container>process</container>
          <key>d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d</key>
        </dependency>
      </dependsOn>
      <name>nginx</name>
      <service>/nix/store/aa5hn5n1pg2qbb7i8skr6vkgpnsjhlns-nginx-wrapper</service>
      <target>test1</target>
      <container>wrapper</container>
      <type>wrapper</type>
      <key>da8c3879ccf1b0ae34a952f36b0630d47211d7f9d185a8f2362fa001652a9753</key>
    </mapping>
  </activation>
  <targets>
    <target>
      <properties>
        <hostname>test1</hostname>
      </properties>
      <containers>
        <mongo-database/>
        <process/>
        <wrapper/>
      </containers>
      <system>x86_64-linux</system>
      <numOfCores>1</numOfCores>
      <clientInterface>disnix-ssh-client</clientInterface>
      <targetProperty>hostname</targetProperty>
    </target>
  </targets>
</manifest>

The above manifest file contains the following kinds of information:

The distribution element section maps Nix profiles (containing references to all packages implementing the services deployed to the machine) to target machines in the network. This information is used by the distribution step to transfer packages from the coordinator machine to a target machine.
The activation element section contains elements specifying which service to activate on which machine in the network including other properties relevant to the activation, such as the type plugin that needs to be invoked that takes care of the activation process. This information is used by the activation step.
The targets section contains properties of the machines in the network and is used by all tools that carry out remote deployment steps.
There is also an optional snapshots section (not shown in the code fragment above) that contains the properties of services whose state need to be snapshotted, transferred and restored in case their location changes.

When a Disnix (re)deployment process successfully completes, Disnix stores the above manifest as a Disnix coordinator Nix profile on the coorindator machine for future reference with the purpose to optimize the successive upgrade step -- when redeploying a system Disnix will compare the generated manifest with the previously deployed generated instance and only deactivate services that have become obsolete and activating services that are new, making upgrades more efficient than fresh installations.

Unfortunately, when the coordinator machine storing the manifests gets lost, then also the deployment manifest gets lost. As a result, a system can no longer be reliably upgraded -- without deactivating obsolete services, newly deployed services may conflict with services that are already running on the target machines preventing the system from working properly.

Reconstructible manifests

Recently, I have modified Disnix in such a way that the deployment manifests on the coordinator machine can be reconstructed. Each Nix profile that Disnix distributes to a target machine includes a so-called profile manifest file, e.g. /nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1/manifest. Previously, this file only contained the Nix store paths to the deployed services and was primarily used by the disnix-query tool to display the installed set of services per machines.

In the latest Disnix, I have changed the format of the profile manifest file to contain all required meta data so that the the activation mappings can be reconstructed on the coordinator machine:

stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[{ target = "test2"; container = "process"; _key = "4827dfcde5497466b5d218edcd3326327a4174f2b23fd3c9956e664e2386a080"; } { target = "test2"; container = "process"; _key = "b629e50900fe8637c4d3ddf8e37fc5420f2f08a9ecd476648274da63f9e1ebcc"; } { target = "test1"; container = "process"; _key = "d85ba27c57ba626fa63be2520fee356570626674c5635435d9768cf7da943aa3"; }]

The above code fragment shows a portion of the profile manifest. It has a line-oriented structure in which every 7 lines represent the properties of a deployed service. The first line denotes the name of the service, second line the Nix store path, third line the Dysnomia container, fourth line the Dysnomia type, fifth line the hash code derived of all properties, sixth line whether the attached state must be managed by Disnix and the seventh line an encoding of the inter-dependencies.

The other portions of the deployment manifest can be reconstructed as follows: the distribution section can be derived by querying the Nix store paths of the installed profiles on the target machines, the snapshots section by checking which services have been marked as stateful and the targets section can be directly derived from a provided infrastructure model.

With the augmented data in the profile manifests on the target machines, I have developed a tool named disnix-reconstruct that can reconstruct a deployment manifest from all the meta data the manifests on the target machines provide.

I can now, for example, delete all the deployment manifest generations on the coordinator machine:

$ rm /nix/var/nix/profiles/per-user/sander/disnix-coordinator/*

and reconstruct the latest deployment manifest, by running:

$ disnix-reconstruct infrastructure.nix

The above command resolves the full paths to the Nix profiles on the target machines, then downloads their intra-dependency closures to the coordinator machine, reconstructs the deployment manifest from the profile manifests and finally installs the generated deployment manifest.

If the above command succeeds, then we can reliably upgrade a system again with the usual command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Extending the self-adaptive deployment framework

In addition to reconstructing deployment manifests that have gone missing, disnix-reconstruct offers another benefit -- the self-adaptive redeployment framework described in the two earlier blog posts is capable of responding to various kinds of events, including redeploying services to other machines when a machine crashes and disappears from the network.

However, when a machine disappears from the network and reappears at a later point in time, Disnix no longer knows about its configuration. When such a machine reappears in the network, this could have disastrous results.

Fortunately, by adding disnix-reconstruct to the framework we can solve this issue:

As shown in the above diagram, whenever a change in the infrastructure is detected, we reconstruct the deployment manifest so that Disnix knows which services are deployed to it. Then when the system is being redeployed, the services on the reappearing machines can also be upgraded or undeployed completely, if needed.

The automatic reconstruction feature can be used by providing the --reconstruct parameter to the self adapt tool:

$ dydisnix-self-adapt -s services.nix -i infrastructure.nix -q qos.nix \
  --reconstruct

Conclusion

In this blog post, I have described the latest addition to Disnix: disnix-reconstruct that can be used to reconstruct the deployment manifest on the coordinator machine from meta data stored on the target machines. With this addition, we can still update systems if the coordinator machine gets lost.

Furthermore, we can use this addition in the self-adaptive deployment framework to deal with reappearing machines that already have services deployed to them.

Finally, besides developing disnix-reconstruct, I have reached another stable point. As a result, I have decided to release Disnix 0.7. Consult the Disnix homepage for more information.