Showing posts with label NPM. Show all posts
Showing posts with label NPM. Show all posts

Tuesday, August 31, 2021

A more elaborate approach for bypassing NPM's dependency management features in Nix builds

Nix is a general purpose package manager that can be used to automate the deployments of a variety of systems -- it can deploy components written in a variety of programming languages (e.g. C, C++, Java, Go, Rust, Perl, Python, JavaScript) using various kinds of technologies and frameworks, such as Django, Android, and Node.js.

Another unique selling point of Nix is that it provides strong reproducibility guarantees. If a build succeeds on one machine, then performing the same build on another should result in a build that is (nearly) bit-identical.

Nix improves build reproducibility by complementing build processes with features, such as:

  • Storing all artifacts in isolation in a so-called Nix store: /nix/store (e.g. packages, configuration files), in which every path is unique by prefixing it with an SHA256 hash code derived from all build inputs (e.g. dependencies, build scripts etc.). Isolated paths make it possible for multiple variants and versions of the same packages to safely co-exist.
  • Clearing environment variables or setting them to dummy values. In combination with unique and isolated Nix store paths, search environment variables must configured in such a way that the build script can find its dependencies in the Nix store, or it will fail.

    Having to specify all search environment variables may sound inconvenient, but prevents undeclared dependencies to accidentally make a build succeed -- deployment of such a package is very likely to fail on machine that misses an unknown dependency.
  • Running builds as an unprivileged user that does not have any rights to make modifications to the host system -- a build can only write in its designated temp folder or output paths.
  • Optionally running builds in a chroot environment, so that a build cannot possibly find any undeclared host system dependencies through hard-coded absolute paths.
  • Restricting network access to prevent a build from obtaining unknown dependencies that may influence the build outcome.

For many build tools, the Nixpkgs repository provides abstraction functions that allow you to easily construct a package from source code (e.g. GNU Make, GNU Autotools, Apache Ant, Perl's MakeMaker, SCons etc.).

However, certain tools are difficult to use in combination with Nix -- for example, NPM that is used to deploy Node.js projects.

NPM is both a dependency and build manager and the former aspect conflicts with Nix -- builds in Nix are typically prevented from downloading files from remote network locations, with the exception of so-called fixed-output derivations in which the output hash is known in advance.

If network connections would be allowed in regular builds, then Nix can no longer ensure that a build is reproducible (i.e. that the hash code in the Nix store path reflects the same build output derived from all inputs).

To cope with the conflicting dependency management feature of NPM, various kinds of integrations have been developed. npm2nix was the first, and several years ago I have started node2nix to provide a solution that aims for accuracy.

Basically, the build process of an NPM package in Nix boils down to performing the following steps in a Nix derivation:

# populate the node_modules/ folder
npm install --offline

We must first obtain the required dependencies of a project through the Nix package manager and install them in the correct locations in the node_modules/ directory tree.

Finally, we should run NPM in offline mode forcing it not to re-obtain or re-install any dependencies, but still perform build management tasks, such as running build scripts.

From a high-level point of view, this principle may look simple, but in practice it is not:

  • With earlier versions of NPM, we were forced to imitate its dependency resolution algorithm. At first sight, it looked simple, but getting it right (such as coping with circular dependencies and dependency de-duplication) is much more difficult than expected.
  • NPM 5.x introduced lock files. For NPM development projects, they provide exact version specifiers of all dependencies and transitive dependencies, making it much easier to know which dependencies need to be installed.

    Unfortunately, NPM also introduced an offline cache, that prevents us from simply copying packages into the node_modules/ tree. As a result, we need to make additional complex modifications to the package.json configuration files of all dependencies.

    Furthermore, end user package installations do not work with lock files, requiring us to still keep our custom implementation of the dependency resolution algorithm.
  • NPM's behaviour with dependencies on directories on the local file system has changed. In old versions of NPM, such dependencies were copied, but in newer versions, they are symlinked. Furthermore, each directory dependency maintains its own node_modules/ directory for transitive dependencies.

Because we need to take many kinds of installation scenarios into account and work around the directory dependency challenges, the implementation of the build environment: node-env.nix in node2nix has become very complicated.

It has become so complicated that I consider it a major impediment in making any significant changes to the build environment.

In the last few weeks, I have been working on a companion tool named: placebo-npm that should simplify the installation process. Moreover, it should also fix a number of frequently reported issues.

In this blog post, I will explain how the tool works.

Lock-driven deployments


In NPM 5.x, package-lock.json files were introduced. The fact that they capture the exact versions of all dependencies and make all transitive dependencies known, makes certain aspects of an NPM deployment in a Nix build environment easier.

For lock-driven projects, we no longer have to run our own implementation of the dependency resolution algorithm to figure out what the exact versions of all dependencies and transitive dependencies are.

For example, a project with the following package.json:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "dependencies": {
    "underscore": "*",
    "prom2cb": "github:svanderburg/prom2cb",
    "async": "https://mylocalserver/async-3.2.1.tgz"
  }
}

may have the following package-lock.json file:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "async": {
      "version": "https://mylocalserver/async-3.2.1.tgz",
      "integrity": "sha512-XdD5lRO/87udXCMC9meWdYiR+Nq6ZjUfXidViUZGu2F1MO4T3XwZ1et0hb2++BgLfhyJwy44BGB/yx80ABx8hg=="
    },
    "prom2cb": {
      "version": "github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34",
      "from": "github:svanderburg/prom2cb"
    },
    "underscore": {
      "version": "1.13.1",
      "resolved": "https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz",
      "integrity": "sha512-hzSoAVtJF+3ZtiFX0VgfFPHEDRm7Y/QPjGyNo4TVdnDTdft3tr8hEkD25a1jC+TjTuE7tkHGKkhwCgs9dgBB2g=="
    }
  }
}

As you may notice, the package.json file declares three dependencies:

  • The first dependency is underscore that refers to the latest version in the NPM registry. In the package-lock.json file, the dependency is frozen to version 1.13.1. The resolved property provides the URL where the tarball should be obtained from. Its integrity can be verified with the given SHA512 hash.
  • The second dependency: prom2cb refers to the latest revision of the main branch of the prom2cb Git repository on GitHub. In the package-lock.json file, it is pinpointed to the fab277... revision.
  • The third dependency: async refers to a tarball that is downloaded from an arbitrary HTTP URL. The package-lock.json records its SHA512 integrity hash to make sure that we can only deploy with the version that we have used previously.

As explained earlier, to ensure purity, in a Nix build environment, we cannot allow NPM to obtain the required dependencies of a project. Instead, we must let Nix obtain all the dependencies.

When all dependencies have been obtained, we should populate the node_modules/ folder of the project. In the above example, it is just simply a matter of unpacking the tarballs or copying the Git clones into the node_modules/ folder of the project. No transitive dependencies need to be deployed.

For projects that do not rely on build scripts (that perform tasks, such as linting, compiling code, such as TypeScript etc.) this typically suffices to make a project work.

However, when we also need build management, we need to run the full installation process:

$ npm install --offline

npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/async/-/async-3.2.1.tgz failed: cache mode is 'only-if-cached' but no cached response available.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/sander/.npm/_logs/2021-08-29T12_56_13_978Z-debug.log

Unfortunately, NPM still tries to obtain the dependencies despite the fact that they have already been copied into the right locations into node_modules folder.

Bypassing the offline cache


To cope with the problem that manually obtained dependencies cannot be detected, my initial idea was to use the NPM offline cache in a specific way.

The offline cache claims to be content-addressable, meaning that every item can be looked up by using a hash code that represents its contents, regardless of its origins. Unfortunately, it turns out that this property cannot be fully exploited.

For example, when we obtain the underscore tarball (with the exact same contents) from a different URL:

$ npm cache add http://mylocalcache/underscore-1.13.1.tgz

and run the installation in offline mode:

$ npm install --offline
npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz failed: cache mode is 'only-if-cached' but no cached response available.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/sander/.npm/_logs/2021-08-26T13_50_15_137Z-debug.log

The installation still fails, despite the fact that we already have a tarball (with the exact same SHA512 hash) in our cache.

However, downloading underscore from its original location (the NPM registry):

$ npm cache add underscore@1.13.1

makes the installation succeed.

The reason why downloading the same tarball from an arbitrary HTTP URL does not work is because NPM will only compute a SHA1 hash. Obtaining a tarball from the NPM registry causes NPM to compute a SHA512 hash. Because it was downloaded from a different source, it fails to recognize the SHA512 hash in the package-lock.json file.

We also run into similar issues when we obtain an old package from the NPM registry that only has an SHA1 hash. Importing the same file from a local file path causes NPM to compute a SHA512 hash. As a result, npm install tries to re-obtain the same tarball from the remote location, because the hash was not recognized.

To cope with these problems, placebo-npm will completely bypass the cache. After all dependencies have been copied to the node_modules folder, it modifies their package.json configuration files with hidden metadata properties to trick NPM that they came from their original locations.

For example, to make the underscore dependency work (that is normally obtained from the NPM registry), we must add the following properties to the package.json file:

{
  ...
  _from: "underscore@https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz",
  _integrity: "sha512-XdD5lRO/87udXCMC9meWdYiR+Nq6ZjUfXidViUZGu2F1MO4T3XwZ1et0hb2++BgLfhyJwy44BGB/yx80ABx8hg==",
  _resolved: "https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz"
}

For prom2cb (that is a Git dependency), we should add:

{
  ...
  _from = "github:svanderburg/prom2cb",
  _integrity = "",
  _resolved = "github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34"
}

and for HTTP/HTTPS dependencies and local files we should do something similar (adding _from and _integrity fields).

With these modifications, NPM will no longer attempt to consult the local cache, making the dependency installation step succeed.

Handling directory dependencies


Another challenge is dependencies on local directories, that are frequently used for local development projects:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "dependencies": {
    "underscore": "*",
    "prom2cb": "github:svanderburg/prom2cb",
    "async": "https://mylocalserver/async-3.2.1.tgz",
    "mydep": "../../mydep",
  }
}

In the package.json file shown above, a new dependency has been added: mydep that refers to a relative local directory dependency: ../../mydep.

If we run npm install, then NPM creates a symlink to the folder in the project's node_modules/ folder and installs the transitive dependencies in the node_modules/ folder of the target dependency.

If we want to deploy the same project to a different machine, then it is required to put mydep in the exact same relative location, or the deployment will fail.

Deploying such an NPM project with Nix introduces a new problem -- all packages deployed by Nix are stored in the Nix store (typically /nix/store). After deploying the project, the relative path to the project (from the Nix store) will no longer be correct. Moreover, we also want Nix to automatically deploy the directory dependency as part of the deployment of the entire project.

To cope with these inconveniences, we are required to implement a tricky solution -- we must rewrite directory dependencies in such a way that can refer to a folder that is automatically deployed by Nix. Furthermore, the dependency should still end up being symlink to satisfy NPM -- copying directory dependencies in the node_modules/ folder is not accepted by NPM.

Usage


To conveniently install NPM dependencies from a local source (and satisfying npm in such a way that it believes the dependencies came from their original locations), I have created a tool called: placebo-npm.

We can, for example, obtain all required dependencies ourselves and put them in a local cache folder:

$ mkdir /home/sander/mycache
$ wget https://mylocalserver/async-3.2.1.tgz
$ wget https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz
$ git clone https://github.com/svanderburg/prom2cb

The deployment process that placebo-npm executes is driven by a package-placebo.json configuration file that has the following structure:

{
   "integrityHashToFile": {
     "sha512-hzSoAVtJF+3ZtiFX0VgfFPHEDRm7Y/QPjGyNo4TVdnDTdft3tr8hEkD25a1jC+TjTuE7tkHGKkhwCgs9dgBB2g==": "/home/sander/mycache/underscore-1.13.1.tgz",
     "sha512-XdD5lRO/87udXCMC9meWdYiR+Nq6ZjUfXidViUZGu2F1MO4T3XwZ1et0hb2++BgLfhyJwy44BGB/yx80ABx8hg==": "/home/sander/mycache/async-3.2.1.tgz"
   },
   "versionToFile": {
     github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34": "/home/sander/mycache/prom2cb"
   },
   "versionToDirectoryCopyLink": {
     "file:../dep": "/home/sander/alternatedir/dep"
   }
}

The placebo config maps dependencies in a package-lock.json file to local file references:

  • integrityHashToFile maps dependencies with an integrity hash to local files, which is useful for HTTP/HTTPS dependencies, registry dependencies, and local file dependencies.
  • versionToFile: maps dependencies with a version property to local directories. This is useful for Git dependencies.
  • versionToDirectoryCopyLink: specifies directories that need to be copied into a shadow directory named: placebo_node_dirs and creates symlinks to the shadow directories in the node_modules/ folder. This is useful for installing directory dependencies from arbitrary locations.

With the following command, we can install all required dependencies from the local cache directory and make all necessary modifications to let NPM accept the dependencies:

$ placebo-npm package-placebo.json

Finally, we can run:

$ npm install --offline

The above command does not attempt to re-obtain or re-install the dependencies, but still performs all required build management tasks.

Integration with Nix


All the functionality that placebo-npm provides has already been implemented in the node-env.nix module, but over the years it has evolved into a very complex beast -- it is implemented as a series of Nix functions that generates shell code.

As a consequence, it suffers from recursion problems and makes it extremely difficult to tweak/adjust build processes, such as modifying environment variables or injecting arbitrary build steps to work around Nix integration problems.

With placebo-npm we can reduce the Nix expression that builds projects (buildNPMProject) to an implementation that roughly has the following structure:

{stdenv, placebo-npm}:
{packagePlacebo}:

stdenv.mkDerivation ({
  pname = builtins.replaceStrings [ "@" "/" ] [ "_at_" "_slash_" ] pname; # Escape characters that aren't allowed in a store path

  placeboJSON = builtins.toJSON packagePlacebo;
  passAsFile = [ "placeboJSON" ];

  buildInputs = [ nodejs placebo-npm ] ++ buildInputs;

  buildPhase = ''
    runHook preBuild
    true
    runHook postBuild
  '';
  installPhase = ''
    runHook preInstall

    mkdir -p $out/lib/node_modules/${pname}
    mv * $out/lib/node_modules/${pname}
    cd $out/lib/node_modules/${pname}

    placebo-npm --placebo $placeboJSONPath
    npm install --offline

    runHook postInstall
  '';
} // extraArgs)

As may be observed, the implementation is much more compact and fits easily on one screen. The function accepts a packagePlacebo attribute set as a parameter (that gets translated into a JSON file by the Nix package manager).

Aside from some simple house keeping work, most of the complex work has been delegated to executing placebo-npm inside the build environment, before we run npm install.

The function above is also tweakable -- it is possible to inject arbitrary environment variables and adjust the build process through build hooks (e.g. preInstall and postInstall).

Another bonus feature of delegating all dependency installation functionality to the placebo-npm tool is that we can also use this tool as a build input for other kinds projects -- we can use it the construction process of systems that are built from monolithic repositories, in which NPM is invoked from the build process of the encapsulating project.

The only requirement is to run placebo-npm before npm install is invoked.

Other use cases


In addition to using placebo-npm as a companion tool for node2nix and setting up a simple local cache, it can also be useful to facilitate offline installations from external media, such as USB flash drives.

Discussion


With placebo-npm we can considerably simplify the implementation of node-env.nix (part of node2nix) making it much easier to maintain. I consider the node-env.nix module the second most complicated aspect of node2nix.

As a side effect, it has also become quite easy to provide tweakable build environments -- this should solve a large number of reported issues. Many reported issues are caused by the fact that it is difficult or sometimes impossible to make changes to a project so that it will cleanly deploy.

Moreover, placebo-npm can also be used as a build input for projects built from monolithic repositories, in which a sub set needs to be deployed by NPM.

The integration of the new node-env.nix implementation into node2nix is not completely done yet. I have reworked it, but the part that generates the package-placebo.json file and lets Nix obtain all required dependencies is still a work-in-progress.

I am experimenting with two implementations: a static approach that generates Nix expressions and dynamic implementation that directly consumes a package-lock.json file in the Nix expression language. Both approaches have pros and cons. As a result, node2nix needs to combine both of them into a hybrid approach.

In a next blog post, I will explain more about them.

Availability


The initial version of placebo-npm can be obtained from my GitHub page.

Tuesday, December 19, 2017

Bypassing NPM's content addressable cache in Nix deployments and generating expressions from lock files

Roughly half a year ago, Node.js version 8 was released that also includes a major NPM package manager update (version 5). NPM version 5 has a number of substantial changes over the previous version, such as:

  • It uses package lock files that pinpoint the resolved versions of all dependencies and transitive dependencies. When a project with a bundled package-lock.json file is deployed, NPM will use the pinpointed versions of the packages that are in the lock file making it possible to exactly reproduce a deployment elsewhere. When a project without a lock file is deployed for the first time, NPM will generate a lock file.
  • It has a content-addressable cache that optimizes package retrieval processes and allows fully offline package installations.
  • It uses SHA-512 hashing (as opposed to the significantly weakened SHA-1), for packages published in the NPM registry.

Although these features offer significant benefits over previous versions, e.g. NPM deployments are now much faster, more secure and more reliable, it also comes with a big drawback -- it breaks the integration with the Nix package manager in node2nix. Solving these problems were much harder than I initially anticipated.

In this blog post, I will explain how I have adjusted the generation procedure to cope with NPM's new conflicting features. Moreover, I have extended node2nix with the ability to generate Nix expressions from package-lock.json files.

Lock files


One of the major new features in NPM 5.0 is the lock file (the idea itself is not so new since NPM-inspired solutions such as yarn and the PHP-based composer already support them for quite some time).

A major drawback of NPM's dependency management is that version specifiers are nominal. They can refer to specific versions of packages in the NPM registry, but also to version ranges, or external artifacts such as Git repositories. The latter category of version specifiers affect reproducibility -- for example, the version range specifier >= 1.0.0 may refer to version 1.0.0 today and to version 1.0.1 tomorrow making it extremely hard to reproduce a deployment elsewhere.

In a development project, it is still possible to control the versions of dependencies by using a package.json configuration that only refers to exact versions. However, for transitive dependencies that may still have loose version specifiers there is only very little control.

To solve this reproducibility problem, a package-lock.json file can be used -- a package lock file pinpoints the resolved versions of all dependencies and transitive dependencies making it possible to reproduce the exact same deployment elsewhere.

For example, for the NiJS package with the following package.json configuration:

{
  "name": "nijs",
  "version": "0.0.25",
  "description": "An internal DSL for the Nix package manager in JavaScript",
  "bin": {
    "nijs-build": "./bin/nijs-build.js",
    "nijs-execute": "./bin/nijs-execute.js"
  },
  "main": "./lib/nijs",
  "dependencies": {
    "optparse": ">= 1.0.3",
    "slasp": "0.0.4"
  },
  "devDependencies": {
    "jsdoc": "*"
  }
}

NPM may produce the following partial package-lock.json file:

{
  "name": "nijs",
  "version": "0.0.25",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "optparse": {
      "version": "1.0.5",
      "resolved": "https://registry.npmjs.org/optparse/-/optparse-1.0.5.tgz",
      "integrity": "sha1-dedallBmEescZbqJAY/wipgeLBY="
    },
    "requizzle": {
      "version": "0.2.1",
      "resolved": "https://registry.npmjs.org/requizzle/-/requizzle-0.2.1.tgz",
      "integrity": "sha1-aUPDUwxNmn5G8c3dUcFY/GcM294=",
      "dev": true,
      "requires": {
        "underscore": "1.6.0"
      },
      "dependencies": {
        "underscore": {
          "version": "1.6.0",
          "resolved": "https://registry.npmjs.org/underscore/-/underscore-1.6.0.tgz",
          "integrity": "sha1-izixDKze9jM3uLJOT/htRa6lKag=",
          "dev": true
        }
      }
    },
    ...
}

The above lock file pinpoints all dependencies and development dependencies including transitive dependencies to exact versions, including the locations where they can be obtained from and integrity hash codes that can be used to validate them.

The lock file can also be used to derive the entire structure of the node_modules/ folder in which all dependencies are stored. The top level dependencies property captures all packages that reside in the project's node_modules/ folder. The dependencies property of each dependency captures all packages that reside in a dependency's node_modules/ folder.

If NPM 5.0 is used and no package-lock.json is present in a project, it will automatically generate one.

Substituting dependencies


As mentioned in an earlier blog post, the most important technique to make Nix-NPM integration work is by substituting NPM's dependency management activities that conflict with Nix's dependency management -- Nix is much more strict with handling dependencies (e.g. it uses hash codes derived from the build inputs to identify a package as opposed to a name and version number).

Furthermore, in Nix build environments network access is restricted to prevent unknown artifacts to influence the outcome of a build. Only so-called fixed output derivations, whose output hashes should be known in advance (so that Nix can verify its integrity), are allowed to obtain artifacts from external sources.

To substitute NPM's dependency management, populating the node_modules/ folder ourselves with all required dependencies and substituting certain version specifiers, such as Git URLs, used to suffice. Unfortunately, with the newest NPM this substitution process no longer works. When running the following command in a Nix builder environment:

$ npm --offline install ...

The NPM package manager is forced to work in offline mode consulting its content-addressable cache for the retrieval of external artifacts. If NPM needs to consult an external resource, it throws an error.

Despite the fact that all dependencies are present in the node_modules/ folder, deployment fails with the following error message:

npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/optparse failed: cache mode is 'only-if-cached' but no cached response available.

At first sight, the error message suggests that NPM always requires the dependencies to reside in the content-addressable cache to prevent it from downloading it from external sites. However, when we use NPM outside a Nix builder environment, wipe the cache, and perform an offline installation, it does seem to work properly:

$ npm install
$ rm -rf ~/.npm/_cacache
$ npm --offline install

Further experimentation reveals that NPM augments the package.json configuration files of all dependencies with additional metadata that are prefixed by an underscore (_):

{
  "_from": "optparse@>= 1.0.3",
  "_id": "optparse@1.0.5",
  "_inBundle": false,
  "_integrity": "sha1-dedallBmEescZbqJAY/wipgeLBY=",
  "_location": "/optparse",
  "_phantomChildren": {},
  "_requested": {
    "type": "range",
    "registry": true,
    "raw": "optparse@>= 1.0.3",
    "name": "optparse",
    "escapedName": "optparse",
    "rawSpec": ">= 1.0.3",
    "saveSpec": null,
    "fetchSpec": ">= 1.0.3"
  },
  "_requiredBy": [
    "/"
  ],
  "_resolved": "https://registry.npmjs.org/optparse/-/optparse-1.0.5.tgz",
  "_shasum": "75e75a96506611eb1c65ba89018ff08a981e2c16",
  "_spec": "optparse@>= 1.0.3",
  "_where": "/home/sander/teststuff/nijs",
  "name": "optparse",
  "version": "1.0.5",
  ...
}

It turns out that when the _integrity property in a package.json configuration matches the integrity field of the dependency in the lock file, NPM will not attempt to reinstall it.

To summarize, the problem can be solved in Nix builder environments by running a script that augments the package.json configuration files with _integrity fields with the values from the package-lock.json file.

For Git repository dependency specifiers, there seems to be an additional requirement -- it also seems to require the _resolved field to be set to the URL of the repository.

Reconstructing package lock files


The fact that we have discovered how to bypass the cache in a Nix builder environment makes it possible to fix the integration with the latest NPM. However, one of the limitations of this approach is that it only works for projects that have a package-lock.json file included.

Since lock files are still a relatively new concept, many NPM projects (in particular older projects that are not frequently updated) may not have a lock file included. As a result, their deployments will still fail.

Fortunately, we can reconstruct a minimal lock file from the project's package.json configuration and compose dependencies objects by traversing the package.json configurations inside the node_modules/ directory hierarchy.

The only attribute that cannot be immediately derived are the integrity fields containing hashes that are used for validation. It seems that we can bypass the integrity check by providing a dummy hash, such as:

integrity: "sha1-000000000000000000000000000=",

NPM does not seem to object when it encounters these dummy hashes allowing us to deploy projects with a reconstructed package-lock.json file. The solution is a very ugly hack, but it seems to work.

Generating Nix expressions from lock files


As explained earlier, lock files pinpoint the exact versions of all dependencies and transitive dependencies and describe the structure of the entire dependency graph.

Instead of simulating NPM's dependency resolution algorithm, we can also use the data provided by the lock files to generate Nix expressions. Lock files appear to contain most of the data we need -- the URLs/locations of the external artifacts and integrity hashes that we can use for validation.

Using lock files for generation offer the following advantages:

  • We no longer need to simulate NPM's dependency resolution algorithm. Despite my best efforts and fairly good results, it is hard to truly make it 100% identical to NPM's. When using a lock file, the dependency graph is already given, making deployment results much more accurate.
  • We no longer need to consult external resources to resolve versions and compute hashes making the generation process much faster. The only exception seems to be Git repositories -- Nix needs to know the output hash of the clone whereas for NPM the revision hash suffices. When we encounter a Git dependency, we still need to download it and compute the output hash.

Another minor technical challenge are the integrity hashes -- in NPM lock files integrity hashes are in base-64 notation, whereas Nix uses heximal notation or its own custom base-32 notation. We need to convert the NPM integrity hashes to a notation that Nix understands.

Unfortunately, lock files can only be used in development projects. It appears that packages that are installed directly from the NPM registry, e.g. end-user packages that are installed globally through npm install -g, never include a package lock file. (It even seems that the NPM registry blacklist the lock files when publishing a package in the registry).

For this reason, we still need to keep our own implementation of the dependency resolution algorithm.

Usage


By adding a script that augments the dependencies' package.json configuration files with _integrity fields and by optionally reconstructing a package-lock.json file, NPM integration with Nix has been restored.

Using the new NPM 5.x features is straight forward. The following command can be used to generate Nix expressions for a development project with a lock file:

$ node2nix -8 -l package-lock.json

The above command will directly generate Nix expressions from the package lock file, resulting in a much faster generation process.

When a development project does not ship with a lock file, you can use the following command-line instruction:

$ node2nix -8

The generator will use its own implementation of NPM's dependency resolution algorithm. When deploying the package, the builder will reconstruct a dummy lock file to allow the deployment to succeed.

In addition to development projects, it is also possible to install end-user software, by providing a JSON file (e.g. pkgs.json) that defines an array of dependency specifiers:

[
  "nijs"
, { "node2nix": "1.5.0" }
]

A Node.js 8 compatible expression can be generated as follows:

$ node2nix -8 -i pkgs.json

Discussion


The approach described in this blog post is not the first attempt to fix NPM 5.x integration. In my first attempt, I tried populating NPM's content-addressable cache in the Nix builder environment with artifacts that were obtained by the Nix package manager and forcing NPM to work in offline mode.

NPM exposes its download and cache-related functionality as a set of reusable APIs. For downloading packages from the NPM registry, pacote can be used. For downloading external artifacts through the HTTP protocol make-fetch-happen can be used. Both APIs are built on top of the content-addressable cache that can be controlled through the lower-level cacache API.

The real difficulty is that neither the high-level NPM APIs nor the npm cache command-line instruction work with local directories or local files -- they will only add artifacts to the cache if they come from a remote location. I have partially built my own API on top of cacache to populate the NPM cache with locally stored artifacts pretending that they were fetched from a remote location.

Although I had some basic functionality supported, it turned out to be much more complicated and time consuming to get all functionality implemented.

Furthermore, the NPM authors never promised that these APIs are stable, so the implementation may break at some point in time. As a result, I have decided to look for another approach.

Availability


I just released node2nix version 1.5.0 with NPM 5.x support. It can be obtained from the NPM registry, Github, or directly from the Nixpkgs repository.

Friday, November 3, 2017

Creating custom object transformations with NiJS and PNDP

In a number earlier blog posts, I have described two kinds of internal DSLs for Nix -- NiJS is a JavaScript-based internal DSL and PNDP is a PHP-based internal DSL.

These internal DSLs have a variety of application areas. Most of them are simply just experiments, but the most serious application area is code generation.

Using an internal DSL for generation has a number of advantages over string generation that is more commonly used. For example, when composing strings containing Nix expressions, we must make sure that any variable in the host language that we append to a generated expression is properly escaped to prevent code injection attacks.

Furthermore, we also have to take care of the indentation if we want to output Nix expression code that should be readable. Finally, string manipulation itself is not a very intuitive activity as it makes it very hard to read what the generated code would look like.

Translating host language objects to the Nix expression language


A very important feature of both internal DSLs is that they can literally translate some language constructs from the host language (JavaScript or PHP) to the Nix expression because they have (nearly) an identical meaning. For example, the following JavaScript code fragment:

var nijs = require('nijs');

var expr = {
  hello: "Hello",
  name: {
    firstName: "Sander",
    lastName: "van der Burg"
  },
  numbers: [ 1, 2, 3, 4, 5 ]
};

var output = nijs.jsToNix(expr, true);
console.log(output);

will output the following Nix expression:

{
  hello = "Hello",
  name = {
    firstName = "Sander";
    lastName = "van der Burg";
  };
  numbers = [
    1
    2
    3
    4
    5
  ];
}

In the above example, strings will be translated to strings (and quotes will be escaped if necessary), objects to attribute sets, and the array of numbers to a list of numbers. Furthermore, the generated code is also pretty printed so that attribute set and list members have 2 spaces of indentation.

Similarly, in PHP we can compose the following code fragment to get an identical Nix output:

use PNDP\NixGenerator;

$expr = array(
  "hello" => "Hello",
  "name" => array(
    "firstName" => "Sander",
    "lastName => "van der Burg"
  ),
  "numbers" => array(1, 2, 3, 4, 5)
);

$output = NixGenerator::phpToNix($expr, true);
echo($output);

The PHP generator uses a number of clever tricks to determine whether an array is associative or sequential -- the former gets translated into a Nix attribute set while the latter gets translated into a list.

There are objects in the Nix expression language for which no equivalent exists in the host language. For example, Nix also allows you to define objects of a 'URL' and 'file' type. Neither JavaScript nor PHP have a direct equivalent. Moreover, it may be desired to generate other kinds of language constructs, such as function declarations and function invocations.

To still generate these kinds of objects, you must compose an abstract syntax tree from objects that inherit from the NixObject prototype or class. For example, we can define a function invocation to fetchurl {} in Nixpkgs as follows in JavaScript:

var expr = new nijs.NixFunInvocation({
    funExpr: new nijs.NixExpression("fetchurl"),
    paramExpr: {
        url: new nijs.NixURL("mirror://gnu/hello/hello-2.10.tar.gz"),
        sha256: "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i"
    }
});

and in PHP as follows:

use PNDP\AST\NixExpression;
use PNDP\AST\NixFunInvocation;
use PNDP\AST\NixURL;

$expr = new NixFunInvocation(new NixExpression("fetchurl"), array(
    "url" => new NixURL("mirror://gnu/hello/hello-2.10.tar.gz"),
    "sha256" => "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i"
));

Both of the objects in the above code fragments translate to the following Nix expression:

fetchurl {
  url = mirror://gnu/hello/hello-2.10.tar.gz;
  sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
}

Transforming custom object structures into Nix expressions


The earlier described use cases are basically one-on-one translations from the host language (JavaScript or PHP) to the guest language (Nix). In some cases, literal translations do not make sense -- for example, it may be possible that we already have an application with an existing data model from which we want to derive deployments that should be carried out with Nix.

In the latest versions of NiJS and PNDP, it is also possible to specify how to transform custom object structures into a Nix expression. This can be done by inheriting from the NixASTNode class or prototype and overriding the toNixAST() method.

For example, we may have a system already providing a representation of a file that should be downloaded from an external source:

function HelloSourceModel() {
    this.src = "mirror://gnu/hello/hello-2.10.tar.gz";
    this.sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
}

The above module defines a constructor function composing an object that refers to the GNU Hello package provided by a GNU mirror site.

A direct translation of an object constructed by the above function to the Nix expression language does not provide anything meaningful -- it can, for example, not be used to let Nix fetch the package from the mirror site.

We can inherit from NixASTNode and implement our own custom toNixAST() function to provide a more meaningful Nix translation:

var nijs = require('nijs');
var inherit = require('nijs/lib/ast/util/inherit.js').inherit;

/* HelloSourceModel inherits from NixASTNode */
inherit(nijs.NixASTNode, HelloSourceModel);

/**
 * @see NixASTNode#toNixAST
 */
HelloSourceModel.prototype.toNixAST = function() {
    return this.args.fetchurl()({
        url: new nijs.NixURL(this.src),
        sha256: this.sha256
    });
};

The toNixAST() function shown above composes an abstract syntax tree (AST) for a function invocation to fetchurl {} in the Nix expression language with the url and sha256 properties a parameters.

An object that inherits from the NixASTNode prototype also indirectly inherits from NixObject. This means that we can directly attach such an object to any other AST object. The generator uses the underlying toNixAST() function to automatically convert it to its AST representation:

var helloSource = new HelloSourceModel();
var output = nijs.jsToNix(helloSource, true);
console.log(output);

In the above code fragment, we directly pass the construct HelloSourceModel object instance to the generator. The output will be the following Nix expression:

fetchurl {
  url = mirror://gnu/hello/hello-2.10.tar.gz;
  sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
}

In some cases, it may not be possible to inherit from NixASTNode, for example, when the object already inherits from another prototype or class that is beyond the user's control.

It is also possible to use the NixASTNode constructor function as an adapter. For example, we can take any object with a toNixAST() function:

var helloSourceWrapper = {
    toNixAST: function() {
        return new nijs.NixFunInvocation({
            funExpr: new nijs.NixExpression("fetchurl"),
            paramExpr: {
                url: new nijs.NixURL(this.src),
                sha256: this.sha256
            }
        });
    }
};

By wrapping the helloSourceWrapper object in the NixASTNode constructor, we can convert it to an object that is an instance of NixASTNode:

new nijs.NixASTNode(helloSourceWrapper)

In PHP, we can change any class into a NixASTNode by implementing the NixASTConvertable interface:

use PNDP\AST\NixASTConvertable;
use PNDP\AST\NixURL;

class HelloSourceModel implements NixASTConvertable
{
    /**
     * @see NixASTConvertable::toNixAST()
     */
    public function toNixAST()
    {
        return $this->args->fetchurl(array(
            "url" => new NixURL($this->src),
            "sha256" => $this->sha256
        ));
    }
}

By passing an object that implements the NixASTConvertable interface to the NixASTNode constructor, it can be converted:

new NixASTNode(new HelloSourceModel())

Motivating use case: the node2nix and composer2nix generators


My main motivation to use custom transformations is to improve the quality of the node2nix and composer2nix generators -- the former converts NPM package configurations to Nix expressions and the latter converts PHP composer package configurations to Nix expressions.

Although NiJS and PNDP provide a number of powerful properties to improve the code generation steps of these tools, e.g. I no longer have to think much about escaping strings or pretty printing, there are still many organizational coding issues left. For example, the code that parses the configurations, fetches the external sources, and generates the code are mixed. As a consequence, the code is very hard to read, update, maintain and to ensure its correctness.

The new transformation facilities allow me to separate concerns much better. For example, both generators now have a data model that reflects the NPM and composer problem domain. For example, I could compose the following (simplified) class diagram for node2nix's problem domain:


A crucial part of node2nix's generator is the package class shown on the top left on the diagram. A package requires zero or more packages as dependencies and may provide zero or more packages in the node_modules/ folder residing in the package's base directory.

For readers not familiar with NPM's dependency management: every package can install its dependencies privately in a node_modules/ folder residing in the same base directory. The CommonJS module ensures that every file is considered to be a unique module that should not interfere with other modules. Sharing is accomplished by putting a dependency in a node_modules/ folder of an enclosing parent package.

NPM 2.x always installs a package dependency privately unless a parent package exists that can provide a conforming version. NPM 3.x (and later) will also move a package into the node_modules/ folder hierarchy as high as possible to prevent too many layers of nested node_modules/ folders (this is particularly a problem on Windows). The class structure in the above diagram reflects this kind of dependency organisation.

In addition to a package dependency graph, we also need to obtain package metadata and compute their output hashes. NPM packages originate from various kinds of sources, such as the NPM registry, Git repositories, HTTP sites and local directories on the filesystem.

To optimize the process and support sharing of common sources among packages, we can use a source cache that memorizes all unique source referencess.

The Package::resolveDependencies() method sets the generation process in motion -- it will construct the dependency graph replicating NPM's dependency resolution algorithm as faithfully as possible, and resolves all the dependencies' (and transitive dependencies) metadata.

After resolving all dependencies and their metadata, we must generate the output Nix expressions. One Nix expression is copied (the build infrastructure) and two are generated -- a composition expression and a package or collection expression.

We can also compose a class diagram for the generation infrastructure:


In the above class diagram, every generated expression is represented a class inheriting from NixASTNode. We can also reuse some classes from the domain model as constituents for the generated expressions, by also inheriting from NixASTNode and overriding the toNixAST() method:

  • The source objects can be translated into sub expressions that invoke fetchurl {} and fetchgit {}.
  • The sources cache can be translated into an attribute set exposing all sources that are used as dependencies for packages.
  • A package instance can be converted into a function invocation to nodeenv.buildNodePackage {} that, in addition to configuring build properties, binds the required dependencies to the sources in the sources cache attribute set.

By decomposing the expression into objects and combining the objects' AST representations, we can nicely modularize the generation process.

For composer2nix, we can also compose a class diagram for its domain -- the generation process:


The above class diagram has many similarities, but also some major differences compared to node2nix. composer provides so-called lock files that pinpoint the exact versions of all dependencies and transitive dependencies. As a result, we do not need to replicate composer's dependency resolution algorithm.

Instead, the generation process is driven by the ComposerConfig class that encapsulates the properties of the composer.json and composer.lock files of a package. From a composer configuration, the generator constructs a package object that refers to the package we intend to deploy and populates a source cache with source objects that come from various sources, such as Git, Mercurial and Subversion repositories, Zip files, and directories residing on the local filesystem.

For the generation process, we can adopt a similar strategy that exposes the generated Nix expressions as classes and uses some classes of the domain model as constituents for the generation process:


Discussion


In this blog post, I have described a new feature for the NiJS and PNDP frameworks, making it possible to implement custom transformations. Some of its benefits are that it allows an existing object model to be reused and concerns in an application can be separated much more conveniently.

These facilities are not only useful for the improvement of the architecture of the node2nix and composer2nix generators -- at the company I work for (Conference Compass), we developed our own domain-specific configuration management tool.

Despite the fact that it uses several tools from the Nix project to carry out deployments, it uses a domain model that is not Nix-specific at all. Instead, it uses terminology and an organization that reflects company processes and systems.

For example, we use a backend-for-frontend organization that provides a backend for each mobile application that we ship. We call these backends configurators. Optionally, every configurator can import data from various external sources that we call channels. The tool's data model reflects this kind of organization, and generates Nix expressions that contain all relevant implementation details, if necessary.

Finally, the fact that I modified node2nix to have a much cleaner architecture has another reason beyond quality improvement. Currently, NPM version 5.x (that comes with Node.js 8.x) is still unsupported. To make it work with Nix, we require a slightly different generation process and a completely different builder environment. The new architecture allows me to reuse the common parts much more conveniently. More details about the new NPM support will follow (hopefully) soon.

Availability


I have released new versions of NiJS and PNDP that have the custom transformation facilities included.

Furthermore, I have decided to release new versions for node2nix and composer2nix that use the new generation facilities in their architecture. The improved architecture revealed a very uncommon but nasty bug with bundled dependencies in node2nix, that is now solved.

Friday, March 31, 2017

Subsituting impure version specifiers in node2nix generated package compositions

In a number of previous blog posts, I have described node2nix, a tool that can be used to automatically integrate NPM packages into the Nix packages ecosystem. The biggest challenge in making this integration possible is the fact that NPM does dependency management in addition to build management -- NPM's dependency management properties conflict with Nix's purity principles.

Dealing with a conflicting dependency manager is quite simple from a conceptual perspective -- you must substitute it by a custom implementation that uses Nix to obtain all required dependencies. The remaining responsibilities (such as build management) are left untouched and still have to be carried out by the guest package manager.

Although conceptually simple, implementing such a substitution approach is much more difficult than expected. For example, in my previous blog posts I have described the following techniques:

  • Extracting dependencies. In addition to the package we intend to deploy with Nix, we must also include all its dependencies and transitive dependencies in the generation process.
  • Computing output hashes. In order to make package deployments deterministic, Nix requires that the output hashes of downloads are known in advance. As a result, we must examine all dependencies and compute their corresponding SHA256 output hashes. Some NPM projects have thousands of transitive dependencies that need to be analyzed.
  • Snapshotting versions. Nix uses SHA256 hash codes (derived from all inputs to build a package) to address specific variants or versions of packages whereas version specifiers in NPM package.json configurations are nominal -- they permit version ranges and references to external artifacts (such as Git repositories and external URLs).

    For example, a version range of >= 1.0.3 might resolve to version 1.0.3 today and to version 1.0.4 tomorrow. Translating a version range to a Nix package with a hash code identifier breaks the ability for Nix to guarantee that a package with a specific hash code yields a (nearly) bit identical build.

    To ensure reproducibility, we must snapshot the resolved version of these nominal dependency version specifiers (such as a version range) at generation time and generate the corresponding Nix expression for the resulting snapshot.
  • Simulating shared and private dependencies. In NPM projects, dependencies of a package are stored in the node_modules/ sub folder of the package. Each dependency can have private dependencies by putting them in their corresponding node_modules/ sub folder. Sharing dependencies is also possible by placing the corresponding dependency in any of the parent node_modules/ sub folders.

    Moreover, although this is not explicitly advertised as such, NPM implicitly supports cyclic dependencies and is able cope with them because it will refuse to install a dependency in a node_modules/ sub folder if any parent folder already provides it.

    When generating Nix expressions, we must replicate the exact same behaviour when it comes to private and shared dependencies. This is particularly important to cope with cyclic dependencies -- the Nix package manager does not allow them and we have to break any potential cycles at generation time.
  • Simulating "flat module" installations. In NPM versions older than 3.0, every dependency was installed privately by default unless a shared dependency exists that fits within the required version range.

    In newer NPM versions, this strategy has been reversed -- every dependency will be shared as much as possible until a conflict has been encountered. This means that we have to move dependencies as high up in the node_modules/ folder hierarchy as possible which is an imperative operation -- in Nix this is a problem, because packages are cannot be changed after they have been built.

    To cope with flattening, we must compute the implications of flattening the dependency structure in advance at generation time.

With the above techniques it is possible construct a node_modules/ directory structure having a nearly identical structure that NPM would normally compose with a high degree of accuracy.

Impure version specifiers


Even if it would be possible to reproduce the node_modules/ directory hierarchy with 100% accuracy, there is another problem that remains -- some version specifiers always trigger network communication regardless whether the dependencies have been provided or not, such as:

[
  { "node2nix": "latest" }
, { "nijs": "git+https://github.com/svanderburg/nijs.git#master" }
, { "prom2cb": "github:svanderburg/prom2cb" }
]

When referring to tags or Git branches, NPM is unable to determine to which version a package resolves. As a consequence, it attempts to retrieve the corresponding packages to investigate even when a compatible version in the node_modules/ directory hierarchy already exists.

While performing package builds, Nix takes various precautions to prevent side effects from influencing builds including network connections. As a result, an NPM package deployment will still fail despite the fact that a compatible dependency has already been provided.

In the package builder Nix expression provided by node2nix, I used to substitute these version specifiers in the package.json configuration files by a wildcard: '*'. Wildcards used to work fine for old Node.js 4.x/NPM 2.x installations, but with NPM 3.x flat module installations they became another big source of problems -- in order to make flat module installations work, NPM needs to know to which version a package resolves to determine whether it can be shared on a higher level in the node_modules/ folder hierarchy or not. Wildcards prevent NPM from making these comparisons, and as a result, some package deployments fail that did not use to fail with older versions of NPM.

Pinpointing version specifiers


In the latest node2nix I have solved these issues by implementing a different substitution strategy -- instead of substituting impure version specifiers by wildcards, I pinpoint all the dependencies to the exact version numbers to which these dependencies resolve. Internally, NPM addresses all dependencies by their names and version numbers only (this also has a number of weird implications, because it disregards the origins of these dependencies, but I will not go into detail on that).

I got the inspiration for this pinpointing strategy from the yarn package manager (an alternative to NPM developed by Facebook) -- when deploying a project with yarn, yarn pinpoints the installed dependencies in a so-called yarn.lock file so that package deployments become reproducible when a system is deployed for a second time.

The pinpointing strategy will always prevent NPM from consulting external resources (under the condition that we have provided the package by our substitute dependency manager first) and always provide version numbers for any dependency so that NPM can perform flat module installations. As a result, the accuracy of node2nix with newer versions of NPM has improved quite a bit.

Availability


The pinpointing strategy is part of the latest node2nix that can be obtained from the NPM registry or the Nixpkgs repository.

One month ago, I have given a talk about node2nix at FOSDEM 2017 summarizing the techniques discussed in my blog posts written so far. For convenience, I have embedded the slides into this web page:

Tuesday, September 27, 2016

Simulating NPM global package installations in Nix builds (or: building Grunt projects with the Nix package manager)

A while ago, I "rebranded" my second re-engineered version of npm2nix into node2nix and officially released it as such. My main two reasons for giving the tool a different name is that node2nix is neither a fork nor a continuation of npm2nix, but a tool that is written from scratch (though it incorporates some of npm2nix's ideas and concepts including most of its dependencies).

Furthermore, it approaches the expression generation problem in a fundamentally different way -- whereas npm2nix generates derivations for each package in a dependency tree and composes symlinks to their Nix store paths to allow a package to find its dependencies, node2nix deploys an entire dependency tree in one derivation so that it can more accurately mimic NPM's behaviour including flat-module installations (at the expense of losing the ability to share dependencies among packages and projects).

Because node2nix is conceptually different, I have decided to rename the project so that it can be used alongside the original npm2nix tool that still implements the old generation concepts.

Besides officially releasing node2nix, I have recently extended its feature set with a new concept for a recurring class of NPM development projects.

Global NPM development dependencies


As described in earlier blog posts, node2nix (as well as npm2nix) generate Nix expressions from a set of third party NPM packages (obtained from external sources, such as the NPM registry) or a development project's package.json file.

Although node2nix works fine for most of my development projects, I have noticed that for a recurring class of projects, the auto generation approach is too limited -- some NPM projects may require the presence of globally installed packages and must run additional build steps in order to be deployed properly. A prominent example of such a category of projects are Grunt projects.

Grunt advertises itself as a "The JavaScript Task Runner" and can be used to run all kinds of things, such as code generation, linting, minification etc. The tasks that Grunt carries out are implemented as plugins that must be deployed as a project's development dependencies with the NPM package manager.

(As a sidenote: it is debatable whether Grunt is a tool that NPM developers should use, as NPM itself can also carry out build steps through its script directive, but that discussion is beyond the scope of this blog post).

A Grunt workflow typically looks as follows. Consider an example project, with the following Gruntfile.js:

module.exports = function(grunt) {

  grunt.initConfig({
    jshint: {
      files: ['Gruntfile.js', 'src/**/*.js'],
      options: {
        globals: {
          jQuery: true
        }
      }
    },
    watch: {
      files: ['<%= jshint.files %>'],
      tasks: ['jshint']
    }
  });

  grunt.loadNpmTasks('grunt-contrib-jshint');
  grunt.loadNpmTasks('grunt-contrib-watch');

  grunt.registerTask('default', ['jshint']);

};

The above Gruntfile defines a configuration that iterates over all JavaScript files (*.js files) in the src/ directory and invokes jshint to check for potential errors and code smells.

To deploy the development project, we first have to globally install the grunt-cli command-line utility:

$ npm install -g grunt-cli
$ which grunt
/usr/local/bin/grunt

To be able to carry out the steps, we must update a project's package.json file to have grunt and all its required Grunt plugins as development dependencies:

{
  "name": "grunt-test",
  "version": "0.0.1",
  "private": "true",
  "devDependencies": {
    "grunt": "*",
    "grunt-contrib-jshint": "*",
    "grunt-contrib-watch": "*"
  }
}

Then we must install the development dependencies with the NPM package manager:

$ npm install

And finally, we can run the grunt command-line utility to execute our tasks:

$ grunt

"Global dependencies" in Nix


Contrary to NPM, Nix does not support "global dependencies". As a matter of fact, it takes all kinds of precautions to prevent global dependencies to influence the builds that it performs, such as storing all packages in isolation in a so-called Nix store (e.g. /nix/store--grunt-1.0.1 as opposed to storing files in global directories, such as /usr/lib), initially clearing all environment variables (e.g. PATH) and setting these to the Nix store paths of the provided dependencies to allow packages to find them, running builds in chroot environments, etc.

These precautions are taken for a very good reason: purity -- each Nix package stored in the Nix store has a hash prefix that is computed from all involved build-time dependencies.

With pure builds, we know that (for example) if we encounter a build performed on one machine with a specific hash code and a build with an identical hash code on another machine, their build results are identical as well (with some caveats, but in general there are no observable side effects). Pure package builds are a crucial ingredient to make deployments of systems reliable and reproducible.

In Nix, we must always be explicit about the dependencies of a build. When a dependency is unspecified (something that commonly happens with global dependencies), a build will typically fail because it cannot be (implicitly) found. Similarly, when a build has dependencies on packages that would normally have to be installed globally (e.g. non-NPM dependencies), we must now explicitly provide them as a build inputs.

The problem with node2nix is that it automatically generates Nix expressions and that global dependencies cannot be detected automatically, because they are not specified anywhere in a package.json configuration file.

To cope with this limitation, the generated Nix expressions are made overridable, so that any missing dependency can be provided manually. For example, we may want to deploy an NPM package named floomatic from the following JSON file (node-packages.json):

[
  "floomatic"
]

We can generate Nix expressions from the above specification, by running:

$ node2nix -i node-packages.json

One of floomatic's dependencies is an NPM package named: native-diff-match-patch that requires the Qt 4.x library and pkgconfig. These two packages are non-NPM package dependencies left undetected by the node2nix generator. In conventional Linux distributions, these packages typically reside in global directories, such as /usr/lib, and can still be implicitly found.

By creating an override expression (named: override.nix), we can inject these missing (global) dependencies ourselves:

{pkgs ? import <nixpkgs> {
    inherit system;
}, system ? builtins.currentSystem}:

let
  nodePackages = import ./default.nix {
    inherit pkgs system;
  };
in
nodePackages // {
  floomatic = nodePackages.floomatic.override (oldAttrs: {
    buildInputs = oldAttrs.buildInputs ++ [
      pkgs.pkgconfig
      pkgs.qt4
    ];
  });
}

With the override expression shown above, we can correctly deploy the floomatic package, by running:

$ nix-build override.nix -A floomatic

Providing supplemental NPM packages to an NPM development project


Similar to non-NPM dependencies, we also need to supply the grunt-cli as an additional dependency to allow a Grunt project build to succeed in a Nix build environment. What makes this process difficult is that grunt-cli is also an NPM package. As a consequence, we need to generate a second set of Nix expressions and propagate their generated package configurations as parameters to the former expression. Although it was already possible to do this, because the Nix language is flexible enough, the process is quite complex, hacky and inconvenient.

In the latest node2nix version, I have automated this workflow -- when generating expressions for a development project, it is now also possible to provide a supplemental package specification. For example, for our trivial Grunt project, we can create the following supplemental JSON file (supplement.json) that provides the grunt-cli:

[
  "grunt-cli"
]

We can generate Nix expressions for the development project and supplemental package set, by running:

$ node2nix -d -i package.json --supplement-input supplement.json

Besides providing the grunt-cli as an additional dependency, we also need to run grunt after obtaining all NPM dependencies. With the following wrapper expression (override.nix), we can run the Grunt task runner after all NPM packages have been successfully deployed:

{ pkgs ? import <nixpkgs> {}
, system ? builtins.currentSystem
}:

let
  nodePackages = import ./default.nix {
    inherit pkgs system;
  };
in
nodePackages // {
  package = nodePackages.package.override {
    postInstall = "grunt";
  };
}

As may be observed in the expression shown above, the postInstall hook is responsible for invoking the grunt command.

With the following command-line instruction, we can use the Nix package manager to deploy our Grunt project:

$ nix-build override.nix -A package

Conclusion


In this blog post, I have explained a recurring limitation of node2nix that makes it difficult to deploy projects having dependencies on NPM packages that (in conventional Linux distributions) are typically installed in global file system locations, such as grunt-cli. Furthermore, I have described a new node2nix feature that provides a solution to this problem.

In addition to Grunt projects, the solution described in this blog post is relevant for other tools as well, such as ESLint.

All features described in this blog post are part of the latest node2nix release (version 1.1.0) that can be obtained from the NPM registry and the Nixpkgs collection.

Besides a new release, node2nix is now also used to generate the expressions for the set of NPM packages included with the development and upcoming 16.09 versions of Nixpkgs.

Monday, February 29, 2016

Managing NPM flat module installations in a Nix build environment

Some time ago, I have reengineered npm2nix and described some of its underlying concepts in a blog post. In the reengineered version, I have ported the implementation from CoffeeScript to JavaScript, refactored/modularized the code, and I have been improving the implementation to more accurately simulate NPM's dependency organization, including many of its odd traits.

I have observed that in the latest Node.js (the 5.x series) NPM's behaviour has changed significantly. To cope with this, I did yet another major reengineering effort. In this blog post, I will describe the path that has lead to the latest implementation.

The first attempt


Getting a few commonly used NPM packages deployed with Nix is not particularly challenging, but to make it work completely right turns out to be quite difficult -- the early npm2nix implementations generated Nix expressions that build every package and all of its dependencies in separate derivations (in other words: each package and dependency translates to a separate Nix store path). To allow a package to find its dependencies, the build script creates a node_modules/ sub folder containing symlinks that refer to the Nix store paths of the packages that it requires.

NPM packages have loose dependency specifiers, e.g. wildcards and version ranges, whereas Nix package dependencies are exact, i.e. they bind to packages that are identified by unique hash codes derived from all build time dependencies. npm2nix makes this translation by "snapshotting" the latest conforming version and turning that into into a Nix package.

For example, one my own software projects (NiJS) has the following package configuration file:

{
  "name" : "nijs",
  "version" : "0.0.23",
  "dependencies" : {
    "optparse" : ">= 1.0.3",
    "slasp": "0.0.4"
  }
  ...
}

The package configuration states that it requires optparse version 1.0.3 or higher, and slasp version 0.0.4. Running npm install results in the following directory structure of dependencies:

nijs/
  ...
  package.json
  node_modules/
    optparse/
      package.json
      ...
    slasp/
      package.json
      ...

A node_modules/ folder gets created in which each sub directory represents an NPM package that is a dependency of NiJS. In the older versions of npm2nix, it gets translated as follows:

/nix/store/ab12pq...-nijs-0.0.24/
  ...
  package.json
  node_modules/
    optparse -> /nix/store/4pq1db...-optparse-1.0.5
    slasp -> /nix/store/8j12qp...-slasp-0.0.4
/nix/store/4pq1db...-optparse-1.0.5/
  ...
  package.json
/nix/store/8j12qp...-slasp-0.0.4/
  ...
  package.json

Each involved package is stored in its own private folder in the Nix store. The NiJS package has a node_modules/ folder containing symlinks to its dependencies. For many packages, this approach works well enough, as it at least provides a conforming version for each dependency that it requires.

Unfortunately, it is possible to run into oddities as well. For example, a package that does not work properly in such a model is ironhorse.

For example, we could declare mongoose and ironhorse dependencies of a project:

{
  "name": "myproject",
  "version": "0.0.1",
  "dependencies": {
    "mongoose": "3.8.5",
    "ironhorse": "0.0.11"
  }
}

Ironhorse has an overlapping dependency with the project's dependencies -- it also depends on mongoose, as shown in the following package configuration:

{
  "name": "ironhorse",
  "version": "0.0.11",
  "license" : "MIT",
  "dependencies" : {
    "underscore": "~1.5.2",
    "mongoose": "*",
    "temp": "*",
    ...
  },
  ...
}

Running 'npm install' on project level yields the following directory structure:

myproject/
  ...
  package.json
  node_modules/
    mongoose/
      ...
    ironhorse/
      ...
      package.json
      node_modules/
        underscore/
        temp/

Note that the mongoose only appears one time in the hierarchy of node_modules/ folders despite that it has been declared as a dependency twice.

In contrast, when using an older version of npm2nix, the following directory structure gets generated:

/nix/store/67ab07...-myproject-0.0.1
  ...
  package.json
  node_modules/
    mongoose -> /nix/store/ec704c...-mongoose-3.8.5
    ironhorse -> /nix/store/3ee85e...-ironhorse-0.0.11
/nix/store/3ee85e...-ironhorse-0.0.11
  ...
  package.json
  node_modules/
    underscore -> /nix/store/10af96...-underscore-1.5.2
    mongoose -> /nix/store/a37f75...-mongoose-4.4.5
    temp -> /nix/store/fae379...-temp-0.8.3
/nix/store/ec704c...-mongoose-3.8.5
  package.json
  ...
/nix/store/a37f75...-mongoose-4.4.5
  package.json
  ...
/nix/store/10af96...-underscore-1.5.2
/nix/store/fae379...-temp-0.8.3

In the above directory structure, we can observe that two different versions of mongoose have been deployed -- version 3.8.5 (as a dependency for the project) and version 4.4.5 (as a dependency for ironhorse). Having two different versions of mongoose deployed typically leads to problems.

The reason why npm2nix produces a different result is because whenever NPM encounters a dependency specification, it recursively searches the parent directories to find a conforming version. If a conforming version has been found that fits within the version range of a package dependency, it will not be included again. This is also the reason why NPM can "handle" cyclic dependencies (despite the fact that they are a bad practice) -- when a dependency has been encountered a second time, it will not be deployed again causing NPM to break the cycle.

npm2nix did not implement this kind behaviour -- it always binds a dependency to the latest conforming version, but as can be observed in the last example, this is not what NPM always does -- it could also bind to a shared dependency that may be older than the latest version in the NPM registry (As a sidenote: I wonder how many NPM users actually know about this detail!).

Second attempt: simulating shared dependency behaviour


One of the main objectives in the reengineered version (as described in my previous blog post), is to more accurately mimic NPM's shared dependency behaviour, as the old behaviour was particularly problematic for packages having cyclic dependencies -- Nix does not allow them and causes the evaluation of the entire Nixpkgs set on the Hydra build server to fail as a result.

The reengineered version worked, but the solution was quite expensive and controversial -- I compose Nix expressions of all packages involved, in which each dependency resolves to the latest corresponding version.

Each time a package includes a dependency, I propagate an attribute set to its build function telling it which dependencies have already been resolved by any of the parents. Resolved dependencies get excluded as a dependency.

To check whether a resolved dependency fits within a version range specifier, I have to consult semver. Because semver is unsupported in the Nix expression language, I use a trick in which I import Nix expressions generated by a build script (that invokes the semver command-line utility) to figure out which dependencies have been resolved already.

Besides consulting semver, I used another hack -- packages that have been resolved by any of the parents must be excluded as a dependency. However, NPM packages in Nix are deployed independently from each other in separate build functions and will fail because NPM expects them to present. To solve this problem, I create shims for the excluded packages, by substituting them by empty packages with the same name and version, and removing them after the package has been built.

Symlinking the dependencies also no longer worked reliably -- the CommonJS module system dereferences the location of the includer first and looks in the parent directories for shared dependencies relative from there. This means in case of a symlink, it incorrectly resolves to a Nix store path that has no meaningful parent directories. The only solution I could think of is copying dependencies instead of symlinking them.

To summarize: the new solution worked more accurately than the original version (and can cope with cyclic dependencies) but it is quite inefficient as well -- making copies of dependencies causes a lot of duplication (that would be a waste of disk space) and building Nix expressions in the instantiation phase makes the process quite slow.

Third attempt: computing the dependency graph ahead of time


Apart from the earlier described inefficiencies, the main reason that I had to do yet another major revision is that Node.js 5.x (that includes npm 3.x) executes so-called "flat-module installations. The idea is that when a package includes a dependency, it will be stored in a node_modules/ folder as high in the directory structure as possible without breaking any dependencies.

This new approach has a number of implications. For example, deploying the Disnix virtual hosts test web application with the old npm 2.x used to yield the following directory structure:

webapp/
  ...
  package.json
  node_modules/
    express/
      ...
      package.json
      node_modules/
        accepts/
        array-flatten/
        content-disposition/
        ...
    ejs/
      ...
      package.json

As can be observed in the structure above, the test webapp depends on two packages: express and ejs. Express has dependencies of its own, such as accepts, array-flatten, content-disposition. Because no parent node_modules/ folder provides them, they are included privately for the express package.

Running 'npm install' with the new npm 3.x yields the following directory structure:

webapp/
  ...
  package.json
  node_modules/
    accepts/
    array-flatten/
    content-disposition/
    express/
      ...
      package.json
    ejs/
      ...
      package.json

Since the libraries that express requires do not conflict with the includer's dependencies, they have been moved one level up to the parent package's node_modules/ folder.

Flattening the directory structure makes deploying a NPM project even more imperative -- previously, the dependencies that were included with a package depend on the state of the includer. Now we must also modify the entire directory hierarchy of dependencies by moving packages up in the directory structure. It also makes the resulting dependency graph less predictable. For example, the order in which dependencies are installed matters -- unless all dependencies are discarded and reinstalled from scratch, it may result in different kinds of dependency graphs.

If this flat module approach has all kinds of oddities, why would NPM uses such an approach, you may wonder? It turns out that the only reason is: better Windows support. On Windows, there is a limit on the length on paths and flattening the directory structure helps to prevent hitting it. Unfortunately, it comes at the price of making deployments more imperative and less predictable.

To simulate this flattening strategy, I had to revise npm2nix again. Because of its previous drawbacks and the fact that we have to perform even more imperative operations, I have decided to implement a new strategy in which I compute the entire dependency graph ahead of time by the generator, instead of hacking it into the evaluation phase of the Nix expressions.

Supporting private and shared dependencies works exactly the same as in the old implementation, but is now performed ahead of time. Additionally, I simulate the flat dependency structure as follows:

  • When a package requires a dependency: I check whether the parent directory has a conflicting dependency. This means: it either has a dependency bundled with the same name and a different version or indirectly binds to another parent that provides a conflicting version.
  • If the dependency conflicts: bundle the dependency in the current package.
  • If the dependency does not conflict: bind the package to the dependency (but do not include it) and consult the parent package one level higher.

Besides computing the dependency graph ahead of time, I also deploy the entire dependency graph in one Nix build function -- because including dependencies is stateful, it no longer makes sense to build them as individual Nix packages, that are supposed to be pure.

I have made the flattening algorithm optional. By default, the new npm2nix generates Nix expressions for Node.js 4.x (using the old npm 2.x) release:

$ npm2nix

By appending the -5 parameter, it generates Nix expressions for usage with Node.js 5.x (using the new npm 3.x with flat module installations):

$ npm2nix -5

I have tested the new approach on many packages including my public projects. The good news is: they all seem to work!

Unfortunately, despite the fact that I could get many packages working, the approach is not perfect and hard to get 100% right. For example, in a private project I have encountered bundled dependencies (dependencies that are statically included with a package). NPM also moves them up, while npm2nix merely generates an expression composing the dependency graph (that reflects flat module installations as much as possible). To fix this issue, we must also run a post processing step that moves dependencies up that are in the wrong places. Currently, this step has not been implemented yet in npm2nix.

Another issue is that we want Nix to obtain all dependencies instead of NPM. To prevent NPM from consulting external resources, we substitute some version specifiers (such as Git repositories) by a wildcard: *. These version specifiers sometimes confuse NPM, despite the fact that the directory structure matches NPM's dependency structure.

To cope with these imperfections, I have also added an option to npm2nix to refrain it from running npm install -- in many cases, packages still work fine despite NPM being confused. Moreover, the npm install step in the Nix builder environment merely serves as a validation step -- the Nix builder script is responsible for actually providing the dependencies.

Discussion


In this blog post, I have described the path that has lead to a second reengineered version of npm2nix. The new version computes dependency graphs ahead of time and can mostly handle npm 3.x's flat module installations. Moreover, compared to the previous version, it does no longer rely on very expensive and ugly hacks.

Despite the fact that I can now more or less handle flat installations, I am still not quite happy. Some things that bug me are:

  • The habit of "reusing" modules that have been bundled with any of the includers, makes it IMO difficult and counter-intuitive to predict which version will actually be used in a certain context. In some cases, packagers might expect that the latest version of a version range will be used, but this is not guaranteed to be the case. This could, for example, reintroduce security and stability issues without end users noticing (or expecting) it.
  • Flat module installations are less deterministic and make it really difficult to predict what a dependency graph looks like -- the dependencies that appear at a certain level in the directory structure depend on the order in which dependencies are installed. Therefore, I do not consider this an improvement over npm 2.x.

Because of these drawbacks, I expect that NPM will reconsider some of its concepts again in the future causing npm2nix to break again.

I would recommend the NPM developers to use the following approach:

  • All involved packages should be stored in a single node_modules/ folder instead of multiple nested hierarchies of node_modules/ folders.
  • When a module requests another module, the module loader should consult the package.json configuration file of the package where the includer module belongs to. It should take the latest conforming version in the central node_modules/ folder. I consider taking the last version of a version range to be less counter-intuitive than taking any conforming version.
  • To be able to store multiple versions of packages in a single node_modules/ folder, a better directory naming convention should be adopted. Currently, NPM only identifies modules by name in a node_modules/ folder, making it impossible to store two versions next to each other in one directory.

    If they would, for example, use both the name and version numbers in the directory names, more things are possible. Adding more properties in the path names makes sharing even better -- for example, a package with a name and version number could originate from various sources, e.g. the NPM registry or a Git repository -- reflecting this in the path makes it possible to store more variants next to each other in a reliable way.

    Naming things to improve shareability is not really rocket science -- Nix uses hash codes (that are derived from all build-time dependencies) to uniquely identify packages and the .NET Global Assembly Cache uses so-called strong names that include various naming attributes, such as cryptographic keys to ensure that no library conflicts. I am convinced that adopting a better naming convention for storing NPM packages would be quite beneficial as well.
  • To cope with cyclic dependencies: I would simply say that it suffices to disallow them. Packages are supposed to be units of reuse, and if two packages mutually depend on each other, then they should be combined into one package.

Availability


The second reengineered npm2nix version can be obtained from my GitHub page. The code resides in the reengineering2 branch.