Friday, March 31, 2017

Substituting impure version specifiers in node2nix generated package compositions

In a number of previous blog posts, I have described node2nix, a tool that can be used to automatically integrate NPM packages into the Nix packages ecosystem. The biggest challenge in making this integration possible is the fact that NPM does dependency management in addition to build management -- NPM's dependency management properties conflict with Nix's purity principles.

Dealing with a conflicting dependency manager is quite simple from a conceptual perspective -- you must substitute it with a custom implementation that uses Nix to obtain all required dependencies. The remaining responsibilities (such as build management) are left untouched and still have to be carried out by the guest package manager.

Although conceptually simple, implementing such a substitution approach is much more difficult than expected. For example, in my previous blog posts I have described the following techniques:

  • Extracting dependencies. In addition to the package we intend to deploy with Nix, we must also include all its dependencies and transitive dependencies in the generation process.
  • Computing output hashes. In order to make package deployments deterministic, Nix requires that the output hashes of downloads are known in advance. As a result, we must examine all dependencies and compute their corresponding SHA256 output hashes. Some NPM projects have thousands of transitive dependencies that need to be analyzed.
  • Snapshotting versions. Nix uses SHA256 hash codes (derived from all inputs to build a package) to address specific variants or versions of packages whereas version specifiers in NPM package.json configurations are nominal -- they permit version ranges and references to external artifacts (such as Git repositories and external URLs).

    For example, a version range of >= 1.0.3 might resolve to version 1.0.3 today and to version 1.0.4 tomorrow. Translating a version range to a Nix package with a hash code identifier breaks the ability for Nix to guarantee that a package with a specific hash code yields a (nearly) bit identical build.

    To ensure reproducibility, we must snapshot the resolved version of these nominal dependency version specifiers (such as a version range) at generation time and generate the corresponding Nix expression for the resulting snapshot.
  • Simulating shared and private dependencies. In NPM projects, dependencies of a package are stored in the node_modules/ sub folder of the package. Each dependency can have private dependencies by putting them in their corresponding node_modules/ sub folder. Sharing dependencies is also possible by placing the corresponding dependency in any of the parent node_modules/ sub folders.

    Moreover, although this is not explicitly advertised as such, NPM implicitly supports cyclic dependencies and is able to cope with them because it will refuse to install a dependency in a node_modules/ sub folder if any parent folder already provides it.

    When generating Nix expressions, we must replicate the exact same behaviour when it comes to private and shared dependencies. This is particularly important to cope with cyclic dependencies -- the Nix package manager does not allow them and we have to break any potential cycles at generation time.
  • Simulating "flat module" installations. In NPM versions older than 3.0, every dependency was installed privately by default, unless a shared dependency existed that fit within the required version range.

    In newer NPM versions, this strategy has been reversed -- every dependency will be shared as much as possible until a conflict has been encountered. This means that we have to move dependencies as high up in the node_modules/ folder hierarchy as possible, which is an imperative operation -- in Nix this is a problem, because packages cannot be changed after they have been built.

    To cope with flattening, we must compute the implications of flattening the dependency structure in advance at generation time.

With the above techniques, it is possible to construct a node_modules/ directory structure that is nearly identical to the one that NPM would normally compose.

Impure version specifiers


Even if it were possible to reproduce the node_modules/ directory hierarchy with 100% accuracy, another problem remains -- some version specifiers always trigger network communication, regardless of whether the dependencies have been provided or not, such as:

[
  { "node2nix": "latest" }
, { "nijs": "git+https://github.com/svanderburg/nijs.git#master" }
, { "prom2cb": "github:svanderburg/prom2cb" }
]

When referring to tags or Git branches, NPM is unable to determine to which version a package resolves. As a consequence, it attempts to retrieve the corresponding packages to investigate them, even when a compatible version already exists in the node_modules/ directory hierarchy.

While performing package builds, Nix takes various precautions to prevent side effects from influencing builds including network connections. As a result, an NPM package deployment will still fail despite the fact that a compatible dependency has already been provided.

In the package builder Nix expression provided by node2nix, I used to substitute these version specifiers in the package.json configuration files by a wildcard: '*'. Wildcards used to work fine for old Node.js 4.x/NPM 2.x installations, but with NPM 3.x flat module installations they became another big source of problems -- in order to make flat module installations work, NPM needs to know to which version a package resolves, to determine whether it can be shared on a higher level in the node_modules/ folder hierarchy or not. Wildcards prevent NPM from making these comparisons and, as a result, some package deployments that used to work with older versions of NPM now fail.

Pinpointing version specifiers


In the latest node2nix, I have solved these issues by implementing a different substitution strategy -- instead of substituting impure version specifiers by wildcards, I pinpoint all dependencies to the exact version numbers to which they resolve. Internally, NPM addresses all dependencies by their names and version numbers only (this also has a number of weird implications, because it disregards the origins of these dependencies, but I will not go into detail on that).
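The pinpointing idea can be sketched as follows (pinpoint and the version numbers shown are illustrative assumptions, not node2nix's actual implementation; real version ranges would be matched with a semver library):

```javascript
// At generation time, every nominal version specifier (a range,
// "latest", a Git URL, ...) is replaced by the exact version number
// that the generator snapshotted for that dependency.
function pinpoint(dependencies, resolved) {
    // resolved maps a package name to its snapshotted exact version
    const result = {};
    for (const name of Object.keys(dependencies)) {
        // Substituting the exact version means NPM never needs to
        // consult the network and can always compare versions when
        // deciding whether a dependency can be shared.
        result[name] = resolved[name];
    }
    return result;
}

const deps = {
    node2nix: "latest",
    nijs: "git+https://github.com/svanderburg/nijs.git#master"
};
console.log(pinpoint(deps, { node2nix: "1.2.0", nijs: "0.0.25" }));
```

The versions in the resolved map are hypothetical; in reality they are determined while the generator fetches and analyzes each dependency.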

I got the inspiration for this pinpointing strategy from the yarn package manager (an alternative to NPM developed by Facebook) -- when deploying a project with yarn, yarn pinpoints the installed dependencies in a so-called yarn.lock file so that package deployments become reproducible when a system is deployed for a second time.

The pinpointing strategy always prevents NPM from consulting external resources (provided that our substitute dependency manager has supplied the package first) and always provides version numbers for any dependency, so that NPM can perform flat module installations. As a result, the accuracy of node2nix with newer versions of NPM has improved quite a bit.

Availability


The pinpointing strategy is part of the latest node2nix that can be obtained from the NPM registry or the Nixpkgs repository.

One month ago, I gave a talk about node2nix at FOSDEM 2017 summarizing the techniques discussed in my blog posts written so far. For convenience, I have embedded the slides into this web page:

Tuesday, March 14, 2017

Reconstructing Disnix deployment configurations

In two earlier blog posts, I have described Dynamic Disnix, an experimental framework enabling self-adaptive redeployment on top of Disnix. The purpose of this framework is to redeploy a service-oriented system whenever the conditions of the environment change, so that the system can still meet its functional and non-functional requirements.

An important category of events that change the environment are machines that crash and disappear from the network -- when a disappearing machine used to host a crucial service, a system can no longer meet its functional requirements. Fortunately, Dynamic Disnix is capable of automatically responding to such events by deploying the missing components elsewhere.

Although Dynamic Disnix supports the recovery of missing services, there is one particular kind of failure I did not take into account. In addition to the potentially crashing target machines that host the services of which a service-oriented system consists, the coordinator machine that initiates the deployment process and stores the deployment state could also disappear. When the deployment state gets lost, it is no longer possible to reliably update the system.

In this blog post, I will describe a new addition to the Disnix toolset that can be used to cope with these kinds of failures by reconstructing a coordinator machine's deployment configuration from the meta data stored on the target machines.

The Disnix upgrade workflow


As explained in earlier blog posts, Disnix requires three kinds of deployment models to carry out a deployment process: a services model capturing the components of which a system consists, an infrastructure model describing the available target machines and their properties, and a distribution model mapping services in the services model to target machines in the infrastructure model. By writing instances of these three models and running the following command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Disnix will carry out all activities necessary to deploy the system: building the services and their intra-dependencies from source code, distributing the services and their intra-dependencies, and activating all services in the right order.

When changing any of the models and running the same command-line instruction again, Disnix attempts to upgrade the system by only rebuilding the aspects that have changed, and only deactivating the obsolete services and activating new services.

Disnix (as well as other Nix-related tools) attempts to optimize a redeployment process by only executing the steps that are required to reach a new deployment state. In Disnix, the building and distribution steps are optimized due to the fact that every package is stored in isolation in the Nix store, in which each package has a unique filename with a hash prefix, such as:

/nix/store/acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l-firefox-48.0.2

As explained in a number of earlier blog posts, the hash prefix (acv1y1zf7w0i6jx02kfa6gxyn2kfwj3l...) is derived from all inputs used to build the package, including its source code, build script, and the libraries that it links to. This means, for example, that if we upgrade a system and none of the inputs of Firefox change, we get an identical hash, and if such a package build already exists, we do not have to build it again or transfer it from an external site.

The building step in Disnix produces a so-called low-level manifest file that is used by tools executing the remaining deployment activities:

<?xml version="1.0"?>
<manifest version="1">
  <distribution>
    <mapping>
      <profile>/nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1</profile>
      <target>test1</target>
    </mapping>
  </distribution>
  <activation>
    <mapping>
      <dependsOn>
        <dependency>
          <target>test1</target>
          <container>process</container>
          <key>d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d</key>
        </dependency>
      </dependsOn>
      <name>nginx</name>
      <service>/nix/store/aa5hn5n1pg2qbb7i8skr6vkgpnsjhlns-nginx-wrapper</service>
      <target>test1</target>
      <container>wrapper</container>
      <type>wrapper</type>
      <key>da8c3879ccf1b0ae34a952f36b0630d47211d7f9d185a8f2362fa001652a9753</key>
    </mapping>
  </activation>
  <targets>
    <target>
      <properties>
        <hostname>test1</hostname>
      </properties>
      <containers>
        <mongo-database/>
        <process/>
        <wrapper/>
      </containers>
      <system>x86_64-linux</system>
      <numOfCores>1</numOfCores>
      <clientInterface>disnix-ssh-client</clientInterface>
      <targetProperty>hostname</targetProperty>
    </target>
  </targets>
</manifest>

The above manifest file contains the following kinds of information:

  • The distribution element section maps Nix profiles (containing references to all packages implementing the services deployed to the machine) to target machines in the network. This information is used by the distribution step to transfer packages from the coordinator machine to a target machine.
  • The activation element section contains elements specifying which service to activate on which machine in the network, including other properties relevant to the activation, such as the type plugin that needs to be invoked to take care of the activation process. This information is used by the activation step.
  • The targets section contains properties of the machines in the network and is used by all tools that carry out remote deployment steps.
  • There is also an optional snapshots section (not shown in the code fragment above) that contains the properties of services whose state needs to be snapshotted, transferred and restored in case their location changes.

When a Disnix (re)deployment process completes successfully, Disnix stores the above manifest as a Disnix coordinator Nix profile on the coordinator machine for future reference, with the purpose of optimizing the successive upgrade step -- when redeploying a system, Disnix compares the generated manifest with the previously deployed instance, and only deactivates services that have become obsolete and activates services that are new, making upgrades more efficient than fresh installations.
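The comparison of the two manifests boils down to a set difference over the service mappings. The following sketch illustrates the idea (in JavaScript for brevity; this is not Disnix's actual implementation):

```javascript
// Services are identified by their keys: hash codes computed over all
// their properties, as shown in the manifest's activation section.
function diffManifests(oldServices, newServices) {
    const oldKeys = new Set(oldServices);
    const newKeys = new Set(newServices);
    return {
        // services in the old manifest but not the new one are obsolete
        deactivate: oldServices.filter(key => !newKeys.has(key)),
        // services in the new manifest but not the old one must be activated
        activate: newServices.filter(key => !oldKeys.has(key))
    };
}

console.log(diffManifests(["a", "b"], ["b", "c"]));
```

Services appearing in both manifests are left untouched, which is what makes an upgrade cheaper than a fresh installation.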

Unfortunately, when the coordinator machine storing the manifests gets lost, the deployment manifests get lost with it. As a result, a system can no longer be reliably upgraded -- without deactivating obsolete services, newly deployed services may conflict with services that are already running on the target machines, preventing the system from working properly.

Reconstructible manifests


Recently, I have modified Disnix in such a way that the deployment manifests on the coordinator machine can be reconstructed. Each Nix profile that Disnix distributes to a target machine includes a so-called profile manifest file, e.g. /nix/store/aiawhpk5irpjqj25kh6ah6pqfvaifm53-test1/manifest. Previously, this file only contained the Nix store paths to the deployed services and was primarily used by the disnix-query tool to display the installed set of services per machine.

In the latest Disnix, I have changed the format of the profile manifest file to contain all required meta data, so that the activation mappings can be reconstructed on the coordinator machine:

stafftracker
/nix/store/mi7dn2wvwvpgdj7h8xpvyb04d1nycriy-stafftracker-wrapper
process
process
d500194f55ce2096487c6d2cf69fd94a0d9b1340361ea76fb8b289c39cdc202d
false
[{ target = "test2"; container = "process"; _key = "4827dfcde5497466b5d218edcd3326327a4174f2b23fd3c9956e664e2386a080"; } { target = "test2"; container = "process"; _key = "b629e50900fe8637c4d3ddf8e37fc5420f2f08a9ecd476648274da63f9e1ebcc"; } { target = "test1"; container = "process"; _key = "d85ba27c57ba626fa63be2520fee356570626674c5635435d9768cf7da943aa3"; }]

The above code fragment shows a portion of the profile manifest. It has a line-oriented structure in which every 7 lines represent the properties of a deployed service. The first line denotes the name of the service, the second line the Nix store path, the third line the Dysnomia container, the fourth line the Dysnomia type, the fifth line the hash code derived from all its properties, the sixth line whether the attached state must be managed by Disnix, and the seventh line an encoding of the inter-dependencies.
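A parser for this line-oriented format can be sketched as follows (an illustration in JavaScript; the Disnix toolset's actual parser is not implemented this way):

```javascript
// Parse a profile manifest in which every group of 7 lines describes
// one deployed service, in the order documented above.
function parseProfileManifest(text) {
    const lines = text.trim().split('\n');
    const services = [];
    for (let i = 0; i + 6 < lines.length; i += 7) {
        services.push({
            name: lines[i],
            service: lines[i + 1],    // Nix store path
            container: lines[i + 2],  // Dysnomia container
            type: lines[i + 3],       // Dysnomia type
            key: lines[i + 4],        // hash over all properties
            stateful: lines[i + 5] === 'true',
            dependsOn: lines[i + 6]   // encoded inter-dependencies
        });
    }
    return services;
}
```

From these records, the activation section of a deployment manifest can be regenerated for each target machine.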

The other portions of the deployment manifest can be reconstructed as follows: the distribution section can be derived by querying the Nix store paths of the installed profiles on the target machines, the snapshots section by checking which services have been marked as stateful and the targets section can be directly derived from a provided infrastructure model.

With the augmented data in the profile manifests on the target machines, I have developed a tool named disnix-reconstruct that can reconstruct a deployment manifest from all the meta data the manifests on the target machines provide.

I can now, for example, delete all the deployment manifest generations on the coordinator machine:

$ rm /nix/var/nix/profiles/per-user/sander/disnix-coordinator/*

and reconstruct the latest deployment manifest, by running:

$ disnix-reconstruct infrastructure.nix

The above command resolves the full paths to the Nix profiles on the target machines, then downloads their intra-dependency closures to the coordinator machine, reconstructs the deployment manifest from the profile manifests and finally installs the generated deployment manifest.

If the above command succeeds, then we can reliably upgrade a system again with the usual command-line instruction:

$ disnix-env -s services.nix -i infrastructure.nix -d distribution.nix

Extending the self-adaptive deployment framework


In addition to reconstructing deployment manifests that have gone missing, disnix-reconstruct offers another benefit -- the self-adaptive redeployment framework described in the two earlier blog posts is capable of responding to various kinds of events, including redeploying services to other machines when a machine crashes and disappears from the network.

However, when a machine disappears from the network and reappears at a later point in time, Disnix no longer knows about its configuration. When such a machine reappears in the network, this could have disastrous results.

Fortunately, by adding disnix-reconstruct to the framework we can solve this issue:


As shown in the above diagram, whenever a change in the infrastructure is detected, we reconstruct the deployment manifest so that Disnix knows which services are deployed to the machines that have reappeared. Then, when the system is being redeployed, the services on the reappearing machines can also be upgraded or undeployed completely, if needed.

The automatic reconstruction feature can be used by providing the --reconstruct parameter to the self-adapt tool:

$ dydisnix-self-adapt -s services.nix -i infrastructure.nix -q qos.nix \
  --reconstruct

Conclusion


In this blog post, I have described the latest addition to Disnix: disnix-reconstruct, a tool that can be used to reconstruct the deployment manifest on the coordinator machine from the meta data stored on the target machines. With this addition, we can still update systems if the coordinator machine gets lost.

Furthermore, we can use this addition in the self-adaptive deployment framework to deal with reappearing machines that already have services deployed to them.

Finally, besides developing disnix-reconstruct, I have reached another stable point. As a result, I have decided to release Disnix 0.7. Consult the Disnix homepage for more information.

Sunday, February 12, 2017

MVC lessons in Titanium/Alloy

A while ago, I ported the simple-xmpp library from the Node.js ecosystem to Appcelerator Titanium to enrich our company's product line with chat functionality. In addition, I created a bare bones example app that exposes most of the library's features.



Although I am not doing that much front-end development these days, nor do I consider myself a Titanium guru, I have observed that it is quite challenging to keep your app's code and organization clean.

In this blog post, I will report on my development experiences and describe the architecture that I have derived for the example chat application.

The Model-View-Controller (MVC) architectural pattern


Keeping the code of an end-user application sane is not unique to mobile applications or a specific framework, such as Titanium -- it basically applies to any system with a graphical user interface including desktop applications and web applications.

When diving into the literature, or just by searching on the Internet, you will most likely stumble upon a very common "solution" -- the Model-View-Controller (MVC) architectural pattern, which can be used as a means to keep your system structured. It is a generically applicable pattern implemented by many kinds of libraries and frameworks for all kinds of domains, including the mobile application space.

The idea behind this pattern is that a system is separated into three distinct concerns: the model, the view and the controller. The meanings of these concerns are somewhat ambiguously defined. For example, the design patterns book written by the gang of four (Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides) says:

MVC consists of three kinds of objects. The Model is the application object, the View is its screen presentation, and the Controller defines the way the user interface reacts to user input.

The "problem" I have with the above explanation is that it is a bit difficult to grasp the meaning of an "application object". Moreover, the definition of the controller object used in the explanation above, states that it only has a relation with a user interface (a.k.a. the view) while I could also think of many scenarios in which external events are involved without invoking the user interface. I have no idea how to categorize these kinds of interactions by looking at the above description.

The paper that the book cites: "A Cookbook for Using the Model-View-Controller User Interface Paradigm in Smalltalk-80" (written by: Glenn E. Krasner and Stephen T. Pope) provides more detailed definitions. For example, it defines the model as:

The model of an application is the domain-specific software simulation or implementation of the application's central structure.

I particularly find the term "domain-specific" important -- it suggests that a model should encapsulate what matters to the problem domain, without any obfuscations of things not related to it, for example, user interface components.

The paper defines the view as follows:

In this metaphor, views deal with everything graphical: they request data from their model, and display the data

The above definition suggests that views are everything about presentation of objects belonging to the model.

Finally, the paper defines controllers as follows:

Controllers contain the interface between their associated models and views and the input devices (e.g., keyboard, pointing device, time)

In contrast to the design patterns book's definition of a controller, this definition also suggests that a controller has a relationship with the model. Moreover, it does not say anything about interactions with a physical user. Instead, it refers to input devices.

Although the paper provides more detailed definitions, it still remains difficult to draw a hard line from my perspective. For example, what is the scope of MVC? Should it apply to an entire system, or can it also be applied to components of which a system consists?

For example, in an earlier blog post about some of my experiences with web development, I developed a simple MVC-based library managing the layouts of web applications. The model basically encapsulates the structure of a web application from an abstract point of view, but it only applies to a specific sub concern, not the system as a whole.

Despite its lack of clarity and its ambiguities, I still think MVC makes sense, for the following reasons:

  • View and controller code clutters the model with obfuscations making it much harder to read and maintain.
  • There are multiple ways to present an object visually. With a clear separation between a model and view this becomes much more flexible.
  • In general, more compact modules (in terms of lines of code) are in many ways better than having many lines of code in one module (for example, for readability and maintainability). Separation of concerns stimulates reduction of the size of modules.

The Titanium and Alloy frameworks


As explained earlier, I have implemented the chat example app using the Titanium and Alloy frameworks.

Titanium is a framework targeting multiple mobile app platforms (e.g. Android, iOS, Windows and mobile web applications) using JavaScript as an implementation language providing a unified API with minor platform differences. In contrast to platforms such as Java, Titanium is not a write once, run anywhere approach, but a code reuse approach -- according to their information between 60 and 90% of the code can be reused among target platforms.

Moreover, the organization of Titanium's API makes a clear distinction between UI and non-UI components, but does not force anyone to strictly follow an MVC-like organization while implementing an application.

Alloy is a declarative MVC-framework that wraps around Titanium. To cite the Alloy documentation:

Alloy utilizes the model-view-controller (MVC) pattern, which separates the application into three different components:

  • Models provide the business logic, containing the rules, data and state of the application.
  • Views provide the GUI components to the user, either presenting data or allowing the user to interact with the model data.
  • Controllers provide the glue between the model and view components in the form of application logic.

(As may be noticed, the above description introduces yet another slightly different interpretation of the MVC architectural pattern.)

The Alloy framework uses a number of very specific technologies to realize an MVC organization:

  • For the models, it uses the backbone.js framework's model instances to organize the application's data. The framework supports automatic data binding to view components.
  • Views use an XML data encoding capturing the static structure of the view. Moreover, the style of each view is captured in a TSS stylesheet (having many similarities with CSS).
  • The controllers are CommonJS modules using JavaScript as an implementation language.

Furthermore, the directory structure of an Alloy application also reflects separation of concerns -- each unit of an application stores each concern in a separate directory and file. In the chat app, for example, we can implement each concern of the contacts screen by providing the following files:

./app/views/contacts.xml
./app/controllers/contacts.js
./app/styles/contacts.tss

The above files reflect each concern of the contacts screen, such as the view, the controller and the style.

In addition to defining models, views, styles and controllers on unit-level, the app unit captures general properties applying to the app as a whole.

Organizing the example chat app


Despite the fact that the Alloy framework facilitates separation of concerns to some degree, I still observed that keeping the app's code structure sane remains difficult.

Constructing views


An immediate improvement of Alloy over plain Titanium is that the view code in XML is much easier to read than constructing UI components in JavaScript -- the nesting of XML elements reflects the structure of the UI. Furthermore, the style of the UI elements can be separated from the layout, improving the readability even further.

For example, the following snippet shows the structure of the login screen:

<Alloy>
    <Window class="container">
        <ScrollView>
            <View>
                <Label>Web socket URL</Label>
                <TextField id="url" hintText="ws://localhost:5280/websocket/" />
            </View>
            <View>
                <Label>Username</Label>
                <TextField id="username" hintText="sander" />
            </View>
            <View>
                 <Label>Domain name</Label>
                 <TextField id="domain" hintText="localhost" />
            </View>
            <View>
                 <Label>Resource</Label>
                 <TextField id="resource" hintText="" />
            </View>
            <View>
                  <Label>Password</Label>
                  <TextField id="password" passwordMask="true" hintText="" />
            </View>
            <Button onClick="doConnect">Connect</Button>
        </ScrollView>
    </Window>
</Alloy>

As may be observed by reading the above code fragment, it becomes quite obvious that we have a window with a scroll view inside. Inside the scroll view, we have multiple views containing a label and text field pair, allowing users to provide their login credentials.

Although implementing most screens in XML is quite straightforward, as their structures are quite static, I have noticed that Alloy's technologies are not particularly useful to dynamically compose screen structures, such as the contacts overview that displays a row for each contact -- the structure of this table changes whenever a new contact gets added or an existing contact gets removed.

To dynamically compose a screen, I still need to write JavaScript code in the screen's controller. Furthermore, UI elements composed in JavaScript do not take the style settings of the corresponding TSS file into account. As a result, we need to manually provide styling properties while composing the dynamic screen elements.

To keep the controller's code structured and to avoid code repetition, I have encapsulated the construction of table rows into functions.
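The idea can be sketched as follows: since TSS styles do not apply to UI elements composed in JavaScript, the shared styling properties and the row construction are centralized in one helper. The function name, property names and style values below are illustrative assumptions, not the app's actual code:

```javascript
// Shared styling that TSS would normally provide, duplicated here
// because dynamically created elements ignore the TSS file.
const defaultRowStyle = { height: '44dp', backgroundColor: '#ffffff' };

// Build the property object for one contact row; in the controller,
// the result would be passed to Ti.UI.createTableViewRow().
function createContactRowProps(contact) {
    return Object.assign({}, defaultRowStyle, {
        title: contact.name,
        subtitle: contact.status
    });
}
```

Keeping this in one function means a style change only has to be made in one place, instead of in every spot where a row is composed.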

Notifying views for changes


Another practical issue I ran into is updating the UI components when something changes, such as receiving a text message or an updated status of a contact. An update to a backbone model automatically updates the attached view components, but for anything that is not backbone-based (such as XMPP's internal roster object) this will not work.

I ended up implementing my own custom non-backbone based data model, with my own implementation of the Observer design pattern -- each object in the data model inherits from the Observable prototype providing an infrastructure for observers to register and unregister themselves for notifications. Each view registers itself as an observer to the corresponding model object to update themselves.
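A minimal sketch of such an Observable prototype looks as follows (the names are illustrative, not the app's exact code):

```javascript
// Infrastructure for observers to register and unregister themselves
function Observable() {
    this.observers = [];
}

Observable.prototype.registerObserver = function (observer) {
    this.observers.push(observer);
};

Observable.prototype.unregisterObserver = function (observer) {
    const index = this.observers.indexOf(observer);
    if (index !== -1) {
        this.observers.splice(index, 1);
    }
};

Observable.prototype.notifyObservers = function () {
    // Each registered view implements update() to refresh itself
    this.observers.forEach(observer => observer.update(this));
};

// A model object, such as a contact in the roster, inherits from
// Observable so that views can observe its changes:
function Contact(name) {
    Observable.call(this);
    this.name = name;
    this.status = 'offline';
}

Contact.prototype = Object.create(Observable.prototype);

Contact.prototype.setStatus = function (status) {
    this.status = status;
    this.notifyObservers(); // views re-render the contact's status
};
```

A view then registers itself with `contact.registerObserver(view)` and unregisters in its cleanup code, mirroring what backbone's data binding does for backbone-based models.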

The app's architecture


In the end, this is the architecture of the example chat app that I came up with:


The UML diagram shows the following aspects:

  • All classes can be divided into four concerns: controllers, views, models, and utility classes. The observer infrastructure, for example, does in my opinion not belong to any of the MVC categories, because it is cross-cutting.
  • The XMPPEventHandler is considered to be a controller. Despite not being triggered by human actions, I still classify it as such. The event handler's only responsibility is to update the corresponding model objects once an event has been received from the XMPP server, such as a chat message.
  • All model objects inherit from a custom-made Observable prototype so that views can register and unregister themselves for update notifications.
  • Views extract information from the model objects to display. Furthermore, each view has its own controller responding to user input, such as button clicks.

Lessons learned


In addition to porting an XMPP library from the Node.js ecosystem to Titanium, I have also observed some recurring challenges when implementing the test application and keeping it structured. Despite the fact that the Alloy framework is MVC-based, it does not guarantee that your application's organization remains structured.

From my experiences, I have learned the following lessons:

  • The roles of each concern in MVC are not well defined, so you need to give your own interpretation to them. For example, I would consider any controller to be an object responding to external events, regardless of whether they have been triggered by humans or external systems. By following this interpretation, I ended up implementing the XMPP event handler as a controller.
  • Similarly for the models -- the purpose of backbone.js models is mostly to organize data, but a model is more than just data -- from my perspective, the model encapsulates domain knowledge. This means that non-backbone objects belong to the model as well, including non-data objects such as functions doing computations.
  • You always have to look at your structure from an aesthetic point of view. Does it make sense? Is it readable? Can it be simplified?
  • Finally, do not rely on a framework or API to solve all your problems -- study the underlying concepts and remain critical, as a framework does not always guarantee that your organization will be right.

    Within the scope of Titanium/Alloy, the problem is that models only make sense if you use backbone models, and using XML markup+TSS for views only makes sense if your screen structure is static. The most logical outcome is to put all pieces that do not fit into these categories into a controller, which is probably the most likely reason why your code becomes a mess.

As a final note, the lessons learned do not apply to mobile applications or Titanium/Alloy only -- you will find similar challenges in other domains such as web applications and desktop applications.

Sunday, January 29, 2017

Some programming patterns for multi-process programming in POSIX applications

It has been a while since I wrote my last programming-related blog post. In this blog post, I am going to elaborate about some of my experiences developing multi-process POSIX applications.

From my perspective, processes are an interesting operating systems concept, in particular in UNIX/POSIX-like operating systems, such as Linux.

The IEEE Std 1003.1 POSIX standard defines a "live process" as:

An address space with one or more threads executing within that address space, and the required system resources for those threads.

Within the boundaries of many (relatively simple) applications, process creation and management is typically not required. Nonetheless, decomposing a system into sub processes can be quite useful for a variety of reasons, such as:

  • Improving a system's responsiveness by running slow/long-running tasks in the background, concurrently with other tasks. This is particularly useful to retain the ability to respond to user events, such as mouse clicks or keyboard input, while data is being processed, or to allow a server application to handle multiple connecting clients at the same time.
  • Increased protection. If an incorrectly implemented or malicious task crashes during execution, it neither tears down the entire system nor affects the state of any other processes.
  • More security. Child processes can be run under more restrictive user privileges, making it more difficult for the system to do any harm, such as accessing privacy-sensitive filesystem areas.
  • Portability. In a child process, we can invoke an external executable implemented in a different programming language.
  • Scalability and performance. The execution of a collection of tasks can be parallelized by means of processes and their executions can be divided over multiple CPU cores by the operating system, potentially increasing the execution speed of a program.

Although multi-process applications may provide a number of compelling benefits, programming such applications using the C programming language and the POSIX API is IMO not always straightforward -- I have found myself frequently repeating numerous patterns over and over again.

To alleviate the burden of repetition, I have identified a number of patterns, derived abstractions from them and constructed a library package, named libprocreact, providing these abstractions. The APIs that libprocreact provides are loosely inspired by reactive programming.

Programming patterns


When programming multi-process applications, there are many housekeeping tasks that need to be performed in order to properly organize them. Below, I have described a number of recurring patterns I typically implement:

Forking


The first and most prominent housekeeping task is process creation. The key ingredient in creating processes is the fork() system call, as shown in the code example below:

#include <stdio.h>
#include <unistd.h>

pid_t pid = fork();

if(pid == -1)
    printf("The child process cannot be forked!\n");
else if(pid == 0)
{
    printf("Code executed by the child process\n");
    printf("It runs in the background!\n");
    _exit(0);
}
else
{
    printf("Code executed by the parent process!\n");
    printf("The pid of the child process is: %d\n", pid);
}

Forking is a relatively simple concept -- after successfully invoking the fork() function call (i.e. the return value is not -1), a child process gets created that appears as a nearly identical clone of the parent process. For example, their memory contents and file descriptors are identical.

Furthermore, a forked child process will be executed immediately in parallel to the parent process. Since a parent and child process are almost identical, we can use the return value of fork() to make a distinction between them -- in the parent, the fork() function call returns the PID of the child process so that it can monitor its status and interact with it. In the child process, fork() returns 0.

Although creating a clone of a parent process may sound very expensive, many POSIX-compliant operating systems have optimized this process by using a Copy-On-Write (COW) memory model -- instead of copying a parent process' memory, the memory between the parent and child processes is shared. The operating system maintains a table of shared and private memory pages for each process. When a process attempts to write to a shared memory page, then the corresponding memory page is copied and marked as private to the process.
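The copy-on-write semantics are easy to observe in practice: a write performed by the child lands in the child's private copy of the page and remains invisible to the parent. The following small illustration is my own example (the function name is made up for demonstration purposes):

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static int counter = 1;

/* Forks a child that overwrites its copy of counter. Since the write
   triggers a private copy of the memory page, the parent's copy stays
   untouched. Returns the parent's value of counter afterwards. */
int counter_after_child_write(void)
{
    pid_t pid = fork();

    if(pid == 0)
    {
        counter = 100; /* Only modifies the child's private copy */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    return counter; /* Still 1 in the parent */
}
```

Despite the child assigning 100 to counter, the parent keeps observing the value 1, because the two processes no longer share a writable page after the assignment.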

Executing tasks


After forking a child process, we typically want it to execute a task so that we can use some of the process' positive traits to our advantage. In the if(pid == 0) { ... } block (shown in the previous example), we can put the code that the child process must execute.

For example, we can execute a long running task without blocking the parent process, such as doing an expensive linear search over an array of strings:

#include <string.h>

char *strings[] = { "foo", "bar", "baz", ..., NULL };
char *search = "baz";

...
else if(pid == 0)
{
    unsigned int i;
    
    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
        {
            printf("%s has been found!\n", search);
            _exit(0);
        }
    }

    printf("%s cannot be found!\n", search);
    _exit(1);
}

(As may be observed in the example above, the string array is allocated by the parent process, but since the child process manifests itself as a duplicate on spawning, it has a reference to a "logical copy" as well).

We can also change the user permissions of the child process (for example, to restrict a task from having super-user permissions that might do potential harm):

#include <sys/types.h>
#include <unistd.h>

...
else if(pid == 0)
{
    if(setgid(100) == 0 && setuid(1000) == 0)
    {
        /* Execute some code with restrictive user permissions */
        ...
    }
    else
    {
        printf("Cannot change user permissions!\n");
        _exit(1);
    }
}

or invoking external (pre-built) executables, such as the cat command to stream the contents of a file to the standard output:

...
else if(pid == 0)
{
    char *const args[] = { "cat", "bigtextfile.txt", NULL };
    execvp(args[0], args);
    _exit(1);
}

Waiting


In addition to forking and carrying out tasks by child processes, the parent process must also take notice of a child process' status at some point. This is actually an obligation for certain events, for example, when a child process terminates -- a terminated child process remains a zombie until the parent process takes notice of it.

Taking notice of a process' status can be done by invoking a wait function, such as:

#include <sys/types.h>
#include <wait.h>

pid_t pid;
int wstatus;

/* Fork and execute */

pid_t ret_pid = waitpid(pid, &wstatus, 0);

The above function call specifically waits for a process with a given PID to change state, and captures its wait status in the wstatus variable.

As a sidenote: besides waiting for a specific child process' state to change, it is also possible to wait for any child process to terminate (e.g. by invoking wait()). Furthermore, a wait function invocation blocks the parent process' execution by default, until a child process' state changes. We can also pass the WNOHANG flag to waitpid() to prevent it from blocking.
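As an illustration of the non-blocking variant, the WNOHANG polling pattern can be wrapped in a small helper function. The sketch below is my own construction (not part of any library); a real application would do useful work between polls instead of sleeping:

```c
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Polls a child process with WNOHANG until it changes state, so that
   the parent could do other work between polls. Returns the child's
   exit status, or -1 when it terminated abnormally or the wait status
   could not be obtained. */
int poll_child_exit_status(pid_t pid)
{
    int wstatus;
    pid_t ret;
    struct timespec interval = { 0, 10000000 }; /* 10 ms */

    while((ret = waitpid(pid, &wstatus, WNOHANG)) == 0)
        nanosleep(&interval, NULL); /* Child still running: wait a bit */

    if(ret == -1 || !WIFEXITED(wstatus))
        return -1;
    else
        return WEXITSTATUS(wstatus);
}
```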

After a wait function invocation completes, we must interpret the return value and wait status:

if(ret_pid == -1)
    printf("Cannot obtain wait status of PID: %d\n", pid);
else if(!WIFEXITED(wstatus))
    printf("The process terminated abnormally!\n");
else if(WEXITSTATUS(wstatus) != 0)
    printf("The process execution failed!\n");
else
    printf("The process has completed its tasks successfully!\n");

In the above code fragment, we check for the following properties:

  • Whether the wait status could be obtained. Sometimes this may not be possible, for example, if a process with a given PID does not exist or when it is beyond the parent process' control.
  • Whether the process has terminated abnormally or not. For example, abnormal termination happens when a process runs into a segmentation fault. In such cases, the wait status may still contain a zero exit status, incorrectly suggesting that everything has succeeded.
  • Whether a process has succeeded its tasks or not. By convention, a zero exit status indicates success, while any non-zero exit status indicates failure.
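This checking logic lends itself to a small reusable helper. The function below is my own sketch (the helper name is made up) that interprets a child's exit status as a boolean value:

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define TRUE 1
#define FALSE 0

/* Waits for the given child process and interprets its exit status as
   a boolean: TRUE when it terminated normally with a zero exit status,
   FALSE in any other case. */
int wait_for_success(pid_t pid)
{
    int wstatus;

    if(waitpid(pid, &wstatus, 0) == -1)
        return FALSE; /* Wait status could not be obtained */
    else
        return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
}
```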

Output buffering


Sometimes, it may also be desired to propagate data back to the parent, for example, after the completion of a data transformation task. Since processes operate in their own private address space, we can no longer rely on shared memory, but we must use some means to transfer the data.

One of the possible means is using a pipe, as shown in the following code fragment:

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(1); /* Only reached when execvp() fails */
    }
    else
    {
        close(pipefd[1]); /* Close write-end of pipe */
        /* Read from pipefd[0] */
        close(pipefd[0]);
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

Before forking a child process, we construct a pipe consisting of two file descriptors -- a read and write end. In the child process, we close the read-end of the pipe (since it is not needed), and we write data to the write-end. In the parent, we read from the read-end and we discard the unneeded write-end.

When retrieving data from a pipe, we may want to capture its output in a data structure (such as a string, string array or struct). Capturing and transforming data to a specific structure is often not very straightforward, for example:

#define BUFFER_SIZE 1024

char buffer[BUFFER_SIZE];
ssize_t bytes_read;
char *captured_string = NULL;
unsigned int captured_string_size = 0;

while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
{
    captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
    memcpy(captured_string + captured_string_size, buffer, bytes_read);
    captured_string_size += bytes_read;
}

/* Add NUL-termination */
captured_string = (char*)realloc(captured_string, captured_string_size + 1);
captured_string[captured_string_size] = '\0';

The purpose of the above code fragment is to read data from a pipe and construct a NUL-terminated string. It repeatedly reads chunks of data (of one kilobyte each) from the read-end of the pipe, dynamically extends the size of the buffer collecting the output, appends each chunk to the buffer, and finally appends a NUL-termination to the result.

As may be noticed, there are many concerns that we have to take care of and the resulting code is not trivial at all.
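The buffering concern can be factored out into a small reusable helper. The function below is my own sketch (not part of any library) that captures everything readable from a file descriptor, such as a pipe's read-end, into a NUL-terminated string:

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 1024

/* Reads from the given file descriptor until end-of-file and returns
   the captured data as a NUL-terminated string. The caller is
   responsible for freeing the result. */
char *capture_fd_string(int fd)
{
    char buffer[BUFFER_SIZE];
    ssize_t bytes_read;
    char *captured_string = NULL;
    size_t captured_string_size = 0;

    while((bytes_read = read(fd, buffer, BUFFER_SIZE)) > 0)
    {
        /* Extend the result buffer and append the chunk */
        captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
        memcpy(captured_string + captured_string_size, buffer, bytes_read);
        captured_string_size += bytes_read;
    }

    /* Add NUL-termination */
    captured_string = (char*)realloc(captured_string, captured_string_size + 1);
    captured_string[captured_string_size] = '\0';

    return captured_string;
}
```

With such a helper, the parent process only has to close the write-end and invoke capture_fd_string(pipefd[0]) to obtain the child's output.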

Orchestrating collections of processes


In addition to implementing the above patterns for a single child process running in the background, we may also want to apply them to collections of processes running concurrently.

Orchestrating collections of processes introduces many additional challenges beyond those described in the previous sections. For example, in order to read from the processes' pipes without blocking their executions (which could happen if any of their buffers gets full), we have to multiplex the housekeeping operations in such a way that they read from each pipe breadth-first.

Multiplexing any of the previously shown patterns makes developing multi-process applications even more difficult.
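To illustrate the breadth-first idea, the sketch below (my own construction, not libprocreact code) uses poll() to read from a set of pipe read-ends in whatever order data becomes available, so that no child ever blocks on a full pipe buffer. For brevity, it discards the data and only counts the bytes consumed:

```c
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Reads breadth-first from an array of pipe read-ends until all of them
   have reached end-of-file. Returns the total number of bytes consumed. */
ssize_t drain_pipes(const int *fds, unsigned int num_fds)
{
    struct pollfd *pollfds = (struct pollfd*)malloc(num_fds * sizeof(struct pollfd));
    unsigned int i, open_fds = num_fds;
    ssize_t total = 0;
    char buffer[1024];

    for(i = 0; i < num_fds; i++)
    {
        pollfds[i].fd = fds[i];
        pollfds[i].events = POLLIN;
    }

    while(open_fds > 0 && poll(pollfds, num_fds, -1) > 0)
    {
        for(i = 0; i < num_fds; i++)
        {
            if(pollfds[i].revents & (POLLIN | POLLHUP))
            {
                ssize_t bytes_read = read(pollfds[i].fd, buffer, sizeof(buffer));

                if(bytes_read <= 0)
                {
                    pollfds[i].fd = -1; /* End-of-file: stop watching this pipe */
                    open_fds--;
                }
                else
                    total += bytes_read;
            }
        }
    }

    free(pollfds);
    return total;
}
```

A real implementation would additionally append each chunk to a per-process buffer, which is exactly the kind of bookkeeping the abstractions described below take care of.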

A functional programming discipline


Because housekeeping tasks require us to supply many lines of boilerplate code, multi-process applications tend to become quite messy without proper organization. For example, if I take the sorting example (shown earlier) and extend it to capture the output of the process invocation into a string, I may end up writing:

#define BUFFER_SIZE 1024

int pipefd[2];

if(pipe(pipefd) == 0)
{
    pid_t pid = fork();

    if(pid == -1)
        fprintf(stderr, "Cannot fork process!\n");
    else if(pid == 0)
    {
        char *const args[] = { "sort", "words.txt", NULL };

        close(pipefd[0]); /* Close read-end of pipe */
        dup2(pipefd[1], 1); /* Attach write-end to the stdout */
        execvp(args[0], args);
        _exit(1); /* Only reached when execvp() fails */
    }
    else
    {
        char buffer[BUFFER_SIZE];
        ssize_t bytes_read;
        char *captured_string = NULL;
        unsigned int captured_string_size = 0;

        close(pipefd[1]); /* Close write-end of pipe */

        while((bytes_read = read(pipefd[0], buffer, BUFFER_SIZE)) > 0)
        {
            captured_string = (char*)realloc(captured_string, captured_string_size + bytes_read);
            memcpy(captured_string + captured_string_size, buffer, bytes_read);
            captured_string_size += bytes_read;
        }

        /* Add NUL-termination */
        captured_string = (char*)realloc(captured_string, captured_string_size + 1);
        captured_string[captured_string_size] = '\0';

        close(pipefd[0]); /* Close read-end of pipe */
    }
}
else
    fprintf(stderr, "Cannot construct a pipe!\n");

I do not expect anyone to study the above code fragment in detail, but just by looking at its structure and length, you can clearly see that this is not a very appealing way to construct applications.

In contrast, the same task can be implemented in only one line of bash shell code:

captured_string=$(sort words.txt)

When I look at the previous code fragment from an abstract point of view, then it encapsulates an implementation of a specific concern (i.e. its primary task, such as sorting an array), and a number of general house-keeping concerns, such as constructing a pipe, waiting, output buffering, and closing file descriptors.

One possible way to structure the code in a better way is by function decomposition -- we can separate the specific and general concerns into functions and we can put the general house keeping aspects into a (reusable) library.

Consider the following synchronous function definition and invocation example that checks whether a given string exists in an array of strings by doing a linear search:

#include <stdio.h>
#include <string.h>

#define TRUE 1
#define FALSE 0

int array_contains_string(const char **strings, const char *search)
{
    unsigned int i;
    
    for(i = 0; strings[i] != NULL; i++)
    {
        if(strcmp(strings[i], search) == 0)
            return TRUE;
    }

    return FALSE;
}

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";
    int result = array_contains_string(strings, search);

    if(result)
        printf("The array does contain the string: %s\n", search);
    else
        printf("The array does not contain the string: %s\n", search);
        
    return 0;
}

Since searching (in particular, linear searching) may take some time, we may want to change the function into an asynchronous one that runs its primary task (the searching) in a child process. I can concisely express this primary concern in a single function:

#include <string.h>
#include <unistd.h>

pid_t array_contains_string_async(const char **strings, const char *search)
{
    pid_t pid = fork();

    if(pid == 0)
    {
        unsigned int i;
    
        for(i = 0; strings[i] != NULL; i++)
        {
            if(strcmp(strings[i], search) == 0)
                _exit(0);
        }

        _exit(1);
    }

    return pid;
}

As may be noticed, the above function executes the same linear search procedure shown in the previous code fragment, with the following differences:

  • The function forks a child process and carries out the search operation in the child process.
  • Instead of returning a boolean value, it exits the child process with an exit status. By convention, a zero exit status indicates success while a non-zero exit status indicates failure.

We can capture the wait and exit status checking in a general utility function (procreact_wait_for_boolean()) that interprets the exit status of a child process as a boolean value (meaning that when a process exits with exit status 0 it returns TRUE and for any non-zero exit status, it returns FALSE):

#include <procreact_pid.h>

int main(int argc, char *argv[])
{
    char *strings[] = { "foo", "bar", "baz", NULL };
    char *search  = "baz";

    ProcReact_Status status;
    int result = procreact_wait_for_boolean(array_contains_string_async(strings, search), &status);

    if(status == PROCREACT_STATUS_OK)
    {
        if(result)
            printf("The array does contain the string: %s\n", search);
        else
            printf("The array does not contain the string: %s\n", search);
    }
    else
        fprintf(stderr, "The process terminated abnormally!\n");

    return 0;
}

As may be observed, by separating concerns and putting common operations into a library, we can accomplish the same result as the synchronous code fragment example, with relatively little overhead of boilerplate code.

Managing arbitrary output


The previously shown abstractions work well for functions returning a byte or a boolean, as well as for void-functions. However, it may also be desirable to implement asynchronous functions returning more complex data, such as strings or arrays of strings, for example:

#include <stdlib.h>
#include <string.h>

char *say_hello_to(const char *name)
{
    char *result = (char*)malloc(strlen(name) + 7 + 1);
    sprintf(result, "Hello %s!", name);
    return result;
}

The above function composes a string that greets a person with a given name. As explained earlier, implementing an asynchronous variant of the above function requires extra facilities to propagate the result back to the parent process, such as constructing a pipe, forcing us to do more housekeeping work.

In the previous examples, we were able to separate a task's primary concern into a function returning a PID and a function waiting and interpreting the exit status by using the PID reference. To manage complex data, we need to memorize more than just the PID -- we also need the pipe's file descriptors, store the buffered data, and the end result.

In some ways, a PID reference resembles another software abstraction -- a future in reactive programming or a promise in JavaScript. A future/promise is an object encapsulating a return value that will be provided at some point in the future.

We can encapsulate the entire housekeeping procedure for transferring and returning complex data in a ProcReact_Future struct:

#include <procreact_future.h>

ProcReact_Future say_hello_to_async(const char *name)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "Hello %s!", name);
        _exit(0);
    }

    return future;
}

The above code fragment looks similar to the synchronous function definition shown in the previous example, with the following differences:

  • By constructing a ProcReact_Future struct, we no longer have to fork a child process and construct a pipe ourselves.
  • The string composition step is carried out by the forked child process.
  • Instead of returning a heap-allocated string, we write the resulting string to the write-end of the pipe provided by the future struct and we terminate the process by invoking the exit function call.

The procreact_initialize_future() function takes one parameter: a type that is responsible for reading the output from the pipe and converting it into a representation of choice -- in this case, a NUL-terminated string.

We can collect the return value of the function in the parent process by invoking the procreact_future_get() function:

ProcReact_Status status;
ProcReact_Future future = say_hello_to_async(name);
char *result = procreact_future_get(&future, &status);

if(status == PROCREACT_STATUS_OK && result != NULL)
    printf("%s\n", result);
else
    fprintf(stderr, "Some error occured!\n");

The procreact_future_get() function (that looks similar to a Future's .get() method or Promise's .then() method) takes care of reading from the pipe, buffering the output, converting the output to a string, waiting for the child process to terminate and closing the obsolete file descriptors.

Furthermore, analogous to a Future or Promise, when the retrieval function gets invoked for a second time, it will return its cached value instead of reading from the pipe again.

Orchestrating collections of processes


With concerns well separated, orchestrating collections of processes also becomes easier. For example, we may want to execute multiple invocations of the following function in parallel:

ProcReact_Future return_count_async(unsigned int count)
{
    ProcReact_Future future = procreact_initialize_future(procreact_create_string_type());

    if(future.pid == 0)
    {
        dprintf(future.fd, "%u", count);
        _exit(0);
    }

    return future;
}

The purpose of the function shown above is to simply return a string representation of a given numeric counter value.

As with the abstraction facilities shown previously (such as ProcReact_Future), we can also create similar abstractions for orchestrating collections of processes, including processes whose output needs to be captured:

ProcReact_FutureIterator iterator = procreact_initialize_future_iterator(has_next_count_process,
  next_count_process,
  complete_count_process,
  &data);

The above function invocation, procreact_initialize_future_iterator(), configures a ProcReact_FutureIterator struct. It takes the following parameters:

  • A pointer to a function that indicates whether there is a next element in the collection.
  • A pointer to a function that invokes the next process in the collection.
  • A pointer to a function that gets invoked when a process completes.
  • A void-pointer referring to an arbitrary data structure that gets passed to all functions above.

If I want to invoke this function 5 times to count from 1 to 5, I can encapsulate the properties of this iteration process in the following data structure:

typedef struct
{
    unsigned int index;
    unsigned int amount;
    int success;
    char **results;
    unsigned int results_length;
}
IteratorData;

and compose the following instance from it:

IteratorData data = { 0, 5, TRUE, NULL, 0 };

The above struct maintains an index value indicating which element is currently being processed, the amount value holds the number of iterations that need to be executed, success is a boolean status flag indicating whether all iterations have succeeded, and the results variable is an array of strings capturing the output of each function invocation.

The following function can be used to check whether we have completed the iteration or not:

static int has_next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    return iterator_data->index < iterator_data->amount;
}

The following function executes each successive iteration step:

static ProcReact_Future next_count_process(void *data)
{
    IteratorData *iterator_data = (IteratorData*)data;
    ProcReact_Future future = return_count_async(iterator_data->index + 1);
    iterator_data->index++;
    return future;
}

The above function invokes return_count_async() with the next counter value as a parameter and then increases the index.

The following function gets invoked when a process' execution finishes:

static void complete_count_process(void *data, ProcReact_Future *future, ProcReact_Status status)
{
    IteratorData *iterator_data = (IteratorData*)data;

    if(status == PROCREACT_STATUS_OK && future->result != NULL)
    {
        iterator_data->results = (char**)realloc(iterator_data->results, (iterator_data->results_length + 1) * sizeof(char*));
        iterator_data->results[iterator_data->results_length] = future->result;
        iterator_data->results_length++;
    }
    else
        iterator_data->success = FALSE;
}

The above function checks the status of the function invocation and captures the results that each function returns. When a process completes successfully (i.e. it does not terminate abnormally and provides a non-NULL result), it appends the result to the results array. In case of a failure, it sets the overall status flag of the iterator to FALSE.

With all iteration aspects abstracted away into a ProcReact_FutureIterator struct, we can execute the following function to do all iteration steps in parallel:

procreact_fork_in_parallel_buffer_and_wait(&iterator);

We can also limit the number of processes that are allowed to run concurrently to a specific value, e.g. 2:

procreact_fork_buffer_and_wait_in_parallel_limit(&iterator, 2);

After executing all iterations, we can consult the data struct to figure out whether the iterations have succeeded and what their results are.

Asynchronously orchestrating collections of processes


The previous two collection examples are executed synchronously. This means that while the execution of each function that retrieves an element is done asynchronously, the overall iteration task blocks the parent process until it completes, which is not always desirable.

A possible solution to make iterations asynchronous is to fork another process and iterate over the collection in the child process, but this introduces another challenge when the collected data needs to be returned to the parent.

Instead of performing all iteration steps as a whole, we can also control each iteration step ourselves. For example, the following function invocation executes a single iteration step that composes a ProcReact_Future struct:

if(procreact_spawn_next_future(&iterator))
    printf("Spawned a process and we have more of them!\n");
else
    printf("All processes have been spawned\n");

We can also run each buffer iteration step ourselves and integrate that function call into a program's main loop:

while(TRUE)
{
    unsigned int running_processes = procreact_buffer(&iterator);

    if(running_processes == 0)
    {
        /* This indicates that there are no running processes anymore */
        /* You could do something with the end result here */
    }

    /* Do other stuff in the main loop */
}

The above code fragment allows us to evaluate processes' statuses and buffer their outputs without blocking the main loop.

Summary


In this blog post, I have described a number of common housekeeping tasks I typically need to implement when developing multi-process applications, as well as an API that abstracts over these common housekeeping tasks.

To summarize, when we intend to execute a task or a collection of tasks, we can use one of the following data structures, depending on whether we need to evaluate a single value or a collection of values, and whether the type of the value is simple (e.g. a boolean, byte, or void) or complex (e.g. strings, arrays of strings, etc.):

          One                               Many
Simple    pid_t                             ProcReact_PidIterator
Complex   ProcReact_Future<ProcReact_Type>  ProcReact_FutureIterator

Processing collections of tasks for each type can be done either synchronously or asynchronously, by picking the appropriate utility functions:

          Synchronous                                         Asynchronous
Simple    procreact_fork_in_parallel_and_wait()               procreact_register_signal_handler()
          procreact_fork_and_wait_in_parallel_limit()         procreact_spawn_next_pid()
                                                              procreact_complete_all_finished_processes()
Complex   procreact_fork_in_parallel_buffer_and_wait()        procreact_spawn_next_future()
          procreact_fork_buffer_and_wait_in_parallel_limit()  procreact_buffer()

Motivation: Disnix


After reading this blog post, you may wonder why I have developed these abstractions. My main motivation is to organize Disnix's codebase in a better way.

Disnix is a distributed deployment tool, and all deployment activities it carries out (package management operations, state management operations, and communications) are delegated to external processes, for exactly the same reasons mentioned in the introduction.

Before deriving the abstractions described in this blog post, all process coordination in Disnix was hand coded -- Disnix's code was cluttered by boilerplate code doing output buffering, concurrency limiting, waiting, and resource allocation. As a result, some areas of the code were quite hard to read and difficult to extend. Moreover, it was also quite hard to ensure that the code is free from some serious issues, such as potential buffer overflows.

After restructuring the code to use these abstractions (between Git revisions 595c1ec and 4ba5e3d), I have managed to significantly reduce the amount of code. For example, the methods.c module (an RPC interface for all deployment operations that Disnix can execute remotely) previously consisted of 1976 LOC. After refactoring, it consists of only 488 LOC.

Moreover, the disnix-build utility's main module (build.c) has been reduced from 250 LOC to 153 LOC. Each command-line utility that executes a deployment task (13 in total) has at least 100 fewer lines of code.

In addition to reducing the size of many modules, I have also accomplished the following:

  • Better separation of concerns. I have managed to clearly separate all asynchronous functions spawning processes into three separate libraries (libpkgmgmt for package management, libstatemgmt for state management, and libinterface for communication), considerably simplifying the toolset's architecture.
  • Performance improvements. Some tools (disnix-capture-infra and disnix-query) were still executing their tasks sequentially and were difficult to optimize. With these new abstractions, parallelizing their tasks became quite simple.
  • Better error reporting. In older versions of Disnix, it was difficult to relate error messages to target machines. With the abstractions described in this blog post, implementing a translation process became much easier. As a result, all tools can now clearly report the origins of an error.

Discussion


Although the abstractions described in this blog post have allowed me to structure Disnix's code in a much better way, they are not a solution for everything.

For example, the ProcReact_Future abstraction buffers the process' output, which is quite inadequate for processes transferring large quantities of data. Moreover, each function invocation implies a fork. On a very large scale, this could become quite expensive, as it takes time to set up a child process and memory to maintain the copy-on-write (COW) administration. In theory, we could also make it possible to reuse spawned processes among function invocations to optimize this, but no such optimizations have been implemented yet.

Furthermore, processes are not the only solution for implementing concurrent applications. When frequent interaction is required between parent and child, it may be better to use threads as their memory is shared (at the same time, this also has disadvantages). A major disadvantage of processes (compared to threads) is that all communication data needs to be serialized, transferred and deserialized, introducing quite a bit of communication overhead.

Apart from processes and threads, it is also possible to use a single-threaded event loop, in which the programmer has the obligation to divide bigger tasks into small tasks. This model is prominently used in, for example, Node.js. The advantage of this approach is that each additional concurrent task does not require the allocation of additional resources. This works well, for example, for applications that are mostly I/O-bound (I/O is typically thousands of times slower than a CPU).

The disadvantage of such an application organization is that it becomes the programmer's obligation to ensure that the main thread never blocks and that the code remains properly structured. Moreover, "context switching" between single-threaded tasks is more expensive than a context switch on the CPU level, making it quite inadequate for computationally intensive applications. Finally, the operating system cannot automatically divide tasks over multiple CPU cores.

Availability


libprocreact is part of the latest development version of Disnix. Additionally, I have isolated the library code and created a separate GitHub repository for it.

Although libprocreact can be used independently from Disnix, I currently do not have any plans to make it a fully fledged general-purpose library. At the moment, it is not intended for any use cases other than Disnix's.