For example, if we want to build a package, such as the trivial GNU Hello package, we can write the following expression:
    with import <nixpkgs> {};

    stdenv.mkDerivation {
      name = "hello-2.10";

      src = fetchurl {
        url = mirror://gnu/hello/hello-2.10.tar.gz;
        sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
      };

      meta = {
        description = "A program that produces a familiar, friendly greeting";
        longDescription = ''
          GNU Hello is a program that prints "Hello, world!" when you run it.
          It is fully customizable.
        '';
        homepage = http://www.gnu.org/software/hello/manual/;
        license = "GPLv3+";
      };
    }
and build it with the Nix package manager as follows:
    $ nix-build
    /nix/store/188avy0j39h7iiw3y7fazgh7wk43diz1-hello-2.10
The above code fragment probably does not look too complicated and is quite easy to repeat for building other kinds of GNU Autotools/GNU Make-based packages. However, stdenv.mkDerivation {} is a big and complex function abstraction that has many responsibilities.
Its most important responsibility is to compose so-called pure build environments, in which various restrictions are imposed on the build scripts to provide better guarantees that builds are pure (meaning that they always produce the same, or nearly bit-identical, result if the dependencies are the same), such as:
- Build scripts can only write to designated output directories and temp directories. They are restricted from writing to any other file system location.
- All environment variables are cleared and some of them are set to default or dummy values, such as search path environment variables (e.g. PATH).
- All build results are made immutable by removing the write permission bits and their timestamps are reset to one second after the epoch.
- Builds run as unprivileged users.
- Optionally, builds run in a chroot environment and use namespaces to restrict access to the host filesystem and the network as much as possible.
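Some of these restrictions can be approximated by hand to get a feeling for them. The following is only a minimal sketch and not how Nix implements it internally: it uses env -i to start a command with an emptied environment and two dummy values. The function name run_in_clean_env is hypothetical, and I am assuming that the dummy values /path-not-set and /homeless-shelter are close to what you would see inside a real Nix build:

```shell
# Hypothetical helper: run a command string in an (almost) empty environment.
run_in_clean_env() {
    # env -i drops every inherited variable; only the ones listed are set
    env -i PATH=/path-not-set HOME=/homeless-shelter /bin/sh -c "$1"
}

# The real home directory of the user is no longer visible to the command
run_in_clean_env 'echo "HOME is $HOME"'
```

Only shell builtins (such as echo) keep working in such an environment, because the dummy PATH does not refer to any directory with executables.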
In addition to purity, the stdenv.mkDerivation {} function has many additional responsibilities. For example, it also implements a generic builder that is clever enough to build a GNU Autotools/GNU Make project without specifying any build instructions.
The above Nix expression for GNU Hello, for instance, does not specify any build instructions. The generic builder automatically unpacks the tarball, enters the resulting directory and invokes ./configure --prefix=$out; make; make install with the appropriate parameters.
Because stdenv.mkDerivation {} has many responsibilities and nearly all packages in Nixpkgs depend on it, its implementation is very complex (e.g. thousands of lines of code) and hard to change.
As a personal exercise, I have developed a function abstraction with similar functionality from scratch. My implementation can be decomposed into layers in which every abstraction layer gradually adds additional responsibilities.
Writing "raw" derivations
stdenv.mkDerivation is a function abstraction, not a feature of the Nix expression language. To compose "pure" build environments, stdenv.mkDerivation invokes a Nix expression language construct -- the derivation {} builtin.
(As a sidenote: derivation is strictly speaking not a builtin, but an abstraction built around the derivationStrict builtin, but this is something internal to the Nix package manager. It does not matter for the scope of this blog post).
Despite the fact that this low level function is not commonly used, it is also possible to directly invoke it and compose low-level "raw" derivations to build packages. For example, we can write the following Nix expression (default.nix):
    derivation {
      name = "test";
      builder = ./test.sh;
      system = "x86_64-linux";
      person = "Sander";
    }
The above expression invokes the derivation builtin function that composes a "pure" build environment:
- The name attribute specifies the name of the package, that should appear in the resulting Nix store path.
- The builder attribute specifies that the test.sh executable should be run inside the pure build environment.
- The system attribute is used to tell Nix that this build should be carried out for x86-64 Linux systems. When Nix is unable to build the package for the requested system architecture, it can also delegate a build to a remote machine that is capable.
- All attributes (including the attributes described earlier) are converted to environment variables (e.g. strings, numbers and URLs are converted to strings and the boolean value: 'true' is converted to '1') and can be used by the builder process for a variety of reasons.
We can implement the builder process (the test.sh build script) as follows:
    #!/bin/sh -e

    echo "Hello $person" > $out
The above script generates a greeting message for the provided person (exposed as an environment variable by Nix) and writes it to the Nix store (the output path is provided by the out environment variable).
We can evaluate the Nix expression (and generate the output file with the Hello greeting) by running:
    $ nix-build
    /nix/store/7j4y5d8rx1vah5v64bpqd5dskhwx5105-test
    $ cat result
    Hello Sander
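To get a better feeling for what the builder process sees, we can also emulate the build by hand outside Nix. The following is a rough sketch under a few assumptions: the builder body is written to a temporary file instead of test.sh, a temporary file stands in for the Nix store output path, and none of the other restrictions of a pure build environment apply:

```shell
# Write the builder body from the example to a temporary file
builder=$(mktemp)
cat > "$builder" <<'EOF'
echo "Hello $person" > $out
EOF

# Emulate what Nix does: pass the derivation attributes as environment
# variables and provide an output location through the out variable.
# (In a real build, out would be a path in the Nix store.)
output=$(mktemp)
env -i name=test person=Sander out="$output" /bin/sh -e "$builder"

cat "$output"   # prints: Hello Sander
rm -f "$builder"
```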
The return value of the derivation {} function is a bit confusing. At first sight, it appears to be a string corresponding to the output path in the Nix store. However, some investigation with the nix repl tool reveals that it is much more than that:
    $ nix repl
    Welcome to Nix version 2.0.4. Type :? for help.
when importing the derivation:
nix-repl> test = import ./default.nix
and describing the result:
    nix-repl> :t test
    a set
we will see that the result is actually an attribute set, not a string. By requesting the attribute names, we will see the following attributes:
    nix-repl> builtins.attrNames test
    [ "all" "builder" "drvAttrs" "drvPath" "name" "out" "outPath" "outputName" "person" "system" "type" ]
It appears that the resulting attribute set has the same attributes as the parameters that we passed to derivation, augmented by the following additional attributes:
- The type attribute that refers to the string: "derivation".
- The drvAttrs attribute refers to an attribute set containing the original parameters passed to derivation {}.
- drvPath and outPath refer to the Nix store paths of the store derivation file and output of the build. A side effect of requesting these members is that the expression gets evaluated or built.
- The out attribute is a reference to the derivation producing the out result, and all is a list of derivations of all outputs produced (Nix derivations can also produce multiple output paths in the Nix store).
- In case there are multiple outputs, the outputName determines the name of the output path that is the default.
Providing basic dependencies
Although we can use the low-level derivation {} function to produce a very simple output file in the Nix store, it is not very useful on its own.
One important limitation is that we only have a (Bourne-compatible) shell (/bin/sh), but no other packages in the "pure" build environment. Nix prevents unspecified dependencies from being found to make builds more pure.
Since a pure build environment is almost entirely empty (with the exception of the shell), the amount of things we can do in an environment created by derivation {} is very limited -- most of the commands that build scripts run are provided by executables belonging to external packages, e.g. commands such as cat, ls (GNU Coreutils), grep (GNU Grep) or make (GNU Make) and should be added to the PATH search environment variable in the build environment.
We may also want to configure additional environment variables to make builds more pure -- for example, on Linux systems, we want to set the TZ (timezone) environment variable to UTC to prevent error messages, such as: "Local time zone must be set--see zic manual page".
To make the execution of more complex build scripts more convenient, we can create a setup script that we can include in every build script. It adds basic utilities to the PATH search environment variable, configures these additional environment variables, and sets the SHELL environment variable to the bash shell residing in the Nix store. We can create a package named stdenv that provides a setup script to accomplish this:

    {bash, basePackages, system}:

    let
      shell = "${bash}/bin/sh";
    in
    derivation {
      name = "stdenv";
      inherit shell basePackages system;
      builder = shell;
      args = [ "-e" ./builder.sh ];
    }
The builder script of the stdenv package can be implemented as follows:
    set -e

    # Setup PATH for base packages
    for i in $basePackages
    do
        basePackagesPath="$basePackagesPath${basePackagesPath:+:}$i/bin"
    done

    export PATH="$basePackagesPath"

    # Create setup script
    mkdir $out
    cat > $out/setup <<EOF
    export SHELL=$shell
    export PATH="$basePackagesPath"
    EOF

    # Allow the user to install stdenv using nix-env and get the packages
    # in stdenv.
    mkdir $out/nix-support
    echo "$basePackages" > $out/nix-support/propagated-user-env-packages

The above script adds all base packages (GNU Coreutils, Findutils, Diffutils, sed, grep, gawk and bash) to the PATH of the builder and creates a script in $out/setup that exports the PATH environment variable and the location of the bash shell.
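The loop that composes the search path relies on the ${variable:+:} parameter expansion, which may not be obvious at first sight. As a small sketch, the idiom can be isolated in a function. The name compose_bin_path and the store paths in the example are made up for illustration:

```shell
# Hypothetical helper: turn a whitespace-separated list of package
# prefixes into a colon-separated search path of their bin/ directories.
compose_bin_path() {
    result=""
    for i in $1
    do
        # ${result:+:} expands to ":" only when result is non-empty,
        # so no leading colon ends up in the search path
        result="$result${result:+:}$i/bin"
    done
    echo "$result"
}

compose_bin_path "/nix/store/aaa-coreutils /nix/store/bbb-bash"
# prints: /nix/store/aaa-coreutils/bin:/nix/store/bbb-bash/bin
```

Avoiding a leading or trailing colon matters, because an empty entry in PATH is interpreted as the current working directory.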
We can use the stdenv (providing this setup script) as a dependency for building a package, such as:
    {stdenv}:

    derivation {
      name = "hello";
      inherit stdenv;
      builder = ./builder.sh;
      system = "x86_64-linux";
    }
In the corresponding builder script, we include the setup script in the first line and we, for example, invoke various external commands to generate a shell script that says: "Hello world!":
    #!/bin/sh -e
    source $stdenv/setup

    mkdir -p $out/bin

    cat > $out/bin/hello <<EOF
    #!$SHELL -e

    echo "Hello"
    EOF

    chmod +x $out/bin/hello
The above script works because the setup script adds GNU Coreutils (that includes cat, mkdir and chmod) to the PATH of the builder.
Writing more simple derivations
Using a setup script makes writing build scripts somewhat practical, but there are still a number of inconveniences we have to cope with.
The first inconvenience is the system parameter -- in most cases, we want to build a package for the same architecture as the host system's architecture and preferably we want the same architecture for all other packages that we intend to deploy.
Another issue is the shell. In sandbox-enabled Nix installations, /bin/sh is a minimal Bourne-compatible shell provided by Busybox; in non-sandboxed installations, it is a reference to the host system's shell. The latter case could be considered an impurity, because we do not know what kind of shell (e.g. bash, dash, ash?) or which version of a shell (e.g. 3.2.57, 4.3.30?) we are using. Ideally, we want to use a shell that is provided as a Nix package in the Nix store, because that version is pure.
(As a sidenote: in Nixpkgs, we use the bash shell to run build commands, but this is not a strict requirement. For example, GNU Guix (a package manager that uses several components of the Nix package manager) uses Guile as both the host and the guest language. In theory, we could also launch a different kind of interpreter than bash).
The third issue is the meta parameter -- for every package, it is possible to specify meta-data, such as a description, license and homepage reference, as an attribute set. Unfortunately, attribute sets cannot be converted to environment variables. To deal with this problem, the meta attribute needs to be removed before we invoke derivation {} and be re-added to the returned attribute set. (In my opinion, this should ideally be something the Nix package manager solves by itself).
We can hide all these inconveniences by creating a simple abstraction function that I will call: stdenv.simpleDerivation that can be implemented as follows:
    {stdenv, system, shell}:
    {builder, ...}@args:

    let
      extraArgs = removeAttrs args [ "builder" "meta" ];

      buildResult = derivation ({
        inherit system stdenv;
        builder = shell; # Make bash the default builder
        args = [ "-e" builder ]; # Pass builder executable as parameter to bash
        setupSimpleDerivation = ./setup.sh;
      } // extraArgs);
    in
    buildResult //
    # Readd the meta attribute to the resulting attribute set
    (if args ? meta then { inherit (args) meta; } else {})

The above Nix expression removes the meta argument and then invokes the derivation {} function with the system parameter set, bash as the builder, and the builder executable passed as an argument to bash. After the derivation has been composed, the meta attribute is re-added to the resulting attribute set.
With this abstraction, we can reduce the complexity of the previously shown Nix expression to something very simple:
    {stdenv}:

    stdenv.simpleDerivation {
      name = "hello";
      builder = ./builder.sh;
      meta = {
        description = "This is a simple testcase";
      };
    }
The function abstraction is also sophisticated enough to build something more complex, such as GNU Hello. We can write the following Nix expression that passes all dependencies that it requires as function parameters:
    {stdenv, fetchurl, gnumake, gnutar, gzip, gcc, binutils}:

    stdenv.simpleDerivation {
      name = "hello-2.10";
      src = fetchurl {
        url = mirror://gnu/hello/hello-2.10.tar.gz;
        sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
      };
      inherit stdenv gnumake gnutar gzip gcc binutils;
      builder = ./builder.sh;
    }
We can use the following builder script to build GNU Hello:
    source $setupSimpleDerivation

    export PATH=$PATH:$gnumake/bin:$gnutar/bin:$gzip/bin:$gcc/bin:$binutils/bin

    tar xfv $src
    cd hello-2.10
    ./configure --prefix=$out
    make
    make install
The above script imports a setup script configuring basic dependencies, then extends the PATH environment variable with additional dependencies, and then executes the commands to build GNU Hello -- unpacking the tarball, running the configure script, building the project, and installing the package.
The run command abstraction
We can still improve a bit upon the function abstraction shown previously -- one particular inconvenience that remains is that you have to write two files to get a package built -- a Nix expression that composes the build environment and a builder script that carries out the build steps.
Another repetitive task is configuring search path environment variables (e.g. PATH, PYTHONPATH, CLASSPATH etc.) to point to the appropriate directories in the Nix store. As may be noticed by looking at the code of the previous builder script, this process is tedious.
To address these inconveniences, I have created another abstraction function called: stdenv.runCommand that extends the previous abstraction function -- when no builder parameter has been provided, this function executes a generic builder that will evaluate the buildCommand environment variable containing a string with shell commands to execute. This feature allows us to rewrite the first example (that generates a shell script) to one file:
    {stdenv}:

    stdenv.runCommand {
      name = "hello";
      buildCommand = ''
        mkdir -p $out/bin
        cat > $out/bin/hello <<EOF
        #! ${stdenv.shell} -e

        echo "Test"
        EOF
        chmod +x $out/bin/hello
      '';
    }
Another feature of the stdenv.runCommand abstraction is to provide a generic mechanism to configure build-time dependencies -- all build-time dependencies that a package needs can be provided as a list of buildInputs. The generic builder carries out all necessary build steps to make them available. For example, when a package provides a bin/ sub folder, then it will be automatically added to the PATH environment variable.
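The PATH part of this mechanism can be sketched in a few lines of shell. This is a simplified illustration rather than the actual generic builder: the function name add_build_inputs_to_path is hypothetical, the build inputs are passed as a whitespace-separated string, and new entries are simply appended:

```shell
# Hypothetical helper: extend PATH with the bin/ sub folder of every
# build input that has one.
add_build_inputs_to_path() {
    for input in $1
    do
        if [ -d "$input/bin" ]
        then
            PATH="$PATH${PATH:+:}$input/bin"
        fi
    done
    export PATH
}

# Example with two fake packages, of which only the first provides bin/
demo=$(mktemp -d)
mkdir -p "$demo/pkg-a/bin" "$demo/pkg-b/lib"
add_build_inputs_to_path "$demo/pkg-a $demo/pkg-b"
```

After running the example, PATH contains the bin/ folder of pkg-a, while pkg-b is skipped because it only provides a lib/ sub folder.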
Every package can bundle a setup-hook.sh script that modifies the build environment so that it knows how dependencies for this package can be configured. For example, the following partial expression represents the Perl package that bundles a setup script:
    {stdenv, ...}:

    stdenv.mkDerivation {
      name = "perl";
      ...
      setupHook = ./setup-hook.sh;
    }
The setup hook can automatically configure the PERL5LIB search path environment variable for all packages that provide Perl modules:
    addPerlLibPath() {
        addToSearchPath PERL5LIB $1/lib/perl5/site_perl
    }

    envHooks+=(addPerlLibPath)
When we add perl as a build input to a package, then its setup hook configures the generic builder in such a way that the PERL5LIB environment variable is automatically configured when we provide a Perl module as a build input.
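To illustrate how such hooks could be dispatched, here is a minimal sketch written in plain POSIX shell. It makes a number of simplifying assumptions: a whitespace-separated string stands in for the bash array, the addToSearchPath implementation shown here is my own simplified stand-in (it does not check whether the directory exists), run_env_hooks is a hypothetical driver function, and the store paths are made up:

```shell
# Append a directory to a colon-separated search path variable
# (same calling convention as addToSearchPath in the hook above)
addToSearchPath() {
    varName="$1"
    dir="$2"
    eval "current=\${$varName:-}"
    eval "export $varName=\"\$current\${current:+:}\$dir\""
}

# The hook that the Perl package provides
addPerlLibPath() {
    addToSearchPath PERL5LIB "$1/lib/perl5/site_perl"
}

# A plain string of function names stands in for the bash array
envHooks="${envHooks:-} addPerlLibPath"

# Hypothetical driver: call every registered hook for every build input
run_env_hooks() {
    for input in $1
    do
        for hook in $envHooks
        do
            "$hook" "$input"
        done
    done
}

unset PERL5LIB
run_env_hooks "/nix/store/aaa-perl-Foo /nix/store/bbb-perl-Bar"
echo "$PERL5LIB"
# prints: /nix/store/aaa-perl-Foo/lib/perl5/site_perl:/nix/store/bbb-perl-Bar/lib/perl5/site_perl
```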
We can also more conveniently build GNU Hello, by using the buildInputs parameter:
    {stdenv, fetchurl, gnumake, gnutar, gzip, gcc, binutils}:

    stdenv.runCommand {
      name = "hello-2.10";
      src = fetchurl {
        url = mirror://gnu/hello/hello-2.10.tar.gz;
        sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
      };
      buildInputs = [ gnumake gnutar gzip gcc binutils ];
      buildCommand = ''
        tar xfv $src
        cd hello-2.10
        ./configure --prefix=$out
        make
        make install
      '';
    }
Compared to the previous GNU Hello example, this Nix expression is much simpler and more intuitive to write.
The run phases abstraction
We can improve the ease of use for build processes even further. GNU Hello, and many other GNU packages and other system software used for Linux are GNU Autotools/GNU Make based and follow similar conventions including the build commands you need to carry out. Likewise, many other software projects use standardized build tools that follow conventions.
As a result, when you have to maintain a collection of packages, you probably end up writing the same kinds of build instructions over and over again.
To alleviate this problem, I have created another abstraction layer, named: stdenv.runPhases making it possible to define and execute phases in a specific order. Every phase has a pre and post hook (a script that executes before and after each phase) and can be disabled or reenabled with a do* or dont* flag.
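The core idea of such a phase runner fits in a few lines of shell. The following is only a sketch of the concept and not the code from my implementation: run_phases and run_if_defined are hypothetical names, phases and hooks are assumed to be shell functions only, and only the dont* flags are handled:

```shell
# Hypothetical phase runner: for every phase, check the dont* flag and
# run the pre hook, the phase itself and the post hook in that order.
run_phases() {
    for phase in $phases
    do
        # Capitalise the first letter: build -> Build
        first=$(printf '%s' "$phase" | cut -c1 | tr '[:lower:]' '[:upper:]')
        rest=$(printf '%s' "$phase" | cut -c2-)
        capitalised="$first$rest"

        # Skip the phase when, for example, dontBuild has been set
        eval "dontFlag=\${dont$capitalised:-}"
        if [ -n "$dontFlag" ]
        then
            echo "skipping $phase phase"
            continue
        fi

        run_if_defined "pre$capitalised"
        run_if_defined "${phase}Phase"
        run_if_defined "post$capitalised"
    done
}

# Run a shell function only when it has been defined
run_if_defined() {
    if command -v "$1" > /dev/null 2>&1
    then
        "$1"
    fi
}

# Example: three phases, of which the check phase is disabled
phases="build check install"
preBuild()     { log="$log preBuild"; }
buildPhase()   { log="$log build"; }
checkPhase()   { log="$log check"; }
installPhase() { log="$log install"; }
postInstall()  { log="$log postInstall"; }
dontCheck=1

log=""
run_phases
echo "$log"
```

Because the hook and flag names are derived from the phase name, a newly defined phase gets its pre and post hooks and its dont* flag without any additional code.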
With this abstraction function, we can divide builds into phases, such as:
    {stdenv}:

    stdenv.runPhases {
      name = "hello";
      phases = [ "build" "install" ];
      buildPhase = ''
        cat > hello <<EOF
        #! ${stdenv.shell} -e
        echo "Hello"
        EOF
        chmod +x hello
      '';
      installPhase = ''
        mkdir -p $out/bin
        mv hello $out/bin
      '';
    }

The above Nix expression executes a build and an install phase. In the build phase, we construct a script that echoes "Hello" and make it executable; in the install phase, we move the script into the Nix store.
Instead of defining phases as environment variables, it is also possible to define them as shell functions in a builder script. For example, we can write an expression that refers to a builder script:
    {stdenv}:

    stdenv.runPhases {
      name = "hello2";
      builder = ./builder.sh;
    }
and define the phases in the builder script:
    source $setupRunPhases

    phases="build install"

    buildPhase()
    {
        cat > hello <<EOF
    #! $SHELL -e
    echo "Hello"
    EOF
        chmod +x hello
    }

    installPhase()
    {
        mkdir -p $out/bin
        mv hello $out/bin
    }

    genericBuild

Another feature of this abstraction is that we can also define exitHook and failureHook parameters that are executed when the builder succeeds or fails, respectively.
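One possible way to implement such hooks is with a shell trap on EXIT. The sketch below is an assumption about how this could be done and not necessarily how my implementation works: run_with_hooks is a hypothetical wrapper, and the hooks are assumed to be shell functions:

```shell
# Hypothetical wrapper: run a build function in a subshell and call
# exitHook or failureHook depending on the exit status of that subshell.
run_with_hooks() {
    (
        trap 'if [ $? -eq 0 ]; then exitHook; else failureHook; fi' EXIT
        "$@"
    )
}

exitHook()    { echo "build succeeded"; }
failureHook() { echo "build failed"; }

goodBuild() { true; }
badBuild()  { false; }

run_with_hooks goodBuild            # prints: build succeeded
run_with_hooks badBuild || true     # prints: build failed
```

Running the build in a subshell keeps the trap local to that build, so that it does not interfere with the shell that invokes it.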
In the next sections, I will show abstractions built on top of stdenv.runPhases that can be used to hide implementation details of common build procedures.
The generic build abstraction
For many build procedures, we need to carry out the same build steps, such as: unpacking the source archives, applying patches, and stripping debug symbols from the resulting ELF executables.
I have created another build function abstraction named: stdenv.genericBuild that implements a number of common build phases:
- The unpack phase generically unpacks the provided sources, makes their content writable and enters the source directory. The unpack command is determined by the unpack hook that each potential unpacker provides -- for example, the GNU tar package includes a setup hook that untars the file if it looks like a tarball or compressed tarball:
    _tryUntar()
    {
        case "$1" in
            *.tar|*.tar.gz|*.tar.bz2|*.tar.lzma|*.tar.xz)
                tar xfv "$1"
                ;;
            *)
                return 1
                ;;
        esac
    }

    unpackHooks+=(_tryUntar)
- The patch phase applies any patch that is provided by the patches parameter, uncompressing them when necessary. The uncompress file operation also works with setup hooks -- uncompressor packages (such as gzip and bzip2) provide a setup hook that uncompresses the file if it is of the right filetype.
- The strip phase processes all sub directories containing ELF binaries (e.g. bin/ and lib/) and strips their debugging symbols. This reduces the size of the binaries and removes non-deterministic timestamps.
- The patchShebangs phase processes all scripts with a shebang line and changes it to correspond to a path in the Nix store.
- The compressManPages phase compresses all manual pages with gzip.
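The hook-based dispatching that the unpack phase relies on can be sketched as follows. This is a simplified illustration under a few assumptions: the two hooks are fakes that only print what they would do, a whitespace-separated string stands in for the unpackHooks bash array, and unpack_file is a hypothetical name for the dispatcher:

```shell
# Two fake unpack hooks: each one returns 0 when it recognises the file
# and 1 otherwise, following the same convention as _tryUntar
_tryFakeTar() {
    case "$1" in
        *.tar|*.tar.gz) echo "untar $1" ;;
        *) return 1 ;;
    esac
}

_tryFakeZip() {
    case "$1" in
        *.zip) echo "unzip $1" ;;
        *) return 1 ;;
    esac
}

unpackHooks="_tryFakeTar _tryFakeZip"

# Hypothetical dispatcher: try each hook until one of them succeeds
unpack_file() {
    for hook in $unpackHooks
    do
        if "$hook" "$1"
        then
            return 0
        fi
    done
    echo "do not know how to unpack $1" >&2
    return 1
}

unpack_file hello-2.10.tar.gz   # prints: untar hello-2.10.tar.gz
unpack_file archive.zip         # prints: unzip archive.zip
```

The same convention can be used for uncompressing patches: every uncompressor registers a hook, and the first hook that recognises the file type handles it.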
We can also add GNU patch as a base package for this abstraction function, since it is required to execute the patch phase. As a result, it does not need to be specified as a build dependency for each package.
This function abstraction alone is not very useful, but it captures the common aspects shared by projects that use most build tools, such as GNU Make, CMake or SCons.
I can reduce the size of the previously shown GNU Hello example Nix expression to the following:
    {stdenv, fetchurl, gnumake, gnutar, gzip, gcc, binutils}:

    stdenv.genericBuild {
      name = "hello-2.10";
      src = fetchurl {
        url = mirror://gnu/hello/hello-2.10.tar.gz;
        sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
      };
      buildInputs = [ gnumake gnutar gzip gcc binutils ];
      buildCommandPhase = ''
        ./configure --prefix=$out
        make
        make install
      '';
    }

In the above expression, I no longer have to specify how to unpack the downloaded GNU Hello source tarball.
GNU Make/GNU Autotools abstraction
We can extend the previous function abstraction even further with phases that automate a complete GNU Make/GNU Autotools based workflow. This abstraction is what we can call stdenv.mkDerivation and is comparable in terms of features with the implementation in Nixpkgs.
We can adjust the phases to include a configure, build, check and install phase. The configure phase checks whether a configure script exists and executes it. The build, check and install phases will execute: make, make check and make install with appropriate parameters.
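As a rough sketch, these default phases could look as follows when written as shell functions. The variable names configureFlags and makeFlags are assumptions for passing the "appropriate parameters", and the fake project at the end only exists to show the configure phase in action:

```shell
# Hypothetical default phases for a GNU Autotools/GNU Make project
configurePhase() {
    # Only run the configure script when the project provides one
    if [ -x ./configure ]
    then
        ./configure --prefix="$out" $configureFlags
    fi
}

buildPhase()   { make $makeFlags; }
checkPhase()   { make check $makeFlags; }
installPhase() { make install $makeFlags; }

# Try the configure phase on a fake project whose configure script
# only records the arguments it receives
project=$(mktemp -d)
printf '#!/bin/sh\necho "$@" > configure.log\n' > "$project/configure"
chmod +x "$project/configure"

(
    cd "$project"
    out=/tmp/fake-out
    configurePhase
)

cat "$project/configure.log"   # prints: --prefix=/tmp/fake-out
```

For a project without a configure script, the configure phase does nothing, so that plain GNU Make projects can be built with the same set of phases.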
We can also add common packages that we need to build these projects as base packages so that they no longer have to be provided as a build input: GNU Tar, gzip, bzip2, xz, GNU Make, Binutils and GCC.
With these additional phases and base packages, we can reduce the GNU Hello example to the following expression:
    {stdenv, fetchurl}:

    stdenv.mkDerivation {
      name = "hello-2.10";
      src = fetchurl {
        url = mirror://gnu/hello/hello-2.10.tar.gz;
        sha256 = "0ssi1wpaf7plaswqqjwigppsg5fyh99vdlb9kzl7c9lng89ndq1i";
      };
    }
The above Nix expression does not contain any installation instructions -- the generic builder is able to figure out all steps on its own.
Composing custom function abstractions
I have shown several build abstraction layers implementing most features that are in the Nixpkgs version of stdenv.mkDerivation. Aside from clarity, another objective of splitting this function in layers is to make the composition of custom build abstractions more convenient.
For example, we can implement the trivial builder named: writeText whose only responsibility is to write a text file into the Nix store, by extending stdenv.runCommand. This abstraction suffices because writeText does not require any build tools, such as GNU Make and GCC, and it also does not need any generic build procedure executing phases:
    {stdenv}:

    { name # the name of the derivation
    , text
    , executable ? false # run chmod +x ?
    , destination ? ""   # relative path appended to $out eg "/bin/foo"
    , checkPhase ? ""    # syntax checks, e.g. for scripts
    }:

    stdenv.runCommand {
      inherit name text executable;
      passAsFile = [ "text" ];

      # Pointless to do this on a remote machine.
      preferLocalBuild = true;
      allowSubstitutes = false;

      buildCommand = ''
        target=$out${destination}
        mkdir -p "$(dirname "$target")"

        if [ -e "$textPath" ]
        then
            mv "$textPath" "$target"
        else
            echo -n "$text" > "$target"
        fi

        [ "$executable" = "1" ] && chmod +x "$target" || true
      '';
    }

We can also make a builder for Perl packages by extending stdenv.mkDerivation -- Perl packages also use GNU Make as a build system. The only difference is the configuration step -- it runs Perl's MakeMaker script to generate the Makefile. We can simply replace the configuration phase for GNU Autotools by an implementation that invokes MakeMaker.
When developing custom abstractions, I basically follow this pattern:
    {stdenv, foo, bar}:
    {name, buildInputs ? [], ...}@args:

    let
      extraArgs = removeAttrs args [ "name" "buildInputs" ];
    in
    stdenv.someBuildFunction ({
      name = "mypackage-"+name;
      buildInputs = [ foo bar ] ++ buildInputs;
    } // extraArgs)
- A build function is a nested function in which the first line is a function header that captures the common build-time dependencies required to build a package. For example, when we want to build Perl packages, then perl is such a common dependency.
- The second line is the inner function header that captures the parameters that should be passed to the build function. The notation allows an arbitrary number of parameters. The parameters in the { } block (name, buildInputs) are considered to have a specific use in the body of the function. The remainder of parameters are non-essential -- they are used as environment variables in the builder environment or they can be propagated to other functions.
- We compose an extraArgs variable that contains all non-essential arguments that we can propagate to the build function. Basically, all function arguments that are used in the body need to be removed, as well as function arguments that are attribute sets, because they cannot be converted to strings.
- In the body of the function, we set up important aspects of the build environment, such as the mandatory build parameters, and we propagate the remaining function arguments to the builder abstraction function.
Following this pattern also ensures that the builder is flexible enough to be extended and modified. For example, by extending a function that is based on stdenv.runPhases the builder can be extended with custom phases and build hooks.
Discussion
In this blog post, I have derived my own reimplementation of Nixpkgs's stdenv.mkDerivation function that consists of the following layers each gradually adding functionality to the "raw" derivation {} builtin:
- "Raw" derivations
- The setup script ($stdenv/setup)
- Simple derivation (stdenv.simpleDerivation)
- The run command abstraction (stdenv.runCommand)
- The run phases abstraction (stdenv.runPhases)
- The generic build abstraction (stdenv.genericBuild)
- The GNU Make/GNU Autotools abstraction (stdenv.mkDerivation)
The features that the resulting stdenv.mkDerivation provides are very similar to the Nixpkgs version, but not entirely identical. Most notably, cross compiling support is completely absent.
From this experience, I have a number of suggestions that we may want to implement in the Nixpkgs version to improve the quality and clarity of the generic builder infrastructure:
- We could also split the implementation of stdenv.mkDerivation and the corresponding setup.sh script into layered sub functions. Currently, the setup.sh script is huge (e.g. over 1200 LOC) and has many responsibilities (perhaps too many). By splitting the build abstraction functions and their corresponding setup scripts, we can separate concerns better and reduce the size of the script so that it becomes more readable and more maintainable.
- In the Nixpkgs implementation, the phases that the generic builder executes are built for GNU Make/GNU Autotools specifically. Furthermore, the invocation of pre and post hooks and do and dont flags are all hand coded for every phase (there is no generic mechanism that deals with them). As a result, when you define a new custom phase, you need to reimplement the same aspects over and over again. In my implementation, you only have to define phases -- the generic builder automatically executes the corresponding pre and post hooks and evaluates the do and dont flags.
- In the Nixpkgs implementation there is no uncompressHook -- as a result, the decompression of patch files is completely handcoded for every uncompressor, e.g. gzip, bzip2, xz etc. In my implementation, we can delegate this responsibility to any potential uncompressor package.
- In my implementation, I turned some of the phases of the generic builder into command-line tools that can be invoked outside the build environment (e.g. patch-shebangs, compress-man). This makes it easier to experiment with these tools and to make adjustments.
The biggest benefit of having separated concerns is flexibility when composing custom abstractions -- for example, the writeText function in Nixpkgs is built on top of stdenv.mkDerivation, which includes GNU Make and GCC as dependencies, although writeText does not actually use them. As a result, when one of these packages gets updated, all generated text files need to be rebuilt as well, while there is no real dependency on them. When using a more minimalistic function, such as stdenv.runCommand, this problem goes away.
Availability
I have created a new GitHub repository called: nix-lowlevel-experiments. It contains the implementation of all function abstractions described in this blog post, including some test cases that demonstrate how these functions can be used.
In the future, I will probably experiment with other low level Nix concepts and add them to this repository as well.