Monday, May 27, 2019

A Nix-friendly XML-based data exchange library for the Disnix toolset

In the last few months, I have been intensively working on a variety of internal improvements to the Disnix toolset.

One of the increasingly complex and tedious aspects of the Disnix toolset is data exchange -- Disnix implements declarative deployment in the sense that it takes three specifications written in the Nix expression language as inputs: a services model that specifies the deployable units, their properties and how they depend on each other, an infrastructure model that specifies the available target machines and their properties, and a distribution model that specifies the mapping between services in the services model and target machines in the infrastructure model.

From these three declarative models, Disnix derives all the activities that need to be carried out to get a system in a running state: building services from source, distributing services (and their dependencies) to target machines, activating the services, and (optionally) restoring state snapshots.

Using the Nix expression language for these input models is useful for a variety of reasons:

  • We can use the Nix package manager's build infrastructure to reliably build services from source code, including all their required dependencies, and store them in isolation in the Nix store. The Nix store ensures that multiple variants and versions can co-exist and that we can always roll back to previous versions.
  • Because the Nix expression language is a purely functional domain-specific language (that in addition to data structures supports functions), we can make all required configuration parameters (such as the dependencies of the services that we intend to deploy) explicit by using functions so that we know that all mandatory settings have been specified.

Although the Nix expression language is a first-class citizen for tasks carried out by the Nix package manager, we also want to use the same specifications to instruct tools that carry out activities that Nix does not implement, such as the tools that activate the services and restore state snapshots.

The Nix expression language is not designed to be consumed by tools other than Nix (as a sidenote: despite this limitation, it is still somewhat possible to use the Nix expression language independently of the package manager in experimental setups, such as this online tutorial, but the libexpr component of Nix does not have a stable interface or a commitment to make the language portable across tools).

As a solution, I convert objects in the Nix expression language to XML, so that they can be consumed by any of the tools that implement the activities that Nix does not support.

Although this may sound conceptually straightforward, the amount of data that needs to be converted, and the amount of code that needs to be written to parse that data, keeps growing and becoming more complex, making it increasingly harder to adjust and maintain.

To cope with this growing complexity, I have standardized a collection of Nix-XML conversion patterns, and wrote a library named libnixxml that can be used to make data interchange in both directions more convenient.

Converting objects in the Nix expression language to XML


The Nix expression language supports a variety of language integrations. For example, it can export Nix objects to XML and JSON, and import data from JSON and TOML.

The following Nix attribute set:
{
  message = "This is a test";
  tags = [ "test" "example" ];
}

can be converted to XML (with the builtins.toXML primop) or by running:

$ nix-instantiate --eval-only --xml --strict example.nix

resulting in the following XML data:

<?xml version='1.0' encoding='utf-8'?>
<expr>
  <attrs>
    <attr column="3" line="2" name="message" path="/home/sander/example.nix">
      <string value="This is a test" />
    </attr>
    <attr column="3" line="3" name="tags" path="/home/sander/example.nix">
      <list>
        <string value="test" />
        <string value="example" />
      </list>
    </attr>
  </attrs>
</expr>

Although the above XML code fragment is valid XML, it is essentially just a literal translation of the underlying abstract syntax tree (AST) to XML.

An AST dump is not always very practical for consumption by an external application -- it is not very "readable", contains data that we do not always need (e.g. line and column data), and imposes (due to the structure) additional complexity on a program to parse the XML data to a domain model. As a result, exported XML data almost always needs to be converted to an XML format that is more practical for consumption.

For all the input models that Disnix consumes, I was originally handwriting XSL stylesheets to convert the XML data to a format that can be more easily consumed, and handwriting all the parsing code. Eventually, I derived a number of standard patterns.

For example, a more practical XML representation of the earlier shown Nix expression could be:

<?xml version="1.0"?>
<expr>
  <message>This is a test</message>
  <tags>
    <elem>test</elem>
    <elem>example</elem>
  </tags>
</expr>

In the above representation, the type and meta information is discarded. The attribute set is translated to a collection of XML sub elements in which the element names correspond to the attribute keys. The list elements are translated to generic sub elements (the above example uses elem, but any element name can be picked). The above notation is, in my opinion, more readable, more concise and easier to parse by an external program.

Attribute keys may be identifiers, but they can also be strings containing characters that are not allowed in XML element names (e.g. < or >). For such cases, it is also possible to use a slightly more verbose notation in which a generic element name is used and a name property stores each attribute set key:

<?xml version="1.0"?>
<expr>
  <attr name="message">This is a test</attr>
  <attr name="tags">
    <elem>test</elem>
    <elem>example</elem>
  </attr>
</expr>
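To decide between the two notations, a converter needs to know whether an attribute key is usable as an XML element name. The following stand-alone sketch (the function name is made up; it is not part of libnixxml) approximates that check with a simplified version of the XML Name production:

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if the attribute key cannot serve as an
   XML element name and thus requires the verbose <attr name="..."> notation.
   This is a simplified approximation of the XML Name production: element
   names must start with a letter or underscore and may only contain
   letters, digits, hyphens, underscores and periods. */
int key_needs_verbose_notation(const char *key)
{
    const char *p = key;

    if(*p == '\0' || !(isalpha((unsigned char)*p) || *p == '_'))
        return 1; /* empty key or invalid start character */

    for(p++; *p != '\0'; p++)
        if(!(isalnum((unsigned char)*p) || *p == '-' || *p == '_' || *p == '.'))
            return 1; /* invalid subsequent character */

    return 0; /* key is usable as an element name */
}
```

With a check of this kind, a key such as message can be emitted in the simple notation, whereas a key containing < or > falls back to the verbose notation.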

When an application has a static domain model, it is not necessary to know any types (the conversion can be done in the application code using the application's domain model). However, it may also be desired to construct data structures dynamically.

For dynamic object construction, type information needs to be known. Optionally, XML elements can be annotated with type information:

<?xml version="1.0"?>
<expr type="attrs">
  <attr name="message" type="string">This is a test</attr>
  <attr name="tags" type="list">
    <elem type="string">test</elem>
    <elem type="string">example</elem>
  </attr>
</expr>

To automatically convert data to XML format following the above listed conventions, I have created a standardized XSL stylesheet and command-line tool that can automatically convert Nix expressions.

The following command generates the first XML code fragment:

$ nixexpr2xml --attr-style simple example.nix

We can use the verbose notation for attribute sets, by running:

$ nixexpr2xml --attr-style verbose example.nix

Type annotations can be enabled by running:

$ nixexpr2xml --attr-style verbose --enable-types example.nix

The root, attribute and list elements, as well as the name and type properties, use generic names. These names can also be adjusted, if desired:

$ nixexpr2xml --root-element-name root \
  --list-element-name item \
  --attr-element-name property \
  --name-attribute-name key \
  --type-attribute-name mytype \
  example.nix

Parsing a domain model


In addition to producing more "practical" XML data, I have also implemented utility functions that help me consume the XML data to construct a domain model in the C programming language, consisting of values (strings, integers etc.), structs, list-like data structures (e.g. arrays, linked lists) and table-like data structures, such as hash tables.

For example, the following XML document only containing a string:

<expr>hello</expr>

can be parsed to a string in C as follows:

#include <nixxml-parse.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
xmlChar *value = NixXML_parse_value(element, NULL);
printf("value is: %s\n", value); // value is: hello

We can also use functions to parse (nested) data structures. For example, to parse the following XML code fragment representing an attribute set:

<expr>
  <attr name="firstName">Sander</attr>
  <attr name="lastName">van der Burg</attr>
</expr>

We can use the following code snippet:

#include <stdlib.h>
#include <nixxml-parse.h>

xmlNodePtr element;

typedef struct
{
    xmlChar *firstName;
    xmlChar *lastName;
}
ExampleStruct;

void *create_example_struct(xmlNodePtr element, void *userdata)
{
    return calloc(1, sizeof(ExampleStruct));
}

void parse_and_insert_example_struct_member(xmlNodePtr element, void *table, const xmlChar *key, void *userdata)
{
    ExampleStruct *example = (ExampleStruct*)table;

    if(xmlStrcmp(key, (xmlChar*) "firstName") == 0)
        example->firstName = NixXML_parse_value(element, userdata);
    else if(xmlStrcmp(key, (xmlChar*) "lastName") == 0)
        example->lastName = NixXML_parse_value(element, userdata);
}

/* Open XML file and obtain root element */

ExampleStruct *example = NixXML_parse_verbose_heterogeneous_attrset(element, "attr", "name", NULL, create_example_struct, parse_and_insert_example_struct_member);

To parse the attribute set in the XML code fragment above (that uses a verbose notation) and derive a struct from it, we invoke the NixXML_parse_verbose_heterogeneous_attrset() function. The parameters specify that the XML code fragment should be parsed as follows:

  • It expects the name of the XML element of each attribute to be called: attr.
  • The property that refers to the name of the attribute is called: name.
  • To create a struct that stores the attributes in the XML file, the function: create_example_struct() will be executed that allocates memory for it and initializes all fields with NULL values.
  • The logic that parses the attribute values and assigns them to the struct members is in the parse_and_insert_example_struct_member() function. The implementation uses NixXML_parse_value() (as shown in the previous example) to parse the attribute values.

In addition to parsing values as strings and attribute sets as structs, it is also possible to:

  • Parse lists, by invoking: NixXML_parse_list()
  • Parse uniformly typed attribute sets (in which every attribute set member has the same type), by invoking: NixXML_parse_verbose_attrset()
  • Parse attribute sets using the simple XML notation for attribute sets (as opposed to the verbose notation): NixXML_parse_simple_attrset() and NixXML_parse_simple_heterogeneous_attrset()

Printing Nix or XML representation of a domain model


In addition to parsing NixXML data to construct a domain model, the inverse process is also possible -- the API also provides convenience functions to print an XML or Nix representation of a domain model.

For example, the following string in C:

char *greeting = "Hello";

can be displayed as a string in the Nix expression language as follows:

#include <nixxml-print-nix.h>

NixXML_print_string_nix(stdout, greeting, 0, NULL); // outputs: "Hello"

or as an XML document, by running:

#include <nixxml-print-xml.h>

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_string_xml(stdout, greeting, 0, NULL, NULL);
NixXML_print_close_root_tag(stdout, "expr");

producing the following output:

<expr>Hello</expr>
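Printing strings as XML also requires escaping the XML-reserved characters. The following stand-alone sketch illustrates the kind of escaping involved; it is an illustration only, not libnixxml's actual implementation:

```c
#include <string.h>

/* Sketch of the escaping that an XML string printer must apply: the
   XML-reserved characters are replaced by their predefined entities.
   Entities that would overflow the result buffer are dropped. */
void escape_xml_string(const char *value, char *result, size_t size)
{
    size_t len = 0;
    const char *p;
    result[0] = '\0';

    for(p = value; *p != '\0'; p++)
    {
        const char *replacement;
        char single[2] = { *p, '\0' };

        switch(*p)
        {
            case '<': replacement = "&lt;"; break;
            case '>': replacement = "&gt;"; break;
            case '&': replacement = "&amp;"; break;
            case '"': replacement = "&quot;"; break;
            default: replacement = single;
        }

        len += strlen(replacement);
        if(len < size)
            strcat(result, replacement);
    }
}
```

For example, escaping the string: a < b & "c" yields: a &lt; b &amp; &quot;c&quot;.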

The example struct shown in the previous section can be printed as a Nix expression with the following code:

#include <nixxml-print-nix.h>

void print_example_attributes_nix(FILE *file, const void *value, const int indent_level, void *userdata, NixXML_PrintValueFunc print_value)
{
    ExampleStruct *example = (ExampleStruct*)value;
    NixXML_print_attribute_nix(file, "firstName", example->firstName, indent_level, userdata, NixXML_print_string_nix);
    NixXML_print_attribute_nix(file, "lastName", example->lastName, indent_level, userdata, NixXML_print_string_nix);
}

NixXML_print_attrset_nix(stdout, example, 0, NULL, print_example_attributes_nix, NULL);

The above code fragment executes the function: NixXML_print_attrset_nix() to print the example struct as an attribute set. The attribute set printing function invokes the function: print_example_attributes_nix() to print the attribute set members.

The print_example_attributes_nix() function prints each attribute assignment. It uses the NixXML_print_string_nix() function (shown in the previous example) to print each member as a string in the Nix expression language.

The result of running the above code is the following Nix expression:

{
  "firstName" = "Sander";
  "lastName" = "van der Burg";
}

The same struct can be printed as XML (using the verbose notation for attribute sets) with the following code:

#include <nixxml-print-xml.h>

void print_example_attributes_xml(FILE *file, const void *value, const char *child_element_name, const char *name_property_name, const int indent_level, const char *type_property_name, void *userdata, NixXML_PrintXMLValueFunc print_value)
{
    ExampleStruct *example = (ExampleStruct*)value;
    NixXML_print_verbose_attribute_xml(file, child_element_name, name_property_name, "firstName", example->firstName, indent_level, NULL, userdata, NixXML_print_string_xml);
    NixXML_print_verbose_attribute_xml(file, child_element_name, name_property_name, "lastName", example->lastName, indent_level, NULL, userdata, NixXML_print_string_xml);
}

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_verbose_attrset_xml(stdout, example, "attr", "name", 0, NULL, NULL, print_example_attributes_xml, NULL);
NixXML_print_close_root_tag(stdout, "expr");

The above code fragment uses a similar strategy as the previous example (by invoking NixXML_print_verbose_attrset_xml()) to print the example struct as an XML file using a verbose notation for attribute sets.

The attribute set members are printed by the print_example_attributes_xml() function.

The result of running the above code is the following XML output:

<expr>
  <attr name="firstName">Sander</attr>
  <attr name="lastName">van der Burg</attr>
</expr>

In addition to printing values and attribute sets, it is also possible to:

  • Print lists in Nix and XML format: NixXML_print_list_nix(), NixXML_print_list_xml()
  • Print attribute sets in simple XML notation: NixXML_print_simple_attrset_xml()
  • Print strings as int, float or bool: NixXML_print_string_as_*_xml.
  • Print integers: NixXML_print_int_xml()
  • Disable indentation by setting the indent_level parameter to -1.
  • Print type annotated XML, by setting the type_property_name parameter to a string that is not NULL.

Using abstract data structures


There is no standardized library for abstract data structures in C (e.g. lists, maps, trees etc.). As a result, each framework provides its own implementations of them. To parse lists and attribute sets (that have arbitrary structures), you need generalized data structures that are list-like or table-like.

libnixxml provides two sub libraries to demonstrate how integration with abstract data structures can be implemented. One sub library is called libnixxml-data that uses pointer arrays for lists and xmlHashTable for attribute sets, and another is called libnixxml-glib that integrates with GLib using GPtrArray structs for lists and GHashTables for attribute sets.
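To give an impression of what such a generic list-like structure can look like, the following stand-alone sketch implements a NULL-terminated pointer array, similar in spirit to the pointer arrays used by libnixxml-data (the function names are invented for illustration; the real module provides its own NixXML_* interface):

```c
#include <stdlib.h>

/* Minimal sketch of a NULL-terminated pointer array, the kind of generic
   list representation that a parser can grow one element at a time while
   consuming <elem> nodes. */
void **append_to_ptr_array(void **array, void *value)
{
    unsigned int count = 0;

    while(array != NULL && array[count] != NULL)
        count++;

    array = realloc(array, (count + 2) * sizeof(void*));
    array[count] = value;
    array[count + 1] = NULL; /* keep the array NULL-terminated */
    return array;
}

unsigned int ptr_array_length(void **array)
{
    unsigned int count = 0;

    while(array[count] != NULL)
        count++;

    return count;
}
```

The NULL terminator makes it possible to iterate over the array without tracking its length separately, which keeps the parse and print function interfaces simple.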

The following XML document:

<expr>
  <elem>test</elem>
  <elem>example</elem>
</expr>

can be parsed as a pointer array (array of strings) as follows:

#include <nixxml-ptrarray.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
void **array = NixXML_parse_ptr_array(element, "elem", NULL, NixXML_parse_value);

and printed as a Nix expression with:

NixXML_print_ptr_array_nix(stdout, array, 0, NULL, NixXML_print_string_nix);

and as XML with:

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_ptr_array_xml(stdout, array, "elem", 0, NULL, NULL, NixXML_print_string_xml);
NixXML_print_close_root_tag(stdout, "expr");

Similarly, there is a module that works with xmlHashTables providing a similar function interface as the pointer array module.

Working with generic NixXML nodes


By using generic data structures to represent lists and tables, type-annotated NixXML data, and a generic NixXML_Node struct (that indicates what kind of node we have, such as a value, list or attribute set), we can also automatically parse an entire document with a single function call:

#include <nixxml-ptrarray.h>
#include <nixxml-xmlhashtable.h>
#include <nixxml-parse-generic.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
NixXML_Node *node = NixXML_generic_parse_expr(element,
    "type",
    "name",
    NixXML_create_ptr_array,
    NixXML_create_xml_hash_table,
    NixXML_add_value_to_ptr_array,
    NixXML_insert_into_xml_hash_table,
    NixXML_finalize_ptr_array);

The above function composes a generic NixXML_Node object. The function interface uses function pointers to compose lists and tables. These functions are provided by the pointer array and xmlHashTable modules in the libnixxml-data library.

We can also print an entire NixXML_Node object structure as a Nix expression:

#include <nixxml-print-generic-nix.h>

NixXML_print_generic_expr_nix(stdout,
    node,
    0,
    NixXML_print_ptr_array_nix,
    NixXML_print_xml_hash_table_nix);

as well as XML (using simple or verbose notation for attribute sets):

#include <nixxml-print-generic-xml.h>

NixXML_print_generic_expr_verbose_xml(stdout,
    node,
    0,
    "expr",
    "elem",
    "attr",
    "name",
    "type",
    NixXML_print_ptr_array_xml,
    NixXML_print_xml_hash_table_verbose_xml);

Summary


The following table summarizes the concepts described in this blog post:

Concept                    Nix expression representation  XML representation                                    C application domain model
value                      "hello"                        hello                                                 char*
list                       [ "hello" "bye" ]              <elem>hello</elem><elem>bye</elem>                    void**, linked list, ...
attribute set (simple)     { a = "hello"; b = "bye"; }    <a>hello</a><b>bye</b>                                xmlHashTablePtr, struct, ...
attribute set (verbose)    { a = "hello"; b = "bye"; }    <attr name="a">hello</attr><attr name="b">bye</attr>  xmlHashTablePtr, struct, ...

The above table shows the concepts that NixXML defines, and how they can be represented in the Nix expression language, XML, and in the domain model of a C application.

The representations of these concepts can be translated as follows:

  • To convert a raw AST XML representation of a Nix expression to NixXML, we can use the included XSL stylesheet or run the nixexpr2xml command.
  • XML concepts can be parsed to a domain model in a C application by invoking NixXML_parse_* functions for the appropriate concepts and XML representation.
  • Domain model elements can be printed as XML by invoking NixXML_print_*_xml functions.
  • Domain model elements can be printed in the Nix expression language by invoking NixXML_print_*_nix functions.

Benefits


I have re-engineered the current development versions of the Disnix and Dynamic Disnix toolsets to use libnixxml for data exchange. For Disnix, there is far less boilerplate code that I need to write for the parsing infrastructure, making it significantly easier to maintain.

In the Dynamic Disnix framework, libnixxml provides even more benefits beyond a simpler parsing infrastructure. The Dynamic Disnix toolset provides deployment planning methods, and documentation and visualization tools. These concerns are orthogonal to the features of the core Disnix toolset -- there is first-class Nix/Disnix integration, but the features of Dynamic Disnix should work with any service-oriented system (having a model that works with services and dependencies) regardless of what technology is used to carry out the deployment process itself.

With libnixxml it is now quite easy to make all these tools both accept Nix and XML representations of their input models, and make them output data in both Nix and XML. It is now also possible to use most features of Dynamic Disnix, such as the visualization features described in the previous blog post, independently of Nix and Disnix.

Moreover, the deployment planning methods should now also be able to more conveniently invoke external tools, such as SAT-solvers.

Related work


libnixxml is not the only Nix language integration facility I wrote. I also wrote NiJS (that is JavaScript-based) and PNDP (that is PHP-based). Aside from the difference in implementation language (C), libnixxml also serves a different purpose -- it is not meant to replicate the functionality of these two libraries in C.

Basically, libnixxml has the inverse purpose -- NiJS and PNDP are useful for systems that already have a domain model (e.g. a domain-specific configuration management tool), and make it possible to generate the required Nix expression language code to conveniently integrate with Nix.

In libnixxml, the Nix expression representation is the basis and libnixxml makes it more convenient for external programs to consume such a Nix expression. Moreover, libnixxml only facilitates data interchange, and not all Nix expression language features.

Conclusion


In this blog post, I have described libnixxml that makes XML-based data interchange with configurations in the Nix expression language and domain models in the C programming language more convenient. It is part of the current development version of Disnix and can be obtained as a separate project from my GitHub page.

Thursday, February 28, 2019

Generating functional architecture documentation from Disnix service models

In my previous blog post, I have described a minimalistic architecture documentation approach for service-oriented systems based on my earlier experiences with setting up basic configuration management repositories. I used this approach to construct a documentation catalog for the platform I have been developing at Mendix.

I also explained my motivation -- it improves developer effectiveness, team consensus and the on-boarding of new team members. Moreover, it is a crucial ingredient in improving the quality of a system.

Although we are quite happy with the documentation, my biggest inconvenience is that I had to derive it entirely by hand -- I consulted various kinds of sources, but since existing documentation and information provided by people may be incomplete or inconsistent, I considered the source code and deployment configuration files the ultimate source of truth, because no matter how elegantly a diagram is drawn, it is useless if it does not match the actual implementation.

Because a manual documentation process is very costly and time consuming, a more ideal situation would be to have an automated approach that automatically derives architecture documentation from deployment specifications.

Since I am developing a deployment framework for service-oriented systems myself (Disnix), I have decided to extend it with a generator that can derive architecture diagrams and supplemental descriptions from the deployment models using the conventions I have described in my previous blog post.

Visualizing deployment architectures in Disnix


As explained in my previous blog post, the notation that I used for the diagrams was not something I invented from scratch, but something I borrowed from Disnix.

Disnix has had a feature for quite some time that can visualize deployment architectures: descriptions that show how the functional parts (the services/components) are mapped to physical resources (e.g. machines/containers) in a network.

For example, after deploying a service-oriented system, such as my example web application system, by running:

$ disnix-env -s services.nix -i infrastructure.nix \
  -d distribution-bundles.nix

you can visualize the corresponding deployment architecture of the system by running:

$ disnix-visualize > out.dot

The above command-line instruction generates a directed graph in the DOT language. The resulting dot file can be converted into a displayable image (such as a PNG or SVG file) by running:

$ dot -Tpng out.dot > out.png

Resulting in a diagram of the deployment architecture that may look as follows:


The above diagram uses the following notation:

  • The light grey boxes denote machines in a network. In the above deployment scenario, we have two of them.
  • The ovals denote services (more specifically: in a Disnix-context, they reflect any kind of distributable deployment unit). Services can have almost any shape, such as web services, web applications, and databases. Disnix uses a plugin system called Dysnomia to make sure that the appropriate deployment steps are carried out for a particular type of service.
  • The arrows denote inter-dependencies. An arrow pointing from one service to another means that the latter is an inter-dependency of the former service. Inter-dependency relationships ensure that the dependent service gets all configuration properties so that it knows how to reach the dependency, and that the deployment system deploys the inter-dependencies of a specific service first.

    In some cases, enforcing the right activation order may be expensive. It is also possible to drop the ordering requirement, as denoted by the dashed arrows. This is acceptable for redirects from the portal application, but not acceptable for database connections.
  • The dark grey boxes denote containers. Containers can be any kind of runtime environment that hosts zero or more distributable deployment units. For example, the container service of a MySQL database is a MySQL DBMS, whereas the container service of a Java web application archive can be a Java Servlet container, such as Apache Tomcat.
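Under the hood, such a visualization boils down to emitting a graph in the DOT language in which services become nodes and inter-dependencies become edges. The following stand-alone sketch is invented for illustration only; disnix-visualize's real implementation is considerably more elaborate (it also renders machines and containers, for example):

```c
#include <stdio.h>

/* A single inter-dependency relationship: the "to" service is an
   inter-dependency of the "from" service. */
typedef struct
{
    const char *from;
    const char *to;
}
Dependency;

/* Emits a directed graph in the DOT language in which every
   inter-dependency becomes an edge between two service nodes. */
void print_dot_graph(FILE *file, const Dependency *dependencies, unsigned int count)
{
    unsigned int i;

    fprintf(file, "digraph G {\n");

    for(i = 0; i < count; i++)
        fprintf(file, "  \"%s\" -> \"%s\";\n", dependencies[i].from, dependencies[i].to);

    fprintf(file, "}\n");
}
```

The resulting output can be fed to the dot tool, in the same way as the out.dot file shown earlier, to produce a PNG or SVG image.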

Visualizing the functional architecture of service-oriented systems


The services of which a service-oriented system is composed are flexible -- they can be deployed to various kinds of environments, such as a test environment, a second fail-over production environment, or a local machine.

Because services can be deployed to a variety of targets, it may also be desired to get an architectural view of the functional parts only.

I created a new tool called: dydisnix-visualize-services that can be used to generate functional architecture diagrams by visualizing the services in the Disnix services model:


The above diagram is a visual representation of the services model of the example web application system, using a similar notation as the deployment architecture without showing any environment characteristics:

  • Ovals denote services and arrows denote inter-dependency relationships.
  • Every service is annotated with its type, so that it becomes clear what kind of a shape a service has and what kind of deployment procedures need to be carried out.

Despite the fact that the above diagram is focused on the functional parts, it may still look quite detailed, even from a functional point of view.

Essentially, the architecture of my example web application system is a "system of sub systems" -- each sub system provides an isolated piece of functionality consisting of a database backend and web application front-end bundle. The portal sub system is the entry point and responsible for guiding the users to the sub systems implementing the functionality that they want to use.

It is also possible to annotate services in the Disnix services model with a group and description property:

{distribution, invDistribution, pkgs, system}:

let
  customPkgs = import ../top-level/all-packages.nix {
    inherit pkgs system;
  };

  groups = {
    homework = "Homework";
    literature = "Literature";
    ...
  };
in
{
  homeworkdb = {
    name = "homeworkdb";
    pkg = customPkgs.homeworkdb;
    type = "mysql-database";
    group = groups.homework;
    description = "Database backend of the Homework subsystem";
  };

  homework = {
    name = "homework";
    pkg = customPkgs.homework;
    dependsOn = {
      inherit usersdb homeworkdb;
    };
    type = "apache-webapplication";
    appName = "Homework";
    group = groups.homework;
    description = "Front-end of the Homework subsystem";
  };

  ...
}

In the above services model, I have grouped every database and web application front-end bundle in a group that represents a sub system (such as Homework). By adding the --group-subservices parameter to the dydisnix-visualize-services command invocation, we can simplify the diagram to only show the sub systems and how these sub systems are inter-connected:

$ dydisnix-visualize-services -s services.nix -f png \
  --group-subservices

resulting in the following functional architecture diagram:


As may be observed in the picture above, all services have been grouped. The service groups are denoted by ovals with dashed borders.

We can also query sub architecture diagrams of every group/sub system. For example, the following command generates a sub architecture diagram for the Homework group:

$ dydisnix-visualize-services -s services.nix -f png \
  --group Homework --group-subservices

resulting in the following diagram:


The above diagram only shows the services in the Homework group and their context -- i.e. non-transitive dependencies and services that have a dependency on any service in the requested group.

Services that exactly fit the group or any of its parent groups will be displayed verbatim (e.g. the homework database back-end and front-end). The other services will be categorized into the lowest common sub group (the Users and Portal sub systems).

For more complex architectures consisting of many layers, you will probably want to generate all available architecture diagrams in one command invocation. It is also possible to run the visualization tool in batch mode. In batch mode, it recursively generates diagrams for the top-level architecture and every possible sub group, and stores them in a specified output folder:

$ dydisnix-visualize-services --batch -s services.nix -f svg \
  --output-dir out

Generating supplemental documentation


Another thing I have explained in my previous blog post is that providing diagrams is useful, but they cannot clear up all confusion -- you also need to document and clarify additional details, such as the purposes of the services.

It is also possible to generate a documentation page for each group showing a table of services with their descriptions and types:

The following command generates a documentation page for the Homework group:

$ dydisnix-document-services -s services.nix --group Homework

It is also possible to adjust the generation process by providing a documentation configuration file (by using the --docs parameter):

$ dydisnix-document-services -s services.nix --docs docs.nix \
  --group Homework

There are a variety of settings that can be provided in a documentation configuration file:

{
  groups = {
    Homework = "Homework subsystem";
    Literature = "Literature subsystem";
    ...
  };

  fields = [ "description" "type" ];

  descriptions = {
    type = "Type";
    description = "Description";
  };
}

The above configuration file specifies the following properties:

  • The descriptions for every group.
  • Which fields should be displayed in the overview table. It is possible to display any property of a service.
  • A description of every field in the services model.

Like the visualization tool, the documentation tool can also be used in batch mode to generate pages for all possible groups and sub groups.

Generating a documentation catalog


In addition to generating architecture diagrams and descriptions, it is also possible to combine both tools to automatically generate a complete documentation catalog for a service-oriented system, such as the web application example system:

$ dydisnix-generate-services-docs -s services.nix --docs docs.nix \
  -f svg --output-dir out

By opening the entry page in the output folder, you will get an overview of the top-level architecture, with a description of the groups.


By clicking on a group hyperlink, you can inspect the sub architecture of the corresponding group, such as the 'Homework' sub system:


The above page displays the sub architecture diagram of the 'Homework' subsystem and a description of all services belonging to that group.

Another particularly interesting aspect is the 'Portal' sub system:


The portal's purpose is to redirect users to functionality provided by the other sub systems. The above architecture diagram displays these sub systems in grouped form to illustrate that there is a dependency relationship, without revealing the internal details that would unnecessarily clutter the diagram.

Other features


The tools support more use cases than those described in this blog post -- for example, it is also possible to create arbitrary layers of sub groups by using the '/' character as a delimiter in a group identifier. I also used the company platform as an example case, which can be decomposed into four layers.
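For instance, a documentation configuration could declare nested groups by using '/'-delimited identifiers (the group names below are hypothetical and only serve to illustrate the convention):

```nix
{
  groups = {
    Homework = "Homework subsystem";
    "Homework/Grading" = "Grading facilities of the homework subsystem";
    Literature = "Literature subsystem";
  };
}
```

With such a configuration, each nested group gets its own sub architecture diagram and documentation page, one layer deeper than its parent.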

Availability


The tools described in this blog post are part of the latest development version of Dynamic Disnix -- a very experimental extension framework built on top of Disnix that can be used to make service-oriented systems self-adaptive by redeploying their services in case of events.

The reason why I have added these tools to Dynamic Disnix (and not the core Disnix toolset) is because the extension toolset has an infrastructure to parse and reflect over individual Disnix models.

Although I promised to make an official release of Dynamic Disnix a very long time ago, this still has not happened. However, the documentation feature is a compelling reason to stabilize the code and make the framework more usable.

Thursday, January 31, 2019

A minimalistic discovery and architecture documentation process

In a blog post written a couple of years ago, I described how to set up a basic configuration management process in a small organization, based on the process framework described in the IEEE 828-2012 configuration management standard. The most important prerequisite for setting up such a process is identifying all configuration items (CIs) and storing them in a well-organized repository.

There are many ways to organize configuration items ranging from simple to very sophisticated solutions. I used a very small set of free and open source tools, and a couple of simple conventions to set up a CI repository:

  • A Git repository with a hierarchical directory structure referring to configuration items. Each path component in the directory structure serves a specific purpose to group configuration items. The overall strategy was to use a directory structure with a maximum of three levels: environment/machine/application. Using Git makes it possible to version configuration items and share the repository with team members.
  • Using Markdown to write down the purposes of the configuration items and descriptions of how they can be reproduced. Markdown works well for two reasons: it can be nicely formatted in a browser, but also read from a terminal when logged in to remote servers via SSH.
  • Using Dia for drawing diagrams of systems consisting of more complex application components. Dia is not the most elegant program around, but it works well enough, it is free and open source, and it is supported on Linux, Windows and macOS.
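The three-level convention can be sketched with a few shell commands (all environment, machine and application names below are fabricated for the purpose of the example):

```shell
# Fabricated example of the three-level CI repository layout:
# environment/machine/application
mkdir -p production/webserver1/nginx
mkdir -p production/dbserver1/postgresql
mkdir -p test/webserver1/nginx

# Each configuration item gets a Markdown file documenting its purpose
# and how it can be reproduced
cat > production/webserver1/nginx/README.md << 'EOF'
# nginx on webserver1 (production)

Purpose: serves the public web site.
Reproduce: install the nginx package and restore nginx.conf
from this directory.
EOF

find . -type d | sort
```

Git then versions the entire tree, so every change to a configuration item remains traceable and shareable with team members.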

My main motivation to formalize configuration management (but only lightly), despite being in a small organization, is to prevent errors and minimize delays and disruptions while remaining flexible by not being bound to all kinds of complex management procedures.

I wrote this blog post while I was still employed at a small-sized startup company with only one development team. In the meantime, I have joined a much bigger organization (Mendix) that has many cross-disciplinary development teams that work concurrently on various aspects of our service and product portfolio.

About microservices


When I had just joined, the amount of information I had to absorb was quite overwhelming. I also learned that we heavily adopted the microservices architecture paradigm for our entire online service platform.

According to Martin Fowler's blog post on microservices, using microservices offers the following benefits:

  • Strong module boundaries. You can divide the functionality of a system into microservices and make separate teams responsible for the development of each service. This makes it possible to iterate faster and offer better quality, because teams can focus on a subset of features only.
  • Independent deployment. Microservices can be deployed independently making it possible to ship features when they are done, without having complex integration cycles.
  • Technology diversity. Microservices are language and technology agnostic. You can pick almost any programming language (e.g. Java, Python, Mendix, Go), data storage solution (e.g. PostgreSQL, MongoDB, InfluxDB) or operating system (e.g. Linux, FreeBSD, Windows) to implement a microservice, making it possible to pick the most suitable combination of technologies and use them to their full advantage.

However, decomposing a system into a collection of collaborating services also comes at a (sometimes substantial!) price:

  • There is typically much more operational complexity. Because there are many components and typically a large infrastructure to manage, activities such as deploying, upgrading, and monitoring the condition of a system are much more time consuming and complex. Furthermore, because of technology diversity, there are also many kinds of specialized deployment procedures that you need to carry out.
  • Data is eventually consistent. You have to live with the fact that (temporary) inconsistencies could end up in your data, and you must invest in implementing facilities that keep your data consistent.
  • Because of distribution, development is harder in general -- it is more difficult to diagnose errors (e.g. a failure in one service could trigger a chain reaction of errors without proper error traces), and it is harder to test a system because of the additional deployment complexity. The network links between services may be slow and subject to failure, causing all kinds of unpredictable problems. Furthermore, machines that host critical services may crash.

Studying the architecture


When applied properly -- e.g. functionality is well separated, there is strong cohesion and weak coupling between services, and investments are made in solutions to cope with the challenges listed above -- the benefits of microservices can be reaped, resulting in a scalable system that can be developed by multiple teams working on features concurrently.

However, making changes in such an environment, while maintaining or improving the quality properties of a system, requires discipline and a relatively good understanding of the environment -- in the beginning, I faced all kinds of practical problems when I wanted to make even a subtle change. Some areas of our platform were documented, while others were not. Some documentation was also outdated, slightly incomplete and sometimes inconsistent with the actual implementation.

Certain areas of our platform were also highly complex, resulting in very complex architectural views with many boxes and arrows. Furthermore, information was scattered around many different places.

As part of my on-boarding process, and as a means to cope with some of my practical problems, I have created a documentation repository of the platform that our team develops by extending the (minimalistic) principles for configuration management described in the earlier blog post.

I realized that simply identifying the service components of which the system consists is not enough to get an understanding of the system -- there are many items and complex details that need to be captured.

In addition to the identification of all configuration items, I also want:

  • Proper guidance. To understand a particular piece of functionality, I should not need to study every component in detail. Instead, I want to know the full context and only the details of the relevant components.
  • Completeness. I want all service components to be visible. I do not want any details to be covered up. For example, I have also seen quite a few diagrams that hide complex implementation details. I much rather want flaws to be visible so that they can be resolved at a later point in time.
  • Clear boundaries. Our platform is not self contained, but relies on services provided by other teams. I want to know what components are our responsibility and what is managed by external teams.
  • Clarity. I want to know what the purpose of a component is. Their names do not always reflect or explain what they do.
  • Consistency. No matter how nicely a diagram is drawn, it should match the actual implementation or it is of very little use.
  • References to the actual implementation. I also want to know where I can find the implementation of a component, such as its Git repository.

Documenting the architecture


To visualize the architecture of our platform and organize all relevant information, I followed this strategy:

  • I took the components (typically their source code repositories) as the basis for everything else -- every component translates to a box in the architecture diagram.
  • I analyzed the dependency relationships between the components and denoted them as arrows. When a box points to another box by means of an arrow, this means that the other box is a dependency that should be deployed first. When a dependency is absent, the service will (most likely) not work.
  • I also discovered that the platform diagram easily gets cluttered by the sheer amount of components -- I decided to combine components that have very strongly correlated functionality in feature groups (that have dashed borders). Every feature group in architecture diagrams refers to another sub architecture diagram that provides a more specialized view of the feature group.
  • To clearly illustrate the difference between components that are our responsibility and those that are maintained by other teams, I make all external dependencies visible in the top-level architecture diagram.

The notation I used for these diagrams is not something I have entirely invented from scratch -- it is inspired by graph theory, package management and service management concepts. Disnix, for example, can visualize deployment architectures by using a similar notation.

To find all relevant information to create the diagrams, I consulted various sources:

  • I studied existing documents and diagrams to get a basic understanding of the system and an idea of the details I should look at.
  • I talked to a variety of people from various teams.
  • I looked inside the configuration settings of all deployment solutions used, e.g. the Amazon AWS console, Docker, CloudFoundry, Kubernetes, Nix configuration files.
  • I peeked inside the source code repositories and looked for settings that reference other systems, such as configuration values that store URLs.
  • When in doubt, I considered the deployment configuration files and the source code the "ultimate source of truth", because no matter how nice a diagram looks, it is useless if the system is implemented differently.

Finally, just drawing diagrams does not completely suffice when the goal is to provide clarity. I also observed that I needed to document some leftover details.

Foremost, a diagram whose semantics are not explained typically leaves too many details open to the reader's interpretation, so you need to explain the notation.

Second, you need to provide additional details about the services. I typically enumerate the following properties in a table for every component:

  • The name of the component.
  • A one line description stating its purpose.
  • The type of project (e.g. a Python/Java/Go project, Docker container, AWS Lambda function, etc.). This is useful to determine the kind of deployment procedure for the component.
  • A reference to the source code repository, e.g. a Git repository. The README of the corresponding repository should provide more detailed information about the project.

Benefits


Although it is quite a bit of work to set up, having a well documented architecture provides us the following benefits:

  • More effective deployment. Because of the feature groups and the division of the architecture into multiple layers, general concepts and details are separated. This makes it easier for developers to focus and absorb the right detailed knowledge to change a service.
  • More consensus in the team about the structure of the system and general quality attributes, such as scalability and security.
  • Better on-boarding for new team members.

Discussion


Writing architecture documentation IMO is not rocket science, just discipline. Obviously, there are much more sophisticated tools available to organize and visualize architectures (even tools that can generate code and reverse engineer code), but this is IMO not a hard requirement to start documenting.

However, you cannot take all confusion away -- even if you have the best possible architecture documentation, people's thinking habits are shaped by the concepts they know and there will always be a slight mismatch (as documented in academic research: 'Why Is It So Hard to Define Software Architecture?' by Jason Baragry and Karl Reed).

Finally, architecture documentation is only a first good step to improve the quality of service-oriented systems. To make it a success, much more is needed, such as:

  • Automated (and reproducible) deployment processes.
  • More documentation (such as the APIs, end-user documentation, project documentation).
  • Automated unit, integration and acceptance testing.
  • Monitoring.
  • Measuring and improving code quality, test coverage, etc.
  • Using design patterns, architectural patterns, good programming abstractions.
  • And many more aspects.

But to do these things properly, having proper architecture documentation is an important prerequisite.

Related work


UPDATE: after publishing this blog post and giving an internal presentation at Mendix about this subject, I received a couple of questions about architectural approaches that share similarities with my approach, such as the C4 model.

The C4 model also uses a layered approach in which the top-level diagram displays the context of the system (the relation of the system with external users and systems), and deeper layers gradually reveal more details of the inner workings of the system while limiting the view to a subset of components to prevent details obfuscating the purpose.

I did not use this approach as an example reference, but my work is basically built on top of the same underlying principles that the C4 model builds on -- creating abstractions.

Creating abstractions in modeling was popularized as early as the 1970s by various computer scientists, such as Edward Yourdon and Tom DeMarco, who brought concepts from structured programming to other domains, such as modeling (as explained in the paper: 'The Choice of New Software Development Methodologies for Software Development Projects' by Edward Yourdon).

One of the mental aids in structured programming is abstraction, so that it "only matters what something does, disregarding how it works". I took data flow diagrams (DFDs) as an example technique (which also facilitates layers of abstraction, including a top-level context DFD), but I replaced the data flow notation by dependency modeling.

Furthermore, the C4 model also provides a number of diagram types for each abstraction layer with specific purposes, but in my approach the notation and purposes of each layer are left abstract.

Sunday, December 30, 2018

8th yearly blog reflection

Similar to previous years, I will reflect over last year's writings. Again the occasion is my blog's anniversary -- today, it has been exactly 8 years ago that I started this blog.

Disnix


In the first two months of this year, most of my work was focused on Disnix. I added a number of extra features to Disnix that are particularly useful to diagnose errors in an environment in which services are distributed over a collection of machines in a network.

I also revised the internals of Disnix in such a way that it has become possible to deploy systems with circular dependencies, by dropping the ordering requirement on inter-dependencies when this is desired.

Finally, I extended my custom web application framework's example applications repository (that I released last year) and made it a public Disnix example case, to provide a more realistic/meaningful public example in addition to my trivial/hypothetical examples.

The new features described in these three blog posts are part of Disnix 0.8, released in March this year.

Mendix


The biggest news of the year is probably the fact that I embarked on a new challenge. In April, I joined Mendix, a company that provides a low-code application development platform.

Since the moment I joined, there have been many things I learned and quite a few new people I got to know.

From my learning experiences, I wrote an introduction blog post to the Mendix platform, specifically aimed at people with an advanced programming background (as a sidenote: the Mendix platform targets various kinds of audiences, including users with a limited programming background).

At Mendix, the Nix project and its tools are still largely unknown technology. In the company, sharing learning experiences within the entire R&D team is generally encouraged.

Mendix already uses rivalling technologies to automate application deployments. As an exercise to better understand the technical architecture of running applications, I automated the deployment process of Mendix applications with the Nix package manager and NixOS, making it possible to deploy a Mendix application automatically by writing a simple NixOS configuration file and running a single command-line instruction.

I also presented the Nix automation process to the entire R&D department and wrote an article about it on the public Mendix blog. (I kept a transcription on my own blog for archiving purposes).

Nix


In addition to Disnix, and joining Mendix (where I gave an introduction to Nix), I did some general Nix-related work as well.

To improve my (somewhat unorthodox) way of working, I created a syntax highlighter for the Nix expression language for one of my favourite tools: the editor that comes with the Midnight Commander.

As a personal experiment and a proposal to tidy up some internals in the Nix packages repository, I developed my own build function abstractions that mimic Nix's stdenv.mkDerivation {} function abstraction, with the purpose of clearly identifying and separating its concerns.

Later, I extended this function abstraction approach to tidy up the deployment automation of pluggable SDKs, such as the Android SDK, to make it easier to automatically compose SDKs with plugins and the corresponding applications that they build.

Finally, I experimented with an approach that can automatically patch all ELF binaries in the Android SDK (and other binary only software projects), so that they will run from the Nix store without any problems.

Some of the function abstraction techniques described in the blog posts listed above as well as the auto patching strategy are used in the revised version of the Android build infrastructure that I merged into the master version of Nixpkgs last week. Aside from upgrading the Android SDK to the latest version, these improvements make maintaining the Android SDK and its plugins much easier.

In addition to writing Nix-related blog posts, I did much more Nix related stuff. I also gave a talk at NixCon 2018 (the third conference held about Nix and its related technologies) about Dysnomia's current state of affairs and I released a new major version of node2nix that adds Node.js 10.x support.

Overall top 10 of my blog posts


As with previous years, I will publish my overall top 10 of most popular blog posts:

  1. Managing private Nix packages outside the Nixpkgs tree. As I predicted in my blog reflection last year, this blog post is going to overtake the popularity of the blog post about GNU Guix. It seems that this blog post proves that we should provide more practical/hands-on information to people who just want to start using the Nix package manager.
  2. On Nix and GNU Guix. This has been my most popular blog post since 2012 and has now dropped to the second place. I think this is caused by the fact that GNU Guix is not as controversial as it used to be around the time it was first announced.
  3. An evaluation and comparison of Snappy Ubuntu. Remains my third most popular blog post but is gradually dropping in popularity. I have not heard much about the Snappy package manager developments in the last two years.
  4. Setting up a multi-user Nix installation on non-NixOS systems. Is still my fourth most popular blog post. I believe this blog post should drop in popularity soon, thanks to a number of improvements made to the Nix package manager. There is now a Nix installer for non-NixOS systems that supports multi-user and single-user installations on any Linux and macOS system.
  5. Yet another blog post about Object Oriented Programming and JavaScript. Still at the same place compared to last year. I noticed that still quite a few people use this blog post as a resource to learn more about prototypes in JavaScript, despite the fact that newer implementations of the ECMAScript standard have better functions (such as: Object.create) to manage objects with prototypes.
  6. An alternative explanation of the Nix package manager. Remains at exactly the same spot compared to last year. It probably remains popular due to the fact that this blog post is my preferred explanation recipe.
  7. On NixOps, Disnix, service deployment and infrastructure deployment. No change compared to last year. It still seems to clear up confusion to groups of people.
  8. Asynchronous programming with JavaScript. Maintains the same position compared to last year. I have no idea why it remains so popular, despite many improvements to the Node.js runtime and JavaScript language.
  9. Composing FHS-compatible chroot environments with Nix (or deploying Steam in NixOS). The popularity of this blog post has been gradually dropping in the last few years, but all of a sudden it increased again. As a result, it now rose to the 9th place.
  10. A more realistic public Disnix example. This is the only blog post I wrote in 2018. It seems that this public example case is helpful for people to understand the kind of systems and requirements you have to meet in order to use Disnix to its full potential.

Discussion


What you may have noticed is that there is a drop in the number of technical blog posts this year. This drop is caused by the fact that I am basically in a transition period -- I still have to familiarize myself with my new environment and it is probably going to take a bit of time before I am back in my usual rhythm.

But do not be worried. As usual, I have plenty of ideas and there will be more interesting stuff coming next year!

The final thing I would like to say is:

HAPPY NEW YEAR!!!!

Tuesday, October 30, 2018

Auto patching prebuilt binary software packages for deployment with the Nix package manager

As explained in many previous blog posts, most of the quality properties of the Nix package manager (such as reliable deployment) stem from the fact that all packages are stored in a so-called Nix store, in which every package resides in its own isolated folder with a hash prefix that is derived from all build inputs (such as: /nix/store/gf00m2nz8079di7ihc6fj75v5jbh8p8v-zlib-1.2.11).

This unorthodox naming convention makes it possible to safely store multiple versions and variants of the same package next to each other.

Although isolating packages in the Nix store provides all kinds of benefits, it also has a big drawback -- common components, such as shared libraries, can no longer be found in their "usual locations", such as /lib.

For packages that are built from source with the Nix package manager this is typically not a problem:

  • The Nix expression language computes the Nix store paths for the required packages. By simply referring to the variable that contains the build result, you can obtain the Nix store path of the package, without having to remember them yourself.
  • Nix statically binds shared libraries to ELF binaries by modifying the binary's RPATH field. As a result, binaries no longer rely on the presence of their library dependencies in global locations (such as /lib), but use the libraries stored in isolation in the Nix store.
  • The GNU linker (the ld command) has been wrapped to transparently append the paths of all the library packages to the RPATH field of the ELF binary, whenever a dynamic library is provided.

As a result, you can build most packages from source code by simply executing their standardized build procedures in a Nix builder environment, such as: ./configure --prefix=$out; make; make install.

When it is desired to deploy prebuilt binary packages with Nix, you will probably run into various kinds of challenges:

  • ELF executables require the presence of an ELF interpreter in /lib/ld-linux.so.2 (on x86) and /lib/ld-linux-x86-64.so.2 (on x86-64), which is impure and does not exist in NixOS.
  • ELF binaries produced by conventional means typically have no RPATH configured. As a result, they expect libraries to be present in global namespaces, such as /lib. Since these directories do not exist in NixOS an executable will typically fail to work.

To make prebuilt binaries work in NixOS, there are basically two solutions -- it is possible to compose so-called FHS user environments from a set of Nix packages in which shared components can be found in their "usual locations". The drawback is that it requires special privileges and additional work to compose such environments.

The preferred solution is to patch prebuilt ELF binaries with patchelf (e.g. appending the library dependencies to the RPATH of the executable) so that their dependencies are loaded from the Nix store. I wrote a guide that demonstrates how to do this for a number of relatively simple packages.

Although it is possible to patch prebuilt ELF binaries to make them work from the Nix store, such a process is typically tedious and time consuming -- you must dissect a package, search for all relevant ELF binaries, figure out which libraries a binary requires, find the corresponding packages that provide them, and then update the deployment instructions to patch the ELF binaries.

For small projects, a manual binary patching process is still somewhat manageable, but for a complex project such as the Android SDK, that provides a large collection of plugins containing a mix of many 32-bit and 64-bit executables, manual patching is quite laborious, in particular when it is desired to keep all plugins up to date -- plugin packages are updated quite frequently, forcing the packager to re-examine all binaries over and over again.

To make the Android SDK patching process easier, I wrote a small tool that can mostly automate it. The tool can also be used for other kinds of binary packages.

Automatic searching for library locations


In order to make ELF binaries work, they must be patched in such a way that they use an ELF interpreter from the Nix store and their RPATH fields should contain all paths to the libraries that they require.

We can gather a list of required libraries for an executable, by running:

$ patchelf --print-needed ./zipmix
libm.so.6
libc.so.6

Instead of manually patching the executable with this provided information, we can also create a function that searches for the corresponding libraries in a list of search paths. The tool could take the first path that provides the required libraries.

For example, by setting the following colon-separated search path environment variable:

$ export libs=/nix/store/7y10kn6791h88vmykdrddb178pjid5bv-glibc-2.27/lib:/nix/store/xh42vn6irgl1cwhyzyq1a0jyd9aiwqnf-zlib-1.2.11/lib

The tool can automatically discover that the path: /nix/store/7y10kn6791h88vmykdrddb178pjid5bv-glibc-2.27/lib provides both libm.so.6 and libc.so.6.

We can also run into situations in which we cannot find any valid path to a required library -- in such cases, we can throw an error and notify the user.
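This search strategy can be sketched as a small shell function. The sketch below is an illustrative approximation, not the actual implementation of the tool, and the fakelibs directories are fabricated stand-ins for real Nix store paths:

```shell
# Return the first directory in the colon-separated $libs variable
# that contains the requested library, or fail with an error.
find_library() {
    local needed="$1"
    local IFS=':'
    for dir in $libs
    do
        if [ -e "$dir/$needed" ]
        then
            echo "$dir"
            return 0
        fi
    done
    echo "cannot find a path that provides: $needed" >&2
    return 1
}

# Fabricated library directories standing in for Nix store paths
mkdir -p fakelibs/glibc/lib fakelibs/zlib/lib
touch fakelibs/glibc/lib/libm.so.6 fakelibs/glibc/lib/libc.so.6
touch fakelibs/zlib/lib/libz.so.1

export libs="$PWD/fakelibs/glibc/lib:$PWD/fakelibs/zlib/lib"

find_library libm.so.6   # prints the fake glibc lib directory
find_library libz.so.1   # prints the fake zlib lib directory
```

Taking the first matching path keeps the lookup deterministic: the order of the paths in the libs variable decides which package provides a library when multiple candidates exist.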

It is also possible to extend the searching approach to the ELF interpreter. The following command provides the path to the required ELF interpreter:

$ patchelf --print-interpreter ./zipmix
/lib64/ld-linux-x86-64.so.2

We can search in the list of library packages for the ELF interpreter as well so that we no longer have to explicitly specify it.

Dealing with multiple architectures


Another problem with the Android SDK is that plugin packages may provide both x86 and x86-64 binaries. You cannot link libraries compiled for x86 against an x86-64 executable and vice versa. This restriction could introduce a new kind of risk in the automatic patching process.

Fortunately, it is also possible to figure out for what kind of architecture a binary was compiled:

$ readelf -h ./zipmix
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64

The above command-line instruction shows that we have a 64-bit binary (Class: ELF64) compiled for the x86-64 architecture (Machine: Advanced Micro Devices X86-64).

I have also added a check that ensures that the tool will only add a library path to the RPATH if the architecture of the library is compatible with the binary. As a result, it is not possible to accidentally link a library with an incompatible architecture to a binary.
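Such a check does not necessarily require readelf: the ELF class can already be derived from the fifth byte of the file header (the EI_CLASS field). The following sketch demonstrates the idea with two fabricated, non-runnable files that only contain a partial ELF header:

```shell
# EI_CLASS is the fifth byte of an ELF file: 1 means 32-bit, 2 means 64-bit
elf_class() {
    # Skip 4 bytes, read 1 byte, print it as an unsigned decimal integer
    od -An -j4 -N1 -tu1 "$1" | tr -d ' '
}

# Fabricate two files that only contain the first bytes of an ELF header
printf '\177ELF\001' > elf32.bin   # EI_CLASS = 1: 32-bit
printf '\177ELF\002' > elf64.bin   # EI_CLASS = 2: 64-bit

elf_class elf32.bin   # prints 1
elf_class elf64.bin   # prints 2
```

A full compatibility check would also compare the e_machine field of the header, so that, for example, x86 and x86-64 binaries can be told apart.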

Patching collections of binaries


Another inconvenience is the fact that Android SDK plugins typically provide more than one binary that needs to be patched. We can also recursively search an entire directory for ELF binaries:

$ autopatchelf ./bin

The above command-line instruction recursively searches for binaries in the bin/ sub directory and automatically patches them.

Sometimes recursively patching executables in a directory hierarchy could have undesired side effects. For example, the Android SDK also provides emulators having their own set of ELF binaries that need to run in the emulator. Patching these binaries typically breaks the software running in the emulator. We can also disable recursion if this is desired:

$ autopatchelf --no-recurse ./bin

or revert to patching individual executables:

$ autopatchelf ./zipmix

The result


Automating most aspects of the binary patching process results in a substantial reduction in code size for the Nix expressions that deploy prebuilt packages.

In my previous blog post, I have shown two example cases for which I manually derived the patchelf instructions that I needed to run. By using the autopatchelf tool, I can significantly decrease the size of the corresponding Nix expressions.

For example, the following expression deploys kzipmix:

{stdenv, fetchurl, autopatchelf, glibc}:

stdenv.mkDerivation {
  name = "kzipmix-20150319";
  src = fetchurl {
    url = http://static.jonof.id.au/dl/kenutils/kzipmix-20150319-linux.tar.gz;
    sha256 = "0fv3zxhmwc3p34larp2d6rwmf4cxxwi71nif4qm96firawzzsf94";
  };
  buildInputs = [ autopatchelf ];
  libs = stdenv.lib.makeLibraryPath [ glibc ];
  installPhase = ''
    ${if stdenv.system == "i686-linux" then "cd i686"
    else if stdenv.system == "x86_64-linux" then "cd x86_64"
    else throw "Unsupported system architecture: ${stdenv.system}"}
    mkdir -p $out/bin
    cp zipmix kzip $out/bin
    autopatchelf $out/bin
  '';
}

In the expression shown above, it suffices to move the executables to $out/bin and run autopatchelf.

I have also shown a more complicated example demonstrating how to patch the Quake 4 demo. I can significantly reduce the amount of code by substituting all the patchelf instructions with a single autopatchelf invocation:

{stdenv, fetchurl, autopatchelf, glibc, SDL, xlibs}:

stdenv.mkDerivation {
  name = "quake4-demo-1.0";
  src = fetchurl {
    url = ftp://ftp.idsoftware.com/idstuff/quake4/demo/quake4-linux-1.0-demo.x86.run;
    sha256 = "0wxw2iw84x92qxjbl2kp5rn52p6k8kr67p4qrimlkl9dna69xrk9";
  };
  buildInputs = [ autopatchelf ];
  libs = stdenv.lib.makeLibraryPath [ glibc SDL xlibs.libX11 xlibs.libXext ];

  buildCommand = ''
    # Extract files from the installer
    cp $src quake4-linux-1.0-demo.x86.run
    bash ./quake4-linux-1.0-demo.x86.run --noexec --keep
    # Move extracted files into the Nix store
    mkdir -p $out/libexec
    mv quake4-linux-1.0-demo $out/libexec
    cd $out/libexec/quake4-linux-1.0-demo
    # Remove obsolete setup files
    rm -rf setup.data
    # Patch ELF binaries
    autopatchelf .
    # Remove libgcc_s.so.1 that conflicts with Mesa3D's libGL.so
    rm ./bin/Linux/x86/libgcc_s.so.1
    # Create wrappers for the executables and ensure that they are executable
    for i in q4ded quake4
    do
        mkdir -p $out/bin
        cat > $out/bin/$i <<EOF
    #! ${stdenv.shell} -e
    cd $out/libexec/quake4-linux-1.0-demo
    ./bin/Linux/x86/$i.x86 "\$@"
    EOF
        chmod +x $out/libexec/quake4-linux-1.0-demo/bin/Linux/x86/$i.x86
        chmod +x $out/bin/$i
    done
  '';
}

For the Android SDK, there is even a more substantial win in code size reductions. The following Nix expression is used to patch the Android build-tools plugin package:

{deployAndroidPackage, lib, package, os, autopatchelf, makeWrapper, pkgs, pkgs_i686}:

deployAndroidPackage {
  inherit package os;
  buildInputs = [ autopatchelf makeWrapper ];

  libs_x86_64 = lib.optionalString (os == "linux")
    (lib.makeLibraryPath [ pkgs.glibc pkgs.zlib pkgs.ncurses5 ]);
  libs_i386 = lib.optionalString (os == "linux")
    (lib.makeLibraryPath [ pkgs_i686.glibc pkgs_i686.zlib pkgs_i686.ncurses5 ]);

  patchInstructions = ''
    ${lib.optionalString (os == "linux") ''
      export libs_i386=$packageBaseDir/lib:$libs_i386
      export libs_x86_64=$packageBaseDir/lib64:$libs_x86_64
      autopatchelf $packageBaseDir/lib64 libs --no-recurse
      autopatchelf $packageBaseDir libs --no-recurse
    ''}

    wrapProgram $PWD/mainDexClasses \
      --prefix PATH : ${pkgs.jdk8}/bin
  '';
  noAuditTmpdir = true;
}

The above expression specifies the library search paths per architecture for x86 (i386) and x86_64 and automatically patches the binaries in the lib64/ subfolder and the base directory. The autopatchelf tool ensures that no library of an incompatible architecture gets linked to a binary.

Discussion


The automated patching approach described in this blog post is not entirely a new idea -- in Nixpkgs, Aszlig Neusepoff created an autopatchelf hook that is integrated into the fixup phase of the stdenv.mkDerivation {} function. It shares a number of similar features -- it accepts a list of library packages (the runtimeDependencies environment variable) and automatically adds the provided runtime dependencies to the RPATH of all binaries in all the output folders.

There are also a number of differences -- my approach provides an autopatchelf command-line tool that can be invoked in any stage of a build process and provides full control over the patching process. It can also be used outside a Nix builder environment, which is useful for experimentation purposes. This increased level of flexibility is required for more complex prebuilt binary packages, such as the Android SDK and its plugins -- for some plugins, you cannot generalize the patching process and you typically require more control.

It also offers better support for coping with repositories providing binaries of multiple architectures -- while the Nixpkgs version has a check that prevents incompatible libraries from being linked, it does not give you fine-grained control over the library paths to consider for each architecture.

Another difference between my implementation and the autopatchelf hook is that it works with colon-separated library paths instead of whitespace-delimited Nix store paths. The autopatchelf hook makes the assumption that a dependency (by convention) stores all shared libraries in the lib/ subfolder.

My implementation works with arbitrary library paths and arbitrary environment variables that you can specify as a parameter. To patch certain kinds of Android plugins, you must be able to refer to libraries that reside in unconventional locations in the same package. You can even use the LD_LIBRARY_PATH environment variable (that is typically used to dynamically load libraries from a set of locations) in conjunction with autopatchelf to make dynamic library references static.
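Resolving a required library against such a colon-separated search path works the same way the dynamic linker consults LD_LIBRARY_PATH: scan each directory in order and take the first match. A minimal sketch of that lookup (an illustration, not autopatchelf's actual code):

```python
import os

def find_library(soname, search_paths):
    # Resolve a needed library name against a colon-separated list of
    # directories, as supplied via environment variables such as libs or
    # LD_LIBRARY_PATH; the first match wins, like the dynamic linker
    for directory in search_paths.split(":"):
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None  # unresolved: the reference stays dynamic
```

A tool that patches RPATHs would add the directory of each resolved candidate to the binary's RPATH, turning the dynamic reference into a static one.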

There is also a use case that the autopatchelf command-line tool does not support -- the autopatchelf hook can also be used for source compiled projects whose executables may need to dynamically load dependencies via the dlopen() function call.

Dynamically loaded libraries are not known at link time (because they are not provided to the Nix-wrapped ld command), and as a result, they are not added to the RPATH of an executable. The Nixpkgs autopatchelf hook allows you to easily supplement the library paths of these dynamically loaded libraries after the build process completes.

Availability


The autopatchelf command-line tool can be found in the nix-patchtools repository. The goal of this repository is to provide a collection of tools that help make the patching process of complex prebuilt packages more convenient. In the future, I may identify more patterns and provide additional tooling to automate them.

autopatchelf is prominently used in my refactored version of the Android SDK to automatically patch all ELF binaries. I intend to integrate this new Android SDK implementation into Nixpkgs soon.

Follow up


UPDATE: In the meantime, I have been working with Aszlig, the author of the autopatchelf hook, to get the functionality I need for auto patching the Android SDK integrated in Nixpkgs.

The result is that the Nixpkgs version now implements a number of similar features and is also capable of patching the Android SDK. The build-tools expression shown earlier is now implemented as follows:

{deployAndroidPackage, lib, package, os, autoPatchelfHook, makeWrapper, pkgs, pkgs_i686}:

deployAndroidPackage {
  inherit package os;
  buildInputs = [ autoPatchelfHook makeWrapper ]
    ++ lib.optionals (os == "linux") [ pkgs.glibc pkgs.zlib pkgs.ncurses5 pkgs_i686.glibc pkgs_i686.zlib pkgs_i686.ncurses5 ];

  patchInstructions = ''
    ${lib.optionalString (os == "linux") ''
      addAutoPatchelfSearchPath $packageBaseDir/lib
      addAutoPatchelfSearchPath $packageBaseDir/lib64
      autoPatchelf --no-recurse $packageBaseDir/lib64
      autoPatchelf --no-recurse $packageBaseDir
    ''}

    wrapProgram $PWD/mainDexClasses \
      --prefix PATH : ${pkgs.jdk8}/bin
  '';
  noAuditTmpdir = true;
}

In the above expression we do the following:

  • By adding the autoPatchelfHook package as a buildInput, we can invoke the autoPatchelf function in the builder environment and use it in any phase of the build process. To prevent the generic fixup hook from doing any work (its generalized patching process makes the wrong assumptions for the Android SDK), the deployAndroidPackage function propagates the dontAutoPatchelf = true; parameter to the generic builder so that this fixup step is skipped.
  • The autopatchelf hook uses the packages that are specified as buildInputs to find the libraries it needs, whereas my implementation uses libs, libs_i386 or libs_x86_64 (or any other environment variable that is specified as a command-line parameter). It is robust enough to skip incompatible libraries, e.g. x86 libraries for x86-64 executables.
  • My implementation works with colon-separated library paths, whereas the autoPatchelf hook works with Nix store paths, assuming that each dependency has a lib/ subfolder containing the libraries it needs. As a result, I no longer use the lib.makeLibraryPath function.
  • In some cases, we also want the autopatchelf hook to inspect non-standardized directories, such as uncommon directories in the same package. To make this work, we can add additional paths to the search cache by invoking the addAutoPatchelfSearchPath function.

Saturday, September 22, 2018

Creating Nix build function abstractions for pluggable SDKs

Two months ago, I decomposed the stdenv.mkDerivation {} function abstraction in the Nix packages collection, which is basically the de facto way in the Nix expression language to build software packages from source.

I identified some of its major concerns and developed my own implementation that is composed of layers in which each layer gradually adds a responsibility until it has most of the features that the upstream version also has.

In addition to providing a better separation of concerns, I also identified a pattern that I repeatedly use to create these abstraction layers:

{stdenv, foo, bar}:
{name, buildInputs ? [], ...}@args:

let
  extraArgs = removeAttrs args [ "name" "buildInputs" ];
in
stdenv.someBuildFunction ({
  name = "mypackage-"+name;
  buildInputs = [ foo bar ] ++ buildInputs;
} // extraArgs)

Build function abstractions that follow this pattern (as outlined in the code fragment shown above) have the following properties:

  • The outer function header (first line) specifies all common build-time dependencies required to build a project. For example, if we want to build a function abstraction for Python projects, then python is such a common build-time dependency.
  • The inner function header specifies all relevant build parameters and accepts an arbitrary number of arguments. Some arguments have a specific purpose for the kind of software project that we want to build (e.g. name and buildInputs) while other arguments can be passed verbatim to the build function abstraction that we use as a basis.
  • In the body, we invoke a function abstraction (quite frequently stdenv.mkDerivation {}) that builds the project. We use the build parameters that have a specific meaning to configure specialized build properties and we pass all remaining non-conflicting build parameters verbatim to the build function that we use as a basis.

    A subset of these arguments has no specific meaning and is simply exposed as environment variables in the builder environment.

    Because some parameters are already being used for a specific purpose and others may be incompatible with the build function that we invoke in the body, we compose a variable named extraArgs in which the conflicting arguments have been removed.
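The mechanics of the pattern above can be mimicked in Python to make them concrete (a hypothetical sketch; make_derivation and my_build_function are stand-ins, not real Nixpkgs functions):

```python
def make_derivation(**attrs):
    # Stand-in for stdenv.someBuildFunction: simply echoes its attributes
    return attrs

def my_build_function(name, buildInputs=(), **extraArgs):
    # Consume the parameters that have a specific meaning (name, buildInputs)
    # and forward everything else verbatim, mirroring what removeAttrs and
    # the // update operator accomplish in the Nix fragment above
    return make_derivation(
        name="mypackage-" + name,
        buildInputs=["foo", "bar"] + list(buildInputs),
        **extraArgs)
```

Just as in the Nix version, a caller can pass arbitrary extra parameters (e.g. CFLAGS) that are propagated untouched to the underlying build function.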

Aside from having a function that is tailored towards the needs of building a specific software project (such as a Python project), using this pattern provides the following additional benefits:

  • A build procedure is extendable/tweakable -- we can adjust the build procedure by adding or changing the build phases, and tweak them by providing build hooks (that execute arbitrary command-line instructions before or after the execution of a phase). This is particularly useful to build additional abstractions around it for more specialized deployment procedures.
  • Because an arbitrary number of arguments can be propagated (that can be exposed as environment variables in the build environment), we have more configuration flexibility.

The original objective of using this pattern is to create an abstraction function for GNU Make/GNU Autotools projects. However, this pattern can also be useful to create custom abstractions for other kinds of software projects, such as Python, Perl, Node.js etc. projects, that also have (mostly) standardized build procedures.

After completing the blog post about layered build function abstractions, I have been improving the Nix packages/projects that I maintain. In the process, I also identified a new kind of packaging scenario that is not yet covered by the pattern shown above.

Deploying SDKs


In the Nix packages collection, most build-time dependencies are fully functional software packages. Notable exceptions are so-called SDKs, such as the Android SDK -- the Android SDK "package" is only a minimal set of utilities (such as a plugin manager, AVD manager and monitor).

In order to build Android projects from source code and manage Android app installations, you need to install a variety of plugins, such as build-tools, platform-tools, platform SDKs and emulators.

Installing all plugins is typically a much too costly operation -- it requires you to download many gigabytes of data. In most cases, you only want to install a very small subset of them.

I have developed a function abstraction that makes it possible to deploy the Android SDK with a desired set of plugins, such as:

with import <nixpkgs> {};

let
  androidComposition = androidenv.composeAndroidPackages {
    toolsVersion = "25.2.5";
    platformToolsVersion = "27.0.1";
    buildToolsVersions = [ "27.0.3" ];
    includeEmulator = true;
    emulatorVersion = "27.2.0";
  };
in
androidComposition.androidsdk

When building the above expression (default.nix) with the following command-line instruction:

$ nix-build
/nix/store/zvailnl4f1261cn87s9n29lhj9i7y7iy-androidsdk

We get an Android SDK installation with tools plugin version 25.2.5, platform-tools version 27.0.1, one instance of the build-tools (version 27.0.3) and an emulator of version 27.2.0. The Nix package manager will download the required plugins automatically.

Writing build function abstractions for SDKs


If you want to create function abstractions for software projects that depend on an SDK, you not only have to execute a build procedure, but you must also compose the SDK in such a way that all plugins are installed that a project requires. If any of the mandatory plugins are missing, the build will most likely fail.

As a result, the function interface must also provide parameters that allow you to configure the plugins in addition to the build parameters.

A very straightforward approach is to write a function whose interface contains both the plugin and build parameters, and propagates each of the required parameters to the SDK composition function. However, manually writing this mapping has a number of drawbacks -- it duplicates the functionality of the SDK composition function, it is tedious to write, and it is very difficult to keep consistent when the SDK's functionality changes.

As a solution, I have extended the previously shown pattern with support for SDK deployments:

{composeMySDK, stdenv}:
{foo, bar, ...}@args:

let
  mySDKFormalArgs = builtins.functionArgs composeMySDK;
  mySDKArgs = builtins.intersectAttrs mySDKFormalArgs args;
  mySDK = composeMySDK mySDKArgs;
  extraArgs = removeAttrs args ([ "foo" "bar" ]
    ++ builtins.attrNames mySDKFormalArgs);
in
stdenv.mkDerivation ({
  buildInputs = [ mySDK ];
  buildPhase = ''
    ${mySDK}/bin/build
  '';
} // extraArgs)

In the above code fragment, we have added the following steps:

  • First, we dynamically extract the formal arguments of the function that composes the SDK (mySDKFormalArgs).
  • Then, we compute the intersection of the formal arguments of the composition function and the actual arguments from the build function arguments set (args). The resulting attribute set (mySDKArgs) contains the actual arguments we need to propagate to the SDK composition function.
  • The next step is to deploy the SDK with all its plugins by propagating the SDK arguments set as function parameters to the SDK composition function (mySDK).
  • Finally, we remove the arguments that we have passed to the SDK composition function from the extra arguments set (extraArgs), because these parameters have no specific meaning for the build procedure.
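The steps above can also be expressed in Python, using introspection in place of builtins.functionArgs (a hypothetical sketch; compose_sdk and build_app are illustrative stand-ins, not real APIs):

```python
import inspect

def compose_sdk(toolsVersion="25.2.5", platformToolsVersion="27.0.1"):
    # Hypothetical stand-in for an SDK composition function such as
    # androidenv.composeAndroidPackages
    return {"tools": toolsVersion, "platform-tools": platformToolsVersion}

def build_app(**args):
    # Extract the composition function's formal parameters (the analogue
    # of builtins.functionArgs), intersect them with the caller's actual
    # arguments (builtins.intersectAttrs), pass the intersection to the
    # SDK, and keep the remainder as build parameters (removeAttrs)
    formal = set(inspect.signature(compose_sdk).parameters)
    sdk_args = {k: v for k, v in args.items() if k in formal}
    build_args = {k: v for k, v in args.items() if k not in formal}
    return {"sdk": compose_sdk(**sdk_args), "build": build_args}
```

If compose_sdk gains a new parameter, build_app picks it up automatically, which is exactly the property the Nix pattern provides.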

With this pattern, the build abstraction function evolves automatically with the SDK composition function without requiring me to make any additional changes.

To build an Android project from source code, I can write an expression such as:

{androidenv}:

androidenv.buildApp {
  # Build parameters
  name = "MyFirstApp";
  src = ../../src/myfirstapp;
  antFlags = "-Dtarget=android-16";

  # SDK composition parameters
  platformVersions = [ 16 ];
  toolsVersion = "25.2.5";
  platformToolsVersion = "27.0.1";
  buildToolsVersions = [ "27.0.3" ];
}

The expression shown above has the following properties:

  • The above function invocation propagates three build parameters: name referring to the name of the Nix package, src referring to a filesystem location that contains the source code of an Android project, and antFlags that contains command-line arguments that are passed to the Apache Ant build tool.
  • It propagates four SDK composition parameters: platformVersions referring to the platform SDKs that must be installed, toolsVersion to the version of the tools package, platformToolsVersion to the platform-tools package and buildToolsVersions to the build-tools packages.

By evaluating the above function invocation, the Android SDK with the plugins will be composed, and the corresponding SDK will be passed as a build input to the builder environment.

In the build environment, Apache Ant gets invoked, which builds the project from source code. The androidenv.buildApp implementation will dynamically propagate the SDK composition parameters to the androidenv.composeAndroidPackages function.

Availability


The extended build function abstraction pattern described in this blog post is among the structural improvements I have been implementing in the mobile app building infrastructure in Nixpkgs. Currently, it is used in standalone test versions of the Nix android build environment, iOS build environment and Titanium build environment.

The build function abstraction for the Titanium SDK (a JavaScript-based cross-platform development framework that can produce Android, iOS, and several other kinds of applications from the same codebase) automatically composes both Xcode wrappers and Android SDKs to make the builds work.

The test repositories can be found on my GitHub page and the changes live in the nextgen branches. At some point, they will be reintegrated into the upstream Nixpkgs repository.

Besides mobile app development SDKs, this pattern is generic enough to be applied to other kinds of projects as well.