Monday, May 27, 2019

A Nix-friendly XML-based data exchange library for the Disnix toolset

In the last few months, I have been intensively working on a variety of internal improvements to the Disnix toolset.

One of the increasingly complex and tedious aspects of the Disnix toolset is data exchange. Disnix implements declarative deployment in the sense that it takes three specifications written in the Nix expression language as inputs: a services model that specifies the deployable units, their properties and how they depend on each other; an infrastructure model that specifies the available target machines and their properties; and a distribution model that specifies the mapping between services in the services model and target machines in the infrastructure model.

From these three declarative models, Disnix derives all the activities that need to be carried out to get a system in a running state: building services from source, distributing services (and their dependencies) to target machines, activating the services, and (optionally) restoring state snapshots.

Using the Nix expression language for these input models is useful for a variety of reasons:

  • We can use the Nix package manager's build infrastructure to reliably build services from source code, including all their required dependencies, and store them in isolation in the Nix store. The Nix store ensures that multiple variants and versions can co-exist and that we can always roll back to previous versions.
  • Because the Nix expression language is a purely functional domain-specific language (that in addition to data structures supports functions), we can make all required configuration parameters (such as the dependencies of the services that we intend to deploy) explicit by using functions so that we know that all mandatory settings have been specified.

Although the Nix expression language is a first-class citizen for tasks carried out by the Nix package manager, we also want to use the same specifications to instruct tools that carry out activities that Nix does not implement, such as the tools that activate the services and restore state snapshots.

The Nix expression language is not designed to be used by tools other than Nix (as a sidenote: despite this limitation, it is still somewhat possible to use the Nix expression language independently of the package manager in experimental setups, such as this online tutorial, but the libexpr component of Nix does not have a stable interface or a commitment to make the language portable across tools).

As a solution, I convert objects in the Nix expression language to XML, so that they can be consumed by any of the tools that implement the activities that Nix does not support.

Although this may sound conceptually straightforward, the amount of data that needs to be converted, and the code that needs to be written to parse that data, keep growing bigger and more complex, and are becoming increasingly harder to adjust and maintain.

To cope with this growing complexity, I have standardized a collection of Nix-XML conversion patterns and written a library named libnixxml that makes data interchange in both directions more convenient.

Converting objects in the Nix expression language to XML


The Nix expression language supports a variety of language integrations. For example, it can export Nix objects to XML and JSON, and import from JSON and TOML data.

The following Nix attribute set:
{
  message = "This is a test";
  tags = [ "test" "example" ];
}

can be converted to XML with the builtins.toXML primop, or by running:

$ nix-instantiate --eval-only --xml --strict example.nix

resulting in the following XML data:

<?xml version='1.0' encoding='utf-8'?>
<expr>
  <attrs>
    <attr column="3" line="2" name="message" path="/home/sander/example.nix">
      <string value="This is a test" />
    </attr>
    <attr column="3" line="3" name="tags" path="/home/sander/example.nix">
      <list>
        <string value="test" />
        <string value="example" />
      </list>
    </attr>
  </attrs>
</expr>

Although the above XML code fragment is valid XML, it is basically also just a literal translation of the underlying abstract syntax tree (AST) to XML.

An AST dump is not always very practical for consumption by an external application -- it is not very "readable", contains data that we do not always need (e.g. line and column data), and imposes (due to the structure) additional complexity on a program to parse the XML data to a domain model. As a result, exported XML data almost always needs to be converted to an XML format that is more practical for consumption.

For all the input models that Disnix consumes, I was originally handwriting XSL stylesheets converting the XML data to a format that can be more easily consumed and handwriting all the parsing code. Eventually, I derived a number of standard patterns.

For example, a more practical XML representation of the earlier shown Nix expression could be:

<?xml version="1.0"?>
<expr>
  <message>This is a test</message>
  <tags>
    <elem>test</elem>
    <elem>example</elem>
  </tags>
</expr>

In the above expression, the type and meta information is discarded. The attribute set is translated to a collection of XML sub-elements in which the element names correspond to the attribute keys. The list elements are translated to generic sub-elements (the above example uses elem, but any element name can be picked). IMO, the above notation is more readable, more concise and easier for an external program to parse.

Attribute keys may be identifiers, but they can also be strings containing characters that are not allowed in XML element names (e.g. < or >). It is also possible to use a slightly more verbose notation in which a generic element name is used and a name property stores each attribute set key:

<?xml version="1.0"?>
<expr>
  <attr name="message">This is a test</attr>
  <attr name="tags">
    <elem>test</elem>
    <elem>example</elem>
  </attr>
</expr>
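To illustrate the trade-off between the two notations, here is a minimal, self-contained C sketch. It does not use libnixxml; print_xml_escaped(), is_simple_element_name() and print_string_attr() are hypothetical helpers. It picks the simple notation when an attribute key is a plain ASCII name and falls back to the verbose notation otherwise, escaping both the key and the value. The name check is deliberately more conservative than the XML Name production, which also permits non-ASCII characters and ':'.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libnixxml): writes s to out, replacing
 * characters that are unsafe in XML text and attribute values. */
void print_xml_escaped(FILE *out, const char *s)
{
    for(; *s != '\0'; s++)
    {
        switch(*s)
        {
            case '&': fputs("&amp;", out); break;
            case '<': fputs("&lt;", out); break;
            case '>': fputs("&gt;", out); break;
            case '"': fputs("&quot;", out); break;
            default: fputc(*s, out); break;
        }
    }
}

/* Hypothetical helper: returns 1 if key can be used as an element name
 * under a conservative ASCII-only rule: it starts with a letter or '_'
 * and otherwise contains only letters, digits, '-', '_' and '.'. */
int is_simple_element_name(const char *key)
{
    static const char name_chars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-_.";

    if(key == NULL || !(isalpha((unsigned char)key[0]) || key[0] == '_'))
        return 0;

    return key[strspn(key, name_chars)] == '\0';
}

/* Prints one string-valued attribute: simple notation when the key
 * allows it, verbose <attr name="..."> notation otherwise. */
void print_string_attr(FILE *out, const char *key, const char *value)
{
    if(is_simple_element_name(key))
    {
        fprintf(out, "<%s>", key);
        print_xml_escaped(out, value);
        fprintf(out, "</%s>", key);
    }
    else
    {
        fputs("<attr name=\"", out);
        print_xml_escaped(out, key);
        fputs("\">", out);
        print_xml_escaped(out, value);
        fputs("</attr>", out);
    }
}
```

For example, print_string_attr(stdout, "a<b", "x & y") falls back to the verbose form and prints <attr name="a&lt;b">x &amp; y</attr>.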

When an application has a static domain model, it is not necessary to know any types (e.g. this conversion can be done in the application code using the application domain model). However, it may also be desired to construct data structures dynamically.

For dynamic object construction, type information needs to be known. Optionally, XML elements can be annotated with type information:

<?xml version="1.0"?>
<expr type="attrs">
  <attr name="message" type="string">This is a test</attr>
  <attr name="tags" type="list">
    <elem type="string">test</elem>
    <elem type="string">example</elem>
  </attr>
</expr>
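With type annotations, a consumer can decide which kind of structure to build before looking at an element's content. A minimal sketch, assuming the type names string, int, float, bool, list and attrs (the names used in the example above plus the value types mentioned in the printing section of this post); NixType and nix_type_from_name() are hypothetical and not part of libnixxml:

```c
#include <string.h>

/* Hypothetical enum covering the type names assumed above */
typedef enum
{
    NIX_TYPE_UNKNOWN,
    NIX_TYPE_STRING,
    NIX_TYPE_INT,
    NIX_TYPE_FLOAT,
    NIX_TYPE_BOOL,
    NIX_TYPE_LIST,
    NIX_TYPE_ATTRS
}
NixType;

/* Maps the value of a type property (e.g. type="attrs") to an enum that
 * a dynamic parser can switch on. Unknown or missing names map to
 * NIX_TYPE_UNKNOWN. */
NixType nix_type_from_name(const char *name)
{
    static const struct { const char *name; NixType type; } table[] = {
        { "string", NIX_TYPE_STRING },
        { "int", NIX_TYPE_INT },
        { "float", NIX_TYPE_FLOAT },
        { "bool", NIX_TYPE_BOOL },
        { "list", NIX_TYPE_LIST },
        { "attrs", NIX_TYPE_ATTRS }
    };
    size_t i;

    if(name == NULL)
        return NIX_TYPE_UNKNOWN;

    for(i = 0; i < sizeof(table) / sizeof(table[0]); i++)
    {
        if(strcmp(name, table[i].name) == 0)
            return table[i].type;
    }

    return NIX_TYPE_UNKNOWN;
}
```

A dynamic parser would read the type property of each element, call a function like this, and then construct a string, list or table accordingly.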

To automatically convert data to XML format following the above listed conventions, I have created a standardized XSL stylesheet and command-line tool that can automatically convert Nix expressions.

The following command generates the first XML code fragment:

$ nixexpr2xml --attr-style simple example.nix

We can use the verbose notation for attribute sets by running:

$ nixexpr2xml --attr-style verbose example.nix

Type annotations can be enabled by running:

$ nixexpr2xml --attr-style verbose --enable-types example.nix

The root, attribute and list element names, as well as the names of the attribute-name and type properties, are generic. They can also be adjusted, if desired:

$ nixexpr2xml --root-element-name root \
  --list-element-name item \
  --attr-element-name property \
  --name-attribute-name key \
  --type-attribute-name mytype \
  example.nix

Parsing a domain model


In addition to producing more "practical" XML data, I have also implemented utility functions that help me consume the XML data to construct a domain model in the C programming language that consists of values (strings, integers etc.), structs, list-like data structures (e.g. arrays, linked lists) and table-like data structures, such as hash tables.

For example, the following XML document that only contains a string:

<expr>hello</expr>

can be parsed to a string in C as follows:

#include <stdio.h>
#include <nixxml-parse.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
xmlChar *value = NixXML_parse_value(element, NULL);
printf("value is: %s\n", (const char*)value); // value is: hello

We can also use functions to parse (nested) data structures. For example, to parse the following XML code fragment representing an attribute set:

<expr>
  <attr name="firstName">Sander</attr>
  <attr name="lastName">van der Burg</attr>
</expr>

We can use the following code snippet:

#include <stdlib.h>
#include <nixxml-parse.h>

xmlNodePtr element;

typedef struct
{
    xmlChar *firstName;
    xmlChar *lastName;
}
ExampleStruct;

void *create_example_struct(xmlNodePtr element, void *userdata)
{
    return calloc(1, sizeof(ExampleStruct));
}

void parse_and_insert_example_struct_member(xmlNodePtr element, void *table, const xmlChar *key, void *userdata)
{
    ExampleStruct *example = (ExampleStruct*)table;

    if(xmlStrcmp(key, (xmlChar*) "firstName") == 0)
        example->firstName = NixXML_parse_value(element, userdata);
    else if(xmlStrcmp(key, (xmlChar*) "lastName") == 0)
        example->lastName = NixXML_parse_value(element, userdata);
}

/* Open XML file and obtain root element */

ExampleStruct *example = NixXML_parse_verbose_heterogeneous_attrset(element, "attr", "name", NULL, create_example_struct, parse_and_insert_example_struct_member);

To parse the attribute set in the XML code fragment above (that uses a verbose notation) and derive a struct from it, we invoke the NixXML_parse_verbose_heterogeneous_attrset() function. The parameters specify that the XML code fragment should be parsed as follows:

  • It expects the name of the XML element of each attribute to be called: attr.
  • The property that refers to the name of the attribute is called: name.
  • To create a struct that stores the attributes in the XML file, the function: create_example_struct() will be executed that allocates memory for it and initializes all fields with NULL values.
  • The logic that parses the attribute values and assigns them to the struct members is in the parse_and_insert_example_struct_member() function. The implementation uses NixXML_parse_value() (as shown in the previous example) to parse the attribute values.
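The callback-based design above can be illustrated without libxml2. The following sketch is hypothetical (Pair, build_from_pairs(), create_person() and insert_person_member() are not libnixxml API): it assumes the attribute elements have already been reduced to name/value pairs, creates the object once and lets a caller-supplied function decide which struct member each pair belongs to, mirroring the roles of create_example_struct() and parse_and_insert_example_struct_member().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an already-parsed <attr name="..."> element */
typedef struct
{
    const char *name;
    const char *value;
}
Pair;

typedef void *(*CreateObjectFunc)(void *userdata);
typedef void (*InsertMemberFunc)(void *object, const char *key, const char *value, void *userdata);

/* Creates the object once, then hands every pair to insert_member, which
 * decides which member (if any) the pair belongs to. */
void *build_from_pairs(const Pair *pairs, size_t count, void *userdata,
    CreateObjectFunc create_object, InsertMemberFunc insert_member)
{
    size_t i;
    void *object = create_object(userdata);

    if(object == NULL)
        return NULL;

    for(i = 0; i < count; i++)
        insert_member(object, pairs[i].name, pairs[i].value, userdata);

    return object;
}

/* Example domain model, analogous to ExampleStruct */
typedef struct
{
    char *firstName;
    char *lastName;
}
Person;

static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if(copy != NULL)
        strcpy(copy, s);

    return copy;
}

void *create_person(void *userdata)
{
    (void)userdata;
    return calloc(1, sizeof(Person)); /* zero-initialized, like create_example_struct() */
}

void insert_person_member(void *object, const char *key, const char *value, void *userdata)
{
    Person *person = object;
    (void)userdata;

    if(strcmp(key, "firstName") == 0)
        person->firstName = copy_string(value);
    else if(strcmp(key, "lastName") == 0)
        person->lastName = copy_string(value);
    /* Unknown keys are ignored */
}
```

The generic function knows nothing about Person; all domain knowledge lives in the two callbacks, which is what allows one parsing routine to serve many structs.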

In addition to parsing values as strings and attribute sets as structs, it is also possible to:

  • Parse lists, by invoking: NixXML_parse_list()
  • Parse uniformly typed attribute sets (in which every attribute set member has the same type), by invoking: NixXML_parse_verbose_attrset()
  • Parse attribute sets using the simple XML notation for attribute sets (as opposed to the verbose notation), by invoking: NixXML_parse_simple_attrset() and NixXML_parse_simple_heterogeneous_attrset()

Printing Nix or XML representation of a domain model


In addition to parsing NixXML data to construct a domain model, the inverse process is also possible -- the API also provides convenience functions to print an XML or Nix representation of a domain model.

For example, the following string in C:

char *greeting = "Hello";

can be displayed as a string in the Nix expression language as follows:

#include <nixxml-print-nix.h>

NixXML_print_string_nix(stdout, greeting, 0, NULL); // outputs: "Hello"

or as an XML document, by running:

#include <nixxml-print-xml.h>

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_string_xml(stdout, greeting, 0, NULL, NULL);
NixXML_print_close_root_tag(stdout, "expr");

producing the following output:

<expr>Hello</expr>
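When printing arbitrary strings as Nix expressions, some characters need escaping to produce a valid string literal: double quotes, backslashes, and ${, which would otherwise start an antiquotation. This post does not say how NixXML_print_string_nix() handles this, so the following is only a self-contained sketch with a hypothetical function name:

```c
#include <stdio.h>
#include <string.h>

/* Sketch (hypothetical, not libnixxml's implementation): prints s as a
 * double-quoted Nix string literal, escaping the characters that would
 * otherwise end the string, start an escape sequence or start an
 * antiquotation. */
void print_nix_string_literal(FILE *out, const char *s)
{
    fputc('"', out);

    for(; *s != '\0'; s++)
    {
        switch(*s)
        {
            case '"': fputs("\\\"", out); break;
            case '\\': fputs("\\\\", out); break;
            case '\n': fputs("\\n", out); break;
            case '\r': fputs("\\r", out); break;
            case '\t': fputs("\\t", out); break;
            case '$':
                /* Only "${" starts an antiquotation; a lone '$' is literal */
                fputs(strncmp(s, "${", 2) == 0 ? "\\$" : "$", out);
                break;
            default:
                fputc(*s, out);
                break;
        }
    }

    fputc('"', out);
}
```

For instance, the C string say "hi" ${x} is printed as "say \"hi\" \${x}", which Nix reads back as the original text instead of trying to interpolate x.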

The example struct shown in the previous section can be printed as a Nix expression with the following code:

#include <nixxml-print-nix.h>

void print_example_attributes_nix(FILE *file, const void *value, const int indent_level, void *userdata, NixXML_PrintValueFunc print_value)
{
    ExampleStruct *example = (ExampleStruct*)value;
    NixXML_print_attribute_nix(file, "firstName", example->firstName, indent_level, userdata, NixXML_print_string_nix);
    NixXML_print_attribute_nix(file, "lastName", example->lastName, indent_level, userdata, NixXML_print_string_nix);
}

NixXML_print_attrset_nix(stdout, example, 0, NULL, print_example_attributes_nix, NULL);

The above code fragment executes the function: NixXML_print_attrset_nix() to print the example struct as an attribute set. The attribute set printing function invokes the function: print_example_attributes_nix() to print the attribute set members.

The print_example_attributes_nix() function prints each attribute assignment. It uses the NixXML_print_string_nix() function (shown in the previous example) to print each member as a string in the Nix expression language.

The result of running the above code is the following Nix expression:

{
  "firstName" = "Sander";
  "lastName" = "van der Burg";
}

The same struct can be printed as XML (using the verbose notation for attribute sets) with the following code:

#include <nixxml-print-xml.h>

void print_example_attributes_xml(FILE *file, const void *value, const char *child_element_name, const char *name_property_name, const int indent_level, const char *type_property_name, void *userdata, NixXML_PrintXMLValueFunc print_value)
{
    ExampleStruct *example = (ExampleStruct*)value;
    NixXML_print_verbose_attribute_xml(file, child_element_name, name_property_name, "firstName", example->firstName, indent_level, NULL, userdata, NixXML_print_string_xml);
    NixXML_print_verbose_attribute_xml(file, child_element_name, name_property_name, "lastName", example->lastName, indent_level, NULL, userdata, NixXML_print_string_xml);
}

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_verbose_attrset_xml(stdout, example, "attr", "name", 0, NULL, NULL, print_example_attributes_xml, NULL);
NixXML_print_close_root_tag(stdout, "expr");

The above code fragment uses the same strategy as the previous example (by invoking NixXML_print_verbose_attrset_xml()) to print the example struct as XML, using the verbose notation for attribute sets.

The attribute set members are printed by the print_example_attributes_xml() function.

The result of running the above code is the following XML output:

<expr>
  <attr name="firstName">Sander</attr>
  <attr name="lastName">van der Burg</attr>
</expr>

In addition to printing values and attribute sets, it is also possible to:

  • Print lists in Nix and XML format: NixXML_print_list_nix(), NixXML_print_list_xml()
  • Print attribute sets in simple XML notation: NixXML_print_simple_attrset_xml()
  • Print strings as int, float or bool: NixXML_print_string_as_*_xml.
  • Print integers: NixXML_print_int_xml()
  • Disable indentation by setting the indent_level parameter to -1.
  • Print type annotated XML, by setting the type_property_name parameter to a string that is not NULL.
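The indent_level convention can be sketched as follows. This is a hypothetical illustration, not libnixxml's implementation: print_indentation(), print_separator() and print_string_list_nix() are made-up names, and strings are printed without escaping to keep the sketch short. With indentation disabled, a single space is emitted instead of a newline so that list elements remain separated.

```c
#include <stdio.h>
#include <string.h>

/* Emits two spaces per indentation level; nothing for -1 */
void print_indentation(FILE *out, int indent_level)
{
    int i;

    for(i = 0; i < indent_level; i++)
        fputs("  ", out);
}

/* Starts a new, indented line, or emits a single space when indentation
 * is disabled (indent_level == -1). */
void print_separator(FILE *out, int indent_level)
{
    if(indent_level < 0)
        fputc(' ', out);
    else
    {
        fputc('\n', out);
        print_indentation(out, indent_level);
    }
}

/* Prints a NULL-terminated array of strings as a Nix list. Nested
 * elements go one level deeper, unless indentation is disabled. */
void print_string_list_nix(FILE *out, const char *const *items, int indent_level)
{
    int child_level = indent_level < 0 ? -1 : indent_level + 1;

    fputc('[', out);

    for(; *items != NULL; items++)
    {
        print_separator(out, child_level);
        fprintf(out, "\"%s\"", *items);
    }

    print_separator(out, indent_level);
    fputc(']', out);
}
```

With the array { "test", "example", NULL }, an indent level of 0 yields a multi-line list, while -1 yields [ "test" "example" ] on a single line.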

Using abstract data structures


There is no standardized library for abstract data structures in C, e.g. lists, maps, trees etc. As a result, each framework provides its own implementations of them. To parse lists and attribute sets (that have arbitrary structures), you need generalized data structures that are list-like or table-like.

libnixxml provides two sub-libraries that demonstrate how integration with abstract data structures can be implemented: libnixxml-data, which uses pointer arrays for lists and xmlHashTables for attribute sets, and libnixxml-glib, which integrates with GLib by using GPtrArray structs for lists and GHashTables for attribute sets.

The following XML document:

<expr>
  <elem>test</elem>
  <elem>example</elem>
</expr>

can be parsed as a pointer array (array of strings) as follows:

#include <nixxml-ptrarray.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
void **array = NixXML_parse_ptr_array(element, "elem", NULL, NixXML_parse_value);

and printed as a Nix expression with:

NixXML_print_ptr_array_nix(stdout, array, 0, NULL, NixXML_print_string_nix);

and as XML with:

NixXML_print_open_root_tag(stdout, "expr");
NixXML_print_ptr_array_xml(stdout, array, "elem", 0, NULL, NULL, NixXML_print_string_xml);
NixXML_print_close_root_tag(stdout, "expr");

Similarly, there is a module for xmlHashTables that provides a function interface similar to that of the pointer array module.
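The pointer array printing functions shown above take no length argument, which suggests that the arrays are NULL-terminated. Assuming that representation (an assumption, not something this post states), building and releasing such an array could look like the following sketch; ptr_array_append(), ptr_array_length() and ptr_array_free() are hypothetical helpers, not libnixxml-data functions:

```c
#include <stdlib.h>

/* Hypothetical helper: appends value to a NULL-terminated pointer array,
 * growing it with realloc() (which also handles array == NULL). Returns
 * the new array, or NULL on allocation failure, in which case the old
 * array is left untouched. The value itself must not be NULL, because
 * NULL marks the end of the array. */
void **ptr_array_append(void **array, size_t *length, void *value)
{
    void **resized = realloc(array, (*length + 2) * sizeof(void *));

    if(resized == NULL)
        return NULL;

    resized[*length] = value;
    (*length)++;
    resized[*length] = NULL; /* keep the terminator */
    return resized;
}

/* Counts the elements before the NULL terminator */
size_t ptr_array_length(void **array)
{
    size_t n = 0;

    while(array != NULL && array[n] != NULL)
        n++;

    return n;
}

/* Releases every element with free_value, then the array itself */
void ptr_array_free(void **array, void (*free_value)(void *))
{
    size_t i;

    if(array == NULL)
        return;

    for(i = 0; array[i] != NULL; i++)
        free_value(array[i]);

    free(array);
}
```

Because a failed append returns NULL, a caller should store the result in a temporary variable before overwriting its own array pointer, to avoid losing the existing elements.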

Working with generic NixXML nodes


By using generic data structures to represent lists and tables, type annotated NixXML data and a generic NixXML_Node struct (that indicates what kind of node we have, such as a value, list or attribute set) we can also automatically parse an entire document by using a single function call:

#include <nixxml-ptrarray.h>
#include <nixxml-xmlhashtable.h>
#include <nixxml-parse-generic.h>

xmlNodePtr element;
/* Open XML file and obtain root element */
NixXML_Node *node = NixXML_generic_parse_expr(element,
    "type",
    "name",
    NixXML_create_ptr_array,
    NixXML_create_xml_hash_table,
    NixXML_add_value_to_ptr_array,
    NixXML_insert_into_xml_hash_table,
    NixXML_finalize_ptr_array);

The above function composes a generic NixXML_Node object. The function interface uses function pointers to compose lists and tables. These functions are provided by the pointer array and xmlHashTable modules in the libnixxml-data library.

We can also print an entire NixXML_Node object structure as a Nix expression:

#include <nixxml-print-generic-nix.h>

NixXML_print_generic_expr_nix(stdout,
    node,
    0,
    NixXML_print_ptr_array_nix,
    NixXML_print_xml_hash_table_nix);

as well as XML (using simple or verbose notation for attribute sets):

#include <nixxml-print-generic-xml.h>

NixXML_print_generic_expr_verbose_xml(stdout,
    node,
    0,
    "expr",
    "elem",
    "attr",
    "name",
    "type",
    NixXML_print_ptr_array_xml,
    NixXML_print_xml_hash_table_verbose_xml);
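A generic node structure is essentially a tagged union: a type tag plus the data that belongs to that kind of node. The following hypothetical sketch is not the actual NixXML_Node definition; Node, NodeType and count_values() are made-up names. It only shows how code that consumes such a tree dispatches on the tag, here by counting the value (leaf) nodes:

```c
#include <stdlib.h>

/* Hypothetical node kinds: values, lists and attribute sets */
typedef enum { NODE_VALUE, NODE_LIST, NODE_ATTRSET } NodeType;

typedef struct Node Node;

struct Node
{
    NodeType type;
    union
    {
        const char *value; /* NODE_VALUE */
        struct { Node **items; size_t count; } list; /* NODE_LIST */
        struct { const char **keys; Node **values; size_t count; } attrset; /* NODE_ATTRSET */
    } data;
};

/* Recursively counts the value nodes by dispatching on the type tag */
size_t count_values(const Node *node)
{
    size_t i, total = 0;

    switch(node->type)
    {
        case NODE_VALUE:
            return 1;
        case NODE_LIST:
            for(i = 0; i < node->data.list.count; i++)
                total += count_values(node->data.list.items[i]);
            return total;
        case NODE_ATTRSET:
            for(i = 0; i < node->data.attrset.count; i++)
                total += count_values(node->data.attrset.values[i]);
            return total;
    }

    return 0;
}
```

For the example expression from the start of this post, { message = "This is a test"; tags = [ "test" "example" ]; }, count_values() returns 3.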

Summary


The following table summarizes the concepts described in this blog post:

Concept | Nix expression representation | XML representation | C application domain model
value | "hello" | hello | char*
list | [ "hello" "bye" ] | <elem>hello</elem><elem>bye</elem> | void**, linked list, ...
attribute set (simple notation) | { a = "hello"; b = "bye"; } | <a>hello</a><b>bye</b> | xmlHashTablePtr, struct, ...
attribute set (verbose notation) | { a = "hello"; b = "bye"; } | <attr name="a">hello</attr><attr name="b">bye</attr> | xmlHashTablePtr, struct, ...

The above table shows the concepts that NixXML defines, and how they can be represented in the Nix expression language, in XML and in the domain model of a C application.

The representations of these concepts can be translated as follows:

  • To convert a raw AST XML representation of a Nix expression to NixXML, we can use the included XSL stylesheet or run the nixexpr2xml command.
  • XML concepts can be parsed to a domain model in a C application by invoking NixXML_parse_* functions for the appropriate concepts and XML representation.
  • Domain model elements can be printed as XML by invoking NixXML_print_*_xml functions.
  • Domain model elements can be printed in the Nix expression language by invoking NixXML_print_*_nix functions.

Benefits


I have re-engineered the current development versions of the Disnix and Dynamic Disnix toolsets to use libnixxml for data exchange. For Disnix, there is much less boilerplate code that I need to write for the parsing infrastructure, making it significantly easier to maintain.

In the Dynamic Disnix framework, libnixxml provides even more benefits beyond a simpler parsing infrastructure. The Dynamic Disnix toolset provides deployment planning methods, documentation and visualization tools. These concerns are orthogonal to the features of the core Disnix toolset -- there is first-class Nix/Disnix integration, but the features of Dynamic Disnix should work with any service-oriented system (having a model that works with services and dependencies) regardless of what technology is used to carry out the deployment process itself.

With libnixxml, it is now quite easy to make all these tools accept both Nix and XML representations of their input models, and output data in both Nix and XML. It is now also possible to use most features of Dynamic Disnix, such as the visualization features described in the previous blog post, independently of Nix and Disnix.

Moreover, the deployment planning methods should now also be able to more conveniently invoke external tools, such as SAT-solvers.

Related work


libnixxml is not the only Nix language integration facility I wrote. I also wrote NiJS (that is JavaScript-based) and PNDP (that is PHP-based). Aside from using a different language (C), libnixxml is not meant to replicate the functionality of these two libraries.

Basically, libnixxml has the inverse purpose -- NiJS and PNDP are useful for systems that already have a domain model (e.g. a domain-specific configuration management tool), and make it possible to generate the required Nix expression language code to conveniently integrate with Nix.

In libnixxml, the Nix expression representation is the basis, and libnixxml makes it more convenient for external programs to consume such a Nix expression. Moreover, libnixxml only facilitates data interchange; it does not support all Nix expression language features.

Conclusion


In this blog post, I have described libnixxml that makes XML-based data interchange with configurations in the Nix expression language and domain models in the C programming language more convenient. It is part of the current development version of Disnix and can be obtained as a separate project from my GitHub page.
