Wednesday, December 30, 2015

Fifth yearly blog reflection

Today, it's my blog's fifth anniversary. As usual, this is a nice opportunity to reflect on the past year's writings.

Disnix


Something that I cannot leave unmentioned is Disnix, a toolset that I have developed as part of my master's and PhD research. For quite some time, its development progressed at a very low pace, mainly because I had other obligations -- I had to finish my PhD thesis, and after I left academia, I was occupied with other kinds of work.

Fortunately, things have changed considerably. Since October last year I have been actively using Disnix to maintain the deployment of a production system that can be decomposed into independently deployable services. As a result, the development of Disnix also picked up pace considerably, which resulted in a large number of Disnix-related blog posts and some major improvements.

In the first blog post, I compared Disnix with another tool from the Nix project, NixOps, described their differences, and demonstrated that they can be combined to fully automate all deployment aspects of a service-oriented system. Shortly after publishing this blog post, I announced the next Disnix release: Disnix 0.3, four years after its previous release.

A few months later, I announced yet another Disnix release: Disnix 0.4 in which I have integrated the majority of state deployment facilities from the prototype described in the HotSWUp 2012 paper.

The remaining blog posts provide solutions for additional problems and describe some optimizations. I have formulated a port assignment problem that may manifest itself while deploying microservices and developed a tool that can provide a solution. I also modified Disnix to deploy target-specific services (in addition to target-agnostic services), which, in some scenarios, makes deployments more efficient.

Another optimization that I have developed is on-demand activation and self termination of services. This is particularly useful for poorly developed services that, for example, leak memory.

Finally, I attended NixCon2015, where I gave a talk about Disnix (including two live demos) and showed how it can be used to deploy (micro)services. An interesting aspect of the presentation is the first live demo, in which I deploy a simple example system into a network of heterogeneous machines (machines running multiple operating systems, having multiple CPU architectures, and reachable by multiple connection protocols).

The Nix project


In addition to Disnix, I have also written about some general Nix aspects. In February, I visited FOSDEM. In this year's edition, we had a NixOS stand to promote the project (including its sub-projects). From my own personal experience, I know that advertising Nix is quite challenging. For this event, I crafted a sales pitch recipe that worked quite well for me in most cases.

A blog post that I am particularly proud of is my evaluation and comparison of Snappy Ubuntu with Nix/NixOS, in which I describe the deployment properties of Snappy and compare how they conceptually relate to Nix/NixOS. It attracted a huge number of visitors, breaking my old monthly visitor record from three years ago!

I also wrote a tutorial blog post demonstrating how we can deploy prebuilt binaries with the Nix package manager. In some cases, packaging prebuilt software can be quite challenging, and the purpose of this blog post is to show a number of techniques that can be used to accomplish this.

Methodology


Besides deployment, I have also written two methodology-related blog posts. In the first blog post, I described my experiences with Agile software development and Scrum. Something that has been bothering me for quite a while is people claiming that "implementing" such a methodology considerably improves the development process and the quality of software.

In my opinion this is ridiculous! These methodologies provide some structure, but the "secret" lies in their undefined parts -- to be agile you should accept that nothing will completely go as planned, you should remain focused, take small steps (not huge leaps), and most importantly: continuously adapt and improve. But no methodology provides a universally applicable recipe that makes you successful at doing this.

However, despite being critical, I think that implementing a methodology is not bad per se. In another blog post, I have described how I implemented a basic software configuration management process in a small organization.

Development


I have also reflected on my experiences developing command-line utilities and wrote a blog post describing some considerations I take into account.

Side projects and research


In my previous reflections, there was always a section dedicated to research and side projects. Unfortunately, this year there is not much to report about -- I made a number of small changes and additions to my side projects, but I did not make any significant advancements.

Probably the fact that Disnix became both a main and a side project contributes to that. Moreover, I also have other stuff to do that has nothing to do with software development or research. I hope that I can find more time next year to report about my other side projects, but I guess this is basically just a luxury problem. :-)

Blog posts


As with my previous annual blog reflections, I will also publish the top 10 of my most frequently read blog posts:

  1. On Nix and GNU Guix. As with the previous three blog reflections, this blog post remains on top. However, its popularity finally seems to be challenged by the number two!
  2. An evaluation and comparison of Snappy Ubuntu. This is the only blog post I have written this year that ended up in the overall top 10. It attracted a record number of visitors in one month and now rivals the number one in popularity.
  3. An alternative explanation of the Nix package manager. This was last year's number two and dropped to the third place, because of the Snappy Ubuntu blog post.
  4. Setting up a multi-user Nix installation on non-NixOS systems. This blog post was also in last year's top 10 but it seems to have become even more popular. I think this is probably caused by the fact that it is still hard to set up a multi-user installation.
  5. Managing private Nix packages outside the Nixpkgs tree. I wrote this blog post for newcomers and observed that people keep consulting it frequently. As a consequence, it has entered the overall top 10.
  6. Asynchronous programming with JavaScript. This blog post was also in last year's top 10 and became slightly more popular. As a result, it moved to the 6th position.
  7. Yet another blog post about Object Oriented Programming and JavaScript. Another JavaScript related blog post that was in last year's top 10. It became slightly more popular and moved to the 7th place.
  8. Composing FHS-compatible chroot environments with Nix (or deploying Steam in NixOS). This blog post was the third most popular last year, but now seems to be not that interesting anymore.
  9. Setting up a Hydra build cluster for continuous integration and testing (part 1). Remains a popular blog post, but also considerably dropped in popularity compared to last year.
  10. Using Nix while doing development. A very popular blog post last year, but considerably dropped in popularity.

Conclusion


I am not out of ideas yet, so stay tuned! The only remaining thing I want to say is:

HAPPY NEW YEAR!!!!!!!!!!!

Thursday, December 3, 2015

On-demand service activation and self termination

I have written quite a few blog posts on service deployment with Disnix this year. The deployment mechanics that Disnix implements work quite well for my own purposes.

Unfortunately, having a relatively good deployment solution does not necessarily mean that a system functions well in a production environment -- there are also many other concerns that must be dealt with.

Another important concern of service-oriented systems is dealing with resource consumption, such as RAM, CPU and disk space. Obviously, services need resources to accomplish their work. However, since they are typically long running, they also consume resources even when they are not doing any work.

These problems could become quite severe if services have been poorly developed. For example, they may leak memory and never fully release the RAM they have allocated. As a result, an entire machine may eventually run out of memory. Moreover, "idle" services may degrade the performance of other services running on the same machine.

There are various ways to deal with resource problems:

  • The most obvious solution is buying bigger or additional hardware resources, but this typically increases the costs of maintaining a production environment. Moreover, it does not take the source of some of the problems away.
  • Another solution would be to fix and optimize problematic services, but this could be a time consuming and costly process, in particular when there is a high technical debt.
  • A third solution would be to support on-demand service activation and self termination -- a service gets activated the first time it is consulted and terminates itself after a period of idleness.

In this blog post, I will describe how to implement and deploy a system supporting the last solution.

To accomplish this goal, we need to modify the implementations of the services -- each service must retrieve its server socket from the host system's service manager (which activates the service when a client connects) and terminate itself when the moment is right.

Furthermore, we need to adapt a service's deployment procedure to use these facilities.

Retrieving a socket from the host system's service manager


In many conventional deployment scenarios, the services themselves are responsible for creating the sockets to which clients can connect. However, this property conflicts with on-demand activation -- the socket must already exist before the process runs, so that the process can be started when a client connects.

We can use a service manager that supports socket activation to accomplish on-demand activation. There are various solutions supporting this property. The most prominently advertised solution is probably systemd, but there are other solutions that can do this as well, such as launchd, inetd, or xinetd, albeit the protocols that activated processes must implement differ.

In one of my toy example systems used for testing Disnix (the TCP proxy example) I used to do the following:

static int create_server_socket(int source_port)
{
    int sockfd, on = 1;
    struct sockaddr_in client_addr;
        
    /* Create socket */
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if(sockfd < 0)
    {
        fprintf(stderr, "Error creating server socket!\n");
        return -1;
    }    

    /* Create address struct */
    memset(&client_addr, '\0', sizeof(client_addr));
    client_addr.sin_family = AF_INET;
    client_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    client_addr.sin_port = htons(source_port);
        
    /* Set socket options to reuse the address */
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
      
    /* Bind the name (ip address) to the socket */
    if(bind(sockfd, (struct sockaddr *)&client_addr, sizeof(client_addr)) < 0)
        fprintf(stderr, "Error binding on port: %d, %s\n", source_port, strerror(errno));
        
    /* Listen for connections on the socket */
    if(listen(sockfd, 5) < 0)
        fprintf(stderr, "Error listening on port %d\n", source_port);

    /* Return the socket file descriptor */
    return sockfd;
}

The function listed above is responsible for creating a socket file descriptor, binding the socket to an IP address and TCP port, and listening for incoming connections.

To support on-demand activation, I need to modify this function to retrieve the server socket from the service manager. Systemd's socket activation protocol works by passing the socket as the third file descriptor to the process that it spawns. By adjusting the previously listed code into the following:

static int create_server_socket(int source_port)
{
    int sockfd, on = 1;

#ifdef SYSTEMD_SOCKET_ACTIVATION
    int n = sd_listen_fds(0);
    
    if(n > 1)
    {
        fprintf(stderr, "Too many file descriptors received!\n");
        return -1;
    }
    else if(n == 1)
        sockfd = SD_LISTEN_FDS_START + 0;
    else
    {
#endif
        struct sockaddr_in client_addr;
        
        /* Create socket */
        sockfd = socket(AF_INET, SOCK_STREAM, 0);
        if(sockfd < 0)
        {
            fprintf(stderr, "Error creating server socket!\n");
            return -1;
        }
        
        /* Create address struct */
        memset(&client_addr, '\0', sizeof(client_addr));
        client_addr.sin_family = AF_INET;
        client_addr.sin_addr.s_addr = htonl(INADDR_ANY);
        client_addr.sin_port = htons(source_port);
        
        /* Set socket options to reuse the address */
        setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        
        /* Bind the name (ip address) to the socket */
        if(bind(sockfd, (struct sockaddr *)&client_addr, sizeof(client_addr)) < 0)
            fprintf(stderr, "Error binding on port: %d, %s\n", source_port, strerror(errno));
        
        /* Listen for connections on the socket */
        if(listen(sockfd, 5) < 0)
            fprintf(stderr, "Error listening on port %d\n", source_port);

#ifdef SYSTEMD_SOCKET_ACTIVATION
    }
#endif

    /* Return the socket file descriptor */
    return sockfd;
}

the server will use the socket that has been created by systemd (and passed as the third file descriptor). Moreover, if the server is started as a standalone process, it will revert to its old behaviour and allocate the server socket itself.

I have wrapped the systemd-specific functionality inside a conditional preprocessor block so that it only gets included when I explicitly ask for it. The downside of supporting systemd's socket activation protocol is that we require functionality exposed by a shared library bundled with systemd. As systemd is Linux (and glibc) specific, it makes no sense to build a service with this functionality enabled on non-systemd based Linux distributions or on non-Linux operating systems.

Besides conditionally including the code, I also made linking against the systemd library conditional in the Makefile:

CC = gcc

ifeq ($(SYSTEMD_SOCKET_ACTIVATION),1)
    EXTRA_BUILDFLAGS=-DSYSTEMD_SOCKET_ACTIVATION=1 $(shell pkg-config --cflags --libs libsystemd)
endif

all:
	$(CC) $(EXTRA_BUILDFLAGS) hello-world-server.c -o hello-world-server

...

so that the systemd-specific code block and library only get included if I run 'make' with socket activation explicitly enabled:

$ make SYSTEMD_SOCKET_ACTIVATION=1

Implementing self termination


As with on-demand activation, there is no generic way to implement self termination -- we must modify the service to support this property in some way.

In the TCP proxy example, I have implemented a simple approach using a counter (that is initially set to 0):

volatile unsigned int num_of_connections = 0;

For each client that connects to the server, we fork a child process that handles the connection. Each time we fork, we also increase the connection counter in the parent process:

while(TRUE)
{
    /* Create client socket if there is an incoming connection */
    if((client_sockfd = wait_for_connection(server_sockfd)) >= 0)
    {
        /* Fork a new process for each incoming client */
        pid_t pid = fork();
     
        if(pid == 0)
        {
            /* Handle the client's request and terminate
             * when it disconnects */
        }
        else if(pid == -1)
            fprintf(stderr, "Cannot fork connection handling process!\n");
#ifdef SELF_TERMINATION
        else
            num_of_connections++;
#endif
    }

    close(client_sockfd);
    client_sockfd = -1;
}

(As with socket activation, I have wrapped the termination functionality in a conditional preprocessor block -- it makes no sense to include this functionality into a service that cannot be activated on demand).

When a client disconnects, the process handling its connection terminates and sends a SIGCHLD signal to the parent. We can configure a signal handler for this type of signal as follows:

#ifdef SELF_TERMINATION
    signal(SIGCHLD, sigreap);
#endif

and use the corresponding signal handler function to decrease the counter and wait for the client process to terminate:

#ifdef SELF_TERMINATION

void sigreap(int sig)
{
    pid_t pid;
    int status;
    num_of_connections--;
    
    /* Event handler when a child terminates */
    signal(SIGCHLD, sigreap);
    
    /* Wait until all child processes terminate */
    while((pid = waitpid(-1, &status, WNOHANG)) > 0);

Finally, the server can terminate itself when the counter has reached 0 (which means that it is not handling any connections and the server has become idle):

    if(num_of_connections == 0)
        _exit(0);
}
#endif

Deploying services with on-demand activation and self termination enabled


Besides implementing socket activation and self termination, we must also deploy the server with these features enabled. When using Disnix as a deployment system, we can write the following service expression to accomplish this:

{stdenv, pkgconfig, systemd}:
{port, enableSystemdSocketActivation ? false}:

let
  makeFlags = "PREFIX=$out port=${toString port}${stdenv.lib.optionalString enableSystemdSocketActivation " SYSTEMD_SOCKET_ACTIVATION=1"}";
in
stdenv.mkDerivation {
  name = "hello-world-server";
  src = ../../../services/hello-world-server;
  buildInputs = if enableSystemdSocketActivation then [ pkgconfig systemd ] else [];
  buildPhase = "make ${makeFlags}";
  installPhase = ''
    make ${makeFlags} install
    
    mkdir -p $out/etc
    cat > $out/etc/process_config <<EOF
    container_process=$out/bin/process
    EOF
    
    ${stdenv.lib.optionalString enableSystemdSocketActivation ''
      mkdir -p $out/etc
      cat > $out/etc/socket <<EOF
      [Unit]
      Description=Hello world server socket
      
      [Socket]
      ListenStream=${toString port}
      EOF
    ''}
  '';
}

In the expression shown above, we do the following:

  • We make the socket activation and self termination features configurable by exposing them as a function parameter (which defaults to false, disabling them).
  • If the socket activation parameter has been enabled, we pass the SYSTEMD_SOCKET_ACTIVATION=1 flag to 'make' so that these facilities are enabled in the build system.
  • We must also provide two extra dependencies: pkgconfig and systemd to allow the program to find the required library functions to retrieve the socket from systemd.
  • We also compose a systemd socket unit file that configures systemd on the target system to allocate a server socket that activates the process when a client connects to it.

Modifying Dysnomia modules to support socket activation


As explained in an older blog post, Disnix consults a plugin system called Dysnomia that takes care of executing various kinds of deployment activities, such as activating and deactivating services. The reason a plugin system is used is that services can be any kind of deployment unit, with no generic activation procedure.

For services of the 'process' and 'wrapper' type, Dysnomia integrates with the host system's service manager. To support systemd's socket activation feature, we must modify the corresponding Dysnomia modules to start the socket unit instead of the service unit on activation. For example:

$ systemctl start disnix-53bb1pl...-hello-world-server.socket

starts the socket unit, which in turn starts the service unit with the same name when a client connects to it.

To deactivate the service, we must first stop the socket unit and then the service unit:

$ systemctl stop disnix-53bb1pl...-hello-world-server.socket
$ systemctl stop disnix-53bb1pl...-hello-world-server.service

Discussion


In this blog post, I have described an on-demand service activation and self termination approach using systemd, Disnix, and a number of code modifications. Some benefits of this approach are that we can save system resources such as RAM and CPU, improve the performance of non-idle services running on the same machine, and reduce the impact of poorly implemented services that (for example) leak memory.

There are also some disadvantages. For example, connecting to an inactive service introduces latency, in particular when a service has a slow start-up procedure, making this approach less suitable for systems that must remain responsive.

Moreover, it does not cope with potential disk space issues -- a non-running service still consumes disk space for storing its package dependencies and persistent state, such as databases.

Finally, there are some practical notes on the solutions described in the blog post. The self termination procedure in the example program terminates the server immediately after it has discovered that there are no active connections. In practice, it may be better to implement a timeout to prevent unnecessary latencies.
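Such a timeout could be sketched as follows (the constant and function names are illustrative, not part of the TCP proxy example): instead of calling _exit(0) as soon as the counter drops to zero, the server arms an alarm and only terminates if no client connects before it fires:

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Grace period before an idle server terminates itself */
#define IDLE_TIMEOUT 30 /* seconds */

static void handle_idle_timeout(int sig)
{
    (void)sig;
    _exit(0); /* still idle after the grace period: terminate */
}

/* Called when the last client disconnects */
static void schedule_self_termination(void)
{
    signal(SIGALRM, handle_idle_timeout);
    alarm(IDLE_TIMEOUT);
}

/* Called when a new client connects; returns the number of seconds
 * that were still remaining on the cancelled alarm (0 if none) */
static unsigned int cancel_self_termination(void)
{
    return alarm(0);
}
```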

Furthermore, I have only experimented with systemd's socket activation features. However, it is also possible to modify the Dysnomia modules to support different kinds of activation protocols, such as the ones provided by launchd, inetd or xinetd.

The TCP proxy example uses C as an implementation language, but systemd's socket activation protocol is not limited to C programs. For instance, an example program on GitHub demonstrates how a Python program running an embedded HTTP server can be activated with systemd's socket activation mechanism.

References


I have modified the development version of Dysnomia to support the socket activation feature of systemd. Moreover, I have extended the TCP proxy example package with a sub example that implements the on-demand activation and self termination approach described in this blog post.

Both packages can be obtained from my GitHub page.