Sander van der Burg's blog: Diagnosing problems and running maintenance tasks in a network with services deployed by Disnix

I have been maintaining a production system with Disnix for quite some time. Although deployment works quite conveniently for me (I am probably a bit biased, since I created Disnix :-) ), you cannot get around unforeseen incidents and problems, such as:

Crashing processes due to bugs or excessive load.
Database problems, such as inconsistencies in the data.

Errors in distributed systems are typically much more difficult to debug than single machine system failures. For example, tracing the origins of an error in distributed systems is generally hard -- one service's fault may be caused by a message propagated by another service residing on a different machine in the network.

But even if you know the origins of an error (e.g. you can clearly observe that a web application is crashing or a database connection is failing), you may face other kinds of challenges:

You have to figure out to which machine in the network a service has been deployed.
You have to connect to the machine, e.g. through an SSH connection, to run debugging tasks.
You have to know the configuration properties of a service to diagnose it -- in Disnix, as explained in earlier blog posts, services can take any form -- they can be web services, but also web applications, databases and processes.

Because of these challenges, diagnosing errors and running maintenance tasks in a system deployed by Disnix is always unnecessarily time-consuming and inconvenient.

To alleviate this burden, I have developed a small tool and extension that establishes remote shell connections with environments providing all relevant configuration properties. Furthermore, the tool gives suggestions to the end-user explaining what kinds of maintenance tasks they could carry out.

The shell activity of Dysnomia

As explained in previous Disnix-related blog posts, Disnix carries out all activities to deploy a service-oriented system to a network of machines (i.e. to bring it into a running state), such as building services from source code, distributing their intra-dependency closures to the target machines, and activating or deactivating every service.

For the build and distribution activities, Disnix uses, as its name implies, the Nix package manager because it offers a number of powerful properties, such as strong reproducibility guarantees and atomic upgrades and rollbacks.

For the remaining activities that Nix does not support, e.g. activating or deactivating services, Disnix uses a companion tool called Dysnomia. Because services in a Disnix context could take any form, there is no generic means to activate or deactivate them -- for this reason, Dysnomia provides a plugin system with modules that carry out specific activities for a specific service type.

One of the plugins that Dysnomia provides is the deployment of MySQL databases to a MySQL DBMS server. Dysnomia deployment activities are driven by two kinds of configuration specifications. A component configuration defines the properties of a deployable unit, such as a MySQL database:

create table author
( AUTHOR_ID  INTEGER       NOT NULL,
  FirstName  VARCHAR(255)  NOT NULL,
  LastName   VARCHAR(255)  NOT NULL,
  PRIMARY KEY(AUTHOR_ID)
);

create table books
( ISBN       VARCHAR(255)  NOT NULL,
  Title      VARCHAR(255)  NOT NULL,
  AUTHOR_ID  INTEGER       NOT NULL,
  PRIMARY KEY(ISBN),
  FOREIGN KEY(AUTHOR_ID) references author(AUTHOR_ID) on update cascade on delete cascade
);

The above configuration is a MySQL script (~/testdb) that creates the database schema consisting of two tables.

The container configuration captures properties of the environment in which the component should be hosted, which is, in this particular case, a MySQL DBMS server:

type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret

The above container configuration (~/mysql-production) defines the type stating that the mysql-database plugin must be used, and provides the authentication credentials required to connect to the DBMS server.

The Dysnomia plugin for MySQL implements various kinds of deployment activities for MySQL databases. For example, the activation activity is implemented as follows:

...

case "$1" in
    activate)
        # Initialize the given schema if the database does not exist
        if [ "$(echo "show databases" | @mysql@ --user=$mysqlUsername --password=$mysqlPassword -N | grep -x $componentName)" = "" ]
        then
            ( echo "create database $componentName;"
              echo "use $componentName;"
              
              if [ -d $2/mysql-databases ]
              then
                  cat $2/mysql-databases/*.sql
              fi
            ) | @mysql@ $socketArg --user=$mysqlUsername --password=$mysqlPassword -N
        fi
        markComponentAsActive
    ;;

    ...
esac

The above code fragment checks whether a database with the component's name already exists and, if it does not, creates it by running the database initialization script provided by the component configuration. As may also be observed, the above activity uses the container properties (such as the authentication credentials) as environment variables.
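To illustrate how key=value container properties can end up as environment variables, here is a minimal sketch. The load_container_properties function is a hypothetical helper of my own for this illustration, not part of Dysnomia, and I am not claiming that Dysnomia implements it this way internally:

```shell
#!/bin/sh
# Hypothetical helper (NOT part of Dysnomia): export every key=value line
# of a container configuration file as an environment variable.
load_container_properties() {
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Skip blank lines and comment lines
        case "$key" in
            ''|'#'*) continue ;;
        esac
        export "$key=$value"
    done < "$1"
}

# Write a throw-away container configuration resembling ~/mysql-production
container="$(mktemp)"
cat > "$container" <<'EOF'
type=mysql-database
mysqlUsername=root
mysqlPassword=verysecret
EOF

load_container_properties "$container"
rm -f "$container"

# The properties are now visible to this shell and its child processes
echo "type=$type user=$mysqlUsername"
```

Once exported, the properties are available to any process started from the same shell, which is what allows a module to refer to $mysqlUsername and $mysqlPassword directly.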

Dysnomia activities can be executed by invoking the dysnomia command-line tool. For example, the following command will activate the MySQL database in the MySQL database server:

$ dysnomia --operation activate \
  --component ~/testdb --container ~/mysql-production

To make the execution of arbitrary tasks more convenient, I have created a new Dysnomia option called: shell. The shell operation is basically an activity that does not execute anything, but instead spawns a shell session that provides the container configuration properties as environment variables.

Moreover, the shell activity of a Dysnomia plugin typically displays suggestions for shell commands that the user may want to carry out.

For example, when we run the following command:

$ dysnomia --shell \
  --component ~/testdb --container ~/mysql-production

Dysnomia spawns a shell session that shows the following:

This is a shell session that can be used to control the 'staff' MySQL database.

Module specific environment variables:
mysqlUsername  Username of the account that has the privileges to administer
               the database
mysqlPassword  Password of the above account
mysqlSocket    Path to the UNIX domain socket that is used to connect to the
               server (optional)

Some useful commands:
/nix/store/h0kcf5g2ssyancr9m2i8sr09b3wq2zy0-mariadb-10.1.28/bin/mysql  --user=$mysqlUsername --password=$mysqlPassword staff Start a MySQL interactive terminal

General environment variables:
this_dysnomia_module     Path to the Dysnomia module
this_component           Path to the mutable component
this_container           Path to the container configuration file

[dysnomia-shell:~]#

By executing the command-line suggestion shown in the above shell session, we get a MySQL interactive terminal allowing us to execute arbitrary SQL commands. It saves us the burden of looking up all the MySQL configuration properties, such as the authentication credentials and the database name.

The Dysnomia shell feature is heavily inspired by nix-shell, which works in quite a similar way -- it takes the build dependencies of a package build as inputs (which typically manifest themselves as environment variables) and fetches the sources, but it does not execute the package build procedure. Instead, it spawns an interactive shell session allowing the user to execute arbitrary build tasks. This Nix feature is particularly useful for development projects.

Diagnosing services with Disnix

In addition to extending Dysnomia with the shell feature, I have also extended Disnix to make this feature available in a distributed context.

The following command can be executed to spawn a shell for a particular service of the ridiculous staff tracker example (that happens to be a MySQL database):

$ disnix-diagnose -S staff
[test2]: Connecting to service: /nix/store/yazjd3hcb9ds160cq03z66y5crbxiwq0-staff deployed to container: mysql-database
This is a shell session that can be used to control the 'staff' MySQL database.

Module specific environment variables:
mysqlUsername  Username of the account that has the privileges to administer
               the database
mysqlPassword  Password of the above account
mysqlSocket    Path to the UNIX domain socket that is used to connect to the
               server (optional)

Some useful commands:
/nix/store/h0kcf5g2ssyancr9m2i8sr09b3wq2zy0-mariadb-10.1.28/bin/mysql  --user=$mysqlUsername --password=$mysqlPassword staff Start a MySQL interactive terminal

General environment variables:
this_dysnomia_module     Path to the Dysnomia module
this_component           Path to the mutable component
this_container           Path to the container configuration file

[dysnomia-shell:~]#

The above command-line instruction looks up the location of the staff database in the configuration of the system that is currently deployed, connects to the corresponding machine (typically through SSH) and spawns a Dysnomia shell for the given service type.

In addition to an interactive shell, you can also directly run shell commands. For example, the following command will query all the staff records:

$ disnix-diagnose -S staff \
  --command 'echo "select * from staff" | mysql --user=$mysqlUsername --password=$mysqlPassword staff'

In most cases, only one instance of a service exists, but Disnix can also deploy redundant instances of the same service. For example, we may want to deploy two redundant instances of the web application front end in the distribution.nix configuration file:

stafftracker = [ infrastructure.test1 infrastructure.test2 ];

When trying to spawn a Dysnomia shell, the tool returns an error because it does not know which instance to connect to:

$ disnix-diagnose -S stafftracker
Multiple mappings found! Please specify a --target and, optionally, a
--container parameter! Alternatively, you can execute commands for all possible
service mappings by providing a --command parameter.

This service has been mapped to:

container: apache-webapplication, target: test1
container: apache-webapplication, target: test2

In this case, we must refine our query with a --target parameter. For example, the following command connects to the web front-end on the test1 machine:

$ disnix-diagnose -S stafftracker --target test1

It is still possible to execute remote shell commands for redundantly deployed services. For example, the following command gets executed twice, because we have two instances deployed:

$ disnix-diagnose -S stafftracker \
  --command 'echo I will see this message two times!'

In some cases, you may want to execute other kinds of maintenance tasks or you simply want to know where a particular service resides. This can be done by running the following command:

$ disnix-diagnose -S stafftracker --show-mappings
This service has been mapped to:

container: apache-webapplication, target: test1
container: apache-webapplication, target: test2
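
The mapping lines are regular enough to be processed by a script. The following sketch extracts the target names from them. The list_targets function is a hypothetical helper of my own, and it assumes that the output format is exactly the one shown above:

```shell
#!/bin/sh
# Hypothetical helper (NOT part of Disnix): print the target name of every
# "container: ..., target: ..." line read from standard input.
list_targets() {
    sed -n 's/^container: .*, target: \(.*\)$/\1/p'
}

# Sample input copied from the --show-mappings output shown above
mappings='This service has been mapped to:

container: apache-webapplication, target: test1
container: apache-webapplication, target: test2'

targets="$(printf '%s\n' "$mappings" | list_targets)"
echo "$targets"
```

On a machine with a deployed system, the output of disnix-diagnose -S stafftracker --show-mappings could be piped into list_targets instead of the sample text, and each resulting name could then be passed to the --target parameter.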

Conclusion

In this blog post, I have described a new feature of Dysnomia and Disnix that spawns interactive shell sessions, making problem solving and maintenance tasks more convenient.

disnix-diagnose and the shell extension are part of the development versions of Disnix and Dysnomia and will become available in the next release.

Sander van der Burg's blog

Wednesday, January 31, 2018
