Thursday, December 31, 2020

Annual blog reflection over 2020

In my previous blog post that I wrote yesterday, I celebrated my blog's 10th anniversary and did a reflection over the last decade. However, I did not elaborate much about 2020.

Because 2020 is a year for the history books, I have decided to also do an annual reflection over the last year (similar to my previous annual blog reflections).

A summary of blog posts written in 2020


Nearly all of the blog posts that I have written this year were in service of only two major goals: developing the Nix process management framework and implementing service container support in Disnix.

Both of them took a substantial amount of development effort. Much more than I initially anticipated.

Investigating process management


I already started working on this topic last year. In November 2019, I wrote a blog post about packaging sysvinit scripts and a Nix-based functional organization for configuring process instances, that could potentially also be applied to other process management solutions, such as systemd and supervisord.

After building my first version of a framework (which was already a substantial leap in reaching my full objective), I thought it would not take me that much time to get all details that I originally planned finished. It turns out that I heavily underestimated the complexity.

To test my framework, I needed a simple test program that could daemonize on its own, which was (and still is) a common practice for running services on Linux and many other UNIX-like operating systems.

I thought writing such a tool that daemonizes would be easy, but after some research, I discovered that it is actually quite complicated to do it properly. I wrote a blog post about my findings.

It took me roughly three months to finish the first implementation of the process manager-agnostic abstraction layer that makes it possible to write a high-level specification of a running process, that could universally target all kinds of process managers, such as sysvinit, systemd, supervisord and launchd.

After completing the abstraction layer, I also discovered that a sufficiently high-level deployment specification of running processes could also target other kinds deployment solutions.

I have developed two additional backends for the Nix process management framework: one that uses Docker and another using Disnix. Both solutions are technically not qualified as process managers, but they can still be used as such by only using a limited set of features of these tools.

To be able to develop the Docker backend, I needed to dive deep into the underlying concepts of Docker. I wrote a blog post about the relevant Docker deployment concepts, and also gave a presentation about it at Mendix.

While implementing more examples, I also realized that to more securely run long-running services, they typically need to run as unprivileged users. To get predictable results, these unprivileged users require stable user IDs and group IDs.

Several years ago, I have already worked on a port assigner tool that could already assign unique TCP/UDP port numbers to services, so that multiple instances can co-exist.

I have extended the port assigner tool to assign arbitrary numeric IDs to generically solve this problem. In turns out that implementing this tool was much more difficult than expected -- the Dynamic Disnix toolset was originally developed under very high time pressure and had a substantial amount of technical debt.

In order to implement the numeric ID assigner tool, I needed to revise the model parsing libraries, that broke the implementations of some of the deployment planner algorithms.

To fix them, I was forced to study how these algorithms worked again. I wrote a blog post about the graph-based deployment planning algorithms and a new implementation that should be better maintainable. Retrospectively, I wish I did my homework better at the time when I wrote the original implementation.

In September, I gave a talk about the Nix process management framework at NixCon 2020, that was held online.

I pretty much reached all my objectives that I initially set for the Nix process management framework, but there is still some leftover work to bring it at an acceptable usability level -- to be able to more easily add new backends (somebody gave me s6 as an option) the transformation layer needs to be standardized.

Moreover, I still need to develop a test strategy for services so that you can be (reasonably) sure that they work with a variety of process managers and under a variety of conditions (e.g. unprivileged user deployments).

Exposing services as containers in Disnix


Disnix is a Nix-based distributed service deployment tool. Services can basically be any kind of deployment unit whose life-cycle can be managed by a companion tool called Dysnomia.

There is one practical problem though, in order to deploy a service-oriented system with Disnix, it typically requires the presence of already deployed containers (not be confused with Linux containers), that are environments in which services are managed by another service.

Some examples of container providers and corresponding services are:

  • The MySQL DBMS (as a container) and multiple hosted MySQL databases (as services)
  • Apache Tomcat (as a container) and multiple hosted Java web applications (as services)
  • systemd (as a container) and multiple hosted systemd unit configuration files (as services)

Disnix deploys the services (as described above), but not the containers. These need to be deployed by other means first.

In the past, I have been working on solutions that manage the underlying infrastructure of services as well (I typically used to call this problem domain: infrastructure deployment). For example, NixOps can deploy a network of NixOS machines that also expose container services that can be used by Disnix. It is also possible to deploy the containers as services, in a separate deployment layer managed by Disnix.

When the Nix process management framework became more usable, I wanted to make the deployment of container providers also a more accessible feature. I heavily revised Disnix with a new feature that makes it possible to expose services as container providers, making it possible to deploy both the container services and application services from a single deployment model.

To make this feature work reliably, I was again forced to revise the model transformation pipeline. This time I concluded that the lack of references in the Nix expression language was an impediment.

Another nice feature by combining the Nix process management framework and Disnix is that you can more easily deploy a heterogeneous system locally.

I have released a new version of Disnix: version 0.10, that provides all these new features.

The Monitoring playground


Besides working on the two major topics shown above, the only other thing I did was a Mendix crafting project in which I developed a monitoring playground, allowing me to locally experiment with alerting scripts, including the visualization and testing.

Some thoughts


From a blogging perspective, I am happy what I have accomplished this year -- not only have I managed to reach my usual level of productivity again (last year was somewhat disappointing), I also managed to both develop a working Nix process management framework (making it possible to use all kinds of process managers), and use Disnix to deploy both container and application services. Both of these features are on my wish list for many years.

In the Nix community, having the ability to also use other process managers than systemd, is something we have been discussing already since late 2014.

However, there are also two major things that kept me mentally occupied in the last year.

Open source work


Many blog posts are about the open source work I do. Some of my open source work is done as part of my day time job as a software engineer -- sometimes we can write a new useful feature, make an extension to an existing project that may come in handy, or do something as part of an investigation/learning project.

However, the majority of my open source work is done in my spare time -- in many cases, my motivation is not as altruistic as people may think: typically I need something to solve my own problems or there is some technical concept that I would like to explore. However, I still do a substantial amount of work to help other people or for the "greater good".

Open source projects are typically quite satisfying the work on, but they also have negative aspects (typically the negative aspects are negligible in the early stages of a project). Sadly as projects get more popular and gain more exposure, the negativity attached to them also grows.

For example, although I got quite a few positive reactions on my work on the Nix process management framework (especially at NixCon 2020), I know that not everybody is or will be happy about it.

I have worked with people in the past, who consider this kind of work a complete waste of time -- in their opinion, we already have Kubernetes that has already solved all relevant service management problems (some people even think it is a solution to all problems).

I have to admit that, while Kubernetes can be used to solve similar kind of problems (and what is not supported as a first-class feature, can still be scripted in ad-hoc ways), there is still much to think about:

  • As explained in my blog post about Docker concepts, the Nix store typically supports much more efficient sharing of common dependencies between packages than layered Docker images, resulting in much lower disk space consumption and RAM consumption of processes that have common dependencies.
  • Docker containers support the deployment of so-called microservices because of a common denominator: processes. Almost all modern operating systems and programming-languages have a notion of processes.

    As a consequence, lots of systems nowadays typically get constructed in such a way that they can be easily decomposed into processes (translating to container instances), imposing substantial overhead on each process instance (because these containers typically need to embed common sub services).

    Services can also be more efficiently deployed (in terms of storage and RAM usage) as units by managed by a common runtime (e.g. multiple Java web applications managed by Apache Tomcat or multple PHP applications managed by the Apache HTTP server).

    The latter form of reuse is now slowly disappearing, because it does not fit nicely in a container model. In Disnix, this form of reuse is a first-class concept.
  • Microservices managed by Docker (somewhat) support technology diversity, because of the fact that all major programming languages support the concept of processes.

    However, one particular kind of technology that you cannot choose is the the operating system -- Docker/Kubernetes relies on non-standardized Linux-only concepts.

    I have also been thinking about the option to pick your operating system as well: you need security, then pick: OpenBSD, you want performance, then pick: Linux etc. The Nix process management framework allows you to also target process managers on different operating systems than Linux, such as BSD rc scripts and Apple's launchd.

I personally believe that these goals are still important, and that keeps me motivated to work on it.

Furthermore, I also believe that it is important to have multiple implementations of tools that solve the same or similar kind of problems -- in the open source world, there are lot of "battles" between communities about which technology should be the standard for a certain problem.

My favourite example of such a battle is the system's process manager -- many Linux distributions nowadays have adopted systemd, but this is not without any controversy, such as in the Debian project.

It took them many years to come to the decision to adopt it, and still there are people who want to discuss "init system diversity". Likewise, there are people who find the systemd-adoption decision unacceptable, and have forked Debian into Devuan, providing a Debian-based distribution without systemd.

With the Nix process management framework the fact that systemd exists (and may not be everybody's first choice) is not a big deal -- you can actually switch to other solutions, if desired. A battle between service managers is not required. A sufficiently high-level specification of a well understood problem allows you to target multiple solutions.

Another problem I face is that these two projects are not the only projects I have been working on or maintain. There are many other projects I have been working on the past.

Sadly, I am also a very bad multitasker. If there are problems reported with my other projects, and the fix is straight forward, or there is a straight forward pull request, then it is typically no big deal to respond.

However, I also learned that some for some of the problems other people face, there is no quick fix. Sometimes I get pull requests that partially solves a problem, or in other cases: fix a specific problem, but breaks others features. These pull requests cannot always be directly accepted and also need a substantial amount of my time for reviewing.

For certain kinds of reported problems, I need to work on a fundamental revision that requires a substantial amount of development effort -- however, it is impossible to pick up such a task while working on another "major project".

Alternatively, I need to make the decision to abandon what I am currently working on and make the switch. However, this option also does not have my preference because I know it will significantly delay my original goal.

I have noticed that lots of people get dissatisfied and frustrated, including myself. Moreover, I also consider it a bad thing to feel pressure on the things I am working on in my spare time.

So what to do about it? Maybe I can write a separate blog post on this subject.

Anyway, I was not planning to abandon or stop anything. Eventually, I will pick up these other problems as well -- my strategy for now, is to do it when I am ready. People simply have to wait (so if you are reading this and waiting for something: yes, I will pick it up eventually, just be patient).

The COVID-19 crisis


The other subject that kept me (and pretty much everybody in the world) busy is the COVID-19 crisis.

I still remember the beginning of 2020 -- for me personally, it started out very well. I visited some friends that I have not seen in a long time, and then FOSDEM came, the Free and Open Source Developers European Meeting.

Already in January, I heard about this new virus that was rapidly spreading in the Wuhan region on the news. At that time, nobody in the Netherlands (or in Europe) was really worried yet. Even to questions, such as: "what will happen when it reaches Europe?", people typically responded with: "ah yes, well influenza has a impact on people too, it will not be worse!".

A few weeks later, it started to spread to countries close to Europe. The first problematic country I heard about was Iran, and a couple of weeks later it reached Italy. In Italy, it spread so rapidly that within only a few weeks, the intensive care capacity was completely drained, forcing medical personnel to make choices who could be helped and who could not.

By then, it sounded pretty serious to me. Furthermore, I was already quite sure that it was only a matter of time before it would reach the Netherlands. And indeed, at the end of February, the first COVID-19 case was reported. Apparently this person contracted the virus in Italy.

Then the spreading went quickly -- every day, more and more COVID-19 cases were reported and this amount grew exponentially. Similar to other countries, we also slowly ran into capacity problems in hospitals (materials, equipment, personnel, intensive care capacity etc.). In particular, the intensive care capacity reached at a very critical level. Fortunately, there were hospitals in Germany willing to help us out.

In March, a country-wide lockdown was announced -- basically all group activities were forbidden, schools and non-essential shops were closed, and everybody who is capable of working from should work from home. As a consequence, since March, I have been permanently working from home.

As with pretty much everybody in the world, COVID-19 has negative consequences for me as well. Fortunately, I have not much to complain about -- I did not get unemployed, I did not get sick, and also nobody in my direct neighbourhood ran into any serious problems.

The biggest implication of the COVID-19 pandemic for me is social contacts -- despite the lockdown I still regularly meet up with family and frequent acquaintances, but I have barely met any new people. For example, at Mendix, I typically came in contact with all kinds of people in the company, especially those that do not work in the same team.

Moreover, I also learned that quite a few of my contacts got isolated because of all group activities that were canceled -- for example I did not have any music rehearsals in a while, causing me not to see or speak to any of my friends there.

Same thing with conferences and meet ups -- because most of them were canceled or turned into online events, it is very difficult to have good interactions with new people.

I also did not do any traveling -- my summer holiday was basically a staycation. Fortunately, in the summer, we have managed to minimize the amount of infections, making it possible to open up public places. I visited some touristic places in the Netherlands, that are normally crowded by people from abroad. That by itself was quite interesting -- I normally tend to neglect national touristic sites.

Although the COVID-19 pandemic brought all kinds of negative things, there were also a couple of things that I consider a good thing:

  • At Mendix, we have an open office space that typically tends to be very crowded and noisy. It is not that I cannot work in such an environment, but I also realize that I do appreciate silence, especially for programming tasks that require concentration. At home, it is quiet, I have much fewer distractions and I also typically feel much less tired after a busy work day.
  • I also typically used to neglect home improvements a lot. The COVID-19 crisis helped me to finally prioritize some non-urgent home improvements tasks -- for example, on the attic, where my musical instruments are stored, I finally took the time to organize everything in such a way that I can rehearse conveniently.
  • Besides the fact that rehearsals and concerts were cancelled, I actually practiced a lot -- I even studied many advanced solo pieces that I have not looked at in years. Playing music became a standard activity between my programming tasks, to clear my mind. Normally, I would use this time to talk to people at the coffee machine in the office.
  • During busy times I also used to tend to neglect house keeping tasks a lot. I still remember (many years ago) when I just moved into my first house, doing the dishes was already a problem (I had no dish washer at that time). When working from home, it is not a problem to keep everything tidy.
  • It is also much easier to maintain healthy daily habits. In the first lockdown (that was in spring), cycling/walking/running was a daily routine that I could maintain with ease.

In the Netherlands, we have managed to overcome the first lockdown in just a matter of weeks by social distancing. Sadly, after the restrictions were relaxed we got sloppy and at the end of the summer the infection rate started to grow. We also ran into all kinds of problems to mitigate the infections -- limited test capacity, people who got tired of all the countermeasures not following the rules, illegal parties etc.

Since a couple of weeks we are in our second lockdown with a comparable level of strictness -- again, the schools and non-essentials shops are closed etc. The second lockdown feels a lot worse than the first -- now it is in the winter, people are no longer motivated (the amount of people that revolt in the Netherlands have grown substantially, including people spreading news that everything is a Hoax and/or a one big scheme organized by left-wing politicians) and it is already taking much longer than the first.

Fortunately, there is a tiny light at the end of the tunnel. In Europe, one vaccine (the Pfizer vaccine) has been approved and more submissions are pending (with good results). By Monday, the authorities will start to vaccinate people in the Netherlands.

If we can keep the infection rates and the mutations under control (such as the mutation that appeared in England) then we will eventually build up the required group immunity to finally get the situation under control (this probably is going to take many more months, but at least it is a start).

Conclusion


This elaborate reflection blog post (that is considerably longer than all my previous yearly reflections combined) reflects over 2020 that is probably a year that will no go unnoticed in the history books.

I hope everybody remains in good health and stays motivated to do what it is needed to get the virus under control.

Moreover, when the crisis is over, I also hope we can retain the positive things learned in this crisis, such as making it more a habit to allow people to work (at least partially) from home. The open-source complaint in this blog post is just a minor inconvenience compared to the COVID-19 crisis and the impact that it has on many people in the world.

The final thing I would like to say is:


HAPPY NEW YEAR!!!!!!!!!!!!

No comments:

Post a Comment