Thursday, January 31, 2019

A minimalistic discovery and architecture documentation process

In a blog post written a couple of years ago, I described how to set up a basic configuration management process in a small organization, based on the process framework described in the IEEE 828-2012 configuration management standard. The most important prerequisite for setting up such a process is identifying all configuration items (CIs) and storing them in a well-organized repository.

There are many ways to organize configuration items ranging from simple to very sophisticated solutions. I used a very small set of free and open source tools, and a couple of simple conventions to set up a CI repository:

  • A Git repository with a hierarchical directory structure referring to configuration items. Each path component in the directory structure serves a specific purpose to group configuration items. The overall strategy was to use a directory structure with a maximum of three levels: environment/machine/application. Using Git makes it possible to version configuration items and share the repository with team members.
  • Using markdown to write down the purposes of the configuration items and descriptions of how they can be reproduced. Markdown works well for two reasons: it can be nicely formatted in a browser, but also read from a terminal when logged in to remote servers via SSH.
  • Using Dia for drawing diagrams of systems consisting of more complex application components. Dia is not the most elegant program around, but it works well enough, it is free and open source, and supported on Linux, Windows and macOS.
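To make the three-level convention concrete, here is a minimal Python sketch that splits a repository-relative directory path into its environment, machine and application parts. The function name `parse_ci_path`, the `ConfigurationItem` type and the example path are assumptions made for illustration; they are not part of the original repository.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ConfigurationItem(NamedTuple):
    """The three path levels of the convention: environment/machine/application."""
    environment: str
    machine: str
    application: str


def parse_ci_path(path: str) -> ConfigurationItem:
    """Split a repository-relative directory path into its three levels."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3:
        raise ValueError(f"expected environment/machine/application, got: {path!r}")
    return ConfigurationItem(*parts)


# Hypothetical example path; the names are placeholders.
item = parse_ci_path("production/webserver1/nginx")
print(item.environment, item.machine, item.application)
```

A check like this could run in a pre-commit hook to reject directories that do not follow the convention, although the convention also works fine without any tooling.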

My main motivation to formalize configuration management (but only lightly), despite being in a small organization, is to prevent errors and minimize delays and disruptions while remaining flexible by not being bound to all kinds of complex management procedures.

I wrote this blog post while I was still employed at a small-sized startup company with only one development team. In the meantime, I have joined a much bigger organization (Mendix) that has many cross-disciplinary development teams that work concurrently on various aspects of our service and product portfolio.

About microservices


When I had just joined, the amount of information I had to absorb was quite overwhelming. I also learned that we had heavily adopted the microservices architecture paradigm for our entire online service platform.

According to Martin Fowler's blog post on microservices, using microservices offers the following benefits:

  • Strong module boundaries. You can divide the functionality of a system into microservices and make separate teams responsible for the development of each service. This makes it possible to iterate faster and offer better quality because each team can focus on a subset of features only.
  • Independent deployment. Microservices can be deployed independently making it possible to ship features when they are done, without having complex integration cycles.
  • Technology diversity. Microservices are language and technology agnostic. You can pick almost any programming language (e.g. Java, Python, Mendix, Go), data storage solution (e.g. PostgreSQL, MongoDB, InfluxDB) or operating system (e.g. Linux, FreeBSD, Windows) to implement a microservice, making it possible to pick the most suitable combination of technologies and use them to their full advantage.

However, decomposing a system into a collection of collaborating services also comes at a (sometimes substantial!) price:

  • There is typically much more operational complexity. Because there are many components and typically a large infrastructure to manage, activities such as deploying, upgrading, and monitoring the condition of a system are much more time consuming and complex. Furthermore, because of technology diversity, there are also many kinds of specialized deployment procedures that you need to carry out.
  • Data is eventually consistent. You have to live with the fact that (temporary) inconsistencies could end up in your data, and you must invest in implementing facilities that keep your data consistent.
  • Because of distribution, development is harder in general -- it is more difficult to diagnose errors (e.g. a failure in one service could trigger a chain reaction of errors, without having proper error traces), and it is harder to test a system because of additional deployment complexity. The network links between services may be slow and subject to failure, causing all kinds of unpredictable problems. Also, machines that host critical services may crash.

Studying the architecture


When applied properly, e.g. functionality is well separated and there is strong cohesion and weak coupling between services, while investing in solutions to cope with the challenges listed above, the benefits of microservices can be reaped, resulting in a scalable system that can be developed by multiple teams working on features concurrently.

However, making changes in such an environment, and maintaining or improving the quality properties of a system, requires discipline and a relatively good understanding of the environment. In the beginning, I faced all kinds of practical problems when I wanted to make even a subtle change -- some areas of our platform were documented, while others were not. Some documentation was also outdated, slightly incomplete and sometimes inconsistent with the actual implementation.

Certain areas of our platform were also highly complex, resulting in architectural views with many boxes and arrows. Furthermore, information was scattered across many different places.

As part of my on-boarding process, and as a means to cope with some of my practical problems, I have created a documentation repository of the platform that our team develops by extending the (minimalistic) principles for configuration management described in the earlier blog post.

I realized that simply identifying the service components of which the system consists is not enough to get an understanding of the system -- there are many items and complex details that need to be captured.

In addition to the identification of all configuration items, I also want:

  • Proper guidance. To understand a particular piece of functionality, I should not need to study every component in detail. Instead, I want to know the full context and only the details of the relevant components.
  • Completeness. I want all service components to be visible. I do not want any details to be covered up. For example, I have seen quite a few diagrams that hide complex implementation details. I would much rather have flaws be visible so that they can be resolved at a later point in time.
  • Clear boundaries. Our platform is not self-contained, but relies on services provided by other teams. I want to know which components are our responsibility and which are managed by external teams.
  • Clarity. I want to know what the purpose of a component is. Their names may not always necessarily reflect or explain what they do.
  • Consistency. No matter how nicely a diagram is drawn, it should match the actual implementation or it is of very little use.
  • References to the actual implementation. I also want to know where I can find the implementation of a component, such as its Git repository.

Documenting the architecture


To visualize the architecture of our platform and organize all relevant information, I followed this strategy:

  • I took the components (typically their source code repositories) as the basis for everything else -- every component translates to a box in the architecture diagram.
  • I analyzed the dependency relationships between the components and denoted them as arrows. When a box points to another box by means of an arrow, this means that the other box is a dependency that should be deployed first. When a dependency is absent, the service will (most likely) not work.
  • I also discovered that the platform diagram easily gets cluttered by the sheer number of components -- I decided to combine components that have very strongly correlated functionality in feature groups (that have dashed borders). Every feature group in an architecture diagram refers to another sub architecture diagram that provides a more specialized view of the feature group.
  • To clearly illustrate the difference between components that are our responsibility and those that are maintained by other teams, I make all external dependencies visible in the top-level architecture diagram.
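The meaning of the arrows (a dependency should be deployed before the component that points to it) corresponds to a topological ordering of the dependency graph. The following Python sketch illustrates this with the standard library's `graphlib` module (Python 3.9 or newer). All component names, and the choice of which component is external, are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each key depends on the components in its set,
# i.e. in the diagram the key's box has an arrow pointing to each of them.
dependencies = {
    "portal": {"auth-service", "billing-service"},
    "billing-service": {"auth-service", "payment-gateway"},
    "auth-service": set(),
    "payment-gateway": set(),
}

# Components maintained by other teams (an assumption for this example).
external = {"payment-gateway"}


def deployment_order(deps):
    """Return the components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = deployment_order(dependencies)

# Only the components that are our own responsibility, in deployment order.
ours = [component for component in order if component not in external]

print(order)
print(ours)
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which is a useful signal in itself: it means that no valid deployment order exists for the dependencies as documented.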

The notation I used for these diagrams is not something I have entirely invented from scratch -- it is inspired by graph theory, package management and service management concepts. Disnix, for example, can visualize deployment architectures by using a similar notation.
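As a rough illustration of this notation, the sketch below generates Graphviz DOT text in which components are boxes, dependencies are arrows and feature groups are clusters with dashed borders. Graphviz is swapped in here only because it is text-based; the diagrams described in this post were not necessarily produced this way. The function `to_dot` and all names are hypothetical, and for brevity the members of a feature group are drawn inside the dashed box instead of in a separate sub diagram.

```python
def to_dot(dependencies, feature_groups):
    """Render a dependency graph as Graphviz DOT text.

    dependencies: maps a component to the set of components it depends on.
    feature_groups: maps a group name to the set of components it contains.
    """
    lines = ["digraph architecture {", "  node [shape=box];"]

    # Graphviz draws a border around subgraphs whose name starts with "cluster".
    for index, (group, members) in enumerate(sorted(feature_groups.items())):
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{group}";')
        lines.append("    style=dashed;")
        for member in sorted(members):
            lines.append(f'    "{member}";')
        lines.append("  }")

    # An arrow points from a component to the dependency it requires.
    for component in sorted(dependencies):
        for dependency in sorted(dependencies[component]):
            lines.append(f'  "{component}" -> "{dependency}";')

    lines.append("}")
    return "\n".join(lines)


# Hypothetical example data.
example_dependencies = {
    "portal": {"auth-service"},
    "auth-service": {"user-database"},
    "user-database": set(),
}
example_groups = {"Authentication": {"auth-service", "user-database"}}

print(to_dot(example_dependencies, example_groups))
```

The output can be saved to a file and rendered with the `dot` command-line tool, for example `dot -Tsvg architecture.dot -o architecture.svg`.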

To find all relevant information to create the diagrams, I consulted various sources:

  • I studied existing documents and diagrams to get a basic understanding of the system and an idea of the details I should look at.
  • I talked to a variety of people from various teams.
  • I looked inside the configuration settings of all deployment solutions used, e.g. the Amazon AWS console, Docker, CloudFoundry, Kubernetes, Nix configuration files.
  • I peeked inside the source code repositories and looked for settings that are references to other systems, such as configuration values that store URLs.
  • When in doubt, I consider the deployment configuration files and source code the "ultimate source of truth", because no matter how nice a diagram looks, it is useless if the system is implemented differently.
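Looking for references to other systems in a repository can be partially automated. The following Python sketch walks a checked-out repository and collects URL-like values from configuration files. The file suffixes, the regular expression and the function names are assumptions chosen for the example; real repositories will need their own rules, and the matches are only hints that still have to be verified by hand.

```python
import re
from pathlib import Path

# A deliberately simple pattern: http(s) URLs up to whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def find_urls(text: str) -> list[str]:
    """Return the distinct URLs found in a piece of text, sorted."""
    return sorted(set(URL_PATTERN.findall(text)))


def scan_repository(root: str, suffixes=(".json", ".yaml", ".yml", ".ini", ".toml")) -> dict[str, list[str]]:
    """Map each configuration file under root to the URLs it mentions."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            urls = find_urls(path.read_text(errors="ignore"))
            if urls:
                results[str(path)] = urls
    return results


# Hypothetical configuration snippet.
print(find_urls('AUTH_URL = "https://auth.example.org/api"'))
```

Every URL found this way is a candidate arrow in the architecture diagram, pointing from the scanned component to the system behind the URL.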

Finally, just drawing diagrams will not completely suffice when the goal is to provide clarity. I also observed that I need to document some leftover details.

Foremost, a diagram whose semantics are not explained will typically leave too many details open to interpretation, so you need to explain the notation.

Second, you need to provide additional details about the services. I typically enumerate the following properties in a table for every component:

  • The name of the component.
  • A one line description stating its purpose.
  • The type of project (e.g. a Python/Java/Go project, Docker container, AWS Lambda function, etc.). This is useful to determine the kind of deployment procedure for the component.
  • A reference to the source code repository, e.g. a Git repository. The README of the corresponding repository should provide more detailed information about the project.
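Since the documentation is written in markdown, such a table can also be generated from structured data. The sketch below is one possible way to do that in Python; the `Component` class, the `to_markdown_table` function and the example entries (including the repository locations) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """The four properties that are documented for every component."""
    name: str
    description: str
    project_type: str
    repository: str


def to_markdown_table(components):
    """Render the components as a markdown table with one row per component."""
    header = "| Name | Description | Type | Repository |"
    separator = "| --- | --- | --- | --- |"
    rows = [
        f"| {c.name} | {c.description} | {c.project_type} | {c.repository} |"
        for c in components
    ]
    return "\n".join([header, separator, *rows])


# Hypothetical entries; names and repository locations are placeholders.
components = [
    Component("auth-service", "Authenticates users", "Python project",
              "git@example.org:team/auth-service.git"),
    Component("resize-images", "Creates thumbnails", "AWS Lambda function",
              "git@example.org:team/resize-images.git"),
]

print(to_markdown_table(components))
```

Keeping the properties in one structured place makes it easier to reuse them, for example to check that every box in a diagram has a corresponding row in the table.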

Benefits


Although it is quite a bit of work to set up, having a well documented architecture provides us the following benefits:

  • More effective development. Because of the feature groups and dividing the architecture into multiple layers, general concepts and details are separated. This makes it easier for developers to focus and absorb the right detailed knowledge to change a service.
  • More consensus in the team about the structure of the system and general quality attributes, such as scalability and security.
  • Better on-boarding for new team members.

Discussion


Writing architecture documentation IMO is not rocket science, just discipline. Obviously, there are much more sophisticated tools available to organize and visualize architectures (even tools that can generate code and reverse engineer code), but this is IMO not a hard requirement to start documenting.

However, you cannot take all confusion away -- even if you have the best possible architecture documentation, people's thinking habits are shaped by the concepts they know and there will always be a slight mismatch (which is documented in academic research: 'Why Is It So Hard to Define Software Architecture?' written by Jason Baragry and Karl Reed).

Finally, architecture documentation is only a first good step to improve the quality of service-oriented systems. To make it a success, much more is needed, such as:

  • Automated (and reproducible) deployment processes.
  • More documentation (such as the APIs, end-user documentation, project documentation).
  • Automated unit, integration and acceptance testing.
  • Monitoring.
  • Measuring and improving code quality, test coverage, etc.
  • Using design patterns, architectural patterns, good programming abstractions.
  • And many more aspects.

But to do these things properly, having proper architecture documentation is an important prerequisite.

Related work


UPDATE: after publishing this blog post and giving an internal presentation at Mendix about this subject, I received a couple of questions about architectural approaches that share similarities with my approach, such as the C4 model.

The C4 model also uses a layered approach in which the top-level diagram displays the context of the system (the relation of the system with external users and systems), and deeper layers gradually reveal more details of the inner workings of the system while limiting the view to a subset of components to prevent details from obscuring the purpose.

I did not use this approach as an example reference, but my work is basically built on top of the same underlying principles that the C4 model builds on -- creating abstractions.

Creating abstractions in modeling was already popularized in the 1970s by various computer scientists, such as Edward Yourdon and Tom DeMarco, who brought concepts from structured programming to other domains, such as modeling (as explained in the paper: 'The Choice of New Software Development Methodologies for Software Development Projects' by Edward Yourdon).

One of the mental aids in structured programming is abstraction, so that it "only matters what something does, disregarding how it works". I took data flow modeling (DFD) as an example technique (that also facilitates layers of abstractions including a top-level context DFD), but I replaced the data flow notation with dependency modeling.

Furthermore, the C4 model also provides a number of diagram types for each abstraction layer with specific purposes, but in my approach the notation and purposes of each layer are left abstract.