Tuesday, October 19, 2021

Using my Commodore Amiga 500 in 2021


Due to the high number of new COVID-19 infections in my home country last summer, I had to "improvise" yet another summer holiday. As a result, I finally found the time to tinker with my old computers again after a very long time of inactivity.

As I have explained in two blog posts that I wrote over ten years ago, the first computer (a Commodore 128 bought by my parents in 1985) and the second computer (a Commodore Amiga 500 bought by my parents in 1992) that I ever used are still in my possession.

In the last few years, I have used the Commodore 128 a couple of times, but I have not touched the Commodore Amiga 500 since I wrote my blog post about it ten years ago.

It turns out that the Commodore Amiga 500 still works, but I ran into a number of problems:

  • A black and white display. I used to have a monitor, but it broke down in 1997. Since then, I have been using a Genlock device to attach the Amiga to a TV screen. Unfortunately, in 2021 the Genlock device no longer seems to work.

    The only display option I had left was to attach the Amiga to a TV with an RCA to SCART cable by using the monochrome video output. The downside is that it is only capable of displaying a black and white screen.
  • No secondary disk drive. I used to have two 3.5-inch double density disk drives: an internal disk drive (inside the case) and an external disk drive that you can attach to the disk drive port.

    The external disk drive still seems to respond when I insert a floppy disk (the LED blinks), but it no longer seems to be capable of reading any disks.
  • Bad hard drive and expansion slot problems. The expansion board (that contains the hard drive) seems to give me all kinds of problems.

    Sometimes the Amiga completely fails to detect it. On other occasions, I ran into crashes causing the file system to return write errors. Attempting to repair these errors typically results in new ones.

    After thoroughly examining the disk with DiskSalv, I learned that the drive has physical damage and needs to be replaced.

I also ran into an interesting problem from a user point of view -- exchanging data to and from my Amiga (such as software downloaded from the Internet and programs that I used to write) is quite a challenge. In late 1996, when my parents switched to the PC, I used floppy disks to exchange data.

In 2021, floppy drives have completely disappeared from all modern computers. On the rare occasion that I still need to read a floppy disk, I have an external USB floppy drive at my disposal, but it is only capable of reading high density 3.5-inch floppy disks. A Commodore Amiga's standard floppy drive (with the exception of the Amiga 4000) is only capable of reading double density disks.

Fortunately, I have discovered that there are still many things possible with old machines. I brought both my Commodore 128 and Commodore Amiga 500 to the Home Computer Museum in Helmond for repairs. Furthermore, I have ordered all kinds of replacement peripherals.

Getting it all to work turned out to be quite a challenge. Eventually, I managed to overcome all these problems and the machine works like a charm again.

In this blog post, I will describe what problems I faced and how I solved them.

Some interesting properties of the Amiga


I often receive questions from all kinds of people who want to know why it is so interesting to use such an old machine. Aside from nostalgic reasons, I think the machine is an interesting piece of computer history. When the first model, the Amiga 1000, was launched in 1985, the machine was far ahead of its time and provided unique multimedia capabilities.

Back in the late 80s, system resources (such as CPU, RAM and storage) were very limited compared to modern machines, but there were all kinds of interesting facilities to overcome these limitations.

For example, the original Amiga 500 model only had 512 KiB of RAM and 32 configurable color registers, each of which can be set to one of 4096 possible colors.

Despite only having the ability to configure a maximum of 32 distinct colors, it could still display photo-realistic images:


As can be seen, the screenshot above clearly has more than 32 distinct colors. This is made possible by using a special screen mode called Hold-and-Modify (HAM).

In HAM mode, a pixel's color is either picked from a palette of 16 base colors, or derived from the previous (adjacent) pixel by modifying one of its color components (red, green or blue). The HAM screen mode makes it possible to use all 4096 possible colors, albeit with some restrictions on adjacent color values.

Another unique selling point of the Amiga was its sound capabilities. It could mix 4 audio channels in hardware, which can easily be combined with graphics, animations and games. The Amiga has all kinds of interesting music productivity software, such as ProTracker, that I used a lot.

To make all these multimedia features possible, the Amiga has its own unique hardware architecture:


The above diagram provides a simplified view of the most important chips in the Amiga 500 and how they are connected:

  • On the left, the CPU is shown: a Motorola 68000 that runs at approximately 7 MHz (the actual clock speed differs slightly between PAL and NTSC machines). The CPU is responsible for doing calculations and executing programs.
  • On the right, the unique Amiga chips are shown. Each of them has a specific purpose:
    • Denise (Display ENabler) is responsible for producing the RGB signal for the display, provides bitplane registers for storing graphics data, and is responsible for displaying sprites.
    • Agnus (Address GeNerator UnitS) provides a blitter (that is responsible for quick transfers of data in chip memory, typically graphics data), and a copper: a programmable co-processor that is synchronized with the video beam.

      The copper makes all kinds of interesting graphical features possible, while keeping the CPU free for work. For example, the following screenshot of the game Trolls:


      clearly contains more than 32 distinct colors. For example, the rainbow-like background provides a unique color on each scanline. The copper is used in such a way that the value of the background color register is changed on each scanline, while the screen is drawn.

      The copper also makes it possible to switch between screen modes (low resolution, high resolution) on the same physical display, such as in the Workbench:


      As can be seen in the above screenshot, the upper part of the screen shows Deluxe Paint in low-res mode with its own unique set of colors, while the lower part shows the workbench in high resolution mode (with a different color palette). The copper can change the display properties while the screen is rendered, while keeping the CPU free to do work.
    • Paula is a multi-functional chip that provides sound support, such as processing sample data from memory and mixing 4 audio channels. Because it does mixing in hardware, the CPU is still free to do work.

      It also controls the disk drive, serial port, mouse and joysticks.
  • All the chips in the above diagram require access to memory. Chip RAM is memory that is shared between all chips. As a consequence, they share the same memory bus.

    A shared bus imposes speed restrictions -- on even clock cycles the CPU can access chip memory, while on the odd cycles the custom chips have memory access.

    Many Amiga programs are optimized in such a way that the CPU's memory accesses happen on even clock cycles as much as possible. When the CPU needs to access memory on odd clock cycles, it is forced to wait, losing execution speed.
  • An Amiga can also be extended with Fast RAM that does not suffer from any speed limitations. Fast RAM is on a different memory bus that can only be accessed by the CPU and not by any of the chips.

    (As a sidenote: there is also Slow RAM that is not shown in the diagram. It falls in between chip and fast RAM. Slow RAM is memory that is exclusive to the CPU, but cannot be accessed on odd clock cycles).

Compared to other computer architectures used at the same time, such as the PC, 7 MHz of CPU clock speed does not sound all that impressive, but the combination of all these autonomous chips working together is what makes many incredible multimedia properties possible.

My Amiga 500 specs



When my parents bought my Commodore Amiga 500 machine in 1992, it still had the original chipset and 512 KiB of Chip RAM. The only peripherals were an external 3.5-inch floppy drive and a Kickstart switcher allowing me to switch between Kickstart 1.3 and 2.0. (The Kickstart is the portion of the Amiga operating system that resides in ROM).

Some time later, the Agnus and Denise chips were upgraded (we moved from the Original Chip Set to the Enhanced Chip Set), extending the amount of chip RAM to 1 MiB and making it possible to use super high resolution screen modes.

At some point, we bought a KCS PowerPC board, making it possible to emulate a PC and run MS-DOS applications. Although the product calls itself an emulator, it also provides a board that extends the hardware with a number of interesting features:

  • A 10 MHz NEC V30 CPU that is pin and instruction-compatible with an Intel 8086/8088 CPU. Moreover, it implements some 80186 instructions, some of its own instructions, and is between 10-30% faster.
  • 1 MiB of RAM that can be used by the NEC V30 CPU for conventional and upper memory. In addition, the board's memory can also be used by the Amiga as additional chip RAM, fast RAM and as a RAM disk.
  • A battery-powered clock, so that you do not have to reconfigure the date and time on startup. This PC clock can also be used in Amiga mode.

Eventually, we also obtained a hard drive. The Amiga 500 does not include a hard drive, nor does it have an internal hard drive connector.

Nonetheless, it can be extended through the Zorro expansion slot with an extension board. We obtained this extension board: the MacroSystem evolution, providing a SCSI connector, a whopping 8 MiB of fast RAM and an additional floppy drive connector. To the SCSI connector, a 120 MiB Maxtor 7120SR hard drive was attached.

Installing new and replacement peripherals


In this section, I will describe my replacement peripherals and what I did to make them work.

RGB to SCART cable


As explained in the introduction, I no longer have a monitor and the Genlock device is broken, leaving me with nothing but a black and white display.

Fortunately, all kinds of replacement options seem to be available to connect an Amiga to a more modern display.

I have ordered an RGB to SCART cable. It can be attached to the RGB and audio output of the Amiga and to the SCART input on my LCD TV.

GoTek floppy emulator


Another problem is that the secondary floppy drive is broken and could not be repaired.

Even if I could find a suitable replacement drive, floppy disks are very difficult media to use for data exchange these days.

Even with an old PC that still has an internal floppy drive (capable of reading both high and double density floppy disks), exchanging information remains difficult -- due to limitations of the PC floppy controller, a PC is incapable of reading Amiga disks, whereas an Amiga can read and write PC floppy disks. Moreover, a PC formatted double density floppy disk has less storage capacity than an Amiga formatted disk (720 KiB versus 880 KiB).

There is also an interesting alternative to a real floppy drive: the GoTek floppy emulator.

The GoTek floppy emulator works with disk image files stored on a USB memory stick. The numeric digit on the display indicates which disk image is currently inserted into the drive. With the rotating switch you can switch between disk images. It operates at the same speed as a real disk drive and produces similar sounds.

Booting from floppy disk 0 starts a program that allows you to configure disk images for the remaining numeric entries:


The GoTek floppy emulator can act both as a replacement for the internal floppy drive as well as an external floppy drive and uses the same connectors.

I have decided to buy an external model, because the internal floppy drive still works and I want to keep the machine as close to the original as possible. I can turn the GoTek floppy drive into the primary disk drive by using the DF0 switch on the right side of the Amiga case.

Because all disk images are stored on a FAT-formatted USB stick, exchanging information with a PC becomes much easier. I can transfer the same disk image files that I use in an Amiga emulator to the USB memory stick on my PC and then natively use them on a real Amiga.
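
To give an impression: assuming the USB stick shows up as /dev/sdc1 on my Linux PC (the device node and the disk image file name below are just illustrations), copying a disk image to the stick boils down to something like:

$ sudo mount /dev/sdc1 /mnt
$ cp workbench21.adf /mnt/
$ sudo umount /mnt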

SCSI2SD


As explained earlier, the 29-year-old SCSI hard drive connected to the expansion board is showing all kinds of age-related problems. Although I could search for a compatible second-hand hard drive built in the same era, it is probably not going to last very long either.

Fortunately, for retro-computing purposes, an interesting replacement device has been developed: the SCSI2SD, which can be used as a drop-in replacement for a SCSI hard drive and other kinds of SCSI devices.

This device can be attached to the same SCSI and power connector cables that the old hard drive uses. As the name implies, its major difference is that it uses a (modern) SD card for storage.


The left picture (shown above) shows the interior of the MacroSystem evolution board's case with the original Maxtor hard drive attached. On the right, I have replaced the hard drive with a SCSI2SD board (that uses a 16 GiB SD-card for storage).

Another nice property of the SCSI2SD is that an SD card offers much more storage capacity. The smallest SD card that I could buy offers 16 GiB of storage, which is substantially more than the 120 MiB that the old Maxtor hard drive from 1992 used to offer.

Unfortunately, the designers of the original Amiga operating system did not foresee that people would use devices with so much storage capacity. From a technical point of view, AmigaOS versions 3.1 and older are incapable of addressing more than 4 GiB of storage per device.

In addition to the operating system's storage addressing limit, I discovered that there is another limit -- the SCSI controller on the MacroSystem evolution extension board is unable to address more than 1 GiB of storage space per SCSI device. Trying to format a partition beyond this 1 GiB boundary results in a "DOS disk not found" error. This limit does not seem to be documented anywhere in the MacroSystem evolution manual.

To cope with these limitations, the SCSI2SD device can be configured in such a way that it stays within the boundaries of the operating system. To do this, it needs to be connected to a PC with a micro USB cable and configured with the scsi2sd-util tool.

After many rounds of trial and error, I ended up using the following settings:

  • Enable SCSI terminator (V5.1 only): on
  • SCSI Host Speed: Normal
  • Startup Delay (seconds): 0
  • SCSI Selection Delay: 255
  • Enable Parity: on
  • Enable Unit Attention: off
  • Enable SCSI2 Mode: on
  • Disable glitch filter: off
  • Enable disk cache (experimental): off
  • Enable SCSI Disconnect: off
  • Respond to short SCSI selection pulses: on
  • Map LUNS to SCSI IDs: off

Furthermore, the SCSI2SD allows you to configure multiple SCSI devices and put restrictions on how much storage from the SD card can be used per device.

I have configured one SCSI device (representing a 1 GiB hard drive) with the following settings:

  • Enable SCSI Target: on
  • SCSI ID: 0
  • Device Type: Hard Drive
  • Quirks Mode: None
  • SD card start sector: 0
  • Sector size (bytes): 512
  • Sector count: leave it alone
  • Device size: 1 GB

I left the Vendor, ProductID, Revision and Serial Number values untouched. The Sector count is derived automatically from the start sector and device size.

Before using the SD card, I recommend erasing it first. Strictly speaking, this is not required, but I have learned in a very painful way that DiskSalv, a tool that is frequently used to fix corrupted Amiga file systems, may get confused if there are traces of a previous file system left behind. As a result, it may incorrectly treat files as invalid file references, causing further corruption.

On Linux, I can clear the memory of the SD card with the following command (/dev/sdb refers to the device file of my SD-card reader):

$ dd if=/dev/zero of=/dev/sdb bs=1M status=progress

After clearing the SD card, I can insert it into the SCSI2SD device, do the partitioning and perform the installation of the Workbench. This process turned out to be trickier than I thought -- the MacroSystem evolution board only seems to include a manual in German, requiring me to brush up my German reading skills.

The first step is to use the HDToolBox tool (included with the Amiga Workbench 2.1 installation disk) to detect the hard disk.

(As a sidenote: check whether the SCSI cable is properly attached to both the SCSI2SD device and the extension board. In my first attempt, the firmware was able to detect that there was a SCSI device with LUN 0, but it could not detect that it was a hard drive. After many rounds of trial and error, I discovered that the SCSI cable was not properly attached to the extension board!).

By default, HDToolBox works with the standard SCSI driver bundled with the Amiga operating system (scsi.device) which is not compatible with the SCSI controller on the MacroSystem Evolution board.

To use the correct driver, I had to configure HDToolBox to use a different driver, by opening a shell session and running the following command-line instructions:

Install2.1:HDTools
HDToolBox evolution.device

In the above code fragment, I pass the driver name: evolution.device as a command-line parameter to HDToolBox.

With the above configuration setting, the SCSI2SD device gets detected by HDToolBox:


I did the partitioning of my SD-card hard drive as follows:


Partition Device Name    Capacity    Bootable
DH0                      100 MiB     yes
KCS                      100 MiB     no
DH1                      400 MiB     no
DH2                      400 MiB     no

I did not change any advanced file system settings. I have configured all partitions to use mask: 0xfffffe and max transfer: 0xffffff.

Beyond creating partitions, there was another tricky configuration aspect I had to take into account -- I had to reserve the second partition (the KCS partition) as a hard drive for the KCS PowerPC emulator.

In my first partitioning attempt, I configured the KCS partition as the last partition, but that seems to cause problems when I start the KCS PowerPC emulator, typically resulting in a very slow startup followed by a system crash.

It appears that this problem is caused by a memory addressing issue. Putting the KCS partition under the 200 MiB boundary seems to fix the problem. Since most addressing boundaries are powers of 2, my guess is that the KCS PowerPC emulator expects a hard drive partition to reside below the 256 MiB limit.

After creating the partitions and rebooting the machine, I can format them. For some unknown reason, a regular format does not seem to work, so I ended up doing a quick format instead.
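
A quick format can be done from the Workbench menu, or from a shell with something along these lines (the partition and volume name are just examples):

> Format DRIVE DH0: NAME Workbench QUICK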

Finally, I can install the Workbench on the DH0: partition by running the Workbench installer (that resides in the Install2.1 folder on the installation disk):


Null modem cable


The GoTek floppy drive and SCSI2SD already make it much easier to exchange data with my Amiga, but they are still somewhat impractical for exchanging small files, such as ProTracker modules or software packages (in LhA format) downloaded from Aminet.

I have also bought a good old-fashioned null modem cable that can be used to link two computers through their serial ports. Modern computers no longer have an RS-232 serial port, but you can still use a USB to RS-232 converter that indirectly makes it possible to link up over a USB connection.
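
On the Linux side, such a converter typically shows up as a serial device named /dev/ttyUSB0 (or similar). Whether it has been detected can be checked with, for example:

$ dmesg | grep ttyUSB
$ ls -l /dev/ttyUSB*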

To link up, the serial port settings on both ends need to be the same and the baud rate should not be too high. I have configured the following settings on my Amiga (with the SYS:Prefs/Serial preferences program):

  • Baud rate: 19,200
  • Input buffer size: 512
  • Handshaking: RTS/CTS
  • Parity: None
  • Bits/Char: 8
  • Stop Bits: 1

With a terminal client, such as NComm, I can make a terminal connection to my Linux machine. By installing lrzsz on my Linux machine, I can exchange files by using the Zmodem protocol.
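
lrzsz provides the rz and sz commands. It can be installed with the distribution's package manager -- for example, on a Debian-based system, or from Nixpkgs:

$ sudo apt-get install lrzsz
$ nix-env -iA nixpkgs.lrzsz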

There are a variety of ways to link my Amiga with a Linux PC. A quick and easy way to exchange files, is by starting picocom on the Linux machine with the following parameters:

$ picocom --baud 19200 \
  --flow h \
  --parity n \
  --databits 8 \
  --stopbits 1 \
  /dev/ttyUSB0

After starting Picocom, I can download files from my Linux PC by selecting: Transfer -> Download in the NComm menu. This action opens a file dialog on my Linux machine that allows me to pick the files that I want to download.

Similarly, I can upload files to my Linux machine by selecting Transfer -> Upload. On my Linux machine, a file dialog appears that allows me to pick the target directory where the uploaded files need to be stored.

In addition to simple file exchange, I can also expose a Linux terminal over a serial port and use my Amiga to remotely provide command-line instructions:

$ agetty --flow-control ttyUSB0 19200


To keep the terminal screen formatted nicely (e.g. a fixed number of rows and columns) I should run the following command in the terminal session:

stty rows 48 cols 80

By using NComm's upload function, I can transfer files to the current working directory.
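
If a transfer does not start automatically, the Zmodem receiver can also be started by hand by running rz (also provided by the lrzsz package) in the terminal session before starting the upload in NComm:

$ rz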

Downloading a file from my Linux PC can be done by running the sz command:

$ sz mod.cool

The above command allows me to download the ProTracker module file: mod.cool from the current working directory.

It is also possible to remotely administer an Amiga machine from my Linux machine. Running the following command starts a shell session exposed over the serial port:

> NewShell AUX:

With a terminal client on my Linux machine, such as Minicom, I can run Amiga shell instructions remotely:

$ minicom -b 19200 -D /dev/ttyUSB0

showing me the following output:


Usage


All these new hardware peripherals open up all kinds of new interesting possibilities.

Using the SD card in FS-UAE


For example, I can detach the SD card from the SCSI2SD device, put it in my PC, and then use the hard drive in the emulator (both FS-UAE and WinUAE seem to work).

By giving the card reader's device file public permissions:

$ chmod 666 /dev/sdb

FS-UAE, which runs as an ordinary user, should be able to access it. By configuring a hard drive that refers to the device file:

hard_drive_0 = /dev/sdb

we have configured FS-UAE to use the SD card as a virtual hard drive (allowing me to use the exact same installation):


An advantage of using the SD card in the emulator is that we can perform installations of software packages much faster. I can temporarily boost the emulator's execution and disk drive speed, saving me quite a bit of installation time.
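
These emulator settings can be kept in an FS-UAE configuration file (the file name below is just an example), so that the emulated machine with the SD card attached can be started in one go:

$ fs-uae amiga500-scsi2sd.fs-uae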

I can also more conveniently transfer large files from my host system to the SD card. For example, I can create a temp folder and expose it in FS-UAE as a secondary virtual hard drive:

hard_drive_1 = /home/sander/temp
hard_drive_1_label = temp

and then copy all files from the temp: drive to the SD card:
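
From an Amiga shell, this roughly boils down to a copy command such as the following (assuming DH1: is the target partition on the SD card):

> Copy temp: DH1: ALL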


Using the KCS PowerPC board with the new peripherals


The GoTek floppy emulator and the SCSI2SD device can also be used in the KCS PowerPC board emulator.

In addition to Amiga floppy disks, the GoTek floppy emulator can also be used for emulating double density PC disks. The only inconvenience is that it is impossible to format an empty PC disk on the Amiga with CrossDOS.

However, on my Linux machine, it is possible to create an empty 720 KiB disk image, format it as a DOS disk, and put the image file on the USB stick:

$ dd if=/dev/zero of=./mypcdisk.img bs=1k count=720
$ mkdosfs -n mydisk ./mypcdisk.img

The KCS PowerPC emulator also makes it possible to use the Amiga's serial and parallel ports. As a result, I can also transfer files from my Linux PC by using a PC terminal client, such as Telix:


To connect to my Linux PC, I am using almost the same serial port settings as in the Workbench preferences. The only limitation is that I need to lower my baud rate -- it seems that Telix no longer works reliably for baud rates higher than 9600 bits per second.

The KCS PowerPC board is a very capable PC emulator. Some PC aspects are handled by real hardware, so that there is no speed loss -- the board provides a real 8086/8088 compatible CPU and 1 MiB of memory.

It also provides its own implementation of a system BIOS and VGA BIOS. As a result, text-mode DOS applications work as well as their native XT-PC counterparts, sometimes even slightly better.

One particular aspect that is fully emulated in software is CGA/EGA/VGA graphics. As I have explained in a blog post written several years ago, the Amiga uses planar (bitplane) encoding for graphics whereas PC hardware uses chunky graphics. To allow PC graphics to be displayed, the chunky data needs to be translated into the planar format, making graphics rendering very slow.

For example, it is possible to run Microsoft Windows 3.0 (in real mode) in the emulator, but the graphics are rendered very very slowly:


Interestingly enough, the game: Commander Keen seems to work at an acceptable speed:


I think Commander Keen runs so fast in the emulator (despite its slow graphics emulation), because of the adaptive tile refresh technique (updating the screen by only redrawing the necessary parts).

File reading problems and crashes


Although all these replacement peripherals, such as the SCSI2SD, are nice, I also ran into a very annoying recurring problem.

I have noticed that after using the SCSI2SD for a while, sometimes a file may get incorrectly read.

Incorrectly read files lead to all kinds of interesting problems. For example, unpacking an LhA or Zip archive from the hard drive may sometimes result in one or more CRC errors. I have also noticed subtle screen and audio glitches while playing games stored on the SD card.

A really annoying problem occurs when an executable is read incorrectly -- this typically results in program crashes with error codes 8000 0003 or 8000 0004. The former error is caused by executing a wrong CPU instruction.

These read errors do not seem to happen all the time. For example, re-reading a file that was previously read incorrectly may actually succeed, so it appears that files are correctly written to disk.

After some investigation and comparing my SD card configuration with the old SCSI hard drive, I have noticed that the read speeds were a bit poor. SysInfo shows me a read speed of roughly 698 KiB per second:


By studying the MacroSystem Evolution manual (in German) and comparing the configuration with the Workbench installation on the old hard drive, I discovered that there is a burst mode option that can boost read performance.

To enable burst mode, I need to copy the Evolution utilities from the MacroSystem evolution driver disk to my hard drive (e.g. by copying DF0:Evolution3 to DH0:Programs/Evolution3) and add the following command-line instruction to S:User-Startup:

DH0:Programs/Evolution3/Utilities/HDParms 0 NOCHANGE NOFORMAT NOCACHE BURST

This results in read speeds that are roughly 30% faster:


Unfortunately, faster read speeds also seem to dramatically increase the likelihood of read errors, making my system quite unreliable.

I am still not completely sure what is causing these incorrect reads, but from my experiments I know that read speeds definitely have something to do with it. Restoring the configuration to no longer use burst mode (and accept slower reads) seems to make my system much more stable.

I also learned that these read problems are very similar to problems reported about a wrong MaxTransfer value. According to this page, setting it to 0x1fe00 should be a safe value. I tried adjusting the MaxTransfer value, but it does not seem to change anything.

Although my system seems to be stable enough after making these modifications, I would still like to expand my knowledge about this subject so that I can fully explain what is going on.

Conclusion



It took me several months to figure out all these details, but with my replacement peripherals, my Commodore Amiga 500 works great again. The machine is more than 29 years old and I can still run all applications and games that I used to work with in the mid 1990s and more. Furthermore, data exchange with my Linux PC has become much easier.

Back in the early 90s, I did not have the luxury of downloading software and information from the Internet.

I also learned many new things about terminal connections. It seems that Linux (because of its UNIX heritage) has all kinds of nice facilities to expose itself as a terminal server.

After visiting the home computer museum, I became more motivated to preserve my Amiga 500 in the best possible way. It seems that as of today, there are still replacement parts for sale and many things can be repaired.

My recommendation is that if you still own a classic machine, do not just throw it away. You may regret it later.

Future work


Aside from finding a proper explanation for the file reading problems, I am still searching for a real replacement floppy drive. Moreover, I still need to investigate whether the Genlock device can be repaired.

Tuesday, August 31, 2021

A more elaborate approach for bypassing NPM's dependency management features in Nix builds

Nix is a general purpose package manager that can be used to automate the deployments of a variety of systems -- it can deploy components written in a variety of programming languages (e.g. C, C++, Java, Go, Rust, Perl, Python, JavaScript) using various kinds of technologies and frameworks, such as Django, Android, and Node.js.

Another unique selling point of Nix is that it provides strong reproducibility guarantees. If a build succeeds on one machine, then performing the same build on another should result in a build that is (nearly) bit-identical.

Nix improves build reproducibility by complementing build processes with features, such as:

  • Storing all artifacts in isolation in a so-called Nix store: /nix/store (e.g. packages, configuration files), in which every path is unique by prefixing it with an SHA256 hash code derived from all build inputs (e.g. dependencies, build scripts etc.). Isolated paths make it possible for multiple variants and versions of the same packages to safely co-exist.
  • Clearing environment variables or setting them to dummy values. In combination with unique and isolated Nix store paths, search environment variables must be configured in such a way that the build script can find its dependencies in the Nix store, or the build will fail.

    Having to specify all search environment variables may sound inconvenient, but it prevents undeclared dependencies from accidentally making a build succeed -- deployment of such a package is very likely to fail on a machine that misses an unknown dependency.
  • Running builds as an unprivileged user that does not have any rights to make modifications to the host system -- a build can only write in its designated temp folder or output paths.
  • Optionally running builds in a chroot environment, so that a build cannot possibly find any undeclared host system dependencies through hard-coded absolute paths.
  • Restricting network access to prevent a build from obtaining unknown dependencies that may influence the build outcome.

For many build tools, the Nixpkgs repository provides abstraction functions that allow you to easily construct a package from source code (e.g. GNU Make, GNU Autotools, Apache Ant, Perl's MakeMaker, SCons etc.).

However, certain tools are difficult to use in combination with Nix -- for example, NPM that is used to deploy Node.js projects.

NPM is both a dependency and build manager and the former aspect conflicts with Nix -- builds in Nix are typically prevented from downloading files from remote network locations, with the exception of so-called fixed-output derivations in which the output hash is known in advance.

If network connections would be allowed in regular builds, then Nix can no longer ensure that a build is reproducible (i.e. that the hash code in the Nix store path reflects the same build output derived from all inputs).

To cope with the conflicting dependency management features of NPM, various kinds of integrations have been developed. npm2nix was the first, and several years ago I started node2nix to provide a solution that aims for accuracy.

Basically, the build process of an NPM package in Nix boils down to performing the following steps in a Nix derivation:

# populate the node_modules/ folder
npm install --offline

We must first obtain the required dependencies of a project through the Nix package manager and install them in the correct locations in the node_modules/ directory tree.

Finally, we should run NPM in offline mode forcing it not to re-obtain or re-install any dependencies, but still perform build management tasks, such as running build scripts.

From a high-level point of view, this principle may look simple, but in practice it is not:

  • With earlier versions of NPM, we were forced to imitate its dependency resolution algorithm. At first sight, it looked simple, but getting it right (such as coping with circular dependencies and dependency de-duplication) is much more difficult than expected.
  • NPM 5.x introduced lock files. For NPM development projects, they provide exact version specifiers of all dependencies and transitive dependencies, making it much easier to know which dependencies need to be installed.

    Unfortunately, NPM also introduced an offline cache, that prevents us from simply copying packages into the node_modules/ tree. As a result, we need to make additional complex modifications to the package.json configuration files of all dependencies.

    Furthermore, end user package installations do not work with lock files, requiring us to still keep our custom implementation of the dependency resolution algorithm.
  • NPM's behaviour with dependencies on directories on the local file system has changed. In old versions of NPM, such dependencies were copied, but in newer versions, they are symlinked. Furthermore, each directory dependency maintains its own node_modules/ directory for transitive dependencies.

Because we need to take many kinds of installation scenarios into account and work around the directory dependency challenges, the implementation of the build environment: node-env.nix in node2nix has become very complicated.

It has become so complicated that I consider it a major impediment in making any significant changes to the build environment.

In the last few weeks, I have been working on a companion tool named: placebo-npm that should simplify the installation process. Moreover, it should also fix a number of frequently reported issues.

In this blog post, I will explain how the tool works.

Lock-driven deployments


In NPM 5.x, package-lock.json files were introduced. The fact that they capture the exact versions of all dependencies and make all transitive dependencies known makes certain aspects of an NPM deployment in a Nix build environment easier.

For lock-driven projects, we no longer have to run our own implementation of the dependency resolution algorithm to figure out what the exact versions of all dependencies and transitive dependencies are.

For example, a project with the following package.json:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "dependencies": {
    "underscore": "*",
    "prom2cb": "github:svanderburg/prom2cb",
    "async": "https://mylocalserver/async-3.2.1.tgz"
  }
}

may have the following package-lock.json file:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "lockfileVersion": 1,
  "requires": true,
  "dependencies": {
    "async": {
      "version": "https://mylocalserver/async-3.2.1.tgz",
      "integrity": "sha512-XdD5lRO/87udXCMC9meWdYiR+Nq6ZjUfXidViUZGu2F1MO4T3XwZ1et0hb2++BgLfhyJwy44BGB/yx80ABx8hg=="
    },
    "prom2cb": {
      "version": "github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34",
      "from": "github:svanderburg/prom2cb"
    },
    "underscore": {
      "version": "1.13.1",
      "resolved": "https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz",
      "integrity": "sha512-hzSoAVtJF+3ZtiFX0VgfFPHEDRm7Y/QPjGyNo4TVdnDTdft3tr8hEkD25a1jC+TjTuE7tkHGKkhwCgs9dgBB2g=="
    }
  }
}

As you may notice, the package.json file declares three dependencies:

  • The first dependency is underscore that refers to the latest version in the NPM registry. In the package-lock.json file, the dependency is frozen to version 1.13.1. The resolved property provides the URL where the tarball should be obtained from. Its integrity can be verified with the given SHA512 hash.
  • The second dependency: prom2cb refers to the latest revision of the main branch of the prom2cb Git repository on GitHub. In the package-lock.json file, it is pinpointed to the fab277... revision.
  • The third dependency: async refers to a tarball that is downloaded from an arbitrary HTTP URL. The package-lock.json records its SHA512 integrity hash to make sure that we can only deploy with the version that we have used previously.

As explained earlier, to ensure purity, in a Nix build environment, we cannot allow NPM to obtain the required dependencies of a project. Instead, we must let Nix obtain all the dependencies.

When all dependencies have been obtained, we should populate the node_modules/ folder of the project. In the above example, it is simply a matter of unpacking the tarballs or copying the Git clones into the node_modules/ folder of the project. No transitive dependencies need to be deployed.
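
For the example above, populating the node_modules/ folder by hand would roughly look as follows (a sketch, assuming that each tarball unpacks into the usual package/ sub directory):

$ mkdir -p node_modules
$ tar xfz underscore-1.13.1.tgz && mv package node_modules/underscore
$ tar xfz async-3.2.1.tgz && mv package node_modules/async
$ cp -r prom2cb node_modules/prom2cb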

For projects that do not rely on build scripts (performing tasks such as linting or compiling code, e.g. TypeScript), this typically suffices to make a project work.

However, when we also need build management, we need to run the full installation process:

$ npm install --offline

npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/async/-/async-3.2.1.tgz failed: cache mode is 'only-if-cached' but no cached response available.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/sander/.npm/_logs/2021-08-29T12_56_13_978Z-debug.log

Unfortunately, NPM still tries to obtain the dependencies, despite the fact that they have already been copied into the right locations in the node_modules/ folder.

Bypassing the offline cache


To cope with the problem that manually obtained dependencies cannot be detected, my initial idea was to use the NPM offline cache in a specific way.

The offline cache claims to be content-addressable, meaning that every item can be looked up by using a hash code that represents its contents, regardless of its origins. Unfortunately, it turns out that this property cannot be fully exploited.

For example, when we obtain the underscore tarball (with the exact same contents) from a different URL:

$ npm cache add http://mylocalcache/underscore-1.13.1.tgz

and run the installation in offline mode:

$ npm install --offline
npm ERR! code ENOTCACHED
npm ERR! request to https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz failed: cache mode is 'only-if-cached' but no cached response available.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/sander/.npm/_logs/2021-08-26T13_50_15_137Z-debug.log

The installation still fails, despite the fact that we already have a tarball (with the exact same SHA512 hash) in our cache.

However, downloading underscore from its original location (the NPM registry):

$ npm cache add underscore@1.13.1

makes the installation succeed.

The reason why downloading the same tarball from an arbitrary HTTP URL does not work is that NPM only computes a SHA1 hash for it, whereas obtaining a tarball from the NPM registry causes NPM to compute a SHA512 hash. Because the tarball was downloaded from a different source, NPM fails to match it against the SHA512 hash in the package-lock.json file.

We also run into similar issues when we obtain an old package from the NPM registry that only has an SHA1 hash. Importing the same file from a local file path causes NPM to compute a SHA512 hash. As a result, npm install tries to re-obtain the same tarball from the remote location, because the hash was not recognized.
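
The integrity values in a package-lock.json file are base64 encodings of the raw hash, prefixed with the hash algorithm. For example, the SHA512 part of a tarball's integrity string can be reproduced on Linux as follows:

$ openssl dgst -sha512 -binary underscore-1.13.1.tgz | base64 -w0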

To cope with these problems, placebo-npm will completely bypass the cache. After all dependencies have been copied to the node_modules folder, it modifies their package.json configuration files with hidden metadata properties to trick NPM that they came from their original locations.

For example, to make the underscore dependency work (that is normally obtained from the NPM registry), we must add the following properties to the package.json file:

{
  ...
  "_from": "underscore@https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz",
  "_integrity": "sha512-hzSoAVtJF+3ZtiFX0VgfFPHEDRm7Y/QPjGyNo4TVdnDTdft3tr8hEkD25a1jC+TjTuE7tkHGKkhwCgs9dgBB2g==",
  "_resolved": "https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz"
}

For prom2cb (that is a Git dependency), we should add:

{
  ...
  "_from": "github:svanderburg/prom2cb",
  "_integrity": "",
  "_resolved": "github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34"
}

and for HTTP/HTTPS dependencies and local files we should do something similar (adding _from and _integrity fields).

With these modifications, NPM will no longer attempt to consult the local cache, making the dependency installation step succeed.

Handling directory dependencies


Another challenge is dependencies on local directories, which are frequently used in local development projects:

{
  "name": "simpleproject",
  "version": "0.0.1",
  "dependencies": {
    "underscore": "*",
    "prom2cb": "github:svanderburg/prom2cb",
    "async": "https://mylocalserver/async-3.2.1.tgz",
    "mydep": "../../mydep",
  }
}

In the package.json file shown above, a new dependency has been added: mydep that refers to a relative local directory dependency: ../../mydep.

If we run npm install, then NPM creates a symlink to the folder in the project's node_modules/ folder and installs the transitive dependencies in the node_modules/ folder of the target dependency.

If we want to deploy the same project to a different machine, then it is required to put mydep in the exact same relative location, or the deployment will fail.

Deploying such an NPM project with Nix introduces a new problem -- all packages deployed by Nix are stored in the Nix store (typically /nix/store). After deploying the project, the relative path to the project (from the Nix store) will no longer be correct. Moreover, we also want Nix to automatically deploy the directory dependency as part of the deployment of the entire project.

To cope with these inconveniences, we are required to implement a tricky solution -- we must rewrite directory dependencies in such a way that they refer to a folder that is automatically deployed by Nix. Furthermore, the dependency should still end up being a symlink to satisfy NPM -- copying directory dependencies into the node_modules/ folder is not accepted by NPM.

Usage


To conveniently install NPM dependencies from a local source (and satisfy npm in such a way that it believes the dependencies came from their original locations), I have created a tool called: placebo-npm.

We can, for example, obtain all required dependencies ourselves and put them in a local cache folder:

$ mkdir /home/sander/mycache
$ wget https://mylocalserver/async-3.2.1.tgz
$ wget https://registry.npmjs.org/underscore/-/underscore-1.13.1.tgz
$ git clone https://github.com/svanderburg/prom2cb

The deployment process that placebo-npm executes is driven by a package-placebo.json configuration file that has the following structure:

{
   "integrityHashToFile": {
     "sha512-hzSoAVtJF+3ZtiFX0VgfFPHEDRm7Y/QPjGyNo4TVdnDTdft3tr8hEkD25a1jC+TjTuE7tkHGKkhwCgs9dgBB2g==": "/home/sander/mycache/underscore-1.13.1.tgz",
     "sha512-XdD5lRO/87udXCMC9meWdYiR+Nq6ZjUfXidViUZGu2F1MO4T3XwZ1et0hb2++BgLfhyJwy44BGB/yx80ABx8hg==": "/home/sander/mycache/async-3.2.1.tgz"
   },
   "versionToFile": {
     "github:svanderburg/prom2cb#fab277adce1af3bc685f06fa1e43d889362a0e34": "/home/sander/mycache/prom2cb"
   },
   "versionToDirectoryCopyLink": {
     "file:../dep": "/home/sander/alternatedir/dep"
   }
}

The placebo config maps dependencies in a package-lock.json file to local file references:

  • integrityHashToFile maps dependencies with an integrity hash to local files, which is useful for HTTP/HTTPS dependencies, registry dependencies, and local file dependencies.
  • versionToFile: maps dependencies with a version property to local directories. This is useful for Git dependencies.
  • versionToDirectoryCopyLink: specifies directories that need to be copied into a shadow directory named: placebo_node_dirs and creates symlinks to the shadow directories in the node_modules/ folder. This is useful for installing directory dependencies from arbitrary locations.

With the following command, we can install all required dependencies from the local cache directory and make all necessary modifications to let NPM accept the dependencies:

$ placebo-npm package-placebo.json

Finally, we can run:

$ npm install --offline

The above command does not attempt to re-obtain or re-install the dependencies, but still performs all required build management tasks.

Integration with Nix


All the functionality that placebo-npm provides has already been implemented in the node-env.nix module, but over the years it has evolved into a very complex beast -- it is implemented as a series of Nix functions that generates shell code.

As a consequence, it suffers from recursion problems and makes it extremely difficult to tweak/adjust build processes, such as modifying environment variables or injecting arbitrary build steps to work around Nix integration problems.

With placebo-npm we can reduce the Nix expression that builds projects (buildNPMProject) to an implementation that roughly has the following structure:

{stdenv, nodejs, placebo-npm}:
{pname, packagePlacebo, buildInputs ? [], extraArgs ? {}}:

stdenv.mkDerivation ({
  pname = builtins.replaceStrings [ "@" "/" ] [ "_at_" "_slash_" ] pname; # Escape characters that aren't allowed in a store path

  placeboJSON = builtins.toJSON packagePlacebo;
  passAsFile = [ "placeboJSON" ];

  buildInputs = [ nodejs placebo-npm ] ++ buildInputs;

  buildPhase = ''
    runHook preBuild
    true
    runHook postBuild
  '';
  installPhase = ''
    runHook preInstall

    mkdir -p $out/lib/node_modules/${pname}
    mv * $out/lib/node_modules/${pname}
    cd $out/lib/node_modules/${pname}

    placebo-npm --placebo $placeboJSONPath
    npm install --offline

    runHook postInstall
  '';
} // extraArgs)

As may be observed, the implementation is much more compact and fits easily on one screen. The function accepts a packagePlacebo attribute set as a parameter (that gets translated into a JSON file by the Nix package manager).

Aside from some simple housekeeping work, most of the complex work has been delegated to executing placebo-npm inside the build environment, before we run npm install.

The function above is also tweakable -- it is possible to inject arbitrary environment variables and adjust the build process through build hooks (e.g. preInstall and postInstall).

Another bonus feature of delegating all dependency installation functionality to the placebo-npm tool is that we can also use this tool as a build input for other kinds of projects -- we can use it in the construction process of systems that are built from monolithic repositories, in which NPM is invoked from the build process of the encapsulating project.

The only requirement is to run placebo-npm before npm install is invoked.

Other use cases


In addition to using placebo-npm as a companion tool for node2nix and setting up a simple local cache, it can also be useful to facilitate offline installations from external media, such as USB flash drives.

Discussion


With placebo-npm we can considerably simplify the implementation of node-env.nix (part of node2nix) making it much easier to maintain. I consider the node-env.nix module the second most complicated aspect of node2nix.

As a side effect, it has also become quite easy to provide tweakable build environments -- this should solve a large number of reported issues. Many reported issues are caused by the fact that it is difficult or sometimes impossible to make changes to a project so that it will cleanly deploy.

Moreover, placebo-npm can also be used as a build input for projects built from monolithic repositories, in which a subset needs to be deployed by NPM.

The integration of the new node-env.nix implementation into node2nix is not completely done yet. I have reworked it, but the part that generates the package-placebo.json file and lets Nix obtain all required dependencies is still a work-in-progress.

I am experimenting with two implementations: a static approach that generates Nix expressions and a dynamic implementation that directly consumes a package-lock.json file in the Nix expression language. Both approaches have pros and cons. As a result, node2nix needs to combine both of them into a hybrid approach.

In a next blog post, I will explain more about them.

Availability


The initial version of placebo-npm can be obtained from my GitHub page.

Tuesday, June 1, 2021

An unconventional method for creating backups and exchanging files


I have written many blog posts about software deployment and configuration management. For example, a couple of years ago, I have discussed a very basic configuration management process for small organizations, in which I explained that one of the worst things that could happen is that a machine breaks down and everything that it provides gets lost.

Fortunately, good configuration management practices and deployment tools (such as Nix) can help you to restore a machine's configuration with relative ease.

Another problem is managing a machine's data, which in many ways is even more important and complicated -- software packages can be typically obtained from a variety of sources, but data is typically unique (and therefore more valuable).

Even if a machine stays operational, the data that it stores can still be at risk -- it may get deleted by accident, or corrupted (for example, by the user, or a hardware problem).

It also does not matter whether a machine is used for business (for example, storing data for information systems) or personal use (for example, documents, pictures, and audio files). In both cases, data is valuable, and as a result, needs to be protected from loss and corruption.

In addition to recovery, the availability of data is often also very important -- many users (including me) typically own multiple devices (e.g. a desktop PC, laptop and phone) and typically want access to the same data from multiple places.

Because of the importance of data, I sometimes get questions from non-technical users that want to know how I manage my personal data (such as documents, images and audio files) and what tools I would recommend.

Similar to most computer users, I too have faced my own share of reliability problems -- of all the desktop computers I owned, I ended up with a completely broken hard drive three times, and a completely broken laptop once. Furthermore, I have also worked with all kinds of external media (e.g. floppy disks, CD-ROMs etc.) each having their own share of reliability problems.

To cope with data availability and loss, I came up with a custom script that I have been conveniently using to create backups and synchronize my data between the machines that I use.

In this blog post, I will explain how this script works.

About storage media


To cope with the potential loss of data, I have always made it a habit to transfer data to external media. I have worked with a variety of them, each having their advantages and disadvantages:

  • In the old days, I used floppy disks. Most people who are in their early twenties or younger (at the time of reading this blog post) probably have no clue what I am talking about (for those people, perhaps the 'Save icon' used in many desktop applications looks familiar).

    Roughly 25 years ago, floppy disks were a common means to exchange data between computers.

    Although they were common, they had many drawbacks. Probably the biggest drawback was their limited storage capacity -- I used to own 5.25 inch disks that (on PCs) were capable of storing ~360 KiB (if both sides are used), and the more sturdy 3.5 inch disks providing double density (720 KiB) and high density capacity (1.44 MiB).

    Furthermore, floppy disks were also quite slow and could be easily damaged, for example, by touching the magnetic surface.
  • When I switched from the Commodore Amiga to the PC, I also used tapes for a while in addition to floppy disks. They provided a substantial amount of storage capacity (~500 MiB in 1996). As of 2019 (and this probably still applies today), tapes are still considered very cheap and reliable media for archival of data.

    What I found impractical about tapes is that they are difficult to use as random access memory -- data on a tape is stored sequentially. As a consequence, it is typically very slow to find files or to "update" existing files. Typically, a backup tool needs to scan the tape from the beginning to the end or maintain a database with known storage locations.

    Many of my personal files (such as documents) are regularly updated and older versions do not have to be retained. Instead, they should be removed to clear up storage space. With tapes this is very difficult to do.
  • When writable CD/DVDs became affordable, I used them as a backup media for a while. Similar to tapes, they also have substantial storage capacity. Furthermore, they are very fast and convenient to read.

    A similar disadvantage is that they are not a very convenient medium for updating files. Although it is possible to write multi-session discs, in which files can be added, overwritten, or made invisible (essentially a "soft delete"), it remained inconvenient because you cannot reclaim the storage space that a deleted file used to occupy.

    I also learned the hard way that writable discs (and in particular rewritable discs) are not very reliable for long term storage -- I have discarded many old writable discs (10 years or older) that can no longer be read.

Nowadays, I use a variety of USB storage devices (such as memory sticks and hard drives) as backup media. They are relatively cheap, fast, have more than enough storage capacity, and I can use them as random access memory -- it is no problem at all to update and delete existing data.

To cope with the potential breakage of USB storage media, I always make sure that I have at least two copies of my important data.

About data availability


As already explained in the introduction, I have multiple devices for which I want the same data to be available. For example, on both my desktop PC and company laptop, I want to have access to my music and research papers collection.

A possible solution is to use a shared storage medium, such as a network drive. The advantage of this approach is that there is a single source of truth and I only need to maintain a single data collection -- when I add a new document it will immediately be available to both devices.

Although a network drive may be a possible solution, it is not a good fit for my use cases -- I typically use laptops for traveling. When I am not at home, I can no longer access my data stored on the network drive.

Another solution is to transfer all required files to the hard drive on my laptop. Doing a bulk transfer for the first time is typically not a big problem (in particular, if you use orthodox file managers), but keeping collections of files up-to-date between machines is in my experience quite tedious to do by hand.

Automating data synchronization


For both backing up and synchronizing files to other machines I need to regularly compare and update files in directories. In the former case, I need to sync data between local directories, and for the latter I need to sync data between directories on remote machines.

Each time I want to make updates to my files, I want to inspect what has changed and see which files require updating before actually doing it, so that I do not end up wasting time or risk modifying the wrong files.

Initially, I started to investigate how to implement a synchronization tool myself, but quite quickly I realized that there is already a tool available that is quite suitable for the job: rsync.

rsync is designed to efficiently transfer and synchronize files between drives and machines across networks, by comparing the modification times and sizes of files.

The only thing that I consider a drawback is that it is not optimized for conveniently automating my personal workflow -- to accomplish what I want, I need to memorize the relevant rsync command-line options and run multiple command-line instructions.
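
For example, safely backing up a Documents folder by hand takes at least two rsync invocations along these lines (a sketch; the paths are only an illustration, and these are not necessarily the exact options that gitlike-rsync passes to rsync):

$ # First inspect what would change, without modifying anything:
$ rsync --archive --verbose --delete --itemize-changes --dry-run \
    /home/sander/Documents/ /media/MyBackupDrive/Documents/

$ # If the proposed changes look sane, run the actual transfer:
$ rsync --archive --verbose --delete --itemize-changes \
    /home/sander/Documents/ /media/MyBackupDrive/Documents/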

To alleviate this problem, I have created a custom script that evolved into a tool that I have named gitlike-rsync.

Usage


gitlike-rsync is a tool that facilitates the synchronization of file collections between directories on local or remote machines, using rsync and a workflow that is similar to managing Git projects.

Making backups


For example, if we have a data directory that we want to back up to another partition (for example, one that resides on an external USB drive), we can open the directory:

$ cd /home/sander/Documents

and configure a destination directory, such as a directory on a backup drive (/media/MyBackupDrive/Documents):

$ gitlike-rsync destination-add /media/MyBackupDrive/Documents

By running the following command-line instruction, we can create a backup of the Documents folder:

$ gitlike-rsync push
sending incremental file list
.d..tp..... ./
>f+++++++++ bye.txt
>f+++++++++ hello.txt

sent 112 bytes  received 25 bytes  274.00 bytes/sec
total size is 10  speedup is 0.07 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..tp..... ./
>f+++++++++ bye.txt
              4 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/3)
>f+++++++++ hello.txt
              6 100%    5.86kB/s    0:00:00 (xfr#2, to-chk=0/3)

sent 202 bytes  received 57 bytes  518.00 bytes/sec
total size is 10  speedup is 0.04

The output above shows me the following:

  • When no additional command-line parameters have been provided, the script will first do a dry run and show the user what it intends to do. In the above example, it shows me that it wants to transfer the contents of the Documents folder that consists of only two files: hello.txt and bye.txt.
  • After providing my confirmation, the files in the destination directory (on the backup drive mounted on /media/MyBackupDrive) are updated.

I can conveniently keep making changes in my Documents folder and update my backups accordingly.

For example, I can add a new file to the Documents folder named: greeting.txt, and run the push command again:

$ gitlike-rsync push
sending incremental file list
.d..t...... ./
>f+++++++++ greeting.txt

sent 129 bytes  received 22 bytes  302.00 bytes/sec
total size is 19  speedup is 0.13 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..t...... ./
>f+++++++++ greeting.txt
              9 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/4)

sent 182 bytes  received 38 bytes  440.00 bytes/sec
total size is 19  speedup is 0.09

In the above output, only the greeting.txt file is transferred to the backup partition, leaving the other files untouched, because they have not changed.

Restoring files from a backup


In addition to the push command, gitlike-rsync also supports a pull command that syncs data from the configured destination folders. The pull command can, for example, be used to restore data from a backup partition.

For example, if I accidentally delete a file from the Documents folder:

$ rm hello.txt

and run the pull command:

$ gitlike-rsync pull
sending incremental file list
.d..t...... ./
>f+++++++++ hello.txt

sent 137 bytes  received 22 bytes  318.00 bytes/sec
total size is 19  speedup is 0.12 (DRY RUN)
Do you want to proceed (y/N)? y
sending incremental file list
.d..t...... ./
>f+++++++++ hello.txt
              6 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/4)

sent 183 bytes  received 38 bytes  442.00 bytes/sec
total size is 19  speedup is 0.09

the script is able to detect that hello.txt was removed and restore it from the backup partition.

Synchronizing files between machines in a network


In addition to local directories, which are useful for backups, the gitlike-rsync script can also be used in a similar way to exchange files between machines, such as my desktop PC and office laptop.

With the following command-line instruction, I can automatically clone the Documents folder from my desktop PC to the Documents folder on my office laptop:

$ gitlike-rsync clone sander@desktop-pc:/home/sander/Documents

The above command connects to my desktop PC over SSH and retrieves the content of the Documents/ folder. It will also automatically configure the destination directory to synchronize with the Documents folder on the desktop PC.

When new documents have been added on the desktop PC, I just have to run the following command on my office laptop to update it:

$ gitlike-rsync pull

I can also modify the contents of the Documents folder on my office laptop and synchronize the changed files to my desktop PC with a push:

$ gitlike-rsync push

About versioning


As explained in the beginning of this blog post, in addition to the recovery of failing machines and equipment, another important reason to create backups is to protect yourself against accidental modifications.

Although gitlike-rsync can detect and display file changes, it does not do any versioning. This feature is deliberately left out, for good reasons.

For most of my personal files (e.g. images, audio, video) I do not need any versioning. As soon as they are organized, they are not supposed to be changed.

However, for certain kinds of files I do need versioning, such as software development projects. Whenever I need versioning, my answer is very simple: I use the "ordinary" Git, even for projects that are private and not supposed to be shared on a public hosting service, such as GitHub.

As seasoned Git users probably already know, you can turn any local directory into a Git repository by running:

$ git init

The above command creates a local .git folder that tracks and stores changes locally.
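
For example, a private project can be versioned entirely locally, without any hosting service involved (a minimal sketch; the project path is only an illustration):

$ cd /home/sander/Development/private-project
$ git init
$ git add .
$ git commit -m "Initial version"

All revisions are recorded in the project's local .git folder; no remote is involved yet.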

When you clone a repository from a public hosting service, such as GitHub, a remote named origin is automatically configured, allowing you to push changes to and pull changes from GitHub.
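
You can inspect the configured remotes with git remote -v. For a repository cloned from GitHub, the output looks roughly as follows (the repository URL is only an illustration):

$ git remote -v
origin  https://github.com/sander/example-project.git (fetch)
origin  https://github.com/sander/example-project.git (push)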

It is also possible to synchronize Git changes between arbitrary computers using a private SSH connection. I can, for example, configure a remote for a private repository, as follows:

$ git remote add origin sander@desktop-pc:/home/sander/Development/private-project

The above command configures the Git project that is stored in the /home/sander/Development/private-project directory on my desktop PC as a remote.

I can pull changes from the remote repository, by running:

$ git pull origin

and push locally stored changes, by running:

$ git push origin

As you have probably noticed, the above workflow is very similar to the exchange of documents shown earlier in this blog post.

What about backing up private Git repositories? To do this, I typically create tarballs of the Git project directories and sync them to my backup media with gitlike-rsync. The presence of the .git folder suffices to retain a project's history.
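
For example, backing up such a project could look something like this (a sketch; the directory names are only an illustration, and it assumes that a backup destination has already been configured for the Development folder with destination-add):

$ cd /home/sander/Development
$ # Create a tarball of the project directory, including its .git folder:
$ tar -czvf private-project.tar.gz private-project/
$ # Sync the tarball (and anything else in this folder) to the backup destination:
$ gitlike-rsync push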

Conclusion


In this blog post, I have described gitlike-rsync, a simple opinionated wrapper script for exchanging files between local directories (for backups) and remote directories (for data exchange between machines).

As its name implies, it builds heavily on top of rsync for efficient data exchange, and it uses the concepts of Git as an inspiration for its workflow.

I have been conveniently using this script for over ten years, and it works extremely well for my own use cases on a variety of operating systems (Linux, Windows, macOS and FreeBSD).

My solution is obviously not rocket science -- my contribution is only the workflow automation. The "true credits" should go to the developers of rsync and Git.

I also have to thank the COVID-19 crisis that allowed me to finally find the time to polish the script, document it and give it a name. In the Netherlands, as of today, there are still many restrictions, but the situation is slowly getting better.

Availability


I have added the gitlike-rsync script described in this blog post to my custom-scripts repository that can be obtained from my GitHub page.

Monday, April 26, 2021

A test framework for the Nix process management framework

As already explained in many previous blog posts, the Nix process management framework adds new ideas to earlier service management concepts explored in Nixpkgs and NixOS:

  • It makes it possible to deploy services on any operating system that can work with the Nix package manager, including conventional Linux distributions, macOS and FreeBSD. It also works on NixOS, but NixOS is not a requirement.
  • It allows you to construct multiple instances of the same service, by using constructor functions that identify conflicting configuration parameters. These constructor functions can be invoked in such a way that these configuration properties no longer conflict.
  • We can target multiple process managers from the same high-level deployment specifications. These high-level specifications are automatically translated to parameters for a target-specific configuration function for a specific process manager.

    It is also possible to override or augment the generated parameters, to work with configuration properties that are not universally supported.
  • There is a configuration option that conveniently allows you to disable user changes, making it possible to deploy services as an unprivileged user.

Although the above features are interesting, one particular challenge is that the framework cannot guarantee that all possible variations will work after writing a high-level process configuration. The framework facilitates code reuse, but it is not a write once, run anywhere approach.

To make it possible to validate multiple service variants, I have developed a test framework built on top of the NixOS test driver, which makes it possible to deploy and test a network of NixOS QEMU virtual machines with minimal storage and RAM overhead.

In this blog post, I will describe how the test framework can be used.

Automating tests


Before developing the test framework, I was mostly testing all my packaged services manually. Because a manual test process is tedious and time-consuming, I did not have any test coverage for anything but the most trivial example services. As a result, I frequently ran into many configuration breakages.

Typically, when I want to test a process instance, or a system that is composed of multiple collaborative processes, I perform the following steps:

  • First, I need to deploy the system for a specific process manager and configuration profile, e.g. for a privileged or unprivileged user, in an isolated environment, such as a virtual machine or container.
  • Then I need to wait for all process instances to become available. Readiness checks are critical and typically more complicated than expected -- for most services, there is a time window between a successful invocation of a process and its availability to carry out its primary task, such as accepting network connections. Executing tests before a service is ready typically results in errors (a shell sketch of such a manual readiness check is shown after this list).

    Although there are process managers that can generally deal with this problem (e.g. systemd has the sd_notify protocol, and s6 has its own protocol and an sd_notify wrapper), the lack of a standardized and widely adopted protocol still requires me to implement readiness checks manually.

    (As a side note: the only readiness check protocol that is standardized is for traditional System V services that daemonize on their own. The calling parent process should terminate almost immediately, but only after the spawned daemon child process has notified it that it is ready.

    As described in an earlier blog post, this notification aspect is more complicated to implement than I thought. Moreover, not all traditional System V daemons follow this protocol.)
  • When all process instances are ready, I can check whether they properly carry out their tasks, and whether the integration of these processes works as expected.
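
The sketch below roughly illustrates the readiness check and task check that I previously had to do by hand, for a hypothetical service listening on TCP port 8080 (the port, URL and expected response are only illustrations):

$ # Poll until the service accepts connections on its TCP port:
$ while ! nc -z localhost 8080; do sleep 1; done

$ # Once the service is ready, verify that it carries out its primary task:
$ curl --fail http://localhost:8080/ | grep 'expected response'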

An example


I have developed a Nix function, testService, that automates the above process using the NixOS test driver -- I can use this function to create a test suite for systems that are made out of running processes, such as the webapps example described in my previous blog posts about the Nix process management framework.

The example system consists of a number of webapp processes with an embedded HTTP server returning HTML pages displaying their identities. Nginx reverse proxies forward incoming connections to the appropriate webapp processes by using their corresponding virtual host header values:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, stateDir ? "/var"
, runtimeDir ? "${stateDir}/run"
, logDir ? "${stateDir}/log"
, cacheDir ? "${stateDir}/cache"
, libDir ? "${stateDir}/lib"
, tmpDir ? (if stateDir == "/var" then "/tmp" else "${stateDir}/tmp")
, forceDisableUserChange ? false
, processManager
}:

let
  sharedConstructors = import ../../../examples/services-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir cacheDir libDir tmpDir forceDisableUserChange processManager;
  };

  constructors = import ../../../examples/webapps-agnostic/constructors/constructors.nix {
    inherit pkgs stateDir runtimeDir logDir tmpDir forceDisableUserChange processManager;
    webappMode = null;
  };
in
rec {
  webapp1 = rec {
    port = 5000;
    dnsName = "webapp1.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "1";
    };
  };

  webapp2 = rec {
    port = 5001;
    dnsName = "webapp2.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "2";
    };
  };

  webapp3 = rec {
    port = 5002;
    dnsName = "webapp3.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "3";
    };
  };

  webapp4 = rec {
    port = 5003;
    dnsName = "webapp4.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "4";
    };
  };

  nginx = rec {
    port = if forceDisableUserChange then 8080 else 80;
    webapps = [ webapp1 webapp2 webapp3 webapp4 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
    } {};
  };

  webapp5 = rec {
    port = 5004;
    dnsName = "webapp5.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "5";
    };
  };

  webapp6 = rec {
    port = 5005;
    dnsName = "webapp6.local";

    pkg = constructors.webapp {
      inherit port;
      instanceSuffix = "6";
    };
  };

  nginx2 = rec {
    port = if forceDisableUserChange then 8081 else 81;
    webapps = [ webapp5 webapp6 ];

    pkg = sharedConstructors.nginxReverseProxyHostBased {
      inherit port webapps;
      instanceSuffix = "2";
    } {};
  };
}

The processes model shown above (processes-advanced.nix) defines the following process instances:

  • There are six webapp process instances, each running an embedded HTTP service, returning HTML pages with their identities. The dnsName property specifies the DNS domain name value that should be used as a virtual host header to make the forwarding from the reverse proxies work.
  • There are two nginx reverse proxy instances. The former: nginx forwards incoming connections to the first four webapp instances. The latter: nginx2 forwards incoming connections to webapp5 and webapp6.

With the following command, I can connect to webapp2 through the first nginx reverse proxy:

$ curl -H 'Host: webapp2.local' http://localhost:8080
<!DOCTYPE html>
<html>
  <head>
    <title>Simple test webapp</title>
  </head>
  <body>
    Simple test webapp listening on port: 5001
  </body>
</html>

Creating a test suite


I can create a test suite for the web application system as follows:

{ pkgs, testService, processManagers, profiles }:

testService {
  exprFile = ./processes.nix;

  readiness = {instanceName, instance, ...}:
    ''
      machine.wait_for_open_port(${toString instance.port})
    '';

  tests = {instanceName, instance, ...}:
    pkgs.lib.optionalString (instanceName == "nginx" || instanceName == "nginx2")
      (pkgs.lib.concatMapStrings (webapp: ''
        machine.succeed(
            "curl --fail -H 'Host: ${webapp.dnsName}' http://localhost:${toString instance.port} | grep ': ${toString webapp.port}'"
        )
      '') instance.webapps);

  inherit processManagers profiles;
}

The Nix expression above invokes testService with the following parameters:

  • processManagers refers to a list of names of all the process managers that should be tested.
  • profiles refers to a list of configuration profiles that should be tested. Currently, it supports privileged for privileged deployments, and unprivileged for unprivileged deployments in an unprivileged user's home directory, without changing user permissions.
  • The exprFile parameter refers to the processes model of the system: processes-advanced.nix shown earlier.
  • The readiness parameter refers to a function that does a readiness check for each process instance. In the above example, it checks whether each service is actually listening on the required TCP port.
  • The tests parameter refers to a function that executes tests for each process instance. In the above example, it ignores all but the nginx instances, because explicitly testing a webapp instance is a redundant operation.

    For each nginx instance, it checks whether all webapp instances can be reached from it, by running the curl command.

The readiness and tests functions take the following parameters: instanceName identifies the process instance in the processes model, and instance refers to the attribute set containing its configuration.

Furthermore, they can refer to global process model configuration parameters:

  • stateDir: The directory in which state files are stored (typically /var for privileged deployments)
  • runtimeDir: The directory in which runtime files are stored (typically /var/run for privileged deployments).
  • forceDisableUserChange: Indicates whether to disable user changes (for unprivileged deployments) or not.

In addition to writing tests that work on the instance level, it is also possible to write tests on the system level, with the following parameters (not shown in the example):

  • initialTests: instructions that run right after deploying the system, but before the readiness checks, and instance-level tests.
  • postTests: instructions that run after the instance-level tests.

The above functions also accept the same global configuration parameters, as well as processes, which refers to the entire processes model.

We can also configure other properties useful for testing:

  • systemPackages: installs additional packages into the system profile of the test virtual machine.
  • nixosConfig: defines a NixOS module with configuration properties that will be added to the NixOS configuration of the test machine.
  • extraParams: propagates additional parameters to the processes model.

Composing test functions


The Nix expression above is not self-contained. It is a function definition that needs to be invoked with all required parameters, including all the process managers and profiles that we want to test for.

We can compose tests in the following Nix expression:

{ pkgs ? import <nixpkgs> { inherit system; }
, system ? builtins.currentSystem
, processManagers ? [ "supervisord" "sysvinit" "systemd" "disnix" "s6-rc" ]
, profiles ? [ "privileged" "unprivileged" ]
}:

let
  testService = import ../../nixproc/test-driver/universal.nix {
    inherit system;
  };
in
{

  nginx-reverse-proxy-hostbased = import ./nginx-reverse-proxy-hostbased {
    inherit pkgs processManagers profiles testService;
  };

  docker = import ./docker {
    inherit pkgs processManagers profiles testService;
  };

  ...
}

The above partial Nix expression (default.nix) invokes the function defined in the previous expression, which resides in the nginx-reverse-proxy-hostbased directory, and propagates all required parameters. It also composes other test cases, such as docker.

The parameters of the composition expression allow you to globally configure all the desired service variants:

  • processManagers allows you to select the process managers you want to test for.
  • profiles allows you to select the configuration profiles.

With the following command, we can test our system as a privileged user, using systemd as a process manager:

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.systemd

We can also run the same test, but as an unprivileged user:

$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.systemd

In addition to systemd, we can use any other configured process manager that works on NixOS. The following command runs a privileged test of the same service for sysvinit:

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.sysvinit

Results


With the test driver in place, I have managed to expand my repository of example services, provided test coverage for them and fixed quite a few bugs in the framework caused by regressions.

Below is a screenshot of Hydra, the Nix-based continuous integration service, showing an overview of test results for all kinds of variants of a service:


So far, the following services work multi-instance, with multiple process managers, and (optionally) as an unprivileged user:

  • Apache HTTP server. In the services repository, there are multiple constructors for deploying an Apache HTTP server: to deploy static web applications or dynamic web applications with PHP, and to use it as a reverse proxy (via HTTP and AJP) with HTTP basic authentication optionally enabled.
  • Apache Tomcat.
  • Nginx. For Nginx we also have multiple constructors. One to deploy a configuration for serving static web apps, and two for setting up reverse proxies using paths or virtual hosts to forward incoming requests to the appropriate services.

    The reverse proxy constructors can also generate configurations that will cache the responses of incoming requests.
  • MySQL/MariaDB.
  • PostgreSQL.
  • InfluxDB.
  • MongoDB.
  • OpenSSH.
  • svnserve.
  • xinetd.
  • fcron. By default, the fcron user and group are hardwired into the executable. To facilitate unprivileged user deployments, we automatically create a package build override to propagate the --with-run-non-privileged configuration flag so that it can run as an unprivileged user. Similarly, for multiple instances we create an override to use a different user and group that does not conflict with the primary instance.
  • supervisord
  • s6-svscan

The following service also works with multiple instances and multiple process managers, but not as an unprivileged user:


The following services work with multiple process managers, but not multi-instance or as an unprivileged user:

  • D-Bus
  • Disnix
  • nix-daemon
  • Hydra

In theory, the above services could be adjusted to work as an unprivileged user, but doing so is not very useful -- for example, the nix-daemon's purpose is to facilitate multi-user package deployments. As an unprivileged user, you only want to facilitate package deployments for yourself.

Moreover, the multi-instance aspect is IMO also not very useful to explore for these services. For example, I cannot think of a useful scenario in which two Hydra instances run next to each other.

Discussion


The test framework described in this blog post is an important feature addition to the Nix process management framework -- it allowed me to package more services and fix quite a few bugs caused by regressions.

I can now finally show that it is doable to package services and make them work under nearly all possible conditions that the framework supports (e.g. multiple instances, multiple process managers, and unprivileged user installations).

The only limitation of the test framework is that it is not operating system agnostic -- the NixOS test driver (that serves as its foundation) only works, as its name implies, with NixOS, which itself is a Linux distribution. As a result, we cannot automatically test bsdrc scripts, launchd daemons, and cygrunsrv services.

In theory, it is also possible to make a more generalized test driver that works with multiple operating systems. The NixOS test driver is a combination of ideas (e.g. a shared Nix store between the host and guest system, an API to control QEMU, and an API to manage services). We could also dissect these ideas and run them on conventional QEMU VMs running different operating systems (with the Nix package manager).

Although making a more generalized test driver is interesting, it is beyond the scope of the Nix process management framework (which is about managing process instances, not entire systems).

Another drawback is that while it is possible to test all possible service variants on Linux, it may be very expensive to do so.

However, full process manager coverage is often not required to get a reasonable level of confidence. For many services, it typically suffices to implement the following strategy:

  • Pick two process managers: one that prefers foreground processes (e.g. supervisord) and one that prefers daemons (e.g. sysvinit). This is the most significant difference (from a configuration perspective) between all these different process managers.
  • If a service supports multiple configuration variants, and multiple instances, then create a processes model that concurrently deploys all these variants.

Implementing the above strategy only requires you to test four variants, providing a high degree of certainty that it will work with all other process managers as well.
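
For the nginx reverse proxy example shown earlier, such a reduced test matrix could boil down to four builds like the following (assuming we pick supervisord and sysvinit as the two process managers and test both configuration profiles):

$ nix-build -A nginx-reverse-proxy-hostbased.privileged.supervisord
$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.supervisord
$ nix-build -A nginx-reverse-proxy-hostbased.privileged.sysvinit
$ nix-build -A nginx-reverse-proxy-hostbased.unprivileged.sysvinit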

Future work


Most of the interesting functionality required to work with the Nix process management framework is now implemented. I still need to implement more changes to make it more robust and to "dogfood" it on as many of my own problems as possible.

Moreover, the docker backend still requires a bit more work to make it more usable.

Eventually, I will be thinking about an RFC that upstreams the interesting bits of the framework into Nixpkgs.

Availability


The Nix process management framework repository as well as the example services repository can be obtained from my GitHub page.