Tuesday, April 14, 2026

Building customized Linux distributions


Last year, I did a number of interesting retro-computing projects, such as running Linux and NetBSD on my Amiga 4000, and building a retro PC to run interesting applications from the late 90s, early 2000s. To run these kinds of applications, I deployed four kinds of classic PC operating system installations: Windows 98 SE, Windows XP, Slackware Linux 8.0 and MS-DOS 6.22.

It was fun to experiment with these classic operating systems. From a very young age, I have enjoyed exploring different kinds of operating systems, including their underlying concepts and capabilities.

One of the challenges that I ran into was producing a Slackware 8.0 Linux installation that covered all my needs. Already in 2001 (when this Linux distribution was still mainstream), I faced customization issues with all kinds of Linux distributions and found myself spending a considerable amount of time extending and tuning Linux-based systems.

I ran into the same kinds of challenges in my retro computing experiments. Just as in 2001, the amount of customization work motivated me to construct a Linux installation from scratch that fully implements my desired configuration for late 90s, early 00s experiences.

To make the work easier, I have decided to dig up old knowledge and old solutions. I could re-invent the wheel, but I consider it far more efficient to revive my old personal Linux from Scratch-related projects, which I actively developed between 2001 and 2009.

The first step in that process is refurbishing my custom-developed automation tool.

In this blog post, I will explain the background of my customization work and show my automation tool.

Background


There are a number of reasons why no existing Linux distribution fully covered my needs and why I became motivated to heavily customize Linux distributions and even build my own distribution from scratch.

The modularity of Linux systems


One important reason is that Linux systems are modular and assembled from a variety of components maintained by various kinds of developers. Distributions sometimes make choices that I do not prefer.

Unlike the other operating systems that I mentioned in the introduction, Linux is, technically speaking, not a full operating system, but an operating system kernel. The kernel manages hardware resources and provides an interface to services (such as process management, memory management and filesystems) that applications can use. The kernel does not do anything directly for the user -- Linux needs to be combined with various components to become a useful system.

Linux is often complemented with tools from the GNU project, whose purpose is to build a free (libre) UNIX-like operating system. The GNU project argues that, for this reason, most Linux-based systems should be called GNU/Linux systems.

The reason why Linux is strongly connected to the GNU project is that Linus Torvalds has, from its beginning (in 1991), used components from the GNU project, such as the GNU C Compiler (GCC) and the GNU assembler, to compile the Linux kernel. Moreover, he complemented the first release of the Linux kernel with the GNU Bash shell to produce a usable system.

Furthermore, in 1991 a working kernel was one of the pieces still missing in the GNU project to produce a completely free UNIX-like system (GNU's own kernel: GNU Hurd was already in development before 1991, but not quite usable at the time Linux was first released).

In addition to the GNU project, there are many packages used from different kinds of projects, such as the X Window System, and the KDE and GNOME desktops.

There are various parties distributing pre-assembled Linux systems called Linux distributions. Already since the 90s, there have been many kinds of Linux distributions. Furthermore, there is no single accepted mainstream Linux distribution.

Although many conventional Linux distributions share many kinds of packages in addition to the Linux kernel (e.g. GNU packages, the X Window System and other kinds of utilities), there are also numerous differences between them. For example, distributions may differ in the selected versions/variants of packages, the package manager used, complementary tools, the default desktop, and the amount of customization for specific user groups.

Some Linux distributions are tailored towards the needs of (non-technical) end-users and ship with a substantial amount of end-user software, such as games and productivity applications, which is really convenient.

There are also Linux distributions that use a completely different set of complementary components rather than the commonly used ones. A prominent example is Android, a commonly used mobile operating system.

Android uses the Linux kernel, but most of the remaining operating system components are custom developed and substantially different from those of conventional Linux distributions. For example, apart from the GNU build toolchain, the Android project does not use any GNU utilities, the GNU C library or the X Window System.

The distribution of Linux applications


Already since the mid 90s, most Linux distributions have included package managers that handle the life-cycle of software packages by implementing activities such as installing, upgrading and removing packages.

Package managers make the installation of packages quite convenient -- with a single command, you can easily install packages from a distribution's package repository, including all required dependencies. Some distributions have nice graphical front-ends for their package managers.
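As a sketch (using Debian's APT purely as an illustration; the package name is an assumption), installing an application together with all its required dependencies boils down to a single command:

```shell
# Illustrative only: APT resolves and installs all required dependencies
apt-get install gimp
```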

Despite the fact that package managers are powerful and convenient, application deployment is not a fully solved problem for non-technical end-users. For example, quite often it is not straightforward to deploy external software packages that are not part of a Linux distribution.

I ran into the same kinds of problems as early as 1999. Although Slackware is convenient and covers many of my needs, I wanted to extend my Slackware 8.0 installation with quite a few non-distribution packages: development tools, games, and random utilities. I also had to tweak various configuration aspects to make certain applications more convenient to use.

Very few third-party software developers provide pre-built binary packages for Linux distributions. Despite the fact that many Linux distributions offer similar kinds of packages (e.g. the Linux kernel, GNU tools and libraries), they are not fully binary compatible with each other -- for example, the versions of libraries in each distribution may be slightly different, and the locations where these dependencies can be found may differ. This is a problem that most third-party developers find inconvenient or impossible to solve.

There are some external projects that do offer pre-built binary packages (albeit for a limited set of Linux distributions only), but this is the exception rather than the rule.

It is a far more common practice that Linux application developers distribute their software in source code form only (e.g. a tarball with source code and build scripts). The responsibility of providing pre-built binary packages is delegated to the package maintainers of a Linux distribution or the users themselves.

It is a bit unfortunate that Slackware has never been a target for many third-party software developers, so the only way forward was to compile packages from source code myself.

The fact that you can compile software packages from source code is IMO a double-edged sword -- I consider it a gift (for technical users, such as myself) that you can study and modify software packages, but also a burden, in particular for non-technical end-users, because of the amount of technical detail that you may get exposed to.

Fortunately, many packages can be built from source code in a straightforward way, such as:

./configure
make
make install

The above procedure often suffices to deploy packages distributed in source code form that use GNU Autotools to manage their build infrastructure, which is the most common solution for source packages -- the configure script checks the configuration of the system and provides configuration settings for the build procedure, make builds the project from source code, and make install installs the package on the target machine.

However, despite the fact that the procedure shown above is mostly straightforward, there are various kinds of problems that you may run into, such as:

  • Missing dependencies. Packages are rarely self-contained -- they rely on the presence of existing packages on the system. For example, you need a number of tools to build C/C++ projects (e.g. GNU binutils, GCC, GNU Make).

    To build an application that integrates with the GNOME desktop, you need the GTK+ library to be present.

    If a dependency is missing, the build procedure will fail. Some packages may also have dependencies on packages that are not present in the Linux distribution's repository. These dependencies must then be compiled from source code first.
  • There may be all kinds of incompatibilities causing a build to fail. For example, the project may use features of the C programming language that only a newer gcc compiler supports. Required libraries may be present, but in the wrong versions (for example, they may be too old).
  • Some additional configuration aspects need to be taken care of. For example, for my Slackware 8.0 installation I deployed a movie player, MPlayer, from scratch. The package does not include a .desktop file, making it impossible to conveniently launch the program from the KDE and GNOME application launchers. I had to create these configuration files myself.
  • The lifecycle of an external package is not managed by the distribution's package manager. If you run make install the package gets installed, but at some point you may want to remove or upgrade it. Although some projects may also offer a make uninstall target, this is not a guarantee.

    Furthermore, if you remove the source code tree after compiling it, uninstalling can no longer be done automatically. There are ways to properly package something for your distribution's package manager, but these are often not trivial.
  • It may take quite a bit of time for some projects to build. I still remember that compiling OpenOffice.org in 2002 took me roughly 6 hours.
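The missing .desktop file mentioned in the list above can be illustrated with a sketch (the Exec and Icon values are assumptions, not taken from the real MPlayer package; a real file would be installed under /usr/share/applications):

```shell
# Sketch: a minimal XDG desktop entry that makes MPlayer appear in KDE/GNOME menus
# (written to a local directory here for illustration)
mkdir -p ./applications
cat > ./applications/mplayer.desktop << "EOF"
[Desktop Entry]
Type=Application
Name=MPlayer
Comment=Movie player
Exec=mplayer %f
Icon=mplayer
Categories=AudioVideo;Player;
EOF
cat ./applications/mplayer.desktop
```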

Nowadays, Linux is quite successful on servers, mobile devices (through Android) and embedded devices, but on the desktop it is a different story, even today.

According to Linus Torvalds, the lack of a strong degree of lasting binary compatibility heavily contributed to the fact that Linux never reached a big audience on the desktop. It makes it difficult for software developers to ship software for Linux, and because relevant external applications are hard to obtain in binary form, it is difficult for users to deploy them.

To cope with this problem, the Linux kernel development community adopted a policy stating that they "do not break the userspace". If applications rely on the kernel in a certain way and this has become an established practice, the kernel developers consider its breakage a kernel bug that needs to be fixed.

Although the Linux kernel has adopted a compatibility policy, other critical components did not. For example, the GNU C Library (a.k.a. glibc, the library collection that almost all packages directly or indirectly rely on) had incompatible API changes in the past. The same applies to the standard C++ library in GCC.

Despite the fact that compatibility is still a problem today, work has been done to improve the situation:

  • The Linux Standard Base (LSB) was an effort to ensure binary compatibility amongst Linux distributions by standardizing a subset of the system: libraries, tools and a file system organization. It defined the RPM package manager as the standard package management solution. Despite the effort, not many distributions were willing to adopt it (most likely due to some controversial choices, such as RPM), with the exception of some distributions targeting large companies, such as Red Hat Enterprise Linux. The effort was stopped in 2015.
  • Some distributions have much larger package repositories compared to 20 years ago, making it more likely for end-users to obtain the software they want. For example, Debian and Nixpkgs (the package set that NixOS uses) have a substantial number of packages in their repositories (tens of thousands) maintained by a large number of contributors.
  • There are universal deployment solutions available that work across multiple Linux distributions. The most prominent example of such a solution is IMO Docker, which gained widespread adoption. Unfortunately, Docker is mostly used for server software and development tools, not for end-user applications.

    For end-user applications, various universal solutions exist. Some prominent examples are Flatpak, Snap, Nix, and Guix. In the past, I wrote blog posts comparing the properties of some of these solutions. Despite having significant userbases, none of these solutions has gained widespread acceptance yet.

Shared libraries


Another compelling reason for me to build a Linux system completely from scratch is the compatibility problems that you may run into when you need to update shared libraries.

Using shared libraries is not a feature unique to Linux (or other kinds of UNIX-like systems). Many operating systems have them -- I was already exposed to the idea of shared libraries in AmigaOS. Moreover, Windows also prominently supports shared libraries.

Using shared libraries has a number of benefits:

  • Reuse. Many applications provide the same kinds of functionality. For example, there are quite a few applications that need to work with image formats, such as JPEG, GIF and PNG. Rather than reimplementing this kind of functionality from scratch, developers can utilize shared libraries that provide encoding and decoding functionality, saving precious development time.

    The same idea applies to applications with graphical user interfaces -- developers typically use shared libraries for implementing the GUI elements, such as buttons, text fields and menus, to save development time and to be sure that the application integrates well with the desktop environment.
  • Resource optimization. Another advantage of using shared libraries is that they reduce the size of the executables. When using shared libraries, functionality is not integrated into the executables, but executables refer to external library files that are stored only once on disk. As a result, it helps to considerably reduce the total disk space consumption.

    In Linux, shared libraries are also only stored in RAM once. As long as the memory pages are not changed, this concept also reduces RAM consumption when multiple applications with common functionality run concurrently.
  • More upgrade flexibility. When a bug or security issue is found, users can replace a shared library with a newer version, fixing all dependent applications without having to modify the executables themselves.

In Linux and many other UNIX-like systems, the amount of reuse between applications is heavily optimized, almost to the maximum. In a typical Linux installation, there are quite a few shared library files.

Shared libraries also have drawbacks:

  • Applications are no longer self-contained. All required shared libraries must be present on the system before running the application. If any of them are missing, an application will most likely not work.
  • Compatibility problems after upgrading. Newer versions of shared libraries may not be backwards compatible with older versions. Sometimes backwards incompatibility is unintentional -- for example, an application may rely on certain odd behaviour of a shared library (e.g. a bug). Such an application can break if a corrected version of the library is installed.

I have seen quite a few incompatibilities between versions of libraries. For example, libpng 1.6 is no longer backwards compatible with libpng 1.2. If the default version on your Linux distribution is version 1.2, you cannot replace the existing libpng installation with version 1.6, or applications that require PNG functionality will break.

It is possible, for example, to install an incompatible version of libpng alongside an existing version, but doing this is not straightforward: you have to instruct the build scripts of packages to work with libraries residing in a different location than usual (normally, shared libraries are installed in /usr/lib, but you can install a different version in a different prefix, such as /opt/libpng-1.6.1/lib, and instruct build scripts to use this version). Coping with these unconventional installation setups is definitely not something you want to bother non-technical end-users with.
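The unconventional setup sketched above could look roughly like this (the exact version number and prefix are assumptions):

```shell
# Sketch: install an incompatible libpng in its own prefix, next to the system copy
tar xzf libpng-1.6.1.tar.gz
cd libpng-1.6.1
./configure --prefix=/opt/libpng-1.6.1
make
make install

# A package that needs this version must then be pointed at it explicitly:
./configure CPPFLAGS="-I/opt/libpng-1.6.1/include" \
            LDFLAGS="-L/opt/libpng-1.6.1/lib"
```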

Some libraries are really hard to configure to co-exist on a single system. For example, glibc (the standard C library) is a library package that almost any package on a Linux system directly or indirectly relies on (because most of the software is written in C or requires a component that is implemented in C).

Upgrading to a new minor version, e.g. from 2.2.4 to 2.2.5, is a compatible upgrade, but upgrading to a new major version (e.g. from 2.2 to 2.3) is not. As a result, you cannot simply replace the previous version. Installing two versions of glibc next to each other and configuring a set of applications to use the other version is technically possible, but extremely hard to do by conventional means.

About the modularity aspect of operating systems


Although Linux systems have, due to their origins, a number of drawbacks when it comes to application deployment (including upgrading), I do appreciate their modularity.

IMO modularity is a valuable operating system property that has become somewhat forgotten. Nowadays, the most commonly used consumer operating systems (e.g. Windows, Android, macOS and iOS) have very weak modularity properties and their installations are quite substantial in size.

With modularity you can conveniently grow the features of your software installation, or shrink them if needed. If your operating system has a well-defined core that is not too large and flexible enough, you can even shrink the size of an installation so much so that it can run on a single floppy disk. This used to be a common property of a number of operating systems frequently used in the past.

Moreover, IMO modularity also significantly helps in understanding the architecture of an operating system. If it has a relatively simple core, knowing its essentials gives you a compass to base your knowledge on and to find your way. I consider this to be very valuable -- an operating system should be developed to serve its users, not the other way around.

AmigaOS


I was first exposed to the idea of a modular operating system when I was still actively using AmigaOS as my main operating system. I already knew that AmigaOS consisted of multiple parts such as:

  • The kernel is called Exec. It is a microkernel that is responsible for memory allocation, task management, library management, message passing and interrupt handling.
  • The sub system that handles process management, file systems and the command-line interface is called AmigaDOS.
  • The sub system that does graphical window management is called Intuition.
  • Workbench is the graphical desktop environment.

Most of the above operating system components reside in the Kickstart ROM. They are instantly present after powering up the machine.

The earliest Amiga models, the Amiga 1000 and Amiga 500 (the most popular and frequently used Amiga model), were primarily floppy based systems -- these Amiga models did not include a hard drive by default.

As a result, it was a common practice to work with many kinds of bootable disks: when switching on the machine, it shows a splash screen requesting the user to insert a bootable floppy disk.

AmigaOS was flexible enough for people to create custom boot disks and this feature was frequently used. Most of the available Amiga applications were distributed as bootable floppies. (As a sidenote: most commercial games typically used custom bootblocks bypassing most of the facilities of AmigaOS.)

I have also spent a great amount of time creating customized boot disks and toying around with settings, including visual settings, such as adjusting the text, background and window colors and modifying the mouse cursor. I have also created boot disks for games that I developed myself with AMOS Professional.

Creating a minimal bootdisk in AmigaOS is very straightforward. On the command-line prompt, I can format a disk in the primary disk drive with the following command:

FORMAT DF0:

and make it bootable by installing a bootblock on it:

INSTALL DF0:

Running the above two commands suffices to produce a minimal bootdisk. After bootup, it will show a window with a command-line interface.

A minimal bootdisk can then be enhanced with additional features, such as custom applications.

For example, if you want to boot into the Workbench, you need to copy two additional executables from the Workbench disk to the boot disk and store them in the DF0:C directory:

  • LoadWB loads the Workbench desktop session.
  • EndCLI ends the currently active command-line interface session.

If you want the Workbench to be loaded at boot time, you can create the following script, DF0:S/Startup-Sequence, to accomplish this:

LoadWB
EndCLI >NIL:

MS-DOS


Around the same time, I also learned that modularity was common on PCs using MS-DOS (or IBM PC-DOS). Although I did not have a PC myself, I had plenty of experience because of friends and relatives, and the fact that I could emulate a PC on my Amiga using the KCS PowerPC board.

In MS-DOS, you can format a floppy disk with the system flag to make it bootable:

FORMAT A: /S

Alternatively, you can transfer the system files from an existing installation (such as the hard drive) to an already formatted floppy disk with:

SYS C:\ A:

Either of the above commands suffices to create a boot disk that shows a command-line prompt.

A bootable MS-DOS floppy disk contains three files:

  • COMMAND.COM is the command-line interpreter program.
  • IO.SYS contains the default MS-DOS device drivers and the DOS initialization program.
  • MSDOS.SYS contains the kernel and is responsible for file access and program management.

In the past, I have used bootable MS-DOS floppy disks for a variety of reasons. For example, I have used them to distribute some of my custom-made MS-DOS QBASIC programs.

I typically formatted a bootable floppy disk, copied QBASIC.EXE to it along with my programs, and added a menu to select a program to run (sometimes I used a menu manager program, sometimes a custom-developed menu using the CHOICE command-line tool).

For example, to automatically boot from a floppy disk into a QBASIC program, I can create the following AUTOEXEC.BAT file:

@ECHO OFF
QBASIC MYPROG.BAS

Windows 95 and later versions


A couple of years after Commodore's demise (late 1996), I made the transition to the PC platform permanently. By this time, Windows 95 became the main recommended operating system for home PCs instead of MS-DOS.

I was happy with all the new hardware capabilities that my new PC provided me (a faster CPU, more RAM, more storage, better graphics, better sound etc) and software capabilities (such as pre-emptive multi-tasking, better filesystem support etc.), but I was terrified of the costs of running Windows 95.

Windows 95 requires several hundred megabytes of storage. Back then, the total capacity of my hard drive was only 2 GiB. Furthermore, Windows 95 offered the option to enable and disable certain components, but even in its most basic form it was quite resource demanding.

Also, it was as good as impossible to create small/customized installations anymore. It was also much harder to know which components Windows 95 consisted of and how to control them. The time it took Windows 95 to boot also disturbed me a bit.

Although Windows 95 still gave users the option to create bootable MS-DOS disks (because essentially Windows 95 was still a Windows + MS-DOS hybrid), the ability to create custom installations was pretty much lost. In later versions of Windows, that functionality became completely unavailable.

Linux


Fortunately, when I actively started to use Linux in 1999, I gradually learned that Linux has the modularity properties that I cared about. While studying the available Linux HOWTOs, I discovered the "From Power Up To Bash Prompt" HOWTO that explains how the boot process of a typical Linux system works and which components are involved.

Building a minimal Linux-based system is relatively straightforward. As explained earlier, the component that is called Linux is only a kernel. Linux manages hardware resources and provides an interface for operating system services to applications.

When the Linux kernel is loaded and its initialization is done, it loads an external program that is called the init process. By default, Linux tries a number of executables, such as: /bin/init and /sbin/init but the init process can also be specified through the init= kernel parameter. From the perspective of the kernel, the init process can be any kind of program.

Because the init process is flexible, you can, for example, create a very minimal Linux-based system that directly starts the GNU Bash shell (by specifying /bin/bash as the init process). After the kernel's initialization has completed, you will directly see a shell prompt and you immediately have super-user privileges.
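For example, at a LILO boot prompt (the kernel label linux is an assumption), such a minimal system can be requested with:

boot: linux init=/bin/bash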

Alternatively, you can use busybox, a toolset that provides a collection of useful commands/programs embedded in a single executable.
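As a sketch (assuming a busybox binary is installed in /bin), the applet that runs is selected by the name under which busybox is invoked:

```shell
# Sketch: busybox chooses an applet based on the invocation name
busybox ls /             # run the ls applet explicitly
ln -s /bin/busybox ./ls  # a symlink named "ls"...
./ls /                   # ...invokes the ls applet directly
```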

In conventional Linux distributions, the Linux kernel is never directly loaded after powering up a machine -- a bootloader program is responsible for this. There are a variety of bootloaders available for a variety of platforms. Today, GRUB and systemd-boot are the most common solutions on the PC platform. In the past, LILO was frequently used.

(As a sidenote: older versions of Linux can also directly boot from the Master Boot Record (MBR) of any medium without using a bootloader, but this feature was eventually removed).

Conventional Linux distributions typically use an init program that has a number of responsibilities. In the past, a program called sysvinit was commonly used. Nowadays, it is often systemd.

sysvinit is a process supervisor and primitive service manager that loads a number of processes on startup (defined in a configuration file: /etc/inittab). Typically, this inittab starts an initialization script that sets the boot process in motion (such as loading system services) and a number of terminal session managers:

  • Terminal sessions are often managed by the agetty program. Typically a number of them (e.g. six) are started concurrently. With the Alt+F1-F6 key combinations you can switch between them.
  • agetty typically starts a login manager (/bin/login) showing a login prompt.
  • The login manager starts the login shell of the user if a login was successful (in most conventional Linux distributions, the default login shell of a user is /bin/bash).
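A minimal sysvinit /etc/inittab implementing this scheme could look roughly as follows (the rc script path is an assumption; the line format is id:runlevels:action:process):

id:3:initdefault:
si::sysinit:/etc/rc.d/rc.S
1:2345:respawn:/sbin/agetty tty1 38400 linux
2:2345:respawn:/sbin/agetty tty2 38400 linux
3:2345:respawn:/sbin/agetty tty3 38400 linux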

Similar to AmigaOS and MS-DOS, Linux also used to be flexible enough to form a minimal system that can be used from a boot floppy.

(As a sidenote: that property was eventually lost because floppy drives have become obsolete and the Linux kernel has grown too large to fit on a floppy disk. However, it is still possible today to produce Linux-based systems that are relatively small. Tiny Core Linux is an example of a Linux distribution that can provide a FLTK/FLWM desktop roughly 16 MiB in size.)

Limitations of my Slackware 8.0 installation


I have managed to get quite a few applications running on my Slackware 8.0 installation that were not part of the distribution. Furthermore, I have applied various kinds of configuration changes to make my life more convenient, such as adding XDG desktop files so that applications can be started from the KDE program launcher.

Despite my customization efforts, my Slackware 8.0 installation is still not completely how I want it. For example, Slackware 8.0 includes the KDE 2.1 desktop. For my retro-computing project, I prefer to use the latest KDE version in the 2.x range: KDE 2.2.2. Upgrading to the next Slackware release (9.0) is not an option, because that version includes KDE 3.1.

Although it is technically possible to package KDE 2.2.2 for Slackware 8.0, I know that I will break some existing applications, because libraries will get replaced with incompatible versions and I require newer versions of some existing dependencies. I consider this configuration process too tedious and time consuming to do on my Slackware installation.

As a result, I have decided to build my own customized Linux distribution from scratch. In my custom distribution, I can pick all the variants of the packages that I want.

The Linux from Scratch book


Building a custom Linux distribution from scratch may sound very ambitious but it is actually a well documented process -- I did not have to re-invent the wheel.

In 2001, I heavily studied the available HOWTOs on the Linux Documentation Project page. In addition to the "From Power Up To Bash Prompt" HOWTO I also discovered the Linux from Scratch HOWTO.

The first HOWTO's purpose is to cover the components involved in the booting process. The scope of the Linux from Scratch (LFS) HOWTO goes far beyond the booting process -- its purpose is to construct a fairly complete bare bones Linux system using variants of packages that are commonly used in mainstream Linux distributions.

In addition to assembling a fairly complete bare bones system, the Linux from Scratch book has additional constraints:

  • Ensuring correctness. The purpose of the book is to create a Linux system with exactly the versions of the packages that you have selected, built the way you want (e.g. using your selected optional settings). These packages are typically different from your host system's packages. As a result, you cannot copy binaries from the host system to your target system.
  • Building all packages from source code. As explained earlier in this blog post, it is a common habit that Linux packages are distributed in source form as a universal packaging format. For your own custom distribution, you do not have a package manager at your disposal that offers pre-built binaries. As a consequence, building everything from source code is the only way forward.
  • Making sure that your assembled system is self-contained. When building your custom system, you do not want to retain a dependency on anything that resides on the host system from which the builds are done. Otherwise, your custom Linux system is no longer self-contained. Unfortunately, for most packages, following their regular build procedures results in dependencies on shared libraries of the host system. You must follow a strategy to break that dependency.
  • Bootstrapping the GNU build toolchain. You will face a chicken-or-egg problem with the GNU Compiler Collection (GCC) package, which includes a C compiler. The paradox is that GCC is itself written in C. As a result, you first need the host system's gcc compiler to compile it. However, you also want gcc (and the additional build tools) to be built in such a way that the resulting binaries are not influenced by how your host system's gcc behaves. To cope with this, GCC can bootstrap itself -- it compiles itself multiple times: first with the host system's compiler, then with the intermediate compiler that it just produced.
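The bootstrap described in the last point is supported by GCC's own build system. A sketch (the source directory layout and installation prefix are assumptions):

```shell
# Sketch: GCC's multi-stage bootstrap
mkdir build && cd build
../gcc-src/configure --prefix=/tools
make bootstrap    # stage 1: built with the host compiler;
                  # stage 2: rebuilt with the stage 1 compiler;
                  # stage 3: rebuilt again and compared against stage 2
make install
```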

I also learned that the Linux from Scratch book has its own homepage that typically contains a more up-to-date version of the book.

In addition to the book (describing how to construct a basic system), there are a number of interesting sister projects, such as:

  • The Beyond Linux From Scratch (BLFS) book describes how to extend a bare bones LFS system with software packages to make users more productive, such as utilities, productivity software, desktop environments and server software.
  • The hints project contains various kinds of externally contributed documents describing how to extend or modify your LFS system.

Deploying Linux from Scratch installations


I have successfully deployed many kinds of LFS-based installations between 2001 and 2009.

If your goal is to learn about the composition of Linux systems, then I recommend faithfully following the book and manually typing in all the shell instructions. In my own experience, manually typing in the commands helps you process and understand the provided information.

However, if your goal is also to regularly use LFS-based installations as a basis for developing custom systems, then I would recommend automating the process to make it manageable.

Currently, there are several kinds of solutions available that can help you. For example, there is a sub project called Automated Linux from Scratch (ALFS), in which various kinds of automated solutions were developed.

Currently, jhalfs is the preferred implementation for automating the Linux from Scratch book. However, the main purpose of this tool is to extract and execute the shell instructions from the book (from the DocBook XML code, to be precise), not to facilitate custom installations.

Back in 2001, when I first started to experiment with the Linux from Scratch book, there were no automated solutions. As a consequence, I developed my own. My custom solution went through a number of iterations to become what it is now:

The big script


In the very beginning, I used to type in all commands manually. Quite quickly, I realized that this process is quite tedious if you have to repeat it frequently -- some packages take a long time to build, forcing you to wait behind your computer. For example, gcc and glibc each took over an hour to build on my computer in 2001.

Another drawback is that I sometimes made mistakes, e.g. typos or not faithfully following the versions of the packages described in the book (e.g. glibc 2.3.1 instead of glibc 2.2.5). The consequence of these mistakes was that I typically had to reformat my LFS partition and start over again.

The first form of automation that I used was quite simple -- rather than typing the shell instructions into the console, I put them in a giant shell script. After typing in all the commands, I executed the script in one go.

(As a sidenote: to be accurate, my solution consisted of two monolithic scripts. The first script constructed the bootstrap system. After completing the first script, I had to change the root directory to the LFS partition by using the chroot command. In the chroot environment, I ran the second script that executed the remaining steps.)

Although my technique was not particularly elegant, using a script was a big help in the construction of my own basic LFS system. While I could not prevent mistakes, the process became repeatable: it was no longer required for me to sit behind my computer for hours. When I made a mistake, I could correct the problem in the shell script, discard my broken LFS installation and repeat the process.

Scripts and sequences


I told various people about my successful LFS experiences, including my cousin. He wanted to borrow my scripts to see if he could roll out LFS on his computer. He was not successful, due to differences between his machine and mine. Because my script was huge, he also told me that it was quite hard to read and modify.

I took that feedback into account -- to improve readability and maintainability, I split my monolithic shell script into small parts.

The Linux from Scratch book is organized into many sections: for example, each package build and each configuration step (e.g. generating a configuration file) is described in its own section, giving the reader a pretty good overview of the components that a system consists of and the progress that is made.

I have decided to organize my refactored scripts in a similar way: each package and configuration step was stored in a separate file. A coordinating shell script (that I would eventually call a sequence script) was used to call these individual scripts in the required order.

In addition to better code quality, this new organization gave me another benefit -- I could now more easily automate the deployment of the additional packages that I wanted to use on top of the base system.

Previously, after successfully constructing a bare bones system with my monolithic shell script, I was still compiling additional packages by hand, such as the X Window System, various image decoding libraries and Window Maker -- a window manager providing a desktop experience similar to NeXTSTEP. Although their deployment procedures are not too complicated, it was still time consuming to repeat it.

With my new organization, I ended up automating the construction of all my packages, including the ones that were not part of the Linux from Scratch book.

As a result of having more convenient automation, I was also able to greatly extend my collection of custom packages -- I ended up packaging many utilities, networking software, the entire GNOME and KDE desktops and various end-user applications, such as XMMS, MPlayer and The GIMP.

Making the process interruptable and resumable


Although my refactorings made it possible to automate package builds more easily and get more software packages deployed, I also became worried about the growing time costs.

In the beginning, deploying the base system took considerably longer than deploying the custom package collection, but that gradually changed. After adding desktops (such as GNOME and KDE) to my package collection, deploying the custom packages took significantly longer than the base system. When I added the Mozilla Application Suite, another five hours of build time were added.

As a consequence of these additions, the time that it took to build the custom package sequence also became much more unpredictable compared to the base system.

I used time estimations to plan the right moments to leave my computer to build my system unattended. For example, when my package collection was still small, I knew that a build would take roughly four hours -- I was able to start the build before going to a side job, and when I got back, the construction of my system would be finished.

I developed the nasty habit of leaving my computer switched on at night to build packages. My computer was located in the back of my bedroom, so this habit was negatively affecting my quality of sleep. :)

Then I came up with another feature for my build scripts: making the execution of a sequence of scripts interruptable and resumable.

To implement this functionality, I had to reorganize my scripts. I developed a tool that does the orchestration -- it processes a whitespace-separated string specifying the scripts to execute. Rather than having scripts that directly execute commands, I used functions to encapsulate their execution.

The tool keeps track of the scripts that have been executed. I could set a breakpoint after the execution of a specific script in a sequence so that the process would stop. When starting the sequence another time, it skips the scripts that were previously executed and resumes the process where it was interrupted. This interrupt feature was immensely helpful in making me sleep better.
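
The core of such a resumable runner can be sketched in a few lines of shell. This is a simplified illustration, not CBT's actual implementation; the state file and function names are hypothetical:

```shell
#!/bin/bash -e

# Simplified sketch of a resumable sequence runner. The state file
# records every script that completed, so a re-run skips them.
statefile=.sequence-state

runSequence()
{
    for script in $1
    do
        # Skip scripts that a previous run already completed
        if grep -qx "$script" "$statefile" 2>/dev/null
        then
            continue
        fi

        bash "$script"
        echo "$script" >> "$statefile"

        # Honor a requested breakpoint: stop after the current script
        if [ -f .breakpoint ]
        then
            echo "breakpoint reached after: $script"
            return
        fi
    done
}
```

Interrupting and resuming then amounts to creating or removing the breakpoint marker and simply re-running the sequence.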

Optional execution of scripts


Some time later, I ran into another interesting use case. For a while, I only deployed my custom LFS installation to my desktop machine. Later, I also wanted to use another, less powerful, PC as a router and file server. For this machine, I also wanted to use my own custom Linux distribution.

Because this second machine had a different purpose, I only wanted to deploy a subset of the packages to it. For example, it was not necessary to deploy any desktops, such as KDE or GNOME.

To make smaller deployments possible, I added a new feature to my build tool: optional sequences, in which only selected scripts are executed. I also developed another tool allowing you to configure which scripts should be executed in a sequence.

Adding simple package management


Earlier in my blog post, I have explained that one of the drawbacks of deploying source packages is that the life-cycle of the package is not managed by the host system's package manager.

After using my custom built Linux distribution for a while, I discovered the tgz.txt hint in the Linux from Scratch hints sub project, which provided me with a very easy and practical solution for fully managing the life-cycle of a package: using checkinstall in combination with the Slackware package manager.

checkinstall is a tool that you typically run in combination with the make install command.

checkinstall internally uses another tool called installwatch, which records file modifications while a process executes. installwatch preloads a custom library that intercepts a number of library calls that perform file modifications.

checkinstall uses the files identified by installwatch during the installation process to automatically create a package for the host system's package manager (Slackware, RPM or Debian). Finally, the package is deployed with the host system's package manager so that its life-cycle can be managed.
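
In practice, the workflow looks roughly like this -- an illustration only, since the exact flags vary per checkinstall version:

```shell
# Build the package normally, then let checkinstall monitor the
# installation step and wrap the recorded files in a package.
./configure --prefix=/usr
make

# checkinstall runs "make install" under installwatch and turns the
# recorded file set into a package, which it then installs
checkinstall make install
```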

The Slackware package manager is not the most feature-rich or powerful package manager out there. For example, it has no notion of dependencies. Compared to the other two package managers that checkinstall supports (RPM and Debian) it is the most primitive from a feature perspective.

However, despite its limitations it does have an advantage for Linux from Scratch users -- the Slackware package manager only relies on a Bourne compatible shell and common shell utilities, such as tar. As a result, it can be used quite early in the construction process of a system and you do not need many additional software packages beyond the base system.

If you want to use RPM, you need many dependencies deployed first (such as Berkeley DB, zlib, GNU Privacy Guard etc.) before RPM itself can be built from scratch. If you want to use checkinstall and RPM together, you need to build all these packages twice.

To automatically create Slackware packages for all relevant components, I developed an abstraction: in each script, I only need to specify the build and installation procedure. The installation procedure is automatically monitored by checkinstall so that a Slackware package is automatically created.

Abstractions


After adding a primitive package management solution, I did not make any substantial feature changes to my scripts and tools. I did improve their quality by developing additional abstractions to make the process more convenient. By this time, I had learned much more about shell scripting.

I noticed that some of the shell commands in my scripts were quite repetitive. For example, when compiling a package, I always had to extract the tarball, open the source code directory, apply patches and remove the build directory after completion. In the Linux from Scratch book, most of these details are also typically left out of the instructions.

Eventually, these details were abstracted away in my build script collection. By using these abstractions, I only had to specify the name of the source tarball, some other metadata, the build procedure and the installation procedure for most conventional packages. The remaining steps (such as extracting the tarball, opening the build directory and removing the build directory) were automatically executed by the tool.
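
The essence of such an abstraction can be captured in a small shell function. The following is a simplified sketch of the idea -- not CBT's real deploySourcePackage; patch handling and checkinstall integration are left out:

```shell
#!/bin/bash -e

# Simplified sketch of a housekeeping abstraction: the calling script
# only defines $src, buildPhase and installPhase.
deploySourcePackage()
{
    local builddir
    builddir=$(basename "$src" .tar.gz)

    tar xzf "$src"       # extract the source tarball
    cd "$builddir"       # open the build directory
    buildPhase           # package-specific build steps
    installPhase         # package-specific install steps
    cd ..
    rm -rf "$builddir"   # clean up the build directory
}
```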

Deploying packages and customizations on conventional Linux distributions


Originally, my build tool was developed to deploy custom Linux distributions based on Linux from Scratch.

At some point during my studies, I had a side job where I was a part time software developer and system administrator. In this job, I had to configure a LAMP stack (Linux, Apache, MySQL, PHP) for hosting my custom developed web applications.

I have used various kinds of Linux distributions, such as Mandrake, Slackware, Fedora Core and Ubuntu. I already knew that it was a bad idea to deploy Linux distributions completely from scratch for production systems that also had to be maintained by my colleagues -- there is too much complexity involved in managing a custom built Linux distribution and it is too time consuming.

Although using a conventional distribution saved me time, it was unavoidable for me to still compile certain packages from source code. For example, my Slackware distribution included a copy of the Apache HTTP server, but it was an old version and it did not include a PHP installation with all the database extensions that I wanted. Moreover, the included MySQL package was also old. As a result, I had to compile these packages from source code, because there were no alternatives.

I noticed that the process of building custom packages from source code on a conventional Linux distribution was almost identical to building custom packages on my Linux from Scratch installation.

To automate custom package builds in my side job, I separated the build tool from the distribution so that it can also be used on top of conventional Linux distributions. I also made additional checkinstall configuration properties configurable, for example, to let checkinstall produce an RPM or Debian package rather than a Slackware package.

Naming my build tool


Back in 2001-2009, my build tool did not have its own name -- I named it after my custom Linux distribution, which had various names over the years. I will not go into detail about that, but it may be an interesting topic for a future blog post. :-)

Because the tool can also be used independently, I have decided to give it its own name: CBT (Conservative Build Tool).

Why is it called CBT? It is a very simple tool that is implemented for one kind of job only: building systems from scratch. It does not do full package management -- package deployment is a much more complicated problem than only building a system.

For example, upgrading already deployed systems is a much harder problem than deploying a system from scratch -- when upgrading a package, you also need to take its dependencies into account and make sure that they do not break. Moreover, ensuring reproducible builds is also complicated and not something the tool helps you with. There are much better tools out there that take care of these additional deployment tasks.

CBT does not have the intention to change to address these additional issues and become a more advanced tool. That is why it is called conservative -- it resists change.

A quick demonstration of its features


As explained earlier in this blog post, in my own build solution I came up with the idea to define build processes as sequences of scripts.

The following file shows an example of a script:

#!/bin/bash -e

source $cbtFunctionsDir/deploySourcePackage
source $cbtBaseDir/settings/testsequence

name=hello
version=2.1.1
group=Tools/Development
description="GNU Hello"
src="$name-$version.tar.gz"

showLongDescription()
{
    cat << "EOF"
The GNU Hello program produces a familiar, friendly greeting. GNU Hello
processes its arguments list to modify its behavior, supports greetings in many
languages, and so on.

The primary purpose of GNU Hello is to demonstrate how to write other programs
to do these things; it serves as a model for GNU coding standards and GNU
maintainer practices.
EOF
}

buildPhase()
{
    ./configure --prefix=/usr
    make
}

installPhase()
{
    make install
}

The purpose of the above script is to build an example package, GNU Hello, from source code and install it under the usual directory prefix: /usr.

The script defines the following properties:

  • The script includes a file, deploySourcePackage, that provides a function abstraction conveniently managing the build process, relieving me from specifying steps such as unpacking the tarball, opening the build directory and removing the build directory.
  • The name attribute is mandatory and specifies the name of the script.
  • The group attribute is mandatory and specifies the group where the script belongs to.
  • The description attribute specifies a description of the package that is used in the selection menu.
  • The src attribute refers to the source tarball that the build needs. The build abstraction function deploySourcePackage will automatically unpack the tarball and open the resulting build directory.
  • The showLongDescription function generates a long description describing the package in more detail. This long description is also added to the package's metadata.
  • The buildPhase specifies all command-line instructions that need to be executed to build the package from source code. The build infrastructure will also run these commands as an unprivileged user in a protected build directory to prevent a build script from modifying the host system.
  • The installPhase specifies all command-line instructions that need to be executed to install the package. These instructions are executed by checkinstall so that a Slackware package is automatically created and deployed.

The following file shows a partial example of a sequence (sequences/extras):

#!/bin/bash -e

optional="true"

scripts="\
Libraries/Multimedia/smpeg \
Libraries/Multimedia/SDL_mixer \
Libraries/Multimedia/SDL_image \
Libraries/Multimedia/SDL_net \
Applications/hello \
Applications/dia \
Applications/OOo \
Games/supertux \
Games/tuxracer \
Games/duke3d \
...
"

The above file (sequences/extras) is a shell script that sets two variables:

  • The optional variable specifies whether the scripts inside the sequence are optionally executed. By default all scripts are disabled and need to be enabled by the user. If this property is false, then all scripts are considered mandatory and are executed by default unless they are disabled by the user.
  • The scripts variable contains a whitespace separated list of paths to scripts that need to be executed. Applications/hello refers to the script shown in the previous example.

I can use the following command to display a TUI allowing me to select the scripts that I want to execute in a sequence:

$ cbt-cfg-seq sequences/extras

First the TUI will show me the available groups:


After selecting a group, I can select the scripts that I want to execute:


After the selection is complete, I can run the following command to execute my selected scripts in the sequence shown earlier:

$ cbt-run-seq sequences/extras

While running the above command, the first terminal shows me the scripts that are currently executed (I can switch to it with the Alt+F1 key combination). The second terminal (that can be displayed with the key combination: Alt+F2) shows me the actual build output.

If the execution of the sequence takes too long, I can set a breakpoint. For example, to set a breakpoint after the current script in execution I can run:

$ cbt-break-next sequences/extras

In addition to the currently executed script, I can also set a breakpoint after any script in the sequence. Running the following command shows a TUI allowing me to select where to set the breakpoint in the sequence:

$ cbt-break-seq sequences/extras

I can clear the breakpoint with the following command:

$ cbt-clear-brk sequences/extras

In 2001-2009, I used my build tool to roll out up-to-date Linux from Scratch distributions. I only maintained a single collection of source packages; when I updated packages, I discarded the older versions.

However, in 2026, my goal is to use CBT to deploy various kinds of legacy distributions for retro-computing purposes, instead of up-to-date Linux from Scratch distributions. I now store all my source packages (typically multiple versions of them) in a single catalog.

Sometimes, it can be useful to query the required external files, such as tarballs and patches, that you need to compile packages in a sequence so that you can transfer these files to an installation medium such as a CD-ROM or USB flash drive.

For this purpose, I have developed a new tool that helps me query the required files:

$ cbt-seq-deps sequences/extras
d/dos2unix/dos2unix-3.1.tar.gz
d/unix2dos/unix2dos-2.3.tar.gz
h/hexedit/hexedit-1.2.3.src.tar.gz
S/SDL_mixer/SDL_mixer-1.2.5.tar.gz
S/SDL_image/SDL_image-1.2.4.tar.gz
S/SDL_net/SDL_net-1.2.5.tar.gz
C/ClanLib/ClanLib-0.6.5-1.tar.gz
C/ClanLib/ClanLib-0.6.5-1-glXGetProcAddressARB.patch
...

The above tool shows all the external files required to execute the scripts in the provided sequence. These files are derived from the files, src, srcs and patches variables of a script, or from any variable that is listed in the dependencyVariables variable.
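
For example, a script that needs a patch might declare its external files as follows. This is a hypothetical fragment following the variable conventions above; the package and patch names mirror the sequence listing:

```shell
# Hypothetical declaration of a package's external files; cbt-seq-deps
# would derive the tarball and patch from the src and patches variables.
name=ClanLib
version=0.6.5-1
src="$name-$version.tar.gz"
patches="$name-$version-glXGetProcAddressARB.patch"
```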

Discussion


As a teenager, I often felt that when I created something, it was the best thing in the world. Nowadays, I look at my own work more critically.

With CBT, it became doable for me to roll out my own customized systems based on Linux from Scratch and to deploy source-based packages on top of existing Linux distributions.

It gave me the following benefits:

  • It automates the process, making it repeatable.
  • It makes the process interruptable and resumable. This feature helped me to sleep better and to use my computer for work when I needed it.
  • It invokes checkinstall so that the life-cycle of deployed packages can be managed by the host system's package manager.
  • It makes it possible to customize my system by allowing me to select which scripts in a sequence are enabled or disabled.
  • It provides abstractions that perform common housekeeping tasks, saving me implementation time.

However, there are also many drawbacks. To name a few:

  • No notion of dependencies. Not knowing the exact dependencies of packages has a number of drawbacks. For example, you must manually order the scripts in a sequence in such a way that no dependency is missed.

    Moreover, packages may have mandatory and optional dependencies. If a mandatory dependency is missing, the build of a package will fail. CBT cannot ensure dependency completeness, because the dependencies are unknown.

    Furthermore, if you know the exact dependencies of a package, you can also optimize system deployments by executing builds of packages that do not depend on each other in parallel on multi-core/multi-processor machines.
  • No purity/reproducibility. A package's binary contents depend on the entire state of the system. This means that building a package today may give a different binary package than building the same package yesterday.

    In a previous situation, fewer packages may have been installed and certain configuration aspects may have been different. There are all kinds of artifacts on a system that may influence a build without you being aware of it. For example, a package may have an undeclared dependency on something that accidentally resides on the system.
  • Weak protection. Although the build phase runs as an unprivileged user, preventing the build script from touching anything outside the dedicated build directory, there is no protection against possible harm done in the install phase, which runs as the root user. While installing a package, files belonging to other packages (or to any of the users) may be removed or overwritten, affecting the operation of the system. In bad situations, you may have to redeploy your entire system from scratch.

The above drawbacks only state weaknesses related to deploying a system from scratch. Deployment becomes even more complicated if you intend to upgrade an already deployed system:

  • Builds are very likely to be impure when you upgrade an existing system. As a result, you cannot be sure that an upgrade delivers the same result as deploying a system from scratch.
  • As mentioned earlier in this blog post, most packages are installed in global namespaces, such as /usr/bin and /usr/lib. If you upgrade packages, then binaries in these directories will be replaced. Moreover, if you upgrade to an incompatible newer version of a package that other packages rely on, you may break the system. Although it is possible to have incompatible packages co-exist on a system (e.g. by installing them in separate directories), doing this is not trivial.
  • If an upgrade gets interrupted in the middle, you may end up with a system in an inconsistent state that is difficult to repair.

An interesting anecdote


Building customized Linux systems has been a valuable skill to me. In my early side jobs, it was immensely helpful for enhancing Linux server installations, because I typically had to deploy software packages that were not part of a Linux distribution's package repository.

In late 2007, I was looking for an assignment for my Master's thesis. I met Eelco Visser, who would become my Master's thesis and later my PhD thesis supervisor. One of the ongoing projects in his research group was software deployment related. One of his PhD students, Eelco Dolstra, developed the Nix package manager. By then, a small community had already formed around it. Moreover, a prototype Linux distribution was built around Nix: NixOS.

Meeting Eelco Visser and Eelco Dolstra confirmed to me that package management was a complex subject. I was very surprised and amazed to see that package management (and software deployment in general) was an academic research topic.

Moreover, I learned that there is much more we can do in the area of package management to make package deployments reliable, reproducible and efficient. For example, I learned that Nix executes package builds much more reliably, because all packages are stored in isolation in the Nix store and builds are performed entirely as an unprivileged user.

Because of my past experience, I was very motivated to do research in this area -- so much so that, after the completion of my Master's thesis, there were hardly any doubts about becoming a PhD student doing research in distributed software deployment.

Moreover, the reason why I started my blog is to supplement my research with practical information.

Why not use Nix?


Since I am familiar with Nix, NixOS and other solutions in the Nix ecosystem, you may wonder why I am using my own primitive solution for this project rather than Nix.

This project is a retro-computing project targeting Linux deployments from early 2002, and my tool predates Nix -- Nix did not exist in 2002 yet; its development started in 2003.

Furthermore, in 2003 Nix was only a research prototype. It took a few years before it became mature enough to be used from a practical point of view.

Using a modern Nix version is also not an option -- it uses modern C++ features that cannot be compiled with the old GCC version that my custom Linux system includes. Moreover, backporting Nix is anything but trivial -- Nix is not a small code base and has many complex features.

Moreover, even if I could backport a modern Nix implementation, there are also substantial costs in using its features -- for example, it requires a substantial amount of RAM to evaluate complex package sets, and my retro PC does not have enough RAM for that.

Despite its limitations, a strong advantage of CBT is that it does not require many system resources or complex dependencies.

Conclusion


In this blog post, I have explained my motivation for building customized Linux distributions and described the most important ingredient to make that process doable: my own (somewhat primitive) custom build tool.

I actively used this build tool between 2001 and 2009 to deploy customized Linux distributions based on Linux from Scratch. For many years, I used my customized Linux distribution on most of my personal systems.

After not having touched the tool in years, I have revived it for my retro-computing experiment, with the purpose of building a custom Linux distribution from early 2002 for my retro PC. Moreover, I have also used CBT to automate package builds on my Slackware 8.0 installation.

Since I have a use case for it and I invested a considerable amount of time and effort in it in the past, I have decided to release CBT on my GitHub page. Back in 2001-2009, I always had the intention to release the tool and the distribution as free and open-source software, but the younger version of me always procrastinated on one thing: writing decent end-user documentation. I have decided to fix that now.

Although I have not deployed any modern Linux from Scratch installations in years, and I have little motivation to do so now, I can still strongly recommend the Linux from Scratch book to anyone who wants to learn about the structure of Linux systems and how to deploy software from source code.

In my next blog post, I will show the legacy Linux distribution that I constructed for my retro PC.
