Lead Image © Author, 123RF.com

Debian's quest for reproducible builds

Bit by Bit

Article from ADMIN 36/2016
Debian's reproducible builds project aims to meet strict security requirements for the binary packages in its archives by producing bit-for-bit identical binary packages.

A question asked more and more often is whether you can trust software at all. If you consider the backdoors demanded by state authorities and the software companies that comply, or the army of criminal hackers trying to foist malicious software on users, your answer might be, "No."

Usually you can trust a distribution to deliver packages that correspond to the source code from which they were built. These packages are difficult to tamper with because the content of the archives bears the signature of the respective package maintainer's GPG key. However, these safeguards do not always work. For example, Linux Mint recently fell victim to an attack in which a manipulated installation image was delivered to its users.

Although Debian sets the bar high, some developers asked themselves several years ago what they could do to further improve security. The resulting idea: Users can check at home, bit-by-bit, whether a package corresponds to the underlying source code. As early as the turn of the millennium, some initial suggestions for reproducible binary packages appeared on the Debian Developers list, but the idea was dismissed as infeasible.

Still Experimental

The project, still in the experimental phase, has revived this basic idea under the name reproducible builds [1]. After about two years of intensive work, the aim is for Debian to be buildable in at least a partially reproducible way by 2017, with Debian 9 "stretch." As a final target, the developers want every package to build reproducibly and the tools created specially for this purpose to become part of the Debian infrastructure, so that reproducibility is maintained in the future.

The clear promise of the reproducible builds project is this: Anyone, at any time, can build binary packages from a given source that are identical bit for bit (Figure 1). If things go the way the creators of reproducible builds intend, the future definition of free software will include mandatory reproducibility (Figure 2).

Figure 1: The promise of the project: always reproducible, identical binary packages [1].
Figure 2: The definition of free software should include bit-precision reproducibility. (Slide by Holger Levsen [2])
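At its core, the promise comes down to a byte-for-byte comparison of two independent builds. The following Python sketch shows one way to check that. It hashes each build artifact with SHA-256 and compares the digests. The function names are hypothetical and are not taken from any Debian tool.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: Path, second: Path) -> bool:
    """True only if two build artifacts are identical bit for bit."""
    return file_sha256(first) == file_sha256(second)
```

Comparing digests instead of raw files means a user only needs a published checksum to verify a local rebuild, not a second copy of the package. Any single differing bit, such as an embedded timestamp, produces a different digest.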

Broad Test Base

Debian's reproducible builds team continuously tests the Debian "Testing," "Unstable," and "Experimental" branches. From the results, it not only compiles extensive statistics [3] but also publishes a weekly news blog [4]. Besides Debian, other distributions are working toward reproducible binary packages, including Arch Linux, Fedora, Subgraph OS, and Tails. Similar initiatives exist in the BSD world (NetBSD, FreeBSD), as well as in projects such as Coreboot, OpenWrt, F-Droid, and Guix.

Debian uses 30 host machines to create reproducible builds. All told, around 180 CPU cores and more than 300GB of RAM are available for amd64, i386, and armhf builds (Figure 3). The tests are managed by Jenkins [5], a continuous integration server. In total, the project has developed 41 scripts with around 10,000 lines of code.

Figure 3: The project tests under various conditions, including other distributions. (Slide by Holger Levsen [2])

As part of the tests, the system builds around 10,000 packages twice. In May 2016, 21,365 of a total of 24,135 source packages in Debian were reproducible, an average of 88.5 percent across the individual branches. "Testing," at more than 90 percent, does better than "Unstable" and "Experimental."

As of this writing, 1,489 packages still failed to build reproducibly, and the cause was known for two thirds of them. Nevertheless, the project is still far from its target: Holger Levsen, one of the Debian developers working on reproducible builds, said in a recent lecture [6] that, realistically, zero percent of Debian can currently be built reproducibly. So far, reproducible builds has been purely a feasibility study (Figure 4).

Figure 4: The project is still experimental, but 90 percent of the Debian packages can already be built reproducibly. (Slide by Holger Levsen [2])
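The build-twice test and the resulting percentages can be summarized in a few lines. The following Python sketch assumes a hypothetical input format: a mapping from source package name to the pair of hashes from the first and second build. The real test setup runs on Jenkins with the project's own scripts, and this code does not reproduce them.

```python
from typing import Dict, List, Tuple

# Hypothetical input: source package -> (hash of build 1, hash of build 2).
BuildPair = Tuple[str, str]


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


def reproducible_share(results: Dict[str, BuildPair]) -> float:
    """Percentage of packages whose two builds produced the same hash."""
    if not results:
        return 0.0
    same = sum(1 for first, second in results.values() if first == second)
    return percent(same, len(results))


def unreproducible(results: Dict[str, BuildPair]) -> List[str]:
    """Sorted names of packages whose two builds differ."""
    return sorted(name for name, (a, b) in results.items() if a != b)


# The May 2016 figures from the text: 21,365 of 24,135 source packages.
print(f"{percent(21365, 24135):.1f} percent")  # prints "88.5 percent"
```

The last line checks the article's arithmetic. 21,365 out of 24,135 packages works out to about 88.5 percent.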

Identical Build System

The most important prerequisite for reproducible builds is a meticulous record of the build system. During the initial build, the system must store details of the tools used to create the binary package. To do so, it writes this data to a new file named .buildinfo within the package structure. In subsequent builds, the srebuild Perl script [7] reads this data back and ensures that an identical build system is set up from snapshot.debian.org [8].
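The idea of reading recorded tool versions back and checking them against the current system can be sketched in Python. This is not srebuild, which is written in Perl, and it does not fetch anything from snapshot.debian.org. The sample file uses the Installed-Build-Depends field name from the .buildinfo format later documented in dpkg's deb-buildinfo(5). The format was still in flux in 2016, and all package names and versions below are made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical .buildinfo excerpt; versions are invented for illustration.
SAMPLE_BUILDINFO = """\
Format: 1.0
Source: hello
Version: 2.10-1
Architecture: amd64
Installed-Build-Depends:
 gcc-5 (= 5.3.1-14),
 libc6-dev (= 2.22-7),
 make (= 4.1-6)
"""

# Matches entries such as "gcc-5 (= 5.3.1-14)," and captures name and version.
DEP_RE = re.compile(r"^\s*(\S+)\s+\(=\s*([^)]+)\)\s*,?\s*$")


def parse_fields(text: str) -> Dict[str, str]:
    """Parse one deb822-style stanza; indented lines continue the previous field."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith((" ", "\t")) and current is not None:
            fields[current] += "\n" + line.strip()
        elif ":" in line:
            current, _, value = line.partition(":")
            fields[current] = value.strip()
    return fields


def build_environment(fields: Dict[str, str]) -> Dict[str, str]:
    """Map package name -> exact version recorded in Installed-Build-Depends."""
    env: Dict[str, str] = {}
    for entry in fields.get("Installed-Build-Depends", "").split("\n"):
        match = DEP_RE.match(entry)
        if match:
            env[match.group(1)] = match.group(2).strip()
    return env


def mismatches(recorded: Dict[str, str],
               current: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Packages whose current version differs from the record (None = missing)."""
    return {
        name: (version, current.get(name))
        for name, version in recorded.items()
        if current.get(name) != version
    }
```

An empty result from `mismatches` means the local tool versions match the record. Any entry names a package that would have to be installed at the recorded version, for example from a snapshot archive, before a rebuild could be expected to match.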

Without the details of the build system, a package cannot be reliably reproduced: Even slightly different instructions, such as changed compiler flags, lead to different results at the bit level. The catch is that the extended package structure with .buildinfo adds around 100,000 files to each Debian suite, increasing the amount of data by up to 50 percent. To reduce the load on the mirror servers, this additional information will therefore reside only on Debian's own servers.
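How a single changed option alters the output bits can be shown with a toy example. The following Python sketch uses gzip compression as a stand-in for a real build. The compression level plays the role of a compiler flag, and the timestamp is fixed with `mtime=0` so it cannot interfere. The `toy_build` function is hypothetical and unrelated to Debian's build tools. It only illustrates the principle.

```python
import gzip
import hashlib

# Stand-in "source": identical input bytes for every build.
SOURCE = b"int main(void) { return 0; }\n" * 200


def toy_build(source: bytes, level: int) -> bytes:
    """Hypothetical build step: gzip with a fixed timestamp (mtime=0).

    The compression level stands in for a compiler flag.
    """
    return gzip.compress(source, compresslevel=level, mtime=0)


def digest(blob: bytes) -> str:
    """Hex SHA-256 digest of a build result."""
    return hashlib.sha256(blob).hexdigest()


same_a = digest(toy_build(SOURCE, 9))
same_b = digest(toy_build(SOURCE, 9))
other = digest(toy_build(SOURCE, 1))

print("same flags, identical output:", same_a == same_b)      # prints True
print("different flag, identical output:", same_a == other)   # prints False
```

Both outputs decompress to the same source, yet their bytes differ. That is why the exact build system has to be recorded, not just the source code.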
