Lead Image © Lucy Baldwin, 123RF.com

Baselines are more important than the benchmark

Witness Mark

Article from ADMIN 71/2022
Defining I/O baselines helps you determine the highest performance you can expect from your system when configured properly.

"It is too slow": These four words launch nearly every performance analysis. Slow is an ambiguous term that can represent very different concerns. It may mean a change in performance compared with a previous version of the software or with how the same software ran on some previous day. Equally, it may mean performance that falls short of what the system is believed to be capable of delivering.

I have examined this second definition before [1], studying how the library of CPU benchmarks published by the 7-Zip project [2] could be used to compare with observed CPU performance on my system. This month, I explore how to define baselines for I/O, including the storage and network subsystems. I/O paths are more vulnerable than CPU or memory to performance setbacks caused by system or software misconfiguration, because they rely on multiple components performing optimally or, in the case of the network, on external hops through the Internet itself.

The Network Is the Computer

Setting a baseline is essentially figuring out the highest performance that can be expected if the system on hand is properly configured. The simplest tool that reproduces the tested configuration is always to be preferred, to limit the multitude of variables under consideration. The application the system is meant for is usually a much richer and more complex beast, and if you are testing for performance, it has probably already shown undesirable behavior anyway.

The iperf3 tool [3] is your go-to utility to test a network path's baseline, end to end. Found in the Ubuntu Universe repository (install with apt install iperf3) and in Homebrew on macOS (brew install iperf3), it measures the effective bit rate between two systems (Figure 1) by taking multiple samples and averaging the results. By default, iperf3 uploads from client to server, but the reverse is also possible (-R option), sending data in the opposite direction to validate asymmetric network paths (see the "Hot-Potato Routing" box). Here, I have set up a server in the cloud (Figure 2) running Ubuntu Focal 20.04, while the client runs in a terminal on my MacBook as I write. Ports to Microsoft Windows also exist [5], and testing UDP instead of the default TCP is an option as well.
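As a rough sketch of that workflow, the shell helpers below wrap the client-side iperf3 invocations and average a series of rate samples. The host name iperf.example.net and the function names are placeholders of my own, not the setup shown in the figures; the -c, -R, -u, and -t switches are standard iperf3 options, and the server side is simply iperf3 -s.

```shell
#!/bin/sh
# Sketch only: SERVER is a placeholder. Point it at a host where
# "iperf3 -s" is already listening (TCP port 5201 by default).
SERVER="${SERVER:-iperf.example.net}"

# Default direction: the client uploads to the server for 10 seconds.
upload_test()   { iperf3 -c "$SERVER" -t 10; }

# Reverse mode: the server sends to the client, exercising the return path.
download_test() { iperf3 -c "$SERVER" -t 10 -R; }

# Same path over UDP instead of the default TCP.
udp_test()      { iperf3 -c "$SERVER" -t 10 -u; }

# Average a column of rate samples (one number per line on stdin).
# Prints nothing if no samples were supplied.
mean_rate() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }'
}
```

Nothing runs until you call one of the functions. Comparing upload_test against download_test on the same path is a quick way to spot an asymmetric route, and mean_rate can summarize rates you have collected across several runs.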

Hot-Potato Routing

Peering agreements between large networks may not specify traffic cost settlement or prescribe specific routes, granting operators the freedom to define paths between their networks. In such cases, "hot-potato routing" [4] often results, with a network handing over a packet to its peer network at the closest available peering point, to minimize network load and costs. When the peer network adopts the same practice, different routes may be in use for the two directions of a given network connection, with asymmetric paths through the Internet being not at all uncommon. Although not normally a concern and usually invisible to the end user, it is important to keep in mind while testing network connections.

Figure 1: iperf3 client in action. Note how network results can vary significantly over just a few seconds.
Figure 2: Running the iperf3 server in Digital Ocean's New York region to exercise a path to a known location.

Secure Testing

SSL/TLS connections are another common networking baseline case, because they place a significant compute load on the remote endpoint and because of the complexity of the server architectures involved. With the remote end often spanning multiple servers, an SSL benchmark is the ultimate test of what takes place in practice, as opposed to what theory predicts. The OpenSSL project incorporates two useful benchmarks: openssl speed [6], which tests cryptographic performance in the abstract on the local machine without any network access, and the s_time command [7], which performs an all-encompassing, end-to-end test. The s_time benchmark can be used to examine a server's capacity to handle connections in a given time span and how long it takes to serve secured content.

Figure 3 shows an example test with Google's search engine as the endpoint, historically one of the Internet's snappiest. Results are provided for new sessions as well as cached session IDs, which performed significantly better. Keep in mind that the test is against Google as a whole: their primary domain runs in an obvious round-robin setup (see the "Round-Robin DNS" box), and further schemes such as load balancers or Anycast routing [8] are probably in use.
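A test along these lines can be sketched with a few shell helpers. This is an illustration under my own assumptions rather than the exact commands behind Figure 3: the TARGET default, the 10-second duration, the cipher chosen for openssl speed, and the function names are all placeholders. The -connect, -new, -reuse, and -time switches are documented s_time options.

```shell
#!/bin/sh
# Sketch only: TARGET is an example endpoint; substitute the server under test.
TARGET="${TARGET:-www.google.com:443}"

# Local cryptographic throughput, no network involved.
local_crypto()    { openssl speed -evp aes-256-gcm; }

# Every connection negotiates a brand-new session (full handshake).
new_sessions()    { openssl s_time -connect "$TARGET" -new -time 10; }

# Connections reuse a cached session ID after the first handshake.
reused_sessions() { openssl s_time -connect "$TARGET" -reuse -time 10; }

# Connections per second from a connection count and elapsed seconds.
# Prints nothing if the elapsed time is not positive.
conn_rate() {
    awk -v c="$1" -v s="$2" 'BEGIN { if (s > 0) printf "%.1f\n", c / s }'
}
```

Running new_sessions and reused_sessions back to back separates the cost of the full handshake from the cost of serving a resumed session, and conn_rate turns the counts each run reports into a comparable figure.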

Round-Robin DNS

In a round-robin DNS setup, multiple addresses are offered as destinations for the same domain name. For example, today at my office in Westford, www.google.com resolves to six different IP addresses:

$ nslookup www.google.com
Non-authoritative answer:
Name:   www.google.com
Name:   www.google.com
Name:   www.google.com
Name:   www.google.com
Name:   www.google.com
Name:   www.google.com

The ordering of these answers will differ from query to query, and the DNS client will pick one address (likely the first) and connect to it. Whether repeated connections to the same domain name result in connections to the same IP is implementation dependent, something a clever performance engineer may well choose to disambiguate explicitly with an IP address, depending on the ultimate objective of the test.
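One way to disambiguate explicitly is to collect the answer addresses and time each one on its own. The sketch below assumes nslookup prints its results as "Address:" lines after a "Non-authoritative answer:" header; the function names are mine, and the loop handles IPv4 answers only. Whether a server treats a connection made by bare address the same as one made by name is something to verify for your target.

```shell
#!/bin/sh
# Sketch only: NAME is an example domain; the function names are illustrative.
NAME="${NAME:-www.google.com}"

# Print the answer addresses found in nslookup output on stdin, one per line.
# Lines before the answer section are skipped so the resolver's own
# "Address:" line is not mistaken for a result.
answer_addrs() {
    awk '/^Non-authoritative answer:/ { seen = 1; next }
         seen && /^Address:/          { print $2 }'
}

# Time new TLS sessions against each IPv4 answer separately.
# Needs network access; IPv6 answers are filtered out for simplicity.
time_each_addr() {
    nslookup "$NAME" | answer_addrs | grep -v ':' | while read -r ip; do
        echo "== $ip"
        openssl s_time -connect "$ip:443" -new -time 5
    done
}
```

Calling time_each_addr produces one s_time report per address, which makes it easier to tell a slow individual server from a slow service.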

Figure 3: Timing the ability of Google.com to serve SSL sessions, without differentiating whether one or multiple servers are answering the requests.

The Storage with the Most

I have covered a lot of I/O testing tools in this column over the years, and many can be used to define a baseline, which is a technique, rather than a tool. I/O systems are noisy, and benchmarking them can be exceedingly difficult. In some tests, it may be beneficial to eliminate storage altogether and perform the test directly against a RAM disk. Provisioning a half-gigabyte RAM disk is trivially simple in Linux:

# modprobe brd rd_nr=1 rd_size=524288
# ls /dev/ram0
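The rd_size parameter of the brd module is given in KiB, so 524288 corresponds to 512 MiB. A small sketch of the arithmetic, with hypothetical helper names of my own, makes it easy to size the RAM disk differently and to release the memory afterward:

```shell
#!/bin/sh
# Convert a size in MiB to the KiB value that brd's rd_size expects.
rd_size_kib() { echo $(( $1 * 1024 )); }

# Load one RAM disk of the requested size in MiB (requires root).
make_ramdisk() { modprobe brd rd_nr=1 rd_size="$(rd_size_kib "$1")"; }

# Unload the module once testing is done to free the memory (requires root).
drop_ramdisk() { rmmod brd; }
```

With these definitions, make_ramdisk 512 issues the same modprobe command shown above.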

You can use this approach to evaluate the performance effect of encrypted partitions without having to worry about the noise of the underlying storage medium, simply comparing the performance of RAM disk access with and without encryption. A similar, complementary technique is writing the file directly to disk, without the intervening filesystem layer affecting measurements, as I have demonstrated previously [9]. To test encryption without a filesystem, you have to use a detached header to store encryption keys, lest your benchmark accidentally overwrite them because they are stored on the same drive by default. Listing 1 details the setup process, resulting in /dev/ram0 directly accessing the RAM disk, whereas /dev/mapper/encrypted-ram0 is first encrypted by LUKS [10] before storing to the same memory. Listing 2 then shows the simplest possible benchmark, with dd [11] to compare block performance in both modes. The raw device performs roughly 3.7 times as fast as encrypted access to the same storage (784 MB/s versus 210 MB/s). The critical finding is that encryption will not be the performance bottleneck in this setup, as long as the storage medium is not capable of more than 210 MB/s of sustained throughput.

Listing 1

LUKS Encryption Overlay Set-up

root@focal:~# # Allocate a half-GB RAM disk
root@focal:~# sudo modprobe brd rd_nr=1 rd_size=524288
root@focal:~# ls /dev/ram0
root@focal:~# fallocate -l 2M header.img
root@focal:~# echo -n "not a secure passphrase" | cryptsetup luksFormat -q /dev/ram0 --header header.img -
root@focal:~# # Open ram0 as an encrypted device
root@focal:~# echo -n "not a secure passphrase" | cryptsetup open --header header.img /dev/ram0 encrypted-ram0
root@focal:~# ls /dev/mapper/encrypted-ram0

Listing 2

Encrypted vs. Plain Text

root@focal:~# dd if=/dev/zero of=/dev/ram0 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 0.535233 s, 784 MB/s
root@focal:~# dd if=/dev/zero of=/dev/mapper/encrypted-ram0 bs=4k count=100k
102400+0 records in
102400+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 1.99686 s, 210 MB/s
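The figures dd reports can be checked, and the two runs compared, with a little arithmetic. The helpers below are a sketch with names of my own choosing; they use decimal megabytes, matching the MB/s unit in dd's summary line.

```shell
#!/bin/sh
# Throughput in MB/s (decimal megabytes) from bytes copied and elapsed seconds.
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

# How many times longer the second run took than the first.
slowdown() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Values taken from Listing 2.
mb_per_s 419430400 0.535233   # raw RAM disk
mb_per_s 419430400 1.99686    # LUKS-encrypted mapping
slowdown 0.535233 1.99686     # encrypted run time relative to raw
```

Fed the numbers from Listing 2, the script prints 784, 210, and 3.7, so the encrypted path takes about 3.7 times as long to write the same 400 MiB.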

The Author

Federico Lucifredi (@0xf2) is the Product Management Director for Ceph Storage at Red Hat, formerly the Ubuntu Server Product Manager at Canonical, and the Linux "Systems Management Czar" at SUSE. He enjoys arcane hardware issues and shell-scripting mysteries and takes his McFlurry shaken, not stirred. You can read more from him in the O'Reilly title AWS System Administration.