
Tuning loops – from loop unrolling to Duff's device
Algorithms are quite literally full of loops: repetition constructs that recycle code under the control of an iterating variable. In artificial intelligence, machine learning, and computer graphics, these constructs are even more prevalent (if that is even possible!), with matrix operations commonly expressed as nested loops of scalar operations indexed over a matrix's dimensions.
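Matrix multiplication is the canonical example: Each element of the result is a dot product, and the whole operation falls out as three nested loops over the matrices' dimensions. Listing 1 is a minimal C sketch of the pattern; the function name and the square, row-major layout are illustrative choices of mine, not drawn from the sources cited here.

Listing 1: Matrix Multiplication as Nested Loops

#include <stddef.h>

/* Naive matrix multiply: C = A * B, all n x n, stored row major.
   Each element of C is a dot product computed by the innermost loop. */
void matmul(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

The innermost statement executes n-cubed times, which is exactly why the body of a loop dominates an algorithm's running time.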
Because of the outsize effect a loop's body has on an algorithm's performance, a multitude of strategies to optimize loop execution time have emerged over the years. Often coupled to the hardware, and always dependent on the programming model and compiler in use, these optimizations are so well understood that they carry their own names.
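To make this concrete, Listing 2 shows the first technique named in this article's title, loop unrolling, applied to a simple array sum. The unrolling factor of four and the function name are arbitrary illustrative choices on my part; the payoff is fewer loop-counter increments and branch instructions per element processed, at the cost of larger code.

Listing 2: Loop Unrolling by a Factor of Four

#include <stddef.h>

/* Sum an array with the loop body unrolled four times.
   The cleanup loop handles the leftover elements when n is
   not a multiple of four. */
double sum_unrolled(const double *a, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)   /* cleanup loop */
        sum += a[i];
    return sum;
}

Modern compilers apply this transformation automatically at higher optimization levels, which is one reason the definitions in the taxonomy below are so precise.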
Loopy
David Spuler's recent works [1][2] feature the taxonomy of loop manipulation strategies that most closely reflects my own thinking. Having read through David's admirable effort to bring encyclopedic order to the many code optimization strategies available, I am narrowly focusing this article on loop tuning. I must note that optimization should always be performed after the code is working correctly and has been deemed fit for purpose: Sir Tony Hoare's famous quote that "premature optimization is the root of all evil" sums up this point of view neatly [3]. The reasons for such a statement are particularly apparent in the many mangling strategies one can choose to apply to loops (Table 1). These strategies have emerged over the decades, and their definitions can be very narrow and specific because many are meant to be applied automatically by optimizing compilers.
Table 1: Loop manipulation strategies
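How far can this mangling go? The other technique named in this article's title, Duff's device, is the extreme case: Tom Duff's 1983 construct interleaves C's switch and do-while statements so that a single construct performs an eight-way unrolled copy while dispatching the leftover count % 8 iterations on the first pass. Listing 3 is a sketch adapted to a memory-to-memory copy (Duff's original wrote to a memory-mapped output register, so his destination pointer did not advance); the function name is mine.

Listing 3: Duff's Device

#include <stddef.h>

/* Eight-way unrolled copy: the switch jumps into the middle of
   the do-while body to dispose of the count % 8 leftover bytes
   on the first pass; every later pass copies a full eight. */
void duff_copy(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;
    size_t n = (count + 7) / 8;   /* total passes through the body */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

The construct is legal C precisely because case labels are permitted inside the statement controlled by the switch, even within a nested loop body.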
