
Tuning loops – from loop unrolling to Duff's device
Algorithms are quite literally full of loops: repetition constructs that recycle code under the control of an iterating variable. In artificial intelligence, machine learning, and computer graphics, these constructs are even more prevalent (if that is even possible!), with matrix operations commonly expressed as nested loops of scalar operations indexed over a matrix's dimensions.
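Matrix multiplication is the canonical example: Each element of the result is a dot product, and the whole operation falls out as three nested loops over the matrices' dimensions. Listing 1 is a minimal C sketch of the pattern; the function name and the square, row-major layout are illustrative choices of mine, not drawn from the sources cited here.

Listing 1: Matrix Multiplication as Nested Loops

#include <stddef.h>

/* Naive matrix multiply: C = A * B, all n x n, stored row major.
   Each element of C is a dot product computed by the innermost loop. */
void matmul(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

The innermost statement executes n-cubed times, which is exactly why the body of a loop dominates an algorithm's running time.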
Because of the outsize effect a loop's body has on an algorithm's performance, a multitude of strategies to optimize loop execution time have emerged over the years. Often coupled to the hardware, and always dependent on the programming model and compiler in use, these optimizations are so well understood that they carry their own names.
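To make this concrete, Listing 2 shows the first technique named in this article's title, loop unrolling, applied to a simple array sum. The unrolling factor of four and the function name are arbitrary illustrative choices on my part; the payoff is fewer loop-counter increments and branch instructions per element processed, at the cost of larger code.

Listing 2: Loop Unrolling by a Factor of Four

#include <stddef.h>

/* Sum an array with the loop body unrolled four times.
   The cleanup loop handles the leftover elements when n is
   not a multiple of four. */
double sum_unrolled(const double *a, size_t n)
{
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)   /* cleanup loop */
        sum += a[i];
    return sum;
}

Modern compilers apply this transformation automatically at higher optimization levels, which is one reason the definitions in the taxonomy below are so precise.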
Loopy
David Spuler's recent works [1][2] feature the taxonomy of loop manipulation strategies that most closely reflects my own thinking. Having read through David's admirable effort to bring encyclopedic order to the many code optimization strategies available, I am narrowly focusing this article on loop tuning. I must note that optimization should always be performed after the code is working correctly and has been deemed fit for purpose: Sir Tony Hoare's famous quote that "premature optimization is the root of all evil" sums up this point of view neatly [3]. The reasons for such a statement are particularly apparent in the many mangling strategies one can choose to apply to loops (Table 1). These strategies have emerged over the decades, and their definitions can be very narrow and specific because many are meant to be applied automatically by optimizing compilers.
Table 1: Loop manipulation strategies
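How far can this mangling go? The other technique named in this article's title, Duff's device, is the extreme case: Tom Duff's 1983 construct interleaves C's switch and do-while statements so that a single construct performs an eight-way unrolled copy while dispatching the leftover count % 8 iterations on the first pass. Listing 3 is a sketch adapted to a memory-to-memory copy (Duff's original wrote to a memory-mapped output register, so his destination pointer did not advance); the function name is mine.

Listing 3: Duff's Device

#include <stddef.h>

/* Eight-way unrolled copy: the switch jumps into the middle of
   the do-while body to dispose of the count % 8 leftover bytes
   on the first pass; every later pass copies a full eight. */
void duff_copy(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;
    size_t n = (count + 7) / 8;   /* total passes through the body */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

The construct is legal C precisely because case labels are permitted inside the statement controlled by the switch, even within a nested loop body.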
