Fast code gets faster, costs go lower, creativity is unlocked, and freedom rings.

In the public sphere, if not behind every closed door, the software community preempts any suggestion of a focus on performance with a ubiquitous and misunderstood quote from Donald Knuth: "Premature optimization is the root of all evil." While I have as little love for speculative micro-optimizations as Knuth, careful attention to software performance comes with significant benefits and is subject to feedback loops that can radically improve a project on several levels.
My own experience has been accrued in the field of 3D image processing, where the data are very large. A typical input is a 512³ voxel unsigned 64-bit image cutout (~1 GB RAM), one of hundreds of thousands of such tasks that together cover a much larger image. Without careful attention to the design of each algorithm, the run time of an individual task can stretch into hours while RAM usage explodes.
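To make the scale concrete, here is a minimal sketch of the memory arithmetic using NumPy (the array shape and dtype mirror the cutout described above; nothing else about the real pipeline is implied):

```python
import numpy as np

# One 512^3 cutout of unsigned 64-bit labels.
cutout = np.zeros((512, 512, 512), dtype=np.uint64)

# 512^3 voxels * 8 bytes/voxel = 1 GiB, before a single copy is made.
print(cutout.nbytes / 2**30, "GiB")
```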
There are certainly engineers with more experience in high performance computing than I. However, over the past few years, I’ve had a few small insights into the benefits of speedups and reduced memory pressure. Maybe you’ll find them useful.
Positive Feedback on the Flip Side of Amdahl’s Law
Amdahl’s Law, paraphrased in English, says that speeding up a section of code can improve the overall speed by at most the proportion of time that section takes to begin with. In other words, if a program spends 50% of its time in section A and 50% in section B, then even eliminating section B’s run time entirely results in at most a 2x improvement in the total run time, because section A is untouched. Amdahl’s Law is usually referenced with regard to parallelization of code with both serial and parallel components, but it applies equally to single threaded code that is only partially optimized.
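In its standard form, the law gives the overall speedup S obtainable by accelerating a fraction p of the run time by a factor s:

$$ S = \frac{1}{(1 - p) + \frac{p}{s}} $$

With p = 0.5, even letting s grow without bound caps S at 2, matching the example above.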
When I first encountered Amdahl’s Law, it had something of a pessimistic character. No matter how ingenious the solution for a given section of code was, its ultimate potential was limited by everything around it. After spending time improving algorithms and working iteratively with the profiler, I modified this view.
Speedups in one area make speedups in other parts of the code more valuable. To reuse the previous example, if section A takes 50% of the time, then any improvement I make to either section alone can at most double the total speed. If I double the performance of section A, it now takes up 33% of the total time, which means section B’s contribution has grown from 50% to 66%. If I speed up section A 4x, section B’s contribution becomes 80%. This magnifies the effect of additional optimizations in the rest of the program and makes contributors that previously seemed minor suddenly worthwhile to attack.
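A few lines of arithmetic make the effect easy to see (a throwaway sketch; the 50/50 split and the speedup factors are just the example above):

```python
# Two sections that initially take equal time (in arbitrary units).
a, b = 1.0, 1.0

for speedup in (1, 2, 4, 10):
    total = a / speedup + b
    print(f"A sped up {speedup}x: B is now {b / total:.0%} of the run time")

# A sped up 1x: B is now 50% of the run time
# A sped up 2x: B is now 67% of the run time
# A sped up 4x: B is now 80% of the run time
# A sped up 10x: B is now 91% of the run time
```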
Put another way, the more radically you increase the performance of your target section, the richer the field of potential new targets becomes. If you only increase the performance 2x versus 10x, your target region will continue to camouflage the other contributors. Your options for increasing performance will appear to be more of a muddle as your improved code continues to be a bottleneck.
You Become a Better Engineer
High performance is the territory where knowledge of algorithms and an intricate understanding of systems become crucial. While the stuff of day-to-day engineering is making things that work at all, solving physical problems of execution time, memory capacity, or resource contention forces a more systematic and practical study of the underlying scientific principles.
In engineering, math is mainly useful when you are starting to operate within striking distance of the limits of what your tools and materials can withstand. When building a typical LEGO tower, one doesn’t need to know the elastic modulus or tensile strength of the plastic bricks as they are so much stronger than the forces being applied. However, if you try to build a habitable structure out of them, you’ll need to know much more about their properties and environmental interactions.
The same goes for computing. When memory and computational loads start to reach the physical limits or interact with other elements of the system, planning and systematic analysis are called for. That’s when you are forced to consider the literature, calculate worst case loads, and tweak algorithms. Furthermore, if you study a problem intensively enough, you might discover something new about its general character and become a scientist.
High performance computing (HPC) is far from the only topic area that produces improved skills. For example, natural language processing, search, motion planning, and computer vision are only tangentially related to performance. However, HPC is very accessible to practicing engineers and its implications touch on every project.
Unlocking Creativity and Avoiding Problems
Every program has an implied time budget in developer patience, user patience, energy, and cost. For the developer, faster code allows more rapid iteration on the surrounding design. For the end user there is reduced program latency, and so the program gains the opportunity to transition into a natural and flexible extension of their will.
Increased efficiency also enables the addition of features that would have otherwise been unimaginable. If a library is improved only to the point that it meets the current performance criteria and no more, as soon as the problem changes, adding features becomes burdensome. Any significant decrease in the performance profile immediately moves the operating regime into the yellow or red. Especially in scientific or other pioneering work, a successful first approach will give way to the desire for more complex methods. Those complex methods may require additional headroom to fit.
As an example, I worked on improving large scale skeletonization, a method for extracting stick figures from 3D labeled images of neurons. I estimated that an early but serious attempt would have required about $500,000 in rented computation time on our largest dataset. After about two years of attacking the problem on and off, the cost was reduced to less than $1,000 purely through improved software. This allowed us to run the algorithm whenever we wanted, iteratively fix mistakes, and improve quality without significant consequence.
I had a similar experience when I wrote my connected components 3d library, which was many times faster than the most common 3D implementation available. Suddenly, it was possible to use the algorithm joyously and frivolously instead of carefully planning each application. Developer iteration speed increased.
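For a sense of what "frivolous" use looks like, here is a minimal sketch using the cc3d Python package (the toy volume size and the connectivity choice are illustrative assumptions, not the real pipeline):

```python
import numpy as np
import cc3d  # the connected-components-3d package

# A toy segmentation volume; real inputs are far larger cutouts.
labels_in = np.random.randint(0, 5, size=(128, 128, 128), dtype=np.uint32)

# Label every 26-connected component in one call, quickly enough
# that there is no need to plan the run in advance.
labels_out = cc3d.connected_components(labels_in, connectivity=26)
print(labels_out.max(), "components")
```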
From another perspective, performant code simply avoids problems. Avoiding problems means no roadblocks to plans, no meetings, no diverted resources, and possibly less stress in an organizational context. Unfortunately, it’s harder to demonstrate that a problem was avoided than that a problem was fixed. Still, you can fairly effectively show that your solution is significantly better than a comparable one, or that a chosen approach would have been impossible if not for said code.
Reducing Physical Wear and Tear
While software is often thought of as an abstract system or mathematical object, it’s also an active process of electrical flows that run on a physical system. Sometimes, software also operates simple mechanical devices or guides complex robotics.
When considering the physical world, efficient software can be thought of as that which minimizes the expenditure of energy. In pure computational systems, this most often means reducing electrical consumption by the CPU or GPU. Energy efficient computations can reduce the physically degrading thermal cycling of computer components and reduce battery drain on mobile devices. For battery powered devices, low power code does you one better. By reducing drain on the battery in the first place, not only is the battery’s current charge extended, but the increased run time per charge decreases the cycle rate of the battery, increasing the total lifespan of the component. A similar argument applies to the write endurance and total memory capacity of flash memory hard drives. Less writing means more space and longer endurance.
In the mechanical world, software that chooses the shortest route (all else being equal) reduces fuel costs and wearing of mechanical parts. A simple example is a road map where a shorter or faster route can reduce fuel consumption and extend the life of a car by reducing the mileage per trip while also making the passengers happier on average (so long as they are not trying to enjoy the scenic route or are delaying some dreaded encounter at their destination).
Fitting into Tight Spaces
Certain optimizations, such as SIMD intrinsics, aren’t always portable. I found that out the hard way when I assumed Intel x86_64 would dominate forever and then purchased an ARM64 laptop the following year. However, library code that is written to give good performance using portable methods can fit in many more spaces than you could have foreseen.
If your code is reasonably memory efficient, it can fit on an embedded device. More commonly though, in my experience, I’ve seen people write GitHub issues to troubleshoot running one of my libraries on huge images ranging over ten gigavoxels when my design criterion was hundreds of megavoxels. Usually, we can get this to work (for example, by supporting 64-bit memory addressing), and then I never hear from anyone again because now it just works. If the single threaded performance of the code had been too slow to begin with, scaling to such sizes would have been painful.
On a related note, high performance code also avoids a profusion of infrastructure. At a certain point, a machine has a physical limit to how many requests it can serve, how many numbers it can crunch, and how much data it can store. However, these limits are usually somewhat flexible, as an engineer can adjust the amount of work each operation requires and trade it off against space or time.
Efficient code slows the vertical growth requirements of machines (vertical meaning more RAM, CPUs, SSDs, and faster network cards) and it also slows horizontal growth (meaning more machines). Vertical growth is expensive and doesn’t scale forever. Horizontal growth scales very well, but it is complicated at many different levels: operators must keep deployments up to date, keep machines from encountering resource contention, and manage the physical challenges of building data centers, such as running wires, cooling, moving heavy objects, and related headaches. Infamously, the SEC described how Knight Capital’s failure to update all of its machines correctly led to its disastrous trading malfunction in 2012.
While Knight Capital was probably pushing its machines to the limit and its issue was more along the lines of change control, if your code can push the limit, you’ll spend less on hardware and might have fewer problems orchestrating the resulting smaller fleet of machines. There’s a big difference in difficulty between administering 1, 2, 20, 200, and 2000 servers. Of course, like Knight, you’ll still be left with the problem of designing the interfaces between the different services in your application no matter how fast each component is.
Performance is Liberating
Efficient tools are simpler because they increase the operating envelope for single machine computation. An efficient algorithm can move a program from a large cluster to a smaller cluster or from a smaller cluster to a laptop.
If the program is designed for commodity hardware, this is no small thing. Upton Sinclair wrote in "The Twelve Principles of EPIC (End Poverty In California)": "Private ownership of tools, a basis of freedom when tools are simple, becomes a basis of enslavement when tools are complex."
A computer is a powerful engine of computation, a means of production that is often individually owned. Computations that become too complex, by virtue of requiring other people’s equipment and code, become a way for others to exert power over you. For example, if you require hundreds of GB of RAM, an exotic GPU, or dozens of CPU cores, suddenly you’ll need a contract with an industrialist to get the job done. The bigger the computational job, the more decisions about that task (whether to do it, when to do it, how to do it, and at what cost) will be dictated by the wishes of those who have the equipment, money, relationships, and labor pool to make it happen.
Efficient code places more decision making power over computations in the hands of individuals and less wealthy groupings of people, all of whom may have different motivations from a corporation’s simple drive to accumulate. One aspect of freedom is the ability to make decisions that materially affect the course of your life in positive ways. While software performance is far, very far, from being the sole determinant of such things, it is neglected compared with arguments for open source and free software licenses, which give end users the option to audit or change software. High performance software makes it physically possible to run your code the way you want to.
In my field of neuroimaging, the stick figure skeletons I mentioned before are a basic structure in analysis. If creating them in large quantities on usefully large datasets costs tens to hundreds of thousands of dollars and requires a cloud contract, this task will be restricted to the most well funded and well connected players. Even benevolent scientists have limited time, attention, and money to help others out. By reducing the cost by two to three orders of magnitude, suddenly smaller laboratories, graduate students, and curious outsiders can do it themselves.
What about Environmental Impact?
The climate crisis threatens human civilization on a grand scale, and electricity is frequently produced from fossil fuels. Electricity, in turn, is consumed by the operation of computer code. Google alone used 12.8 terawatt-hours in 2019 according to its 2020 environmental report, more than many countries and more than several US states. It seems intuitive that improving code efficiency should reduce environmental impact. Does it?
The answer is not at all clear and must be considered on a system by system basis. The main reason is induced demand.
Induced demand is not even an accidental side effect. A major reason we improve the efficiency of software is so that it becomes appropriate to run more frequently in more contexts. If you speed up a library by ten times, but it is run a thousand times more frequently, have you really gained anything from an environmental perspective? If code is improved a hundred-fold but is run only ten times more frequently, that’s a win. The ratio of induced demand to improved performance has to be less than one for the improvement to be environmentally beneficial. Sometimes, fast code enables downstream code to be run more extravagantly, in which case even an extremely low ratio might not result in a benefit.
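As a back-of-the-envelope way to state the ratio test (my own framing, not a formal model): if the code becomes s times more efficient but is run d times more often, then

$$ E_{\text{new}} = E_{\text{old}} \cdot \frac{d}{s} $$

so the change only helps when d/s < 1, that is, when induced demand grows more slowly than efficiency improves.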
It behooves engineers and scientists designing programs that are intended to run on a major industrial scale to avoid undue excesses. However, it is in the nature of organizations that they will find ways to use up the spare data center capacity so long as the organization is alive and growing. If all teams write performant code, it will adjust the growth trajectory, but I suspect it won’t usually trend towards shrinking.
Thus, decisions about purchasing capacity and the size and appetites of consumer bases are the stronger drivers of environmental impact. If only a limited capacity is available, some code will be improved, some plans will be shelved, and others will be curtailed. In other words, total energy usage is not something that can be strongly affected by an individual engineer except in special circumstances, such as for frequently run programs on mobile or embedded devices.
Nonetheless, if the energy used is 100% renewable, should we worry about it at all? Some large corporations, like Google, claim they have transitioned to 100% renewable energy, while others, like Microsoft, have pledged to do so. Interestingly, both have pledged to eventually become carbon negative, presumably by buying carbon offsets.
Zooming out, the mix of overall energy production across America changes slowly, over years or decades. There are some ways to be truly green, such as tapping a stranded source, like a hydro plant that is disconnected from the grid or underutilized. Another is to build new power generation without interfering with other construction (e.g., without using up limited parts or labor). In the main though, the pie is reasonably fixed in a given year. Therefore, in the absence of those factors, any single organization claiming to run a large industrial process on renewable energy is simply reallocating a piece of a slowly changing pie and forcing other market participants to use dirty energy.
In the chart below, provided by the US Energy Information Administration, the US energy mix is shown to change very slowly, with major changes occurring over the course of about a decade. The largest change has been the relatively rapid replacement of coal with natural gas over the past decade. Starting in 2007, total power generation mostly leveled off after growing steadily for at least sixty years. Renewables and nuclear energy represent ~40% of the current mix, natural gas ~40%, and coal and petroleum ~20%.

To a certain degree, purchases of renewable energy do stimulate a market for additional renewable capacity. However, the fact that companies can begin claiming they’re using renewables is probably more a testament to the genuinely optimistic news that substantial amounts of new renewable generation are coming online. The claims of individual organizations to be using 100% renewable power mainly serve to make employees, suppliers, funders, and customers feel good about what they’re doing without a view of the larger picture.
Consumer choices, even those made by corporations, are limited in their ability to improve our lot. I feel that solving the climate issue will take a centrally coordinated process at both national and international levels, one that can allocate energy to different industrial and consumer sectors and manage the energy production mix on behalf of the whole of society. Only then can we solve this problem in a rational way that doesn’t depend on luck.
In fairness, software companies aren’t power companies, and it’s a little ridiculous to expect them to build their own generation. What we can more reasonably expect from them is to pay taxes, divest from fossil fuel companies if they hold such securities, and hold their total energy consumption under a ceiling set by a regulator while society changes the energy mix as fast as possible.
The benefits that accrue to high performance code are quite substantial, both for the person developing it and for the community of people using and adapting it. Counterintuitively, developing efficient code is not always good for the environment, so without careful context dependent reasoning and measurements, environmentalism should not be used as a justification.