# Posts tagged optimization

## 10 reasons my O(n²) algorithm is better than your O(n) algorithm

OK, so it’s a controversial title, but the point I want to make is that programmers often confuse the time complexity of an algorithm, i.e. its O() order, with its actual performance.  They will happily declare that their O(n) algorithm is so much better than someone else’s O(n²) algorithm as they rip out some simple, stable, working code and replace it with vastly more complex, untested, but ‘optimized’ code, often without even measuring to see whether there is in fact any performance gain.  The fact is that O() notation describes the asymptotic behavior of an algorithm, and in a typical environment with typical data you aren’t going to be anywhere close to that asymptote.  Performance is influenced by many other factors besides the time complexity of an algorithm.

So what factors can make an O(n²) algorithm better than an O(n) algorithm?

1. It might take a very large N before the O(n²) algorithm is slower than the O(n) algorithm.  And we aren’t talking n=10, n=100, … here, but perhaps millions or billions before the crossover in performance is reached (see the sketch after point 2 below).

2. O() refers to time complexity, not to how long the algorithm actually takes to execute on a particular CPU.  For that you need to look at the mix of instructions used and how long each of those instructions takes.  But even that is a gross simplification, as we shall see next …
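The only way to know where that crossover actually falls is to measure. Here’s a minimal, contrived sketch: the two methods and the per-item ‘overhead’ constant are invented purely for illustration and don’t correspond to any real algorithm, but the shape of the experiment (time both implementations at increasing n) is the part that matters.

```csharp
using System;
using System.Diagnostics;

class CrossoverDemo
{
    // Stand-in for a 'clever' O(n) algorithm with a large constant cost per item.
    static long CleverLinear(int[] data)
    {
        long sum = 0;
        foreach (int x in data)
            for (int k = 0; k < 1000; k++)   // invented per-item overhead
                sum += x;
        return sum;
    }

    // Stand-in for a 'dumb' O(n^2) algorithm with a tiny, tight inner loop.
    static long DumbQuadratic(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            for (int j = 0; j < data.Length; j++)
                sum += data[i] ^ data[j];
        return sum;
    }

    static void Main()
    {
        var rng = new Random(42);
        foreach (int n in new[] { 500, 2000, 8000, 32000 })
        {
            var data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rng.Next(1000);

            var sw = Stopwatch.StartNew();
            long check1 = CleverLinear(data);
            double linearMs = sw.Elapsed.TotalMilliseconds;

            sw.Restart();
            long check2 = DumbQuadratic(data);
            double quadMs = sw.Elapsed.TotalMilliseconds;

            Console.WriteLine("n={0,6}:  O(n) {1,9:F2} ms   O(n^2) {2,9:F2} ms   (checksums {3}, {4})",
                n, linearMs, quadMs, check1, check2);
        }
    }
}
```

With the made-up constants above the crossover lands somewhere in the low thousands on a typical machine; with real algorithms it could just as easily be in the millions, which is exactly why you measure rather than assume.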

Even if the two algorithms have a very similar number of operations using similar CPU instructions, they still are not going to run at the same speed:

3. The compiler might be able to apply optimizations to one algorithm that it cannot apply to the other, dramatically boosting performance.

4. The CPU itself might be able to optimize the instruction execution sequence through pipelining or other techniques to dramatically boost performance.  A dumb scan of an array for example can be accomplished really quickly by most CPUs because they have been optimized to handle such frequent, repetitive tasks.

5. A simpler O(n²) algorithm might have tighter loops in it than the more complex O(n) algorithm, allowing it to run entirely in on-chip memory and dramatically boosting performance.
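To make points 4 and 5 concrete, this is the kind of ‘dumb’ loop that CPUs and JIT compilers love: sequential access over contiguous memory, one predictable branch, and a working set small enough to stay in on-chip cache (a generic sketch, not any specific algorithm from this post):

```csharp
// A plain linear scan over a contiguous array: no allocations, no pointer
// chasing, just a tight loop that pipelines and prefetches extremely well.
static int IndexOfMax(int[] data)
{
    int best = 0;
    for (int i = 1; i < data.Length; i++)
        if (data[i] > data[best])
            best = i;
    return best;
}
```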

Even if two algorithms have similar operations and similar raw CPU performance, they may have very different space requirements.  The O(n) algorithm might need 10x as much working memory as the other, or even a different order of space complexity, e.g. an O(n²) space requirement vs an O(n) space requirement.

6. The algorithm that requires the least data space in its inner loop may find its data being held in registers or on-chip RAM during execution, and thus receive a significant performance boost.

7. The more space-hungry algorithm might have its data paged to disk during operation, incurring a huge additional time penalty.

Some algorithms are inherently more parallelizable than others:

8. As we move to a world of multi-core processors it’s going to be more important to be able to parallelize code to get performance gains of 4x, 8x, … PLINQ and other parallel extensions in modern languages can easily be applied to ‘dumb’ nested for-loops to gain significant improvements in performance.  Applying those same speedups to your ‘optimized’ tree-based code might be much harder or even impossible.
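For instance, here is a sketch (the pair-counting task is just a stand-in) of taking a dumb O(n²) nested loop and spreading its outer loop across all cores with a single AsParallel() call; doing the same to a hand-rolled tree structure is usually much harder:

```csharp
using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        // Throw-away data; the nested loops below are the classic 'dumb' O(n^2)
        // shape that PLINQ can parallelize with almost no code changes.
        int[] values = Enumerable.Range(0, 20000).Select(i => (i * 31) % 977).ToArray();

        int equalPairs = Enumerable.Range(0, values.Length)
            .AsParallel()                     // one call: the outer loop now runs on all cores
            .Sum(i =>
            {
                int count = 0;
                for (int j = i + 1; j < values.Length; j++)
                    if (values[i] == values[j])
                        count++;
                return count;
            });

        Console.WriteLine("Equal pairs: " + equalPairs);
    }
}
```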

‘Better’ is a very subjective word; better performance is just one of many issues:

9. The better algorithm is probably the one that’s easiest to write, easiest for others to read, easiest to test and easiest to maintain.  A simple nested for-loop may be 1000x better than some O(n) algorithm when ‘better’ is defined like this.

10. Even if your algorithm beats mine with all of these factors taken into account that’s no guarantee it will do so in the future.  Compilers evolve, CPUs get smarter and much of that effort is devoted to making dumb code faster.

### Conclusion

- Premature optimization is always a bad idea.
- Many ‘optimizations’ can actually lead to worse, not better, performance.
- Always measure, then optimize, then measure again.
- It’s often cheaper to buy another server than it is to have someone ‘optimize’ your code.
- O(n) is a much misunderstood topic.
- On modern servers space complexity is nearly always more important than time complexity.  If you can run in on-chip RAM you have a huge advantage.

Optimization is the root cause of much unmaintainable code. It is also the root cause of many bugs. My advice to the vast majority of programmers is to stay away from it: if you have a good architecture you should not need to optimize it at this level.

Personally I divide optimization into three buckets:

1. Optimization measured in seconds
2. Optimization measured in milliseconds
3. Optimization measured in microseconds

#1 and #2 are worth doing; #3 is rarely worth doing, and most novices will make performance worse, not better, when they attempt it.

For #1, start by reducing round trips to the server, setting caching headers correctly, compressing your JS and CSS files, … and following the other advice Yahoo’s YSlow plugin offers.
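Of that list, the caching headers are the piece most often forgotten in code. As a rough sketch only (classic ASP.NET / System.Web; the handler name and the script content are hypothetical), a far-future cache policy looks something like this:

```csharp
using System;
using System.Web;

// Hypothetical handler serving a combined, minified script with far-future
// caching headers so the browser doesn't keep asking for it.
public class CachedScriptHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "application/javascript";
        context.Response.Cache.SetCacheability(HttpCacheability.Public);
        context.Response.Cache.SetExpires(DateTime.UtcNow.AddDays(30));
        context.Response.Cache.SetMaxAge(TimeSpan.FromDays(30));
        context.Response.Write("/* combined, minified script would go here */");
    }
}
```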

#2 is usually concerned with your database architecture and the queries you are making. Novices often fetch too much data, either breadth-wise (too many columns) or depth-wise (too many rows). Linq-to-Entities is a great help here for many developers because it defers the actual database work to the last possible moment, when all of the query parameters are known, and thus does the least possible amount of work.
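A sketch of what that deferred execution looks like (the Order entity, OrderSummary class and method name are hypothetical, not from any real schema): the filters are composed step by step, but nothing is sent to the database until ToList() runs with every parameter already known.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, a stand-in for a real LINQ-to-Entities model class.
public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public DateTime OrderDate { get; set; }
    public decimal Total { get; set; }
}

public class OrderSummary
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // 'orders' would be something like db.Orders from your ObjectContext/DbContext.
    public static List<OrderSummary> RecentOrderSummaries(
        IQueryable<Order> orders, int? customerId, DateTime? since)
    {
        IQueryable<Order> query = orders;                    // no SQL has run yet

        if (customerId.HasValue)
            query = query.Where(o => o.CustomerId == customerId.Value);
        if (since.HasValue)
            query = query.Where(o => o.OrderDate >= since.Value);

        // Only here is the query translated to SQL and executed, once every
        // parameter is known, and it fetches only the rows and columns it needs.
        return query
            .OrderByDescending(o => o.OrderDate)
            .Select(o => new OrderSummary { Id = o.Id, Total = o.Total })
            .Take(50)
            .ToList();
    }
}
```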

Many developers think that caching is the answer for #2, but I advise you to stay well away from trying to implement your own caching scheme. Use what’s provided for you in Linq-to-Entities or some other ORM but don’t try to roll your own; you are likely to introduce all manner of bugs related to stale data and concurrency issues. And if your queries aren’t efficient to begin with, your application will not survive a restart under heavy load.

#3 is the hardest of all and the least likely to produce any benefit to the end user. Often it will be detrimental. My advice is to stay well away unless (i) you understand AMD and Intel processor architecture, and (ii) you have a loop that’s executed a billion times or more. Microsoft, Intel, AMD and others have spent decades optimizing inefficient code at the CPU and compiler level. Sadly this means that the cleverest algorithm can sometimes be significantly slower than the simplest ‘dumb’ algorithm. A tight loop doing a dumb linear search in O(n) can perform better than some clever tree search that theoretically runs in O(log n) time, because the former can fit entirely in on-chip cache and execute many times faster than the latter. If you are tempted to use a tree or other complex data structure to ‘optimize’ access to a small number of items, think again: a simple array might be better, and the code will probably be more maintainable and easier to read.
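Here is a sketch of the kind of measurement that makes the point; the numbers will vary by machine, so treat it as something to run rather than a result to quote. It compares a dumb linear scan over a small array with lookups in a SortedSet:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class SmallLookupBenchmark
{
    static void Main()
    {
        const int n = 32;                        // a 'small number of items'
        const int lookups = 10000000;

        var keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i * 17;

        var tree = new SortedSet<int>(keys);     // the 'clever' O(log n) structure

        var rng = new Random(1);
        var probes = new int[lookups];
        for (int i = 0; i < lookups; i++) probes[i] = rng.Next(n) * 17;

        var sw = Stopwatch.StartNew();
        int hits = 0;
        foreach (int p in probes)
            foreach (int k in keys)              // dumb O(n) linear scan
                if (k == p) { hits++; break; }
        Console.WriteLine("Linear scan: {0} ms ({1} hits)", sw.ElapsedMilliseconds, hits);

        sw.Restart();
        hits = 0;
        foreach (int p in probes)
            if (tree.Contains(p)) hits++;        // 'clever' O(log n) lookup
        Console.WriteLine("SortedSet  : {0} ms ({1} hits)", sw.ElapsedMilliseconds, hits);
    }
}
```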

Overall, in terms of optimization, developers need to understand that CPU time is no longer at a premium. These days the priorities are A) Network access, B) Disk access, C) Memory access, D) CPU time. If you can use a million CPU cycles to avoid some expensive disk access then it’s probably worth it. Too often I see people thinking they are ‘optimizing’ their code by stuffing a private member variable with some calculated value so they don’t have to recalculate it. Guess what: the calculation would take microseconds, but the extra RAM you just used to store a million extra private member variables has now caused your application to page more often (both from on-chip cache to main memory, and from RAM to disk).
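A small illustration of that anti-pattern (the class and property names are invented for the example): the ‘cached’ version saves a trivially cheap multiply but makes every instance bigger and adds a stale-value hazard, and across a million objects that extra working set is exactly what drives the paging described above.

```csharp
// The 'optimization': stash the calculated value in a private field. Every
// instance now carries an extra 16-byte decimal, and the field goes stale if
// UnitPrice or Quantity changes without Recalculate() being called.
public class OrderLineCached
{
    public decimal UnitPrice { get; set; }
    public int Quantity { get; set; }

    private decimal _cachedTotal;
    public decimal Total { get { return _cachedTotal; } }
    public void Recalculate() { _cachedTotal = UnitPrice * Quantity; }
}

// Usually better: just recompute. The multiply costs a handful of CPU cycles,
// the object stays smaller, and there is nothing to get out of sync.
public class OrderLine
{
    public decimal UnitPrice { get; set; }
    public int Quantity { get; set; }

    public decimal Total { get { return UnitPrice * Quantity; } }
}
```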

Before you do any optimization you should get a tool like JetBrains’ dotTrace and, for memory usage profiling, the SciTech .NET Memory Profiler. These will give you insight into what needs fixing first and will help you understand where you can trim memory consumption.

Another key rule of optimization is to compare the development (and maintenance) cost of optimized code to the cost of a new server. You could easily spend more trying to get a 10% performance gain than it would cost to just buy a new quad-core server that runs 20% faster!