Fire Demo - Optimisations

For comparison, the original version of the algorithm is kept in a comment block in the source code. The best way to see how I've improved the calculation speed is to read the code, but in summary:

  1. The palette has been put in the 1K data cache.
  2. Where elements from the working array were accessed using an index variable, the access has been restructured to use a constant offset from a single base pointer.
  3. Certain elements in the current array were read twice, with the value unchanged between the two accesses - these are now cached in registers.
  4. Register variables are used wherever possible.
  5. The algorithm calls for division by five and conditional subtraction of two from the result; both are combined into a single look-up table.
  6. Rather than recalculating the base pointer at the start of each line from the indices, it is calculated once, outside the loop, and has a constant value subtracted from it at the end of each line.
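
The sketch below illustrates items 2, 3, 5 and 6 on a made-up fire kernel. It is not the demo's actual code: the buffer size (W, H), the five-cell neighbourhood, the condition under which two is subtracted, and every function name are assumptions chosen for illustration. It also walks the buffer top-down, so the base pointer is advanced by a constant at the end of each line rather than having one subtracted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer dimensions; the demo's real sizes are not stated. */
enum { W = 64, H = 32, MAX_SUM = 5 * 255 };

static uint8_t cool_lut[MAX_SUM + 1];

/* Item 5: fold "divide by five, then conditionally subtract two" into one
   table. The condition used here (subtract only if the result is >= 2)
   is an assumption. */
static void build_cool_lut(void)
{
    for (int sum = 0; sum <= MAX_SUM; ++sum) {
        int v = sum / 5;
        if (v >= 2)
            v -= 2;
        cool_lut[sum] = (uint8_t)v;
    }
}

/* Straightforward version: every access recomputes an index. */
static void fire_step_indexed(uint8_t *buf)
{
    for (int y = 0; y < H - 2; ++y) {
        for (int x = 1; x < W - 1; ++x) {
            int sum = buf[(y + 1) * W + x - 1] + buf[(y + 1) * W + x]
                    + buf[(y + 1) * W + x + 1] + buf[(y + 2) * W + x]
                    + buf[y * W + x];
            int v = sum / 5;
            if (v >= 2)
                v -= 2;
            buf[y * W + x] = (uint8_t)v;
        }
    }
}

/* Restructured version: one base pointer with constant offsets (item 2),
   values reused from local variables instead of re-read (item 3), the
   look-up table (item 5), and a pointer that is set up once and adjusted
   by a constant at the end of each line (item 6). */
static void fire_step_pointer(uint8_t *buf)
{
    uint8_t *p = buf + 1;                /* cell (0, 1), computed once */
    for (int y = 0; y < H - 2; ++y) {
        unsigned left = p[W - 1];        /* cell (y+1, 0) */
        unsigned mid  = p[W];            /* cell (y+1, 1) */
        for (int x = 1; x < W - 1; ++x) {
            unsigned right = p[W + 1];   /* the only new read from row y+1 */
            unsigned sum = left + mid + right + p[2 * W] + p[0];
            assert(sum <= MAX_SUM);
            p[0] = cool_lut[sum];
            left = mid;                  /* slide the window one cell right */
            mid = right;
            ++p;
        }
        p += 2;                          /* skip to cell (y+1, 1) */
    }
}
```

Both functions should produce identical buffers. Call build_cool_lut() once before the first call to fire_step_pointer().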

In the original version of the algorithm the inner loop made 9 reads from and 5 writes to main memory. After applying the above optimisations this became 3 reads from and 5 writes to main memory plus one read from the data cache.

It can be seen from the table below that, depending on the optimisation level chosen, the program generated by GCC runs 2-4 times faster than the one produced by CodeWarrior! At the higher optimisation levels, the most aggressive options of each compiler were used. For reference, the unoptimised version of the algorithm takes 566 HSyncs when compiled with GCC at optimisation level 2.

Optimisation    Time (HSyncs)
Level           GCC     CodeWarrior
0               412     1644
1               270     671
2               266     671
3               266     548
4               n/a     548

The timings for GCC clearly show that there is no tangible benefit to be had from level 3 optimisation over level 2. GCC's level 3 also has some undesirable optimisations, such as loop unrolling. Although the two compilers' timings are presented alongside one another, the table should not be read as implying that their optimisation levels are equivalent - they are not.
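
As a quick check of the "2-4 times" figure, the ratios can be worked out directly from the timings in the table. The following is only an illustrative sketch: the array and function names are made up, and comparing the compilers level by level is a convenience, not a claim that the levels correspond.

```c
#include <assert.h>
#include <stdio.h>

enum { LEVELS = 5 };

/* Timings in HSyncs copied from the table; -1 marks the "n/a" entry. */
static const int gcc_hsyncs[LEVELS] = { 412, 270, 266, 266, -1 };
static const int cw_hsyncs[LEVELS]  = { 1644, 671, 671, 548, 548 };

/* How many times longer the CodeWarrior build takes than the GCC build
   at the same numbered level. Only valid where GCC has a timing. */
static double cw_over_gcc(int level)
{
    assert(level >= 0 && level < LEVELS);
    assert(gcc_hsyncs[level] > 0);
    return (double)cw_hsyncs[level] / (double)gcc_hsyncs[level];
}

/* Print one ratio per level that has timings for both compilers. */
static void print_ratios(void)
{
    for (int level = 0; level < LEVELS; ++level) {
        if (gcc_hsyncs[level] > 0)
            printf("level %d: %.2f\n", level, cw_over_gcc(level));
    }
}
```

The ratios come out at roughly 4.0, 2.5, 2.5 and 2.1 for levels 0 to 3. By the same arithmetic, the hand optimisations take the GCC level 2 build from 566 to 266 HSyncs, a speed-up of about 2.1 times.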