Author: Scott Evans
Title: Code profiling
This technical note describes how a simple profiler can be used to help visualise how well (or
badly) your code is performing. Source code is provided for the profiling tool, along with an
example program that shows the profiler in use.
The theory behind the profiler is really simple. Root counter 1 is set by default to count
scanlines, which gives us an easy way to time our code. All we need to do is record the value of
the root counter before the start of the code we want to profile and then again at the end of the
code. Subtracting the start value from the end value gives approximately how long the code has
taken to execute, in scanlines. The profiler then scales this value and uses it to draw a bar on
the screen. You can then see how long a particular piece of code is taking, which is very useful
when trying to decide which functions need to be optimised.
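
The timing idea above can be sketched in portable C. This is only an illustration and not the
code from profile.c: read_scanline_counter() is a hypothetical stand-in for reading root counter 1
(on the Yaroze this would be a library call such as GetRCnt(), which is an assumption here since
the note does not name it), the counter is assumed to be 16 bits wide and to wrap around, and the
names elapsed_scanlines(), bar_length() and time_code() are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for root counter 1. It is a plain variable here so
   the sketch can run anywhere; on hardware the value comes from the counter. */
static uint16_t fake_root_counter = 0;

uint16_t read_scanline_counter(void)
{
    return fake_root_counter;
}

/* Elapsed scanlines between two readings. The subtraction is truncated to
   16 bits (an assumed counter width) so that one wrap-around between the
   two readings still gives the right answer. */
uint16_t elapsed_scanlines(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Scale a scanline count to a bar length in pixels, where full_scale
   scanlines map to max_pixels. Longer times are clamped to the full bar. */
int bar_length(uint16_t scanlines, uint16_t full_scale, int max_pixels)
{
    long len;

    if (full_scale == 0)
        return 0;
    len = ((long)scanlines * max_pixels) / full_scale;
    return (len > max_pixels) ? max_pixels : (int)len;
}

/* Pretend workload that "takes" 100 scanlines. */
void pretend_work(void)
{
    fake_root_counter = (uint16_t)(fake_root_counter + 100);
}

/* Time one piece of code: read the counter before and after it runs. */
uint16_t time_code(void (*code)(void))
{
    uint16_t start = read_scanline_counter();
    code();
    return elapsed_scanlines(start, read_scanline_counter());
}
```

Doing the subtraction in the counter's own width is the important design point: it means the
counter never has to be reset, as long as the code being timed takes less than one full wrap.
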
The time taken by the GPU to draw the primitives is a little harder to determine. Normally this
is very easy: you get the GPU to generate an interrupt when it has finished drawing, and the
interrupt handler calls a function that records the value of root counter 1. The Yaroze libraries
are very limiting, though, and this cannot be done because the function DrawSyncCallback(), which
would be used to set up the interrupt, is not available. At the moment I am not sure how to get
around this but I am working on it. For the moment the drawing time is set to the maximum.
So enough theory, let's get down to business.
The profiler is contained in its own source file profile.c. To add the profiler to your projects
you will need to add profile.c to your build and include profile.h in any files that reference
the profile functions. You will also need to add the files gtypes.h (type definitions) and fp.h
(fixed point macros) to your project.
Before the profiler can be used it needs to be initialised once at the start of the program by
calling PROFILE_Initialise(). It takes three parameters: the screen width and height, which are
used to scale the profile bars so they fit the screen, and the maximum number of frames the bars
can display.
For example, a 320x256 display with a maximum profile time of 3 frames would be initialised with
a call along the lines of PROFILE_Initialise(320, 256, 3).
The initialise stage sets the default position of the profile bars and calculates the number of
scanlines in one frame. It also sets a scale value to scale the profile bars so they fit the current
screen.
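
As a rough sketch of what such an initialise stage might compute, the code below works out the
scanlines per frame and a fixed-point scale factor. Everything in it is an assumption made for
illustration: the names BarScale, make_bar_scale() and scanlines_to_pixels() are hypothetical, the
figures of 314 scanlines per PAL frame and 263 per NTSC frame are taken as given, the 12-bit
fractional format may not match the macros in fp.h, and only the screen width is used (the height
presumably affects where the bars are placed).

```c
#include <stdint.h>

#define SKETCH_FP_SHIFT 12   /* assumed 20.12 fixed point; fp.h may differ */

typedef struct {
    int32_t scanlines_per_frame;  /* total scanlines in one video frame */
    int32_t max_scanlines;        /* scanlines covered by a full-length bar */
    int32_t scale_fp;             /* pixels per scanline, in fixed point */
} BarScale;

/* Build the scale for a bar that spans screen_width pixels and represents
   `frames` frames. pal != 0 selects the assumed PAL line count. */
BarScale make_bar_scale(int screen_width, int frames, int pal)
{
    BarScale s;

    if (frames < 1)
        frames = 1;               /* avoid dividing by zero */
    s.scanlines_per_frame = pal ? 314 : 263;
    s.max_scanlines = s.scanlines_per_frame * frames;
    s.scale_fp = ((int32_t)screen_width << SKETCH_FP_SHIFT) / s.max_scanlines;
    return s;
}

/* Convert a time in scanlines to a bar length in pixels (rounded down). */
int scanlines_to_pixels(const BarScale *s, int32_t scanlines)
{
    return (int)((scanlines * s->scale_fp) >> SKETCH_FP_SHIFT);
}
```

Working the scale out once at initialisation keeps the per-frame cost down to one multiply and one
shift per reading, which matters on a machine without a fast divide.
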
Once the profiler has been initialised you must call PROFILE_Start() at the beginning of
each frame. This is usually just after the call to VSync().
This marks the start of a frame and sets an internal variable start_count to the current
value of root counter 1. We do this so all our times will be based on this reference count and
we do not need to reset the root counter every frame.
The function PROFILE_Read() is used to record the time taken for a piece of code to
execute. It takes three parameters, which set the colour of that section of the profile bar. The
CPU profile bar is split into one section per reading taken, each with its own colour.
This means you can time lots of different functions and assign a different colour to each.
Suppose PROFILE_Read() is called with black just before TestFunction1(), with red straight after
it, and with green straight after TestFunction2(). The red section of the CPU profile bar will
then be the time taken to execute TestFunction1() and the green section the time taken for
TestFunction2(). The black section will represent the time taken for the code between
PROFILE_Start() and the first PROFILE_Read().
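
The sequence described above can be sketched as a small self-contained model. This is not the
real profiler: sketch_start() and sketch_read() are hypothetical stand-ins for PROFILE_Start() and
PROFILE_Read(), the three colour parameters are assumed to be red, green and blue components,
fake_counter stands in for root counter 1, and the two test functions simply pretend to take 40
and 25 scanlines.

```c
#include <stdint.h>

#define MAX_READINGS 128     /* same default limit as described in this note */

typedef struct {
    uint8_t  r, g, b;        /* colour of this section of the bar */
    uint16_t time;           /* scanlines since the start of the frame */
} Reading;

static uint16_t fake_counter;           /* stand-in for root counter 1 */
static uint16_t start_count;            /* reference count for this frame */
static Reading  readings[MAX_READINGS];
static int      num_readings;

/* Mark the start of a frame: remember the counter and clear the readings. */
void sketch_start(void)
{
    start_count = fake_counter;
    num_readings = 0;
}

/* Record the time since the frame start, tagged with a colour.
   Readings beyond MAX_READINGS are dropped (an assumed policy). */
void sketch_read(uint8_t r, uint8_t g, uint8_t b)
{
    if (num_readings >= MAX_READINGS)
        return;
    readings[num_readings].r = r;
    readings[num_readings].g = g;
    readings[num_readings].b = b;
    readings[num_readings].time = (uint16_t)(fake_counter - start_count);
    num_readings++;
}

/* Pretend workloads: each just advances the fake counter. */
void TestFunction1(void) { fake_counter = (uint16_t)(fake_counter + 40); }
void TestFunction2(void) { fake_counter = (uint16_t)(fake_counter + 25); }

/* One frame of the example: a black, a red and a green section. */
void example_frame(void)
{
    sketch_start();
    fake_counter = (uint16_t)(fake_counter + 10);  /* other per-frame work */
    sketch_read(0, 0, 0);       /* black: everything since the frame start */
    TestFunction1();
    sketch_read(255, 0, 0);     /* red: time spent in TestFunction1() */
    TestFunction2();
    sketch_read(0, 255, 0);     /* green: time spent in TestFunction2() */
}

/* Length of section i in scanlines: this reading minus the previous one. */
int section_length(int i)
{
    uint16_t prev = (i == 0) ? 0 : readings[i - 1].time;
    return (int)(uint16_t)(readings[i].time - prev);
}
```

Storing each reading as a time relative to the frame start, rather than as a duration, means the
colour passed to a read call always labels the code that ran since the previous reading.
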
There is a limit on the number of readings that can be taken in one frame. It is set to 128 by
default but can be changed by setting the macro MAX_READINGS in profile.h.
The last thing to do is draw the profile bars. The bars are drawn using the GsBOXF primitive.
The function PROFILE_Draw() adds the profile bars to the ordering table which is passed in
as a parameter.
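
How the readings might be turned into coloured boxes can be sketched as follows. The structures
here are hypothetical stand-ins so the code runs anywhere: SketchBox only mimics the kind of
fields a filled-box primitive such as GsBOXF needs (position, size, colour), and build_bar() is a
made-up name. On the Yaroze each box would then be added to the ordering table (in libps this
would normally be done with GsSortBoxFill(), though profile.c may do it differently).

```c
#include <stdint.h>

/* Stand-in for a filled-box primitive: position, size and colour only. */
typedef struct {
    int16_t  x, y;
    uint16_t w, h;
    uint8_t  r, g, b;
} SketchBox;

/* One profiler reading: colour plus scanlines since the frame start. */
typedef struct {
    uint8_t  r, g, b;
    uint16_t time;
} SketchReading;

/* Fill out[] with one box per reading and return the number of boxes.
   Readings hold cumulative times, so each box runs from the previous
   reading's pixel position to this one's. max_scanlines maps to bar_width. */
int build_bar(const SketchReading *rd, int count, uint16_t max_scanlines,
              int x0, int y0, int bar_width, int bar_height, SketchBox *out)
{
    int i;
    long prev_px = 0;

    if (max_scanlines == 0)
        return 0;
    for (i = 0; i < count; i++) {
        long px = ((long)rd[i].time * bar_width) / max_scanlines;
        if (px > bar_width)
            px = bar_width;              /* clamp to the end of the bar */
        if (px < prev_px)
            px = prev_px;                /* never draw a negative width */
        out[i].x = (int16_t)(x0 + prev_px);
        out[i].y = (int16_t)y0;
        out[i].w = (uint16_t)(px - prev_px);
        out[i].h = (uint16_t)bar_height;
        out[i].r = rd[i].r;
        out[i].g = rd[i].g;
        out[i].b = rd[i].b;
        prev_px = px;
    }
    return count;
}
```

Converting the cumulative times to pixel positions first, and only then taking differences, avoids
the one-pixel gaps that can appear when each section's length is rounded separately.
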
Right then, that is all there is to it. You will find the complete source code to the profiler and
the example program that demonstrates the profiler in action. If you have any problems let me
know. Likewise, if you have any suggestions or improvements, or know of a better way to do
this, then let me know.
The source code and an example.