-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
OPTIMIZATION TECHNIQUES FOR THE NETYAROZE
First Edition (Updated 29/01/01)

Http: http://www.netyaroze-europe.com/~harveyc/
Mail: harvey.c@lineone.net

Written by Harvey Cotton
Information about DCache from Scott Evans
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

--------------------------------------------------------------
INTRODUCTION
--------------------------------------------------------------

This is a brief document containing information about how to make
your Yaroze programs run more efficiently. Some of the information
listed here may be applied to other formats, whilst other information
is PSX specific.

You can test the efficiency of your program by displaying the value
returned by VSync. This is the number of hsyncs per cycle. The lower
the number, the better.

Most of the content is technically correct to the best of my
knowledge; however, if you find any mistakes, please let me know.

--------------------------------------------------------------
BEGINNERS
--------------------------------------------------------------

Technique: REDESIGN YOUR ALGORITHMS!!
How: This isn't really a specific technique, but it is the most
important optimization you can do. Examine the functions and routines
in your program that are called most frequently. You should try to:
  * reduce the number of variables used
  * reduce the number of operations (multiplying etc)
  * use pointers where possible, instead of duplicating data
  * keep the code short and simple
  * simplify maths expressions and precalculate all you can

Technique: LOOKUP TABLES
How: The idea is to precalculate as much as possible before your
program starts. This means the processor does less work during the
main loop. This is commonly used to create sin/cos tables, because
calculating sin/cos is very slow on the PSX.
Disadvantage: Memory is traded for speed.

Technique: UNROLLING LOOPS
How: I won't really cover this because it is of no use on the PSX.
Loop unrolling is only useful in low-level or very small loops.
Disadvantage: Ugh.

Technique: LEVEL3 OPTIMIZATION
How: Alter your makefile and add -O3 to ensure the compiler performs
maximum optimization. The effectiveness varies.
Disadvantages: Can sometimes make your program more unstable,
especially if you use -O3.

--------------------------------------------------------------
ADVANCED
--------------------------------------------------------------

Technique: MACROS
How: This is an effective technique for improving the efficiency of
your programs. The idea is to replace frequently called functions with
macros. Why? Because it is slower to call functions: arguments must be
passed and a result returned. E.g.

    long add_numbers (long x,long y)
    {
        return (x+y);
    }

becomes

    #define add(_x,_y) ((_x)+(_y))

Remember to use underscores so the macro parameters are not confused
with your variables, and wrap each parameter in brackets so that
expressions passed as arguments are evaluated correctly.

What happens in your program though? When the compiler gets to your
macro in your program, it simply substitutes the arguments into the
macro body. E.g.

    if (add(10,20)==0)

is converted into

    if (((10)+(20))==0)

Disadvantage: Can increase the size of your executable.

Technique: PADDING
How: Replace chars and shorts with LONG. This can have a significant
impact on the speed of your functions. The reason why this works is
that the PSX is 32-bit. Longs are 32-bit, so the processor can work
with them straight away. However, when dealing with chars (8-bit) or
shorts (16-bit), the processor must first pad out the data before
processing, e.g. convert 16-bit to 32-bit by padding 16 extra bits.
Disadvantage: You are basically trading memory for speed. Longs use
the most memory. Restrict longs to time critical situations.

Technique: REGISTER
How: This is an excellent way to speed up your programs. When
declaring single local variables, use register before declaring the
type and name. E.g.
    long index;

becomes

    register long index;

What this does is ask the compiler to assign your variable to a CPU
register (if one is available) instead of placing it in RAM. This
means the CPU has direct access to it. Using register variables is
extremely effective in loops.
Disadvantage: Obviously, you cannot use this with global or static
variables. Also remember that the CPU only has so many spare
registers. Use too many and your variable will be assigned to memory
by default.

Technique: FIXED POINT
How: Read my fixed point tutorial found at my website. This is the
process of replacing floats with integers. The PSX doesn't support
floating point in hardware, so it has to emulate it (which is very,
very slow). If your game depends on floating point, then using fixed
point maths can speed up your program by 200%-300%.
Disadvantage: It does make your program a little bit more confusing,
until you get the hang of it.

Technique: DCACHE
How: The PSX has an area of memory, 0x1f800000-0x1f8003ff, which is 1K
in size. This area can be accessed by the CPU 5-6 times faster than
normal memory. As with register, you can only use this with local
variables. The idea is to take your data structure or array and
allocate space in dcache for it. E.g.

    BULLET bullet;
    bullet.x=100;

becomes

    register BULLET *bullet=(BULLET *)getScratchAddr(0);
    bullet->x=100;

This technique makes a huge difference when used in frequently called
functions.
Disadvantages: For starters, this can only be used with local
variables. In addition you are limited to 1K, so make sure your data
does not exceed this. You must also be careful because some functions
in libps use the dcache as well.

Technique: BITSHIFTING
How: The idea is to try to replace multiplications and divisions with
bit shifting. This is achieved using the << and >> operators, which
can be processed far quicker by the CPU than multiplication or
division.
Here are some examples to get you started:

    <<1   is the equivalent of multiplying by (2^1)  = 2
    <<2   is the equivalent of multiplying by (2^2)  = 4
    >>1   is the equivalent of multiplying by (2^-1) = 0.5
    >>2   is the equivalent of multiplying by (2^-2) = 0.25

Thus

    X=A*2;   becomes   X=A<<1;
    X=A/2;   becomes   X=A>>1;

You can also combine operations together to avoid multiplying. E.g.

    X=A*255;   becomes   X=(A<<8)-A;

Disadvantages: None really, although shifting a negative number right
rounds down rather than towards zero, so the result can differ by one
from a true division.

Technique: HIERARCHICAL COLLISION DETECTION
How: How you do this depends on your program. The idea is to reduce
the number of collision checks you perform by eliminating
possibilities and building a hierarchy. The most common way, for
example when checking collisions with the player, is to only check
objects that are on the screen as opposed to all objects in the world.
Disadvantages: Lots of work involved!

--------------------------------------------------------------
GRAPHICS
--------------------------------------------------------------

The redrawing of images on the screen is the most CPU intensive part
of most programs. Some tips on using graphics more efficiently:

  * Use fewer polygons/vertices (e.g. replace pairs of triangles with
    quads).
  * Avoid drawing polygons or sprites that are off screen.
  * Avoid using fogging (fogging is quite CPU intensive).
  * Avoid light calculations on polygons that don't need them (e.g.
    ground tiles).
  * Deactivate unused attributes of sprites. E.g. if you don't use
    rotation or scaling, turn off the attribute on the sprite.
  * Level Of Detail (LOD)
    This technique is one of the most underused techniques on the
    Yaroze. It is generally the most effective way of drawing huge
    amounts of detail without slowdown. The idea is to draw objects at
    different levels of detail depending on their distance from the
    camera. This is done by calculating the distance of the object
    from the camera and adjusting the object's subdivision based on
    that distance.
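
The LOD idea can be sketched in plain C as below. This is only an
illustration: the function name lod_level, the two threshold values
and the three detail levels are hypothetical and would need tuning for
a real game. It compares squared distances so that no square root is
needed, and it leaves out how the chosen level is applied to the
object.

```c
/* Hypothetical distance thresholds in world units; tune per game. */
#define LOD_NEAR 1000L
#define LOD_MID  4000L

/*
 * Pick a detail level from the offset between object and camera.
 * dx, dy, dz = object position minus camera position.
 * Returns 2 for most detail, 1 for medium, 0 for least detail.
 * Assumes the offsets are small enough that the sum of their
 * squares fits in a 32-bit long.
 */
int lod_level(long dx, long dy, long dz)
{
    /* Squared distance avoids a slow square root. */
    long dist_sq = dx * dx + dy * dy + dz * dz;

    if (dist_sq < LOD_NEAR * LOD_NEAR)
        return 2;
    if (dist_sq < LOD_MID * LOD_MID)
        return 1;
    return 0;
}
```

A caller would work out the offset once per object per frame, call
lod_level(dx, dy, dz), and use the result to choose the subdivision
setting (or a simpler model) before drawing.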