Path: chuka.playstation.co.uk!news
From: alex@teeth.demon.co.uk (Alex Amsel)
Newsgroups: scee.yaroze.programming.3d_graphics
Subject: More Pointless Optimisation Tricks (sin, cos
Date: Fri, 17 Oct 1997 03:39:56 GMT
Organization: Into Beyond
Lines: 394
Message-ID: <3446d0a3.18235002@news.playstation.co.uk>
References: <344647ee.14591228@news.playstation.co.uk> <625jjg$e3p24@chuka.playstation.co.uk>
Reply-To: alex@teeth.demon.co.uk
NNTP-Posting-Host: teeth.demon.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Newsreader: Forte Agent .99g/32.335

On Tue, 14 Oct 1997 07:50:59 +0100, "Graeme Evans" did quoth at me:

>>
>>Pointless Optimisation Trick 1:
>>
>>#define SWAP(a,b) {a ^= b; b ^= a; a ^= b;}
>>
>Alex yr a star.

You may want to place ()'s around the a's and b's. Probably shouldn't
need to in this case but better safe than sorry.

Slightly less pointless optimisation trick 2, 3 and some others. Read
all the way through as I'll just keep putting more in until I fall
asleep!

For rotating around one axis when YOU KNOW the matrix is either the
identity matrix, or the only elements that differ from the identity are
ones you are about to overwrite, use the following code. You can adjust
this code to take account of the identity. I also have a software
replacement for RotMatrix if anyone wants it (why?). Calculated by hand!

Also following are Sin/Cos/InvTan source files which are faster than
those provided in the demos. The actual sin table isn't present as you
will have it in one of the many demos from SCEE already. Just paste it
in. [optimisations appreciated, these are all C and no assembler]

Here goes [see later for what UNFIX is all about and ignore my game
code splattered around the source]:

    int a;
    short (*m)[][3];

    a = UNFIX(MapBlockRot[i].angle);

    // Point m to your matrix. The first coord is a pointer to a GsDOBJ2.
    m = &Object[mapblock->object].coord.coord.m;

    // I think this code is written in optimal order for a good compiler
    // but I haven't checked the results
    switch(mapblock->subtype2 & (MBS2_ROT0 | MBS2_ROT1))
    {
        case MAPBLOCK_ROTX :
            (*m)[1][1] = rcos(a);
            (*m)[2][2] = rcos(a);
            (*m)[2][1] = rsin(a);
            (*m)[1][2] = -rsin(a);
            break;

        case MAPBLOCK_ROTY :
            (*m)[0][0] = rcos(a);
            (*m)[2][2] = rcos(a);
            (*m)[2][0] = rsin(a);
            (*m)[0][2] = -rsin(a);
            break;

        // default is actually Z Axis rotation
        default :
            (*m)[0][0] = rcos(a);
            (*m)[1][1] = rcos(a);
            (*m)[1][0] = rsin(a);
            (*m)[0][1] = -rsin(a);
    }
    Object[mapblock->object].coord.flg = 0;
}

The following code prints out your Global Pointer, Stack Pointer and
Frame Pointer. Presumably you could write all 3 commands separated by a
semicolon or something. I never tried! Anyone?

volatile long r28, r29, r30;

#define PrintRegs \
    __asm__ volatile ("usw $28, r28"); \
    __asm__ volatile ("usw $29, r29"); \
    __asm__ volatile ("usw $30, r30"); \
    D(logf("\n%-11s %04d Reg: GP:0x%08x SP:0x%08x FP:0x%08x\n\n", __FILE__, __LINE__, r28, r29, r30));

Wassis __FILE__ and __LINE__ malarkey then? They are macros that expand
to the current source file name and line number. Very useful for
logging and errors.

Here is some more useful debug code. Note that PAUSE runs a little
delay loop so the PS-X waits rather than overloading the PC's serial
buffer. Otherwise you lose half of your characters and it all looks
like spaghetti.

#define logf printf

#define ERRORMAC(x) NEWLINE; BREAK; PAUSE; \
    logf("Runtime Error in File: %s Line: %d\n", __FILE__, __LINE__); \
    x; BREAK; NEWLINE; exit(0);

#define MALLOCMAC(x, y) if (!(x)) \
    {ERRORMAC(logf("Error: Memory allocation failure\n%s\n", y))}

#if DEBUG_LEVEL == 0
#define D0(x) {x};
#define E(x) {ERRORMAC(x)}
#define M(x, y) MALLOCMAC(x, y)
#endif

This means that if DEBUG_LEVEL is 0 then the following commands are
available:

D0(x) : This runs whatever 'x' is.
e.g. D0(logf("Use the force Luke, and Loop counter is %2d", i));
This would print the message on your PC.
If DEBUG_LEVEL != 0 then I define D0(x) as follows:

#ifndef D0
#define D0(x) ;
#endif

As you can see, x is never run. Very useful for having (multiple) debug
versions.

E(x) is special terminal error code. Note it makes a point of printing
file and line information, and also allows me my own command (the 'x'
bit) where I almost always just print further information on current
variables.

M(x,y) is a memory debug macro. I'm often lazy. When I malloc stuff I
don't test for success. So today I got around to writing a macro to
cover myself (not fully tested but seemed ok in a few tests today).

e.g.
M(MapBlockMesh[mesh][lod] = calloc(size, sizeof(u_char)), "mapblockMeshNorm: Norm Block Mesh allocation");

In M(x,y) 'x' is my memory allocation, which is tested for success. 'y'
is the error message to print if it fails.

Oooh what else. Oh yes, remember I said I'd explain FIX(x) and UNFIX(x)?
The PS-X is crap at handling floating point numbers (float and double).
Don't ever use them, at least not in anything inside the game loop. Use
'fixed' point numbers instead. You all already use them when you deal
with angles. Instead of 0-360 you use 0-4095 because it gives you
better 'resolution'. Effectively you have got a 'fractional component'
so you can get e.g. 1.5 degrees rather than just 1 degree or 2 degrees.

I use a couple of macros to keep my code flexible and portable (i.e. one
machine may use type FIXED but it'll be defined as float, another will
define it as an int).

#define PRECISION 12
typedef signed long FIXED;

#define FIX(x) ((FIXED)((x) << PRECISION))
#define UNFIX(x) ((int)((x) >> PRECISION))

PRECISION gives me 2^12 (4096) steps between each whole integer. So 1.5
would be 4096 + (4096/2) = 6144. To ensure portability and clarity I
always declare fixed point variables with the typedef FIXED, and I use
the macros to convert between normal and fixed.
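As a rough sketch of the FIX/UNFIX round trip and of a fixed point multiply, something like the following should do. Note that fixed_mul and fixed_demo are hypothetical names made up for this example, not part of my source, and the multiply assumes your compiler gives you a 64 bit long long. The demo only uses positive values, so it does not depend on how >> treats negative numbers.

```c
#include <assert.h>

#define PRECISION 12
typedef signed long FIXED;

#define FIX(x)   ((FIXED)((x) << PRECISION))
#define UNFIX(x) ((int)((x) >> PRECISION))

/* Hypothetical helper (not from the original source): multiply two FIXED
 * values. The raw product carries 2 * PRECISION fractional bits, so it is
 * computed in 64 bits and shifted back down once. */
FIXED fixed_mul(FIXED a, FIXED b)
{
    return (FIXED)(((long long)a * (long long)b) >> PRECISION);
}

/* Hypothetical self-check: 1.5 * 3 = 4.5 in 20.12 fixed point. */
void fixed_demo(void)
{
    FIXED one_and_half = FIX(1) + FIX(1) / 2;          /* 4096 + 2048 */
    FIXED product = fixed_mul(one_and_half, FIX(3));   /* 4.5 * 4096  */

    assert(one_and_half == 6144);
    assert(product == 18432);
    assert(UNFIX(product) == 4);   /* UNFIX drops the fractional part */
}
```

The 64 bit intermediate is only one option. If one operand is a whole number you can UNFIX it first and stay in 32 bits.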
Be warned, multiplying 2 FIXED point numbers will give you a 'doubly
fixed' result, so you should UNFIX one first, watch out for overflows,
or use long longs (64 bits on the PS-X; longs [ints] are 32 bits).

NTSC vs PAL

One problem with NTSC and PAL is that they run at different speeds.
They also have a different screen size but that is easily accounted for
in your game design. But if they run at different speeds, how can you
get your game running at the same speed? Imagine you want to move an
object. You could:

1) Move it every vertical blank with movement stated in units/vblank

   Accurate, but buggers up in NTSC/PAL and processor intensive for
   non 50/60 fps projects.

2) Move it every frame with movement stated in units/frame

   Inaccurate when rendering drops below 50/60 fps and also buggers up
   in NTSC/PAL.

3) Count the frames that have passed between each render, and move the
   object that many times in units/vblank

   Not a bad idea. Moving it multiple times rather than just
   multiplying the number of frames by the speed can help prevent
   processing errors. For example, using the alternative method (below)
   when a low frame rate occurs you may move right over a collision
   area rather than stopping within it (depending on how you deal with
   collisions). In this way you trap each 'frame' of movement. Problems
   again with NTSC & PAL.

4) As above but multiply the speed by the frame count

   The problems have already been discussed, but it does run faster
   than the above as you don't loop the same function repeatedly. Why
   do it 5 times when you can do it once?! Still messes up in NTSC/PAL.

5) Work in milliseconds, and then use methods 3 and 4 as appropriate

   Well this is my current choice, but I'll take other ideas quite
   happily. This method solves the NTSC and PAL problem by calculating
   everything in milliseconds. This is easy to work with for all
   concerned. Your artist doesn't want to deal in frames per second, he
   wants to deal in seconds!

So how do you do this?

* Test for if you are in NTSC or PAL.
* Calculate, using FIXED POINT or float or something, the number of
  milliseconds per vblank. For PAL it's 1000/50 (20 ms) and for NTSC
  1000/60 (about 16.67 ms).
* Once you know the length of a vblank and the number of vblanks the
  last frame took to render, you know exactly how much time has passed
  since the last movement took place.

For those of you unsure of how to count vblanks, try writing a little
Callback routine for the vertical blanking interrupt. All it has to do
is count. Don't worry about breaking anything - I know a lot of people
get scared off by callbacks and interrupts and all that shite. They are
really quite easy on the Yaroze. Hardly noticeable even.

sincos includes follow, but if anyone is sadistic enough you can win a
prize of..errr..well..nothing...if you can work out what the following
section of code from my renderDaemon in FiskBal does:

    Object[i].handler.attribute |= 0x3000;
    col = mapblock->subtype1 & (MBS1_P0 | MBS1_P1);
    j = ((MilliTimer - mapblock->data) > MAPBLOCK_FADETIME) ?
            MAPBLOCK_FADETIME : MilliTimer - mapblock->data;
    r = MapPaintCols[col].r * j >> MAPBLOCK_FADETIMEP;
    g = MapPaintCols[col].g * j >> MAPBLOCK_FADETIMEP;
    b = MapPaintCols[col].b * j >> MAPBLOCK_FADETIMEP;
    paintcol = (int) r | (int) g << 8 | (int) b << 16 | (F_4 | ABE) << 24;

    num_faces = 0;
    for (j = 1; j < 64; j <<= 1)
        if (!(mapblock->blk & j)) num_faces++;

    ptr = ((u_long) Object[i].handler.tmd) + sizeof(TMD_OBJ) + sizeof(PRIM_HDR);
    dx = sizeof(PRIM_HDR) + sizeof(TMD_F_4);
    dy = (dx * num_faces) + ptr;
    for (; ptr < dy; ptr += dx)
        *((u_long *)ptr) = paintcol;

[apologies for any text being screwed by the mailer]

/*
** Filename: sincos.c
** Version:  1.00
** Date:     23/07/97
**
** Sine, Cosine Macros (see .h file) and Inverse Tan function
** Rewritten from a Yaroze demo somewhere...
**
** © Copyright 1997 Tuna Technologies
** All Rights Reserved
**
** Revision History:
**
** 1.00 23/07/97 Created.
**
*/

#include "types.h"
#include "sincos.h"

/****************************************************************************
* FUNCTION:    rinvtan
* DESCRIPTION: Inverse Tan function
*              If x = y = 0 then 0 is returned [result is undefined]
* PARAMETERS:  int x, int y
* RETURNS:     int angle (4096 = 360 degrees)
****************************************************************************/
int rinvtan(int x, int y)
{
    int t;

    if (x == 0 && y == 0) return 0;

    /* |x| > |y|: within 45 degrees of the X axis, so |t| < 256 */
    if (abs(x) > abs(y))
    {
        t = (y << 8) / x;
        if (t >= 0) return Tinvtan[t & 255] + (x < 0 ? 2048 : 0);
        return (x < 0 ? 2048 : 4096) - Tinvtan[(-t) & 255];
    }

    /* |x| <= |y|: measure from the Y axis instead */
    t = (x << 8) / y;

    /* x == y gives t == 256, which would wrap the table index to 0,
       so the exact 45 degree cases are handled separately */
    if ((x == y) && (t >= 0)) return (y < 0 ? 3072 : 1024) - 512;
    if (t >= 0) return (y < 0 ? 3072 : 1024) - Tinvtan[t & 255];

    /* x == -y gives t == -256 (the original test of x == y could never
       be true here, as t is negative) */
    if (x == -y) return 512 + (y < 0 ? 3072 : 1024);
    return Tinvtan[(-t) & 255] + (y < 0 ?
            3072 : 1024);
}

/****************************************************************************/

/* Sin Table (4096 = 1.0) */
short SinTable[] = {
INSERT SIN TABLE FROM YAROZE DEMOS HERE
};

/* Inverse Tan Table (512 = 45 degrees) */
short Tinvtan[] = {
INSERT TAN TABLE FROM YAROZE DEMOS HERE
};


/*
** Filename: sincos.h
** Version:  1.00
** Date:     23/07/97
**
** Sine, Cosine Macros and Inverse Tan function
**
** © Copyright 1997 Tuna Technologies
** All Rights Reserved
**
** Revision History:
**
** 1.00 23/07/97 Created.
**
*/

#ifndef SINCOS_H
#define SINCOS_H

/****************************************************************************
* FUNCTION:    rsin (MACRO)
* DESCRIPTION: Sine function
* PARAMETERS:  angle 0-4095 (4096 = 360 degrees)
* RETURNS:     short sin(angle) << 12 [-4096 to 4096]
****************************************************************************/
#define rsin(a) SinTable[(a)]

/****************************************************************************
* FUNCTION:    rcos (MACRO)
* DESCRIPTION: Cosine function
* PARAMETERS:  int angle (4096 = 360 degrees)
* RETURNS:     short cos(angle) << 12 [-4096 to 4096]
* NOTE:        Indexes 1024 entries past the angle, so SinTable must
*              extend that far (or mask the index yourself)
****************************************************************************/
#define rcos(a) SinTable[(a) + 1024]

extern short SinTable[];
extern short Tinvtan[];

extern int rinvtan(int x, int y);

#endif


Regards,

Alex Amsel

+ Tuna Technologies + Telephone & Fax +44 (0)114 221 0686 +
+ For all your Win95/NT/Console Game and Tool Development +
+ And we say, "A good programmer always blames Microsoft" +
+ "Just say NO to Big Fat Ron", say I to Sir Jack Hayward +