Path: chuka.playstation.co.uk!news
From: alex@teeth.demon.co.uk (Alex Amsel)
Newsgroups: scee.yaroze.programming.3d_graphics
Subject: More Pointless Optimisation Tricks (sin, cos
Date: Fri, 17 Oct 1997 03:39:56 GMT
Organization: Into Beyond
Lines: 394
Message-ID: <3446d0a3.18235002@news.playstation.co.uk>
References: <344647ee.14591228@news.playstation.co.uk> <625jjg$e3p24@chuka.playstation.co.uk>
Reply-To: alex@teeth.demon.co.uk
NNTP-Posting-Host: teeth.demon.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Newsreader: Forte Agent .99g/32.335

On Tue, 14 Oct 1997 07:50:59 +0100, "Graeme Evans"
<evans@fourny.demon.co.uk> did quoth at me:

>>
>>Pointless Optimisation Trick 1:
>>
>>#define SWAP(a,b) {a ^= b; b ^= a; a ^= b;}
>>

>Alex yr a star.

You may want to place ()'s around the a's and b's. Probably shouldn't
need to in this case but better safe than sorry.
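
A minimal sketch of the bracketed version, wrapped in do/while(0) so
it behaves as a single statement after an if. SWAP_SAFE and swap_ints
are names made up for this example, not part of the original trick.
One known caveat of the XOR swap is that swapping an object with
itself zeroes it, so the pointer helper skips that case.

```c
#include <assert.h>

/* Hypothetical bracketed variant of SWAP; integer lvalues only. */
#define SWAP_SAFE(a, b) do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Hypothetical helper: swap two ints through pointers.
 * XOR-swapping an object with itself would leave it at zero,
 * so the p == q case is skipped. */
static void swap_ints(int *p, int *q)
{
	assert(p != 0 && q != 0);
	if (p != q)
		SWAP_SAFE(*p, *q);
}
```

Calling swap_ints(&v[i], &v[j]) stays safe when i == j, which the bare
macro does not guarantee.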

Slightly less pointless optimisation trick 2, 3 and some others. Read
all the way through as I'll just keep putting more in until I fall
asleep!

For rotating around one axis when YOU KNOW the matrix is either the
identity matrix, or an identity where only the elements you are about
to overwrite have been 'damaged', use the following code. You can
adjust this code to take account of the identity. I also have a
software replacement for RotMatrix if anyone wants it (why?).
Calculated by hand!

Also following are Sin/Cos/InvTan source files which are faster than
those provided in the demos. The actual sintable isn't present as you
will have it in one of many demos from scee already. Just paste it in.

[optimisations appreciated, these are all C and no assembler]

Here goes [see later for what UNFIX is all about and ignore my game
code splattered around the source]:

	int a;
	short (*m)[][3];
	a = UNFIX(MapBlockRot[i].angle);
// Point m to your matrix. The first coord is a pointer to a GsDOBJ2.
	m = &Object[mapblock->object].coord.coord.m;

// I think this code is written in optimal order for a good compiler
// but I haven't checked the results

	switch(mapblock->subtype2 & (MBS2_ROT0 | MBS2_ROT1))
	{
		case MAPBLOCK_ROTX	:	(*m)[1][1] = rcos(a);
						(*m)[2][2] = rcos(a);
						(*m)[2][1] = rsin(a);
						(*m)[1][2] = -rsin(a);
						break;

		case MAPBLOCK_ROTY	:	(*m)[0][0] = rcos(a);
						(*m)[2][2] = rcos(a);
						(*m)[2][0] = rsin(a);
						(*m)[0][2] = -rsin(a);
						break;

// default is actually Z Axis rotation
		default			:	(*m)[0][0] = rcos(a);
						(*m)[1][1] = rcos(a);
						(*m)[1][0] = rsin(a);
						(*m)[0][1] = -rsin(a);
		}					
		Object[mapblock->object].coord.flg = 0;
	}

The following code prints out your Global Pointer, Stack Pointer and
Frame Pointer. Presumably you could write all 3 commands separated by
a semicolon or something. I never tried! Anyone?

volatile long r28, r29, r30;

#define PrintRegs						\
	__asm__ volatile ("usw $28, r28");		\
	__asm__ volatile ("usw $29, r29");		\
	__asm__ volatile ("usw $30, r30");		\
	D(logf("\n%-11s %04d Reg: GP:0x%08x SP:0x%08x FP:0x%08x\n\n",	\
		__FILE__, __LINE__, r28, r29, r30));


Wassis __FILE__ and __LINE__ malarkey then? They are predefined macros
that expand to the current source file name and line number. Very
useful for logging and errors. Here is some more useful debug code.
Note that PAUSE does a little loop so the ps-x can wait before
overloading the PC's serial buffer. Otherwise you lose half of your
characters and it all looks like spaghetti.

#define logf printf
#define ERRORMAC(x)	NEWLINE; BREAK; PAUSE; \
	logf("Runtime Error in File: %s Line: %d\n", __FILE__, __LINE__); \
	x; BREAK; NEWLINE; exit(0);

#define MALLOCMAC(x, y) if (!(x)) \
	{ERRORMAC(logf("Error: Memory allocation failure\n%s\n", y))}

#if DEBUG_LEVEL == 0
#define D0(x) {x};
#define E(x) {ERRORMAC(x)}
#define M(x, y) MALLOCMAC(x, y)
#endif

This means that if DEBUG_LEVEL is 0 then the following commands are
available:

D0(x) : This runs whatever 'x' is.
e.g. D0(logf("Use the force Luke, and Loop counter is %2d", i));
This would print the message on your PC. If DEBUG_LEVEL != 0 then I
define D0(x) as follows:

#ifndef D0
#define D0(x) ;
#endif

As you can see, x is never run. Very useful for having (multiple)
debug versions.

E(x) is special terminal error code. Note it makes a point of printing
file and line information, and also allows me my own command (the 'x'
bit) where I almost always just print further information on current
variables.

M(x,y) is a memory debug macro. I'm often lazy. When I malloc stuff I
don't test for success. So today I got around to writing a macro to
cover myself (not fully tested but seemed ok in a few tests today).

e.g. 	M(MapBlockMesh[mesh][lod] = calloc(size, sizeof(u_char)),
"mapblockMeshNorm: Norm Block Mesh allocation"); 

In M(x,y) 'x' is my memory allocation which is tested for success. 'y'
is an error message for if things fail.

Oooh what else. Oh yes, remember I said I'd explain FIX(x)?

The PS-X is crap at handling floating point numbers (float and
double). Don't ever use them, at least not in anything inside the game
loop. Use
'fixed' point numbers instead. You all already use them when you deal
with angles. Instead of 0-360 you use 0-4095 because it gives you
better 'resolution'. Effectively you have got a 'fractional component'
so you can get e.g 1.5 degrees rather than just 1 degree or 2 degrees.

I use a couple of macros to keep my code flexible and portable (i.e.
one machine may use type FIXED but it'll be defined float, another
will define it as an int).

#define PRECISION	12
typedef signed		long	FIXED;
#define FIX(x)		((FIXED)((x) << PRECISION))
#define UNFIX(x)	((int)((x) >> PRECISION))

Precision gives me 2^12 (4096) numbers between each whole integer. So
1.5 would be 4096 + (4096/2) = 6144.
To ensure portability and clarity I always refer to fixed point
variables with a typedef FIXED, and I use macros to convert between
normal and fixed.

Be warned, multiplying 2 FIXED point numbers will give you a 'doubly
fixed' result (scaled by 2^24 instead of 2^12), so you should UNFIX
one of them first or shift the result back down. Watch out for
overflows, or use long longs (64 bits on the ps-x, longs[ints] are 32
bits).
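
A small sketch of that warning in code, assuming the compiler supports
long long. fixed_mul and fixed_mul_int are hypothetical helper names
invented for this example. The product of two FIXED values is formed
in 64 bits and shifted down once, so it cannot overflow midway.

```c
#define PRECISION	12
typedef signed		long	FIXED;
#define FIX(x)		((FIXED)((x) << PRECISION))
#define UNFIX(x)	((int)((x) >> PRECISION))

/* Hypothetical helper: FIXED * FIXED.
 * The raw product is scaled by 2^24, so it is held in a long long
 * and shifted right by PRECISION to get back to 2^12 scaling.
 * (Right-shifting a negative value is implementation-defined in C;
 * gcc does an arithmetic shift.) */
static FIXED fixed_mul(FIXED a, FIXED b)
{
	return (FIXED)(((long long)a * (long long)b) >> PRECISION);
}

/* Hypothetical helper: FIXED * plain integer.
 * Only one operand is scaled, so no shift is needed. */
static FIXED fixed_mul_int(FIXED a, int n)
{
	return a * n;
}
```

For example, with 1.5 stored as 6144, fixed_mul(6144, FIX(2)) gives
FIX(3), and so does fixed_mul_int(6144, 2).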

NTSC vs PAL

One problem with NTSC and PAL is that they run at different speeds.
They also have a different screen size but that is easily accounted
for in your game design.

But if they run at different speeds, how can you get your game running
at the same speed? Imagine you want to move an object. You could:

1) Move it every vertical blank with movement stated in units/vblank

Accurate, but buggers up in NTSC/PAL and processor intensive for non
50/60 fps projects.

2) Move it every frame with movement stated in units/frame

Inaccurate when rendering drops below 50/60 fps and also buggers up in
NTSC/PAL

3) Count the frames that have passed between each render, and move the
object that amount of times in units/vblank

Not a bad idea. Moving it multiple times rather than just multiplying
the number of frames by the speed can help prevent processing errors.
For example, using the alternative method (below) when a low frame
rate occurs you may move right over a collision area rather than
stopping within (depending on how you deal with collisions). In this
way you trap each 'frame' of movement. Problems again with NTSC & PAL

4) As above but multiply the speed by the frame count

The problems have already been discussed, but it does run faster than
the above as you don't loop the same function repeatedly. Why do it 5x
when you can do it once?! Still messes up in NTSC/PAL.

5) Work in milliseconds, and then use methods 3 and 4 as appropriate

Well this is my current choice, but I'll take other ideas quite
happily. This method solves the NTSC and PAL problem by calculating
everything in milliseconds. This is easy to work with for all
concerned. Your artist doesn't want to deal in frames per second, he
wants to deal in seconds!

So how do you do this?

* Test for if you are in NTSC or PAL

* Calculate, using FIXED POINT or float or something, the number of
milliseconds per vblank. For PAL it's 1000/50 (20 ms) and for NTSC
1000/60 (about 16.67 ms)

* Once you know the length of a vblank and the time it took for the
last frame to be rendered, then you know exactly how much time has
passed since the last movement took place.
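
The steps above can be sketched roughly as follows, using the FIXED
macros from earlier in this post. The function names, the hz
parameter (50 for PAL, 60 for NTSC) and the units-per-second speed are
assumptions made for the example; detecting the video mode and
counting vblanks is left to your own code.

```c
#include <assert.h>

#define PRECISION	12
typedef signed		long	FIXED;
#define FIX(x)		((FIXED)((x) << PRECISION))
#define UNFIX(x)	((int)((x) >> PRECISION))

/* Hypothetical: milliseconds per vblank as a FIXED value.
 * hz is 50 for PAL or 60 for NTSC. NTSC truncates slightly
 * because 1000/60 is not exact. */
static FIXED ms_per_vblank(int hz)
{
	assert(hz == 50 || hz == 60);
	return FIX(1000) / hz;
}

/* Hypothetical: FIXED milliseconds covered by a number of vblanks.
 * Multiplying before dividing keeps the NTSC truncation from
 * being multiplied up along with the vblank count. */
static FIXED elapsed_ms(int vblanks, int hz)
{
	assert(hz == 50 || hz == 60);
	return (FIXED)(((long long)FIX(1000) * vblanks) / hz);
}

/* Hypothetical: FIXED distance covered in 'ms' (a FIXED number of
 * milliseconds) at a speed given in whole units per second. */
static FIXED distance_moved(int units_per_second, FIXED ms)
{
	return (FIXED)(((long long)units_per_second * ms) / 1000);
}
```

With this, an object moving at 100 units per second covers FIX(100)
after 50 PAL vblanks and also after 60 NTSC vblanks, which is the
whole point of working in milliseconds.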

For those of you unsure of how to count vblanks, try writing a little
Callback routine for the vertical blanking interrupt. All it has to do
is count. Don't worry about breaking anything - I know a lot of people
get scared off by callbacks and interrupts and all that shite. They
are really quite easy on the Yaroze. Hardly noticeable even.

sincos includes follow, but if anyone is sadistic enough you can win a
prize of..errr..well..nothing...if you can work out what the following
section of code from my renderDaemon in FiskBal does:

Object[i].handler.attribute |= 0x3000;

col = mapblock->subtype1 & (MBS1_P0 | MBS1_P1);

j = ((MilliTimer - mapblock->data) > MAPBLOCK_FADETIME) ?
MAPBLOCK_FADETIME : MilliTimer - mapblock->data;

r = MapPaintCols[col].r * j >> MAPBLOCK_FADETIMEP;
g = MapPaintCols[col].g * j >> MAPBLOCK_FADETIMEP;
b = MapPaintCols[col].b * j >> MAPBLOCK_FADETIMEP;

paintcol = (int) r | (int) g << 8 | (int) b << 16 | (F_4 | ABE) << 24;

num_faces = 0;
for (j = 1; j < 64; j <<= 1)
	if (!(mapblock->blk & j)) num_faces++;

ptr = ((u_long) Object[i].handler.tmd) + sizeof(TMD_OBJ) +
sizeof(PRIM_HDR);

dx = sizeof(PRIM_HDR) + sizeof(TMD_F_4);
dy = (dx * num_faces) + ptr;
for (; ptr < dy; ptr += dx)
	*((u_long *)ptr) = paintcol;

[apologies for any text being screwed by the mailer]

/*
**		Filename:			sincos.c
**		Version:			1.00
**		Date:				23/07/97
**
**		Sine, Cosine Macros (see .h file) and Inverse Tan function
**		Rewritten from a Yaroze demo somewhere...
**
**		© Copyright 1997 Tuna Technologies
**			All Rights Reserved
**
**		Revision History:
**
**	1.00	23/07/97	Created.
**
*/

#include "types.h"
#include "sincos.h"

/****************************************************************************
* FUNCTION:		rinvtan
* DESCRIPTION:	Inverse Tan function
*				If x = y = 0 then 0 is returned [result is undefined]
* PARAMETERS:	int x, int y
* RETURNS:		int angle (4096 = 360 degrees)
****************************************************************************/

int rinvtan(int x, int y)
	{
	int t;

	if (x == 0 && y == 0)	return 0;

	if (abs(x) > abs(y))
	{
		t = (y << 8) / x;
		if (t >= 0)	return Tinvtan[t & 255] + (x < 0 ? 2048 : 0);
		return (x < 0 ? 2048 : 4096) - Tinvtan[(-t) & 255];
	}

	t = (x << 8) / y;

	/* |x| == |y| gives t = +/-256, which the & 255 mask would wrap to 0, */
	/* so both diagonals are handled explicitly */
	if ((x == y) && (t >= 0))	return (y < 0 ? 3072 : 1024) - 512;
	if (t >= 0)	return (y < 0 ? 3072 : 1024) - Tinvtan[t & 255];
	if (x == -y)	return 512 + (y < 0 ? 3072 : 1024);
	return Tinvtan[(-t) & 255] + (y < 0 ? 3072 : 1024);
	} 
	
/****************************************************************************/

/* Sin Table (4096 = 1.0) */
short SinTable[] = 
{

INSERT SIN TABLE FROM YAROZE DEMOS HERE

};      

/* Inverse Tan Table (512 = 90 degrees) */
short Tinvtan[] =
{

INSERT TAN TABLE FROM YAROZE DEMOS HERE

};


/*
**		Filename:			sincos.h
**		Version:			1.00
**		Date:				23/07/97
**
**		Sine, Cosine Macros and Inverse Tan function
**
**		© Copyright 1997 Tuna Technologies
**			All Rights Reserved
**
**		Revision History:
**
**	1.00	23/07/97	Created.
**
*/


#ifndef SINCOS_H
#define SINCOS_H
	
/****************************************************************************
* FUNCTION:		rsin (MACRO)
* DESCRIPTION:	Sine function
* PARAMETERS:	angle 0-4095 (4096 = 360 degrees)
* RETURNS:		short sin(angle) << 12 [-4096 to 4096]
****************************************************************************/

#define rsin(a) SinTable[a]
	
/****************************************************************************
* FUNCTION:		rcos (MACRO)
* DESCRIPTION:	Cosine function
* PARAMETERS:	int angle (4096 = 360 degrees)
* RETURNS:		short cos(angle) << 12 [-4096 to 4096]
****************************************************************************/

#define rcos(a) SinTable[(a) + 1024]

extern short	SinTable[];
extern short	Tinvtan[];

extern int	rinvtan(int x, int y);

#endif
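
A quick illustration of why the macro argument wants brackets when a
cosine lookup is written as an offset into the sine table. Everything
here is made up for the demonstration: IndexTable is a stand-in whose
entry i just holds i (it is NOT a sine table), and rcos_raw, rcos_safe
and the lookup helpers are hypothetical names. Note also that a lookup
at angle + 1024 needs the table to cover 4096 + 1024 entries, or the
angle to be masked first.

```c
/* Stand-in table for illustration only: entry i just holds i, so a
 * lookup shows which index was really used. It has 4096 + 1024
 * entries so an offset lookup at angle 4095 stays in bounds. */
static short IndexTable[4096 + 1024];

static void fill_index_table(void)
{
	int i;

	for (i = 0; i < 4096 + 1024; i++)
		IndexTable[i] = (short)i;
}

/* Unbracketed argument: the + 1024 binds to the tail of the argument. */
#define rcos_raw(a)	IndexTable[a + 1024]

/* Bracketed argument: the whole argument is evaluated first. */
#define rcos_safe(a)	IndexTable[(a) + 1024]

/* Hypothetical callers passing a conditional expression.
 * The raw form expands to IndexTable[flip ? 2048 : 0 + 1024]. */
static int lookup_raw(int flip)  { return rcos_raw(flip ? 2048 : 0); }
static int lookup_safe(int flip) { return rcos_safe(flip ? 2048 : 0); }
```

After fill_index_table(), lookup_safe(1) reads index 3072 as intended
while lookup_raw(1) reads index 2048, because the offset was only
added to the 'else' half of the conditional.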


Regards, 

Alex Amsel

+ Tuna Technologies + Telephone & Fax +44 (0)114 221 0686 +
+ For all your Win95/NT/Console Game and Tool Development +
+ And we say, "A good programmer always blames Microsoft" +
+ "Just say NO to Big Fat Ron", say I to Sir Jack Hayward +