Vectors and Matrices Mini Tutorial

James Russell - SCEE

If you've been programming with 2D graphics and want to break into 3D, you're going to have to learn about a few of the concepts involved with 3D graphics. There are many books on the subject, but some assume you're a maths genius or you've sat through Linear Algebra 101. This document is aimed at those people who haven't encountered vectors and matrices before. It doesn't go into huge detail, but it should be enough to help you understand the concepts behind the maths you have to perform. It won't tell you the inner workings of matrices, but it will tell you what they are and how to use them.

Vectors and Origins

When you draw a 2D graph, you usually have an X and a Y axis, and where they meet is called the origin. At the origin, the values of X and Y are both 0. When you specify a point on the graph, you specify it relative to the origin. So a point that's 3 units to the right of the origin and 4 units up would be (3,4). We call this (3,4) a 2-tuple, so called because it has 2 elements. It's more commonly known as a position vector, because it describes the position of something relative to the origin.

On a graph, a position vector is drawn as a line with an arrow at the end. The line begins at the origin (0,0) and ends at the point the vector describes, here (3,4), with the arrowhead at that end.

3D Objects

3-dimensional objects are so called because they don't just have X and Y components, which specify how far along and how far up the object is. A 3D object also has a Z component, which specifies how far in or out the object is.

A 3D object is generally made up of lots of triangles and quadrangles, and the positions of the corners of these polygons are specified by 3-tuples. For example, if one of the corners of a triangle is 100 units along, 200 units up and 50 units 'in', then we would describe this corner as being at position (100,200,50). A 3D object is made up of a list of the positions of these corners (more commonly known as vertices), and a list of how to connect them to make up the various polygons. A vertex is described by a 3-tuple specifying its position along the X, Y and Z axes.
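The "list of corners plus list of connections" layout can be sketched in C as below. This is a minimal illustration, not the PlayStation's own model format. The type and function names (Vertex, Triangle, Mesh, triangle_corner) are hypothetical. Each triangle stores the indices of its three corners in the vertex list, so a corner shared by several polygons is stored only once.

```c
#include <stddef.h>

/* One corner (vertex) of a polygon: a 3-tuple (X, Y, Z). */
typedef struct { int x, y, z; } Vertex;

/* A triangle refers to its corners by index into the vertex list,
   so vertices shared between polygons are stored only once. */
typedef struct { size_t a, b, c; } Triangle;

/* Hypothetical object layout: a vertex list plus a connection list. */
typedef struct {
    const Vertex   *vertices;
    size_t          vertexCount;
    const Triangle *triangles;
    size_t          triangleCount;
} Mesh;

/* Look up corner 'corner' (0, 1 or 2) of triangle number 't'. */
Vertex triangle_corner(const Mesh *mesh, size_t t, int corner)
{
    const Triangle *tri = &mesh->triangles[t];
    size_t index = (corner == 0) ? tri->a : (corner == 1) ? tri->b : tri->c;
    return mesh->vertices[index];
}
```

For example, a square built from two triangles needs only four vertices. The triangles (0,1,2) and (0,2,3) share corners 0 and 2.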

Drawing a 3D object

You can't just draw a 3D object onto a 2D screen. It would be great if there were such things as 3D TV screens, but there aren't. So you've got to transform every 3D point (X,Y,Z) in the object into a 2D point (X,Y) that you can put onto the screen. The graphics processing unit (GPU) in the Playstation doesn't understand anything about 3D; all it knows how to do is draw 2D polygons.

So how do you get from a 3D point to a 2D point? There are lots of ways, and they're all very different.

You could ignore the Z component, and just use the 3D point's X and Y. This is really simple, but doesn't look very 3D on screen.
You could move the X and Y along a diagonal line according to the Z. If Z is negative, X and Y move down/left, and if Z is positive, they move up/right, by an amount proportional to the size of Z. This gives you an oblique view (often loosely called isometric), which looks a little more 3D but still isn't very good.

The best way to do it is to pretend that the origin of the X/Y/Z system is at the centre of your TV screen. When Z is positive (bigger than 0), that point is 'behind' the screen. When Z is negative (smaller than 0), the point is 'in front' of the screen, and when Z is 0, the point is 'touching' the screen. Let's go to the wonderful world of ASCII art. Here we are viewing you watching your TV screen from the side.

                   ^ Y axis (negative)
                   | <--(TV Screen)
                   |
E-------Z-Axis-----O----------R--Z-Axis----> (positive)
    \_____         |          |
          \______ |          |
                  \D______    |
                   |      \___P
                   |
                   V Y Axis (positive)

Now 'E' is where we pretend your eye is. We're assuming you're a pirate with a pegleg and a parrot, so you've only got one eye. Every game assumes this, because it makes the maths a lot simpler. The eye sits on the Z axis at a fixed distance from the screen. Let's say it's 1000 units away.

Now P is our point in 3D. In my ASCII art diagram we can't see how far along the X axis P is, since we're looking at it from the side, but we'll ignore that for now and come back to it later. Let's say P is at X=100, Y=200, Z=250, or (100,200,250). You might have noticed that, contrary to the 2D graphs you drew at school, the Y axis points down, not up. This is the way the Playstation handles it.

So we have this point P and we need to turn it into a 2D point on the screen so we can use it for a vertex (corner) of a polygon. Now imagine putting your eye where 'E' is, and looking towards P. We want a point on the TV screen which looks (to our eye) as if it is at point P. And this new point on the TV screen is point D. So we have to figure out how to calculate point D from point P.

To do this, we use the principle of similar triangles. This says that if two different-sized triangles share the same internal angles, then one triangle is just a scaled-up version of the other. This can help you work out unknown lengths of things. We have two similar triangles here: E-D-O and E-P-R. They share the same internal angles (and overlap a bit too). What do we know about triangle EDO? Well, we know where E and O are, so we know the length of the line EO (it's 1000). But we don't know the length of DO, and that's what we want to find out. What do we know about triangle EPR? We can work out the length between E and R (1000 + P.z = 1000 + 250 = 1250), and the length of RP is just the Y component (200).

So according to the similar triangles principle:

$$\frac{\text{Length DO}}{\text{Length EO}} = \frac{\text{Length RP}}{\text{Length RE}}$$

We know 3 of the terms already. Rearranging the equation a bit, we get the unknown to one side.

$$\text{Length DO} = \frac{\text{Length RP} \times \text{Length EO}}{\text{Length RE}}$$

So the Y component of point D is going to be:

Y.screen = 200 * 1000 / (1000 + 250) = 160

That gives us the Y coordinate of D on the screen. What about X? We calculate it in the same way. Imagine that the diagram above shows the X axis instead of the Y, and re-apply all the equations accordingly. That gives you the X value on the screen too. So:

ScreenX = (P.X * Distance from Origin to Eye) / (Distance from Origin to Eye + P.Z);
ScreenY = (P.Y * Distance from Origin to Eye) / (Distance from Origin to Eye + P.Z);

Together, these two equations are called the perspective transformation. It transforms a 3D point into a 2D one that the GPU in the Playstation can understand. The perspective transformation has to be applied to every point in the object just before the object is drawn. The GPU then draws the 2D polygons using these 2D points as their corners.

The perspective transformation is the last step we apply before drawing.
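As a sketch, the two equations fit into one small C function. It uses doubles for clarity, whereas the PlayStation itself works with integer and fixed-point arithmetic. The type and function names are hypothetical.

```c
/* A point in this tutorial's 3D coordinate system: origin at the centre
   of the screen, Y pointing down, positive Z 'behind' the screen. */
typedef struct { double x, y, z; } Point3;

/* A 2D point on the screen, measured from the centre of the screen. */
typedef struct { double x, y; } Point2;

/* Perspective transformation from the similar-triangles derivation.
   eyeDist is the distance E from the eye to the origin (length EO).
   Only valid for points in front of the eye (eyeDist + p.z > 0). */
Point2 perspective(Point3 p, double eyeDist)
{
    double depth = eyeDist + p.z;     /* length RE in the diagram */
    Point2 s;
    s.x = p.x * eyeDist / depth;
    s.y = p.y * eyeDist / depth;
    return s;
}
```

With the worked example's numbers, a point at (100,200,250) and an eye 1000 units away give (80,160), matching the 160 calculated above. A point with Z = 0 (touching the screen) comes out unchanged.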

Translation

Once you've described your 3D object, you probably want to move it around. This transformation is called translation. Going back to our 2D example, if we had the point (3,4) and we want to move it 12 to the right and 5 up, we just add 12 and 5 to the respective components: (3+12,4+5) = (15,9). If we do this for every point in our 2D graph, the whole graph will appear to have shifted 12 units along and 5 up.

The same applies to 3D objects, except we adjust 3 components (X, Y and Z) instead of just 2 (X and Y). We describe how far we want to move the object with a direction vector (which looks exactly like a position vector). Say we wanted to move it 100 along, 0 up and 200 'in': the direction vector would be (100,0,200). If we call this vector 'T', then the equations for translation are:

newX = oldX + T.X;
newY = oldY + T.Y;
newZ = oldZ + T.Z;

Once again, we have to apply this to every point in the object if we want the entire object to move.
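Here is a minimal sketch of translating a whole object. It assumes the object is just an array of points; the Point3 type and the function name are hypothetical.

```c
#include <stddef.h>

typedef struct { double x, y, z; } Point3;

/* Translation: add the direction vector t to every vertex, so the
   whole object moves by the same amount. */
void translate_points(Point3 *points, size_t count, Point3 t)
{
    for (size_t i = 0; i < count; i++) {
        points[i].x += t.x;
        points[i].y += t.y;
        points[i].z += t.z;
    }
}
```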

Scaling about the origin

Sometimes we want our object to get bigger or smaller. It seems fairly obvious that if we moved each point in our object twice as far from the origin as it already was, our object would appear twice as big (the polygons in between would be stretched to twice the size).

And similarly, if we wanted to halve the size of our object, we'd just move all the points further in towards the origin so that they were only half the distance away.

This transformation is called scaling, and assuming you want to make the object n times as big, the functions required to do it are:

newX = n*oldX;
newY = n*oldY;
newZ = n*oldZ;

You have to apply this to every point in the 3D object in order to see the results. So far, easy peasy.
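The same idea works for scaling, again as a hypothetical sketch over an array of points. Because the scaling is about the origin, an object that isn't centred on the origin also moves away from it as it grows. For example, an edge from (10,0,0) to (12,0,0) scaled by 2 becomes (20,0,0) to (24,0,0): twice as long, but also further away.

```c
#include <stddef.h>

typedef struct { double x, y, z; } Point3;

/* Scaling about the origin: multiply every component by n, so each
   point ends up n times as far from (0,0,0) as it was. */
void scale_points(Point3 *points, size_t count, double n)
{
    for (size_t i = 0; i < count; i++) {
        points[i].x *= n;
        points[i].y *= n;
        points[i].z *= n;
    }
}
```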

Rotation about the Origin

If we want to rotate a 3D object about the Z axis, there are some simple formulae that help us do it. Don't worry about how these formulae were figured out; just know that a Clever Person invented them.

Assuming you want to rotate a 3D point around the Z axis by t degrees, the formula is:

newX = x*cos(t) - y*sin(t)
newY = x*sin(t) + y*cos(t)
newZ = z

Since the value of Z doesn't change when you're rotating around the Z axis, there's no special formula for it. The formulae for rotating around the other axes are similar:

Y axis:

newX = x*cos(t) + z*sin(t)
newY = y
newZ = -x*sin(t) + z*cos(t)

X axis:

newX = x
newY = y*cos(t) - z*sin(t)
newZ = y*sin(t) + z*cos(t)

Most applications want to rotate about all three axes. To do so, you've got to first rotate about one axis, then rotate the result about the next axis, and then rotate the result about the last axis. The order that you do the rotation is important too. Rotating about the Z, then the Y, then the X axis has different results than if you rotate about the X, the Y, then the Z axis.

As an example, take a ballpoint pen and hold it in front of you so that it lies along the X axis. Now rotate it 90 degrees about the Z axis, then 90 degrees about the X axis. The pen should now be aligned with the Z axis. Hold it along the X axis again, and this time rotate it 90 degrees about the X axis, then 90 degrees about the Z axis. The pen should now be aligned with the Y axis! See how the order of rotation is important?

These equations rotate points about one of the three axes. Sometimes we want points rotated about some other axis. The equations for those are a lot more complicated, so we won't go into them here. The equations above will suffice most of the time.

Once again, we apply these transformations to all points in the 3D object.

Matrices

What we've described in the previous sections are various transformations. Transformations are things you can do to a point to turn it into another point at a different position. The common ones we use in 3D graphics are the perspective transformation (so we can project our 3D points onto a 2D screen), translation (so we can move them around), scaling (so we can make things bigger or smaller) and rotation (so we can rotate them around the origin).

We've also described the equations necessary to do all these, and none of them looked very nice. You can write programs which use all those equations above, but eventually your programs will become very difficult to understand. We don't want to think of them as sets of equations, because that's too confusing and the maths gets horrible very quickly when you want to do something complex. We just want to think of them as transformations. This is where matrices come in.

A matrix is a 2-dimensional array of numbers that represents a transformation (rotation, scaling, translation, perspective, etc). A matrix with m rows and n columns is called an m x n matrix. When you multiply a vector (3D point) by a 3x3 matrix, the result is another vector that has been transformed by the matrix. You don't have to know how to multiply a vector by a 3x3 matrix; the libraries and the GTE (Geometry Transformation Engine, the Playstation's maths coprocessor) do that for you.

For example, if I have a matrix M which represents a scaling matrix that scales by a factor of 2, and I have a bunch of 3D points to transform stored in an array of SVECTORs V (where V[i] = (point.X, point.Y, point.Z)) then I can write a simple loop which uses the GTE to calculate the transformed points:

for (i = 0; i < length(V); i++)
    V[i] = V[i] * M;    // GTE operation

Of course, this is just pseudocode. At the end of this loop, all the points V[i] will have been scaled by 2.

Very important! Multiplying a 3D point V by a matrix will get different results depending on the order of multiplication (V * M does not equal M * V!). In the following examples, we're assuming that V is on the far left of the equation.
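To see why the order of multiplication matters, here is a sketch of both ways of multiplying a 3-element vector by a 3x3 matrix. The names are hypothetical; on the Playstation the GTE does this for you. Take the 90-degree rotation about Z used later in this tutorial (rows 0 1 0, -1 0 0, 0 0 1) and V = (1,0,0). V * M gives (0,1,0), but M * V gives (0,-1,0).

```c
typedef struct { double v[3]; } Vec3;
typedef struct { double m[3][3]; } Mat3;

/* Vector on the left (this tutorial's convention): result = V * M.
   Result component j is V dotted with column j of M. */
Vec3 vec_times_mat(Vec3 a, Mat3 M)
{
    Vec3 r;
    for (int j = 0; j < 3; j++)
        r.v[j] = a.v[0] * M.m[0][j] + a.v[1] * M.m[1][j] + a.v[2] * M.m[2][j];
    return r;
}

/* Vector on the right: result = M * V.
   Result component i is row i of M dotted with V. */
Vec3 mat_times_vec(Mat3 M, Vec3 a)
{
    Vec3 r;
    for (int i = 0; i < 3; i++)
        r.v[i] = M.m[i][0] * a.v[0] + M.m[i][1] * a.v[1] + M.m[i][2] * a.v[2];
    return r;
}
```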

Different types of transformation matrix

Clever People have developed various 3x3 matrices for different sorts of transformation. For example:

a 0 0
0 a 0
0 0 a

is a scaling matrix - if you multiply a vector by this matrix, the vector will be scaled by factor 'a'.

The rotation matrices are the interesting ones. To rotate around the Z axis:

cos(t) sin(t) 0
-sin(t) cos(t) 0
0 0 1

The Y axis:

cos(s) 0 -sin(s)
0 1 0
sin(s) 0 cos(s)

and finally the X axis:

1      0       0
0    cos(r)  sin(r)
0   -sin(r)  cos(r)

When you multiply a vector V (which represents a 3D point) by a matrix M, the result of V * M is another vector V' which has been transformed by M. You can plug in this new vector V' into another transformation, and get back yet another vector.

Just out of interest, if you want to do absolutely nothing to your point (i.e. what you put in is what you get back), then you can use the identity matrix. If you multiply a vector V by the identity matrix I, you get back V. It's a bit pointless really, since the original point doesn't change, but later on in life you'll need to know what the identity matrix is:

1 0 0
0 1 0
0 0 1

Back to the examples: say we wanted to scale all our points by a factor of 3, then rotate them about the Z axis by 90 degrees. First we create the necessary matrices S (for scaling) and Rz (for rotation about Z). Using the 'templates' described above:

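$$S = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

$$R_z = \begin{bmatrix} \cos 90^\circ & \sin 90^\circ & 0 \\ -\sin 90^\circ & \cos 90^\circ & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$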

So for all our points in the object we apply S and then Rz. Say V is a single point:

V' = V * S
V'' = V' * Rz

or more succinctly:

V'' = V * S * Rz

V'' will contain the new point firstly transformed by scaling, then transformed by a rotation by 90 degrees about the Z axis.
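The scale-then-rotate example can be checked with a small sketch. The names are hypothetical, and it uses doubles instead of the Playstation's fixed-point numbers. It computes V'' step by step and again with the precombined matrix S * Rz, using the S and Rz above. For V = (1,2,3), both give (-6,3,9).

```c
typedef struct { double m[3][3]; } Mat3;
typedef struct { double v[3]; } Vec3;

/* Matrix product C = A * B: C[i][j] is row i of A dotted with column j of B. */
Mat3 mat_mul(Mat3 A, Mat3 B)
{
    Mat3 C;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            C.m[i][j] = 0;
            for (int k = 0; k < 3; k++)
                C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

/* Vector on the left: V * M (the convention used in this tutorial). */
Vec3 vec_times_mat(Vec3 a, Mat3 M)
{
    Vec3 r;
    for (int j = 0; j < 3; j++)
        r.v[j] = a.v[0] * M.m[0][j] + a.v[1] * M.m[1][j] + a.v[2] * M.m[2][j];
    return r;
}
```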

As mentioned above, the ORDER you do transformations is important. Translating a point and then scaling it will have drastically different effects than if you scaled it and then translated it.

As you may have gathered, we have to apply a transformation to each point in the object to get a result. And we know that the GTE can multiply an arbitrary 3x3 matrix by a 3D point to get us another 3D point. That's an advantage: the GTE is really good (i.e. fast) at multiplying a matrix by a 3D point, much faster than the CPU, in fact. And since we can represent rotation and scaling by a 3x3 matrix, we're going to get a speed increase if we use matrices rather than getting the CPU to 'manually' do the equations given at the start. But that's not the real reason we use matrices.

(Pedant alert: the GTE is sort of halfway between a 3x3 matrix multiplier and a 4x4 multiplier. It can do any 3x3 matrix multiplication, but only special cases of 4x4 multiplication, and we'll see later why we don't care about that.)

Combining transformations

Now you may be wondering at this point "What's the point? What can matrices do for me that the equations in the first few sections can't?"

The reason is embedded in the lines:

V' = V * S
V'' = V' * Rz
and
V'' = V * S * Rz

As all of you know, the key to making a fast program is keeping the number of mathematical operations to a minimum. And you may have been disturbed by the fact that in the above example we were doing two matrix multiplies for every point! This is baaaaad, since we didn't have to.

For example, say we wanted to perform the following fictional piece of code:

a = getA();
b = getB();
for(i = 0; i < 1000; i++) {
d[i] = c[i] * b * a;
}

Assuming your C compiler is terrible and can't see the obvious optimisation, you'd be doing 2 multiplies 1000 times, which is hard (and more importantly, slow) work for the CPU. A better way of writing it would be:

a = getA();
b = getB();
q = b * a;
for(i = 0; i < 1000; i++) {
d[i] = c[i] * q;
}

That's only one multiply, so it's a lot quicker. In a similar way, instead of doing this:

S = createScalingMatrix(scaleFactor);
Rz = createRotationAboutZMatrix(angle);
For(every Point in the Object) {
TempPoint = Point * S;
NewPoint = TempPoint * Rz;
}

we'd rather do this:

S = createScalingMatrix(scaleFactor);
Rz = createRotationAboutZMatrix(angle);
Q = S * Rz;
For(every Point) {
NewPoint = Point * Q;
}

A lot quicker, right? Now imagine you're not doing just 2 transformations, but 10! Instead of doing 10 transformations for every point, you can always initially calculate Q, the combination of all those transformations, and apply just Q instead. This is a huge speed saving! If you were doing the same with the original equations, you could do some simplification of the maths, but it would be error prone and difficult. Matrices are a great way to organise sets of transformations.

That's the cool thing about transformations represented by matrices - you can combine all sorts of transformations together into one matrix. If you have a transformation in matrix A and another transformation in matrix B, and matrix C is A * B, then multiplying a point V by A and then multiplying the result by B is the SAME as multiplying V by matrix C:

C = A * B;

V * C = V * A * B

and an even more extreme example:

F = A * B * C * D * E

V * F = V * A * B * C * D * E

and so on.

A very important point that bears repeating: if A and B are matrices (read: transformations), then A * B does NOT always equal B * A. This is why the order you do your transformations in is so important. The first transformation goes on the LEFT, and the last transformation goes on the RIGHT. The vector V is always multiplied from the far left, as in the above examples. (Sidenote for pedants: it is possible to write these equations with the vector on the right and the matrices on the left. To do this, you mirror (tech term: transpose) each transformation matrix given above about its diagonal from top left to bottom right, and you also reverse the order of the matrices. So V * A * B becomes B' * A' * V, where ' means transposed.) Many graphics books use the format I'm using, where the vector to be transformed is on the far left of the equation.
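Here is a sketch of both points above, with hypothetical names. Scaling X by 2 and rotating 90 degrees about Z give different results depending on the order. The transposed form with the vector on the right (B' * A' * V) gives the same answer as V * A * B. For V = (1,1,0), scaling first gives (-1,2,0) and rotating first gives (-2,1,0).

```c
typedef struct { double m[3][3]; } Mat3;
typedef struct { double v[3]; } Vec3;

/* Matrix product C = A * B. */
Mat3 mat_mul(Mat3 A, Mat3 B)
{
    Mat3 C;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            C.m[i][j] = 0;
            for (int k = 0; k < 3; k++)
                C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

/* Mirror M about its top-left to bottom-right diagonal. */
Mat3 transpose(Mat3 M)
{
    Mat3 T;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            T.m[i][j] = M.m[j][i];
    return T;
}

/* Vector on the left: V * M (this tutorial's convention). */
Vec3 vec_times_mat(Vec3 a, Mat3 M)
{
    Vec3 r;
    for (int j = 0; j < 3; j++)
        r.v[j] = a.v[0] * M.m[0][j] + a.v[1] * M.m[1][j] + a.v[2] * M.m[2][j];
    return r;
}

/* Vector on the right: M * V (the transposed convention). */
Vec3 mat_times_vec(Mat3 M, Vec3 a)
{
    Vec3 r;
    for (int i = 0; i < 3; i++)
        r.v[i] = M.m[i][0] * a.v[0] + M.m[i][1] * a.v[1] + M.m[i][2] * a.v[2];
    return r;
}
```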

The really cool thing about matrices is that you can multiply two matrices together to form a new matrix which is the combination of the original two. So once we create our 3 rotation matrices (one each for X, Y and Z rotation), we can multiply them all together to create one new matrix, and then multiply that with our points instead. It doesn't just have to be rotation matrices either: once we move up to 4x4 matrices (described below), we can combine scaling, translation and perspective transformations in there too.

As an example, the Playstation function RotMatrixX() will take an angle and create a matrix which is a rotation by that angle about the X axis. Similar functions are RotMatrixY and RotMatrixZ. The function RotMatrix() will create a single matrix that is the combination of a rotation about the Z followed by a rotation about the Y followed by a rotation about the X. You pass in a blank matrix and a vector containing the 3 angles (for X, Y and Z), and it creates the 3 different rotation matrices described above (Rx, Ry and Rz) and multiplies them together to get Rzyx in your blank matrix. Once again, order is important. Multiplying points by this new matrix Rzyx is the same as if you'd rotated them about the Z axis first, then the Y axis, then the X. RotMatrix() doesn't handle any different orders of transformation, only Z-then-Y-then-X. You will have to make your own version of RotMatrix if you want a different order (You can use the RotMatrixX/Y/Z functions though!).

// Pseudocode for what RotMatrix() does
RotMatrix(SVECTOR *angles, MATRIX *outputMatrix) {
    MATRIX XRotate, YRotate, ZRotate;

    CreateXRotationMatrix(angles->vx, &XRotate);
    CreateYRotationMatrix(angles->vy, &YRotate);
    CreateZRotationMatrix(angles->vz, &ZRotate);

    // Matrix multiplication, not normal multiplication!
    // ZRotate is leftmost, so it is applied first.
    *outputMatrix = ZRotate * YRotate * XRotate;
}

Translation and Perspective Transformation Matrices

No matter how hard they tried, the Clever People couldn't figure out how to create a 3x3 matrix which could perform a perspective or translation transformation. That's because it can't be done. Multiplying (0,0,0) by any 3x3 matrix always gives (0,0,0), so a 3x3 matrix can never move the origin, which a translation has to do. Perspective needs a division by Z, which a matrix multiply alone can't do either.

"This buggers things up a bit" they thought. But because they were Clever People, they eventually found a way around it. Instead of using 3x3 matrices, they'd use 4x4 matrices. A 4x4 matrix has 4 rows and 4 columns instead of 3 rows and 3 columns. The Clever People found an easy way to convert the existing 3x3 rotation and scaling matrices into 4x4 matrices: the new bits of the array were filled with zeroes, and the bottom right corner had a 1 in it. For example, the new scaling matrix is:

a 0 0 0
0 a 0 0
0 0 a 0
0 0 0 1

The new numbers are the fourth row and the fourth column. The old 3x3 rotation and scaling matrices have these new bits tacked on, and there you have it: a 4x4 version of the same thing.

The Clever People figured out that the translation matrix would look like this:

1 0 0 0
0 1 0 0
0 0 1 0
x y z 1

This matrix will move a 3D point x units along the X axis, y units along the Y axis (remember that positive Y is down) and z units 'in'.

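Assuming E is the distance from the eye to the origin, the perspective transformation matrix looks like this:

1 0 0 0
0 1 0 0
0 0 1 1/E
0 0 0 1

Nice and simple! There's one extra step, though. After a point (with a 1 tacked onto its end, as explained next) is multiplied by this matrix, its fourth component is no longer 1 but 1 + Z/E. You divide the X and Y of the result by that fourth component to get the screen position. That gives X*E/(E+Z) and Y*E/(E+Z), exactly the ScreenX and ScreenY equations from earlier.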

However, you can't multiply a 4x4 matrix by a 3D point: the matrix needs a vector with 4 elements, and a 3D point only has 3. In order to multiply a 3-tuple vector by a 4x4 matrix, we simply concatenate a 1 on the end of the vector! So if you had a 3D point (100,200,300), you simply make it (100,200,300,1). That 1 is what picks up the translation stored in the bottom row of the matrix. Sorted.
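Here is a sketch of the 4x4 machinery in C, using doubles and hypothetical names, in the vector-on-the-left form. It puts +1/E in the third row, fourth column. With the eye on the negative-Z side of the screen, as in the diagram, that sign makes the fourth component 1 + Z/E and so reproduces the ScreenX and ScreenY equations. It also performs the final divide by that fourth component, which the matrix multiplication by itself does not do.

```c
typedef struct { double v[4]; } Vec4;
typedef struct { double m[4][4]; } Mat4;

/* Row vector times matrix: result = V * M. */
Vec4 vec4_times_mat4(Vec4 a, Mat4 M)
{
    Vec4 r;
    for (int j = 0; j < 4; j++) {
        r.v[j] = 0;
        for (int k = 0; k < 4; k++)
            r.v[j] += a.v[k] * M.m[k][j];
    }
    return r;
}

/* Matrix product C = A * B (apply A first, then B). */
Mat4 mat4_mul(Mat4 A, Mat4 B)
{
    Mat4 C;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            C.m[i][j] = 0;
            for (int k = 0; k < 4; k++)
                C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

/* Translation by (x, y, z): the offsets sit in the bottom row. */
Mat4 translation(double x, double y, double z)
{
    Mat4 T = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {x, y, z, 1}}};
    return T;
}

/* Perspective for an eye at distance E on the negative-Z side of the
   screen: the 1/E entry makes the fourth component w = 1 + Z/E. */
Mat4 perspective(double E)
{
    Mat4 P = {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 1.0 / E}, {0, 0, 0, 1}}};
    return P;
}

/* Tack a 1 onto the 3D point, transform it, then divide X and Y by w. */
void project(double px, double py, double pz, Mat4 M, double *sx, double *sy)
{
    Vec4 p = {{px, py, pz, 1}};
    Vec4 r = vec4_times_mat4(p, M);
    *sx = r.v[0] / r.v[3];
    *sy = r.v[1] / r.v[3];
}
```

As a usage example, mat4_mul(translation(100, 0, 250), perspective(1000)) is a single matrix that moves a point and then projects it. Applied to (0,200,0), it gives the same screen position (80,160) as projecting (100,200,250) directly.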

So now we have the complete set: a set of 4x4 matrices which can perform rotation, scaling, translation and perspective transformations. We can combine them in any order we like (perspective last, remember).

"Now wait a minute", I hear you say. "I was looking at libps.h, and the Matrix structure there was 3x3, with 3 't' entries on the end for translation. That's not a 4x4 matrix, it's a 4x3!" And you are correct. When the Sony engineers were trying to optimize for space, they realized that the 4th column of most of the 4x4 matrices was all zeroes, with a 1 at the bottom. This is always true of a scaling, rotation or translation (but NOT a perspective) transformation matrix, and of any combination of them. So they decided not to store the last column. Simple as that.

The GTE knows this too, and automatically 'fills in the gaps' of the 4th column with the correct numbers (0's and a 1) when you load a matrix into the GTE. This means the GTE can't multiply an arbitrary 4x4 matrix, but we don't care as long as we're only doing rotation, scaling, translation or perspective.

Matrices and the GTE

The Playstation uses lots of matrices and combinations of them to get results onto the screen. So what is happening when you perform a command like GsSetLs(&myMatrix)?

Let's look at the MATRIX structure:

typedef struct {
    short m[3][3];    /* 3x3 rotation/scaling part */
    long  t[3];       /* translation */
} MATRIX;

When you perform the GsSetLs(&myMatrix) call, it loads the specified matrix (myMatrix) into the GTE. m[3][3] fills in the upper-left 3x3 block (the first 3 rows of the first 3 columns), and t[3] fills in the first three entries of the bottom row. The GTE knows that the last column is going to be three zeroes and a one, so it fills those in internally.

Here's a neat optimisation trick. Suppose you combine one of the other 4x4 transformation matrices described above with the perspective transformation matrix (perspective last). Only the 4th column of the result changes, and all that column does is work out the number you divide X and Y by, which depends only on the transformed Z. So the GTE doesn't need that column at all. After transforming a point by the matrix you loaded, it does the perspective division itself, using the distance E you previously gave it when you called GsSetProjection(). The effect is the same as combining the matrix you've loaded with the perspective transform.

After all that loading, you'll probably be calling GsSortObject4() with your 3D object. The Playstation will use the matrix you specified (which has been loaded into the GTE) to transform all the points in your 3D object. The results are also going to be 3D points. It uses the X and Y parts of the result as the screen coordinates, and the Z part of the result to decide how far in/out the point is (used for clipping and for inserting polygons into the Ordering Table).

So the biggest question is: "How do I get the myMatrix MATRIX all set up?"

Getting your object onto the screen

With a 2D system, you're probably used to a 'blank canvas' that you draw all your 2D sprites on. This blank canvas has an origin of (0,0) in the top left corner. The larger the Y value, the further down the screen the sprite is, and the larger the X value, the further along it is. By changing the X and Y coordinates of your 2D sprite, you can place it accurately in the 2D world.

With a 3D system, the blank canvas looks the same, but the origin (0,0,0) is in the middle of the screen, not the top left. Positive Y is still in the down direction though. And once again, by translating the 3D coordinates of your object, you can place the 3D object on the screen. This is just a simple translation transformation. So to move a 3D object in 3D space, we need to apply the translation matrix.

What we have just done is change between coordinate systems. A coordinate system consists of an origin and 3D points which are defined relative to that origin. We define each 3D object to have its own coordinate system, and all the 3D points in that object are defined relative to the origin of that coordinate system. We have also created the world coordinate system (where the origin is in the middle of the screen). We need to convert from our object's local coordinate system to the world coordinate system. With 2D games, this is a simple matter of translation. With 3D systems, converting between coordinate systems can mean a translation, rotation, scaling or all 3! It usually consists of just translation, though.

Most of the time, if we perform an appropriate translation on our local coordinate system, we're effectively converting that object's coordinate system to the world coordinate system. All objects have to be converted to the world coordinate system before they can be drawn.

You can think of the world coordinate system as the mother of all the local coordinate systems. You are converting between the local coordinate system and its parent, the world coordinate system. It's possible for the local coordinate systems to have children too (we'll go into this later), so for now it's best to think that we're converting between a child and its parent coordinate system. The world coordinate system is the ancestor of all coordinate systems, and has no parents.

The GsCOORDUNIT2 structure contains a matrix that will be applied to your object to get it into its parent coordinate system (not necessarily world). This member is simply called matrix. The matrix.m[3][3] part of matrix contains the 9 upper-left values of the 4x4 matrix that will be applied to your object before translation. The translation itself is stored in matrix.t[3]. This means that you can do as much rotation and scaling as you like (specified in matrix.m[3][3]) to your object before it finally gets translated by the amount in matrix.t[3]. If you don't want to do any rotation or scaling, you have to put the identity matrix in there (told you it would be useful). In the identity matrix, all the diagonal entries of matrix.m[3][3] from top left to bottom right are 1 and everything else is 0.

A B C 0  =  m[0][0], m[0][1], m[0][2], 0
D E F 0  =  m[1][0], m[1][1], m[1][2], 0
G H I 0  =  m[2][0], m[2][1], m[2][2], 0
J K L 1  =  t[0],    t[1],    t[2],    1

A-I are 9 values which will contain your 3x3 rotation/scaling matrix. J-L contain the translation which follows the rotation/scaling. If you want translation followed by rotation, you'll have to perform the matrix multiplication yourself - this would involve setting up two separate matrices (one for translation, one for rotation) and multiplying them together. This new matrix would contain the transformations in the order you desire.

When the libraries come across a GsCOORDUNIT2 structure, they have to create a matrix which converts this coordinate system to world. If your GsCOORDUNIT2 structure has a super value of WORLD (which indicates that this coordinate system's parent is world), then nothing much needs to be done. It simply copies your matrix into the workm matrix structure and sets the flg flag to 1 to say that workm is a valid matrix which can be used to change this coordinate system into world coordinates. If the super value isn't world, then it's got to do some calculations, as we'll see in a later section.

Setting up the matrix

Setting up matrix is simple. If you know how much you want to rotate your object before you translate it, call RotMatrix(). It takes 3 angles (in an SVECTOR) and creates the combined Rz * Ry * Rx 3x3 matrix described earlier (which, as you may recall, rotates first about the Z, then the Y, then the X axis). If all the angles are 0, you'll get the identity matrix as a result anyway. So the common method is to store an SVECTOR with the 3 rotations, and call RotMatrix to set up the rotation matrix before you draw the object.

In order to do scaling, you'd then apply the scaling transformation matrix by multiplying it with the new rotation matrix. But there's a simple trick for the usual case, where you want to scale the object by the same factor n in every direction. That scaling matrix is just the identity matrix with every 1 replaced by n. So multiplying by it is the same as multiplying all nine entries of matrix.m by n (after calling RotMatrix):

for (i = 0; i < 3; i++)
    for (j = 0; j < 3; j++)
        matrix.m[i][j] *= n;

Multiplying only the three diagonal entries is not enough once the matrix contains any rotation, because the off-diagonal entries hold part of the rotation too. If you want different scale factors along X, Y and Z, multiply by a full scaling matrix instead.

Setting the t[3] translation parts is easy too. t[0] is how much you want to move this object along the X axis, t[1] is for the Y, and t[2] is for the Z.

Finally, set the flg flag to 0. This tells the libraries that you've changed something, and so it knows to re-calculate workm (see below).

Objects with multiple coordinate systems

3D objects these days are fairly complex beasts. An object describing an animal or human is especially hard, because of all those dangly limbs. It's often easier to specify such a creature in many different parts that can all move separately. Each part has its own coordinate system, and each coordinate system has a parent. For example, say you specified a coordinate system where the upper arm was, and there was a child coordinate system used to describe the lower arm. If this child-parent relationship is kept, then whenever you move the upper arm, the lower arm will move too. But whenever you move the lower arm, it won't affect the upper arm, because moving a child never moves its parent.

If you have a creature described by multiple coordinate systems with a child-parent relationship, then you can easily control the motion of all the limbs. To draw a child coordinate system, you have to convert it to its parent's coordinate system. Then the parent's coordinate system has to be converted to its own parent's coordinate system, and so on, until you have converted all the way back up to the world coordinate system.

To convert between coordinate system X and world, the formula is defined recursively:

LocalToWorldMatrix(X) {
    if (X.super == WORLD)
        return X.matrix;
    else
        return X.matrix * LocalToWorldMatrix(X.super);    // matrix multiplication
}
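The recursive LocalToWorldMatrix() can be sketched in C as below. The Coord type is a hypothetical simplification, not the library's GsCOORDUNIT2: it has a parent pointer where NULL stands for WORLD, and it doesn't cache the result the way workm does. The matrices are 4x4 doubles in the vector-on-the-left form, so the child's own matrix is the leftmost factor.

```c
#include <stddef.h>

typedef struct { double m[4][4]; } Mat4;

/* Hypothetical coordinate-system node: 'matrix' converts this system
   into its parent's; parent == NULL stands for the world. */
typedef struct Coord {
    Mat4 matrix;
    const struct Coord *parent;
} Coord;

/* Matrix product C = A * B (apply A first, then B). */
static Mat4 mat4_mul(Mat4 A, Mat4 B)
{
    Mat4 C;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            C.m[i][j] = 0;
            for (int k = 0; k < 4; k++)
                C.m[i][j] += A.m[i][k] * B.m[k][j];
        }
    return C;
}

/* Local-to-world: this node's matrix first, then each ancestor's in turn. */
Mat4 local_to_world(const Coord *c)
{
    if (c->parent == NULL)
        return c->matrix;
    return mat4_mul(c->matrix, local_to_world(c->parent));
}
```

For example, suppose an upper arm is translated by (100,0,0) from the world and the lower arm by (0,50,0) from the upper arm. The lower arm's origin then ends up at (100,50,0) in world coordinates, which is the bottom row of the resulting matrix.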

While matrix contains the matrix to convert between this coordinate system and its parent, workm contains the matrix to convert between local and world. If flg is set to one (and this is only ever done by the libraries), it means that workm contains the product of this matrix and all its ancestors' matrices. That's why you have to reset it to zero whenever you change the matrix structure, because that product will have to be recalculated.

The Camera Coordinate System

That explains how to get objects into a single coordinate system (world), but there are still some unanswered questions - like "What matrix do I use to set up the GTE?"

The model described earlier assumes that the eye is on the Z axis, at a distance E. Of course, most people don't want this; they want an eye they can point in an arbitrary direction, from an arbitrary position. This is possible, but you don't want to know the maths. So we fake it instead.

For centuries, people thought that the Sun went around the Earth, when the opposite is true. But how would you know for sure? It's a valid question, because the Sun going around the Earth and the Earth going around the Sun look the same to someone standing on Earth. Similarly, moving all your objects in 3D space 10 units to the left looks exactly the same as moving the eye 10 units to the right.
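The Sun-and-Earth argument is easy to check with numbers. The hypothetical `project_x()` below uses a simplified projection (eye at (eye_x, 0, eye_z) looking down +Z, screen E units in front of it) and shows that sliding a point left by 10 gives the same screen position as sliding the eye right by 10.

```c
/* Simplified, hypothetical projection: the eye sits at (eye_x, 0, eye_z),
   looks down +Z, and the screen is E units in front of it. Only the
   point's position relative to the eye appears in the formula. */
double project_x(double x, double z, double eye_x, double eye_z, double E) {
    return (x - eye_x) * E / (z - eye_z);
}
```

Because only `x - eye_x` matters, subtracting 10 from every object's x and adding 10 to the eye's x are indistinguishable on screen.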

We use this principle to change world coordinates into screen coordinates. We set up the position of the camera and the direction it is pointing, and a call to GsSetRefView2() creates an appropriate matrix that converts from world coordinates to screen coordinates. So rather than moving the eye in relation to the world, we move the whole world in relation to the eye!
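A rough idea of what such a world-to-screen matrix does can be sketched in C. This is not GsSetRefView2()'s internal code, which builds its matrix from its own camera description; the hypothetical `world_to_camera()` below instead takes the camera position plus the camera's own right, up and forward axes, and assumes those axes are unit length and mutually perpendicular. It shifts every world point by the camera position and then measures it along the camera's axes, which is "moving the world instead of the eye".

```c
typedef struct { double x, y, z; } Vec3;
typedef struct { double m[4][4]; } Mat4;   /* row-vector convention */

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Hypothetical world -> camera matrix. right/up/fwd are the camera's
   X/Y/Z axes in world space (assumed unit length and perpendicular).
   For any point p the result is (dot(p-eye, right), dot(p-eye, up),
   dot(p-eye, fwd)). */
Mat4 world_to_camera(Vec3 eye, Vec3 right, Vec3 up, Vec3 fwd) {
    Mat4 ws = {{{right.x, up.x, fwd.x, 0},
                {right.y, up.y, fwd.y, 0},
                {right.z, up.z, fwd.z, 0},
                {-dot(eye, right), -dot(eye, up), -dot(eye, fwd), 1}}};
    return ws;
}

/* p' = (p.x, p.y, p.z, 1) * M */
Vec3 apply(Vec3 p, const Mat4 *M) {
    Vec3 r = {
        p.x * M->m[0][0] + p.y * M->m[1][0] + p.z * M->m[2][0] + M->m[3][0],
        p.x * M->m[0][1] + p.y * M->m[1][1] + p.z * M->m[2][1] + M->m[3][1],
        p.x * M->m[0][2] + p.y * M->m[1][2] + p.z * M->m[2][2] + M->m[3][2]
    };
    return r;
}
```

For example, a camera at (10, 0, -100) looking down +Z sees the world point (10, 0, 0) straight ahead at depth 100.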

So now the process is essentially complete.

You create a matrix LP (Local to Parent) that converts points in the local coordinate system to the parent coordinate system. LP is stored in an object's GsCOORDUNIT2.
The libraries (GsGetLw() and GsGetLws() specifically) use LP to create another matrix, LW (Local to World), that converts between the local coordinate system and the world coordinate system.
The libraries (GsSetRefView2() and GsSetView2() specifically) also create an internal matrix WS (World to Screen) that converts between world coordinates and screen coordinates.
The GTE knows how to mung a matrix so that it contains the perspective transformation. But for simplicity, let's say it creates another one internally, P (Perspective), which performs the perspective transformation.

So what you want is:

For (every point V in this local coordinate system) {
    V' = V * LW * WS * P

    Screen X = V'.X
    Screen Y = V'.Y
    OT Z     = V'.Z
}
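The per-point loop above can be sketched in C as follows, assuming LW and WS are already known. The `Mat4`, `Vec3` and `ScreenPoint` types and `project_all()` are hypothetical, everything is in doubles rather than the GTE's fixed-point integers, and P is written out as the divide by depth it stands for rather than as a matrix.

```c
typedef struct { double m[4][4]; } Mat4;          /* row-vector convention */
typedef struct { double x, y, z; } Vec3;
typedef struct { double sx, sy, otz; } ScreenPoint;

/* p' = (p.x, p.y, p.z, 1) * M */
static Vec3 apply(Vec3 p, const Mat4 *M) {
    Vec3 r = {
        p.x * M->m[0][0] + p.y * M->m[1][0] + p.z * M->m[2][0] + M->m[3][0],
        p.x * M->m[0][1] + p.y * M->m[1][1] + p.z * M->m[2][1] + M->m[3][1],
        p.x * M->m[0][2] + p.y * M->m[1][2] + p.z * M->m[2][2] + M->m[3][2]
    };
    return r;
}

/* V' = V * LW * WS, followed by the perspective step P.
   E is the projection distance. */
void project_all(const Vec3 *in, ScreenPoint *out, int n,
                 const Mat4 *LW, const Mat4 *WS, double E) {
    for (int i = 0; i < n; i++) {
        Vec3 w = apply(in[i], LW);    /* local -> world */
        Vec3 s = apply(w, WS);        /* world -> screen space */
        out[i].sx  = s.x * E / s.z;   /* P: perspective divide */
        out[i].sy  = s.y * E / s.z;
        out[i].otz = s.z;             /* depth for the ordering table */
    }
}
```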

There are library functions to calculate all these matrices for you:

P is taken care of by the GTE. Set up the value of E with GsSetProjection(). Call this once at the start of your program and forget about it.
LW is created by GsGetLw() or GsGetLws(). When you're doing lighting, the lighting matrix (another kettle of fish) has to be set up with LW. If you're not doing lighting, forget about it; you won't need it.
WS is created by GsSetRefView2() or GsSetView2(). These take an easy-to-understand camera position/direction and turn it into a matrix that combines all the right transformations to get world coordinates into screen coordinates. WS is stored internally by the libraries. You only need to call GsSetRefView2()/GsSetView2() again if you want to change WS, which only happens when you change the camera position or direction.
LS = (LW * WS) is created by GsGetLs(). This function calculates LW for you, multiplies it by WS, and passes the result back to you. LS is the matrix you set up the GTE with.
Sometimes you want LW (for the lighting) and LS (for the GTE). GsGetLws() will calculate both at the same time for you.
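Why bother with LS at all? Because matrix multiplication is associative, (V * LW) * WS gives the same point as V * (LW * WS), so LW and WS can be multiplied once per coordinate system, leaving a single matrix multiply per vertex. A small C check of that claim, using hypothetical helpers (`mat_mul()`, `apply()`, `make_ls()`) in the same row-vector convention:

```c
typedef struct { double m[4][4]; } Mat4;   /* row-vector convention */
typedef struct { double x, y, z; } Vec3;

/* r = a * b */
static Mat4 mat_mul(const Mat4 *a, const Mat4 *b) {
    Mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            double s = 0.0;
            for (int k = 0; k < 4; k++)
                s += a->m[i][k] * b->m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* p' = (p.x, p.y, p.z, 1) * M */
static Vec3 apply(Vec3 p, const Mat4 *M) {
    Vec3 r = {
        p.x * M->m[0][0] + p.y * M->m[1][0] + p.z * M->m[2][0] + M->m[3][0],
        p.x * M->m[0][1] + p.y * M->m[1][1] + p.z * M->m[2][1] + M->m[3][1],
        p.x * M->m[0][2] + p.y * M->m[1][2] + p.z * M->m[2][2] + M->m[3][2]
    };
    return r;
}

/* Combine once per coordinate system: LS = LW * WS. */
Mat4 make_ls(const Mat4 *LW, const Mat4 *WS) {
    return mat_mul(LW, WS);
}
```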

MATRIX LocalToScreenMatrix;
GsGetLs(&myCoordUnit2, &LocalToScreenMatrix);   // LS = LW * WS for this coordinate system
GsSetLs(&LocalToScreenMatrix);                  // Load LS into the GTE

You simply call GsSetLs() with LocalToScreenMatrix, and that's the matrix you need! So what did GsGetLs() do?

It checked whether the workm matrix in the passed GsCOORDUNIT2 was valid. If it wasn't, it calculated the product of this GsCOORDUNIT2's matrix and all its ancestors' matrices to arrive at the correct workm, then set flg to 1.
It now has a valid workm, which is the LW matrix. It multiplies this by WS to get LS = LW * WS.
It returns LS.

What exactly is the GTE doing?

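Let's assume you've set up your matrix LS in the GTE. Written in the row-vector form this tutorial uses (point times matrix, translation in the bottom row), it looks like this:

A B C 0     m[0][0]  m[1][0]  m[2][0]  0
D E F 0  =  m[0][1]  m[1][1]  m[2][1]  0
G H I 0     m[0][2]  m[1][2]  m[2][2]  0
J K L 1     t[0]     t[1]     t[2]     1

(The 3x3 part is the transpose of the MATRIX structure's m[][] array, because the GTE multiplies m[][] by the point treated as a column vector.)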

Then the libraries (GsSortObject4) load in a 3D point V:

(X Y Z 1) = V.vx, V.vy, V.vz, 1

It then performs a matrix multiply (don't worry about how this works; it's also optimised for the fact that the 4th column is always 0, 0, 0, 1):

SX = A*X + D*Y + G*Z + 1*J
SY = B*X + E*Y + H*Z + 1*K
SZ = C*X + F*Y + I*Z + 1*L
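Written out in C, the multiply above is just three sums of products plus the translation row. This is a hypothetical double-precision sketch: the `AffineRows` type holds the nine elements A to I and the translation J, K, L, and `gte_like_multiply()` is an illustrative name, not a library function. The real GTE does the same sums in fixed-point integer hardware. Only 12 numbers are stored because the 4th column is always 0, 0, 0, 1.

```c
/* The LS layout above, minus the constant 4th column:
     r[0] = {A, B, C}
     r[1] = {D, E, F}
     r[2] = {G, H, I}
     t    = {J, K, L}                                   */
typedef struct {
    double r[3][3];
    double t[3];
} AffineRows;

void gte_like_multiply(const AffineRows *M, double X, double Y, double Z,
                       double *SX, double *SY, double *SZ) {
    *SX = M->r[0][0] * X + M->r[1][0] * Y + M->r[2][0] * Z + M->t[0]; /* A*X + D*Y + G*Z + J */
    *SY = M->r[0][1] * X + M->r[1][1] * Y + M->r[2][1] * Z + M->t[1]; /* B*X + E*Y + H*Z + K */
    *SZ = M->r[0][2] * X + M->r[1][2] * Y + M->r[2][2] * Z + M->t[2]; /* C*X + F*Y + I*Z + L */
}
```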

That's the multiply over and done with. SX, SY and SZ now contain the point after it's been transformed (rotated and translated) from local coordinates into screen space. Now we do the perspective transform:

ScreenX = SX * E / SZ
ScreenY = SY * E / SZ
OTZ = (average value of SZ over the polygon's vertices: 3 for a triangle, 4 for a quadrangle)

Here E is the projection distance set with GsSetProjection(), not the matrix element E above.

ScreenX and ScreenY are what you use to put your polygons on the screen. OTZ is used to decide where in the ordering table this polygon should go.
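The perspective step and the OTZ average are simple enough to sketch directly. The functions below are hypothetical and use doubles; the real hardware works in integers, and this sketch leaves out any screen-centre offset the hardware may add to ScreenX and ScreenY.

```c
/* Perspective step: scale by the projection distance E, divide by depth. */
void perspective(double SX, double SY, double SZ, double E,
                 double *screen_x, double *screen_y) {
    *screen_x = SX * E / SZ;
    *screen_y = SY * E / SZ;
}

/* OTZ: average SZ of a polygon's vertices
   (n_vertices = 3 for a triangle, 4 for a quadrangle). */
double average_otz(const double *sz, int n_vertices) {
    double sum = 0.0;
    for (int i = 0; i < n_vertices; i++)
        sum += sz[i];
    return sum / n_vertices;
}
```

Doubling SZ halves the distance from the screen centre, which is why far-away polygons look smaller, and the averaged depth is what picks the polygon's slot in the ordering table.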

You may have noticed that the perspective transform doesn't look the way I described it earlier; there are a few maths operations missing! This is an optimisation to reduce the number of maths operations. Instead of doing the perspective transformation as described near the top of this document, the GTE assumes the eye is at the origin and the screen is a distance E away along the Z axis. That makes the maths much simpler, and you don't have to worry about it.

A typical loop

One-time initialisation:
GsSetProjection(1000);                  // Set up E
GsSetRefView2(&MyReferenceCamera);      // Set up the internal WS matrix

For each object A {
    For each coordinate system B in object A {
        RotMatrix(&B.rotation, &B.GsCoordUnit2.matrix);      // Set up the rotation for this coordinate system
        B.GsCoordUnit2.matrix.t[0] = B.position.x;           // Set up the translation for this coordinate system
        B.GsCoordUnit2.matrix.t[1] = B.position.y;
        B.GsCoordUnit2.matrix.t[2] = B.position.z;
        B.GsCoordUnit2.flg = 0;                              // Tell Gs that we've changed the matrix
        GsGetLws(&B.GsCoordUnit2, &myLWmatrix, &myLSmatrix); // Calculate LW and LS together
        GsSetLight(&myLWmatrix);                             // Set up the light matrix with LW
        GsSetLs(&myLSmatrix);                                // Set the main GTE matrix with LS

        For each TMD C which shares this coordinate system {
            GsSortObject4(C);                                // Draw each object that uses this coordinate system
        }
    }
}

Conclusion

That's matrices and how to use them. For simplicity and practicality, it's not a 100% accurate tutorial on how matrices work, but you should now understand what the libraries are doing and what you should be doing when it comes to 3D.