Path: chuka.playstation.co.uk!news
From: James Russell
Newsgroups: scee.yaroze.programming.libraries
Subject: Re: D-Cache
Date: Mon, 20 Jul 1998 14:21:57 +0100
Organization: Sony Computer Entertainment Europe
Lines: 77
Message-ID: <35B34475.D898F644@scee.sony.co.uk>
References: <359E82E9.1134@dial.pipex.com> <01bdafbf$5c177f20$f2e832a2@gbain.wav.scee.sony.co.uk> <35AF051F.708C@dial.pipex.com> <35AF1307.C1D4198B@scee.sony.co.uk> <35B33151.6DE1D612@easynet.co.uk>
NNTP-Posting-Host: mailgate.scee.sony.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.5b1 [en] (Win95; I)
X-Accept-Language: en

Philip Gooch wrote:
>
> Anyone mind if I jump in here? Just taking up on your comment here,
> about putting the stack on the D-cache - in what cases would you want
> to do this, and when? Would you put in this code:
>
>     __asm__ volatile ("sw $29,(savesp)");
>     __asm__ volatile ("la $29,0x1f8003f0");
>
> before a particular function call, and this:
>
>     __asm__ volatile ("lw $29,(savesp)");
>
> afterwards?

Yes, that looks right. The first bit saves the current stack pointer and
loads in the new one, and the second bit restores the old stack pointer.

> For what sort of functions would I want to do this?

Well, for starters, as a general rule I wouldn't call any Yaroze library
functions while your stack is on the D-Cache, which probably rules out a
few of the functions that you want to speed up. Many of the library
functions don't switch the stack to the D-Cache, but some do to get
extra speed. If your program is running with a D-Cache stack and you
call a library function which resets the stack to the D-Cache too, your
program will crash and burn, because the new stack will overwrite the
old one.

GsSortObject4 doesn't reset the stack (to my knowledge), but it takes as
a parameter a 'scratch' area to use for its intermediate workspace. If
you've followed the sample code, you'll see that they use
getScratchAddr(0) for this scratch area, which is a macro that points to
the start of the D-Cache.

To be honest, I can't think of any obvious examples where using the
D-Cache _as_a_stack_ would bring you a huge speed increase. But here are
three cases where it does help:

1) If you're writing a function that uses a lot of local variables (more
   than the number of registers available), then those variables will be
   allocated on the stack (and hence on the D-Cache), and therefore
   they'll go a bit faster.

2) If you are doing some major processing on a local array which is less
   than 1K, then having the stack on the D-Cache will (generally)
   increase the speed of that function.

3) If you are doing a tree traversal (depth/breadth first, that sort of
   thing) which involves a lot of recursive function calls, then having
   the stack on the D-Cache will be faster. The only proviso is to make
   sure that there aren't too many local variables and/or the tree is
   not too deep, or you'll overflow the D-Cache!

The D-Cache isn't a true cache in the usual sense of the word. A normal
cache will _transparently_ store the most recently used lines of RAM to
increase speed. The D-Cache is more like a really fast area of memory,
but it's only 1K long. Thus it's up to the programmer to explicitly load
and store parts of this memory, which is why most people set up their
stack on it: it gives an instant speed increase to local variable
access.
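So, putting your two fragments together, the whole thing ends up looking
something like the sketch below. This is typed straight in and untested,
and the function and variable names (CrunchLocalArray, RunOnDCacheStack,
crunch_result) are just made up for the example - the only parts to
treat as gospel are the three __asm__ lines and the 0x1f8003f0 address,
which are the same ones quoted above. Keep the wrapper itself as dumb as
possible: anything the compiler decides to keep in the wrapper's own
stack frame is off-limits while $sp points at the scratchpad, which is
why the result goes into a global here.

    unsigned long savesp;     /* global, so the asm can refer to it by name   */
    int crunch_result;        /* global on purpose: globals are reached via
                                 $gp rather than $sp, so they're safe to
                                 touch while the stack is switched            */

    /* The sort of function that gains from this: lots of local data
       (well under 1K) that gets hammered over and over.  While the
       stack is on the D-Cache, this array and the locals live there. */
    static int CrunchLocalArray(const int *src)
    {
        int local[64];        /* 256 bytes - comfortably inside the 1K scratchpad */
        int i, pass, sum = 0;

        for (i = 0; i < 64; i++)
            local[i] = src[i];

        for (pass = 0; pass < 16; pass++)   /* every element touched many times */
            for (i = 1; i < 63; i++)
                local[i] = (local[i - 1] + local[i] + local[i + 1]) / 3;

        for (i = 0; i < 64; i++)
            sum += local[i];
        return sum;
    }

    void RunOnDCacheStack(const int *src)
    {
        /* Save the real stack pointer, then point $sp at the top of the
           scratchpad (the stack grows down from 0x1f8003f0). */
        __asm__ volatile ("sw $29,(savesp)");
        __asm__ volatile ("la $29,0x1f8003f0");

        /* No Yaroze library calls (and no printf!) in here, and keep
           locals plus call depth under 1K or you'll fall off the end. */
        crunch_result = CrunchLocalArray(src);

        /* Put the real stack back before doing anything else. */
        __asm__ volatile ("lw $29,(savesp)");
    }

Then from your main loop you just call RunOnDCacheStack() wherever you
would have called the heavy function directly, and read crunch_result
afterwards.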
If you want to process a global/static array, it's going to be sitting
in main RAM, so you'll have to transfer it to the D-Cache before you
start, and transfer it back after you finish. This transfer overhead is
only worth it if you're going to be accessing each element of the array
more than twice. This is certainly the case if you're doing some image
processing (like the flame/water effects).

The first heuristic of optimisation is to optimise the biggest
timewaster. Back in the days when I was writing Unix database code, I
managed to speed up a debugging function that was used twice in every
function by a factor of 8. But since 90% of the time was spent preparing
and parsing the SQL, the speed increase from the new function hardly
made a dent in the performance. The lesson there is that you should
concentrate on optimising the component which takes the longest time to
complete.

If you want to time various parts of your code, use the VSync(-1) call
or the Root counters. Run the important pieces of code in a loop a
million times and see how many VSyncs each part takes. That will give
you some idea of the proportion of time each part is taking.

Cheers,
James
--
== James_Russell@scee.sony.co.uk              +44 (171) 447-1626 ==
Developer Support Engineer - Sony Computer Entertainment Europe
"Weaseling out of things is what separates us from the animals!...
 Except the weasel." -- Homer