Path: chuka.playstation.co.uk!news
From: James Russell
Newsgroups: scee.yaroze.programming.libraries
Subject: Re: D-Cache
Date: Mon, 20 Jul 1998 14:21:57 +0100
Organization: Sony Computer Entertainment Europe
Lines: 77
Message-ID: <35B34475.D898F644@scee.sony.co.uk>
References: <359E82E9.1134@dial.pipex.com> <01bdafbf$5c177f20$f2e832a2@gbain.wav.scee.sony.co.uk> <35AF051F.708C@dial.pipex.com> <35AF1307.C1D4198B@scee.sony.co.uk> <35B33151.6DE1D612@easynet.co.uk>
NNTP-Posting-Host: mailgate.scee.sony.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.5b1 [en] (Win95; I)
X-Accept-Language: en

Philip Gooch wrote:
>
> Anyone mind if I jump in here? Just taking up on your comment here,
> about putting the stack on the D-cache - in what cases would you want
> to do this, and when? Would you put in this code:
>
>     __asm__ volatile ("sw $29,(savesp)");
>     __asm__ volatile ("la $29,0x1f8003f0");
>
> before a particular function call, and this:
>
>     __asm__ volatile ("lw $29,(savesp)");
>
> afterwards?

Yes, that looks right. The first bit saves the current stack pointer and
loads in the new one, and the second bit restores the old stack pointer.

> For what sort of functions would I want to do this?

Well, for starters, as a general rule I wouldn't call any Yaroze library
functions while your stack is on the D-Cache, which probably rules out a
few of the functions that you want to speed up. Many of the library
functions don't switch the stack to the D-Cache, but some do to get
extra speed. If your program is running with a D-Cache stack and you
call a library function which resets the stack to the D-Cache too, your
program will crash and burn, because the new stack will overwrite the
old one.

GsSortObject4 doesn't reset the stack (to my knowledge), but it takes as
a parameter a 'scratch' area to use for its intermediate workspace. If
you've followed the sample code, you'll see that they use
getScratchAddr(0) for this scratch area, which is a macro that points to
the start of the D-Cache.

To be honest, I can't think of any obvious examples where using the
D-Cache _as_a_stack_ would bring you a huge speed increase. But here are
three cases where it does help:

1) If you're writing a function that uses a lot of local variables (more
   than the number of registers available), then those variables will be
   allocated on the stack (and hence on the D-Cache), and therefore
   they'll go a bit faster.

2) If you are doing some major processing on a local array which is less
   than 1K, then having the stack on the D-Cache will (generally)
   increase the speed of that function.

3) If you are doing a tree traversal (depth/breadth first, that sort of
   thing) which involves a lot of recursive function calls, then having
   the stack on the D-Cache will be faster. The only proviso is to make
   sure that there aren't too many local variables and/or the tree is
   not too deep, or you'll overflow the D-Cache!

The D-Cache isn't a true cache in the usual sense of the word. A normal
cache will _transparently_ store the most recently used lines of RAM to
increase speed. The D-Cache is more like a really fast area of memory,
but it's only 1K long. Thus it's up to the programmer to explicitly load
and store parts of this memory, which is why most people set up their
stack on it: it gives an instant speed increase to local variable
access.
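So, putting your two fragments together, the whole thing ends up looking
something like the sketch below. This is typed straight in and untested,
and the function and variable names (CrunchLocalArray, RunOnDCacheStack,
crunch_result) are just made up for the example - the only parts to
treat as gospel are the three __asm__ lines and the 0x1f8003f0 address,
which are the same ones quoted above. Keep the wrapper itself as dumb as
possible: anything the compiler decides to keep in the wrapper's own
stack frame is off-limits while $sp points at the scratchpad, which is
why the result goes into a global here.

    unsigned long savesp;     /* global, so the asm can refer to it by name   */
    int crunch_result;        /* global on purpose: globals are reached via
                                 $gp rather than $sp, so they're safe to
                                 touch while the stack is switched            */

    /* The sort of function that gains from this: lots of local data
       (well under 1K) that gets hammered over and over.  While the
       stack is on the D-Cache, this array and the locals live there. */
    static int CrunchLocalArray(const int *src)
    {
        int local[64];        /* 256 bytes - comfortably inside the 1K scratchpad */
        int i, pass, sum = 0;

        for (i = 0; i < 64; i++)
            local[i] = src[i];

        for (pass = 0; pass < 16; pass++)   /* every element touched many times */
            for (i = 1; i < 63; i++)
                local[i] = (local[i - 1] + local[i] + local[i + 1]) / 3;

        for (i = 0; i < 64; i++)
            sum += local[i];
        return sum;
    }

    void RunOnDCacheStack(const int *src)
    {
        /* Save the real stack pointer, then point $sp at the top of the
           scratchpad (the stack grows down from 0x1f8003f0). */
        __asm__ volatile ("sw $29,(savesp)");
        __asm__ volatile ("la $29,0x1f8003f0");

        /* No Yaroze library calls (and no printf!) in here, and keep
           locals plus call depth under 1K or you'll fall off the end. */
        crunch_result = CrunchLocalArray(src);

        /* Put the real stack back before doing anything else. */
        __asm__ volatile ("lw $29,(savesp)");
    }

Then from your main loop you just call RunOnDCacheStack() wherever you
would have called the heavy function directly, and read crunch_result
afterwards.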
If you want to process a global/static array, it's going to be sitting
in main RAM, so you'll have to transfer it to the D-Cache before you
start, and transfer it back after you finish. This transfer overhead is
only worth it if you're going to be accessing each element of the array
more than twice. This is certainly the case if you're doing some image
processing (like the flame/water effects).

The first heuristic of optimisation is to optimise the biggest
timewaster. Back in the days when I was writing Unix database code, I
managed to speed up a debugging function that was used twice in every
function by a factor of 8. But since 90% of the time was spent preparing
and parsing the SQL, the speed increase from the new function hardly
made a dent in the performance. The lesson there is that you should
concentrate on optimising the component which takes the longest time to
complete.

If you want to time various parts of your code, use the VSync(-1) call
or the Root counters. Run the important pieces of code in a loop a
million times and see how many VSyncs each part takes. That will give
you some idea of the proportion of time each part is taking.

Cheers,
James
--
== James_Russell@scee.sony.co.uk              +44 (171) 447-1626 ==
Developer Support Engineer - Sony Computer Entertainment Europe
"Weaseling out of things is what separates us from the animals!...
 Except the weasel." -- Homer