Path: chuka.playstation.co.uk!news
From: "Alex Herbert"
Newsgroups: scee.yaroze.freetalk.english
Subject: Re: How the texture cache works
Date: Thu, 13 Sep 2001 17:52:22 +0100
Organization: PlayStation Net Yaroze (SCEE)
Lines: 110
Message-ID: <9nqo7h$2qp5@www.netyaroze-europe.com>
References: <99r1f2$dkq1@www.netyaroze-europe.com>
NNTP-Posting-Host: host213-123-132-129.in-addr.btopenworld.com
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200

Hi Martin,

I didn't spot this post until Derek pointed us here (see 'Misc NY short questions', 13/09/01).

Very interesting indeed... I have a few questions:

Do you have timings for 4-bit textures?

When you give timings for 'all misses', is that just the t-cache, or does that include the CLUT cache? That is, did you switch the CLUT position when you switched the texture?

I'd be interested to see the actual textures used in the tests, especially the paletted ones. Did the 8-bit textures actually reference all 256 colours? The reason I ask is that a 16x16x8 sprite with only 2 colours actually referenced will generate 254 CLUT cache hits and only 2 misses, whereas if all 256 colours are used, there will be 0 CLUT cache hits and 256 misses. That would make a big difference to the rendering time.

Oh, this all assumes the CLUT wasn't already in the cache, of course.

Alex

"Martin Keates" wrote in message news:99r1f2$dkq1@www.netyaroze-europe.com...
> Hi all,
>
> I've been doing some experimenting to try to figure out the inner
> workings of the texture cache (probably other people have done
> so before me; is there any documentation on the website about
> it at all?). I'm not saying that this is right, just what I've inferred from
> my timings and the behaviour I've encountered.
>
> Anyway: the t-cache size is 64x64 pixels for 4-bit, 64x32 for 8-bit
> and 32x32 for 16-bit. There ends the documentation that I could
> find...
>
> So how does it work? As far as I can tell, each row and column of
> the cache keeps track of the texture page offset last used, but they
> all work modulo the cache size. So, supposing we are using 8-bit
> textures and have a cache size of 64x32 pixels: a 16x16 texture at
> (0,0) on a texture page would map to (0,0) in the cache, but so
> would a texture at (0,32), (0,64), (64,32), (64,64) etc., so
> changing between textures at those points will cause a lot of cache
> misses. On the other hand, a texture at, say, (16,108) would map to
> (16,12)-(31,27) in the cache, which doesn't overlap (0,0)-(15,15),
> so we could alternate between the two textures with optimal performance.
>
> So, for 16x16 8-bit textures we can have 8 textures anywhere in the
> texture page, as long as they map (modulo the cache size) to distinct
> cache addresses. And it's cleverer than that: textures can wrap around
> to the other side of the cache with no problem (e.g. a 64x32 texture
> at (10,10) would look pretty mangled if you viewed the cache from
> (0,0), but it doesn't matter), and only the offending pixels are
> replaced during conflicts (e.g. two 16x16 textures at (0,0) and (0,18)
> would have only 2 lines conflicting, and would take a much smaller
> performance hit than two completely conflicting textures).
>
> All jolly nice, but when do you get cache misses? Well, apart from
> overlapping textures as described above, changing the texture page
> invalidates the entire texture cache, and changing between texture
> depths causes misses too.
>
> Why worry about this then? Because it can triple your rendering
> time if you get it wrong!
> Actual rendering times are very sprite-specific (probably dependent
> on the number of pixels rendered), but for drawing 1000 sprites using
> two textures I get (times in hsyncs):
>
> 32x32x16 -> no misses: 438, all misses: 1455
> 32x32x8  -> no misses: 366, all misses: 906
>
> 16x16x16 -> none: 150, all: 433
> 16x16x8  -> none: 114, all: 263
>
> 4-bit renders at the same speed as 16-bit with no cache misses, but
> isn't hit as badly by them (it's much easier to keep all your textures
> in the cache using 4 bits anyway).
>
> I did some tests using zoom/rotate as well:
>
> 32x32x16 (mag*0.5) -> none: 183, all: 739
> 32x32x8  (mag*0.5) -> none: 177, all: 470
>
> 16x16x16 (mag*2)   -> none: 542, all: 840
> 16x16x8  (mag*2)   -> none: 525, all: 650
>
> The setup time for these tests was 585 hsyncs, so generally you're
> going to be waiting for the CPU rather than the GPU if you use a lot
> of zoomed sprites.
>
> Note that the performance hit drops off pretty fast: if you can string
> 5 or 10 cache hits in a row together, you'll only be about 20-30% off
> optimum rather than 200%.
>
> So... is this right?
> Was this all obvious? Did everyone just know it anyway?
> Is there any documentation about this on the web already? Or a post
> in one of the newsgroups?
>
> cheers,
> Martin.
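To make Martin's mapping rule concrete, here is a minimal Python sketch of the model he infers. It assumes, as his description implies, that every texel (u, v) of a texture page lands individually in cache cell (u mod cache width, v mod cache height), with the per-depth cache sizes he quotes. The names `CACHE_DIMS`, `cache_cells` and `conflicting_texels` are hypothetical; this models his inference from timings, not documented hardware behaviour.

```python
# Hypothetical model of the texture cache as inferred in Martin's post:
# texel (u, v) on a texture page maps to cache cell (u % width, v % height).
# Per-texel granularity is an assumption taken from his description.

# (width, height) of the cache in texels for each depth, as given in the post
CACHE_DIMS = {4: (64, 64), 8: (64, 32), 16: (32, 32)}


def cache_cells(u, v, w, h, bpp):
    """Set of cache cells touched by a w x h texture at page offset (u, v)."""
    cw, ch = CACHE_DIMS[bpp]
    return {((u + x) % cw, (v + y) % ch) for y in range(h) for x in range(w)}


def conflicting_texels(a, b, bpp):
    """Cache cells shared by two textures given as (u, v, w, h),
    assumed to be on the same texture page and at the same depth."""
    return len(cache_cells(*a, bpp) & cache_cells(*b, bpp))


if __name__ == "__main__":
    base = (0, 0, 16, 16)  # 16x16 texture at (0,0), 8-bit
    for other in [(0, 32, 16, 16), (16, 108, 16, 16), (0, 18, 16, 16)]:
        print(other[:2], conflicting_texels(base, other, 8))
    # (0, 32) 256    -> every texel collides
    # (16, 108) 0    -> maps to (16,12)-(31,27), no overlap
    # (0, 18) 32     -> rows 0 and 1 collide: 2 lines of 16 texels
```

The three cases reproduce the examples in the post: full collision at (0,32), none at (16,108), and the two-line overlap at (0,18). Using sets keeps a texture that wraps onto itself (such as the 64x32 one at (10,10)) from being double-counted.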
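Alex's CLUT question can be checked the same way. The sketch below assumes his implied model: the CLUT starts out uncached, each palette entry is fetched on its first reference, and every later reference to that entry is a hit. The function name `clut_hits_misses` is hypothetical, and the model is Alex's reasoning rather than measured behaviour.

```python
# Hypothetical CLUT-cache counter following Alex's reasoning:
# first use of a palette index is a miss, later uses of it are hits,
# and nothing is assumed to be cached beforehand.


def clut_hits_misses(indices):
    """Return (hits, misses) for a sequence of palette indices, one per texel."""
    seen = set()
    hits = misses = 0
    for i in indices:
        if i in seen:
            hits += 1
        else:
            misses += 1
            seen.add(i)
    return hits, misses


if __name__ == "__main__":
    # 16x16x8 sprite using only 2 colours (checkerboard of indices 0 and 1)
    two_colour = [(x + y) % 2 for y in range(16) for x in range(16)]
    # 16x16x8 sprite where all 256 texels use a different colour
    all_colours = list(range(256))
    print(clut_hits_misses(two_colour))   # (254, 2)
    print(clut_hits_misses(all_colours))  # (0, 256)
```

Both results match the figures in Alex's question, which is why knowing how many colours the test textures actually referenced matters when interpreting the 8-bit 'all misses' timings.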
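The 'triple your rendering time' claim can be read straight off Martin's figures. This small sketch only divides the all-misses time by the no-misses time for each case; the numbers are copied from the post, and `TIMINGS` and `slowdown` are illustrative names.

```python
# Slowdown factors implied by the timings in Martin's post
# (hsyncs for 1000 sprites, alternating two textures).
# Values are (no misses, all misses), copied from the post.
TIMINGS = {
    "32x32x16": (438, 1455),
    "32x32x8": (366, 906),
    "16x16x16": (150, 433),
    "16x16x8": (114, 263),
    "32x32x16 (mag*0.5)": (183, 739),
    "32x32x8 (mag*0.5)": (177, 470),
    "16x16x16 (mag*2)": (542, 840),
    "16x16x8 (mag*2)": (525, 650),
}


def slowdown(no_miss, all_miss):
    """How many times longer the all-misses run took than the no-misses run."""
    return all_miss / no_miss


if __name__ == "__main__":
    for name, (none, allm) in TIMINGS.items():
        print(f"{name:20s} x{slowdown(none, allm):.2f}")
```

The unzoomed 32x32x16 case comes out at about 3.3x, in line with "triple", and the 32x32x16 case at mag*0.5 at about 4x, while the magnified 16x16 cases suffer least (roughly 1.2x to 1.5x).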