Path: chuka.playstation.co.uk!news
From: "Martin Keates" <martin@jabadaw.fsnet.co.uk>
Newsgroups: scee.yaroze.freetalk.english
Subject: How the texture cache works
Date: Tue, 27 Mar 2001 22:42:57 +0100
Organization: PlayStation Net Yaroze (SCEE)
Lines: 83
Message-ID: <99r1f2$dkq1@www.netyaroze-europe.com>
NNTP-Posting-Host: modem-136.sodium.dialup.pol.co.uk
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300

Hi all,

I've been doing some experimenting to try and figure out the inner
workings of the texture cache (probably other people have done
so before me - is there any documentation on the website about
it at all?). I'm not claiming this is right; it's just what I've
inferred from my timings and the behaviour I've encountered.

Anyway: the t-cache size is 64x64 pixels for 4-bit, 64x32 for 8-bit
and 32x32 for 16-bit. There ends the documentation that I could
find...

So how does it work? As far as I can tell, each row and column of
the cache keeps track of the texture page offset it last held, and
texture page coordinates map to cache coordinates MOD the cache
size. So, suppose we are using 8-bit textures, giving a cache size
of 64x32 pixels. A 16x16 texture at (0,0) on a texture page would
map to (0,0) in the cache, but so would a texture at (0,32), (0,64),
(64,32), (64,64) etc., so alternating between textures at those
points will cause a lot of cache misses. On the other hand, a
texture at, say, (16,108) would map to (16,12) in the cache, which
doesn't conflict with the area (0,0) to (15,15), so we could
alternate those two textures with optimal performance.
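
To make the MOD arithmetic concrete, here's a minimal C sketch of the
mapping as described above. The names (CacheSize, CachePos,
cache_size_for_depth, cache_slot) are made up for illustration and
don't come from any Yaroze library; the cache dimensions are the ones
quoted earlier, and the mapping itself is only what the timings
suggest, not documented behaviour.

```c
/* Sketch of the inferred mapping from texture-page coordinates to
   texture-cache coordinates. All names here are hypothetical. */

typedef struct { int w, h; } CacheSize;
typedef struct { int x, y; } CachePos;

/* Cache size in pixels for a texture depth of 4, 8 or 16 bits,
   using the figures quoted above. Returns 0x0 for other depths. */
CacheSize cache_size_for_depth(int bits)
{
    CacheSize s;
    s.w = 0;
    s.h = 0;
    switch (bits) {
    case 4:  s.w = 64; s.h = 64; break;
    case 8:  s.w = 64; s.h = 32; break;
    case 16: s.w = 32; s.h = 32; break;
    }
    return s;
}

/* Texture-page offset (u, v) -> cache position: each coordinate is
   taken MOD the cache size. Returns (-1, -1) for an unsupported
   depth or negative coordinates. */
CachePos cache_slot(int u, int v, int bits)
{
    CacheSize s = cache_size_for_depth(bits);
    CachePos p;
    p.x = -1;
    p.y = -1;
    if (s.w > 0 && u >= 0 && v >= 0) {
        p.x = u % s.w;
        p.y = v % s.h;
    }
    return p;
}
```

With 8-bit textures, cache_slot(0, 32, 8) and cache_slot(64, 64, 8)
both give (0,0), while cache_slot(16, 108, 8) gives (16,12).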

So, for 16x16 8-bit textures we can have 8 textures anywhere in the
texture page as long as they MOD to distinct, non-overlapping cache
areas. And it's cleverer than that: textures can wrap around to the
other side of the cache no problem (e.g. a 64x32 texture at (10,10)
would look pretty mangled if you viewed the cache contents starting
from (0,0), but it doesn't matter), and only the offending pixels are
replaced during conflicts (e.g. two 16x16 textures at (0,0) and (0,18)
would have only 2 lines conflicting and would take much less of a
performance hit than two completely conflicting textures).
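
If you want to check a texture layout for conflicts, something like
the following C sketch would do it. It assumes the MOD mapping above,
assumes two *different* textures conflict exactly where their
wrapped-around rectangles overlap in the cache, and uses made-up
function names (covers, overlap_1d, conflict_pixels); treat it as a
rough model rather than a description of the hardware.

```c
/* Does cache coordinate pos (0..size-1) fall inside a span that
   starts at texture-page coordinate start, is len pixels long, and
   wraps around MOD size? */
static int covers(int pos, int start, int len, int size)
{
    int d;
    if (len >= size)
        return 1;                       /* span fills the whole axis */
    d = ((pos - start) % size + size) % size;
    return d < len;
}

/* Number of cache coordinates along one axis used by both spans. */
int overlap_1d(int start_a, int len_a, int start_b, int len_b, int size)
{
    int pos, n = 0;
    for (pos = 0; pos < size; pos++)
        if (covers(pos, start_a, len_a, size) &&
            covers(pos, start_b, len_b, size))
            n++;
    return n;
}

/* Number of cache pixels that two different textures fight over.
   (ua,va) and (ub,vb) are texture-page offsets, wa x ha and wb x hb
   the texture sizes, cw x ch the cache size for the depth in use. */
int conflict_pixels(int ua, int va, int wa, int ha,
                    int ub, int vb, int wb, int hb,
                    int cw, int ch)
{
    return overlap_1d(ua, wa, ub, wb, cw) *
           overlap_1d(va, ha, vb, hb, ch);
}
```

For the 8-bit example, overlap_1d(0, 16, 18, 16, 32) returns 2 (the
two conflicting lines), and a texture at (16,108) has no pixels in
common with one at (0,0).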

All jolly nice, but when do you get cache misses? Well, apart from
overlapping textures as described above, changing the texture page
invalidates the entire texture cache, and changing between texture
depths causes misses too.
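
A quick way to see how often a draw order trips over these last two
cases is to count the switches. This is only a sketch: TexRef and
count_switches are hypothetical names, and it treats a depth change
the same as a texture page change, which is a simplification of what
the timings show.

```c
/* Which texture page and depth a sprite draws from (hypothetical). */
typedef struct { int tpage; int bits; } TexRef;

/* Count how many times consecutive draws change texture page or
   texture depth, i.e. how many times the cache contents are
   presumably thrown away. */
int count_switches(const TexRef *draws, int n)
{
    int i, switches = 0;
    for (i = 1; i < n; i++)
        if (draws[i].tpage != draws[i - 1].tpage ||
            draws[i].bits  != draws[i - 1].bits)
            switches++;
    return switches;
}
```

Drawing from pages 0,1,0,1 gives 3 switches, whereas grouping the
same sprites as 0,0,1,1 gives just 1.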

Why worry about this then? Because it can triple your rendering
time if you get it wrong! Actual rendering times are very sprite
specific (probably dependent on the number of pixels rendered),
but for drawing 1000 sprites using two textures I get (times in hsyncs):

32x32x16 -> no misses: 438, all misses: 1455
32x32x8  -> no misses: 366, all misses:  906

16x16x16 -> no misses: 150, all misses:  433
16x16x8  -> no misses: 114, all misses:  263

With no cache misses, 4-bit renders at the same speed as 16-bit, but
it doesn't suffer as badly from misses (and it's much easier to keep
all your textures in the cache using 4-bit anyway).

I did some tests using zoom/rotate as well:

32x32x16 (mag*0.5) -> no misses: 183, all misses: 739
32x32x8  (mag*0.5) -> no misses: 177, all misses: 470

16x16x16 (mag*2)   -> no misses: 542, all misses: 840
16x16x8  (mag*2)   -> no misses: 525, all misses: 650

The set up time for these tests was 585 hsyncs, so generally you're
going to be waiting for the CPU rather than the GPU if you use a lot
of zoomed sprites.

Note that the performance hit drops off pretty fast - if you can string
5 or 10 cache hits in a row together you'll only be about 20-30% off
optimum rather than 200%.
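
As a rough sanity check on that drop-off, here's a small C sketch of
a simple linear model: every miss costs the "all misses" per-sprite
time, every hit costs the "no misses" per-sprite time, and each miss
is followed by a run of hits. The model and the name overhead_percent
are assumptions for illustration, not something measured directly.

```c
/* Percentage by which the average per-sprite time exceeds the
   all-hits optimum when every miss is followed by `run` hits.
   hit_time and miss_time are the "no misses" and "all misses"
   timings for the same sprite type (any unit, e.g. hsyncs per 1000
   sprites). Assumes hit_time > 0 and run >= 0. */
double overhead_percent(double hit_time, double miss_time, int run)
{
    double average = (miss_time + run * hit_time) / (run + 1);
    return (average / hit_time - 1.0) * 100.0;
}
```

Plugging in the unzoomed 32x32x16 figures (438 and 1455) gives about
232% with no hits at all, about 39% with runs of 5 hits and about 21%
with runs of 10, which is in the same ballpark as the numbers above.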

So... is this right?
Was this all obvious? Did everyone just know it anyway?
Is there any documentation about this on the web already? Or a post
in one of the newsgroups?

cheers,
Martin.