Path: chuka.playstation.co.uk!toby
From: toby@angst.forefront.com.au (Toby Sargeant)
Newsgroups: scee.yaroze.freetalk.english
Subject: Re: Speed Optimisation
Date: 24 Feb 1998 04:32:15 GMT
Organization: PlayStation Net Yaroze (SCEE)
Lines: 124
Message-ID:
References: <34F1D481.5FFF@mdx.ac.uk> <34F22032.A62E0C6C@netmagic.net>
NNTP-Posting-Host: ns.forefront.com.au
X-Newsreader: slrn (0.9.4.6 UNIX)

On Mon, 23 Feb 1998 17:19:46 -0800, Elliott Lee wrote:
>[...]
>>   - Only draw the part of the world that is immediately visible
>>     (in 3D games).
>
>If you can divide your world into logical units/buckets, you'll be able
>to save rendering time by identifying only those within a certain visual
>range. You could set the fog parameter to obscure the farther objects so
>they don't "pop" into the visual space. I liked Tomb Raider II's
>approach---distant objects go completely black.

Has anyone considered implementing a BSP renderer for the Yaroze? The
big trick is going to be getting away, at least partially, from those
all-pervasive ordering tables. One big question, I guess, is whether BSP
is geared towards platforms that are render bound rather than compute
bound. If it can be done, though, the results could be very nice indeed.
Freedom from z-sorting artifacts and intersecting polygons is a definite
plus.

>>   - Define small, often-called functions as macros wherever
>>     possible.
>
>That's great if you don't mind larger code. Used sparingly, yeah, that
>works well.
>
>>   - Use variables which are the same size as the registers (eg
>>     unsigned long) wherever possible.
>>   - Use lookup tables to replace calculations where calculations
>>     are expensive (eg: floating point, trig functions etc).
>>   - Otherwise avoid any floating point arithmetic.
>
>You mean things like fixed-point tables?
>
> [...]
>> And it's possible that some traditional optimisation techniques may be
>> inappropriate:
>>   - eg: loop unrolling was a good optimisation on primitive
>>     architectures but not necessarily on more modern ones.
>>
>> Peter.
>
>Actually, some compilers (when you specify certain optimisation flags)
>will do a few tests on their own and unroll loops up to a certain
>threshold. If you are really desperate for speed, I suppose you could
>do a little loop unrolling...

I would imagine that you can get some speed increases out of loop
unrolling on an R3000, because of the delay slots introduced by the
branch. If it's a loop that iterates over a small amount of code a large
number of times, that delay slot can have a very large effect on the
speed of execution, if the compiler can't find anything useful to do
with it. You have to weigh this up against the size of the instruction
cache, though, and I haven't found any information about the on-chip
caches of the R3000.

>Something commonly overlooked is pointer dereferencing in things like
>arrays. If you're going to be doing lots of testing, it's usually
>best to store the values into temporary variables. e.g.
>
>Unless the compiler is really smart, every test must calculate
>the offset in the ground[][] array. Do all the dereferencing once
>and get some good savings:

If your compiler doesn't pick this up and optimise it to death, then
switch compilers. Using gcc -O3, the test code at the bottom of this
post produces ix86 code that looks pretty much spot on in terms of
possible optimisations (at least to my somewhat untrained eye). There's
certainly no duplicated calculation. The difference between using and
not using -O3 is pretty dramatic.
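(For illustration, here's roughly the kind of hand-hoisting Elliott is
describing. His original snippet didn't survive the quoting, so the
ground[][] array, its dimensions and the tests below are invented:)

/* Sketch only: hoist the row address and the tested value into locals
 * so the offset into ground[][] is calculated once per iteration
 * instead of once per test. GROUND_W, GROUND_H and test_cell are
 * made-up names for this example.
 */
#define GROUND_W 64
#define GROUND_H 64

static int ground[GROUND_H][GROUND_W];

void test_cell(int x, int y) {
    int *row = ground[y];   /* &ground[y][0], computed once */
    int  g   = row[x];      /* the value, fetched once      */

    if (g == 0) row[x] += 1;
    if (g == 1) row[x] += 2;
    if (g == 2) row[x] += 3;
    /* ...and so on, reusing g and row instead of ground[y][x] */
}

Written this way or the naive way, gcc -O3 should end up generating much
the same code, which is the point below.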
With -O3, each line in the inner loop of that test was cut from 39 lines
of assembler to 3. In fact, when I added explicit variables to hold the
address of a[y][x] and the value of a[x][y], the compiler produced
_worse_ code at -O3 (but much better code with optimisation off).

Depending on your target CPU, a bigger issue is cache hit rate.
Effective use of the D-cache should be very important. I'm a bit wary of
the belief that sticking the stack in the D-cache is the best use it can
be put to. If a section of code does a lot of manipulation of an array,
for example, it would be a big advantage to have that array stored in
the cache, whereas parameters passed to functions can often be kept in
registers. Having your stack in the cache speeds up all of your code by
a little bit, but theoretically, using it to store data can speed up
small sections of your code a lot. And given the old adage that 10% of
the code takes 90% of the time, speeding up that 10% a lot is much more
useful.

The other thing is, of course, that 90% of the possible optimisations in
almost any code are at the level of algorithms and data structures.
There's very little point going after the 10% until you're sure that
you've already optimised the first 90%.

When optimising the last 10%, gcc -S is your friend. It'll produce
assembly code from your .c files in the corresponding .s files. Matching
the two together, it becomes pretty obvious where the compiler is
producing bad code. Then, just play around until the compiler generates
better code, or rewrite it yourself in assembler.

int main(void) {
    int x, y;
    static int a[10][10];

    /* fill a[0..8][0..8] with small values */
    for (x = 0; x < 9; x++) {
        for (y = 0; y < 9; y++) {
            a[x][y] = (x + y) % 5;
        }
    }

    /* each test re-indexes a[x][y]; see the -O3 discussion above */
    for (x = 0; x < 9; x++) {
        for (y = 0; y < 9; y++) {
            if (a[x][y] == 0) a[y][x]++;
            if (a[x][y] == 1) a[y][x] += 2;
            if (a[x][y] == 2) a[y][x] += 3;
            if (a[x][y] == 3) a[y][x] += 4;
            if (a[x][y] == 4) a[y][x] += 5;
            if (a[x][y] > 5)  a[y][x] = 0;
        }
    }

    return 0;
}

>
>My $0.02,
>- e!
>  tenchi@netmagic.net
>  http://www.netmagic.net/~tenchi/yaroze/

and mine..

Toby. (S, rather than H :) )
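P.S. In case the D-cache comment above reads as pure hand-waving, the
sort of thing I have in mind is copying a hot table into the scratchpad
and indexing that copy inside the inner loops. This is only a sketch:
treat the base address and the 1K size as things to check against the
library docs, and sintab as an invented stand-in for whatever table your
code actually hammers.

/* Sketch only: the 1K scratchpad (the D-cache used as fast RAM) is,
 * as far as I remember, mapped at 0x1f800000 -- check the docs before
 * relying on it. sintab is a made-up "hot" table for this example.
 */
#define SCRATCH ((short *)0x1f800000)

static short sintab[256];            /* 512 bytes, fits in the 1K pad */

void use_scratch(void) {
    short *fast = SCRATCH;
    int i;

    /* copy the table into the scratchpad once, before the hot loops */
    for (i = 0; i < 256; i++)
        fast[i] = sintab[i];

    /* ...inner loops then index fast[] instead of sintab[]... */
}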