Path: chuka.playstation.co.uk!news
From: Craig Graham <c_graham@hinge.mistral.co.uk>
Newsgroups: scee.yaroze.programming.3d_graphics
Subject: Re: GsLinkObject4 Confusion - On to GsSortObject4 - And Now CW Asm
Date: Mon, 31 Aug 1998 15:44:20 +0100
Organization: PlayStation Net Yaroze (SCEE)
Lines: 91
Message-ID: <35EAB6C4.26AD842B@hinge.mistral.co.uk>
References: <35EAA9A9.782E@funnytown.com>
NNTP-Posting-Host: d1-s34-66-telehouse.mistral.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 4.05 [en] (Win95; I)


Gerrit Goossen wrote:

> >Gerrit Goossen wrote:
> >
> >> >Ho hum...  I need direct GTE access and the ability to sort 2D primitives, and I
> >> >need it NOW!
> >>
> >> I actually spent quite a bit of time trying to access the GTE with
> >> inline ASM in CW, but for some reason CW doesn't want to let me do it! I
> >> wonder if this is also a problem with GCC, although it doesn't really
> >> matter since I'm on a Mac anyway...
> >
> >You can't use GTE inline from the Yaroze version of CW. Or the
> >Yaroze GNU either.
>
> I love it when I do stuff that people tell me isn't possible!
>
> As it turned out, the only reason my code was crashing was because I
> wasn't putting a final nop after ja'ing. *None* of the examples in the
> CW manuals do this (and so they won't work as they appear in the
> manual!) and if I had been using GCC I believe it would have added this
> for me (?), but I finally noticed that Metrowerks used an extra nop in
> _psstart.c and decided to give it a try... (So I'm relatively new to asm
> :^)

OK, I'll rephrase that. Doing GTE inline in CodeWarrior or GNU C
is easy. However, Sony will jump on your balls for doing it when you
are working on the Yaroze, as the information needed to do it has either been:
1) Reverse engineered by someone very talented...
or
2) Ripped off from a pro developer's documentation / compiler setup.

Either way, you're violating the Yaroze license and could conceivably
get your Yaroze membership revoked.


> Anyway, for any other members interested in R3000 ASM or using the GTE
> directly, here are a few places you might want to check out:

R3000 asm is pretty easy to find out about - buy the book 'MIPS RISC Architecture' from
MIPS/SGI, which documents it.


> <Page 13 of Your User Guide> - Page 13 explains register assignments
> which are of course extremely useful even though CW doesn't always
> appear to follow them (!?)

It does. Always. Believe me on this....

> <http://www.in-brb.de/~creature/codehack.htm> - A "sketchy" site with
> valuable info on how to access GTE functions. (From my understanding of
> our Yaroze license, we are free to get info from sites like these we
> just can't contribute to them. Sounds fair to me!)

I think you'll find you're wrong there, or else everyone would be using the last version
of EZ-O-RAY with the ripped off PsyQ debugger and stuff.
Would someone at SCEE like to answer this one? I'm guessing here...

> FWIW, I've been able to speed up my gouraud fogging from over 400 hsyncs
> for 500 primitives to under 250 hsyncs. It's still too slow, but I'm
> making progress. If anyone's interested in the code that does this, say
> the word and I'll be happy to post it here. (*Especially* if you think
> you might be able to help me speed it up. There's *plenty* of room for
> optimization. ;)

Tips:
1) Pre-transform all the vertices into the scratch pad, then do a lookup in there
when generating the output shading levels. Save only the transformed screen-facing
Z distance (to save space if you've got lots of vertices, shift down and use a byte
for each screen-to-vertex distance).
2) Keep the calculation in a tight loop with the correct code address alignment
to fit into the I-cache.
3) In a second loop (still kept tight and I-cache aligned), just do lookups
to fill in the shading details in the TMD.
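
To make the two-loop idea concrete, here's a rough sketch in plain portable C.
It is NOT Yaroze library code: the Vec3s and Tri types, the function names and
the depth_table array (standing in for the 1K scratch pad) are all made up for
illustration, and the Z transform is done by hand rather than by the GTE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_BYTES 1024 /* size of the scratch pad in bytes */

/* Hypothetical types for this sketch; not the libps/TMD structures. */
typedef struct { int16_t vx, vy, vz; } Vec3s;
typedef struct { uint16_t v[3]; uint8_t fog[3]; } Tri;

/* Stand-in for the scratch pad: one depth byte per vertex. */
uint8_t depth_table[SCRATCH_BYTES];

/* Loop 1: transform every vertex once and keep only a shifted-down Z byte.
 * rot_z is the Z row of the rotation matrix in 4.12 fixed point (entries
 * assumed to lie in -4096..4096), trans_z is the Z translation. */
void build_depth_table(const Vec3s *verts, size_t count,
                       const int16_t rot_z[3], int32_t trans_z, int shift)
{
    size_t i;
    assert(count <= SCRATCH_BYTES);
    for (i = 0; i < count; i++) {
        int32_t z = ((int32_t)rot_z[0] * verts[i].vx +
                     (int32_t)rot_z[1] * verts[i].vy +
                     (int32_t)rot_z[2] * verts[i].vz) / 4096 + trans_z;
        if (z < 0) z = 0;     /* behind the viewer: clamp to nearest */
        z >>= shift;          /* shift down so the depth fits in a byte */
        if (z > 255) z = 255; /* far away: clamp to full fog */
        depth_table[i] = (uint8_t)z;
    }
}

/* Loop 2: pure lookups, no maths. Vertex indices are assumed to be
 * smaller than the count passed to build_depth_table(). */
void apply_fog(Tri *tris, size_t count)
{
    size_t i;
    int k;
    for (i = 0; i < count; i++)
        for (k = 0; k < 3; k++)
            tris[i].fog[k] = depth_table[tris[i].v[k]];
}
```

The point is that a vertex shared by several polys gets transformed once, and
the second loop never touches the matrix at all.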

I guarantee you'll get better performance using this than you're getting now.
I've got some environment mapping code that can do better than you're
getting from your fog code WITHOUT inlining the GTE commands at all,
and the algorithm is very similar - get a transformed vector for each poly vertex.
In the case of the env mapper, it's a transformed vertex normal and you have to save
two components of the result vector (one for U, one for V). In the case of
the fogging, it's the transformed vertex position and you only have to save one
component of the result vector (the Z depth) - so the fogging should be a bit
faster, and be able to handle larger objects whilst still holding the pre-transformed
object details in the scratch pad.
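
For comparison, a similar hedged sketch of the env mapping side, again in
plain portable C with made-up names (EnvUV, uv_table, unit_to_texel and
build_uv_table are illustrative only, not my actual code or anything from the
Yaroze libraries). It assumes unit normals in 4.12 fixed point and a 256x256
texture. Two bytes per vertex instead of one means half as many vertices fit
in the 1K scratch pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_BYTES 1024 /* size of the scratch pad in bytes */

/* Hypothetical types for this sketch; not the libps structures. */
typedef struct { int16_t vx, vy, vz; } Vec3s;
typedef struct { uint8_t u, v; } EnvUV;

/* Capacity of the scratch pad for each technique. */
#define MAX_FOG_VERTS (SCRATCH_BYTES / sizeof(uint8_t))
#define MAX_ENV_VERTS (SCRATCH_BYTES / sizeof(EnvUV))

/* Stand-in for the scratch pad: one U,V pair per vertex. */
EnvUV uv_table[MAX_ENV_VERTS];

/* Map a 4.12 fixed-point component in -4096..4096 to a 0..255 texel. */
uint8_t unit_to_texel(int32_t c)
{
    int32_t t;
    if (c < -4096) c = -4096;
    if (c > 4096) c = 4096;
    t = (c + 4096) >> 5; /* 0..8192 becomes 0..256 */
    if (t > 255) t = 255;
    return (uint8_t)t;
}

/* Transform each vertex normal by the X and Y rows of the rotation
 * matrix (4.12 fixed point) and keep only those two components. */
void build_uv_table(const Vec3s *normals, size_t count,
                    const int16_t rot_x[3], const int16_t rot_y[3])
{
    size_t i;
    assert(count <= MAX_ENV_VERTS);
    for (i = 0; i < count; i++) {
        int32_t x = ((int32_t)rot_x[0] * normals[i].vx +
                     (int32_t)rot_x[1] * normals[i].vy +
                     (int32_t)rot_x[2] * normals[i].vz) / 4096;
        int32_t y = ((int32_t)rot_y[0] * normals[i].vx +
                     (int32_t)rot_y[1] * normals[i].vy +
                     (int32_t)rot_y[2] * normals[i].vz) / 4096;
        uv_table[i].u = unit_to_texel(x);
        uv_table[i].v = unit_to_texel(y);
    }
}
```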

> - Gerrit

Craig.