Article 9833 of comp.graphics.rendering.renderman:
Path: news.lava.net!news.Hawaii.Edu!logbridge.uoregon.edu!news.maxwell.syr.edu!news.mel.connect.com.au!news.unimelb.edu.au!cs.mu.OZ.AU!bromage
From: bromage@cs.mu.oz.au (Andrew Bromage)
Newsgroups: comp.graphics.rendering.renderman
Subject: Re: Quest for Larry (or anyone else) regarding BMRT Interpreter....
Date: 9 Jul 1999 06:05:50 GMT
Organization: Computer Science, The University of Melbourne
Lines: 141
Message-ID: <7m43fu$n8j$1@mulga.cs.mu.OZ.AU>
References: <931208767.6390.0.nnrp-04.d4e417a3@news.demon.co.uk> <931388725.21530.0.nnrp-03.d4e417a3@news.demon.co.uk> <7m0q98$qrb$1@sherman.pixar.com> <3784662E.DDCB6C52@zorro.ruca.ua.ac.be>
NNTP-Posting-Host: mundook.cs.mu.oz.au
Xref: news.lava.net comp.graphics.rendering.renderman:9833

G'day all.

Stefan Wils <stefan.wils@zorro.ruca.ua.ac.be> writes:

>Just a question : is a stack-based VM really efficient for interpreting ?
>Since stack-based or 0-address machines generally have more instructions
>per program, and if the execution of each instruction (like mulPP or pushP)
>would be considered as *about* equally slow/fast, it might be reasonable
>to create an assembler set in which the number of instruction per program
>is kept to a minimum.

>I had the idea of a RISC-like machine with an "unlimited" number of
>registers.

IMO, your idea is a good one.  More thoughts follow this disclaimer.

Disclaimer: My academic background is in compiler implementation
(graphics being more of a hobby), so if I go into jargon here, my
apologies.  (BTW, yes, I'm half-heartedly working on a shading language
compiler plus virtual machine in my copious(!) spare time, and I'm
seriously considering a VM like this as the target language.)  In
addition, this is only my thinking on the matter to date, and I reserve
the right to change my mind at whim. :-)

The two most common interpreted VM formats are stack machines and
register transfer machines.  (BTW, "RISC" is a misnomer here, since
you can buy processors which use a RISC stack machine architecture.)

Most people use a stack machine because it's very easy to implement.
Using a stack machine not only makes implementing a simple working
compiler a hell of a lot easier (though it can make certain kinds of
optimisation harder), it also solves a number of run-time issues such
as:

1. Temporary storage
2. Saving temporary storage locations over function/procedure calls
3. Where to place arguments for function/procedure calls and any
   return values
4. Operations with different numbers of arguments

Issue (1) is straightforward to deal with if you don't have an upper
limit on the number of "registers".  (BTW, you don't need an infinite
number of registers.  You just need to know how many registers are
needed in advance so the interpreter can allocate sufficient storage
in one go; the compiler can work out this information easily enough.)

Since the RenderMan(tm) shading language doesn't really have functions
or procedures (they are instantiated at compile time and recursion is
disallowed), (2) and (3) are non-issues.

Issue (4) is dealt with below.

>The compiler would keep a register pool, to see which registers
>are free. If they are all currently in use, a new register would be created.

Reasonably good register allocation schemes can be almost as simple as
that.  If you really want to limit the number of registers (and it's a
good idea even in an interpreter because you want to minimise cache
misses) but are happy with a reasonable attempt, you can reorder the
code in as good an ordering as possible, then use registers just as if
they were stack elements (i.e.  allocate a new register where in a stack
machine you'd perform a "push" operation, and deallocate the register
where in the stack machine you'd perform a "pop" operation).

Anyway, this is all covered in any standard compiler text, such as the
dragon book[1] or the tiger book[2].

>Then something like (this might or might not be pure SLC :-)
>	
>	pushF r[0]
>	pushF r[1]
>	mulFF
>	popF r[2]

>Could be replaced by:

>	mulFF r[2], r[1], r[0]

Issue (4) is a problem here if you insist on VM instructions which
have a fixed number of arguments each.  Consider how you'd implement
spline() or printf(), for example.  They can have arbitrarily many
arguments coming from all sorts of different sources (such as compile-
time constants, shader parameters, computed values; this is important
because if the parameters to spline() were all compile-time constants,
you could get away with just pointing to a table of static values).  In
a stack machine, it's easy to implement since you can just push as many
arguments as you have.

In a register transfer language, there are several solutions I can
think of:

	- Generate a call to a function to handle it.  This seems
	  like an unnecessary complication to add to a language
	  which doesn't need function calls for any other reason.
	- Implement some kind of aggregate structure (e.g. a mutable
	  array-like type) and pass that as one argument.  Again, it
	  seems a bit wasteful to add it to a language which otherwise
	  doesn't need it.
	- Have some opcodes with variable numbers of arguments.  Since
	  we're not even pretending to want to generate assembly
	  language, this seems the most acceptable solution.

>When the shader is then loaded, it could be parsed:

>	- each instruction being replaced by the address of the executing
>	  function, to minimize runtime overhead

This is called "threaded code", and it's a very common compilation
scheme.  It's also a way to build a (fairly) cheap JIT compiler: simply
generate code at interpretation-time to do the function calls.

>	- each instruction parameter being replaced by an index

Even better, replace it by a pointer to the register contents.

>	- oneexecution environment is prepared for all instances

You'd need to be able to have more than one instance of a given
shader active in an implementation which uses ray tracing because the
trace() function could invoke a shader of the same type, and execution
has to be resumed when trace() returns.  It would also apply if you
wanted to implement an extension to the shading language so that
shaders can invoke other shaders.  (IMO, a sorely missing feature.)

>Of course, this is just an idea; I've never tried this or anything else :-)
>Anybody have an opinion on this design ???

I think it's a good one, so long as you realise that stack machines
solve a lot of small niggly problems which using a register transfer
design need to be addressed.

As always: IMHO.  Contents may settle in transit.  YMMV.

Cheers,
Andrew Bromage

   [1]	"Compilers -- Principles, Techniques and Tools"
	by Aho, Sethi and Ullman, Addison-Wesley, 1988

   [2]	"Modern Compiler Implementation in {ML,Java,C}"
	by A.W. Appel, Cambridge University Press, 1998