The Virtual Reality Modelling Language (VRML)
and Quality
Speed of change in computer hardware
It is a characteristic of developments in computer hardware that their
speed can be so great that the "shelf life" of any software product can
be very limited. This is in itself an innovation to which we must
become accustomed if the fruit of work such as the Borobudur Project is
to have a reasonable life-span. But it is a characteristic only of
those realms touched by advanced technologies. Compare the motor
industry: a prized Land Rover Series I (made in 1956) has more-or-less
the same functionality as current models, incorporates greater
robustness, is still going strong, and finding spare parts is no
problem, because of the forward compatibility built into the
development cycle. There are plentiful arguments why this could not
work in the computer industry; but few people would wish to be
categorical about if (let alone when) some kind of stability (let alone
stasis) might be reached which would make the following paragraphs
unnecessary.
Developments in functionality are especially acute in areas of
computing that rely for their sophistication on devices where
price/performance is improving rapidly almost from month to month.
Thus VRML, which depends for its efficiency and reach on memory, disk
storage, and network speed, could not even have been invented in the
days when 64KB was a lot of memory for a minicomputer to possess. Any
more-or-less exacting description of the real world, such as VRML by
its very name suggests it can deliver, requires substantial resources.
Why VRML is so expensive in resources
VRML promises to construct a replica of the world (indeed, its files
have the suffix ".wrl" and are called "worlds") by the careful
description of 3D forms, their surface characteristics, and their
relative positions in space. Specific details of just how this is
done are available in any VRML programming manual, and the following
account is no more than a brief sketch addressed to non-specialists.
It is a characteristic of the world that we see with our eyes and
interpret with our brains (neither touch nor smell is yet part of the
VRML protocols!) that it is detailed (i.e. susceptible to microscopic
examination), and subtly modulated by colour, texture and lighting
(natural and artificial). With outdoor scenes, as atmospheric
conditions change, then so do colouring, texture and lighting. As we
move around our physical world, eyes and brain take in ever-new and
ever-changing fields of view on objects and perspectives, and adjust to
them in various ways.
All these factors can indeed be taken into account in constructing an
artificial world - consisting of one or more .wrl files - in VRML. If
the "building blocks" of the real world are three-dimensional
molecules, then those of the VRML world - the smallest discrete
elements that VRML can describe - are two-dimensional polygons, which
can be coloured and textured. In its turn, any polygon is constructed
from lines of a certain length, thickness and direction, called
vectors. Any 3D object is made up of a number of such polygons, so
arranged according to the rules of perspective to give the appearance
of depth and solidity. The more complicated the shape, the more
polygons are required: a young child's building brick, viewed from any
angle, requires no more than three polygons; whereas the same child's
plastic ball, honeycombed so that small hands may grip it, would
require several hundred. Nor can we emulate set designers working in
proscenium-arch theatres, and cut corners by not painting the backs of
our sets, or bulking them out into three dimensions because we know the
theatre audience cannot get behind the sets, which are in any case
painted and arranged to appear suitably three-dimensional from the
fixed angles restricted by the theatre seating. The VRML user is a free
agent, so everything must be three-dimensional, capable of being
manipulated in space - i.e. visited - by the user.
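The claim that a building brick never shows more than three polygons
can be checked with a minimal sketch. It assumes, for illustration
only, a cube centred on the origin with one polygon per face; the
function name and the viewpoints are hypothetical, and no VRML browser
is claimed to work exactly this way.

```python
# Outward-pointing normals of the six faces of an axis-aligned cube.
FACE_NORMALS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def visible_faces(eye, half_size=1.0):
    """Count the faces of a cube centred on the origin that face `eye`.

    A face is counted when the vector from its centre to the eye points
    the same way as the face's outward normal (positive dot product).
    """
    count = 0
    for nx, ny, nz in FACE_NORMALS:
        # Centre of this face.
        cx, cy, cz = nx * half_size, ny * half_size, nz * half_size
        # Vector from the face centre to the viewer.
        vx, vy, vz = eye[0] - cx, eye[1] - cy, eye[2] - cz
        if nx * vx + ny * vy + nz * vz > 0:
            count += 1
    return count


if __name__ == "__main__":
    for eye in [(5, 0, 0), (5, 5, 0), (5, 5, 5)]:
        print(eye, visible_faces(eye))
```

Opposite faces can never both face the viewer, which is why the count
stops at three; the free-roaming user may nevertheless see any of the
six, so all six must be modelled.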
The more complicated the objects-in-space are, the more work the
computer has to do in order to present them correctly in the window of
the web browser for the user. The world is more complicated than a
child's building brick, or even a child's ball. So how would VRML get
on with Cezanne's bowls of fruit, or with Monet's haystacks? "Fixed" on
canvas in a particular perspective, either would require thousands of
polygons if the result were faithfully to represent what was seen. Now imagine
those changes of light which intrigued both artists, a change of
viewpoint, or even one of the pieces of fruit falling to the floor, or
parts of the stacks blowing away in a high wind.
Film can represent with ease any rapid actions, such as a falling apple,
by photographing many frames per second (25 or 30, up to several
thousand if necessary). VRML is computer-intensive because it must
represent every change of position, angle, texture and lighting as any
object changes its position - and each time the user manipulates the
viewpoint represented by the "window" of the web browser, every single
object seen changes position relative to every other object, with some
now being partly concealed, and others now becoming visible for the
first time. Compare the distinction between a digital still photograph
and digital video: for the former, the computer has only to load and
display one image; whereas for the latter, 25 must be loaded and
displayed every second (in PAL - Phase Alternating Line, the
British/Australian encoding for TV and hence video). If VRML is to get
anywhere near the fluency of 25 frames-per-second video, remember that it
must do all the work that digital video requires - but also calculate
the relative position, orientation and texturing of every polygon for
every slightest change (the equivalent of video "frames") it sends to
the viewer. Even without calculating the exact number of polygons
required, it may intuitively be assumed that the VRML user who knocks
the whole of Cezanne's bowl of fruit off the table is in for a
substantial wait as the program computes what each does - and what the
viewer sees - as they fall to the floor, bounce, and eventually roll to
rest.
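The arithmetic behind this comparison can be put in a rough sketch.
The rate of 25 frames per second comes from the text; the polygon
counts for the ball and the still life are illustrative assumptions
only (the text says no more than "several hundred" and "thousands"),
and the function names are hypothetical.

```python
FRAMES_PER_SECOND = 25  # PAL video rate quoted in the text


def frame_budget_ms(frames_per_second=FRAMES_PER_SECOND):
    """Time available to compute and draw one frame, in milliseconds."""
    return 1000 / frames_per_second


def polygon_updates_per_second(polygons_in_view,
                               frames_per_second=FRAMES_PER_SECOND):
    """Polygons to be repositioned and redrawn each second."""
    return polygons_in_view * frames_per_second


# Assumed polygon counts, chosen only to show the scale of the problem.
SCENES = {
    "building brick": 3,
    "honeycombed ball": 300,
    "still life": 5000,
}

if __name__ == "__main__":
    print(f"budget per frame: {frame_budget_ms()} ms")
    for name, polygons in SCENES.items():
        print(f"{name}: {polygon_updates_per_second(polygons)} updates/s")
```

Whatever the true counts, the work grows with both the number of
polygons in view and the frame rate, while the time available for each
frame stays fixed.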
This is not a case of our waiting around for software which will
somehow "conquer" the problem. It will not. There is no doubt that the
old adage applies here, namely that "if actions can be adequately
described, then the computer can be programmed to execute them". VRML
offers a syntax for defining the world as Virtual Reality (which is not
to say that updates and competitors to VRML will not be faster and
easier to program). But the real questions are ones simply of speed for
the computer and convenience for the user: at what processing cost may
the world we see be represented? With what time delay? And with what
illusion of reality? Are we happy with it being "virtually" (in the
slang sense) like our world, or do we demand Virtual Reality in the
sense of something we cannot tell from the Real Thing? That is, a kind
of Turing Test for computer graphics, an update to Turing's experiment
to see whether human beings could tell whether they were conversing
with a hidden person or an equally hidden computer.
VRML pushes current machines to their limit
As machines get relatively cheaper and more powerful, and as networks
get faster and connectivity more effortless (thanks largely to web
browsers and their plugins), we might imagine the Nirvana of a perfect
operating environment to be within our grasp. Nevertheless, the
opportunities presented by VRML programs for the conjuring of Virtual
Reality still remain restricted not by the flexibility of the program
and its protocols, but by the limits of what current machines and
networks can provide. Even a moderately "rich" VRML environment will
display slowly and jerkily on one of today's moderately powerful
machines - say, a 233MHz Pentium with 64MB of memory and a 10Mbps
ethernet connection.
Hence a dilemma for the producers of a VRML environment such as
Borobudur: what target are they trying to hit? The makers of the 1956
Land Rover had no such problems: they aimed at a robust, long-lasting
product, and were successful. But are we, for the Borobudur Project,
designing a product that works well on the Pentium 233 noted above? Or
are we aiming at a product that will survive the test offered by the
recent 500MHz Pentium, or by the year-after-next's 2000MHz Pentium
machines?
Divide and Rule: Building "Elasticity" into the Borobudur Project
The answer is easy to state but difficult to implement. Of course
Borobudur should run on today's moderately powerful machines; but we
should also contrive a setup which will allow it to perform respectably
and interestingly on more powerful machines next year or the year after
that. Readers who are unconvinced that there is indeed a dilemma here
should look at a few CDROMs (perhaps of encyclopaedias) from the
mid-1990s: not only are the data frequently restricted in scope, and
any hypermedia objects and links crude; but the interface will probably
appear quaint and old-fashioned. This is a function of the pace of
change in hypermedia, all the more startling when compared with - say -
the mono- or duo-media book: the Renaissance productions of Aldus
Manutius have lasted their 500 years well, their difference from modern
productions being merely one of style, not of functionality or reach.
The prescription for the "survivability" of Borobudur as VRML must
therefore be flexibility. It must work well enough today, but also
tomorrow and next year. Development has of course been on a fast
Pentium 400 machine, but with testing across the (nominally) 10Mbps
ethernet on a 122MHz Pentium (decidedly the machine for the rest of
us).
There are several salient features of Borobudur which begin to suggest
how "elasticity" might be achieved:
- The stupa is large, complicated and intricate, but just about
regular in its construction (it does not have true biaxial symmetry,
but is not far from it). This meant it was imperative to divide the
stupa gallery by gallery and section by section, since ordinary
computers cannot for the foreseeable future handle the whole stupa as
one file.
The stupa's regularity makes such division easily understandable -
hence the division of each of the four galleries into four quarters,
each with a main wall and a balustrade wall individually displayed
(i.e. 32 sections), plus stupa terrace and basement (another 8
sections), making 40 sections in all;
- The host of Buddha statues, both those in niches on each gallery and
those within the pierced stupas on the circular terrace, are each
members of a "family" of such similarity that the same bitmap could
conceivably be used time and again, changed only for the terrace
Buddhas where the gesture depends on the statue's N-S-E-W orientation.
Much the same applies to the other 3D elements - namely the Makara
heads, the other various beasts, and the doorways;
- All the myriad relief panels are rectangular, except where some of
the component blocks have been lost. Equally important, sizes and
proportions (although different from gallery to gallery) are consistent
within each gallery, whether on the main wall or on the balustrade
wall. Since all the reliefs are photographs (bitmaps - where every
picture element or dot on the computer monitor has a separate value),
this means that slotting them into the structure is easy, because no
calculations have to be performed for the task;
- There are large quantities of 2D decorative panels interspersed with
the figured reliefs. Each one in a sequence is often subtly different
from its immediate neighbours (reflecting, in part, their carving by
different teams). So little would necessarily be lost from the impact
of the work (and much processing would be saved) if "typical" panels
were inserted as bitmaps where required;
- The profile of Borobudur, from wherever it is viewed, is very
sophisticated - Buddha niches, Makara heads, exquisite balustrades, and
complicated frameworks for the reliefs. This, the structure, must be
constructed from vector graphics - from the polygons discussed above.
So the more sophisticated (read: "accurate") one wants to make the
structure, the more polygons and hence vectors are required.
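The division of the stupa described in the first point above can be
set out as a short sketch. The counts (four galleries, four quarters,
two walls, and eight further sections) are taken from the text; the
section names are hypothetical, invented here only to make the
enumeration visible, and do not reflect the Project's actual file
names.

```python
from itertools import product

GALLERIES = [1, 2, 3, 4]
QUARTERS = [1, 2, 3, 4]
WALLS = ["main", "balustrade"]

# One section per gallery, quarter and wall (hypothetical naming).
gallery_sections = [
    f"gallery{g}_quarter{q}_{wall}"
    for g, q, wall in product(GALLERIES, QUARTERS, WALLS)
]

# The text gives eight further sections for the stupa terrace and the
# basement, without saying how they are split between the two.
TERRACE_AND_BASEMENT_SECTIONS = 8

total_sections = len(gallery_sections) + TERRACE_AND_BASEMENT_SECTIONS

if __name__ == "__main__":
    print(len(gallery_sections), "gallery sections")
    print(total_sections, "sections in all")
```

Each section then corresponds to a unit small enough to be loaded and
displayed on its own.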
From these five features the following conclusions can be reached:
- The stupa must in any case be "chopped up" for processing
and hence for viewing, because it is too big to deal with as
just one file;
- 3D statues, because they are "types", may be represented by
one of their number, thus avoiding an exact relationship
with reality and saving on processing and display time;
- The figured reliefs slot in as easily at one resolution as
at another - so we can vary the dimensions of the reliefs
delivered to accord with the network and the user's machine;
- The sophisticated profile of Borobudur must be in vectors (the
skeleton on which the "flesh" of the bitmaps is hung) and
will consume enormous processing resources if we attempt to
represent it with complete accuracy.
- and these lead in their turn to our "elastic" solution, which in
essence allows the product to grow in complication to satisfy the
appetites of more powerful machines:
- The bitmaps are available in four different resolutions: the
largest, for sophisticated viewing, is the size at which the reliefs
were photographed, namely 1.6 megapixels - roughly A3 size on a
computer monitor; the smallest, for demonstration mode, consists of
very small bitmapped images, suitable for use across the World Wide
Web in those instances where response times are not good. This should
allow those
interested in the Borobudur Project to get some weak flavour of its
contents, and then view the full-strength Project when they obtain the
CDROM;
- The armature of the stupa (the "framing" for the bitmaps) is
presented on the Web and on the CDROM in a form sufficiently
abbreviated to be viewable even on a 233MHz machine. But alternative
.wrl files are also available which model the profiles of the various
stupa areas much more accurately;
- Experiments are being conducted with loading bitmaps only when they
are required, rather than in bulk at the beginning of a "gallery run".
Reliefs visible at the start of a run would disappear as the viewer
turned the corner; as this happened, the reliefs behind would
disappear, and those to the front would be displayed. For obvious
reasons, this technique works only on guided tours, or where the user
restricts the viewpoint to within one gallery;
- In addition to the division into sections enumerated above, it is
possible for the viewer to process along a gallery and view the main
wall to the left and the balustrade wall to the right. This is
evidently for machines and networks with the power to handle the
processing required.
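The experiment with loading bitmaps only when they are required can be
sketched as a moving "window" of reliefs along a gallery run. This is
an illustration under stated assumptions, not the Project's
implementation: reliefs are assumed to be numbered in the order in
which the viewer meets them, and the function names and the default
window sizes (`ahead`, `behind`) are hypothetical.

```python
def reliefs_to_load(position, total_reliefs, ahead=3, behind=0):
    """Indices of the reliefs to keep loaded at a given position.

    `position` is the index of the relief the viewer is facing.
    """
    first = max(0, position - behind)
    last = min(total_reliefs - 1, position + ahead)
    return list(range(first, last + 1))


def update_loaded(loaded, position, total_reliefs, ahead=3, behind=0):
    """Work out which reliefs to fetch and which to drop after a move.

    Returns (new_loaded_set, indices_to_load, indices_to_unload).
    """
    wanted = set(reliefs_to_load(position, total_reliefs, ahead, behind))
    to_load = sorted(wanted - loaded)
    to_unload = sorted(loaded - wanted)
    return wanted, to_load, to_unload


if __name__ == "__main__":
    loaded = set(reliefs_to_load(0, total_reliefs=10))
    for step in range(1, 4):
        loaded, fetch, drop = update_loaded(loaded, step, total_reliefs=10)
        print(f"step {step}: load {fetch}, unload {drop}")
```

Because the window assumes a known order of travel, the sketch shares
the limitation noted above: it suits guided tours, or a viewpoint kept
within one gallery.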
We may therefore picture four "specimen" types of users, two at the
publication date of the Borobudur Project (May 1999), and two a year or
more later:
- USER ONE is viewing Borobudur across the web, and has chosen to
manipulate a small set of figured reliefs within the simplest possible
framework for the structure;
- USER TWO has a Pentium 400, and can handle much larger figured reliefs,
whether across the web or from the CDROM;
- USER THREE in July 2000 has access to both a fast machine and a fast
network, and is easily able to manipulate those tours of the galleries
which incorporate both main and balustrade walls - "wrap-around"
Virtual Reality, as it were - but with the software switching on and
off those sequences of reliefs not yet in sight, or out of sight behind
the user;
- USER FOUR has access to projection technologies which allow Borobudur
to be projected onto adjacent right-angled screens and, with a computer
and software sensitive to the movement of the user's head-up display,
turns on not just the highest level of quality for the vector
structure, but also the full 1.6 megapixel figured reliefs for display
within the VRML model. The same user views main and balustrade reliefs
at the same time, and these are visible on the VRML model all the time
to cater for rapid movements of the head-up display.
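
The four specimen users can be summarised as a small table of
settings. The sketch below is only one way of recording them: the
class and field names are hypothetical, and wherever the description
of a user does not state a setting the value is left as None rather
than guessed.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewingProfile:
    name: str
    relief_size: Optional[str]           # "small", "larger" or "full"
    detailed_structure: Optional[bool]   # accurate vector profile?
    both_walls: Optional[bool]           # main and balustrade together?
    load_on_demand: Optional[bool]       # reliefs switched on and off?


# None means "not stated in the description of that user".
PROFILES = [
    ViewingProfile("USER ONE", "small", False, None, None),
    ViewingProfile("USER TWO", "larger", None, None, None),
    ViewingProfile("USER THREE", None, None, True, True),
    ViewingProfile("USER FOUR", "full", True, True, False),
]


def stated_settings(profile):
    """Return only the settings the text actually specifies."""
    return {k: v for k, v in asdict(profile).items() if v is not None}


if __name__ == "__main__":
    for profile in PROFILES:
        print(stated_settings(profile))
```

Read from top to bottom, the profiles show the product growing in
complication as machines and networks allow.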