The Virtual Reality Modelling Language (VRML) and Quality

Speed of change in computer hardware

It is a characteristic of developments in computer hardware that their speed can be so great that the "shelf life" of any software product can be very limited. This is in itself an innovation to which we must become accustomed if the fruit of work such as the Borobudur Project is to have a reasonable life-span. But it is a characteristic only of those realms touched by advanced technologies. Compare the motor industry: a prized Land Rover Series I (made in 1956) has more-or-less the same functionality as current models, incorporates greater robustness, is still going strong, and finding spare parts is no problem, because of the forward compatibility built into the development cycle. There are plentiful arguments why this could not work in the computer industry; but few people would wish to be categorical about whether (let alone when) some kind of stability (let alone stasis) might be reached which would make the following paragraphs unnecessary.

Developments in functionality are especially acute in areas of computing that rely for their sophistication on devices where price/performance is improving rapidly almost from month to month. Thus VRML, which depends for its efficiency and reach on memory, disk storage, and network speed, could not even have been invented in the days when 64kb was a lot of memory for a minicomputer to possess. Any more-or-less exacting description of the real world, such as VRML by its very name suggests it can deliver, requires substantial resources.

Why VRML is so expensive in resources

VRML promises to construct a replica of the world (indeed, its files have the suffix ".wrl" and are called "worlds") by the careful description of 3D forms, their surface characteristics, and their relative positions in space. Specific details of just how this is done are available in any VRML programming manual, and the following account is no more than a brief sketch addressed to non-specialists.

It is a characteristic of the world that we see with our eyes and interpret with our brains (neither touch nor smell is yet part of the VRML protocols!) that it is detailed (i.e. susceptible to microscopic examination) and subtly modulated by colour, texture and lighting (natural and artificial). With outdoor scenes, as atmospheric conditions change, so do colouring, texture and lighting. As we move around our physical world, eyes and brain take in ever-new and ever-changing views of objects and perspectives, and adjust to them in various ways.

All these factors can indeed be taken into account in constructing an artificial world - consisting of one or more .wrl files - in VRML. If the "building blocks" of the real world are three-dimensional molecules, then those of the VRML world - the smallest discrete elements that VRML can describe - are two-dimensional polygons, which can be coloured and textured. In its turn, any polygon is constructed from lines of a certain length, thickness and direction, called vectors. Any 3D object is made up of a number of such polygons, so arranged according to the rules of perspective to give the appearance of depth and solidity. The more complicated the shape, the more polygons are required: a young child's building brick, viewed from any angle, shows no more than three polygons; whereas the same child's plastic ball, pierced like a honeycomb so that small hands may grip it, would require several hundred. Nor can we emulate set designers working in proscenium-arch theatres, who cut corners by neither painting the backs of their sets nor bulking them out into three dimensions, because they know the theatre audience cannot get behind them; the sets are in any case painted and arranged to appear suitably three-dimensional from the fixed angles dictated by the theatre seating. The VRML user is a free agent, so everything must be three-dimensional, capable of being manipulated in space - i.e. visited - by the user.

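To make this concrete, the following fragment is a minimal, hand-written sketch of a VRML 2.0 "world" containing nothing but a child's building brick: a box described explicitly as polygons (an IndexedFaceSet node), each face listing the corner points it joins. The colour and the unit dimensions are arbitrary choices for illustration; an authoring tool would generate much the same structure, only at far greater length.

    #VRML V2.0 utf8
    # A one-object "world": a unit cube built explicitly from polygons.
    # Every face is a list of indices into a shared table of 3D points.
    Shape {
      appearance Appearance {
        material Material { diffuseColor 0.8 0.5 0.2 }   # an arbitrary "brick" colour
      }
      geometry IndexedFaceSet {
        coord Coordinate {
          point [
            0 0 0,  1 0 0,  1 1 0,  0 1 0,   # four corners of the back face
            0 0 1,  1 0 1,  1 1 1,  0 1 1    # four corners of the front face
          ]
        }
        # Each row lists one face's corner indices, terminated by -1.
        coordIndex [
          0 3 2 1 -1,   # back
          4 5 6 7 -1,   # front
          0 1 5 4 -1,   # bottom
          3 7 6 2 -1,   # top
          0 4 7 3 -1,   # left
          1 2 6 5 -1    # right
        ]
      }
    }

Even this simplest of objects needs six faces, of which at most three are visible from any one viewpoint; a curved object such as the child's ball has no such economy, since a sphere that looks smooth must be approximated by hundreds of small flat polygons, every one of them another set of numbers to store, transmit and transform.
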
The more complicated the objects-in-space are, the more work the computer has to do in order to present them correctly in the window of the web browser for the user. The world is more complicated than a child's building brick, or even a child's ball. So how would VRML get on with Cezanne's bowls of fruit, or with Monet's haystacks? "Fixed" on canvas in a particular perspective, either would require thousands of polygons if the result were faithfully to represent what was seen. Now imagine those changes of light which intrigued both artists, a change of viewpoint, or even one of the pieces of fruit falling to the floor, or parts of the stacks blowing away in a high wind.

Film can represent with ease any rapid action, such as a falling apple, by photographing many frames per second (25 or 30, up to several thousand if necessary). VRML is computer-intensive because it must represent every change of position, angle, texture and lighting as any object changes its position - and each time the user manipulates the viewpoint represented by the "window" of the web browser, every single object seen changes position relative to every other object, with some now being partly concealed, and others now becoming visible for the first time. Compare the distinction between a digital still photograph and digital video: for the former, the computer has only to load and display one image; whereas for the latter, 25 must be loaded and displayed every second (in PAL - Phase Alternating Line, the British/Australian encoding for TV and hence video). If VRML is to get anywhere near the fluency of 25 frames-per-second video, remember that it must do all the work that digital video requires - but also calculate the relative position, orientation and texturing of every polygon for every slightest change (the equivalent of video "frames") it sends to the viewer. Even without calculating the exact number of polygons required, it may intuitively be assumed that the VRML user who knocks the whole of Cezanne's bowl of fruit off the table is in for a substantial wait as the program computes what each piece does - and what the viewer sees - as they fall to the floor, bounce, and eventually roll to rest.

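By way of illustration (this fragment is a hand-written sketch, not part of the Borobudur files, and its timings and positions are invented), even the falling apple is described in VRML as nothing more than a few key positions and a clock that drives the object between them. The browser receives only these few numbers; every intermediate "frame" - the position, orientation and shading of every visible polygon at every instant - must be recomputed on the user's machine.

    #VRML V2.0 utf8
    # A "falling apple": a small red sphere moved by interpolating
    # between a handful of key positions over a one-second cycle.
    DEF Apple Transform {
      translation 0 2 0
      children Shape {
        appearance Appearance { material Material { diffuseColor 1 0 0 } }
        geometry Sphere { radius 0.1 }
      }
    }
    DEF Clock TimeSensor { cycleInterval 1.0 loop TRUE }
    DEF Fall PositionInterpolator {
      key      [ 0.0, 0.6, 0.8, 1.0 ]
      keyValue [ 0 2 0,  0 0.1 0,  0 0.5 0,  0 0.1 0 ]   # drop, then a single bounce
    }
    ROUTE Clock.fraction_changed TO Fall.set_fraction
    ROUTE Fall.value_changed     TO Apple.set_translation

Multiply this by every piece of fruit in the bowl, and by every polygon of every visible object as the viewpoint moves, and the reason for the "substantial wait" becomes apparent.
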
This is not a case of our waiting around for software which will somehow "conquer" the problem. It will not. There is no doubt that the old adage applies here, namely that "if actions can be adequately described, then the computer can be programmed to execute them". VRML offers a syntax for defining the world as Virtual Reality (which is not to say that updates and competitors to VRML will not be faster and easier to program). But the real questions are ones simply of speed for the computer and convenience for the user: at what processing cost may the world we see be represented? With what time delay? And with what illusion of reality? Are we happy with it being "virtually" (in the slang sense) like our world, or do we demand Virtual Reality in the sense of something we cannot tell from the Real Thing? That is, a kind of Turing Test for computer graphics, an update to Turing's experiment to see whether human beings could tell whether they were conversing with a hidden person or an equally hidden computer.

VRML pushes current machines to their limit

As machines get relatively cheaper and more powerful, and as networks get faster and connectivity more effortless (thanks largely to web browsers and their plugins), we might imagine the Nirvana of a perfect operating environment to be within our grasp. Nevertheless, the opportunities presented by VRML programs for the conjuring of Virtual Reality still remain restricted not by the flexibility of the program and its protocols, but by the limits of what current machines and networks can provide. Even a moderately "rich" VRML environment will display slowly and jerkily on one of today's moderately powerful machines - say, a 233MHz Pentium with 64Mb of memory and a 10Mbps ethernet connection.

Hence a dilemma for the producers of a VRML environment such as Borobudur: what target are they trying to hit? The makers of the 1956 Land Rover had no such problems: they aimed at a robust, long-lasting product, and were successful. But are we, for the Borobudur Project, designing a product that works well on the 233MHz Pentium noted above? Or are we aiming at a product that will survive the test offered by the recent 500MHz Pentium, or maybe the year-after-next's 2000MHz machines?

Divide and Rule: Building "Elasticity" into the Borobudur Project

The answer is easy to state but difficult to implement. Of course Borobudur should run on today's moderately powerful machines; but we should also contrive a setup which will allow it to perform respectably and interestingly on more powerful machines next year or the year after that. Readers who are unconvinced that there is indeed a dilemma here should look at a few CDROMs (perhaps of encyclopaedias) from the mid-1990s: not only are the data frequently restricted in scope and the hypermedia objects and links crude, but the interface will probably appear quaint and old-fashioned. This is a function of the pace of change in hypermedia, all the more startling when compared with - say - the mono- or duo-media book: the Renaissance productions of Aldus Manutius have lasted their 500 years well, their difference from modern productions being merely one of style, not of functionality or reach.

The prescription for the "survivability" of Borobudur as VRML must therefore be flexibility. It must work well enough today, but also tomorrow and next year. Development has of course been on a fast 400MHz Pentium machine, but with testing across the (nominally) 10Mbps ethernet on a 122MHz Pentium (decidedly the machine for the rest of us).

There are several salient features of Borobudur which begin to suggest how this "elasticity" might be achieved:

  1. The stupa is large, complicated and intricate, but very nearly regular in its construction (it does not have true biaxial symmetry, but it is not far from it). This meant it was imperative to divide the stupa gallery by gallery and section by section (the assembly of such sections from separate files is sketched after this list), since ordinary computers cannot for the foreseeable future handle the whole stupa as one file. The stupa's regularity makes such division easily understandable - hence the division of each of the four galleries into four quarters, each with a main wall and a balustrade wall individually displayed (i.e. 32 sections), plus stupa terrace and basement (another 8 sections), making 40 sections in all;
  2. The host of Buddha statues - those in niches on each gallery, and those within the pierced stupas on the circular terrace - are each members of a "family" of such similarity that the same bitmap could conceivably be used time and again, changed only for the terrace Buddhas, where the gesture depends on the statue's N-S-E-W orientation. Much the same applies to the other 3D elements - namely the Makara heads, the other various beasts, and the doorways;
  3. All the myriad relief panels are rectangular, except where some of the component blocks have been lost. Equally important, sizes and proportions (although different from gallery to gallery) are consistent within each gallery, whether on the main wall or on the balustrade wall. Since all the reliefs are photographs (bitmaps - where every picture element or dot on the computer monitor has a separate value), this means that slotting them into the structure is easy, because no calculations have to be performed for the task.
  4. There are large quantities of 2D decorative panels interspersed with the figured reliefs. Each one in a sequence is often subtly different from its immediate neighbours (reflecting, in part, their carving by different teams). So little would necessarily be lost from the impact of the work (and much processing would be saved) if "typical" panels were inserted as bitmaps where required;
  5. The profile of Borobudur, from wherever it is viewed, is very sophisticated - Buddha niches, Makara heads, exquisite balustrades, and complicated frameworks for the reliefs. This structure must be constructed from vector graphics - from the polygons discussed above. So the more sophisticated (read: "accurate") one wants to make the structure, the more polygons and hence vectors are required.

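As an illustration of features 1 to 3, the fragment below is a hand-written sketch of how one quarter of one gallery might be assembled (the file and image names are invented, not those of the actual Borobudur files): the neighbouring sections are pulled in from separate .wrl files with Inline nodes, so that the browser fetches and renders only what is needed, while a single Buddha photograph is defined once and re-used for every niche, and each relief is simply a flat rectangle carrying a photographic texture.

    #VRML V2.0 utf8
    # Hypothetical master file for one quarter of one gallery:
    # each section of the monument lives in its own small world file.
    Inline { url "gallery1_NE_mainwall.wrl" }
    Inline { url "gallery1_NE_balustrade.wrl" }

    # One niche Buddha, defined once (DEF) as a flat textured rectangle ...
    DEF NicheBuddha Shape {
      appearance Appearance {
        texture ImageTexture { url "buddha_niche.jpg" }   # one photograph serves them all
      }
      geometry IndexedFaceSet {
        solid FALSE       # visible from both sides
        coord Coordinate { point [ -0.5 0 0,  0.5 0 0,  0.5 1.4 0,  -0.5 1.4 0 ] }
        coordIndex [ 0 1 2 3 -1 ]
      }
    }
    # ... and re-used (USE) at the next niche for almost no extra cost.
    Transform {
      translation 3 0 0
      children USE NicheBuddha
    }

The rectangular relief panels of feature 3 are handled in the same way as the Buddha bitmap above: each is a flat polygon with an ImageTexture, and the photograph it carries can be supplied at whatever resolution the network and the user's machine will bear, without any change to the surrounding geometry.
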
From these five features the following conclusions can be reached:

  1. The stupa must in any case be "chopped up" for processing and hence for viewing, because it is too big to deal with as just one file;
  2. 3D statues, because they are "types", may be represented by one of their number, thus avoiding an exact relationship with reality and saving on processing and display time;
  3. The figured reliefs slot in as easily at one resolution as at another - so we can vary the dimensions of the reliefs delivered to accord with the network and the user's machine;
  4. The sophisticated profile of Borobudur must be in vectors (the skeleton on which the "flesh" of the bitmaps is hung) and will consume enormous processing resources if we attempt to represent it with complete accuracy.

These conclusions lead in their turn to our "elastic" solution, which in essence allows the product to grow in complication to satisfy the appetites of more powerful machines; a minimal sketch of the idea follows.

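By way of illustration only (the file names, texture name and distances below are invented, not taken from the Borobudur files), VRML's LOD ("level of detail") node offers one ready-made way of building in such elasticity: the same section of the monument can be published at several levels of richness, and the browser selects whichever level the viewing distance - and, in practice, the power of the machine - can afford, falling back to a coarse textured stand-in when the detailed version would be too costly.

    #VRML V2.0 utf8
    # Hypothetical "elastic" wrapper for one section of the stupa.
    # The browser chooses one child according to the viewer's distance.
    LOD {
      center 0 0 0
      range [ 20, 60 ]   # switch-over distances in metres (illustrative values)
      level [
        Inline { url "gallery1_NE_mainwall_full.wrl" }     # full polygon-and-texture model
        Inline { url "gallery1_NE_mainwall_simple.wrl" }   # fewer polygons, smaller bitmaps
        Shape {                                            # distant view: a single textured slab
          appearance Appearance {
            texture ImageTexture { url "gallery1_NE_mainwall_far.jpg" }
          }
          geometry Box { size 20 3.5 0.5 }
        }
      ]
    }

The LOD node is only one way of arranging this; the essential point is that the same section can be published once at several levels of richness, with the choice deferred to the user's machine and connection.
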
We may therefore picture four "specimen" types of users, two at the publication date of the Borobudur Project (May 1999), and two a year or more thence: