W.D.Young 3 July 1987
Supervisor: Peter Cheung
Since the early days of computing, engineers have been striving to achieve more
meaningful output from their machines than just numerical results. To this end, the field of
computer graphics has developed over the years into a rapidly expanding industry
encompassing Computer Aided Design, Video Paintbox and Computer Animation to name
just a few.
Until recently, such ‘high end’ systems were prohibitively expensive
for a single user and could only be afforded by large companies equipping their research
labs. Recent developments in VLSI technology have resulted in cheap memory (about 10
pence per kilobyte - 1987 prices*) and powerful single chip 16 bit microprocessors. This in
turn has reduced the cost of computers in general and made sophisticated computer
graphics available to anyone owning a personal computer.
This report describes the design
and implementation of such a system for producing 3D perspective projected images in
‘real time’. It concludes with an evaluation and comparison with a contemporary C.A.D.
workstation.
*Memory prices at the end of 1997 were around 0.1 pence per kilobyte
This section starts with a history of the project which describes how I became interested in the topic of computer graphics. It then discusses the methods involved in producing realistic ‘3D’ images and the associated problems and concludes with a more detailed description of the objectives of the project.
In the Autumn of 1985 I became interested in the creation by computer of mathematical
objects known as Fractals which mimic the complex structures of natural objects such as
trees. One particularly effective way of generating three dimensional Fractals is to randomly
displace all the vertices of a large array of joined triangles, akin to crumpling up a piece of
paper then unravelling it so that its shape is still roughly rectangular. In this way it is
possible to generate highly realistic landscapes and even whole planets by mapping the
crumpled plane on to the surface of a sphere. However, to view the object the three
dimensional model held in the computer memory must be rotated and translated in three
dimensional space to obtain the appropriate viewing position and then a 3D to 2D
projection mapping performed on every vertex in the model in order to map the object into
the 2D co-ordinate space of the CRT screen or other output device. If the projected image
is to look realistic, the object must also be shaded as if lit by some form of lightsource,
requiring more calculations.
Working on an Orion 32 bit minicomputer, it took up to an hour to generate an image of a
landscape containing over 130,000 triangles. This length of time was unacceptable, so images
of this size were only generated a few times.
It was more usual to generate a lower resolution image of 8,192 or 32,768 triangles with a
proportionate increase in speed. However, this also resulted in a corresponding decrease in
realism and a compromise had to be made between this and the time spent generating the
image. This was frustrating because quite often this resulted in an image quality that was
unacceptable from an aesthetic point of view. As will be shown in the next subsection, the
mathematics involved consists mostly of multiplications and additions and this led me to
believe that a solution to the problem might be to implement the graphics algorithms in
hardware.
The basic process of image generation follows three steps:
Additions can be made to these steps to increase the realism of the final image. Examples include hidden line/surface removal so the object appears solid and colouration which includes shading and shadows. The original software that I wrote to create landscapes and planets incorporated all of the above features and I shall include them in the following discussion for completeness. However, due to the more complex nature of hidden surface removal and shading and the limited time available these features were not included in the final design.
There are two basic operations fundamental to this process - rotation to orient the object in the right way and translation to place the object where it can be viewed. The following equations are for the two dimensional case for simplicity of demonstration but are easily applied to three dimensions.
pn = R.po
Where po and pn are the column vectors [xo , yo] and [xn , yn] and R is the matrix
cosQ , -sinQ
sinQ , cosQ
Expanding obtains :
xn = xo.cosQ - yo.sinQ
yn = xo.sinQ + yo.cosQ
Work required : 4 multiplications and 2 additions.
Note : The sines and cosines are not included in the work required since they are done once at the start of the calculations and do not change until a different view of the object is required.
In practice, the four multiplications and two additions per vertex are all that are required per rotation axis since one co-ordinate will always remain the same. It is possible to construct a 3x3 rotation matrix encompassing all three rotations about the x, y and z axes, in which case it would take 9 multiplications and 6 additions, which is less than the 12 multiplications and 6 additions it takes to do 3 consecutive single axis rotations. However, generation of the 3x3 rotation matrix requires 16 multiplications and 4 additions and although these are only performed once, it increases the complexity of the problem. Also, it is usual to rotate about two axes since almost any object orientation can be obtained through a combination of two single axis rotations. Thus the generality of using a 3x3 rotation matrix was not deemed necessary in this application.
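The single axis rotation can be sketched in Python as follows. This is an illustration only, not the code used in the project: the function names are hypothetical, the standard rotation form xn = xo.cosQ - yo.sinQ, yn = xo.sinQ + yo.cosQ is assumed, and the choice of which pair of co-ordinates is rotated for each axis (and hence the sense of rotation) is an assumption.

```python
import math

def rotate_2d(x, y, cos_q, sin_q):
    """Rotate (x, y) using a precomputed cosine and sine.

    Work: 4 multiplications and 2 additions (a subtraction counts as an addition).
    """
    xn = x * cos_q - y * sin_q
    yn = x * sin_q + y * cos_q
    return xn, yn

def rotate_about_x(p, cos_q, sin_q):
    """Rotation about the x axis: x is unchanged, (y, z) rotate as a 2D pair."""
    x, y, z = p
    yn, zn = rotate_2d(y, z, cos_q, sin_q)
    return (x, yn, zn)

def rotate_about_y(p, cos_q, sin_q):
    """Rotation about the y axis: y is unchanged, (z, x) rotate as a 2D pair."""
    x, y, z = p
    zn, xn = rotate_2d(z, x, cos_q, sin_q)
    return (xn, y, zn)

# The sine and cosine are evaluated once per view, not once per vertex.
angle = math.radians(90)
c, s = math.cos(angle), math.sin(angle)
rotated = rotate_about_y(rotate_about_x((0.0, 1.0, 0.0), c, s), c, s)
```

Two such single axis rotations cost 8 multiplications and 4 additions per vertex.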
pn = po + t
Where t is the translation vector [tx , ty]
Expanding for the 3 dimensional case :
xn = xo + tx
yn = yo + ty
zn = zo + tz
Work required : 3 additions.
This is the simplest operation performed during the production of a three dimensional image.
The type of projection chosen was simple perspective projection. This gives a very realistic effect and is very easy to implement - see diagram below.
xs = k.x
ys = k.y
Where k = d / z
Work required : 2 multiplications and 1 division.
Although this operation is simple, it is computationally intensive because of the division required.
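A minimal Python sketch of the projection follows. It assumes the observer is at the origin looking along the positive z axis with the screen at distance d; the function name and the guard against points at or behind the observer are my own additions for illustration.

```python
def project(x, y, z, d):
    """Simple perspective projection: k = d / z, then xs = k.x and ys = k.y."""
    if z <= 0:
        # Assumption: such points would have to be clipped before projection.
        raise ValueError("point must lie in front of the observer (z > 0)")
    k = d / z              # the single division
    return k * x, k * y    # the two multiplications

xs, ys = project(200.0, -200.0, 800.0, 400.0)
```

Computing k once and reusing it for both co-ordinates is what keeps the cost at one division per vertex.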
This operation is usually performed by the output device itself. Modern graphics terminals are intelligent enough that they already have the necessary graphics ‘primitives’ such as line drawing and area filling built into them. If they do not, the host computer sometimes has the graphic primitive routines available to the programmer. I am assuming that either of the above cases applies so I will not cover the details of this process. The primitives are not as trivial as they sound as the mapping must, amongst other things, make sure that only those parts of a line that are visible are mapped on to the output device. This is known as clipping.
Hidden surfaces can be effectively removed by two methods which, when used in combination, are successful in most situations*. The first method involves testing to see if the triangle / polygon is facing away from the observer; if so, it is assumed not to be visible and is not drawn (this also saves work if done early on in the calculations). The second method simply re-organises the polygons so that the ones farthest from the observer are drawn first. They are then automatically overwritten by any polygons closer to the observer that appear in the same area of the image.
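The two methods can be sketched together in Python. This is an illustrative sketch under stated assumptions: the observer is at the origin, the surface normal is taken from the vertex order by the right-hand rule, a triangle is treated as facing the observer when its normal points back towards the origin, and the mean z of the vertices is used as the distance for ordering. All names are hypothetical.

```python
def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def faces_observer(tri):
    """First method: keep a triangle only if its normal points towards the origin."""
    a, b, c = tri
    normal = cross(sub(b, a), sub(c, a))
    return dot(normal, a) < 0   # 'a' doubles as the vector from observer to triangle

def drawing_order(triangles):
    """Second method: discard back faces, then sort farthest first."""
    visible = [t for t in triangles if faces_observer(t)]
    return sorted(visible, key=lambda t: sum(v[2] for v in t) / 3.0, reverse=True)

near = ((0, 0, 5), (0, 1, 5), (1, 0, 5))
far = ((0, 0, 9), (0, 1, 9), (1, 0, 9))
back = ((0, 0, 7), (1, 0, 7), (0, 1, 7))   # opposite vertex order, so it faces away
order = drawing_order([near, back, far])
```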
If the surfaces of the objects are assumed to be dull, then diffuse reflection of the light from
a light source will occur. This is much easier to simulate than specular reflection which
occurs when the objects are shiny, since in this case multiple reflections can occur greatly
increasing the computational effort involved (the tracing of multiple reflections and
refractions is a technique known as ray tracing and results in the most realistic images since
it mimics the action of light. It is also the most computationally intensive method of
generating images). So, assuming diffuse reflection, it can be shown that the brightness of
an object is approximately proportional to the cosine of the angle of incidence of the light
source vector as measured from the normal to the surface of the object. In addition, this
brightness is independent of the viewer's position, so no account of the viewing position
need be made.
Of the three processes outlined above, checking visibility, ordering the polygons and
shading the polygons, the second is probably the most time consuming unless the data
structure of the polygons is already ordered. Checking visibility and shading are remarkably
similar processes since they both use vector algebra in the calculations. The generation of
the cosines necessary in these two processes comes directly from manipulation of vector
dot product equations and involves multiplications, additions and square roots.
* There is a third method that was not discussed in the original report but which is in fact the most commonly used method today - Z buffering. This effectively stores the distance of every pixel on the screen from the observer. When an object is drawn, the distances of its pixels are compared with the distances of the pixels already on the screen. If the pixels of the new object are closer to the viewer than the old ones, they are over-written by the new ones and the Z buffer is updated with the distances of the new pixels. This method also allows other effects such as ‘alpha’ blending of semi-transparent objects and ‘depth fogging’ to be realised very easily. Z buffering is common now because cheap memory has removed the cost penalty of maintaining the depth information for every pixel.
n = A x B
cosQ = n.v / (|n|.|v|)
Work required:
i) Cross product : 6 multiplications and 3 additions.
ii) Cosine : 9 multiplications, 6 additions, 1 divide and 1 square root.
Note : These calculations are per polygon, not per vertex so to obtain an approximate value for the work required per vertex, the above figures should be divided by the number of vertices in the polygon.
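The shading calculation can be sketched in Python as below. This is a minimal illustration, not the original landscape software: the function names are hypothetical, A and B are taken to be two edge vectors of the polygon, and clamping negative cosines to zero (surfaces facing away from the light) is my own assumption.

```python
import math

def cross(a, b):
    """Cross product: 6 multiplications and 3 additions."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product: 3 multiplications and 2 additions."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cos_angle(n, v):
    """cosQ = n.v / (|n|.|v|).

    The three dot products account for 9 multiplications and 6 additions.
    One extra multiplication is used here so that both squared lengths
    share a single square root and a single division.
    """
    return dot(n, v) / math.sqrt(dot(n, n) * dot(v, v))

def diffuse_brightness(edge_a, edge_b, light):
    """Brightness of a dull surface, proportional to the cosine of incidence."""
    n = cross(edge_a, edge_b)
    return max(0.0, cos_angle(n, light))

head_on = diffuse_brightness((1, 0, 0), (0, 1, 0), (0, 0, 1))
oblique = diffuse_brightness((1, 0, 0), (0, 1, 0), (0, 1, 1))
```

The same cos_angle routine serves the visibility check if the light vector is replaced by the vector towards the observer.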
| Operation | Multiplication | Addition | Division | Square Root |
|---|---|---|---|---|
| Rotation | 8 [1] | 4 [1] | - | - |
| Translation | - | 3 | - | - |
| Projection | 2 | - | 1 | - |
| Visibility Check | 6 [2] | 3 [2] | 1 [2] | 0.3 [2] |
| Shading | 6 [2] | 3 [2] | 1 [2] | 0.3 [2] |
| Totals | 22 | 13 | 3 | 0.6 |
[1] Assuming two single axis rotations.
[2] These operations are done per polygon and have been adjusted assuming triangles were used.
Table 1 above summarises all of the operations needed to manipulate and perspective project a three dimensional object. The calculations required only become complicated when visibility checking and shading are performed. To simplify the project I have limited my attention to rotation, translation and perspective projection, eliminating the need for generating square roots, which are the main computational effort involved when performing the other two operations. This leads me to state the brief for my project:
Brief : To construct hardware to allow the manipulation and projection of ‘wire frame’ 3D objects in ‘real time’.
Before I investigate possible solutions I will first define a minimum performance figure. If the hardware is to be able to display images in ‘real time’ it must be capable of refreshing the image at the video frame rate which in the U.K. is 25 frames per second (the video is interlaced so the field rate is 50Hz). This puts an upper limit of 40ms to do all the calculations for one image.
Given a minimum object complexity of say 100 vertices, then the calculation time per vertex is:
40ms / 100 = 400µs per vertex
If implemented by a microprocessor of some description, 75% of this time might be taken up in software ‘house-keeping’. This reduces the time allowed to 100µs per vertex. Thus a minimum performance can be specified:
The hardware should be able to perform at least 10 multiplications, 7 additions and 1 division within 100µs.
This section deals with the choice of the type of hardware to be used to implement the 3D Graphics Engine as it will now be called. Four ideas were considered.
A summary of the merits and the disadvantages of each of these solutions is given below. I considered the performance, flexibility, cost and complexity of each solution before making a decision on the form that the Graphics Engine would take. In all the designs it was assumed that the drawing would be done by the host or by a special graphics chip capable of performing drawing operations. My final choice is given at the end of this section.
This design utilises the 68000’s ability to do a multiplication or division in a single instruction in order to do the calculations required.
This would be the ideal solution to the problem since the task of performing all the calculations could easily be divided up amongst the processors due to the unrelated nature of the operations on each vertex. Thus, as many Transputers as needed could be added to the design to obtain the performance required.
These microprocessors are usually used to perform operations such as filtration and Fast Fourier Transforms on linear streams of data. Performing 3D graphics calculations would be a rather novel use for one. Their main attraction is the high speed at which they operate, performing one instruction every machine cycle.
The Digital Signal Processor solution offers the best compromise between performance, flexibility, cost and complexity. The other solutions all had too many limitations to be considered any further. Thus the only remaining choice to be made was which particular DSP to use. This decision was made easy by the fact that my supervisor for the project was already very familiar with the TMS32010 made by Texas Instruments. A decision was also made to use the Advanced CRT Controller (ACRTC) made by Hitachi. This device is capable of drawing lines and curves and performing area fill operations at a peak speed of 2 million pixels per second. This chip was used to perform the drawing operations required and to provide the necessary video memory management and video control signals. The host computer for the 3D Graphics Engine was an IBM PC XT. This was needed to create the 3D object model to give to the Graphics Engine and to send viewing commands.
This section deals with the detailed hardware design of the 3D Graphics Engine. It starts with a description of the DSP and its support devices and then goes on to describe the ACRTC and its support devices. Both these subsections are split into detailed descriptions of each section of the support blocks for each processor. A block diagram for the design is shown below with the blocks discussed in each section divided up by a dotted line.
Apart from the PROM and RAM for the DSP there are three main sections; IBM Interface, Port RAM and ACRTC Interface. A detailed description of each section follows. However, to begin with I shall describe a few features of the overall design which are of importance.
One significant feature of the design is the extensive use of integrated circuits known as Erasable Programmable Logic Devices (EPLDs) made by Intel. These were used to implement the ‘glue’ logic normally designed with SSI TTL logic gates. These devices have several ‘macrocells’ which each contain an AND (product) plane followed by an OR (sum) plane. Thus, any function of the inputs can be obtained. Also, the outputs can be configured in different ways to provide latched or non-latched outputs and output terms can be fed back into the AND plane, making the devices extremely flexible. They use EPROM floating gate technology to allow them to be erased by ultraviolet light and then reprogrammed. The main advantages offered by these devices are the significant reduction in chip count and the ease with which a design can be modified, which is important when developing a design. Five EPLDs were used to implement all the ‘glue’ logic in the entire design apart from the clock generating circuit and the final video output circuitry. Four chips with 8 macrocells each and one with 16 macrocells were used. The number of EPLDs could have been reduced further by using more of the larger variety. However, the propagation delay of 55ns for these devices was too slow to justify their use. The smaller EPLDs have a propagation delay of less than 35ns.
Due to the limited address space of the TMS32010 of 4kwords, there is no room for large data storage and memory mapped I/O. To interface to other systems, the TMS32010 uses the IN and OUT instructions to access 8 I/O ports. The hardware access of these ports is done by placing the address of the port on A0:A2 and asserting DEN or WE whilst holding the memory enable line MEM inactive. In this way ports, program memory and program PROM can all be connected directly to the data bus without fear of contention. Ports were used to access the IBM Interface, Port RAM, ACRTC Interface and to select the BIO polling line input and the video frame buffer. A list of the ports used and their function is shown in Table 2.
| Port | R/W | Function |
|---|---|---|
| 0 | Read | ACRTC status register |
| 0 | Write | ACRTC address register |
| 1 | Read | ACRTC control register read |
| 1 | Write | ACRTC control register write |
| 2 | Read | Port RAM read |
| 2 | Write | Port RAM write |
| 3 | Read | - |
| 3 | Write | Port RAM address register |
| 4 | Read | - |
| 4 | Write | BIO select address |
| 5 | Read | - |
| 5 | Write | Frame Select (also modifies BIO select) |
| 6 | Read | IBM data read |
| 6 | Write | IBM data write |
| 7 | Read | IBM control register read |
| 7 | Write | - |
To speed up the polling of peripheral devices, the DSP has a special input pin called BIO. The condition of this pin can be tested with the BIOZ (Branch on BIO zero) instruction. Thus, if the BIO signal is perceived as active high, a fast polling loop can be coded with a BIOZ instruction branching to itself. This was the method used to poll the IBM Interface for acknowledgement of data transfer and to detect the state of the video field sync signal generated by the ACRTC. The addresses for the BIO select port are shown in Table 3. When the address of the appropriate flag is written to port 4, the BIO line will follow the state of that signal.
| Address | Flag Input |
|---|---|
| 0 | IBMDA - IBM Data Available |
| 1 | DSPDA - DSP Data Available |
| 2 | VSYNC - Video Vertical Sync |
Since the IBM and the DSP buses are not synchronised, the data in each direction must be latched until the appropriate processor can access the information. This was achieved with tri-state D-type latches controlled by an EPLD (see circuit 3.1.1). Two ‘Data Available’ flags to act as handshake signals were generated using set-reset latches. When one of these lines is asserted, it signifies that new data has been clocked into the latches by the appropriate processor. The IBM data available signal IBMDA is set by the IBM writing a single byte into the MSB of the latch and is cleared by the DSP reading the contents of the latch (16 bit read of port 6). DSPDA is set by the DSP writing a 16 bit word to the latch and is cleared by the IBM when reading the MSB of the latch. Since the IBM can only perform 8 bit transfers, it must be ensured that 16 bit information is written and read in the order lo-byte then hi-byte in order for the interface to operate correctly.
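The handshake can be modelled in a few lines of Python. This is a behavioural sketch only, not a description of the EPLD equations; the class and method names are hypothetical. It shows why the lo-byte must be transferred before the hi-byte: only the hi-byte (MSB) access changes a flag.

```python
class Mailbox:
    """Behavioural model of the two latches and their 'Data Available' flags."""

    def __init__(self):
        self.to_dsp = 0      # 16 bit latch written by the IBM
        self.to_ibm = 0      # 16 bit latch written by the DSP
        self.ibmda = False   # IBM Data Available
        self.dspda = False   # DSP Data Available

    def ibm_write_lo(self, byte):
        self.to_dsp = (self.to_dsp & 0xFF00) | (byte & 0xFF)

    def ibm_write_hi(self, byte):
        self.to_dsp = (self.to_dsp & 0x00FF) | ((byte & 0xFF) << 8)
        self.ibmda = True    # writing the MSB sets IBMDA

    def dsp_read(self):
        self.ibmda = False   # the 16 bit read of port 6 clears IBMDA
        return self.to_dsp

    def dsp_write(self, word):
        self.to_ibm = word & 0xFFFF
        self.dspda = True    # the 16 bit write sets DSPDA

    def ibm_read_lo(self):
        return self.to_ibm & 0xFF

    def ibm_read_hi(self):
        self.dspda = False   # reading the MSB clears DSPDA
        return self.to_ibm >> 8

def ibm_send_word(box, word):
    """Lo-byte first, so the flag is only raised once the word is complete."""
    box.ibm_write_lo(word & 0xFF)
    box.ibm_write_hi(word >> 8)

def ibm_receive_word(box):
    lo = box.ibm_read_lo()
    return lo | (box.ibm_read_hi() << 8)

box = Mailbox()
ibm_send_word(box, 0x1234)
```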
The extra latch on the IBM address bus (accessed via port 7) was originally intended to be a control register or command register to enable the IBM to instruct the DSP how to interpret the data held in the data latch. This feature was never used. Instead, the commands were sent directly to the DSP via the data latch.
In order for the DSP and the ACRTC to communicate at high speed, their clocks must be synchronised such that the DSP clock CLKOUT is 180° out of phase with the ACRTC clock 2CLK. At the same time, the clock for the video shift registers DOTCK and the high speed timing clock 8MHZ must be synchronous with 2CLK. To achieve this I designed a clock generating circuit followed by a synchronisation circuit (see circuit 3.1.2). The clock generating circuit is a standard crystal clock circuit followed by a frequency doubler made from an XOR gate to generate 32MHz. The 32MHz signal is used to clock a synchronous counter that generates DOTCK, 8MHZ and 2CLK. The synchronisation between CLKOUT and 2CLK was achieved by XORing these two signals together to create a difference signal. This signal was used to start and stop the counter until CLKOUT and 2CLK were out of phase to within +/- half a cycle of the 32MHz clock (+/-15ns). Once locked, the two clocks will never get out of phase since they are derived from the same source so their frequencies are guaranteed to be the same.
This part of the D.S.P. circuitry was already designed since it was part of a teaching board for use with an MSc. course. The only difference is the use of an EPLD to perform the decoding for the program RAM and the bootstrap PROM (see circuit 3.1.3a). An equivalent circuit of the decoding in the EPLD is shown in circuit 3.1.3b. The PROM must be located at address 0 since this is where the D.S.P. starts execution upon receipt of a hardware reset. The ports occupy addresses 0 to 7 (and repeated from 8 to 15) though there is never any bus contention since the memory enable signal AMN is never asserted when performing a port access. Program memory is decoded from address 32 onwards. Thus it is never possible to access the first 32 words of program memory.
Due to the limited address space of the TMS32010, Program RAM can not be used to store the vertices of the object description. Unless the object consists only of one or two hundred vertices, there will not be enough room for the object description to be co-resident with the program. This problem was solved by using RAM accessible via one of the ports. The address of the data to be accessed is loaded into the counters (see circuit 3.1.4) by writing to port 3. Subsequent reads and writes to port 2 perform reads and writes to the port RAM and automatically increment the address; port 2 thus acts as a stack. To auto-decrement the address, port 3 is written with an additional '1' in bit 14 of the data. Again, all the signals to control the counters and the RAMs are generated using an EPLD. The load signal LD is driven active whenever port 3 is accessed. The UP and DN clock signals are driven low whenever port 2 is accessed but since the counters have rising edge active inputs, the address to the RAM is not changed until the end of the access cycle. EPLD 2 also controls the BIO selection for the D.S.P. This part of the EPLD consists of a partially decoded 1-of-4 multiplexer to select which of IBMDA, DSPDA or VSYNC is to be connected to the BIO input.
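The behaviour of the Port RAM as seen by the D.S.P. can be sketched in Python. This is a model for illustration only: the class and method names are hypothetical, and the RAM size (2^14 words, chosen so that bit 14 is free to act as the direction flag) is an assumption since the size is not stated above.

```python
class PortRam:
    """Model of RAM reached through port 3 (address) and port 2 (data)."""

    SIZE = 1 << 14             # assumed size: address bits 0 to 13
    DECREMENT_FLAG = 1 << 14   # bit 14 of the port 3 data selects auto-decrement

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.addr = 0
        self.step = 1

    def write_port3(self, value):
        """Load the address counters and choose the counting direction."""
        self.step = -1 if value & self.DECREMENT_FLAG else 1
        self.addr = value & (self.SIZE - 1)

    def write_port2(self, word):
        self.mem[self.addr] = word & 0xFFFF
        self._advance()

    def read_port2(self):
        word = self.mem[self.addr]
        self._advance()
        return word

    def _advance(self):
        # The address changes only after the access has completed.
        self.addr = (self.addr + self.step) % self.SIZE

ram = PortRam()
ram.write_port3(0x0100)
for word in (10, 20, 30):
    ram.write_port2(word)

ram.write_port3(0x0100)
forwards = [ram.read_port2() for _ in range(3)]

ram.write_port3(0x0102 | PortRam.DECREMENT_FLAG)
backwards = [ram.read_port2() for _ in range(3)]
```

Because the address steps automatically, a block of vertices can be streamed with one port 3 write followed by a run of port 2 accesses.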
This was the most difficult part of the D.S.P. circuitry to design. Usually, the A.C.R.T.C. can be connected directly to the controlling processor's data bus since all the necessary control lines are present to facilitate this. However, the D.S.P.'s memory cycle is only 250ns long. Since the A.C.R.T.C.'s access cycle is almost twice this, it does not have enough time to latch data sent by the D.S.P. or to respond to a D.S.P. read cycle. To overcome this problem I used a bidirectional tri-state latch similar to that used for the IBM Interface (see circuit 3.1.5). It was guaranteed that the A.C.R.T.C. would respond instantly to reads and writes so the data available flags were not necessary. In addition to an EPLD, a set-reset latch was used to generate the CS (chip select) signal for the A.C.R.T.C. This is asserted (the latch is reset) when the D.S.P. accesses the A.C.R.T.C. It remains asserted until the A.C.R.T.C. has completed its bus transfer, which is signalled by it asserting DTACK. This action sets the latch, thus clearing the CS signal and readying the A.C.R.T.C. for another bus cycle. The EPLD is programmed such that when the A.C.R.T.C. is accessed, the WE signal and A0 from the D.S.P. are latched to provide the A.C.R.T.C. with the R/W signal and register select signal RS which it requires. To write to the A.C.R.T.C. it is only necessary to perform one OUT instruction to port 0 or 1. However, to read the A.C.R.T.C., two IN instructions must be performed. The first clocks the data into the latch, the second reads the correct data. The result of the first read will be the data from the previous access to the A.C.R.T.C.
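The need for two IN instructions can be illustrated with a small Python model of the latch. The model is a simplification and the names are hypothetical: each IN returns whatever is currently in the latch and then lets the A.C.R.T.C. refresh the latch, so the value returned always lags one access behind.

```python
class LatchedAcrtc:
    """Model of reading A.C.R.T.C. registers through the intermediate latch."""

    def __init__(self, registers):
        self.registers = dict(registers)   # port number -> register contents
        self.latch = 0                      # arbitrary contents after reset

    def in_port(self, port):
        stale = self.latch                  # the D.S.P. sees the old latch contents
        self.latch = self.registers[port]   # the slower A.C.R.T.C. cycle completes later
        return stale

def read_register(device, port):
    """Two INs: the first clocks the data into the latch, the second reads it."""
    device.in_port(port)
    return device.in_port(port)

acrtc = LatchedAcrtc({0: 0xAA, 1: 0x55})
first = acrtc.in_port(0)    # data from the previous access
second = acrtc.in_port(0)   # the value actually requested
```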
This section of the Graphics Engine was straightforward to design since all the video timing circuitry is on the A.C.R.T.C. The only additional circuitry was some RAM for the frame buffer, and a shift register to convert the parallel data in the frame RAM to serial video data (see circuit 3.2.1). In addition, a small amount of SSI glue logic (two 14 pin packages) was required to provide the load signal for the shift register and to combine the composite sync with the video signal to create full composite video. EPLDs were not used here because of their relatively slow speed. However, an EPLD was used to generate the WE and OE signals for the frame RAMs. Also, since the address and data bus for the RAM is multiplexed, an address latch signal FADLA was generated (see timing diagram).
The frame buffer actually consists of two identical sets of memory. This allows drawing and displaying from one of them whilst clearing the other. Thus, there is always a clear frame buffer to draw in. The action of displaying the data held in the RAM generates all possible memory addresses for the frame buffer. These addresses are used to access the buffer currently being cleared and '0' is written into the RAM by holding write enable WE active whilst holding '0' on all the data lines. The source of the data bus for each buffer is governed by a bidirectional multiplexer constructed from CMOS analogue switches. The data lines are either connected to the A.C.R.T.C. data bus for normal operation, or are grounded when the RAM is being cleared. Ideally there would be three frame buffers enabling simultaneous drawing, displaying and clearing without conflict. However, enough performance can be obtained from the two buffer design and the extra cost and complexity of three frame buffers ruled out their use.
At the end of every display cycle, the data from the appropriate frame buffer is loaded into the shift register. This data is then clocked out at the video pixel rate of 16 MHz governed by DOTCK. This video is then combined with the composite sync signal from the A.C.R.T.C. using a very simple resistive summer driven by two open-collector buffers.
The controlling software for the project falls roughly into two halves; programming of the IBM PC and programming of the TMS32010. I shall start with the programming of the IBM and then move on to cover the TMS32010 which includes the programming of the A.C.R.T.C. control registers.
The programs for the IBM were written in Pascal. The three programs hexread, pntconv and model are shown in listings 1, 2 and 3. The program hexread takes as input a hex file generated by the TMS32010 cross assembler and produces a prg file. The format for the prg file is a single column of 4 digit hexadecimal numbers which are in groups of two; address then data. The last two numbers govern the starting address for the program: the first word of this pair, which is normally the address, is set to '0000'. When the prg file is downloaded to the D.S.P., it recognises this as signifying that the following word is the address at which to start execution of the program and branches to this address.
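The prg format described above can be read with a short Python routine. This is a sketch for illustration rather than a translation of hexread; the function name is hypothetical and the data words in the example are arbitrary values, not real TMS32010 opcodes.

```python
def parse_prg(text):
    """Split a prg file into {address: data} writes and a start address."""
    words = [int(token, 16) for token in text.split()]
    if len(words) % 2:
        raise ValueError("prg file must contain address/data pairs")
    memory = {}
    for address, data in zip(words[0::2], words[1::2]):
        if address == 0x0000:
            # '0000' in the address position marks the final pair:
            # the word that follows is where execution starts.
            return memory, data
        memory[address] = data
    raise ValueError("no terminating '0000' pair found")

example = "0020\nF900\n0021\n0020\n0000\n0020\n"
memory, start = parse_prg(example)
```

Using '0000' as the marker costs nothing, since program memory is only decoded from address 32 onwards and address 0 can therefore never be a genuine load address.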
The data for the 3D object description are held in a .pnt file, an example of which is given below. This data is for a simple cube 400 units on a side centred on the origin. The first number in each row is a 'point type'. This governs what will be done with the data when it is sent to the D.S.P. A '0' indicates that the A.C.R.T.C. should be instructed by the D.S.P. to move the graphics cursor to that point when projected on the screen. If the point type is '1' a line is drawn from the current graphics cursor position to the new calculated position. The end of the object description is signified by a point type of '2'. The three numbers after the point type are the three dimensional co-ordinates of the vertex (x, y, z).
| Point type | x | y | z |
|---|---|---|---|
| 0 | 200 | -200 | 200 |
| 1 | 200 | 200 | 200 |
| 1 | -200 | 200 | 200 |
| 1 | -200 | -200 | 200 |
| 1 | 200 | -200 | 200 |
| 1 | 200 | -200 | -200 |
| 1 | 200 | 200 | -200 |
| 1 | -200 | 200 | -200 |
| 1 | -200 | -200 | -200 |
| 1 | 200 | -200 | -200 |
| 0 | -200 | -200 | -200 |
| 1 | -200 | -200 | 200 |
| 0 | -200 | 200 | -200 |
| 1 | -200 | 200 | 200 |
| 0 | 200 | 200 | -200 |
| 1 | 200 | 200 | 200 |
| 2 | 0 | 0 | 0 |
This format is converted to a dat file consisting of a column of hexadecimal numbers similar to a prg file by the program pntconv. The first number in the dat file is the start address in the port RAM where the data are to be stored.
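The conversion can be sketched in Python as follows. This is not the pntconv listing: the function name is hypothetical, and both the 4 digit width and the use of 16 bit two's complement for negative co-ordinates are assumptions made to match the 16 bit word size of the D.S.P.

```python
def pnt_to_dat(rows, start_address):
    """Flatten (point type, x, y, z) rows into a column of hexadecimal words."""
    lines = ["%04X" % (start_address & 0xFFFF)]   # first number: port RAM start address
    for row in rows:
        for value in row:
            # Masking to 16 bits gives the two's complement form of negative values.
            lines.append("%04X" % (value & 0xFFFF))
    return "\n".join(lines) + "\n"

rows = [
    (0, 200, -200, 200),   # move to the first corner
    (1, 200, 200, 200),    # draw to the next corner
    (2, 0, 0, 0),          # end of the object description
]
dat_text = pnt_to_dat(rows, 0x0100)
```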
The main program, called model, is shown in listing 3. The execution of this program follows roughly three steps. The first of these is the downloading of the program to be run on the D.S.P. This must be done first since initially the D.S.P. is only running the bootstrap program held in the PROM. In this application the program that is downloaded is the 3D graphics program '3d-draw.prg'. The next task is to prompt the user for the name of the object description file to be downloaded. This file contains all the co-ordinates of the vertices in the object to be manipulated. The file must be of the dat type that is created by the program pntconv. The remaining task is to prompt the user for viewing parameters and to send them to the D.S.P. to instruct it to draw an image of the object that was downloaded. The parameters are prompted and sent in the following order:
| Parameter | Meaning |
|---|---|
| 1) theta | X axis rotation. |
| 2) phi | Y axis rotation. |
| 3) tx | X translation. |
| 4) ty | Y translation. |
| 5) tz | Z translation. |
| 6) scrdis | Distance of the screen from the observer. |
The programming of the D.S.P. was the main software task of the project. It was aided by the use of a cross assembler and simulator. The simulator was invaluable when debugging the software since it gave a real insight into the detailed operation of the program. The first program written for the D.S.P. was the bootstrap program that was blown onto PROM. The operation of this program is very simple. It waits until the IBM sends some data (indicated by the assertion of the IBMDA signal) and treats this as being an address for the next word sent by the IBM. The D.S.P. then writes the data into the appropriate address and waits for another address-data pair. This action is terminated by the issuing of a '0000' address by the IBM. The D.S.P. then interprets the next word as the start address of the downloaded program and branches to this address, thus starting execution of this program. A listing of the bootstrap program wboot is shown in listing 4. The program 3d-draw is shown in listing 5. This is the main program written for the D.S.P. It performs the necessary calculations for 3D image generation and controls the A.C.R.T.C. The program has two stages; an initialisation phase and a command processing stage.
The initialisation is performed once the first time the program is run and includes the setting up of all the constants used by the D.S.P. in the 3D calculations and the initialisation of the A.C.R.T.C. control registers.
The second phase, that of waiting for and then processing commands is looped to allow new objects to be downloaded and new views to be calculated. A command is a single word sent to the TMS32010 by the IBM. When downloading an object (Command 2) the D.S.P. expects the first word sent to be the start address in the port RAM. All subsequent words are written to the port RAM until a '2' is received as the point type which signals the end of the object description. The D.S.P. will then return to the command processing phase.
Command 3 sent to the D.S.P. instructs it to perform a new viewing operation. First, the new viewing parameters are loaded in the order given in the IBM programming section. Then the D.S.P. performs the 3D manipulation and 2D projection on the data stored in the port RAM and sends the calculated screen co-ordinates to the A.C.R.T.C. to be displayed.
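The viewing operation can be sketched end to end in Python. This is an illustration under stated assumptions, not a transcription of 3d-draw: floating point is used for clarity, angles are in radians, the X rotation is applied before the Y rotation, every point is assumed to end up in front of the observer (no clipping), and the "move"/"draw" tuples stand in for the commands sent to the A.C.R.T.C. The function name is hypothetical.

```python
import math

def view(points, theta, phi, tx, ty, tz, scrdis):
    """Rotate, translate and project each point; return move/draw commands."""
    ct, st = math.cos(theta), math.sin(theta)   # evaluated once per view
    cp, sp = math.cos(phi), math.sin(phi)
    commands = []
    for kind, x, y, z in points:
        if kind == 2:                            # end of the object description
            break
        y, z = y * ct - z * st, y * st + z * ct  # X axis rotation
        z, x = z * cp - x * sp, z * sp + x * cp  # Y axis rotation
        x, y, z = x + tx, y + ty, z + tz         # translation
        k = scrdis / z                           # perspective projection
        commands.append(("move" if kind == 0 else "draw", k * x, k * y))
    return commands

cube_edge = [
    (0, 200, -200, 200),
    (1, 200, 200, -200),
    (2, 0, 0, 0),
]
commands = view(cube_edge, 0.0, 0.0, 0.0, 0.0, 1000.0, 500.0)
```

Per point this uses 10 multiplications, 7 additions and 1 division.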
In order to generate video and to specify the operation mode, over 40 registers in the A.C.R.T.C. must be programmed. They are grouped into Timing Control Registers and Display Control Registers. The Timing Control Registers govern the exact nature of the video produced, specifying horizontal and vertical sync periods for example. The Display Control Registers govern the form the display will take, for example how many windows are displayed. The values and significance of the Timing Control Registers are shown in figure 4.2.
The project was finished on time and performed as expected. The two wire-wrap boards on which the Graphics Engine was constructed are shown in photographs 1 and 2. Photograph 3 shows both boards in relation to the IBM PC (with cover removed). It is intended that the second board will eventually sit 'piggyback' on the main board.
There are still two hardware bugs in the frame buffer circuitry. One bit of one of the frame buffers is stuck at zero. This has the effect of producing thin black vertical lines on the screen when that frame buffer is selected. This is a trivial fault and could be fixed given some time to find which particular bit is in error.
A potentially more serious bug has the symptoms of leaving 'trash' around diagonal lines whose angle on the screen is less than 45 degrees. This is caused by data read from the RAMs being corrupted in the read-modify-write drawing cycle. Words read and written once only are unaffected. Hence, vertical lines and diagonal lines over 45 degrees do not suffer from this problem. This bug will take much longer to fix because I am still not entirely sure of the cause. However, despite these faults the board performs extremely well and far exceeds the performance of a commercial workstation, as shall be shown in the next sub-section.
The workstation compared was a VAXStation II/GPX with an XTerminal graphics screen attached. On this machine there is graphics hardware to allow high speed drawing to be performed. Photograph 4 shows an image of Imperial College created using this workstation. This image consists of over 200 vertices and was used to perform a qualitative comparison between the two machines. A similar view of the college created by the Graphics Engine is shown in photograph 5. The image created by the Graphics Engine lacks the aesthetic appearance of the multiple window system on the workstation but it would be a simple matter to program this facility.
A quantitative comparison of performance figures for calculation and drawing is given below. The figures for the uVAX were obtained by performing separate tests which involved calculations or drawing only. A precise figure for drawing speed could not be obtained because the drawing speed seemed to be dependent on the length of the lines, the number of lines and other factors.
| Machine | Calculations per second | Drawing speed (pixels/second) |
| uVAX | 7000 | 300k - 900k |
| DSP/ACRTC | 28000 | 500k |
When manipulating simple objects, the uVAX is slightly quicker due to its higher drawing speed. However, for any object with a few hundred vertices, the DSP/ACRTC Graphics Engine is noticeably quicker due to its higher processing speed.
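The trade-off between calculation rate and drawing rate can be made concrete with a rough estimate based on the figures in the table. The sketch below assumes that one 'calculation' corresponds to one vertex, that calculation and drawing happen one after the other, and that the pixel counts chosen for the two example objects are representative; none of these assumptions comes from the measurements themselves.

```python
# Rates taken from the table above.
UVAX_CALC = 7_000                 # calculations per second
DSP_CALC = 28_000
UVAX_DRAW = (300_000, 900_000)    # pixels per second (slowest, fastest)
DSP_DRAW = 500_000

def frame_time(vertices, pixels, calc_rate, draw_rate):
    """Seconds per view, ASSUMING one calculation per vertex and that
    calculation and drawing are performed sequentially."""
    return vertices / calc_rate + pixels / draw_rate

def compare(vertices, pixels):
    """Return (DSP/ACRTC time, uVAX best case, uVAX worst case) in seconds."""
    dsp = frame_time(vertices, pixels, DSP_CALC, DSP_DRAW)
    uvax_best = frame_time(vertices, pixels, UVAX_CALC, UVAX_DRAW[1])
    uvax_worst = frame_time(vertices, pixels, UVAX_CALC, UVAX_DRAW[0])
    return dsp, uvax_best, uvax_worst

# ASSUMPTION: hypothetical objects, a cube-like shape and a 200-vertex model.
for name, vertices, pixels in [("simple", 8, 2_000), ("complex", 200, 10_000)]:
    dsp, best, worst = compare(vertices, pixels)
    print(f"{name}: DSP/ACRTC {dsp * 1000:.1f} ms, "
          f"uVAX {best * 1000:.1f} to {worst * 1000:.1f} ms")
```

Under these assumptions the uVAX is ahead for the simple object only when it draws at its fastest rate, while for the 200-vertex object the Graphics Engine is ahead even against the fastest uVAX drawing figure.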
A number of improvements could be made to the design to increase performance further. Some of these are given below.
This project has shown that high-end workstation-like graphics can now be produced on a personal computer at a fraction of the cost. Whereas a workstation costs many thousands of pounds, the Graphics Engine could probably be produced for less than 1000 pounds (1700 USD). Indeed, Texas Instruments have recently released the TMS34010 Graphics System Processor (or G.S.P.) which combines a D.S.P. and A.C.R.T.C. on a single chip. Tektronix have released a plug-in board for the IBM PC using this chip which retails at about £1400. Thus the age of cheap and powerful graphics is upon us and it may not be long before this relatively new technology becomes a standard feature on the personal computers of the future.
© 2008 Wayne Young. All rights reserved.