The Portable Data Base

The implementation of the portable data base in AutoCAD Release 9 finally completed the unification of the product across all machine architectures. The development notes describing this project are an example of the developer documentation that accompanied code submissions in the period.

The Portable Data Base

AutoCAD databases are now portable between operating systems and machine architectures. This allows efficient use of networks containing both personal computers and 32 bit workstations.

by John Walker
February 3rd, 1987

It was a dark and stormy night. The trees swayed in the wind, and the rain beat upon and streamed in rivulets down the dark window pane illuminated only by the cold light of a Luxo lamp, the flickering of a Sun 3 monitor, and the feeble green glow of a programmer debugging too long.

When the doorbell rang, I almost welcomed the interruption from the task in which I was engaged: fourteen subroutines deep in DBXTOOL, on the trail of a stack smasher which not only obliterated AutoCAD, but wiped the information the debugger needed to find where the error occurred. I glanced at the clock and noticed that it was 3:30. Since it was dark outside, it must be 3:30 in the morning. Only a very few people show up at the door at 3:30 on a Sunday morning.

Let's see: the stereo isn't on and no recent revelations have called for celebratory reports from the carbide cannon or the .45, so it's probably not the neighbors or the cops. That narrows the field considerably. I fully expected to open the door to see Kelvin Throop, as always slightly distracted, somewhat overweight, his face looking like it had been slept in, but sparkling with anarchic and subversive ideas.

With the usual irritation mingled with expectation, I opened the door and discovered I was looking at the neck of my early-morning caller. I looked up, and saw a face I had not seen for almost twenty years. It was a face free of pain and fear and guilt. John Galt had come to call in the middle of the night.

“Galt”, I said, “I haven't seen you since, when was it, 1967? That's right, December 1967 it was. We were walking down the railroad tracks in Cleveland; the snow was a foot deep on the ground, the sky was grey and the only warmth was the switchbox heaters at every set of points. Yes, it all comes back now. I remember you saying it was all over and you were going to drop out, and me saying things were just about to turn around. And I remember turning around and walking back to study for the physics exam and seeing you disappear into the snowy distance. Hey, come on in, have a Pepsi, tell me what you've been up to.”

Galt walked in the door, put down his paper bag and, as always, strode to the refrigerator and opened the door. He poured a tall Pepsi and made a peanut butter, turkey, swiss cheese, and onion sandwich, polished both off, and then turned to me and spoke.

“As usual, you've got it all wrong. It wasn't December 1967, it was November—November 8. The first Saturn V launch was scheduled for the next morning, and you were bubbling over about how the final triumph of technology would turn around a disintegrating society. I said I'd had it with this decadent, exploitive culture, and I was no longer going to allow my mind to be enslaved by the looters. I tried to convince you to join me. But your time had not yet come. So I moved on to convince others, and to work on my speech.”

“Hey, I remember that speech. How's it come since that draft I read back in '67?”

“Pretty well. I'm up to 560 pages now, and there's no filler in there. I'm adding a refutation of the epistemology of Kant cast in terms of Maxwell's equations, and that will probably stretch it a tad.”

“Don't you think that's a bit long?”

“Well, with the attention span of this society down to less than 30 seconds, some of the induction steps may get lost in the shuffle, but it's full of great sound bites and should play on the news for days.”

“When 'ya gonna cut loose with it?”

“When the collapse of this decadent society due to its disdain for the products of the mind, and the consequent disappearance and exodus of the creators becomes self-evident.”

“Hey, Galt, lighten up! When I last saw you the cities were in flames, the US was losing a hopeless war, the stock market had just crashed, the gold standard was being abandoned, three astronauts had died in a fire, the SST was facing cancellation, and the ABM was being negotiated away. Look at what you've walked out on! We have peace and prosperity, business is booming, and basic science and technology have flowered in directions unimaginable by the world in which we last spoke.”

Galt walked into the computer room. He looked at the PC/AT linking AutoCAD. He looked at the Sun monitor, which was showing a full compilation of AutoCAD in one window, a completed execution of the regression test in another, and the debugger in a third. He walked over to my bookcase and pulled out my copy of the Dow Jones Averages chartbook from 1885 to the present. Moving in that eerie way he always did, in one motion he pulled the book from the shelf, opened it, and spread it in exactly the open space between the keyboards of the Sun and the IBM. For a full ten minutes Galt was silent as he turned the pages from 1968 through 1986. It appeared to me that the man had been out of circulation for a long time. I watched his face carefully to see if it registered surprise as he hit 1985 and 1986, but as ever those stony features remained unmoved. Galt closed the book, replaced it on the shelf, sat down on the chair in front of the AT, and turned to me. “Just wait,” he said.

“So, enough about me”, Galt continued, “what are you doing?”

“Well”, I said, “where to begin? In '68 I…”

“Oh come off it,” Galt interrupted, “I have my sources, after all. I mean what are you working on now.”

Sheepishly, I continued.

Background

When we ported AutoCAD to non-MS-DOS systems, we were faced with numerous compatibility issues. Although all systems use the ASCII code, compatibility stops about there. Various systems have adopted different conventions for end of line and end of file detection; they store multiple byte binary values in different orders in memory, require different physical alignment of values on byte boundaries, and even use different floating point formats.

These issues make it very difficult for systems to interchange binary files. The only reasonable approach is to define a portable format, hopefully close to the middle point between the systems, then require every system to convert that format to and from its own computational requirements.

Our existing (2.5 and 2.6) AutoCAD releases do not allow interchanging binary files among major machine types (current major machine types are MS-DOS, Apollo, IBM RT PC, Sun, and Vax). To move data between systems, one must convert it to ASCII form, possibly translate the ASCII file due to end of line conventions, then load the file onto the other system and convert it back to binary. For drawing databases, this means one must DXFOUT on the sending system and DXFIN on the receiving system.

Given the difficulties in physically moving files between systems, the small market initially anticipated for non-MS-DOS AutoCADs, and the major work needed to make binary files portable, we chose not to address this problem previously. Sales to date of non-MS-DOS machines indicate that this decision was correct.

The advent of high speed networks and file sharing protocols such as Apollo's Domain, DEC's Decnet/Vaxmate, and Sun's NFS have begun to erode the justification for this decision. Many AutoCAD users, particularly in larger companies, have inquired about configurations involving a file server, one or more 32 bit workstations, and a number of MS-DOS machines, all on a common network. Such a configuration economically provides large central storage, high performance when needed, and very low cost individual workstations for routine work. The usefulness of such an installation is drastically reduced if every transfer of a drawing from a PC to a 32 bit workstation requires a DXFOUT and DXFIN, as these are lengthy operations which consume a large amount of disc space and network bandwidth. As we increase our sales efforts in large accounts, a competent solution to the issues raised by heterogeneous networks will be a major point of distinction which can distance us from the competition.

The first step toward a compatible database was taken when Bob Elman redesigned the entity database code in release 2.5. Galt broke in, “The Bob Elman”. “Yes”, I responded, and showed him the listing of EREAD.C. He shook his head and said, “That's Bob”. Bob's code resolved all issues of byte ordering and alignment in the entity data portion of the database, and did it in a particularly efficient way that takes advantage of the properties of the host machine's architecture. Entities are written with no pad bytes and Intel byte ordering. Thus MS-DOS machines, the overwhelming segment of our market, pay no speed or space penalty. Bob's code did not address machines with non-IEEE floating point (the VAX is the only exemplar of this class).

Providing drawing database compatibility between machines, then, is primarily an issue of fixing the drawing header record (MASTREC), the symbol tables (SMIO), and the headers on the entities themselves, plus resolving the issue of differing floating point formats. In addition, the other binary files that AutoCAD uses, such as DXB files and compiled font and shape definitions should be made compatible. The work described herein defines canonical forms for these files, implements a general package for system-independent binary I/O, and uses it to make AutoCAD drawing databases and the other aforementioned binary files interchangeable. The code has currently been installed and tested on MS-DOS and Sun systems, which may now share files in an NFS environment. The work needed to port it to the Apollo and RT PC should be minor. A VAX version will require certification of the code to interconvert VAX and IEEE floating point formats.

Galt interrupted, “So what you're saying is that before, if you hooked big ones and little ones together on a wire, it was a pain in the neck, and now you've fixed it so it isn't”.

For a longwinded pedant, the man does have a talent for coming to the point.

The Binary I/O Package

To read and write portable binary files, include the file BINIO.H in your compilation. You must include SYSTEM.H before BINIO.H. BINIO.H declares numerous functions, which are used to read and write binary data items on various systems. Each of these functions is of the form:

b_{r|w}type(fp, pointer[, args…]);

where type is the mnemonic for the internal type being written, fp is the file pointer, pointer is the pointer to the datum being read or written (must be an lvalue), and args are optional arguments required by some types. For example, when writing a character array an argument supplies its length.

Thus, to write a real (double precision floating point) number val to a file whose file pointer is named ofile, use:

stat = b_wreal(ofile, &val);

Each of these routines returns the same status FREAD or FWRITE would: 1 for single item reads and writes, and the number of items transferred for array types. Currently defined type codes are as follows:

char
Characters. Signed convention is undefined. Canonical form in the file is a single 8 bit byte.
uchar
Unsigned characters. Used for utility 8 bit data. Canonical form in the file is a single 8 bit byte.
short
Signed 16 bit integers. Canonical form in the file is two's complement, least significant byte first, most significant byte last, two total bytes.
long
Signed 32 bit integers. Canonical form in the file is 4 bytes, starting with the least significant byte and ending with the most significant byte. Two's complement.
real
Double precision floating point numbers. 8 bytes in a file. Canonical form in the file is an 8 byte IEEE double precision number, stored with the least significant byte first and the most significant byte last.
string
An array of char items. The third argument specifies the number of characters to be read or written. Canonical form in the file is one byte per item, written in ascending order as they would be addressed by a subscript.

If the binary I/O package is to do its job, you must be honest with it: only pass the functions pointers of exactly the type they are intended to process. If you use b_wstring to write a structure, you're going to generate files just as incompatible as if you used fwrite. And you must never, never use an INT as an argument to one of these routines.

When using the binary I/O package, you must explicitly read and write every datum: there is no way to read composite data types with one I/O. Bob Elman's code in EREAD solves this problem by packing data into a buffer, then writing it with one call. Since this handles the entity data, which is by far the largest volume of data that AutoCAD reads and writes, I felt that taking a simpler approach in the binary I/O package would have no measurable impact on performance. I felt that the complexity of the mechanism in EREAD was not required for handling the other files.

On a system such as MS-DOS, whose native internal data representation agrees with the canonical format of the database file, the various read and write functions are simply #defines to the equivalent calls on FREAD or FWRITE. The variable TRANSFIO in SYSTEM.H controls this. If it is not defined, all of the binary I/O routines generate in-line calls on FREAD and FWRITE. If TRANSFIO is defined, machine specific definitions in BINIO.H are used to define the I/O routines. Compatible types such as char may still generate direct I/O calls, but incompatible types should be defined as external int-returning functions.
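
A sketch of how such a switch might look for one type. TRANSFIO and the b_wshort name come from the text; the actual contents of SYSTEM.H and BINIO.H are not reproduced in these notes, so both bodies below are assumptions. The in-line branch is only correct on a host with 16 bit shorts stored least significant byte first.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch only: both branches are assumed, not copied from BINIO.H. */

#ifndef TRANSFIO

/* Native representation already matches the canonical file format,
   so the "function" is just an in-line call on fwrite. */
#define b_wshort(fp, p) ((int) fwrite((p), sizeof(short), 1, (fp)))

#else

/* Translation needed: an out-of-line, int-returning function that
   places each byte explicitly, least significant byte first. */
int b_wshort(FILE *fp, short *p)
{
    unsigned int u;

    assert(fp != NULL && p != NULL);
    u = (unsigned int) *p & 0xFFFFu;        /* low 16 bits, two's complement */
    if (putc((int) (u & 0xFF), fp) == EOF)
        return 0;
    if (putc((int) (u >> 8), fp) == EOF)
        return 0;
    return 1;                               /* same status fwrite would give */
}

#endif
```

Either way the caller writes `stat = b_wshort(ofile, &val);` and never needs to know which branch was compiled.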

If a machine uses a non-IEEE floating point format, the b_rreal and b_wreal functions must convert the IEEE format in the file to and from the machine's internal representation. In addition, because the entity data I/O code in EREAD.C does not use the Binary I/O package, you must tell it to perform the conversion. You do this by adding the statement:

#define REALTRAN

in the SYSTEM.H entry for the machine. This will generate code within EREAD.C which calls two new functions your binary I/O driver must supply. Whenever a real number is being written to a file, EREAD will call:

realenc(bufptr, rvalue);

where bufptr is a “char *” pointing to an 8 byte buffer in which the canonical IEEE value should be stored (remember, lsb first), and rvalue is the real number value to be stored, passed in the machine's internal type for double. When a number is being read, a call is made to:

rvalue = realdec(bufptr);

in which bufptr points to an 8 byte area containing the IEEE number. Realdec must return the corresponding internal value as a double.

Each machine architecture must define a binary I/O driver providing the non-defaulted I/O routines, and if real number conversion is required, realenc and realdec. Examine the driver for the Motorola 68000 family (BIO68K.C) for an example of such a driver.

Modifying AutoCAD

Utilising the binary I/O package within AutoCAD to implement portable databases involved modifications in several areas. The changes are large, numerous, widespread, and significant, despite their limited impact on what gets written into the file. Installing them and debugging database compatibility was not a difficult design task; it was simply a matter of hacking, slashing, slogging, and bashing until every place where a nonportable assumption was made was found, and then fixing them all. “That's what you were always best at,” Galt interjected. I said that I hoped so, for I know of no single project I've done within AutoCAD which is so likely to destabilise the product as this one. The following paragraphs cover the highlights of each section.

The Drawing Database

Making drawing databases compatible consisted of several subprojects. The result of all of this is that an AutoCAD with the new code installed can read existing drawing databases written by the machine on which it is executing, old MS-DOS databases, and new portable databases. It writes new portable databases, which can be read by any AutoCAD with this code installed.

The ability to read both formats of databases is implemented via the flag rstructs. When a drawing database header is read by MVALID, if it is an old, nonportable database, rstructs is set to TRUE and the file pointer used to read the file is saved. Subsequent reads from that file will use the old code to read aggregate data. At the end of every database reading operation, such as INSERT or PLOT, rstructs is cleared.

The drawing header.

The drawing header is managed by code in MASTREC.C. The header is defined, for I/O purposes, by a table called MTAB. This table previously contained pointers and lengths for all the items in the header, and each was written or read with an individual call on FREAD or FWRITE. Compatibility problems were created by the fact that the header contained several kinds of composite objects: symbol table descriptors, transformation matrices, the “header header”, a view direction array, Julian dates, and calendar dates. I modified the table to contain an item type and implemented a switch to read and write each item with the correct calls on the Binary I/O package. Special code had to be added for each composite type to read and write it; just adding entries to the table for the components of the composite types falls afoul of the mechanism that allows addition of new fields to the header. I tried it; it doesn't work.

The symbol table descriptors have several unique problems: first, their definition contains a “FILE *” item. The length of this item varies depending on the system's pointer length, so the structure changes based on this length. On MS-DOS systems, data in the structure totals 37 bytes, and different compilers pad this structure differently. The file pointer field means nothing in a drawing database, but it is present in all existing databases and it varies in length. But if you think that it never uses a pointer read from a file, you haven't looked at the code in WBLOCK.C that saves and restores the header around its diddling with it. Look and see the horror I had to install to fix that one.
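
The idea of a header table that carries an item type, with a switch choosing the right I/O call, can be sketched as follows. Every name here (hdr_item, HT_SHORT, write_header and so on) is hypothetical; the real MTAB layout in MASTREC.C is not shown in these notes, and the composite types that needed special code are left out.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of a type-tagged header table; all names are invented. */

enum hdr_type { HT_SHORT, HT_LONG, HT_STRING };

struct hdr_item {
    enum hdr_type type;     /* selects the I/O routine for this item */
    void *ptr;              /* address of the in-memory datum */
    int len;                /* byte count, used by HT_STRING only */
};

/* Write the low nbytes of v, least significant byte first. */
static int put_le(FILE *fp, unsigned long v, int nbytes)
{
    int i;
    for (i = 0; i < nbytes; i++) {
        if (putc((int) (v & 0xFF), fp) == EOF)
            return 0;
        v >>= 8;
    }
    return 1;
}

/* Walk the table and write each item in its canonical form.
   Returns 1 on success, 0 on the first failed write. */
int write_header(FILE *fp, const struct hdr_item *tab, int nitems)
{
    int i;

    assert(fp != NULL && tab != NULL);
    for (i = 0; i < nitems; i++) {
        int ok = 0;
        switch (tab[i].type) {
        case HT_SHORT:
            ok = put_le(fp, (unsigned long) *(short *) tab[i].ptr, 2);
            break;
        case HT_LONG:       /* assumes the value fits in 32 bits */
            ok = put_le(fp, (unsigned long) *(long *) tab[i].ptr, 4);
            break;
        case HT_STRING:
            ok = fwrite(tab[i].ptr, 1, (size_t) tab[i].len, fp)
                 == (size_t) tab[i].len;
            break;
        }
        if (!ok)
            return 0;
    }
    return 1;
}
```

A matching read routine would walk the same table, which is what keeps the read and write sides from drifting apart.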

The symbol tables.

The symbol tables, managed by SMIO.C, were an utter catastrophe from the standpoint of portability. The problems encountered in MASTREC with their headers were only a faint shadow of the beast lurking within SMIO. To refresh your memory, each symbol table has a descriptor which is usually in the drawing header (another symbol table is used for active font and shape files, but it is not saved with the drawing and does not enter this discussion). The descriptor for the symbol table contains its length, the number of items in the table, the file descriptor used to read and write it, and the address within the file where it starts. There is no type field in a symbol table. Symbol tables are read and written by the routines GETSM and PUTSM, which are passed the descriptor. Each symbol table entry consists of a structure containing several fields of various types.

Previously, GETSM and PUTSM did not care about the content of the symbol table record; they just read and wrote the structure as one monolithic block. That, of course, won't work if you want the tables to be portable: each field has to be handled separately with the Binary I/O package. So in order to do this, GETSM and PUTSM must know the type of table they are processing.

“So,” said Galt, “add a type field to the table.”

“Heh, heh, heh,” I said, walking over to the Sun and bringing up all the references to the block symbol table descriptor in CSCOPE. There are few data structures within any program that are chopped, diced, sliced, shuffled, and maimed as much as an AutoCAD symbol table descriptor. Most (but not all) live within the drawing header. They can point to their own file or be part of a monolithic database. They contain that ghastly variable length file pointer which gets written in the drawing header. They get copied, created dynamically in allocated buffers, and in WBLOCK, saved to a file, modified to refer to another file, then read back in. And that “length” field I mentioned, sm_eln. Well, it may include a trailing pad byte on the disc depending on which compiler and options made your MS-DOS database. And it gets used both to seek into the file and to dynamically allocate symbol table descriptors except in the places where it uses sizeof(struct whatever) instead. One week into this project, I had the feeling that I had not stuck my head into the lion's mouth—I had climbed into the lion's stomach.

The most severe fundamental problem was that I had to both decouple the symbol table descriptor on disc from the one in memory, and also introduce separate lengths for the symbol table as stored on disc (used to seek to records) and in memory (used to allocate buffers, copy tables, and so on). I ended up adding two fields to the symbol table item in memory, sm_typeid and sm_dlen, which specify the type of the symbol table (mnemonics are defined in SMIO.H) and its length as stored on the disc. When a symbol table is in memory, sm_eln specifies the length of the structure in memory. When a symbol table is written out, the two new fields are not written: instead the disc length is written into the sm_eln field and the type is expressed implicitly in the symbol table's position in the drawing header.
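
The split between the two lengths can be pictured with a small sketch. Only the field names sm_eln, sm_dlen, and sm_typeid come from the text; the other fields, all the types, and the helper functions are assumptions made for illustration, and the real descriptor in SMIO.H certainly differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory descriptor, for illustration only. */
struct smdesc {
    long sm_start;      /* assumed: file address where the table starts */
    int  sm_nitems;     /* assumed: number of entries in the table */
    int  sm_eln;        /* length of one entry as a structure in memory */
    int  sm_dlen;       /* length of one entry as stored on disc */
    int  sm_typeid;     /* which table this is; never written to the file */
};

/* File offset of entry n. Seeking must use the disc length, because
   the in-memory structure may be padded differently. */
long sm_offset(const struct smdesc *d, int n)
{
    assert(d != NULL && n >= 0 && n < d->sm_nitems);
    return d->sm_start + (long) n * d->sm_dlen;
}

/* Bytes needed to hold the whole table in memory. Allocation must use
   the in-memory length. */
unsigned long sm_bufsize(const struct smdesc *d)
{
    assert(d != NULL);
    return (unsigned long) d->sm_nitems * (unsigned long) d->sm_eln;
}

/* Value that goes into the length slot when the descriptor is written
   out: the disc length, not the in-memory one. */
int sm_eln_for_file(const struct smdesc *d)
{
    assert(d != NULL);
    return d->sm_dlen;
}
```

With arbitrary example values of 48 bytes in memory and 45 on disc, entry 3 of a table starting at offset 1000 is found at 1000 + 3 × 45 = 1135, while a five-entry buffer needs 240 bytes.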

By the way, the lack of a type code in symbol tables has been felt before: there is some marvelous to behold code in WBLOCK.C that figures out which symbol table it is working on by testing the pointer against the descriptor address. I did not fix these to use my new type codes. Somebody should some day. Once the type codes and disc lengths were present, the changes to process the symbol tables separately were straightforward to install in SMIO.C.

Because the code to process the symbol tables field by field is substantially larger and also somewhat slower than reading a single structure, I set up conditional compilation to use the old code on MS-DOS. Since MS-DOS already writes the tables in canonical form and has the most severe memory constraints, there's no reason it should have to pay the price of compatibility code it doesn't need. Note that if you remove the #ifdefs on MSDOS from the file, it will still work fine: it will just be bigger and slower.

The entity headers.

There is a fixed set of fields which precedes every entity in the drawing database to specify its type, flags, length of the packed data which follows, and a pointer. When Bob made the entity data compatible, he could not use his scatter/gather mechanism for these fields because they control the scatter/gather process. I modified EREAD.C to use the Binary I/O package for these fields. In addition, if REALTRAN has been defined on this system, the gathreal and scatreal functions call realenc and realdec routines to translate floating point formats. If REALTRAN is not defined, no additional code is compiled or executed, so IEEE-compatible systems pay no price for the possibility of floating point format conversion. The floating point conversion mechanism has never been tested.

Shape and text font files

Compiled text font and shape files were made compatible by using the Binary I/O package within SHCOMP.C when compiling a shape file and in SHLOAD.C when loading it. The shape files written by the modified code are identical to those generated by an MS-DOS AutoCAD but are incompatible with other systems. All .SHX files on non-MS-DOS systems must be recompiled when converting to this release of AutoCAD. Attempting to load an old format file results in an I/O error message. It was my judgement that considering the tiny installed base of non-MS-DOS systems, it just wasn't worth putting in some form of level indicator and generating a special message. This code has never been tested with “big fonts” (e.g., Kanji).

DXB files

Binary drawing interchange files were just plain busted on non-MS-DOS systems. The problems were:

  1. Type codes greater than 127 did not work due to some code incorrectly copied from SLIDE.C.
  2. An fread was done into an int, resulting in failure on any machine whose ints are not 16 bits.
  3. The AutoCAD manual documented .DXB files as being in Intel byte order, but the code did not perform the required reversals.

I modified all I/O within DXBIN.C to use the Binary I/O package, and corrected these problems. All systems now read DXB files which are compatible with existing MS-DOS files. Since the existing code in non-MS-DOS systems could never have worked, no non-MS-DOS DXB files exist, and compatibility with them is not a consideration.
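
The first two problems in the list above are easy to reproduce, and a sketch of reads that avoid them may make the failure modes clearer. The function names are hypothetical and this is not the code in DXBIN.C; it only shows a type code read without sign extension and a 16 bit value widened into an int of any size.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not the DXBIN.C code. */

/* Read a one byte type code. getc hands back the byte as an unsigned
   char converted to int, so codes above 127 stay positive instead of
   turning negative as they can when stored in a plain char.
   Returns -1 at end of file. */
int read_type_code(FILE *fp)
{
    int c;

    assert(fp != NULL);
    c = getc(fp);
    return c == EOF ? -1 : c;
}

/* Read a signed 16 bit value stored least significant byte first and
   widen it into an int. An fread of two bytes directly into a 32 bit
   int would leave the other half of the int untouched. Returns 1 on
   success, 0 on a short read. */
int read_int16(FILE *fp, int *out)
{
    int lo, hi;
    long v;

    assert(fp != NULL && out != NULL);
    lo = getc(fp);
    hi = getc(fp);
    if (lo == EOF || hi == EOF)
        return 0;
    v = (long) lo | ((long) hi << 8);
    if (v & 0x8000L)                /* sign-extend the 16 bit value */
        v -= 0x10000L;
    *out = (int) v;
    return 1;
}
```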

Slide files

I corrected a problem in my earlier submission of code to make slide files portable which was found by the regression test. A null slide file created by MS-DOS (or the new portable code) would get an I/O error if you attempted to view it on a Sun. SLIDE.C was reading the in-memory length of the slide file header when it validated the header. I changed it to read the portable length in the file.

Compatibility status summary

The following is a summary of AutoCAD file portability as of the integration of this code.

Drawing files
Fixed to be compatible. All systems read both their own old-format files and the new portable files. All systems emit portable files.
ASCII files
Fixed to be compatible. Note that this causes the following file types to become compatible: HLP, HDX, SHP, DXF, DXX, MNU, PAT, LIN, PGP, MSG, LSP.
ACADVS
The virtual string file is compatible by design.
Filmrolls
Compatible by design.
Slides
Fixed to be compatible. Systems can read their own old files and portable files. All write portable files.
Slide libraries
Compatible by design.
SHX files
Fixed to be compatible. Old MS-DOS files are portable. Old non-MS-DOS files must be recompiled.
IGES files
Compatible by design.
DXB files
Fixed to be compatible. Previously worked only on MS-DOS. Old MS-DOS files work without modification.
[MNX files]
Incompatible. A system must use menus compiled by its own AutoCAD.

Upper and lower case

I have done nothing in this project to resolve the issue of case conventions for file names. I consider this issue so controversial and politically charged that I'm not yet ready to step into it. I hereby submit my recommendations for comment. Each system will define a tag in SYSTEM.H called CASECONV. It shall be set to one of four values:

CCMONOU      System is monocase and uses upper.
CCMONOL      System is monocase and uses lower.
CCULU        System uses both cases and prefers upper.
CCULL        System uses both cases and prefers lower.

When a system writes a drawing database, it stores its CASECONV setting in the drawing header. This is referred to as the “case convention of the sending system”. When a system reads a drawing, if it was created on a system with a different case convention, it processes file names in symbol table entries based upon a matrix of the sending system's case convention and its own. If the receiving system is monocase, file names in symbol tables are not translated, but FFOPEN and its clones translate all file names to the receiving system's case convention before submitting them to the system. If the receiving system uses both cases and the sending system was monocase, names in symbol tables are translated at read-in time to the preferred case of the receiving system. The names are then used as modified, without further modification by FFOPEN. This is asymmetrical and impossible to justify except by convincing yourself that this is the best approximation to what's best for the user.
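
To make the proposed matrix concrete, here is one possible reading of it as code. The four CASECONV tags are from the proposal; their numeric values, the function names, and the enum are invented for this sketch. The text does not say what happens when both systems use both cases but prefer different ones, so the sketch assumes no translation in that case.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Tag names from the proposal; the numeric values are assumed. */
#define CCMONOU 0   /* monocase, upper */
#define CCMONOL 1   /* monocase, lower */
#define CCULU   2   /* both cases, prefers upper */
#define CCULL   3   /* both cases, prefers lower */

enum nameaction { NA_NONE, NA_UPPER, NA_LOWER };

static int is_mono(int cc)
{
    return cc == CCMONOU || cc == CCMONOL;
}

static enum nameaction to_case(int cc)
{
    return (cc == CCMONOU || cc == CCULU) ? NA_UPPER : NA_LOWER;
}

/* What to do to file names in symbol tables when a drawing is read. */
enum nameaction symtab_action(int sender, int receiver)
{
    if (sender == receiver)
        return NA_NONE;             /* same convention: nothing to do */
    if (is_mono(receiver))
        return NA_NONE;             /* left for FFOPEN to handle */
    if (is_mono(sender))
        return to_case(receiver);   /* translate once, at read-in */
    return NA_NONE;                 /* assumption: text is silent here */
}

/* What FFOPEN does to a name just before opening the file. */
enum nameaction ffopen_action(int receiver)
{
    return is_mono(receiver) ? to_case(receiver) : NA_NONE;
}

/* Apply an action to a name in place. */
void apply_action(char *name, enum nameaction a)
{
    assert(name != NULL);
    for (; *name != '\0'; name++) {
        if (a == NA_UPPER)
            *name = (char) toupper((unsigned char) *name);
        else if (a == NA_LOWER)
            *name = (char) tolower((unsigned char) *name);
    }
}
```

Written out this way the asymmetry is plain: a monocase receiver translates at every open, while a mixed-case receiver translates once and then trusts the stored name.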


My throat was feeling a little dry after such a lengthy dissertation. I got up to refill my glass. When I walked back to my chair, Galt was flipping through the listing of SMIO.C next to the Sun. He turned to me and said, “Why do you do this? Here you are in the middle of the night struggling trying to trick this megalith of software into threading its way around incompatibilities between computers that aren't even of your making.”

I replied, “Differences in products are a consequence of their rapid evolution in a free market. Incompatibility is the price of progress”. John Galt was speechless for at least 12 seconds.

He rose and said, “Join us. You weren't ready in 1967. Now, in 1987 you should see that you're struggling to make money in a world where the money you make is taxed away and handed to defence contractors like Lockheed and McDonnell-Douglas, who turn around and compete against you with products your taxes paid to develop. While so many others are sleeping, you labour to produce intellectual property, then you listen to others lecture you on their “right” to steal it. Can't you feel the circle closing? Can't you see that this can't go on? Why not hasten the inevitable and pave the way for a brighter day? You should drop out, or work to hasten the collapse.”

I looked at the DIFFs of my portable database code. I said, “After this project, I can't help but feel that hastening the collapse would be an exercise in supererogation.”

Galt shrugged. He sat back down and said, “Your time hasn't yet come. I try to talk to people when they'll see the issues most clearly. I try to find the times when they see what they're doing and begin to wonder why. I'll be back. It may be in two days, two years, or maybe twenty years.”

We talked for an hour or so about old times, common friends, and shared interests. He left as the sun was rising.
