Juho Snellman's Weblog

Numbers and tagged pointers in early Lisp implementations

jsnell@iki.fi — Mon, 04 Sep 2017 15:00:00 GMT

There was a bit of discussion on HN about data representations in dynamic languages, and specifically having values that are either pointers or immediate data, with the two cases being distinguished by use of tag bits in the pointer value:

If there's one takeway/point of interest that I'd recommend looking at, it's the novel way that Ruby shares a pointer value between actual pointers to memory and special "immediate" values that simply occupy the pointer value itself [1].
This is usual in Lisp (compilers/implementations) and i wouldn't be surprised if it was invented on the seventies once large (i.e. 36-bit long) registers were available.

I was going to nitpick a bit with the following:

The core claim here is correct; embedding small immediates inside pointers is not a novel technique. It's a good guess that it was first used in Lisp systems. But its invention can't be tied to large word sizes; those were in wide use well before Lisp existed. (The early Lisps mostly ran on 36-bit computers.)

It seems more likely that this was tied into the general migration from word-addressing to byte-addressing. Due to alignment constraints, byte-addressed pointers to word-sized objects will always have unused bits around. It's harder to arrange for that with a word-addressed system.

But the latter part of that was speculation. Maybe I should check the facts before being tediously pedantic? Good call, since that speculation was wrong. Let's take a tour through some early Lisp implementations, and look at how they represented data in general, and numbers in particular.

The problem with integers
LISP I
LISP 1.5
Basic PDP-1 LISP
M 460 LISP
PDP-6 LISP
BBN LISP
Conclusion

The problem with integers

Before we get started, let's state the problem that tagged pointers solve. In a dynamically typed programming language, the language implementation must be able to distinguish between values of different types. The obvious implementation is boxing; all values are treated as blobs of memory allocated somewhere on the heap, with an envelope containing metadata such as the type and (maybe) the size of the object.

But this means that integers now have tons of overhead. They use up heap space, need to be garbage collected, and new memory needs to be constantly allocated for the results of arithmetic operations. Since integers are so critical to almost all kinds of computing, it would be great to minimize the overhead. And ultimately, to eliminate the overhead completely by encoding small integers as recognizably invalid pointers.

LISP I

I wasn't super hopeful about finding out exactly what numbers looked like in the original Lisp implementation. As far as I know, the source code hasn't been preserved. Now, the original paper describing Lisp (Recursive Functions of Symbolic Expressions and their Computation by Machine, Part I) isn't quite as theoretical as the title suggests. For example it describes the memory allocator and garbage collector on a reasonable systems level. But it doesn't mention numbers at all; this is a system for symbolic computation, so numbers might as well not exist.

The LISP I Programmer's Manual from 1960 is more illuminating, though not entirely consistent. In one place the manual claims that LISP I only supports floats, and you'll need to wait until LISP II to use integers. But the rest of the document happily describes the exact memory layout of integers, so who can tell.

A floating point value looks like this:

Let's say we have the value 1.0 in a LISP I program. This value is actually a pointer to a word. How do we know the type of the pointed-to word? If the upper half of that word is -1, it's a symbol. Otherwise it's a cons. (The use of -1.0 and 1.0 as the example floats in this picture is unfortunate, since it looks like -1.0 and -1 are somehow related. That's not the case; -1 is the universal tag value for atoms, independent of the exact floating point values.)

So the number 1.0 is a symbol? Technically yes, since at this stage of Lisp's evolution everything is either a symbol or a cons. There are no other atoms. We can find out if the symbol represents a number by following the linked list starting from the cdr of the symbol (a pointer stored in the lower half of the word). If we find the symbol NUMB on the list, it's some kind of number. If we find the symbol FLO, it's a floating point number, and the property list will be pointing to a word that contains the raw floating point value that this number represents.

There's a detail here that's kind of amazing. Notice that 1.0 and -1.0 share the same list structure. The only difference is that -1.0 has the symbol MINUS in the list, after which the list merges with the list of 1.0. What a fabulously inefficient representation! Not only do you have to do a bunch of pointer chasing just to find the actual value of a number, but then you'll get to do it again to find out the sign!

The question I can't answer just from reading this document is how exactly the raw floating point value is handled. Surely the garbage collector must know not to interpret those raw bits as pointer data? There is a very detailed example of the memory layout for an integer on pages 94-95, but even with that example I just don't see where the type information is stored. It's clearly not based on address ranges (the raw values are mixed in with the other words), nor the pointer value (all the pointers are stored as 2's complement), nor the 6 unused bits in the machine word.

Suggestions welcome. My best guess is that the example is inaccurate.

LISP 1.5

The LISP 1.5 Programmer's Manual from 1962 explains in a very concise manner how numbers worked in that implementation:

Numbers are still considered to be symbols, and symbols are still marked with -1 as the car. But the standard symbol property list is now gone; instead the symbol is pointing directly to the memory that stores the raw integer value. How does the program know not to follow that pointer as a list? As the document says, that's specified by "certain bits in the tag".

The tag? What's the tag? The IBM 704 had a 36-bit word size but just a 15-bit address space. The words were split (on the ISA level) into a 3-bit "prefix", 15-bit "address", 3-bit "tag", and 15-bit "decrement". Since Lisp values are pointers, only the two 15-bit regions are useful for holding them. One of the 3-bit regions, the tag, has been repurposed by the Lisp implementation to mark the pointers to raw data.

This is a clear improvement over LISP I, but a number is still represented as an untagged pointer to a tagged pointer to the raw value. Why is the intermediate word there at all, why not go directly with a tagged pointer to the raw value? Maybe code size?

In parallel to that, the address space has now been split into multiple separate pieces, with the cons cells being allocated from a different range of addresses than plain data like numbers and string segments. It could well be that the tagged pointer is irrelevant to the GC, which just makes its decisions on what's a pointer based on whether the pointer is contained in the "full word space" or the "free space". The tags would then be used just for implementing NUMBERP.

Basic PDP-1 LISP

For an L. Peter Deutsch joint, The LISP implementation for the PDP-1 Computer proves to be a surprisingly unsatisfying document. It's almost exclusively user documentation, with no information on the systems architecture. Well, except for a full source code listing. Guess we'll have to look at that, then. NUMBERP is the easiest starting point:

/// ("is a number")
/NUMBERP
nmp,    lac i 100
        and (jmp
        sad (jmp
        jmp tru
        jmp fal

The main thing we need to know from the rest of the code is that the interpreter stores a pointer to the Lisp value currently being operated on at address 100 (octal).

First "lac i 100" follows the pointer to read the first data words of the value into the accumulator. The next line looks bizarre; due to the way the PDP-1 macro-assembler works, "and (jmp" effectively means "and 600000". So this instruction is masking away all but the top two bits of the accumulator, and "sad (jmp" is checking whether the result of the masking equals octal 600000. It appears that there is nothing special about the pointer to a number, but numbers are identified by having the top two bits set in the pointed-to value.

The next step in understanding the layout is the code for reading the raw value of a number.

/get numeric value
vag,    lio i 100
        cla
        rcl 2s
        sas (3
        jmp qi3
        idx 100
        lac i 100
        rcl 8s
        rcl 8s
        jmp x

"lio i 100" loads the current Lisp value into the IO register. "cla" sets the accumulator to zero. "rcl 2s" then rotates the combination of the IO register and accumulator by 2 bits. The accumulator now contains as its low bits the previous high two bits of the IO register. "sas (3" compares the accumulator to 3; if they're not equal we jump to qi3 (the error routine for "non-numeric arg for arith"). "idx 100" moves the pointer to the next word of the value, and "lac i 100" reads that word into the accumulator. And finally the combination of the two registers is rotated by 16 bits, so that we end up with the raw 18 bit value in the accumulator. Written out step by step the process looks like this:

    . == Bit with value of 0
    ! == Bit with value of 1
    ? == Bit with unknown value
    0-9, A-H == bits of the integer value

    X                    X+1
------------------------------------------------
    [!!23456789ABCDEFGH] [................01]

    IO                   AC
------------------------------------------------
Load IO from address X
    [!!23456789ABCDEFGH] [??????????????????]
Clear AC
    [!!23456789ABCDEFGH] [..................]
Rotate left by 2
    [23456789ABCDEFGH..] [................!!]
Check AC == 3
Load AC from address X+1
    [23456789ABCDEFGH..] [................01]
Rotate left by 8
    [ABCDEFGH..........] [........0123456789]
Rotate left by 8
    [..................] [0123456789ABCDEFGH]

Clearly an integer is now represented by a pointer to two words, with a special tag in the high bits of the first word. This implementation got rid of the extra layer of indirection in LISP 1.5; an integer is now just a pointer to tagged data. But storing a one-word integer still takes three words: the pointer plus the two words it points to.

Why use a layout that requires shuffling data around this much, instead of just having the tag in X and the raw value in X+1? It seems awfully inconvenient. My best guess is that the top 1-2 bits of the second word are reserved for the GC, e.g. for use as mark bits. But understanding exactly how the GC works is maybe a project for another day.

M 460 LISP

Before starting research for this article, I'd never heard of the early Lisp implementation for the Univac M 460. A description of the system can be found in the 1964 collection The programming language LISP: Its operation and applications.

Numbers and print names are placed in free storage using the device that sufficiently small (i.e., less than 2^10) half-word quantities appear to point into the bit table area and so don't cause the garbage collector any trouble. A number is stored as a list of words (a flag-word and from 1 to 3 number words, as required), each number word containing in its CAR part 10 significant bits and sign. Thus an integer whose absolute value is less than 2^11 will occupy the same amount of storage (2 words) as in 7090 LISP 1.5.

This is another bit of progress! The key insight on the road to tagged pointers is that invalid parts of the address space can be used to distinguish between pointers and immediate data. Another important insight in this paper is that most numbers in a program are going to be small, so it might make sense to have variable representations for numbers of different magnitude. But it's not a full realization of the concept yet: immediate small numbers are not directly accessible to the user. They are internal to the implementation, used as a building block for boxed integers of various levels of inefficiency.

The paper gets even better once we get a few more pages in, since for characters M 460 Lisp does take that final step:

Each character in the character set available on the M 460 (including tab, carriage return, and others) is represented internally by an 8-bit code (6 bits for the character (up to case), 1 bit for case, and 1 bit for color). To facilitate the manipulation of character strings within our LISP system, we permit such character literals to appear in list structure as if they were atoms, i.e. pointers to property lists. These literals can, where necessary, be distinguished from atoms since they are less than 2^8 in magnitude and hence, viewed as pointers, don't point into free storage (where, as in 7090 LISP, property lists are stored). The predicate charp simply makes this magnitude test.

That's about as clear a case of embedding immediate data in pointers as it gets. It's just that the tag is rather large (22 highest bits, rather than the 1-4 lowest bits you'd expect today). And it's also dealing with characters rather than numbers, so let's carry on with the investigation a bit longer.

PDP-6 LISP

The June 1966 report on PDP-6 LISP has the following to say on integers:

Fixed-point numbers >= 0 and < about 4000 are represented by a "pointer" 1 greater than their value, and no additional list structure. All other numbers use a pointer to full-word space as part of an atom header with a FIXNUM or FLONUM indicator.

This is starting to get close to the modern fixnum, except that there's no facility for immediate negative numbers and the range is tiny. (This is a machine with 36-bit words and 18-bit pointers; one would hope for a bit more than 12 bits for immediate integers.)

BBN LISP

Structure of a LISP system using two-level storage is a wonderful systems design paper from November 1966, describing BBN LISP for a PDP-1 with 16K of core memory, 88K of absurdly slow drum memory, and no hardware paging support. How do you make efficient use of the drum memory? By some clever data layout, software-driven paging, and a locality-optimizing memory allocator.

So it's actually a paper I thought was totally worth reading just for its own sake. But for the purposes of this post, this is the money quote:

LISP assumes that it is operating in an environment containing 128K words, that is from 0 to 400,000 octal. Only 88K actually exist on the drum. The remaining portion of the address space is used for representation of small integers between -32,767 and 32,767 (offset by 300,000 octal), as described below.

The paper describes a machine with both an 18-bit word size and address space, with 16-bit signed fixnums embedded in the pointers. That's about as good as it gets. (Though not quite optimal; they're using bit 17 as the integer tag, but what happened to bit 18? The paper doesn't say, but odds are that it's again a GC mark bit).

The particularly observant reader might have noticed that this machine had 104K words of physical memory, but the described tagging scheme only leaves 64K words addressable. What's up with that? On one level it's exactly what M 460 LISP and PDP-6 LISP were doing: that 40K of address space stores things that can't be directly pointed to from another Lisp value. But those other implementations were just opportunistically reusing the parts of address space that contained native code.

By contrast, BBN LISP carefully arranged for there to exist as much of such storage as possible, and for it to be located above the address 200,000 (octal).

The most clever example of that is the representation of symbols. The first implementations we saw just implemented symbols as a list of properties indexed by name (e.g. name, value cell, function cell, etc.). An obvious optimization is to allocate a symbol as a single larger block of memory with fixed slots for the most common properties, and a generic property list slot to contain anything else.

What BBN LISP does instead is allocate a symbol in multiple separate blocks rather than a single contiguous one. A pointer to the symbol points to the block of value cells, so reading the value cell is trivial. What if you want to read another property, e.g. the function? We compute the offset of the value cell pointer from the start of the value cell block, and access the function cell block at the same offset. In modern parlance it ends up as a structure-of-arrays layout rather than an array-of-structures.

In addition to getting more address space for fixnums, they also got exactly the same kind of locality improvements that a structure-of-arrays layout would be used for today. So it was an all-around neat optimization.

There is also an early design document for BBN 940 LISP from almost the same time as the above paper. It appears to describe the kind of elaborate tagging scheme that a modern Lisp might use, with a 4-bit type field in every pointer word. And they even call heap-allocated numbers "boxed"! I had no idea this terminology was in use 50 years ago. The relevant section:

There will be a maximum of 16 pointer types of objects in the 940 LISP System. These are (numbered in octal)

00. S-expressions (nonatomic)
01. Identifiers (literal atoms)
02. Small Integers
03. Boxed Large Integers
04. Boxed Floating Point Numbers
05. Compiled Function - Lambda Type
06. Compiled Function - Lambda Type - Indef Args
07. Compiled Function - Mu Type - Args Paired
10. Compiled Function - Mu Type - List of Args
11. Compiled Function - Macro
12. Array - Pointers
13. Array - Integers
14. Array - FP #s
15. Strings - Packed Character Arrays
16.
17. Pushdown List Pointers

Each pointer will be contained in one 940 word of 24 bits. Bits 0 and 1 will be nominally empty, and may in some cases be used by the system (e.g. bit 0 for garbage collection) or perhaps even the user (in S-expressions). The four bits 2-5 will contain the type number for this pointer. The 18 bits 6-23 will contain an effective address (in the LISP drum file) where the referenced information is stored.

It looks like they ended up not using this design for BBN 940 LISP, and it instead uses an extended version of the segmented memory scheme from the PDP-1 implementation described earlier in this section. But even if these particular bits weren't practical to use with that hardware, at this point just about all the ideas for tagged pointers have definitely been invented.

Conclusion

The initial LISP I implementation in 1960 had the least efficient implementation of numbers this side of Church numerals, where even just getting the value might require chasing half a dozen pointers. But new implementations optimized that layout aggressively. By 1964, the M 460 LISP implementation had arrived at the general solution of using pointers to invalid parts of the address space for storing immediate data, but user-accessible integers were still boxed; the only use for the unboxed integers was as an internal building block. In 1966 PDP-6 LISP applied the idea of tagged immediate data to tiny positive integers, and the PDP-1-based BBN LISP took the idea to its logical conclusion, allowing immediate storage of integers of almost the full machine word.

I would not have guessed that these optimizations were discovered and applied so early and so aggressively. It's also noteworthy that this was independent of the machine word size, address space size, and addressing mode of the machine. The first fully fledged implementation I found was on a machine with 18-bit words, 18 bits of address space, and word-addressing. That should have been just about the worst case!

There's an interesting tangent with how MacLISP ended up reversing this progress in the '70s and going back to boxed integers, since they wanted to have just a single integer representation. I won't go into the details since this post already grew longer than intended. But for those interested in the subject AI Memo 421 is a fun read.

Was the technique definitely first used in Lisp? These implementations are early enough that there aren't a ton of other possibilities. The only ones I can think of would be APL and Dartmouth BASIC. If anyone can find documentation on earlier uses of storing immediate data in tagged pointers, please let me know and I'll edit the article.

The origins of XXX as FIXME

jsnell@iki.fi — Mon, 17 Apr 2017 18:00:00 GMT

The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far toward the hacky end of the spectrum, and perhaps even outright broken.)

It's a bit of an odd and non-obvious string though, unlike FIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:

MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
Early Unix (probably the most influential codebase that's ever existed)
Some kind of DEC thing (because really, all the world was a PDP)

Other uses of `XXX`

It turns out that XXX and xxx are incredibly annoying things to search for in old code. I'd bet it's the most common sequence of 3+ identical letters in source code. That means there are a ton of false positives to sift through. Here are a few examples of the kind of stuff you'll find.

By far the most common use of XXX in old code is as some kind of template placeholder. This makes some sense; x for an unknown value has an obvious long history that predates computing. These templates might be used to describe the exact data layout of something, like in the following bits from the Apollo Guidance Computer:

# 17    ASTRONAUT TOTAL ATTITUDE      3COMP   XXX.XX DEG FOR EACH
# 18    AUTO MANEUVER BALL ANGLES     3COMP   XXX.XX DEG FOR EACH
# 19    BYPASS ATTITUDE TRIM MANEUVER 3COMP   XXX.XX DEG FOR EACH
# 20    ICDU ANGLES                   3COMP   XXX.XX DEG FOR EACH
# 21    PIPAS                         3COMP   XXXXX. PULSES FOR EACH
# 22    NEW ICDU ANGLES               3COMP   XXX.XX DEG FOR EACH
# 23    SPARE
# 24    DELTA TIME FOR AGC CLOCK      3COMP   00XXX. HRS. DEC ONLY

Or as just a wildcard for a bunch of related names, like in this Lisp Machine source code:

;Q-FASL-xxxx refers to functions which load into the cold load, and
; return a "Q", i.e. a list of data-type and address-expression.
;M-FASL-xxxx refers to functions which load into Maclisp, and
; return a Lisp object.

Or as actual templates-as-program, where parts of an input remain as-is while others (those marked with XXX) are programmatically replaced. For example, temporary file generation in Unix v5:

                f = ranname("/usr/lpd/dfxxx");

And finally, it could denote parts of persistent data structures that were reserved for future use (or no longer used), for example in CP/M:

/* THE FILE CONTROL BLOCK FORMAT IS SH0WN BELOW:
   --------------------------------------------------------
   /    1 BY / 8 BY / 3 BY / 1 BY /2BY/1 BY/ 16 BY /
   /F1LETYPE/   NAME / EXT / REEL NO/XXX/RCNT/DM0 DM15/
   --------------------------------------------------------

   FILETYPE     :       0E5H IF AVAILABLE (OTHERWISE UNDEFINED NOW)
...
   XXX          :       UNUSED FOR NOW
   RCNT         :       RECORD COUNT IN FILE (0 TO , 127)

A less savoury use of XXX is as an identifier for something that didn't even qualify to have a real name. Most commonly it'd be the name of a branch target, like in a very early version of the C compiler:

    xxx:
        if (o==KEYW) {
                if (cval==EXTERN) {
                        o = symbol();
                        goto xxx;
                }

It could also be used to name variables. The following is from the FORTRAN II compiler for the IBM 704 from 1958. (I don't read 704 assembler, so maybe I'm misinterpreting what's going on in that program. It seems funny enough that I wanted to include it here anyway).

XXXXXX SYN 0  THE APPEARANCE OF THIS SYMBOL IN   F4400370
       REM       THE LISTING INDICATES THAT ITS  F4400380
       REM       VALUE IS SET BY THE PROGRAM.    F4400390

Some DEC code seems to have gone really overboard with this, with single source files having half a dozen different XXXYYY identifiers. (Sorry, had to use YYY as the placeholder there for obvious reasons).

Finally, there are all kinds of bizarre one-off uses. TENEX seems to have used XXX for implementing rubout. That is, when you pressed backspace to delete something you'd typed, it would print XXX on the teletype to mark the deletion (rather than try to move the cursor back). I also found some kind of proto-instant-messaging program from 1976, written in Interlisp, that would just print XXX as the error message for invalid user input.

Now, sorry if the above parts were kind of tedious. But there is actually a point here. Turns out that XXX is a really stupid marker to use for a FIXME. Looking at the Panda TOPS-20 distribution, there are 3083 instances of XXX, none of which are FIXMEs. Just about anything else would be easier to find. This makes its use as one of the three main FIXME-markers all the more puzzling.

`XXX` as a `FIXME`

To get the negative results out of the way, there is absolutely no sign of this being an MIT or DEC thing. XXX as FIXME doesn't appear on ITS or TOPS-20 disks, nor does it appear in any of the mountains of really old Lisp code that I happened to have around; I don't think it makes it to Lisp-land until the mid-'80s. It's also absent in smaller collections of old code from other sources.

No, this definitely seems to be a Unix thing. There are a couple of interesting possibilities in early BSD. First, there are the following lines in a package of troff macros that first appeared in 2BSD, with a copyright date of 1978:

..
.de (t                 \" XXX temp ref to (z
.(z \\$1 \\$2
..
.de )t                 \" XXX temp ref to )t
.)z \\$1 \\$2

I'm pretty sure these are not actually a FIXME. It looks like the convention in this code was to mark .de commands with three character tags depending on their type, as explained in the beginning of the file:

+.\"	Code on .de commands:
+.\"		***	a user interface macro.
+.\"		&&&	a user interface macro which is redefined
+.\"			when used to be the real thing.
+.\"		$$$	a macro which may be redefined by the user
+.\"			to provide variant functions.
+.\"		---	an internal macro.

These lines seem to have been commands that didn't fit into those existing categories, and needed a new tag.

Next up, there are a bunch of very promising-looking changes to the troff C source in the summer of 1980. Stuff like:

if(j == ' '){
        storeword(i,width(i));  /* XXX */
        continue;
}

That certainly looks like a classic FIXME. But I think this is another dead end. It turns out that after this change there are 37 /* XXX */ comments in code that didn't use to have any. And when comparing to Unix v7 source code, it looks like basically every single line that was changed got marked with one. So it's unlikely that these are actual FIXMEs. I think this was just the author making sure they could identify their changes, in case they wanted to reintegrate with "upstream".

Soon after that BSD moves to SCCS, and we start getting fine-grained changes rather than huge code-dumps. From there, it's easy to find the first /* XXX */ commit from Nov 9, 1981. This one is interesting in a few ways:

This is definitely a FIXME; just a few very special parts of the code got tagged, and many of them got rewritten soon after.
After this commit, the use of /* XXX */ starts spreading quickly through the BSD codebase and eventually to other authors.
A closer reading of the commit shows something interesting: a bunch of /* ### */ comments. Going through the earlier history, it seems that Bill Joy had been marking his FIXMEs with ###, and halfway through this commit switched to using XXX. I don't know why, or whether these two markers were intended to have slightly different semantics (e.g. ### for code that needed to be fixed, XXX for code that was commented out and needed to be fixed and re-enabled). But XXX quickly became the preferred form.

                        if (rcv_empty(tp)) {                    /* 16 */
-                               tcp_close(tp, UCLOSED);
+                               sowakeup(tp->t_socket); /* ### */
+/* XXX */                      /* tcp_close(tp, UCLOSED); */
                                nstate = CLOSED;
                        } else

(On a personal note, as someone who goes out of their way to read through any published TCP stack, I'm kind of amused that a search for a random piece of historical trivia led me to a damn TCP stack.)

Leaving it at that would make for a good story. And I'd already checked basically all of the Bell Labs code that I could find. It's not in Unix v2-v7 and not in the Programmer's Workbench. But then I decided to check Unix v1 just for completeness' sake, and got very confused. Because...

/ XXX fix me, I dont quite understand what to do here or
/ what is done in the similar code below e407:
/ cmp   r5, u.count / see if theres enough room
/ bgt   1f
mov     r5,u.count / read text+data into core

WTF? It doesn't get any clearer than that. But where did it come from? And if this convention was used at Bell Labs in 1971, why did XXX disappear for a decade?

Turns out this was a false alarm. The only reason we have the Unix v1 source code in the first place is that a team of people transcribed the source from PDF scans to text. Then they went on to make it possible to compile the code and run it in an emulator. As part of this latter work, a block of code was added to the source. Unfortunately, it was this patched version rather than the "original" that made it to the Unix History Repo. This comment was actually from 2008, not 1971.

There's actually an interesting story behind that extra block of code, as told by Toomey. After finally getting the v1 kernel transcribed, compiled, and running, they hit the problem of only having two userland programs available: init and sh. Everything else was using a more recent executable header. To be able to do anything at all with the system, they needed to add support for "0407 binaries" as opposed to the "0405" ones the kernel supported natively.

What about C code outside of Unix distributions? It's actually kind of hard to find any of that from before 1982. There might be an earlier instance in Gosling Emacs, though it differs from the modern form by going for a full 9 Xs:

#ifdef HalfBaked
/*    sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
    sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
#endif

And there's a Changelog entry from July 1981, which seems to match up perfectly with both the functionality of the code, and the surrounding ifdef:

Tue Jul  7 12:51:44 1981  James Gosling  (jag at VLSI-Vax)
        ... I also installed Dave
        Dyer's hack to allow ^G's to interrupt execution immediatly.  This
        has a rather major bug, and is the reason that I didn't implement
        it a long time ago: if you type ^G while Emacs is doing output,
        then all queued-but-not-printed characters get lost and Emacs no
        longer has any idea of what the screen looks like. It is pretty
        much impossible for Emacs to tell whether or not this has
        happened. You end up having to type ^L now and then.  The
        "HalfBaked" switch in config.h controls the compilation of this
        facility, ...

But thankfully this code has RCS history starting from 1986, and somebody did in fact edit this code in 1986 with no functional changes, just adding the commented-out copy and the XXXXXXXXX:

 #ifdef HalfBaked
-    sigset (SIGINT, InterruptKey);
+/*    sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
+    sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
 #endif

And those are the only signs of XXX in applications that could predate the BSD usage. Both were red herrings, caused by how difficult it is to find pristine copies of source code that old. It was very lucky that the Gosling Emacs comment was added after the code was put into RCS, rather than in the five-year interval between the original commit and the project starting to use RCS.

So it seems likely that this convention was invented by Bill Joy in BSD. If he wasn't the first one, he was certainly the one that popularized it. Why he chose to switch to the rather inconvenient XXX from ### is unclear.

If you can find an earlier occurrence (or know of good collections of pre-1981 C source code), please let me know and I'll update the post.

The most obsolete infrastructure money could buy - my worst job ever

jsnell@iki.fi — Tue, 01 Sep 2015 17:30:00 GMT

Today marks the 10th anniversary of the most bizarre, and possibly the saddest, job I ever took.

The year was 2005. My interest in writing a content management system in Java for the company that bought our startup had been steadily draining away, while my real passion was working on compilers and other programming language infrastructure (mostly SBCL). One day I spotted a job advert looking for compiler people, which was a rare occurrence in that time and place. I breezed through the job interview, but did not ask the right questions and ignored a couple of warning signs. Oops.

It turned out to be a bit of an adventure in retrocomputing.

The bizarre

This was the former internal tools unit of a very large company, let's call them X. For some reason X had split off the unit and sold (given?) it to a moderately large consulting company, whom we shall call Y. I was going to work at Y. The reason they needed compiler people was that they were about to take over the maintenance of a C compiler suite (compiler, linker, assembler, etc). Except I'd misunderstood them as taking over the maintenance from X. That wasn't the case. Actually the compiler was from another very large company, Z, who were discontinuing all support. So X bought the source code from Z for very significant $$$, and needed somebody (Y) to actually do something with it. In fact it wasn't even just one compiler suite as I'd initially understood, it was two. Woo, double the compilers to play with!

I started in September, but some schedules had slipped and we wouldn't actually have anything to work with for a month or two. So I had plenty of time to acclimatize there. Which is good, because it's like I'd stepped into some strange parallel dimension where the 80s never ended. You know, the kind of place where you need access to some old documentation, and eventually find it's stored in an ingenious in-house source control system built on top of RCS.

For example, on my first day I found that X was running what was supposedly the largest VAXcluster remaining in the world, for doing their production builds. Yes, dozens of VAXen running VMS, working as a cross-compile farm, producing x86 code. You might wonder a bit about the viability of the VAX as a computing platform in the year 2005, especially for something as CPU-bound as compiling. But don't worry, one of my new coworkers was currently evaluating whether this should be migrated to VMS/Alpha or to VMS/VAX running under a VAX emulator on x86-64! [0]

Why did this company need to maintain a specific C compiler anyway? Well, they had their own ingenious in-house programming language that you could think of as an imperative Erlang with a Pascal-like syntax that was compiled to C source [1]. I have no real data on how much code was written in that language, but it'd have to be tens of millions of lines at a minimum.

The result of compiling this C code would then be run on an ingenious in-house operating system that was written in, IIRC, the late 80s. This operating system used the 386's segment registers to implement multitasking and message passing. For this, they needed a compiler with much more support for segment registers than normal. Now, you might wonder about the wisdom of relying heavily on segment registers in the year 2005. After all, using segment registers had been getting slower and slower with every generation of CPUs, and in x86-64 segmentation support was essentially removed. But don't worry, there was a project underway to migrate all of this code to run on Solaris instead [2].

After a couple of months of twiddling my thumbs and mostly reading up on all this mysterious infrastructure, a huge package arrived addressed to this compiler project. But... we were supposed to get a source dump. Why does the package need two men to carry it? Did somebody play a practical joke on us and send the source as printouts?

Why, it's the server that we'll use for compiling one of the compiler suites once we get the source code! An Intel System/86 with a genuine 80286 CPU, running Intel Xenix 286 3.5. The best way to interface with all this computing power is over a 9600 bps serial port. Luckily the previous owners were kind enough to pre-install Kermit on the spacious 40MB hard drive of the machine, so I didn't need to track down a floppy drive or a Xenix 286 Kermit or rz/sz binary. God, what primitive pieces of crap that machine and OS were.

You might wonder about the wisdom of using a 15-20 year old machine as the sole means of building a piece of software. It's dog slow and will obviously break sooner or later. In fact, I raised this very issue and suggested imaging the hard drive and getting everything running virtualized. That idea was nixed: the machine was old and fragile, and we couldn't risk poking around inside it. It'd be really hard to replace; when they went hunting for this machine from antique computer specialists, they found only two remaining working units [3].

This might be a good time to say that, computationally speaking, I was raised by wolves on a SunOS 4 server (which I ended up sysadmining for a few hundred users). My personal email was still going over UUCP in 2005. The highlight of my previous weekend (in 2015, when I'm writing this) was finding what looks like a partial source repository for a Lisp implementation that was written before I was born and appeared to have been completely lost to time. It was on a copy of some old backup tapes from an ITS server, and I don't even remember how or when those ended up on my hard drive. Which is to say, I like old computer systems more than is reasonable.

But even by my standards this level of computational archeology was going a bit too far. And the rabbit hole still had a little bit deeper to go.

A couple of weeks later the source drop arrived. I'll talk about the other compiler later; let's first tackle the one that needed to be built on a 286.

So it was written in PL/M. (Wait, is that even a thing? That's not a thing, right?) And it was last modified in the mid-80s. I'd like to say the build instructions were produced on a typewriter, but my memory might be playing tricks on me there. Some of the components didn't build cleanly and required various Makefile tweaks, with excruciating round-trip times for every test. Because, you know, this is a 286.

The hard drive wasn't large enough for all of the components either, so the process of rebuilding everything would be:

Upload the linker source tarball over the 9600 bps serial connection from a Linux server acting as a frontend
Unpack it
Build
Download the linker binary back to safety
Remove the source and the build artifacts
Repeat the same for all five components of the system.

Just the data transfers for each component took an hour. But after a long time fighting with it, I had a script that with a single keystroke generated binaries bit-identical to the ones that had apparently been in use for almost 20 years.

I was pretty worried, though: it'd be really hard to actually make any use of this source. There was no documentation except for the build instructions, so we'd need to reverse engineer everything. There wouldn't be any training from company Z either; frankly, it'd be a miracle if anyone who originally worked on the software was still with the company. Nobody knew PL/M. The round-trip time from making a change on the build machine to having a binary on a machine capable of actually running it was at least an hour. And we didn't have a source-level debugger for this, so that'd mean an hour just to add a single debug printf. (Wait, not a debug printf of course. A debug whatever-it-is-that-PL/M-uses-for-io.) It'd be pure pain.

I expressed these concerns, and was told not to worry.
- "Oh, we'll never want to make changes to this compiler, not enough code is compiled with it these days for that to be worth it. The more modern suite is the important one."
- "Wait? I just spent a month elbow deep in PL/M and Xenix/286 over a 9600bps Kermit connection, and you're telling me we're never going to actually use any of this?!"
- "Right, we just needed to verify that we really got what we bought."

I didn't really know whether to be happy about not having to do any more work on that crap, or angry about the waste of time.

The sad

That concluded the bizarre retrocomputing part of the story. We now get to the part with sad dysfunctional corporate politics. If you're just reading this for the laughs, maybe just skip to the end.

The more modern compiler suite wasn't a spring chicken either. It had to be built specifically with Visual Studio 6. Again there were no design docs, and no tests. The lack of tests was explained as being due to third-party IP concerns. The lack of documentation we never got an answer for.

Unlike the truly ancient compilers, this one was easy to build. But what could we possibly do with it? So I read through the compiler, tried to understand what each file did, did some experiments and wrote some notes.

We arranged a big meeting with senior engineers from all the relevant departments of X. The agenda was to figure out what improvements they'd want in the compiler. It was pretty dispiriting. Half of them seemed to think it'd be better not to touch it at all, since we'd probably just break it. Even those who weren't completely opposed to changes couldn't think of anything they really needed. Finally someone took pity on me and noted that the compiler wasn't very smart about scheduling segment register loads, which were expensive operations. Maybe that could be improved?

After the meeting one of the managers told me that it was really our job to come up with projects that the customer wanted to buy, not the other way around. And it usually couldn't just be a general project for minor improvements, it'd need clear and ideally measurable goals. The projects would also need to be pretty large to justify all the overhead. It should go without saying that this is an absolutely insane way of doing platform development, but it's something that follows directly from the incentives of the two parties. How anyone at X thought that anything good would come out of this, I don't know.

But never mind that. Our initial project for taking over the compiler maintenance was well funded, and vague enough that it was easy to argue that proving we could ship some kind of improvement was a core deliverable. We could at least proceed with the only improvement anyone had shown any interest in.

So I implemented a new peephole optimizer stage for the segment registers, and even got the code reviewed by the original authors of the compiler when they came over from Z to give us a training session. It seemed to work, but as mentioned above we didn't have a test suite and building one would take a long time and a lot of work. (Excellent! We can propose that as a project later!).

We couldn't even run any of the production code since that would require the ingenious in-house operating system. The only way to get any performance numbers and confidence in the changes being correct would be to schedule a load test in X's test lab. Unfortunately weeks and weeks of discussion over that never got us both the lab time and the people from their side who would have been needed. It's of course understandable; whether these compiler changes got released or not wouldn't make a difference to these people, who had their own actual work to do. But it also made it very hard to see how we could ship this change. The justification would be improved performance, but with no numbers it'd be a hollow claim.

That's when it dawned on me that there was never going to be any real compiler work there. These special compilers would not really matter once X migrated away from the custom OS, which would have to happen. Oh, sure, it'd need to be "maintained" just in case a customer running 20-year-old code needed a bugfix. But given the dysfunctional processes, it seemed pretty clear that the cost of any improvement would be massive compared to the short active development life these systems had remaining. They'd probably spent a seven-figure sum on this project as insurance, but actually doing something with the code? No way.

All of this infrastructure was just going to be on life support while it was being replaced by new systems that would in turn be obsoleted in five years. But nothing would ever actually go away; all this cruft would just accumulate and accumulate, nominally supported forever. And you'd have these extremely good engineers doing this completely insane work, having been moved from a prestigious high-tech company to a despised consulting firm.

And how do you even get out of that job? I imagined myself in a job interview in 2010, trying to explain how useful my extensive knowledge of Xenix, PL/M build systems, and VMS would be to my prospective new employer. There might come a time when you just stop keeping up with tech, but your 20s are really not that time :)

Coda

So I quit without arranging for another job first, assuming that something would probably turn up. In an amazing display of serendipity, during my notice period ITA Software posted to the SBCL mailing list that they wanted to pay somebody to work on SBCL improvements for them, which was pretty much my dream gig at the time [4]. Perfect timing.

Ok, that's all. You can now proceed with the one-upping with stories of developing new production software on a physical IBM 1401 in this millennium, or something ;-)

Footnotes

[0] I don't know the outcome of that evaluation.

[1] No, transpiling is not a word no matter how much you people try to make it one.

[2] And that leads us to the question of whether Solaris is really what you want to be migrating to in 2005.

[3] Wait?! There are two machines in the world that you think can be used to build this software, and we only bought one of them? "Oh, yes. It would have been a pretty good idea to buy the second one as a spare. Let's do that now!"

[4] Of course if one worry with the job at Y was that I'd be unemployable due to only having worked on boring and obsolete technology, one might wonder about the long term career prospects in Common Lisp compilers. But look, in 2006 CL was really going places!