Juho Snellman's Weblog

Numbers and tagged pointers in early Lisp implementations

jsnell@iki.fi — Mon, 04 Sep 2017 15:00:00 GMT

There was a bit of discussion on HN about data representations in dynamic languages, and specifically having values that are either pointers or immediate data, with the two cases being distinguished by use of tag bits in the pointer value:

If there's one takeway/point of interest that I'd recommend looking at, it's the novel way that Ruby shares a pointer value between actual pointers to memory and special "immediate" values that simply occupy the pointer value itself [1].
This is usual in Lisp (compilers/implementations) and i wouldn't be surprised if it was invented on the seventies once large (i.e. 36-bit long) registers were available.

I was going to nitpick a bit with the following:

The core claim here is correct; embedding small immediates inside pointers is not a novel technique. It's a good guess that it was first used in Lisp systems. But it can't be the case that its invention is tied into large word sizes, those were in wide use well before Lisp existed. (The early Lisps mostly ran on 36 bit computers.)

It seems more likely that this was tied into the general migration from word-addressing to byte-addressing. Due to alignment constraints, byte-addressed pointers to word-sized objects will always have unused bits around. It's harder to arrange for that with a word-addressed system.

But the latter part of that was speculation, maybe I should try to check the facts first before being tediously pedantic? Good call, since that speculation was wrong. Let's take a tour through some early Lisp implementations, and look at how they represented data in general, and numbers in particular.

The problem with integers
LISP I
LISP 1.5
Basic PDP-1 LISP
M 460 LISP
PDP-6 LISP
BBN LISP
Conclusion

The problem with integers

Before we get started, let's state the problem that tagged pointers solve. In a dynamically typed programming language, the language implementation must be able to distinguish between values of different types. The obvious implementation is boxing; all values are treated as blobs of memory allocated somewhere on the heap, with an envelope containing metadata such as the type and (maybe) the size of the object.

But this means that integers now have tons of overhead. They use up heap space, need to be garbage collected, and new memory needs to be constantly allocated for the results of arithmetic operations. Since integers are so critical to almost all kinds of computing, it would be great to minimize the overhead. And ultimately, to eliminate the overhead completely by encoding small integers as recognizably invalid pointers.

LISP I

I wasn't super hopeful about finding out exactly what numbers looked like in the original Lisp implementation. As far as I know, the source code hasn't been preserved. Now, the original paper describing Lisp (Recursive Functions of Symbolic Expressions and their Computation by Machine, Part I) isn't quite as theoretical as the title suggests. For example it describes the memory allocator and garbage collector on a reasonable systems level. But it doesn't mention numbers at all; this is a system for symbolic computation, so numbers might as well not exist.

The LISP I Programmer's Manual from 1960 is more illuminating, though not entirely consistent. In one place the manual claims that LISP I only supports floats, and you'll need to wait until LISP II to use integers. But the rest of the document happily describes the exact memory layout of integers, so who can tell.

A floating point value looks like this:

Let's say we have the value 1.0 in a LISP I program. This value is actually a pointer to a word. How do we know what the type of the pointed-to word is? If the upper half of that word is -1, it's a symbol. Otherwise it's a cons. (The use of -1.0 and 1.0 as the example floats in this picture is unfortunate, since it looks like the -1.0 and -1 are somehow related. That's not the case, -1 is the universal tag value for atoms, and independent of the exact floating point values.)

So the number 1.0 is a symbol? Technically yes, since at this stage of Lisp's evolution everything is either a symbol or a cons. There are no other atoms. We can find out if the symbol represents a number by following the linked list starting from the cdr of the symbol (a pointer stored in the lower half of the word). If we find the symbol NUMB on the list, it's some kind of number. If we find the symbol FLO, it's a floating point number, and the property list will be pointing to a word that contains the raw floating point value that this number represents.

There's a detail here that's kind of amazing. Notice that 1.0 and -1.0 share the same list structure. The only difference is that -1.0 has the symbol MINUS in the list, after which the list merges with the list of 1.0. What a fabulously inefficient representation! Not only do you have to do a bunch of pointer chasing just to find the actual value of a number, but then you'll get to do it again to find out the sign!
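To make the pointer chasing concrete, here's a rough Python model of the lookup described above. It is only a sketch: the order of the list elements, the use of a plain Python float in place of a pointer to the raw word, and the names Cons, make_symbol and float_value are all assumptions made for illustration, not the manual's actual layout.

```python
ATOM_TAG = -1  # the car value that marks a word as a symbol


class Cons:
    """One word of list structure: a car half and a cdr half."""

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


def make_list(*items):
    """Build a linked list of Cons cells from the given items."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def make_symbol(properties):
    """A symbol is a word with -1 in the car and a property list in the cdr."""
    return Cons(ATOM_TAG, properties)


def float_value(symbol):
    """Walk the property list to recover the sign and the raw value."""
    if symbol.car != ATOM_TAG:
        raise TypeError("not a symbol")
    sign = 1.0
    is_number = False
    cell = symbol.cdr
    while cell is not None:
        if cell.car == "MINUS":
            sign = -1.0
        elif cell.car == "NUMB":
            is_number = True
        elif cell.car == "FLO" and is_number:
            # Assumption: the element after FLO stands in for the raw word.
            return sign * cell.cdr.car
        cell = cell.cdr
    raise TypeError("symbol is not a floating point number")


# 1.0 and -1.0 share list structure; -1.0 just has MINUS in front.
one_properties = make_list("NUMB", "FLO", 1.0)
one = make_symbol(one_properties)
minus_one = make_symbol(Cons("MINUS", one_properties))
```

In this model float_value(minus_one) has to step through MINUS, NUMB and FLO before it reaches the raw value, which is the kind of overhead the later implementations set out to remove.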

The question I can't answer just from reading this document is how exactly the raw floating point value is handled. Surely the garbage collector must know not to interpret those raw bits as pointer data? There is a very detailed example of the memory layout for an integer on pages 94-95, but even with that example I just don't see where the type information is stored. It's clearly not based on address ranges (the raw values are mixed in with the other words), nor the pointer value (all the pointers are stored as 2's complement), nor the 6 unused bits in the machine word.

Suggestions welcome. My best guess is that the example is inaccurate.

LISP 1.5

The LISP 1.5 Programmer's Manual from 1962 explains in a very concise manner how numbers worked in that implementation:

Numbers are still considered to be symbols, and symbols are still marked with -1 as the car. But the standard symbol property list is now gone; instead the symbol is pointing directly to the memory that stores the raw integer value. How does the program know not to follow that pointer as a list? As the document says, that's specified by "certain bits in the tag".

The tag? What's the tag? The IBM 704 had a 36-bit word size but just a 15 bit address space. The words were split (on the ISA level) into a 3 bit "prefix", 15 bit "decrement", 3 bit "tag", and 15 bit "address". Since Lisp values are pointers, only the two 15 bit regions are useful for that. One of the 3 bit regions has been repurposed by the Lisp implementation to mark the pointers to raw data.
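As a small illustration of that word format, here's a Python sketch that packs and unpacks the four fields of a 36-bit word. It assumes the standard 704 layout, reading from the most significant end: prefix, decrement, tag, address. The function names are made up for this example, and nothing here says which of the 3 bit fields LISP 1.5 actually used as its marker.

```python
ADDRESS_MASK = 0o77777    # 15 bits
TAG_MASK = 0o7            # 3 bits
DECREMENT_MASK = 0o77777  # 15 bits
PREFIX_MASK = 0o7         # 3 bits


def make_word(prefix, decrement, tag, address):
    """Pack the four fields into one 36-bit word."""
    return (prefix << 33) | (decrement << 18) | (tag << 15) | address


def split_word(word):
    """Unpack a 36-bit word into (prefix, decrement, tag, address)."""
    address = word & ADDRESS_MASK
    tag = (word >> 15) & TAG_MASK
    decrement = (word >> 18) & DECREMENT_MASK
    prefix = (word >> 33) & PREFIX_MASK
    return prefix, decrement, tag, address
```

The two 15 bit fields are where the names car ("contents of the address part") and cdr ("contents of the decrement part") come from; the two 3 bit fields are the spare room that a Lisp implementation can borrow for flags.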

This is a clear improvement over LISP I, but a number is still represented as an untagged pointer to a tagged pointer to the raw value. Why is the intermediate word there at all, why not go directly with a tagged pointer to the raw value? Maybe code size?

In parallel to that, the address space has now been split into multiple separate pieces, with the cons cells being allocated from a different range of addresses than plain data like numbers and string segments. It could well be that the tagged pointer is irrelevant to the GC, which just makes its decisions on what's a pointer based on whether the pointer is contained in the "full word space" or the "free space". The tags would then be used just for implementing NUMBERP.

Basic PDP-1 LISP

For an L. Peter Deutsch joint, The LISP implementation for the PDP-1 Computer proves to be a surprisingly unsatisfying document. It's almost exclusively user documentation, with no information on the systems architecture. Well, except a full source code listing. Guess we'll have to look at that, then. NUMBERP is the easiest starting point:

/// ("is a number")
/NUMBERP
nmp,    lac i 100
        and (jmp
        sad (jmp
        jmp tru
        jmp fal

The main thing that needs to be known from the rest of the code is that the interpreter stores a pointer to the Lisp value currently being operated on at address 100 (octal).

First "lac i 100" follows the pointer to read the first data word of the value into the accumulator. The next line looks bizarre; due to the way the PDP-1 macro-assembler works, "and (jmp" effectively means "and 600000". So this instruction is masking away all but the top two bits of the accumulator, and "sad (jmp" is checking whether the result of the masking equals octal 600000. It appears that there is nothing special about the pointer to a number, but numbers are identified by having the top two bits set in the pointed-to value.

The next step in understanding the layout is the code for reading the raw value of a number.

/get numeric value
vag,    lio i 100
        cla
        rcl 2s
        sas (3
        jmp qi3
        idx 100
        lac i 100
        rcl 8s
        rcl 8s
        jmp x

"lio i 100" loads the current Lisp value into the IO register. "cla" sets the accumulator to zero. "rcl 2s" then rotates the combination of the IO register and accumulator by 2 bits. The accumulator now contains as its low bits the previous high two bits of the IO register. "sas (3" compares the accumulator to 3; if they're not equal we jump to qi3 (the error routine for "non-numeric arg for arith"). "idx 100" moves the pointer to the next word of the value, and "lac i 100" reads that word into the accumulator. And finally the combination of the two registers is rotated by 16 bits, so that we end up with the raw 18 bit value in the accumulator. Written out step by step the process looks like this:

    . == Bit with value of 0
    ! == Bit with value of 1
    ? == Bit with unknown value
    0-9, A-H == bits of the integer value

    X                    X+1
------------------------------------------------
    [!!23456789ABCDEFGH] [................01]

    IO                   AC
------------------------------------------------
Load IO from address X
    [!!23456789ABCDEFGH] [??????????????????]
Clear AC
    [!!23456789ABCDEFGH] [..................]
Rotate left by 2
    [23456789ABCDEFGH..] [................!!]
Check AC == 3
Load AC from address X+1
    [23456789ABCDEFGH..] [................01]
Rotate left by 8
    [ABCDEFGH..........] [........0123456789]
Rotate left by 8
    [..................] [0123456789ABCDEFGH]
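The same steps can be replayed in Python. This is a simulation sketch, not a transcription of the PDP-1 code: rcl models the combined AC/IO rotate on two 18-bit registers, and encode_number is a hypothetical helper that builds the two-word layout from the diagram, assuming the upper 16 bits of the second word are zero.

```python
MASK18 = 0o777777
MASK36 = (1 << 36) - 1
NUMBER_TAG = 0o600000  # top two bits of an 18-bit word


def rcl(ac, io, count):
    """Rotate the 36-bit AC:IO pair left by count bits."""
    combined = (ac << 18) | io
    combined = ((combined << count) | (combined >> (36 - count))) & MASK36
    return combined >> 18, combined & MASK18


def is_number(first_word):
    """NUMBERP: both of the top two bits of the first word are set."""
    return (first_word & NUMBER_TAG) == NUMBER_TAG


def encode_number(value):
    """Hypothetical helper: split an 18-bit value across two words."""
    first = NUMBER_TAG | (value & 0o177777)  # tag + low 16 bits
    second = (value >> 16) & 0o3             # high 2 bits
    return first, second


def decode_number(first, second):
    """Follow the vag routine step by step."""
    io = first                   # lio i 100
    ac = 0                       # cla
    ac, io = rcl(ac, io, 2)      # rcl 2s: tag bits move into AC
    if ac != 3:                  # sas (3
        raise ValueError("non-numeric arg for arith")
    ac = second                  # idx 100 / lac i 100
    ac, io = rcl(ac, io, 8)      # rcl 8s
    ac, io = rcl(ac, io, 8)      # rcl 8s
    return ac
```

Round-tripping any 18-bit value through encode_number and decode_number gives the value back, and a first word without both tag bits set takes the error path.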

Clearly an integer is now represented by a pointer to a two-word block with a special tag in the high bits of the first word. This implementation got rid of the extra layer of indirection in LISP 1.5; an integer is now just a pointer to tagged data. But we're still left with the storage of a one-word integer requiring three words.

Why use a layout that requires shuffling data around this much, instead of just having the tag in X and the raw value in X+1? It seems awfully inconvenient. My best guess is that the top 1-2 bits of the second word are reserved for the GC, e.g. for use as mark bits. But understanding exactly how the GC works is maybe a project for another day.

M 460 LISP

Before starting research for this article, I'd never heard of the early Lisp implementation for the Univac M 460. A description of the system can be found in the 1964 collection The programming language LISP: Its operation and applications.

Numbers and print names are placed in free storage using the device that sufficiently small (i.e., less than 2^10) half-word quantities appear to point into the bit table area and so don't cause the garbage collector any trouble. A number is stored as a list of words (a flag-word and from 1 to 3 number words, as required), each number word containing in its CAR part 10 significant bits and sign. Thus an integer whose absolute value is less than 2^11 will occupy the same amount of storage (2 words) as in 7090 LISP 1.5.

This is another bit of progress! The key insight on the road to tagged pointers is that invalid parts of the address space can be used to distinguish between pointers and immediate data. Another important insight in this paper is that most numbers in a program are going to be small, so it might make sense to have variable representations for numbers of different magnitude. But it's not a full realization of the concept yet: immediate small numbers are not accessible directly by the user. They are internal to the implementation, used as a building block for boxed integers of various levels of inefficiency.

The paper gets even better once we get a few more pages in, since for characters M 460 Lisp does take that final step:

Each character in the character set available on the M 460 (including tab, carriage return, and others) is represented internally by an 8-bit code (6 bits for the character (up to case), 1 bit for case, and 1 bit for color). To facilitate the manipulation of character strings within our LISP system, we permit such character literals to appear in list structure as if they were atoms, i.e. pointers to property lists. These literals can, where necessary, be distinguished from atoms since they are less than 2^8 in magnitude and hence, viewed as pointers, don't point into free storage (where, as in 7090 LISP, property lists are stored). The predicate charp simply makes this magnitude test.

That's about as clear a case of embedding immediate data in pointers as it gets. It's just that the tag is rather large (22 highest bits, rather than the 1-4 lowest bits you'd expect today). And it's also dealing with characters rather than numbers, so let's carry on with the investigation a bit longer.
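The magnitude test is simple enough to write down. A minimal Python sketch, where the function names and the example pointer values are invented for illustration:

```python
CHAR_LIMIT = 2 ** 8  # character literals are 8-bit codes


def charp(pointer):
    """True if the value is a character literal rather than a real pointer."""
    return 0 <= pointer < CHAR_LIMIT


def split_literals(list_structure):
    """Separate character literals from pointers in a flat list of values."""
    literals = [p for p in list_structure if charp(p)]
    pointers = [p for p in list_structure if not charp(p)]
    return literals, pointers
```

Nothing needs to be dereferenced to make the decision, which is the whole point of an immediate value.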

PDP-6 LISP

The June 1966 report on PDP-6 LISP has the following to say on integers:

Fixed-point numbers >= 0 and < about 4000 are represented by a "pointer" 1 greater than their value, and no additional list structure. All other numbers use a pointer to full-word space as part of an atom header with a FIXNUM or FLONUM indicator.

This is starting to get close to the modern fixnum, except for no facility for immediate negative numbers and a tiny range. (This is a machine with 36 bit words and 18 bit pointers; one would hope for a bit more than 12 bits for immediate integers).
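Here's what that encoding amounts to, as a Python sketch. The report only says "about 4000", so the SMALL_LIMIT constant below is a stand-in rather than the real boundary, and the function names are made up.

```python
SMALL_LIMIT = 4000  # assumption: the report only says "about 4000"


def encode_fixnum(n):
    """Return the immediate 'pointer' for n, or None if n needs boxing."""
    if 0 <= n < SMALL_LIMIT:
        return n + 1
    return None


def is_immediate(pointer):
    """True if the pointer value is in the range used for small numbers."""
    return 1 <= pointer <= SMALL_LIMIT


def decode_fixnum(pointer):
    """Recover the number from an immediate pointer."""
    if not is_immediate(pointer):
        raise ValueError("not an immediate fixnum")
    return pointer - 1
```

Negative numbers and anything at or above the limit fall through to the boxed FIXNUM representation, which this sketch does not model.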

BBN LISP

Structure of a LISP system using two-level storage is a wonderful systems design paper from November 1966, describing BBN LISP for a PDP-1 with 16K of core memory, 88K of absurdly slow drum memory, and no hardware paging support. How do you make efficient use of the drum memory? By some clever data layout, software-driven paging, and a locality-optimizing memory allocator.

So it's actually a paper I thought was totally worth reading just for its own sake. But for the purposes of this post, this is the money quote:

LISP assumes that it is operating in an environment containing 128K words, that is from 0 to 400,000 octal. Only 88K actually exist on the drum. The remaining portion of the address space is used for representation of small integers between -32,767 and 32,767 (offset by 300,000 octal), as described below.

The paper describes a machine with both an 18-bit word size and address space, with 16-bit signed fixnums embedded in the pointers. That's about as good as it gets. (Though not quite optimal; they're using bit 17 as the integer tag, but what happened to bit 18? The paper doesn't say, but odds are that it's again a GC mark bit).
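In code the scheme from the quote looks roughly like this. It's a Python sketch under the assumption that any pointer with the 200,000 (octal) bit set is a small integer; the names are invented for illustration.

```python
SMALL_INT_OFFSET = 0o300000  # the representation of zero
SMALL_INT_TAG = 0o200000     # set for every small-integer pointer
MAX_SMALL_INT = 32767


def encode_small_int(n):
    """Map an integer in [-32767, 32767] to a pointer value."""
    if not -MAX_SMALL_INT <= n <= MAX_SMALL_INT:
        raise ValueError("integer too large for the immediate representation")
    return SMALL_INT_OFFSET + n


def is_small_int(pointer):
    """Test the tag bit; no memory access is needed."""
    return (pointer & SMALL_INT_TAG) != 0


def decode_small_int(pointer):
    """Undo the offset to recover the signed value."""
    return pointer - SMALL_INT_OFFSET
```

The encoded values run from 200,001 to 377,777 octal, so they all sit in the upper half of the 0 to 400,000 address range and never collide with a pointer into the lower half.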

The particularly observant reader might have noticed that this machine had 104K words of physical memory, but the described tagging scheme only leaves 64K words addressable. What's up with that? On one level it's exactly what M 460 LISP and PDP-6 Lisp were doing: that 40K of address space stores things that can't be directly pointed to from another Lisp value. But those other implementations were just opportunistically reusing the parts of address space that contained native code.

By contrast, BBN LISP carefully arranged for there to exist as much of such storage as possible, and for it to be located above the address 200,000 (octal).

The most clever example of that is the representation of symbols. The first implementations we saw just implemented symbols as a list of properties indexed by name (e.g. name, value cell, function cell, etc). An obvious optimization is to allocate a symbol as a single larger block of memory with fixed slots for the most common properties, and a generic property list slot to contain anything else.

What BBN Lisp does instead is allocate a symbol in multiple separate blocks rather than a single contiguous one. A pointer to the symbol will point to the block of value cells, so reading the value cell is trivial. What if you want to read another property, e.g. the function? We look at the offset of the value cell pointer to the start of the value cell block, and access the function cell block at the same offset. In modern parlance it ends up as a structure-of-arrays layout rather than an array-of-structures.

In addition to getting more address space for fixnums, they also got exactly the same kind of locality improvements that a structure-of-arrays would be used for today. So it was an all-around neat optimization.
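A toy Python model of that layout, to show the offset arithmetic. The base addresses and the dictionary standing in for memory are hypothetical; the paper's actual block addresses are not reproduced here.

```python
VALUE_BLOCK_BASE = 0o100000     # hypothetical start of the value cells
FUNCTION_BLOCK_BASE = 0o240000  # hypothetical start of the function cells

memory = {}  # address -> contents, standing in for core and drum


def define_symbol(index, value, function):
    """Store both cells of symbol number `index`; return the symbol pointer."""
    pointer = VALUE_BLOCK_BASE + index
    memory[pointer] = value
    memory[FUNCTION_BLOCK_BASE + index] = function
    return pointer


def value_cell(symbol_pointer):
    """The symbol pointer is the address of the value cell itself."""
    return memory[symbol_pointer]


def function_cell(symbol_pointer):
    """Same offset, different block."""
    offset = symbol_pointer - VALUE_BLOCK_BASE
    return memory[FUNCTION_BLOCK_BASE + offset]
```

Only the value cell block has to live in directly addressable space; the function cell block can sit in the region that ordinary pointers never refer to.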

There is also an early design document for BBN 940 LISP from almost the same time as the above paper. It appears to describe the kind of elaborate tagging scheme that a modern Lisp might use, and places the tags in the low bits where they're easier to test for/eliminate. And they even call heap-allocated numbers "boxed"! I had no idea this terminology was in use 50 years ago. The relevant section:

There will be a maximum of 16 pointer types of objects in the 940 LISP System. These are (numbered in octal)

00. S-expressions (nonatomic)
01. Identifiers (literal atoms)
02. Small Integers
03. Boxed Large Integers
04. Boxed Floating Point Numbers
05. Compiled Function - Lambda Type
06. Compiled Function - Lambda Type - Indef Args
07. Compiled Function - Mu Type - Args Paired
10. Compiled Function - Mu Type - List of Args
11. Compiled Function - Macro
12. Array - Pointers
13. Array - Integers
14. Array - FP #s
15. Strings - Packed Character Arrays
16.
17. Pushdown List Pointers

Each pointer will be contained in one 940 word of 24 bits. Bits 0 and 1 will be nominally empty, and may in some cases be used by the system (e.g. bit 0 for garbage collection) or perhaps even the user (in S-expressions). The four bits 2-5 will contain the type number for this pointer. The 18 bits 6-23 will contain an effective address (in the LISP drum file) where the referenced information is stored.

It looks like they ended up not using this design for BBN 940 LISP, and it instead uses an extended version of the segmented memory scheme from the PDP-1 implementation described earlier in this section. But even if these particular bits weren't practical to use with that hardware, at this point just about all the ideas for tagged pointers have definitely been invented.

Conclusion

The initial LISP I implementation in 1960 had the least efficient implementation of numbers this side of Church numerals, where even just getting the value might imply chasing half a dozen pointers. But new implementations optimized that layout aggressively. By 1964, the M 460 LISP implementation had arrived at the general solution of using pointers to invalid parts of the address space for storing immediate data, but user-accessible integers were still boxed; the only use for the unboxed integers was as an internal building block. In 1966 PDP-6 LISP applied the idea of tagged immediate data to tiny positive integers, and the PDP-1 based BBN LISP took the idea to the logical conclusion, and allowed immediate storage of integers of almost the full machine word.

I would not have guessed that these optimizations were discovered and applied so early and so aggressively. It's also noteworthy that this was independent of the machine word size, address space size, and addressing mode of the machine. The first fully fledged implementation I found was on a machine with 18 bit words, 18 bits of address space, and word-addressing. That should have been just about the worst case!

There's an interesting tangent with how MacLISP ended up reversing this progress in the '70s and going back to boxed integers, since they wanted to have just a single integer representation. I won't go into the details since this post already grew longer than intended. But for those interested in the subject AI Memo 421 is a fun read.

Was the technique definitely first used in Lisp? These implementations are early enough that there aren't a ton of other possibilities. The only ones I can think of would be APL and Dartmouth BASIC. If anyone can find documentation on earlier uses of storing immediate data in tagged pointers, please let me know and I'll edit the article.

Use cases for CHANGE-CLASS in Common Lisp

jsnell@iki.fi — Mon, 27 Jul 2015 21:00:00 GMT

This is a post on use cases for Common Lisp's CHANGE-CLASS operation [0]. As the name suggests, it changes the class of an object without changing its object identity. It's an operation that a certain class of programmers would consider totally abhorrent. I think it's both cool and useful.

As far as I can see, the class of an instance has three effects in Common Lisp. It determines the set of slots the object has, it determines which methods will be executed when a generic function is called with that object as one of the arguments, and it determines how the object interacts with the rest of the system based on the metaclass of the class of the object.

Why change the class at runtime?

Why would you change the class of an object rather than create a new object as a replacement? Because there might be references to the object all over the place, and updating all of those references to point to the new object might be a lot of work or even impossible.

And why not just create the object with the right class in the first place? Sometimes it's because the object is not created by the application code but in the depths of some library, and changing that library is not feasible. At other times it's because the appropriate class of the object genuinely changes during execution.

And why not use a workaround like some kind of a delegating proxy object instead? Both of the above reasons kind of apply there.

Adding new slots to a class

I recently had to go and change something in the code that runs my blog, for the first time in years and years. Now, this is some crufty code. How crufty, you ask? Well... It runs a web server that was last updated 10 years ago [1]. While spelunking through the code, I found a bit of code that looked essentially like this:

  (defclass blog-request (araneida:request)
    ((db :initarg :db :accessor db-of)
     (buffer-stream :initarg :buffer-stream :accessor buffer-stream-of)))

  (defmethod araneida:handle-request :around ((handler blog-handler) request)
    (clsql:with-database (db *db-spec* :if-exists :new)
      (let ((string (with-output-to-string (stream)
                     (change-class request 'blog-request
                                   :db db
                                   :buffer-stream stream)
                     (call-next-method))))
        (write-string string (araneida:request-stream request)))))

What's going on here? Well, we're hooking around the HANDLE-REQUEST generic function of the web server, setting up a couple of state objects (a database connection, a STRING-OUTPUT-STREAM). We then proceed with normal execution of the request handling with CALL-NEXT-METHOD, and write the data that's been buffered in the STRING-OUTPUT-STREAM into the normal output stream.

The core problem here is that the state data needs to be threaded down the call stack to where it's actually used. Since we're doing all of this from the middle of third party code, changing the function signatures is not an option. So we change the class of the web server's request object from REQUEST to BLOG-REQUEST (a subclass), and stuff the state objects into the slots that have now appeared in the object.
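For readers who don't speak CLOS, the same trick can be approximated in Python by reassigning an object's __class__. This is only an analogy, not a translation of the blog code: change_class is a hypothetical helper, the class names mirror the ones above, and Python has none of the protocol that CHANGE-CLASS runs through in Common Lisp.

```python
class Request:
    """Stand-in for the web server's request class."""

    def __init__(self, stream):
        self.stream = stream


class BlogRequest(Request):
    """Subclass whose instances carry extra per-request state."""


def change_class(obj, new_class, **slots):
    """Switch obj to new_class in place and fill in the new slots."""
    obj.__class__ = new_class
    for name, value in slots.items():
        setattr(obj, name, value)
    return obj


def deep_in_library_code(request):
    """Code far down the call stack can now reach the extra state."""
    return request.db


request = Request(stream="client-socket")
other_reference = request  # e.g. held somewhere inside the server
change_class(request, BlogRequest, db="db-connection", buffer_stream=[])
```

Because the object identity is unchanged, other_reference sees the new class and the new slots without anyone having to update it.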

The natural way of writing this in Common Lisp would probably be to use special variables [2]. I think the reason I didn't go that route was that way back when I was not running each request in a separate thread, but was using SERVE-EVENT, SBCL's rather bizarre recursive event loop which really doesn't play together well with special variables. But it's also not always the case that the lifetime of the additional data is determined by a particular dynamic extent.

Another typical solution for attaching extra data to an object would be storing the extra information in a weak-keyed hash table with the objects as a key, and making that hash-table accessible in all of the places where this extra data is needed (most likely as a global variable). As far as I'm concerned, that's just gross.

Is there a converse situation where you'd want to CHANGE-CLASS to remove some slots from an object? I can't really think of a plausible case. It might be a side effect from changing the object to be an instance of a class that isn't a subclass of the original class. But never the actual goal since the amount of memory you'd save from having fewer slots would be minuscule.

Modifying method dispatch

A more obvious use for CHANGE-CLASS is a need to manipulate method dispatch. An example I like for this is the intermediate data representation of a compiler. Consider the representation of a variable binding. The binding could be for example constant vs. modified, or totally local lexical binding vs lexical binding closed over by a function vs. dynamic binding.

The compiler is going to need to treat the binding objects of the intermediate representation in very different ways depending on exactly what kind of variable this is. A variable binding that's never changed and that's known to contain an immutable value can be trivially constant-folded. A closed over and potentially modified variable will need some extra code to allocate some memory in which the variable is stored. And the code that's generated for any of the variable references (both reads and writes) will need to be different as well.

Now, the funny thing is that these binding objects can change their state multiple times during compilation. If dead code elimination ends up removing the last read from a variable, that binding becomes dead. A variable can bounce from non-closed over to closed over as a closure is discovered, back to non-closed as it's proven that the closure can't escape after all. And so on. It'd just be infeasible to generate all of the objects with the correct class up front. And these objects are going to be referenced willy-nilly from all over the IR tree, making replacing references truly annoying.

One way of representing this is to have a bunch of state flags in the compiler's binding objects. But then you have to implement the specializations as a bunch of conditionals in large functions, rather than by having the specialized behavior in separate methods and relying on method dispatch to sort things out. I know which form of organizing code I prefer [3].
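A small Python sketch of the dispatch-based version, again using __class__ reassignment as a rough stand-in for CHANGE-CLASS. The binding classes, the emit_read method and the pseudo-instructions it returns are all invented for this example.

```python
class Binding:
    """Common base for every kind of variable binding in the IR."""

    def __init__(self, name):
        self.name = name


class LocalBinding(Binding):
    def emit_read(self):
        return f"load-local {self.name}"


class ClosedOverBinding(Binding):
    def emit_read(self):
        return f"load-from-closure-cell {self.name}"


def discover_closure(binding):
    """An analysis pass found a closure capturing this variable."""
    binding.__class__ = ClosedOverBinding


def prove_no_escape(binding):
    """A later pass proved the closure cannot escape after all."""
    binding.__class__ = LocalBinding


x = LocalBinding("x")
ir_references = [x, x, x]  # the same object referenced all over the IR
before = [ref.emit_read() for ref in ir_references]
discover_closure(x)
after = [ref.emit_read() for ref in ir_references]
```

Every reference in ir_references picks up the new behavior at once, and no code generation function needs a conditional on the binding's kind.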

Switching metaclasses

Using CHANGE-CLASS to switch the class to a hierarchy that's based on a different metaclass is where I start drawing a blank. Unlike the other two cases I haven't ever felt the need to do that myself so it's harder to spin a convincing story. The best I can do is go through some typical uses of non-standard metaclasses, and think about whether there could be any reason to change between them and normal classes. Here's some broad categories of features you could change:

Changing the slot storage representation
Changing slot access in some other way
Changing the code generated for accessors
Adding new metadata to slot definitions or class definitions
Changing the method dispatch resolution in some fundamental way, for example using C3 class hierarchy linearization

And as for how you'd achieve something useful with one of these features:

Perhaps the most prototypical use of metaclasses is persistent objects - for example an object-relational mapping library, but it could be a real object database too. Why do you need a custom metaclass for this?

One reason is lazy initialization of some or all slots. When you load an object from the database, you don't necessarily want to load all the data up front. Some of it might trigger the loading of arbitrarily deep graphs of other persistent objects, which is expensive. To do this you only want to fetch the value of these slots when their value is read the first time. This doesn't look like a compelling case for CHANGE-CLASS; why would we change our already existing fully initialized instance into one of these lazily initialized objects?

Alternatively you might want to attach extra information to slot descriptors for describing how the data is to be persisted. What's the SQL datatype of this field? Is it part of the primary key? Are there any foreign key constraints? It'd definitely be reasonable to CHANGE-CLASS an instance of USER to USER* given the following definitions:

  (defclass user ()
    ((uid :accessor uid-of :initarg :uid)
     (username :accessor username-of :initarg :username)
     (password-hash :accessor password-hash-of :initarg :password-hash)))

  (defclass user* ()
    ((uid :accessor uid-of :initarg :uid :primary-key t :sql-datatype 'integer)
     (username :accessor username-of :initarg :username
               :unique t :sql-datatype 'text)
     (password-hash :accessor password-hash-of :initarg :password-hash
                    :sql-datatype 'text))
    (:metaclass db-object)
    (:sql-table "user"))

But... This doesn't explain why you'd ever end up with a USER instead of a USER* in the first place. Persisting objects that you didn't create but that were injected into your program by a library seems very odd.

Another textbook example of non-standard metaclasses is alternative slot representations. Instead of an instance being essentially a vector of slot values, it could be a hash-table mapping slot names to values. The benefit here would be more space-efficient storage for sparse objects; a class with hundreds of slots most of which never get initialized. Could you want to swap back and forth between the normal and the sparse representation? Maybe, but then you'd just implement a metaclass that automatically chooses the right representation and switches between them behind the scenes. There's no point in forcing the user to switch between the representations manually. This doesn't feel plausible either.

One last try: Pascal Costanza's ContextL library extends the support for dynamic binding in the language, allowing not only dynamically binding variables but also doing it for functions (including autogenerated accessor methods) and slot values. The way this kind of extension would be implemented in a threadsafe manner is by indirecting the function calls and slot accesses through a special variable. Which is to say the slot access protocol needs to be reimplemented, and maybe the autogeneration of accessors too. Obviously this needs a new metaclass!

And could you want to change a standard object to one supporting dynamic binding? That's actually pretty plausible. A framework injects some object whose behavior you need to customize on a very fine-grained level. Dynamically scoped functions seem like a good tool for that. But it's still pretty hand-wavey.

Does anyone have a more concrete example for using CHANGE-CLASS primarily in order to switch to a different metaclass?

Footnotes

[0] Analogous operations are of course available in a bunch of languages. The key difference to my mind is that in Common Lisp, as with many other dynamic features, there's a well-defined protocol for customizing exactly how the feature works and how it's configured and extended.

[1] Woo, Araneida for the win! Old code never dies.

[2] Where "special variable" is actually a technical concept, perhaps one of the worst named ones in the world :-) Essentially a thread-local dynamically scoped global variable, though there's a couple of extra warts.

[3] Fans of statically typed functional programming languages with pattern matching obviously have the opposite preference. What I find interesting is that a lot of the impetus for CHANGE-CLASS comes from wanting to preserve object identity and not needing to update all the references to the object. The first is a non-issue in functional programming, the second is something you need to do anyway.

A Monte Carlo simulation of Red7

jsnell@iki.fi — Mon, 30 Mar 2015 14:00:00 GMT

Red7 is a very clever little card game, and one of my favorite 2014 releases. But I have wondered about the density of meaningful decisions in the game. Sometimes it doesn't feel like you have all that much agency, and are just hanging on in the game with a single valid move every time it's your turn.

So here's some automated exploration of what a game of Red7 actually looks like from a statistical point of view. The method used here is a pure Monte Carlo simulation, with the players choosing randomly from the set of their valid moves.

Why a Monte Carlo simulation? I started out trying to generate the full game tree for a given starting setup, but to my surprise the tree is too large for that to be feasible: even after a lot of optimization, a single two player game would take 2 weeks of computation. The branching factor is just much bigger than it feels like when playing the game.

The rules

(Skip this section if you're already familiar with the game. All you need to know is that we're using the advanced version of the game but without the optional special action rules.)

The rules of the game are very simple. There's a deck of 49 cards (7 colors, numbers 1-7 in each color). In the middle is a discard pile ("canvas"). The color of the topmost card of the discard pile determines the victory condition. You must be "winning" at the end of each turn you take, or you're out of the game.

There are three options to choose from on your turn. Play a card from your hand to the table in front of you (your "palette"), discard a card from your hand to the canvas, or first play a card and then discard a card. If you discard a card with a number higher than the number of cards in your palette, you get to draw a card.

The winning condition is determined based on the color of the canvas (i.e. top card in discard pile):

Red	Highest card
Orange	Most cards of the same number
Yellow	Most cards of the same color
Green	Most even cards
Blue	Most different colors
Indigo	Longest run of sequential numbers (e.g. 4/5/6)
Violet	Most cards with a number lower than 4

If two players are tied for the winning condition (e.g. the rule is green and both of them have three even cards in their palette), the winner is the player who had a higher card included in their card combination (cards that didn't contribute to the winning condition are ignored for the tie breaker). This is primarily based on the numeric value of the card. But if two cards have the same value, the one closer to red in the spectrum wins the tie (e.g. green 5 > indigo 5 > green 4).

The implementation

(Ignore this section if you're not interested in the programming, and skip straight on to the results).

I suspect that every Common Lisp program will eventually evolve to using a clever bit-packing of fixnums as its primary data structure. That's the case here as well.

Cards

A card is an integer between 0 and 55 (inclusive). The low 3 bits are the color, with a 0 being a dummy color that's not used for anything, 1 for violet going all the way to 7 for red. The next 3 bits are the card's numeric value minus one (0-6). Note that with this representation determining the higher of two cards is simply a matter of making an integer comparison.

(deftype card () '(mod 56))

(defun card-color (card)
  (ldb (byte 3 0) card))

(defun card-value (card)
  (1+ (ash card -3)))
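A small sketch of the encoding in action. MAKE-CARD is a hypothetical helper (it's not part of the code above) that packs a value and a color into the layout just described. The tie-breaking order from the rules then falls out of a plain integer comparison.

```lisp
;; Accessors repeated from above so that the example runs on its own.
(defun card-color (card) (ldb (byte 3 0) card))
(defun card-value (card) (1+ (ash card -3)))

;; Hypothetical constructor: VALUE is 1-7, COLOR is 1 (violet) to 7 (red).
(defun make-card (value color)
  (logior (ash (1- value) 3) color))

(let ((green-5  (make-card 5 4))
      (indigo-5 (make-card 5 2))
      (green-4  (make-card 4 4)))
  ;; Higher value wins; on equal values the color closer to red wins.
  (assert (> green-5 indigo-5 green-4))
  (list (card-value green-5) (card-color green-5)))   ; => (5 4)
```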

We'll also need a way to represent a set of cards, for a player's hand or palette. We're going to use a 56-bit integer for that, with bit X being 1 if the set contains card X.

(deftype card-set () '(unsigned-byte 56))

Adding and removing cards is simple. (Except how annoying is it that SETF LOGBITP is not specified in the standard?).

(defun remove-card (card card-set)
  (logandc2 card-set (ash 1 card)))

(defun add-card (card card-set)
  (logior card-set (ash 1 card)))

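;; Create a new set from a list of cards. With :FROM-END T, REDUCE calls
;; ADD-CARD as (add-card card set-so-far), matching its argument order.
(defun make-card-set (cards)
  (reduce #'add-card cards :from-end t :initial-value 0))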

We'll also need to be able to iterate through all the cards in a set. This is most easily achieved by using INTEGER-LENGTH to find the highest bit currently set, executing the loop body, clearing out the highest bit, and carrying on.

(defmacro do-cards ((card card-set) &body body)
  (let ((modified-set (gensym)))
    `(loop with ,modified-set of-type card-set = ,card-set
           until (zerop ,modified-set)
           for ,card = (1- (integer-length ,modified-set))
           do (setf ,modified-set (remove-card ,card ,modified-set))
           do ,@body)))
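To see the set operations and the iteration order together, here's a minimal standalone sketch. ADD-CARD and REMOVE-CARD are repeated from above; CARDS-IN-SET is a hypothetical helper that uses the same "peel off the highest bit" idea as DO-CARDS, written as a plain function. The cards come out in descending order no matter how the set was built.

```lisp
(defun add-card (card card-set) (logior card-set (ash 1 card)))
(defun remove-card (card card-set) (logandc2 card-set (ash 1 card)))

;; Hypothetical helper: list the members of a set, highest card first.
(defun cards-in-set (card-set)
  (loop with remaining = card-set
        until (zerop remaining)
        collect (let ((card (1- (integer-length remaining))))
                  (setf remaining (remove-card card remaining))
                  card)))

(let ((hand (add-card 12 (add-card 55 (add-card 31 0)))))
  (assert (logbitp 31 hand))
  (assert (not (logbitp 31 (remove-card 31 hand))))
  (cards-in-set hand))   ; => (55 31 12)
```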

Scoring

With these primitives we can then write a very fast function to determine who is currently winning the game. We'll base this evaluation function on scoring a combination of a palette + rule, and comparing the score that each player gets with the current rule. This is a much better way than trying to directly compare the palettes. If you're caching this evaluation function, you get a much higher cache hit rate when the cache key depends only on the state of one player rather than a combined state of two players. (I'm also pretty sure that given this data layout, computing a score will be faster than any kind of direct comparison).

Let's start off with the general structure, and fill in the details as functions under LABELS afterwards. So given a card-set and a color, we'll return a score for that set:

(defun card-set-score (card-set type)
  (labels (...)
    (ecase type
      (7 (red))
      (6 (orange))
      (5 (yellow))
      (4 (green))
      (3 (blue))
      (2 (indigo))
      (1 (violet)))))

Red (highest card) is trivial. We just find the highest card in the set with a call to INTEGER-LENGTH.

           (red ()
             (integer-length card-set))

For other rules we can make good use of the following helper function. It matches the set against a bitmask, and returns a score based on the number of bits that are set both in the set and the mask (main part of score) which we get with LOGCOUNT, as well as the highest bit set in both (the tiebreaker). Given this definition, most of the scoring types can be written in a very concise manner:

           (score-for-mask (mask)
             (let* ((matching-cards (logand card-set mask))
                    (match-count (logcount matching-cards))
                    (best-matching-card (integer-length matching-cards)))
               (+ best-matching-card (* 64 match-count))))

For orange (cards of one number) we start with a bitmask that matches all bits corresponding to a card with the value 7. We compute the score for that mask, then shift the mask right by 8 bits such that it covers the cards with the value 6. Repeat 7 times, and find the maximum score. (We don't need to know which iteration produced the highest score, only what the score was).

           (orange ()
             (loop for mask = #xff000000000000 then (ash mask -8)
                   repeat 7
                   maximize (score-for-mask mask)))

Yellow (most cards of the same color) is very similar. We start off with a bitmask that matches all the red cards (so bit 55, 47, 39, etc) and compute the score. Then shift it right by one, such that the mask matches all orange cards instead. Again repeat 7 times and maximize.

           (yellow ()
             (loop for mask = #x80808080808080 then (ash mask -1)
                   repeat 7
                   maximize (score-for-mask mask)))

Green (most even cards) and violet (most cards under 4) are trivial; we can just score a single mask matching the even cards for green, all cards of value 1, 2 or 3 for violet.

           (green ()
             (score-for-mask #x00ff00ff00ff00))
           (violet ()
             (score-for-mask #x00000000ffffff))
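As a worked example of the mask scoring, here's a standalone variant of the helper that takes the card set as an explicit argument instead of closing over it. The name +EVEN-CARDS+ and the two sample palettes are made up for illustration. Both palettes have two even cards, so the count part of the score ties, and the palette whose best even card is higher ends up with the larger score.

```lisp
;; Standalone variant of SCORE-FOR-MASK with the set as a parameter.
(defun score-for-mask (card-set mask)
  (let ((matching (logand card-set mask)))
    ;; INTEGER-LENGTH is at most 56 here, so it can never outweigh
    ;; a difference of one in the card count (worth 64).
    (+ (integer-length matching)
       (* 64 (logcount matching)))))

(defparameter +even-cards+ #x00ff00ff00ff00)

;; Palette A: green 2 (card 12), red 4 (card 31), blue 7 (card 51).
;; Palette B: yellow 2 (card 13), violet 6 (card 41).
(let ((palette-a (logior (ash 1 12) (ash 1 31) (ash 1 51)))
      (palette-b (logior (ash 1 13) (ash 1 41))))
  (list (score-for-mask palette-a +even-cards+)    ; 2 * 64 + 32
        (score-for-mask palette-b +even-cards+)))  ; 2 * 64 + 42
;; => (160 170)
```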

Blue (most cards of different colors) is where we get into unintuitive territory. Let's start with the tiebreaker; it's obviously guaranteed that the highest card in the palette as a whole can be included in this winning set, so we can just use INTEGER-LENGTH on the whole set the same way we did for the red scoring rule.

To get the number of different colors, we will fold the cardset multiple times. First we'll do a bitwise OR of the high 32 bits and the low 32 bits. Then we'll OR bits 0-15 of that result with bits 16-31. And finally one more OR of bits 0-7 with 8-15. The low 8 bits are now such that bit 7 is set if any of the "red" bits in the original were set, bit 6 if any of the "orange" bits, etc. We can then just use LOGCOUNT on that byte to get the number of colors present in the palette, and combine it together with the tiebreaker score computed above.

           (blue ()
             (let* ((palette card-set)
                    (best-card (integer-length palette)))
               (setf palette (logior palette (ash palette -32)))
               (setf palette (logior palette (ash palette -16)))
               (setf palette (logior palette (ash palette -8)))
               (+ best-card
                  (* 64 (logcount (ldb (byte 8 0) palette))))))
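The folding step is easier to follow on a concrete palette. The sketch below pulls it out into a hypothetical COLOR-PRESENCE function and prints the folded byte in binary, for a palette of red 7, red 1 and blue 3.

```lisp
(defun color-presence (card-set)
  "Fold the seven value bytes of CARD-SET into one byte.
Bit C of the result is set if any card of color C is present."
  (let ((folded card-set))
    (setf folded (logior folded (ash folded -32)))
    (setf folded (logior folded (ash folded -16)))
    (setf folded (logior folded (ash folded -8)))
    (ldb (byte 8 0) folded)))

;; red 7 = card 55, red 1 = card 7, blue 3 = card 19
(let ((palette (logior (ash 1 55) (ash 1 7) (ash 1 19))))
  (format t "~8,'0b~%" (color-presence palette))   ; prints 10001000
  (logcount (color-presence palette)))              ; => 2
```

Three cards, but only two bits survive the fold: bit 7 for red and bit 3 for blue.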

Finally, there's indigo (longest straight). There does not appear to be any clever bit manipulation trick to compute this quickly (if you can think of one, please let me know!). We need to iterate through the cards in order of descending value, ignore any consecutive cards with the same number, and reset our scoring computation when the straight gets interrupted by a missing number.

           (indigo ()
             (let ((prev nil)
                   (current-run-score 0)
                   (best-score 0))
               (declare (type (unsigned-byte 16) current-run-score best-score))
               (do-cards (card card-set)
                 (cond ((not prev)
                        (setf current-run-score card)
                        (setf prev card))
                       ((= (card-value card) (card-value prev)))
                       ((= (card-value card) (1- (card-value prev)))
                        (incf current-run-score 64)
                        (setf prev card))
                       (t
                        (setf current-run-score card)
                        (setf prev card)))
                 (setf best-score (max best-score current-run-score)))
               best-score))
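One partial idea for a bit trick, offered as an untested-in-anger sketch rather than a replacement for the loop above: collapse the set to a 7-bit "which values are present" mask, then repeatedly AND the mask with itself shifted by one. Each round shortens every run of set bits by one, so the number of rounds until the mask is empty is the length of the longest run. This only gives the run length; it does not compute the tiebreaker card, so it isn't a drop-in scoring function. VALUE-PRESENCE and LONGEST-RUN-LENGTH are made-up names.

```lisp
(defun value-presence (card-set)
  "7-bit mask: bit V-1 is set if CARD-SET has any card of value V."
  (loop for v from 0 below 7
        when (plusp (ldb (byte 8 (* 8 v)) card-set))
          sum (ash 1 v)))

(defun longest-run-length (card-set)
  "Length of the longest run of consecutive values in CARD-SET."
  (loop for runs = (value-presence card-set)
          then (logand runs (ash runs -1))
        until (zerop runs)
        count t))

;; Cards with values 1, 4, 5, 5 and 6: the longest run is 4/5/6.
(let ((palette (reduce #'logior
                       (mapcar (lambda (card) (ash 1 card))
                               '(3 25 34 37 47)))))
  (longest-run-length palette))   ; => 3
```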

Players

A player is defined as a normal structure, with the only oddity being that they form a circular linked list using the NEXT slot. This tends to be more convenient for iterating through players in turn order than keeping them stored in an external collection of some sort.

(defstruct (player)
  (id 0 :type (mod 5))
  eliminated
  (hand 0 :type card-set)
  (palette 0 :type card-set)
  (score-cache (make-array 16) :type (simple-vector 16))
  (next nil :type (or null player)))
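For illustration, here's roughly how such a ring can be built and walked. SEAT is a cut-down stand-in for the PLAYER structure, and MAKE-RING is a hypothetical helper, not the setup code from the actual program.

```lisp
;; Simplified stand-in for PLAYER: just an id and the NEXT link.
(defstruct (seat)
  (id 0 :type (mod 5))
  (next nil :type (or null seat)))

(defun make-ring (n)
  "Create N seats linked into a circle and return the first one."
  (let* ((seats (loop for id below n collect (make-seat :id id)))
         (head (first seats)))
    ;; Link each seat to its successor; the last one wraps around.
    (loop for (seat next) on seats
          do (setf (seat-next seat) (or next head)))
    head))

;; Walking the ring in turn order never runs off the end.
(let ((start (make-ring 3)))
  (loop for seat = start then (seat-next seat)
        repeat 7
        collect (seat-id seat)))   ; => (0 1 2 0 1 2 0)
```

One practical wart: printing a circular structure at the REPL will loop forever unless *PRINT-CIRCLE* is bound to T.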

The core operation of generating a list of valid moves is deciding whether the player is winning the game after a move is made. When doing this we'll end up repeatedly evaluating the scores for the same palettes over and over again. To speed this up, there's a minimal cache; for each player / rule combination we store both the last palette we evaluated for that rule, as well as the score.

(defun player-score (player rule)
  (declare (type (mod 8) rule))
  (let* ((palette (player-palette player))
         (cache (player-score-cache player))
         (cached-key (aref cache rule)))
    (if (eql cached-key palette)
        (aref cache (+ rule 8))
        (progn
          (setf (aref cache rule) palette)
          (setf (aref cache (+ rule 8))
                (card-set-score palette rule))))))
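The caching scheme is small enough to sketch in isolation. The version below is a generic single-entry cache wrapped around an arbitrary one-argument function; MAKE-SINGLE-ENTRY-CACHE is a made-up name, and unlike PLAYER-SCORE it keeps its state in a closure rather than in a per-player vector. The behaviour is the same though: remember only the last key and its result.

```lisp
(defun make-single-entry-cache (fn)
  "Return a function that calls FN, remembering only the most recent result."
  (let ((valid nil)
        (last-key nil)
        (last-value nil))
    (lambda (key)
      (if (and valid (eql key last-key))
          last-value
          (let ((value (funcall fn key)))
            (setf valid t
                  last-key key
                  last-value value))))))

(let* ((calls 0)
       (cached-square (make-single-entry-cache
                       (lambda (x) (incf calls) (* x x)))))
  (funcall cached-square 12)
  (funcall cached-square 12)   ; same key: served from the cache
  (funcall cached-square 13)   ; different key: recomputed, replaces the entry
  calls)   ; => 2
```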

Given that way to score a player against a rule, we can then check whether the current player is winning the game with the rule.

(defun player-is-winning (player rule)
  (loop with orig-player = player
        with orig-score of-type fixnum = (player-score player rule)
        for player = (player-next orig-player) then (player-next player)
        until (eql player orig-player)
        do (when (>= (the fixnum (player-score player rule))
                     orig-score)
             (return-from player-is-winning nil)))
  t)

We can then generate all valid moves by iterating through all the PLAY, PLAY+DISCARD, and DISCARD combinations for the player's current state, and collecting the ones that result in the player winning.

(defun valid-moves (player current-rule)
  (let (valid-moves)
    (labels ((check-discard (play-card)
               (do-cards (discard-card (player-hand player))
                 (unless (or (eql play-card discard-card)
                             ;; Filter out cases where player discards a card
                             ;; without changing rule or gaining a new card.
                             (and (eql current-rule (card-color discard-card))
                                  (>= (logcount (player-palette player))
                                      (card-value discard-card))))
                   (when (player-is-winning player (card-color discard-card))
                     (push (cons (cons :play play-card)
                                 (cons :discard discard-card))
                           valid-moves)))))
             (check-plays ()
               (do-cards (play-card (player-hand player))
                 (setf (player-palette player)
                       (add-card play-card (player-palette player)))
                 (when (player-is-winning player current-rule)
                   (push (cons :play play-card) valid-moves))
                 (check-discard play-card)
                 (setf (player-palette player)
                       (remove-card play-card (player-palette player))))))
      (check-plays)
      (check-discard nil))
    valid-moves))
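For the pure Monte Carlo playouts, each player then just needs to pick uniformly from that list. A minimal sketch of that step, assuming the move list has the shape VALID-MOVES produces; PICK-RANDOM-MOVE is a hypothetical name, and this is not the code from the repository.

```lisp
(defun pick-random-move (moves &optional (random-state *random-state*))
  "Return a uniformly chosen element of MOVES, or NIL if there are none.
NIL means the player has no valid move and is out of the game."
  (when moves
    (nth (random (length moves) random-state) moves)))

;; Two sample moves: a plain play, and a play followed by a discard.
(let ((moves (list (cons :play 12)
                   (cons (cons :play 31) (cons :discard 55)))))
  (pick-random-move moves))
```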

Other stuff

There's a little bit more code required to generate the scaffolding for a game, and to actually do the random walk through the game tree. None of that code is particularly interesting, nor are the INLINE or TYPE declarations that you'd need to sprinkle on the above code to make it fast. The full code is available on GitHub.

Performance

In the optimal case of trying to iterate through the whole game tree in a 2p game, the average cost of making a move is about 500 cycles, with my desktop doing 7 million moves per second. This is however amortizing the cost of computing the set of valid moves across all of those moves (since in a full search every valid move gets executed). If you're just doing a pure random walk with no backtracking, you'd get no amortization at all. That effect makes an order of magnitude difference.

But it's funny that the biggest profiler hotspot in the program is the PLAYER-SCORE function. Which, if you remember, will simply do an array lookup to get the previous cache key, compare it to the card-set that should be evaluated, and either return a previous result or call out to the real scoring function. The function does basically nothing, but it does nothing really often. When all of the things of substance are pretty fast as well, it's maybe not a surprise that the bottleneck ends up in a place like that.

Results

(Skip this section if you're not actually interested in the game, and just wanted to read some Common Lisp code).

The following results are computed from running simulations of 10k different initial setups, with 100k matches for each simulation with each player making random but valid moves. (So a total of one billion games). All plays were with 3 players, the only player count I consider worth playing.

As a sanity check, I ran a smaller simulation of 1000 initial setups where the players would not play a card + discard, if just playing that same card was sufficient to get into the lead without a discard. The results were very close to the large fully random simulation (e.g. the average game length was 14.6 instead of 14.1 turns, and the win percentage of the best turn order position was 39% rather than 40%).

Finally, an even smaller scale experiment had the AIs use move selection heuristics very similar to those I personally use when playing the game. Those results didn't differ materially from random play either.

Caveats

Unless stated otherwise, all of the numbers are from games with players making completely random moves. It is possible that the aggregate statistics are different when players consciously build toward palettes that are strong in multiple scoring rules, or strong in rules that they have a lot of cards in hand for.

The games are always played with the full deck. In a real game the deck slowly depletes from hand to hand, as cards are moved to the players' scoring piles.

Starting player effect

One thing I was curious about is whether the starting player has an advantage, a disadvantage, or neither. It's not obvious, since there are effects both ways.

The case for a disadvantage: Running out of cards means losing the game, and all other things being equal the first player will also run out of cards first. Due to the way in which the player order is picked, the last player is also guaranteed to have the highest value starting card in their palette, giving them a leg up on winning future tiebreakers.

The case for an advantage: The earlier in turn order a player is, the fewer cards the opponents have in their palettes. It's much easier to pass two players with one card each, than two players with two cards each. And this effect continues throughout the game, so it should accumulate over time.

It turns out that at least with undirected random play there's a major disadvantage to being first. It could be that the effect is smaller when players are making "good" moves.

Position	Win rate
1st	27.20%
2nd	32.42%
3rd	40.37%

Number of possible moves

As mentioned above, the branching factor in the game was higher than I'd been expecting. There are cases where players have a lot more moves available than I would have expected.

The theoretical maximum number of options is 7 + 7 + 7 * 6 = 56, where a player can get in the lead either by discarding any of their cards, playing any of their cards, or with a combination of the two. This situation actually happened a total of 483986 times in 14 billion moves (0.003% of the time). A lot more common than I would have thought.

But of course we don't particularly care about the 0.003% case. The more common cases are more interesting. The following graph shows how often you have at least X moves available in the game.

For example, you can see that about a third of the time a player had 10 or more options to choose from. It appears that the game is nowhere near as constrained as I thought, even when playing without the special action rules.

Length of game

The average game lasted for 14.1 turns, which is perhaps less than I expected given 2 of those 14 turns were by definition a player just dropping out from the match.

There were some games that already ended on turn 4, which meant that only two cards were played in the game. That number was a mercifully low 0.01%. And while there were players who got eliminated before playing a card, there at least were no games ending in turn 2 or 3 even if that's theoretically possible. And a single game lasted all the way to turn 28.

The following graph shows how large a proportion of the games were still running on a given turn.

Effect of player decisions

The final question is about how strongly predetermined a single hand of Red7 is, and how much a player can affect it.

We've already established that at least with this skill level of play there's a very large start player disadvantage, but is that an isolated issue, or does the setup matter even more than that? In these simulations all players are by definition equally skilled. If the end result of the game is primarily determined by player skill, you'd thus expect them to have similar win rates from game to game. So let's graph the distribution of per-setup win rates for each starting position:

Now, this graph is a little abstract since we're looking at probabilities of probabilities. The way to read this is that across those 10000 starting setups, the most common win percentage for player 1 (red) across the 100000 games in a specific setup was around 15% (the peak of the red line is at around 0.15). You can see that the later players in turn order have a graph that's shifted further to the right, which is what you'd expect when they have a substantially higher win percentage. But you can also see that from any starting position you might get absolutely dismal win rates (near 0) or very high win rates (over 80%). The ridiculously high win rates (95%) appear to be purely reserved for the player last in turn order.

There were two setups where a player didn't manage to win even a single match out of 100000 (in both cases that was player 1). In 25% of the cases the player with the worst chance of winning a setup had a 10% win rate or lower, in 7% of the cases a win rate of 5% or lower. It does appear that within a single hand of Red7, luck plays a massive role.

Out of all of the questions we've been looking at, this is of course the one where the applicability of a purely random search strategy is the most questionable. If we're investigating the effect of player skill, how can results from the least skillful play imaginable be relevant? I'm sympathetic to that argument, but before buying into it I'd really like to understand the mechanism by which one player is supposed to disproportionately benefit from the random play.

Also... As mentioned earlier, I also tried extending the AIs to be smarter about selecting each move. This was not based on any kind of lookahead, but simply the kinds of heuristics I'd usually use myself when playing the game. If I can get into the lead either by playing a card or discarding a card (without drawing a new one to replace it), I'd rather play a card since that's going to be useful on future rounds. When choosing which of two cards to play, I'd usually prefer to play the one that adds strength to more different scoring rules.

Experiments with one AI player getting use of these kinds of heuristics while the others played completely randomly did not show a big effect; the changes in the win rate were on the order of 1-2 percentage points.

Future work

I might be done with this little project, but if I pick it up again there's a couple of obvious directions to take this. Implementing the optional special action rules would be nice. That's my preferred form of the game anyway.

The more interesting one is to extend the current system to be a full AI using the Monte Carlo Tree Search approach. This would allow generating statistics based on "good" play of the game, maybe provide information on what kinds of moves are in general successful, as well as give a more conclusive answer to the level of skill the game has.

The tricky bit with evolving this code to a MCTS is that the system in the current form would allow the MCTS to exploit knowledge of future random events and hidden information. It would need to randomize all card draws (currently deterministic), as well as swap the opponents' hands for random cards for the duration of the evaluation phase, and then swap the original deck and original hands back in for the move execution. That's going to slow down each individual move a lot, which is a problem when MCTS will intrinsically require computing several orders of magnitude more moves than a random walk.

Pretty SBCL backtraces

jsnell@iki.fi — Thu, 20 Dec 2007 00:00:00 GMT

Every now and then I see complaints about the stacktraces in SBCL. They contain too little info, or too much info, or are formatted the wrong way, etc. But the backtrace printing isn't really any dark magic, it's just basic Lisp code. If you don't like the default format, just write a new backtrace function that prints something prettier/less cluttered/more informative/etc.

For inspiration, below is one implementation, based on a really quick hack I wrote in answer to a c.l.l post a few weeks ago. In addition to cosmetic changes, it adds a couple of extra features: printing filenames and line numbers for the frames when possible, and printing the values of local variables when possible. Just call backtrace-with-extra-info in any condition handler where you'd normally call sb-debug:backtrace, or call it from the debugger REPL instead of using the backtrace debugger command.

The code assumes that you've got Swank loaded. For best results, compile your code with (debug 2) or higher.

(defun backtrace-with-extra-info (&key (start 1) (end 20))
  (swank-backend::call-with-debugging-environment
   (lambda ()
     (loop for i from start to (length (swank-backend::compute-backtrace
                                        start end))
           do (ignore-errors (print-frame i))))))

(defun print-frame (i)
  (destructuring-bind (&key file position &allow-other-keys)
      (apply #'append
             (remove-if #'atom
                        (swank-backend:frame-source-location-for-emacs i)))
    (let* ((frame (swank-backend::nth-frame i))
           (line-number (find-line-position file position frame)))
      (format t "~2@a: ~s~%~
                   ~:[~*~;~:[~2:*    At ~a (unknown line)~*~%~;~
                             ~2:*    At ~a:~a~%~]~]~
                   ~:[~*~;    Local variables:~%~{      ~a = ~s~%~}~]"
              i
              (sb-debug::frame-call (swank-backend::nth-frame i))
              file line-number
              (swank-backend::frame-locals i)
              (mapcan (lambda (x)
                        ;; Filter out local variables whose values we
                        ;; don't know
                        (unless (eql (getf x :value) :<not-available>)
                          (list (getf x :name) (getf x :value))))
                      (swank-backend::frame-locals i))))))

(defun find-line-position (file char-offset frame)
  ;; It would be nice if SBCL stored line number information in
  ;; addition to form path information by default. Since it doesn't,
  ;; we need to use Swank to map the source path to a character
  ;; offset, and then map the character offset to a line number.
  (ignore-errors
   (let* ((location (sb-di::frame-code-location frame))
          (debug-source (sb-di::code-location-debug-source location))
          (line (with-open-file (stream file)
                  (1+ (loop repeat char-offset
                            count (eql (read-char stream) #\Newline))))))
     (format nil "~:[~a (file modified)~;~a~]"
             (= (file-write-date file)
                (sb-di::debug-source-created debug-source))
             line))))

For example on the following code:

(declaim (optimize debug))
(defun foo (x)
  (let ((y (+ x 3)))
    (backtrace)
    (backtrace-with-extra-info)
    (+ x y)))
(defmethod bar ((n fixnum) (y (eql 1)))
  (foo (+ y n)))

The old backtrace would look like:


1: (FOO 4)
2: ((SB-PCL::FAST-METHOD BAR (FIXNUM (EQL 1)))
    #<unused argument>
    #<unused argument>
    3
    1)
3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (BAR 3 1) #<NULL-LEXENV>)

And the new backtrace like:

1: FOO
   At /tmp/test.lisp:5
   Local variables:
     X = 4
     Y = 7
2: (SB-PCL::FAST-METHOD BAR (FIXNUM (EQL 1)))
   At /tmp/test.lisp:8
   Local variables:
     N = 3
     Y = 1
3: SB-INT:SIMPLE-EVAL-IN-LEXENV
   At /scratch/src/sbcl/src/code/eval.lisp:93 (file modified)
   Local variables:
     ARG-0 = (BAR 3 1)
     ARG-1 = #<NULL-LEXENV>

An improvement? That's probably in the eye of the beholder, and depends on the codebase and the use cases. For example I can imagine that for large functions showing the values of local variables in the trace would make it way too spammy. But that's beside the point: if the default stacktrace format is making debugging difficult for you, it's not hard to customize it.
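As for calling a custom backtrace printer from a condition handler, the shape is roughly the following. This is a portable sketch: CALL-WITH-BACKTRACE-ON-ERROR is a made-up name, and the printer is passed in as a function so that the example runs without Swank loaded. The important part is using HANDLER-BIND rather than HANDLER-CASE, since the handler then runs before the stack is unwound and the interesting frames are still there to be printed.

```lisp
(defun call-with-backtrace-on-error (thunk print-backtrace)
  "Call THUNK. If it signals an ERROR, call PRINT-BACKTRACE (a function of
no arguments) while the erroring frames are still on the stack, and then
let the error propagate to outer handlers."
  (handler-bind ((error (lambda (condition)
                          (format *error-output* "~&Error: ~a~%" condition)
                          (funcall print-backtrace))))
    (funcall thunk)))

;; With the code from this post loaded, the call might look like:
;;   (call-with-backtrace-on-error (lambda () (bar 3 1))
;;                                 #'backtrace-with-extra-info)
;; Here a stub printer stands in for it.
(handler-case
    (call-with-backtrace-on-error
     (lambda () (error "boom"))
     (lambda ()
       (format *error-output* "(backtrace would be printed here)~%")))
  (error () :recovered))   ; => :RECOVERED
```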

Faster SBCL hash-tables

jsnell@iki.fi — Mon, 01 Oct 2007 05:15:00 GMT

Long time, no blog. I have an excuse though, since I moved to Switzerland for a new job a month ago, and haven't had a lot of time for things like blogging or hacking Lisp (the latter is usually a prerequisite for the former for me).

Anyway, I finally finished and committed the third rewrite of my patch for speeding up the embarrassingly slow hash-tables in SBCL. It turned out to be a really frustrating game of whack-a-mole, with every change uncovering either some new deficiency or another interaction between the GC and the hash-tables that the old implementation had handled by always inhibiting GC during a hash-table operation.

The main user-visible change is that SBCL no longer does its own locking for hash-tables (the fact that it locked the tables was always just an implementation detail, not a part of the public interface). This follows the usual SBCL policy of requiring applications to take care of locking when sharing data structures between threads.

The exact details are pretty boring, so I won't repeat them here (read the commit message if you really want to know). Instead I'm just going to post a pretty benchmark graph, since it's been way too long since I last did one of these:

Sadly those improvements don't mean that SBCL now has the fastest hash-tables in the West, it just means they don't completely suck. For some reason the issue of SBCL hash-table speed has come up more often during the last couple of months than during the previous three years combined, so it was probably time to get this sorted out.

ICFP 2007

jsnell@iki.fi — Wed, 25 Jul 2007 07:30:00 GMT

For the last five years or so it's always been my firm intent to take part in the programming contest associated with the International Conference on Functional Programming (ICFP). And each year something has prevented it. But this year there was no emergency at work, no computer hardware broke, no sisters were getting married, etc. So instead of playing poker on the net, which had been consuming all of my free time for the last couple of weeks, I read the 22 page spec and fired up emacs. (Just kidding, emacs was already running).

The surface task was to write an interpreter for a weird string-rewriting language. The organizers supplied a large blob of data, which when run through the interpreter would produce as output some image drawing operations (for which you basically had to write some kind of a visualizer if you wanted to achieve anything). The goal was to come up with some prefix to the program which would make it instead produce output that would be as close as possible to a certain target image.

The intended way to achieve that goal was to notice that the drawing operations generated by the blob would first write a clue message, which would then be hidden in the final image by other image operations. This seems like a really bad decision. I luckily noticed the message since my first version of the image conversion tool didn't support the flood fill operation. But apparently a lot of teams never saw the message, and were left to stumble in the dark for the whole weekend. The image that could be drawn by using the clue would then lead to another obscure puzzle. Again, I was lucky to figure out the solution after a while, but judging by IRC and mailing list traffic a huge number of teams never got it, and were basically stuck.

That clue could then finally be used to produce some concrete details on how the big blob of data was using the string-rewriting language to produce the image. There was even a catalog of the functions that the blob contained. But the really useful data seemed to be hidden behind yet more puzzles. So at this point I just did a minimal hack to make a token improvement to the produced image: the source image had a grove of apple trees, the target had pear trees. And according to the catalog the function apple_tree was exactly as large as pear_tree. So I wrote a prefix that overwrote the former with the latter. And then I submitted that prefix, and switched to doing something more interesting. (I think that token improvement was still in the top 20 something like 8 hours before the contest ended, which probably says something about how much progress people were making).

I did rather enjoy writing the interpreter and the visualization tool, and the specifications for both were mostly very good. Unfortunately the spec contained only a couple of trivial test cases with the expected results, so if your interpreter had a problem, figuring out what exactly was going wrong just from looking at execution traces was really hard. The organizers originally replied on the mailing list that such debugging "is exactly part of the task", but later released an example trace from a few iterations at the start. There was a documented prefix that would run some tests on the implementation, and generate an image from those results, but the coverage of those tests didn't seem to be very good. (I had several bugs that only showed up with the full image, not with the test one).

The part of the interpreter that many teams seemed to have big trouble with was that you couldn't really use a basic string or array to represent the program. If you did, the interpreter would be orders of magnitude too slow (people were reporting getting 1 iteration / second, when drawing the basic image would require 2 million iterations) due to excessive copying of strings. Now, this was even pointed out in the specification! Paraphrasing: "these two operations need to run faster than in linear time". And still people tried to use strings, bawled when their stupid implementation wasn't fast enough, and decided that the only solution would be to rewrite their program in C++ instead of their favourite Haskell/OCaml/CL. Good grief...

For what it's worth, I used just about the stupidest imaginable implementation strategy beyond just a naive string: represent the program as a linked list of variable length chunks, which will share backing storage when possible. My first CL implementation of this ran at about 5.5k iterations / second. This was good enough at the stage in the competition that I got to, and would've been easy to optimize further if I'd decided to continue (afterwards I made a 15 line change that gave an 8x speedup, so the basic image now only takes 41 seconds to render on an Athlon x2 3800+). And this was with a stupid data structure and a couple of minor performance hacks. It seems obvious that practically any language could have been used to write a sufficiently fast interpreter. It never ceases to amaze me how programmers would rather blame their tools than think about the problem for a couple of minutes.
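To make the chunk idea a bit more concrete, here is a minimal sketch in Python. It is not the CL code from the contest: the names (Chunk, ChunkedString, drop_prefix, slice, prepend) are made up for illustration, and a deque stands in for the linked list. The only point it tries to show is that dropping a prefix or re-inserting an earlier part of the program can be done by adjusting offsets into a shared backing string, without copying characters.

```python
from collections import deque


class Chunk:
    """A (start, end) window into an immutable backing string."""
    __slots__ = ("data", "start", "end")

    def __init__(self, data, start=0, end=None):
        self.data = data
        self.start = start
        self.end = len(data) if end is None else end

    def __len__(self):
        return self.end - self.start


class ChunkedString:
    """Hypothetical program representation: a sequence of chunks."""

    def __init__(self, text=""):
        self.chunks = deque()
        if text:
            self.chunks.append(Chunk(text))

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def drop_prefix(self, n):
        """Discard the first n characters by moving offsets, not data."""
        while n > 0 and self.chunks:
            head = self.chunks[0]
            if len(head) <= n:
                n -= len(head)
                self.chunks.popleft()
            else:
                head.start += n
                n = 0

    def slice(self, start, end):
        """Return new chunks covering [start, end), sharing backing storage."""
        out = []
        pos = 0
        for c in self.chunks:
            lo = max(start, pos)
            hi = min(end, pos + len(c))
            if lo < hi:
                out.append(Chunk(c.data, c.start + (lo - pos), c.start + (hi - pos)))
            pos += len(c)
            if pos >= end:
                break
        return out

    def prepend(self, pieces):
        """Put chunks in front; fresh Chunk objects avoid aliasing offsets."""
        for piece in reversed(pieces):
            if len(piece) > 0:
                self.chunks.appendleft(Chunk(piece.data, piece.start, piece.end))

    def to_string(self):
        """Flatten to a real string (only for output or debugging)."""
        return "".join(c.data[c.start:c.end] for c in self.chunks)


program = ChunkedString("ICFPICFP2007")
saved = program.slice(4, 8)   # refers to the second "ICFP", no copy made
program.drop_prefix(8)        # program now starts at "2007"
program.prepend(saved)        # re-insert the saved reference at the front
print(program.to_string())
```

The cost of each operation here depends on the number of chunks touched rather than on the length of the program, which is the property a naive string lacks. A real interpreter would also want to merge or flatten chunks occasionally so the list doesn't grow without bound; that part is left out of this sketch.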

Anyway, the organizers obviously put in a huge effort for this contest, so thanks to them for that. It's just that the format really wasn't what I was looking for in a programming contest. But at least it was interesting enough to temporarily shake me out of playing poker into doing some hacking again :-) (Faster SBCL hash tables coming soon, I hope).

I've made the source code for the interpreter available since several people have asked for it. I'm not sure why they've asked for it, since it's not very good code, and probably contains no worthy insights. But if you want it, it's there.

Addenda: After writing the above, I read a few messages on the mailing list which claimed that there really wasn't much of a puzzle aspect, but that success was mainly determined by the quality of the tools (compilers, debuggers, disassemblers, etc.) you were able to write. While it's possible that after the initial two humps that I described above the puzzles were irrelevant, that wasn't my impression. At the point where I stopped, it didn't feel to me as if sufficient knowledge was available for writing the tools, but rather was hidden behind encrypted pages, steganography, etc. None of which I really wanted to deal with.

There was definitely enough information available to make a start at reverse-engineering, but I don't think there was enough time to reverse-engineer enough of the system to figure out how to write the tools, write them, and then use the tools to actually solve the real problem. I'm sure things were different for larger teams, but that doesn't really comfort me as a one-person team :-) My impression is that in the earlier ICFP contests the tasks were such that it was possible for a single programmer to achieve a decent result, even if it was unlikely to be good enough to win. In this case you don't get any points for the reverse-engineering or for the tools, but just for the end result.

(Having written the above, I'm now sure that the eventual winner will turn out to be a single programmer who only started working on the task 8 hours before the deadline).

Code coverage tool for SBCL

jsnell@iki.fi — Thu, 03 May 2007 10:00:00 GMT

SBCL 1.0.5.28 includes an experimental code coverage tool (sb-cover) as a new contrib module. Basically you just need to compile your code with a special optimize proclamation, load it, run some tests, and then run a reporting utility. The reporting utility will produce some html files. One will contain an aggregate coverage report of your whole system, the others will show your source code transformed into angry fruit salad:

For a more substantial example, here's the coverage output for the cl-ppcre test suite.

There are still some places where the coverage output won't be what most people would intuitively expect. Some, like the handling of inlined functions, would be simple to solve. It's just not yet clear to me what the right solution would be. For example in the case of inlined functions the right solution might be suppressing inlining when compiling with coverage instrumentation, or it might be to say "don't do that, then" to the users. Others are fundamentally unsolvable, due to the impossibility of reliably mapping the forms that the compiler sees back to the exact character position in the source file. Hopefully this'll still turn out to be useful in its current state.

If you have any suggestions for improvements, I'd love to hear them.

ILC 2007 Summary

jsnell@iki.fi — Wed, 11 Apr 2007 22:00:00 GMT

I wrote several almost finished blog posts during ILC, but didn't get around to posting them "live" due to the issues with wireless access and a generic lack of time, due to being off having a jolly good time. Then I did some more traveling after the ILC, and didn't manage to get them posted right afterward either. And now that I'm finally back home, most of what I wrote then no longer seems worth posting, since it's lost the immediacy.

So here's a few things that come to mind now.

The good

The organization was stellar in almost all respects. A huge thanks to Nick Levine and anyone else who was involved. Cambridge was just incredibly pretty, and the weather ranged from great to "not bad". There were some very good talks, though disappointingly most of the best ones were from Schemers :-) The last day of talks was particularly good. I had incredible fun meeting old friends, most of whom I hadn't seen for a year, putting faces to names I knew from the net, and talking to completely new people. Special honorable mentions in the latter category go to Jeremy Jones and Richard Brooksby, with whom I had several very interesting and fruitful discussions.

I also got lots of very valuable SBCL feedback and new ideas, for all kinds of things from the GC to the user interface for my code coverage tool for SBCL (work in progress). It looks as if we need to beef up the SBCL marketing department, though. I had several discussions of the form "Q: What would it take to make SBCL do FOO? A: It's already done that for the latest X releases.". In the worst case with the same person asking for three different features in succession, all of which had been implemented :-) For example no-one seems to be aware that SBCL/Slime have stepper support. Not horribly good stepper support, but support nonetheless. Also got to talk shop with SBCL developers and Clozure/ITA people, which is always good. And maybe even managed to offload some ideas that I'd proof-of-concepted, but have no intention of ever properly implementing myself.

Got a surprisingly large number of congratulations on graduating. And the guys had even got me a present (a copy of the Lisp 1.5 manual that Nikodemus had found in a bookshop in Cambridge, MA). Thanks! Conveniently the title of the programming contest for the next ILC was pre-announced as "Lisp 1.5", so the manual might even be useful, not just cool :-)

I think the Ravenbrook guys are going to try integrating MPS with SBCL, since CMUCL didn't work out for them. While it's unlikely to replace the current SBCL GC for licensing reasons (it's currently under a GPLish license), it would be very interesting for two reasons: as a benchmark for the current GC and as a first step towards pluggable GCs. The first one would be good since we know that the SBCL memory management is suboptimal in many ways. It'd be valuable to find out what the real cost of fixing many of those suboptimalities is. As for pluggable GCs, Frode wrote a nice message to sbcl-devel about that. If MPS is better for someone's use case than SBCL's gencgc and they can live with the license, it'd certainly be nice for them to be able to just switch GCs. And of course at some point implement other alternative GCs.

Compared to the ECLMs, surprisingly many people that I talked to weren't yet using Lisp seriously, but were just interested in it. Some might think that this is bad, but I think it's really great that there are people still in that stage who are interested enough to travel to and attend a multi-day Lisp conference. And of course there were a lot more serious Lisp users than newbies.

Overall my ILC experience was very positive. I'll talk next about some bad stuff, but that's just because I believe that you can't just sweep that stuff under the rug.

The bad

I think that program-wise there was maybe a day of talks that could've been discarded with little loss. Or if not a whole day, then at least enough to make the rest of the schedule less tight. For example the History of Lisp presentation was total crap (not just somewhat bad, but "I'd rather listen to an hour of silence"-bad), and the information theory one had no business being presented at a Lisp conference. Given what little I heard of the review process in other cases, I don't understand how the latter ever got accepted.

I understand that people don't really go to a conference for the talks, but that doesn't mean that anything goes. My plea to the next ILC program committee is threefold:

Please invite only speakers with something to say that's relevant to Lisp now or in the future, not in the last millennium.
More specifically, I'm sure there's a temptation to "honor" the 50th birthday of Lisp by historical navel-gazing. Please don't give in to it.
If you don't get enough good submissions, don't accept the irrelevant ones as padding.

My attempts at industrial espionage were mostly a failure. Both Duane and Jans ran out of time before getting around to stuff that would've been worth stealing. For example Duane didn't have time to demo their profiler, which I'd heard described as the gold standard of Lisp profilers, and of course I can't really try it out myself due to the license. I was surprised that the Allegro equivalent to SBCL's optimization notes didn't have any kind of UI for mapping the notes back to the original source, making it look mostly useless. Or at least Duane, who is probably an expert at reading them, did get confused by the results a couple of times despite it being a scripted demo :-)

[ Which isn't to say that Franz's presentations were bad. I just didn't get much out of them SBCL-wise. ]

The controversial

Some stuff has received a lot of airtime after the conference.

Before the conference I expressed some puzzlement about there being an invited talk about CL-HTTP, which I regarded as a choice that was completely out of touch with the current state of the Lisp world. Seeing the talk didn't change my opinion (oh, wow, still using the White House information system from the Clinton administration as the example?). For example, when Mallery asked who had ever used CL-HTTP, practically no hands went up, unlike with every other similar question that was asked during the conference. But amazingly enough, on the last day two presenters appeared to be seriously using CL-HTTP. (IIRC they were the RacerPro and XMLisp ones.)

Most of the Allegro features that Duane and Jans had time to show were things that SBCL already does in some form. It's just that they're exporting their internals, and in some cases the interfaces don't seem very polished. I guess READ-LINE-INTO (?) wouldn't be a bad addition, but e.g. MEMCPY-UP and MEMCPY-DOWN were just completely wrong.

So I wasn't horribly impressed with what they talked about. But unlike Luke, who was stirring up the debate both at ILC and after, I think that it is a very worthwhile goal to give Lisp users access to low level facilities, and that we really should be supplying non-consing / resource-reusing versions of functions where possible. STRING-TO-OCTETS and OCTETS-TO-STRING are an obvious example where SBCL could be improved.

Yes, it'd be really great to just cons indiscriminately, but no matter what the GC scheme is, there will be programs where consing will be deadly. And yes, it'll mean that code written for performance might be a bit ugly, but it's still better than dropping to C from Python for performance, etc. Of course SBCL users can use many of those low level facilities right now, but most of them are undocumented and unexported, which sets the bar for using them pretty high.

The end

Anyway, it was lots of fun! I hope to see all of you again next year.

ILC 2007 MPS Tutorial

jsnell@iki.fi — Sun, 01 Apr 2007 14:05:00 GMT

Oh, man. My excitement about the CMUCL/MPS integration seems to have been premature :-)

Paraphrases from the early part of the MPS tutorial:

"We didn't actually get too far with the actual implementation of MPS and CMUCL, since we were unable to bootstrap CMUCL if we made any (even tiny) modifications." (But apparently they have all the design issues solved).

"Unfortunately Dave Jones who's been doing the work on this is ill and thus not at the conference."

"Used CMUCL rather than SBCL since Carl Shapiro had earlier expressed interest in integrating MPS and CMUCL. No particular reason besides that." (In answer to my question about why they didn't try SBCL if bootstrapping was a problem).

ILC 2007 pre-conference stuff

jsnell@iki.fi — Sun, 01 Apr 2007 13:45:00 GMT

(Stuff from Saturday, before the actual conference starts. Sorry for any typos, I wrote it late last night after half a bottle of wine, and didn't have time to proofread it this morning. And am now in the middle of a tutorial. I'll fix it up later.).

Woke up at 0500. Almost missed the plane taking off at 0745 despite that, since taxis were nowhere to be found. Met Martti, fellow Helsinki Lisper, at Heathrow, and was entertained by his tales of British engineering for most of the trip from Heathrow to King's Cross.

The conference accommodation is nice, especially for the price. Except for the British plumbing, but complaining about that is about as original as complaining about left side traffic. I got a room on the top floor, which seems to be an attic that was later converted to dorms. It looks pretty dramatic, in a good way (with the room being horseshoe-shaped and varying in height between 4.5 meters and 0.5 meters). Unfortunately I don't have a camera.

Cambridge looks really pretty. I haven't yet random walked around the city properly, and probably won't have time to do so on this trip. I did go to the conference tour, though. Thanks to Martin Simmons for doing the hard work of punting on the punt that I was on. I didn't get horribly much out of the guided walking tour part, but at least it meant visiting various places that I would never have gone to on my own.

The sexp-formatted conference badges that Christophe designed look sweet, though they're not a big surprise since I'd seen them in the earlier stages.

We had a very nice dinner at a Turkish place that Christophe recommended, and which surprisingly enough was able to give a table for 12 with no warning at 1930 on a Saturday. IIRC the name of the restaurant was Anatolia, and based on some after-the-fact backtracking the location is off the conference-provided map, but probably on the Bridge Street that Sidney Street transforms into at the intersection with St. John's Street. I really liked the food. Didn't mind the wine either, though I won't pretend that I can make any kind of judgment on its quality.

All of tomorrow's 4 tutorials look interesting, but since they're in parallel I can only do two. The MPS tutorial is a must-see for me. Choosing between industrial espionage (performance tuning in Allegro) and cool Lisp hacks (ContextL) will be tough.

It's now 00:30 (2:30 Finnish time, so I've been up for 19+ hours). Time to get some sleep, and hope that I can get this entry posted tomorrow. No wireless reception in the room, and I couldn't get a wireless connection up in the Library Common Room. Some people reportedly had more luck with it.