Re: my previous post:
Gábor Melis, SBCL thread and signal handling guru, implemented a similar optimization for inline allocation on x86, with major speedups in allocation-heavy benchmarks.
Christophe ended up doing most of the patch-merging that I'd threatened to do. He also suggested a hopefully better way of presenting CL-BENCH results than the one I was using: instead of ordering the results arbitrarily, order them based on existing SBCL benchmark clustering data.
I wrote some marginally better GC benchmarks than the ones I'd been using before, and found that the GC page-table stuff that has been languishing in a CVS branch for ages has some nasty worst-case behaviour for long-running processes. While there was a 20% improvement in average GC latency over vanilla SBCL, a full GC with lots of data in older generations was 50% slower. The unambiguously good parts of the branch are already in HEAD, so instead of merging the rest of it I implemented a couple of smaller GC optimizations.
Also finished and merged was source location recording for non-functions
(i.e. "where was this package/class/variable/etc defined?"), along with
swank-sbcl support for using the information in