I just remembered that a couple of months ago I did a quick test on the performance effects of the SBCL x86/x86-64 calling convention. It was essentially the easy part of one of the Summer of Code project proposals by Christophe.

Recipe: Slap together some nonsense code that looks like it'll do a lot of non-tail function calls and little real work. Attach gdb to an SBCL process where the code has been loaded, and disassemble the functions. This'll produce assembly that can be fed back to gcc after a little tweaking. I used this:

(defvar *a*)
(defun foo (x y)
  (declare (optimize speed (safety 0) (debug 0))
           (type (unsigned-byte 32) x y))
  (ldb (byte 61 0) (+ x y)))
(defun bar (n)
  (declare (optimize speed (safety 0) (debug 0))
           (fixnum n))
  (if (= n 1)
      (setf *a* (+ (the (unsigned-byte 32) (foo n 1))
                   (the (unsigned-byte 32) (foo n 2))
                   (the (unsigned-byte 32) (foo n 3))
                   (the (unsigned-byte 32) (foo n 4))
                   (the (unsigned-byte 32) (foo n 5))))
      (bar (1- (the fixnum (foo n 0))))))
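
For a quick look at the generated code and a baseline timing from inside SBCL, the standard DISASSEMBLE and TIME are enough; the gdb route in the recipe is what gets you output that gcc will take after some tweaking. A minimal sketch, where INSPECT-AND-TIME is a made-up helper name and FOO and BAR are assumed to be already loaded:

```lisp
;; Hypothetical helper, not part of SBCL: print the machine code for the
;; function named NAME, then time one call of it on ARGS.
(defun inspect-and-time (name &rest args)
  (disassemble name)          ; standard CL; output format is implementation-specific
  (time (apply name args)))   ; TIME returns whatever the call returns

;; Usage, assuming FOO and BAR from above are loaded:
;;   (inspect-and-time 'foo 1 2)
;;   (inspect-and-time 'bar 10000000)
```

The timing printed this way includes the small overhead of APPLY, which should be negligible next to ten million iterations.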

Then make a modified version of the code, compile it with the normal Linux toolchain (it's a lot easier to make these sorts of experiments outside of SBCL), and do some timed runs. I did two alternative versions. One replaced the strange "single-value-return by jumping to RCX+3" convention with paired CALL/RETs and removed the extra stack manipulations that the old convention required. The other version additionally took advantage of FOO being a leaf function, and didn't build a stack frame at all.

The results: On my Athlon 64 the original version took 1.1s on (BAR 10000000), the second version took 0.5s and the third 0.25s. It looks like there's a lot more overhead in the current calling convention than anyone expected. Actually implementing any of these changes in SBCL would be a lot of work, so I'm not going to do anything about it (excuses: new full-time job, really need to work on the thesis, too many half-finished SBCL projects in my local trees already). But maybe this will motivate someone else to try :-)
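
To put those numbers in per-iteration terms, here is the arithmetic as a small sketch. The variable and function names are made up for illustration; the timings and the iteration count are the ones quoted above. Each iteration of BAR makes one call to FOO and one recursive call to BAR.

```lisp
;; Illustrative names only; the numbers are the measurements from this post.
(defparameter *iterations* 10000000)

(defparameter *timings*
  '(("original convention" . 1.1)
    ("paired CALL/RET"     . 0.5)
    ("leaf FOO, no frame"  . 0.25)))

(defun ns-per-iteration (seconds)
  "Nanoseconds spent per iteration of BAR for a run of SECONDS."
  (/ (* seconds 1d9) *iterations*))

(defun speedup (seconds)
  "How many times faster a run of SECONDS is than the original version."
  (/ (cdr (first *timings*)) seconds))

;; Print one line per version: label, ns per iteration, speedup factor.
(dolist (entry *timings*)
  (format t "~22a ~6,1f ns/iteration  ~4,2fx~%"
          (car entry)
          (ns-per-iteration (cdr entry))
          (speedup (cdr entry))))
```

That works out to roughly 110, 50 and 25 ns per iteration, i.e. speedups of about 2.2x and 4.4x over the original.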

Special thanks to Kenny Tilton for being an asshole on comp.lang.lisp. Otherwise I would've forgotten this completely.

Update: By popular demand, here are the assembly files: test.S, test2.S, test3.S.