I just remembered that a couple of months ago I did a quick test on the performance effects of the SBCL x86/x86-64 calling convention. It was essentially the easy part of one of Christophe's Summer of Code project proposals.
Recipe: Slap together some nonsense code that looks like it'll do a lot of non-tail function calls and little real work. Attach gdb to an SBCL where the code has been loaded, and disassemble the functions. This'll produce assembly that can be fed back to gcc after a little tweaking. I used this:
(defvar *a*)

(defun foo (x y)
  (declare (optimize speed (safety 0) (debug 0))
           (type (unsigned-byte 32) x y))
  (ldb (byte 61 0) (+ x y)))

(defun bar (n)
  (declare (optimize speed (safety 0) (debug 0))
           (fixnum n))
  (if (= n 1)
      1
      (progn
        (setf *a* (+ (the (unsigned-byte 32) (foo n 1))
                     (the (unsigned-byte 32) (foo n 2))
                     (the (unsigned-byte 32) (foo n 3))
                     (the (unsigned-byte 32) (foo n 4))
                     (the (unsigned-byte 32) (foo n 5))))
        (bar (1- (the fixnum (foo n 0)))))))
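If you want a baseline for the unmodified code before leaving SBCL, something like the following should do. It's only a sketch: SECONDS-TO-RUN is a hypothetical helper I'm making up here, not part of the original test, and it uses nothing but standard Common Lisp. It can only measure the existing calling convention, since the modified versions live outside SBCL.

```lisp
;; Hypothetical helper (not from the original test): wall-clock seconds
;; taken by a single call to THUNK, using only standard Common Lisp.
(defun seconds-to-run (thunk)
  (let ((start (get-internal-real-time)))
    (funcall thunk)
    (/ (- (get-internal-real-time) start)
       (float internal-time-units-per-second))))

;; Usage, assuming FOO and BAR from the listing above are loaded:
;;   (seconds-to-run (lambda () (bar 10000000)))
;; To eyeball the generated call sequence from the REPL instead of gdb:
;;   (disassemble 'foo)
;;   (disassemble 'bar)
```

The standard TIME macro would also work for a one-off run; the helper is just handy if you want the number back as a value.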
Then make a modified version of the code, compile it with the normal Linux toolchain (it's a lot easier to make these sorts of experiments outside of SBCL), and do some timed runs. I did two alternative versions. One replaced the strange "single-value-return by jumping to RCX+3" convention with paired CALL/RETs and removed the extra stack manipulations that the old convention required. The other version additionally took advantage of FOO being a leaf function, and didn't build a stack frame at all.
The results: On my Athlon 64 the original version took 1.1s on (BAR 10000000), the CALL/RET version took 0.5s and the leaf-function version 0.25s. It looks like there's a lot more overhead in the current calling convention than anyone expected. Actually implementing any of these changes in SBCL would be a lot of work, so I'm not going to do anything about it (excuses: new full-time job, really need to work on the thesis, too many half-finished SBCL projects in my local trees already). But maybe this will motivate someone else to try :-)
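For a rough sense of scale, here's a back-of-the-envelope conversion of those timings into a per-call cost. The assumptions are mine: BAR calls FOO six times on every iteration except the last one (n = 1), and the whole run time gets charged to those calls, which overstates the per-call cost somewhat. The names below are made up for the illustration.

```lisp
;; Timings reported above, in seconds, for (BAR 10000000).
(defparameter *timings*
  '((:original . 1.1) (:call/ret . 0.5) (:leaf-no-frame . 0.25)))

;; Assumption: six FOO calls per iteration, for n = 10000000 down to 2.
(defparameter *foo-calls* (* 6 (1- 10000000)))

;; Nanoseconds per FOO call if all of SECONDS is charged to the calls.
(defun ns-per-call (seconds)
  (/ (* seconds 1d9) *foo-calls*))

(dolist (entry *timings*)
  (format t "~a: ~,1f ns per FOO call, ~,1fx the speed of the original~%"
          (car entry)
          (ns-per-call (cdr entry))
          (/ 1.1 (cdr entry))))
```

Under those assumptions that comes out to roughly 18, 8 and 4 ns per call, so the bulk of the time in the original version is going into the calling convention rather than the addition.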
Special thanks to Kenny Tilton for being an asshole on comp.lang.lisp. Otherwise I would've forgotten this completely.