I've been doing some hacking on SBCL's bignum implementation lately, which resulted in some dramatic improvements in bignum and rational benchmarks. The jury's still out on whether anyone will notice this in real life, since a case can be made that Lisp users don't really care about bignum performance.
The primary changes were a couple of iterations of using smarter GCD algorithms, and some low-level optimizations for bignum division. The latter change gave a suprisingly large performance boost; apparently accessing special variables and boxing/unboxing lots of small bignums are both big no-nos in performance-critical inner loops. At least that's the case on current SBCL, could be that the performance characteristics were different when the code was originally written for CMUCL (that's at least 14 years ago; the CVS logs don't go further back than that).
Speaking of CMUCL, it has some bignum goodies that were added after SBCL was forked. Most importantly there's a Karatsuba multiplier which I've now ported to SBCL. It had somewhat erratic performance due to requiring all inputs to consist of a power of two bits (so multiplying two (2^x)+1-bit numbers takes as long as multiplying two 2^(x+1)-bit numbers; not nice when dealing with O(n^1.58) time complexity). Removing this limitation made the Karatsuba multiplier consistently better than the O(n^2) schoolboy multiplication at about 1500 bits. I'm still not really happy with the performance vs. complexity of the code, and am waiting inspiration to strike before submitting the code for inclusion to SBCL. Meanwhile Raymond Toy ported the change back to CMUCL and reported some encouraging numbers.
All of these modifications were quite frustrating to make. On
one hand the bignum implementation is written almost completely in
Lisp which is certainly nicer than doing it in assembly or C, and
results in pretty competitive performance despite naive register
allocation and lack of instruction scheduling. But on the other hand
the internal bignum representation and the primitives for manipulating
bignums aren't all that suitable for implementing complex algorithms
with mutable bignums. Both accelerated GCD and Karatsuba need to jump
through a lot of hoops to work efficiently, and are therefore
fragile. And, on the gripping hand, I suspect that fixing this would
require a major rewrite of bignums, which would be a lot of work with
little immediate gain (possibly there's some simple way to refactor
the existing implementation to be more friendly, but I can't see it
yet). A rewrite is unlikely, but it's worth thinking about before
adding more horrible kludges into