SBCL character IO has always been rather slow, but it got even worse after Unicode support was added about a year ago. To give an idea of how bad: reading and printing a 65MB (1.3 million lines) file line by line takes under 1.5 seconds with Perl, 5-8 seconds with the other Lisps that I have installed, and 15 seconds with SBCL 0.9.6. A pre-Unicode SBCL takes about 7 seconds.
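
For reference, a line-by-line copy of this kind can be sketched as below. This is only an illustration of the workload, not the exact benchmark code; the name COPY-LINES is made up for the example.

```lisp
;; Hypothetical helper, not taken from the actual benchmark.
(defun copy-lines (in out)
  "Copy IN to OUT one line at a time and return the number of lines copied."
  (loop for line = (read-line in nil nil)   ; NIL at end of file
        while line
        do (write-line line out)
        count line))
```

Timing it against a large file would then look something like `(with-open-file (in "big.txt") (time (copy-lines in *standard-output*)))`, where the file name is a placeholder.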

So I went hunting for some low-hanging fruit in (fd-)streams, and found quite a lot.

  • There were several places where the Unicode-induced separation of SIMPLE-BASE-STRING and (SIMPLE-ARRAY CHARACTER) had forced formerly inlined operations (AREF, FIND, REPLACE, etc.) to be replaced with generic calls due to insufficient type information.

  • The addition of the OUTPUT-NOTHING restart when trying to write a character into a stream with an incompatible external format was causing overhead on every iteration of some inner loops. I have a vague recollection, though, that it was even worse at some point (creating a restart on every iteration of the innermost loop) than it is now (establishing a catch tag on every iteration).

  • The input buffer for UTF-8 streams never received more than one character at a time.

  • READ-LINE was fetching data from the internal input buffer character by character, instead of looking ahead for a newline and then copying a bigger batch of characters at once.
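
To make the READ-LINE point concrete, here is a minimal sketch of the two strategies, operating on a plain string standing in for the input buffer. The function names are invented for the example, and the real fd-stream code is structured quite differently; the point is only the difference between stepping one character at a time and scanning ahead with POSITION before copying the whole line in one go.

```lisp
;; Hypothetical helpers; BUFFER is an ordinary string standing in for
;; the stream's internal input buffer.
(defun next-line-slow (buffer start)
  "Collect one line from BUFFER starting at START, one character at a time.
Returns the line and the index just past its newline."
  (let ((line (make-string-output-stream)))
    (loop for i from start below (length buffer)
          for ch = (char buffer i)
          do (if (char= ch #\Newline)
                 (return (values (get-output-stream-string line) (1+ i)))
                 (write-char ch line))
          finally (return (values (get-output-stream-string line)
                                  (length buffer))))))

(defun next-line-fast (buffer start)
  "Look ahead for the next newline, then copy the whole line with one SUBSEQ.
Returns the same two values as NEXT-LINE-SLOW."
  (let ((end (or (position #\Newline buffer :start start)
                 (length buffer))))
    (values (subseq buffer start end)
            (min (1+ end) (length buffer)))))
```

Both functions return the same values for the same input. With suitable type declarations on BUFFER, the POSITION and SUBSEQ calls in the second one can be compiled down to tight loops, which ties in with the first item in the list above.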

After fixing all of the above and doing some additional micro-optimizations, SBCL now takes about 3.5 seconds on the same test, which isn't too bad. If you've been having IO performance troubles with SBCL, now might be a good time to test CVS SBCL.

One thing that I ran into but didn't have time to look at is that SB-SYS:*STDIN* doesn't get a CIN-BUFFER at all, and thus is still painfully slow. If this is intentional, my guess is that FD-STREAM-READ-N-BYTES doesn't play well with line-buffering.