Long time since the last golf. Inspired by the recent announcement of a Perl Golf book I took part in a Polish golf that was announced on the mailing list.

Given an input string on STDIN that has been "encrypted" with ROT-n, and a dictionary of words (sequences of letters A-Za-z, not of \w) in @ARGV, the program needs to output the original plaintext to STDOUT. (Formal rules).

My best solution was 62 characters, but about an hour before the golf ended I figured out that it was actually broken, and I didn't have time to come up with anything better than the 65.44 below, which is currently good for second place. The apparent winning solution of 63 doesn't seem to work either, for unrelated reasons. So this explanation might be for the winning entry, or it might not.

#!perl -p0

You know the drill. -p handles reading the input and printing the output. Use -0 to read the input in one go, instead of a line at a time.
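If the switches are unfamiliar, here's a rough ungolfed sketch of what -0 buys us. With no digits after it, -0 makes the NUL character the input record separator, so a text input without NUL bytes arrives as one record. The helper name slurp_like_dash_0 is made up for illustration, and the demo reads from an in-memory filehandle rather than STDIN:

```perl
use strict;
use warnings;

# Hypothetical helper: read one record the way -0 (no digits) would,
# i.e. with the NUL character as the input record separator.
sub slurp_like_dash_0 {
    my ($fh) = @_;
    local $/ = "\0";        # same separator that a bare -0 selects
    my $record = <$fh>;     # whole input, as long as it has no NUL bytes
    return $record;
}

# Demo on an in-memory filehandle instead of STDIN.
open my $fh, '<', \"first line\nsecond line\n" or die "open: $!";
my $text = slurp_like_dash_0($fh);
print $text;                # -p would do this print for us
```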

INIT{%a=map{pop,1}@ARGV}

In the INIT block, pop all command line parameters to make -p read from STDIN. Use the removed arguments as keys in a hash table for detecting dictionary words. Using the symbol table with something like $$_=1while$_=pop would save a few characters, but that's incorrect since $ARGV is automatically set to '-' on entering the main loop.
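Spelled out without the golf, the INIT block does roughly the following. This is only a sketch: build_dictionary is a name I made up, and it takes an array reference so it can be tried on something other than the real @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: empty the argument list and return a lookup hash
# whose keys are the removed words (the golfed code does this to @ARGV).
sub build_dictionary {
    my ($args) = @_;                # array reference, e.g. \@ARGV
    my %dict;
    while (@$args) {
        my $word = pop @$args;      # removing the argument is the point:
        $dict{$word} = 1;           # with nothing left, -p reads STDIN
    }
    return %dict;
}

my @args = qw(hello world);
my %dict = build_dictionary(\@args);
print scalar(@args), " arguments left\n";
print "hello is a dictionary word\n" if $dict{hello};
```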

$a{$&}||y/B-ZA-Gb-za/A-z/while/\pL+/g

At the start of the main body $_ contains the whole ROT-n text.

On the first iteration /\pL+/g will match the first word (letters only; \pL is essentially [a-zA-Z]). //g works differently in scalar than in list context: it will only match once per call, but the next call will start at the location in the string where the last match ended. If a match was found it returns true, otherwise false.
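A small sketch of that scalar-context behaviour, for anyone who wants to poke at it. The function name words_one_at_a_time and the sample string are mine; pos() reports where the next match attempt on the string will start.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a string one word at a time with scalar //g,
# recording each matched word and the offset at which the match ended.
sub words_one_at_a_time {
    my ($text) = @_;
    my @seen;
    while ($text =~ /\pL+/g) {          # one match per call, true on success
        push @seen, [$&, pos($text)];   # the next call resumes at pos($text)
    }
    return @seen;                       # the loop ended on the first failed match
}

for my $hit (words_one_at_a_time("Uryyb, jbeyq!")) {
    printf "%s ends at offset %d\n", @$hit;
}
```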

In the body of the while we first check if the word we matched is in the dictionary. If it isn't (i.e. $a{$&} is untrue) $_ obviously isn't plaintext yet, so we rotate it by one step with y///. This contains the only tricky bits in the program:

  • Changing $_ causes the scalar //g to be reset, so the next match starts again from the beginning of the string.

  • Doing the rotation backwards (A -> Z, B -> A, ..., Z -> Y) instead of the more intuitive direction (A -> B, B -> C, ... Z -> A) allows writing the transliteration in a way that saves one character.

    There are six characters ([\]^_`) between Z and a. By adding six extra characters into the right place on the left side of the transliteration operation (with -G) we can use the single range A-z on the right side, instead of specifying separate ranges for upper- and lowercase letters. The duplicated letters are harmless, since only the first occurrence of a character on the left side counts. Compare:

y/A-Za-z/B-ZAb-za/
y/B-ZA-Gb-za/A-z/
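To convince yourself that the short form really is the backward rotation, and to see the whole loop without the golf, here's an ungolfed sketch. The names rotate_forward, rotate_back and decode are mine, and so is the cap of 25 rotations; the golfed entry has no such guard and simply trusts that some rotation matches the dictionary.

```perl
use strict;
use warnings;

# Forward rotation, written the intuitive way (A -> B, ..., Z -> A).
sub rotate_forward {
    my ($s) = @_;
    $s =~ tr/A-Za-z/B-ZAb-za/;
    return $s;
}

# Backward rotation using the golfed ranges (B -> A, ..., A -> Z).
# The extra B-G on the left only pad out the range: tr uses the first
# occurrence of a repeated character, so they never map to [\]^_`.
sub rotate_back {
    my ($s) = @_;
    $s =~ tr/B-ZA-Gb-za/A-z/;
    return $s;
}

# Ungolfed version of the main loop. Returns the plaintext, or nothing
# if no rotation makes every word a dictionary word.
sub decode {
    my ($text, @words) = @_;
    my %dict = map { $_ => 1 } @words;
    my $rotations = 0;
    while ($text =~ /\pL+/g) {
        next if $dict{$&};                # known word: move on to the next one
        return if ++$rotations > 25;      # my guard: every shift has been tried
        $text = rotate_back($text);       # modifying $text resets the //g position
    }
    return $text;
}

print decode("Uryyb, jbeyq!", qw(Hello world)), "\n";
```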

FWIW, the 65.48 by Piotr Fusik is by far the coolest solution. Wish I'd thought of that...