## A Monte Carlo simulation of Red7, in Common Lisp

Red7 is a very clever little card game, and one of my favorite 2014 releases. But I have wondered about the density of meaningful decisions in the game. Sometimes it doesn't feel like you have all that much agency, and are just hanging on in the game with a single valid move every time it's your turn.

So here's some automated exploration of what a game of Red7 actually looks like from a statistical point of view. The method used here is a pure Monte Carlo simulation, with the players choosing randomly from the set of their valid moves.

Why a Monte Carlo simulation? I started trying to do a full game tree for a given starting setup but to my surprise the game tree is actually too large for that to be feasible; 2 weeks of computation even for a single two player game and a lot of optimization. The branching factor is just much bigger than it feels like when playing the game.

## The rules

(Skip this section if you're already familiar with the game. All you need to know is that we're using the advanced version of the game but without the optional special action rules.)

The rules of the game are very simple. There's a deck of 49 cards (7 colors, numbers 1-7 in each color). In the middle is a discard pile ("canvas"). The color topmost card of the discard pile determines the victory condition. You must be "winning" at the end of each turn you take, or you're out of the game.

There are three options to choose from on your turn. Play a card from your hand to the table in front of you (your "palette"), discard a card from your hand to the canvas, or first play a card and then discard a card. If you discard a card with a number higher than the number of cards in your palette, you get to draw a card.

The winning condition is determined based on the color of the canvas (i.e. top card in discard pile):

 Red Highest card Orange Most cards of the same number Yellow Most cards of the same color Green Most even cards Blue Most different colors Indigo Longest run of sequential numbers (e.g. 4/5/6) Violet Most cards with a number lower than 4

If two players are tied for the winning condition (e.g. the rule is blue and both of them have three even cards in their palette), the winner is the player who had a higher card included in their card combination (cards that didn't contribute to the winning condition are ignored for the tie breaker). This is primarily based on the numeric value of the card. But if two cards have the same value, the one closer to red in the spectrum wins the tie (e.g. green 5 > indigo 5 > green 4).

## The implementation

(Ignore this section if you're not interested in the programming, and skip straight on to the results).

I suspect that every Common Lisp program will eventually evolve to using a clever bit-packing of fixnums as its primary data structure. That's the case here as well.

### Cards

A card is an integer between 0 and 55 (inclusive). The low 3 bits are the color, with a 0 being a dummy color that's not used for anything, 1 for violet going all the way to 7 for red. The next 3 bits are the card's numeric value minus one (0-6). Note that with this representation determining the higher of two cards is simply a matter of making an integer comparison.

```(deftype card () '(mod 56))

(defun card-color (card)
(ldb (byte 3 0) card))

(defun card-value (card)
(1+ (ash card -3)))
```

We'll also need a way to represent a set of cards, for a player's hand or palette. We're going to use a 56-bit integer for that, with bit X being 1 if the set contains card X.

```(deftype card-set () '(unsigned-byte 56))
```

Adding and removing cards is simple. (Except how annoying is it that SETF LOGBITP is not specified in the standard?).

```(defun remove-card (card card-set)
(logandc2 card-set (ash 1 card)))

(logior card-set (ash 1 card)))

;; Create a new set from a list of cards.
(defun make-card-set (cards)
```

We'll also need to be able to iterate through all the cards in a set. This is most easily achieved by using INTEGER-LENGTH to find the highest bit currently set, executing the loop body, clearing out the highest bit, and carrying on.

```(defmacro do-cards ((card card-set) &body body)
(let ((modified-set (gensym)))
`(loop with ,modified-set of-type card-set = ,card-set
until (zerop ,modified-set)
for ,card = (1- (integer-length ,modified-set))
do (setf ,modified-set (remove-card ,card ,modified-set))
do ,@body)))
```

### Scoring

With these primitives we can then write a very fast function to determine who is currently winning the game. We'll base this evaluation function on scoring a combination of a palette + rule, and comparing the score that each player gets with the current rule. This is a much better way than trying to directly compare the palettes. If you're caching this evaluation function, you get a much higher cache hit rate when the cache key depends only on the state of one player rather than a combined state of two players. (I'm also pretty sure that given this data layout, computing a score will be faster than any kind of direct comparison).

Let's start off with the general structure, and fill in the details as functions under LABELS afterwards. So given a card-set and a color, we'll return a score for that set:

(defun card-set-score (card-set type) (labels (...) (ecase type (7 (red)) (6 (orange)) (5 (yellow)) (4 (green)) (3 (blue)) (2 (indigo)) (1 (violet)))))

Red (highest card) is trivial. We just find the highest card in the set with a call to INTEGER-LENGTH.

```           (red ()
(integer-length card-set))
```

For other rules we can make good use of the following helper function. It matches the set against a bitmask, and returns a score based on the number of bits that are set both in the set and the mask (main part of score) which we get with LOGCOUNT, as well as the highest bit set in both (the tiebreaker). Given this definition, most of the scoring types can be written in a very concise manner:

```           (score-for-mask (mask)
(let ((matching-cards (logcount matching-cards))
(best-matching-card (integer-length matching-cards)))
(+ best-matching-card (* 64 matching-cards)))))
```

For orange (cards of one number) we start with a bitmask that matches all bits corresponding to a card with the value 7. We compute the score for that mask, then shift the mask right by 8 bits such that it covers the cards with the value 6. Repeat 7 times, and find the maximum score. (We don't need to know which iteration produced the highest score, only what the score was).

```           (orange ()
repeat 7
```

Yellow (most cards with the same number) is very similar. We start off with a bitmask that matches all the red cards (so bit 55, 47, 39, etc) and compute the score. Then shift it right by one, such that the mask matches all orange cards instead. Again repeat 7 times and maximize.

```           (yellow ()
repeat 7
```

Green (most even cards) and violet (most cards under 4) are trivial; we can just score a single mask matching the even cards for green, all cards of value 1, 2 or 3 for violet.

```           (green ()
(violet ()
```

Blue (most cards of different colors) is where we get into unintuitive territory. Let's start with the tiebreaker; it's obviously guaranteed that he highest card in the palette as a whole can be included in this winning set, so we can just use INTEGER-LENGTH on the whole set the same way we did for the red scoring rule.

To get the number of different colors, we will fold the cardset multiple times. First we'll do a bitwise OR of the high 32 bits and the low 32 bits. Then we'll take OR bits 0-15 of that result with bits 16-31. And finally one more OR of bits 0-7 with 8-15. The low 8 bits are now such that bit 7 is set if any of the "red" bits in the original were set, bit 6 if any of the "orange" bits, etc. We can then just use LOGCOUNT on that byte to get the number of colors present in the palette, and combine it together with the tiebreaker score computed above.

```           (blue ()
(let* ((palette card-set)
(best-card (integer-length palette)))
(setf palette (logior palette (ash palette -32)))
(setf palette (logior palette (ash palette -16)))
(setf palette (logior palette (ash palette -8)))
(+ best-card
(* 64 (logcount (ldb (byte 8 0) palette))))))
```

Finally, there's indigo (longest straight). There does not appear to be any clever bit manipulation trick to compute this quickly (if you can think of one, please let me know!). We need to iterate through the cards in order of descending value, ignore any consecutive cards with the same number, and reset our scoring computation when the straight gets interrupted by a missing number.

```           (indigo ()
(let ((prev nil)
(current-run-score 0)
(best-score 0))
(declare (type (unsigned-byte 16) current-run-score best-score))
(do-cards (card card-set)
(cond ((not prev)
(setf current-run-score card)
(setf prev card))
((= (card-value card) (card-value prev)))
((= (card-value card) (1- (card-value prev)))
(incf current-run-score 64)
(setf prev card))
(t
(setf current-run-score card)
(setf prev card)))
(setf best-score (max best-score current-run-score)))
best-score))
```

### Players

A player is defined as a normal structure, with the only oddity being that they form a circular linked list using the NEXT slot. This tends to be more convenient for iterating through players in turn order than keeping them stored in an external collection of some sort.

```(defstruct (player)
(id 0 :type (mod 5))
eliminated
(hand 0 :type card-set)
(palette 0 :type card-set)
(score-cache (make-array 16) :type (simple-vector 16))
(next nil :type (or null player)))
```

The core operation of generating a list of valid moves is deciding whether the player is winning the game after those a move is made. When doing this we'll end up repeatedly evaluating the scores for the same palettes over and over again. To speed this up, there's a minimal cache; for each player / rule combination we store both the last palette we evaluated for that rule, as well as the score.

```(defun player-score (player rule)
(declare (type (mod 8) rule))
(let* ((palette (player-palette player))
(cache (player-score-cache player))
(cached-key (aref cache rule)))
(if (eql cached-key palette)
(aref cache (+ rule 8))
(progn
(setf (aref cache rule) palette)
(setf (aref cache (+ rule 8))
(card-set-score palette rule))))))
```

Given that way to score a player against a rule, we can then check whether the current player is winning the game with the rule.

```(defun player-is-winning (player rule)
(loop with orig-player = player
with orig-score of-type fixnum = (card-set-score player rule)
for player = (player-next orig-player) then (player-next player)
until (eql player orig-player)
do (when (>= (the fixnum (player-score player rule))
orig-score)
(return-from player-is-winning nil)))
t)
```

We can then generate all valid moves by iterating through all the PLAY, PLAY+DISCARD, and DISCARD combinations for the player's current state, and collecting the ones result in the player winning.

```(defun valid-moves (player current-rule)
(let (valid-moves)
;; Filter out cases where player discards a card
;; without changing rule or gaining a new card.
(>= (logcount (player-palette player))
(push (cons (cons :play play-card)
valid-moves)))))
(check-plays ()
(do-cards (play-card (player-hand player))
(setf (player-palette player)
(when (player-is-winning player current-rule)
(push (cons :play play-card) valid-moves))
(setf (player-palette player)
(remove-card play-card (player-palette player))))))
(check-plays)
valid-moves))
```

### Other stuff

There's a little bit more code required to generate the scaffolding for a game, and to actually do the random walk through the game tree. None of that code is particularly interesting, nor are the INLINE or TYPE declarations that you'd need to sprinkle on the above code to make it fast. The full code is available on GitHub.

### Performance

In the optimal case of trying to iterate through the whole game tree in a 2p game, the average cost of making a move is about 500 cycles, with my desktop doing 7 million moves per second. This is however amortizing the cost of computing the set of valid moves across all of those moves (since in a full search every valid move gets executed). If you're just doing a pure random walk with no backtracking, you'd get no amortization at all. That effect makes an order of magnitude difference.

But it's funny that the biggest profiler hotspot in the program is the PLAYER-SCORE function. Which, if you remember, will simply do an array lookup to get the previous cache key, compare it to the card-set that should be evaluated, and either return a previous result or call out to the real scoring function. The function does basically nothing, but it does nothing really often. When all of the things of substance are pretty fast as well, it's maybe not a surprise that the bottleneck ends up in a place like that.

## Results

(Skip this section if you're not actually interested in the game, and just wanted to read some Common Lisp code).

The following results are computed from running simulations of 10k different initial setups, with 100k matches for each simulation with each player making random but valid moves. (So a total of one billion games). All plays were with 3 players, the only player count I consider worth playing.

As a sanity check, I ran a smaller simulation of 1000 initial setups where the players would not play a card + discard, if just playing that same card was sufficient to get into the lead without a discard. The results were very close to the large fully random simulation (e.g. the average game length was 14.6 instead of 14.1 turns, and the win percentage of the best turn order position was 39% rather than 40%).

Finally, an even smaller scale experiment had the AIs use move selection heuristics very similar to those I personally use when playing the game. Those results didn't differ materially from random play either.

### Caveats

Unless stated otherwise, all of the numbers are from games with players making completely random moves. It is possible that the aggregate statistics are different when players consciously build toward palettes that are strong in multiple scoring rules, or strong in rules that they have a lot of cards in hand for.

The games are always played with the full deck, rather than in reality as the deck slowly depletes from hand to hand as cards are moved to the scoring piles of players.

### Starting player effect

One thing I was curious about is whether the starting player has an advantage, a disadvantage, or neither. It's not obvious, since there are effects both ways.

The case for a disadvantage: Running out of cards means losing the game, and the all other things being equal the first player will also run out of cards first. Due to the way in which the player order is picked, the last player is also guaranteed to have the highest value starting card in their palette giving them a leg up on winning future tiebreakers.

The case for an advantage: The earlier in turn order a player is, the fewer cards the opponents have in their palettes. It's much easier to pass two players with one card each, than two players with two cards each. And this effect continues throughout the game, so it should accumulate over time.

It turns out that at least with undirected random play there's a major disadvantage to being first. It could be that the effect is smaller when players are making "good" moves.

 Position Win rate 1st 27.20% 2nd 32.42% 3rd 40.37%

### Number of possible moves

Like mentioned above, the branching factor in the game was higher than I'd been expecting. There are cases where players have a lot more moves available than I would have expected.

The theoretical maximum number of options is 7 + 7 + 7 * 6 = 56, where a player can get in the lead either by discarding any of their cards, playing any of their cards, or with a combination of the two. This situation actually happened a total of 483986 times in 14 billion moves (0.03% of the time). A lot more common than I would have thought.

But of course we don't particularly care about the 0.03% case. The more common cases are more interesting. The following graph shows how often you have at least X moves available in the game.

For example, you can see that about a 1/3rd of the time a player had 10 or more options to choose from. It appears that the game is nowhere as constrained as I thought, even when playing without the special action rules.

### Length of game

The average game lasted for 14.2 turns, which is perhaps less than I expected given 2 of those 14 turns were by definition a player just dropping out from the match.

There were some games that already ended on turn 4, which meant that only two cards were played in the game. That number was a mercifully low 0.01%. And while there were players who got eliminated before playing a card, there at least were no games ending in turn 2 or 3 even if that's theoretically possible. And a single game lasted all the way to turn 28.

The following graph shows how large a proportion of the games were still running on a given turn.

### Effect of player decisions

The final question is about how strongly predetermined a single hand of Red7 is, and how much a player can affect it.

We've already established that at least with this skill level of play there's a very large start player advantage, but is that an isolated issue or does the setup matter even more than that. In these simulations all players are by definition equally skilled. If the end result of the game is primarily determined by player skill, you'd thus expect them to have similar win rates from game to game. So let's graph the distribution of per-setup win rates for each starting position:

Now, this graph is a little abstract since we're looking at probabilities of probabilities. The way to read this is that across those 10000 starting setups, the most common win percentage for player 1 (red) across the 100000 games in a specific setup was around 15% (the peak of the red line is at around 0.15). You can see that the later players in turn order have a graph that's shifted further to the right, which is what you'd expect when they have a substantially higher win percentage. But you can also see that from any starting position you might get absolutely dismal win rates (near 0) or very high win rates (over 80%). The ridiculously high win rates (95%) appear to be purely reserved for the player last in turn order.

There were two setups where a player didn't manage to win even a single match out of 100000 (in both cases that was player 1). In 25% of the cases the player with the worst chance of winning a setup had a 10% win rate or lower, in 7% of the cases a win rate of 5% or lower. It does appear that within a single hand of Red7, luck plays a massive role.

Out of all of the questions we've been looking at, this is of course the one where the applicability of a purely random search strategy is the most questionable. If we're investigating the effect of player skill, how can results from the least skillful play imaginable be relevant? I'm sympathetic to that argument, but before buying into it I'd really like to understand the mechanism by which one player is supposed to disproportionately benefit from the random play.

Also... As mentioned earlier, I also tried extending the AIs to be smarter about selecting each move. This was not based on any kind of lookahead, but simply the kinds of heuristics I'd usually use myself when playing the game. If I can get into the lead either by playing a card or discarding a card (without drawing a new one to replace it), I'd rather play a card since that's going to be useful on future rounds. When choosing which of two cards to play, I'd usually prefer to play the one that adds strength to more different scoring rules.

Experiments with one AI player getting use of these kinds of heuristics while the others played completely randomly did not show a big effect, the changes in the win rate were on the order of 1-2 percentage points.

## Future work

I might be done with this little project, but if I pick it up again there's a couple of obvious directions to take this. Implementing the optional special action rules would be nice. That's my preferred form of the game anyway.

The more interesting one is to extend the current system to be a full AI using the Monte Carlo Tree Search approach. This would allow generating statistics based on "good" play of the game, maybe provide information on what kinds of moves are in general successful, as well as give a more conclusive answer to the level of skill the game has.

The tricky bit with evolving this code to a MCTS is that the system in the current form would allow the MCTS to exploit knowledge of future random events and hidden information. It would need to randomize all card draws (currently deterministic), as well as swap the opponents hands for random cards for the duration of the evaluation phase, and then swap the original deck and original hands back in for the move execution. That's going to slow down each individual move a lot, which is a problem when MCTS will intrinsically require computing several orders of magnitude more moves than a random walk.

## Can't even throw code across the wall - on open sourcing existing code

Starting a new project as open source feels like the simplest thing in the world. You just take the minimally working thing you wrote, slap on a license file, and push the repo to Github. The difficult bit is creating and maintaining a community that ensures long term continuity of the project, especially as some contributors leave and new ones enter. But getting the code out in a way that could be useful to others is easy.

Things are different for existing codebases, in ways that's hard to appreciate if you haven't tried doing it. Code releases that are made with no attempt to create a community around it and which aren't kept constantly in sync with the proprietary version are derided as "throwing the code across the wall". But getting even to that point can be non-trivial.

We've had vague plans almost from the start of eventually releasing some of Teclo's code as open source. Not the commercially vital bits like the core TCP stack, but at least some of the auxiliary programs and libraries. Setting those bits free should be a win all around. The public gets access to something they didn't have before. We get the warm and fuzzy feeling of contributing something back, maybe a tiny bit of good PR, maybe some useful code contributions, and potentially a future recruitment channel.

But it's always been just talk. Something of such low priority ("would be nice to do that some day") gets pushed back by more important things, and never actually gets done. But last week I had two discussions like that with people outside the company, rather than internally. The two discussions were rather different.

### The unrealistically easy case

In the first discussion, Luke was looking for a SSE-accelerated IP checksum library to benchmark and maybe use in Snabb Switch. He knew that Teclo had one, and asked if he could get access to it. It was an ideal case: completely self-contained in one file, not providing any real competitive advantage to us, and in the dependency graph of stuff we'd already in decided should be open. Realistically the only reason not to have released that code ages ago was that we didn't know anyone had any use for it; hardware offload of checksum computation / validation is kind of the default these days.

So releasing that code to the wild took about 5 minutes of my time. And within a day the now free code had sprouted a significantly faster AVX2 variant that we could easily merge back (thanks Luke!). Checksum computations aren't a huge bottleneck for us, but it could still reduce our CPU usage by 1% once we upgrade to Haswell based Xeons.

It's about as clean a win-win as you can get. If things were always this easy and productive, we'd have most of our codebase opened up in a jiffy :-)

### A more realistic case

Unfortunately things aren't always that easy. The other discussion went roughly like this.

A customer of ours needs to do some load-testing. They know we've got a decent in-house traffic generator with some properties that no other freely available traffic generator had. (Basically support for huge concurrent connection counts combined with generating valid bidirectional TCP flows that react correctly to network events such as lost packets; not just replaying a capture or generating semi-random packets. If you're aware of an already existing open source traffic generator like that, please let me know in the comments).

So they wondered if they could maybe have access to our traffic generator. Unfortunately the program in the current state is not something that's useful to anyone else than us, since it's using a proprietary high-speed raw packet library that's specific to one obscure brand of NICs. We've got a few in the lab for load-testing (stopped using them for production years ago, so they're useless for anything else), but it's unlikely that anyone else does in this day and age. On hearing this, the customer suggested that they could just add DPDK support to the traffic generator themselves. And from there it was a short step to agreeing that it'd be in everyone's best interest for all of this to be released as open source. Again it should be an obvious win-win.

(Of course this might not actually happen, so please don't take this as any kind of a promise. But it's an instructive example anyway.)

The big difference to the previous case is that here the program is no longer anywhere near self-contained. It'd pull in ridiculous amounts of cruft from all over our code base. Some of the cruft is just useless for a standalone open source release, some is a little embarrassing, some of it is totally blocking. (And the blocking code can't just be cut out. You can maybe release a program in a rough state, but at a bare minimum it must compile and do something useful).

• First, there's a baroque build system that depends on dozens of makefile fragments and external scripts. Reasonable for a large project with a long history and odd requirements, not something anyone would like to work with for a small standalone tool. Even if we used a less insane tool than make (e.g. CMake), figuring out exactly which build steps we actually need and rewriting a minimal build file would take a while.
• Then there's a custom JSON-over-UDP RPC system, for a program that doesn't fundamentally need to do any RPC. But does it anyway, since as a matter of policy all of our programs are RPC servers, and all of them support a few RPC methods by default (e.g. querying counters). Nobody outside the company would actually want to use these facilities. They even couldn't use them, unless we also released some additional tools for actually making RPC requests... The downside of including the RPC server in the code is that it pulls in lots and lots of additional code that's not really relevant for this particular program.
• There's also a horrible system for generating C++ classes with automated JSON marshaling / unmarshaling from data structure description files. (Written in sh and the C preprocessor, to give an idea of what a kludge it is; I'm feeling a little bit ill just at the thought of more people seeing that code). Including this code is kind of necessary, since that's how the traffic generator is configured.
• And a little mini-language that can describe the link aggregation setup in use for a set of physical network interfaces. Any time we pass interface names e.g. on the command line, they're passed using this language. Because anything we run in production will need to handle link aggregation. But there's not really any compelling reason for the load generator to support EtherChannel (or whatnot). All that code is just extra baggage.
• There are various in-house utility libraries that we use all over the place. Timers, realtime clock libraries, select/epoll/etc wrappers, logging, minimal C++ wrappers for C libraries, small utilities made obsolete by C++11, and so on. Most of these are doing jobs that would be better done by mature open source libraries, but where we ended up with something in-house instead for some (usually path-dependent) reason. None of those reasons would actually apply to an open source program living completely outside our main repository.
• And then there are hardwired dependencies to third-party libraries just for minor conveniences, like linking in libunwind for backtraces. Critical for anything we run in production, and thus habitually linked in to everything. Somewhere between useless and mostly harmless for a tool like this.
• Finally there are a bunch of raw packet IO backends. Some that nobody in the world would actually want to use, and some that we might not even be able to legally distribute in a buildable form due to proprietary third party library dependencies.

And that was just what I could think of off the top of my head; there's probably stuff that I've forgotten about completely. As soon as you get outside of the "one self-contained file or directory" level of complexity, the threshold for releasing code becomes much higher. And likewise every change to a program that was made in order to open source it will make it less likely that the two versions can really be kept in sync in the long term.

In this case the core code is maybe 2k-3k lines and won't require much work. It's all the support infrastructure that's going to be an issue. Getting it to just a minimally releasable state is likely to be at least a few days of work, and that "minimum" is going to be rough. That's not huge amounts of work, and still probably worth it. But significant enough that it needs to be properly scheduled and prioritized.

### What should have been done differently?

At first it seems that this trouble is a sign of some kind of software engineering failure. Why do we have code around that's embarrassing, why not fix it? Why does the transitive dependency closure of the program include code that can't possibly get executed? Why is the build-system monolithic and a bit cumbersome?

But after giving this some thought, I'm less sure. Except for a couple of unnecessary instances of NIH, it's not clear to me that there was any problem with the development process as such.

For example one could imagine maintaining a strict separation between projects / components. "This is our RPC library, it lives in this repository, it has its own simple build system, it has its own independent existence of everything else, its own release process and schedule, and could be trivially open sourced". The top level build could be as simple as pulling in the latest version of each component, building them separately, and bundling them up somehow. Some people would argue that such forced modularization is a good thing. But I take it almost as axiomatic that a single repository for all code is a major benefit that you should not give up unless you really have to. (E.g. being able to atomically update both an interface and all users of an interface is just so nice).

Or all of that old and crufty code? Well, the reality is that a lot of the old code has been completely rewritten over the years. That's code that actually was causing trouble. What survives might be ugly, but it also works. No matter how good the test coverage, rewriting or even refactoring a lot of that stuff would be a totally unnecessary and unjustifiable risk.

And it's obvious that when writing a tool mainly for internal use, you should use the normal internal conventions and convenience libraries. Writing crippled bespoke versions of those things just to make a possible future open source release easier would be insane.

Even if making real changes to the development process isn't feasible, there's one thing that could be improved with little effort. Simply keep an eye out for situations where small, useful and mostly self-contained bits of code could be opened up. They're going to be easiest to release, and are well positioned to create a useful feedback cycle. If the bottleneck really is finding the time to do the work, this should give pretty good bang for the buck.

Any obstacles to doing this are more likely to by psychological than technical or political. "Of course nobody else in the world could possibly want to do such a special purpose task". "No point in releasing something that trivial". "Just how anticlimactic is it to release one file of code and a makefile, a lame near-empty repository?". Obviously these kinds of fears are nonsense.

So I'm going to make a conscious effort to find these opportunities, and actually take advantage of them. As an extra incentive I'm talking about it publicly in this post, so that you can mock me in a year if I still just talked the talk :)

## Podcast on mobile TCP optimization

I was recently a guest on Ivan Pepelnjak's (ipspace.net) Software Gone Wild podcast, talking about TCP acceleration in mobile networks, as well as whining in general about how much radio networks suck ;-) Thanks a lot to Ivan for the opportunity, it was fun!

You can listen to the podcast episode here.

## Command languages as game user interfaces

In the previous post in this series, I promised to discuss in detail some of the positive and negative consequences of the less conventional design choices of my online Terra Mystica implementation. If you have no idea of what that is, reading at least the intro of that post might be a good idea. This post will just deal with one design choice, but it's the elephant in the room: the command language.

The canonical internal representation of a game in my TM implementation is as a sequence of rows, each describing a some number of player actions specified in an ad hoc mini language, or administrative commands that change the game setup in some way (for example setting game options, or dropping a player from the game partway through). This is what it might look like:

```yetis: action ACT4
cultists: +FAV6
giants: Leech 3 from cultists
giants: pass BON4
yetis: Leech 2 from cultists
cultists: +WATER
dragonlords: Decline 2 from cultists
dragonlords: dig 1. build G6
yetis: send p to EARTH
cultists: action FAV6. +AIR
dragonlords: pass BON7
yetis: upgrade E7 to TE. +FAV11
giants: Leech 3 from yetis
dragonlords: Leech 2 from yetis
cultists: Leech 2 from yetis
```

That's a short excerpt from the middle of a random game. A full game generally runs for about 400 rows.

What do I mean by this being the canonical internal representation? Only a few parts of the game state are actually persisted separately in the DB; these are things that might almost qualify as metadata, such as whose turn is it to move, is the game still running, and what were the final rankings of a finished game. But in general the only way to find out the current state of the game is to evaluate the whole sequence of commands from start to finish. This is in fact done for almost every operation on the site (viewing a game, previewing a move, saving a move, viewing the or editing the game in an admin mode, and so on).

In addition to being the canonical internal representation, the command language is also the canonical user interface; the fundamental operation players do is enter new rows into the command sequence. Often this is done by writing the commands manually, though there are GUI shortcuts of one form or another available for almost all operations.

This might sound like a slightly insane way of doing things, but it does have some benefits as well. I've made several digital board game adaptations of varying levels of completeness over the years, used tens of other ones, and this solution hits the closest to my personal sweetspot.

### A taxonomical diversion

Before discussing the fallout of this design decision in more detail, it's probably useful to do a quick tour of some of the main axes in the design space. (I'm of course just describing the extremes, while in the real world most examples would fall on a continuum).

First, there's the question of the interaction model which might be abstract or skeuomorphic. In a skeuomorphic design the player doing input on a computer would still be mimicking the actions of someone playing the game with physical pieces and no computer assistance.

In an abstract design the player would only input the parts of the move that are necessary to uniquely distinguish it from other possible moves, with any bookkeeping and mandatory intermediate steps being carried out automatically. Likewise in a skeuomorphic design the software provides information through the same methods as the original physical game, while an abstract design will automate some of the mechanical parsing of the game state. Or even just the question of using the graphical assets of the original game, generally optimized for sales, versus using digital-first assets optimized for clarity.

As an example of this axis, in the 18xx series of games a substantial amount of playtime is spent computing the exact routes of a number of trains on a complex rail network. I'm aware of three solutions that are actually in use, and there is a fourth plausible one, in order from least to most abstract:

• The user manually decides on the routes, computes their values with no computer assistance, and those values are used with no validation. Examples: ps18xx, early versions of Rails.
• The user enters valid routes through a user interface. The software computes the values of the routes, and distributes the income from the company appropriately. Example: rr18xx.
• In games with requirements that all routes must be optimal, the software could compute an optimal route but only for the purpose of rejecting any manually computed unoptimal ones. Examples: None. (Though it's similar to what's done in the SlothNinja implementation of Indonesia, a game that probably counts as an honorary 18xx)
• The software automatically finds an optimal set of routes and computes their values. Examples: The ancient DOS-based 1830 from Simtex, recent versions of Rails.

My own tastes run toward maximum abstraction, I've rarely if ever seen a digital boardgame conversion that needed to be more skeuomorphic. But this is not a universal view. There are definitely people who will refuse to play a conversion that does not use the same graphics as the physical version. Or who will strenuously argue against automatic finding of optimal routes in 18xx, on the basis that being evaluating routes is a core skill in the game when making decision about route building, and that skill can only be acquired by getting sufficient practice in manual route computation.

A second axis is the internal representation, which could be based on either log replay or stored state. In a log replay system the game is stored as a series of steps from the starting setup to the current state. In a stored state system the game is stored as the current values of all pieces of the game. How much money does every player have, which round is it right now, what's in this exact space on the map, and so on.

A third axis is the input model. Moves could be entered either through direct or indirect manipulation. In a system using direct manipulation, the player would for example see a graphical display a map and be able to click or drag on a unit to enter a move for it. In an indirect system the player observes the game state in one place, and enters their moves using some completely unrelated system.

I think most digital boardgames use a direct input model, but there are also a fair number that have a menu-driven system of some sort. The only examples I know of that go a bit further with indirection by providing a command language are my ancient Paths of Glory mapper and the even older Diplomacy PBEM judges. If you have other examples, I'd love to hear of them.

Direct manipulation is often, but not always, linked to excessive skeuomorphism in the interaction model. For example I find it almost painful to play most Vassal modules, with their hyper-direct interaction model of dragging and dropping counters around, manually drawing cards from a deck or rolling dice. Digital boardgames are not the same media as physical boardgames, and should play to their unique strengths. But these are in fact orthogonal concerns, and there's no reason for why a direct manipulation model couldn't also provide useful input and computational abstractions.

Whew, so much for the theory. In this taxonomy Online Terra Mystica is pretty far toward the abstract end, and is fully in the log replay camp. While it has a half-hearted attempt at adding some direct manipulation concepts to the UI, it started off as an indirect system and deep inside that's what it is. It also chooses to merge the input format and the log format into one entity. So what does this mean?

### Feature set

Perhaps the signature feature of the site is the planner. This tool allows the player to enter an arbitrarily long sequence of actions - all the way to the end of the game - and see what the effects would be. Are all the moves valid? Are there sufficient resources available to do all of this? Oh, I don't have enough resources? Well what if I do this on round 5, and delay that action to round 6. In cases where the plan fundamentally depends on the opponents doing something, it's possible for the plan to also contain arbitrary resource adjustments. And finally, since the command language supports comments, these plans can be properly documented so that when you return to them in a day or two, you can remember why you wanted to do these particular moves.

I think this feature is intrinsically linked to the command language as a user interface, and it might actually be unique. There are some games with other kinds of interfaces that allow you to play the game forward, and then undo / rewind / reload. But simply being able to play the game forward is not sufficient to make this a useful tool. It's only the ease of inserting, reordering and deleting moves that makes it possible to use this as a matter of course, rather than only under the most exceptional circumstances.

A somewhat related feature is undo. Inflexibility in allowing moves to be taken back is the bane of many forms of digital boardgames. When playing a game face to face, most groups will generally allow at least some level of taking back moves. In some cases all moves are final immediately (this has always been the primary problem of the otherwise brilliant implementation of Brass at Order of the Hammer). In some other implementations there are distinct checkpoints, for example BGO's Through the Ages allows undoing back to the start of your full turn, but no other rollbacks (clicking 'finish turn' is final, as is any kind of action during an auction or war resolution). These two are, I believe, examples of undo being limited for design reasons. At rr18xx meanwhile rollbacks are possible until the previous action of each player. Here my understanding is that the overriding issue is technical, as the rollback is essentially a full restore to a previous database snapshot, and there are resource constraints on how many snapshots can be kept.

The solution Online TM takes to this is to grant the creator of the game arbitrary powers to edit the history at will, the admin mode. Not only can they undo the last move or couple of moves. If there was a mistake made three moves back, they can go and fix it (and they can fix it without forcing the intervening moves to be redone). This feature is fully tied to a log replay mode of operation. While more limited forms of undoing could be implemented as a reverse log replay from the end state or through state snapshots, this more complete form depends on the log being directly editable. And realistically the log also needs to be the input format; it would not be reasonable to expect the admin to be able to edit a more formal log representation correctly (whether the log format is XML, protocol buffers, JSON, or something else). But in the case where the log format and the move input system match, just playing the game has taught the game admin the necessary skills.

This is a very nice feature for friendly games. It does have downsides though, more on that later in the section on the social implications.

There's also a potential as yet unimplemented feature of pre-programmed actions, that people frequently ask for. "I know exactly what I want to do next turn, why can't I just pre-enter my move". This would be a pretty interesting thing for speeding up games, but to my mind would not be conducive for good play. Circumstances change, often in ways you did not anticipate at all. The only way this could be even remotely usable would be if the language was extended to have some kind of conditional execution. And that's a can of worms I'm interested in opening, and I suspect also a bridge too far for 99% of my users.

It's worth noting that many of the above features are closely tied to a game with no randomness (or at most setup randomness) and no hidden information. As such their existence is something of an anti-feature, preventing other additions to the game.

For a non-hypothetical example, I'm currently thinking about how to implement the faction auction variant from the TM expansion. A full open auction in the beginning would be painfully slow. The most obvious, though still slightly imperfect, solution is a series of blind second price auctions. But this is not a good fit for the site's existing design. The problem is that the blind bid introduces momentary hidden information into the game, and it's possible for that information to leak through either the preview or admin modes. For example the admin could wait for everyone else to bid, peek into the log and see everyone else's bids, and then bid in such a way as to force the winner to pay the maximum amount.

### UX

The most obvious UX consequence of using a command language is that it tends to be harder to learn. The following quote, said partly in jest, certainly contains a kernel of truth:

... has done a bang-up job providing a PBEM Terra Mystica experience that includes just enough extra layers of complexity via the interface and game administration tools to keep TM as confusing as ever, long after you master the actual game!

Non-natural languages are simply not a mode of human computer interaction that most people are comfortable with in this day and age. It actually continues to amaze me that I could get non-programmers to play using this implementation at all. Is it possible to evaluate how big a hurdle this has been for people? The best number I can come up with is that around 20% of the players who joined at least one game never finished even one game without dropping out. Note that these are players who have already jumped through hoops such as email validation during account registration. It's possible that there's some other issue beside the UI that's a problem for these players, but it does seem like the most likely candidate.

A smaller problem is that it essentially forces the introduction of a move preview. For those who haven't played the game, when entering moves you need to first enter the moves, then click 'preview', check that the results match what you want, and finally click 'save' to commit the moves. In a game that uses a direct manipulation paradigm, a preview could be skipped. But with a more obscure UI like here, it's absolutely essential since the move might not have had the intended effect. Whether it's doing the entirely wrong move, picking the wrong tile, building on the wrong location, etc. Even with a preview step somebody will request a rollback on average once or twice a game.

So why do I call this a problem? Because despite my best efforts, especially new players will frequently forget to 'save', leaving the game in a limbo state where they think they've done their move, until some other player gets impatient. (To mitigate this a little, the system will automatically do a 'preview' when using the GUI tools to generate the commands rather than type them. Unfortunately performance problems make it unfeasible to trigger continuous parsing + updates when typing).

A horrible mistake I made in the design of the language was the lack of (mandatory) turn delimiters. Originally my implementation treated each row as a complete turn. This caused more confusion than any other part of the command language. In the end I ended up writing a lot of very complicated code for automatically detecting the turn breaks in a command stream.

But that wasn't actually good enough, there are valid command streams where the splitting isn't unambiguous, e.g. the tunneling ability of dwarves, where `transform E10. build E10`. I had to make an arbitrary choice on that (basically the behavior now is greedy, as many commands as possible are stuffed into the same move). So I had to include the `done` command to allow players to disambiguate in the few cases where it's needed. This is still supremely confusing for people. All of this could have been avoided by taking this into account right at the start.

Finally, one very surprising outcome is that having a compact vocabulary for game actions makes it much easier to display a useful player-readable log of what happened in the game. The typical user-visible log is structured as natural language, and so verbose as to be hard to read especially when trying to piece together the flow of the game after the fact. It's easy to see why that design choice is made, but it's not necessary when all players are almost by definition going to know how to read a more compact representation.

Likewise this makes it really easy to display a concise summary of what has happened in the game since the player last looked at it (done both in the notification emails and the 'recent moves' tab of games).

### Social issues

The unlimited admin access to games has a dark side. Admin malfeasance is rare but I do get about one complaint a month about it. Sometimes these are games where the admin will change their moves after others have already taken moves, rolling the game back by a huge amount, taking over entirely for another player for example forcibly passing them, applying different standards to allowing others to undo vs. doing it themselves, and so on.

This is the kind of drama that I really do not want to deal with, but the general solution is to just mark the game as unrated, and let the players sort out between themselves whether and how the game will continue. And it is a bit of a miracle that it hasn't yet become a more widespread problem, as one might expect to happen for the anonymity + internet combo. If it does ever become intolerable, the solution will almost certainly be to disable admin mode entirely for public games. The TM tournament has already shown that it's at least workable, even if people do occasionally get a little bit screwed by the 'no manual administration' policy.

One consequence of a command language is that everything needs to be named. The map needs to have a coordinate system, every component needs a identifier of some sort, and every interaction needs a short and snazzy name. Old school wargames will do this as a matter of course. Of course every hex has an id! Of course the cards are both numbered and uniquely titled! But not so much for eurogames.

The naming we ended up with on the site is far from optimal, and caused yet more drama due to non-online players feeling excluded from conversations. (If you want to know more, you can see an explanation for where the names came from, and why they won't change). That bit is unfortunate. But at least I actually find real value in having convenient shorthands available for everything, when discussing the game, whether when theorycrafting or conducting some tabletalk on IRC during a game.

### Implementation issues

The obvious problem for a log replay system is performance. Replaying a full game, which is done for almost every operation, can take around 0.15 seconds in the current implementation, with no obvious low hanging fruit to fix. On the current traffic levels server load is not a problem, but I would start to get worried if usage increased by a factor of 10. As discussed above, there are features I'm unwilling to implement due to CPU load concerns. And it is actually causing real development pain for testing (see below).

It's hard to say exactly how much of the CPU overload is related to command parsing, a step that could be avoided with the use of a more structured log format. Some crude profiling suggests that the parsing takes only 5-10% of the runtime, certainly nowhere enough to warrant using a different format.

A rewrite in a language with higher performance implementations than Perl would almost certainly give a factor of 10 improvement on the actual game evaluation code, moving the bottlenecks to IO. But a full rewrite is not in the cards.

Another potential implementation worry is storage. The current DB size is about 250MB. Unlike CPU usage, this is a cost that accumulates over time. Out of that 250MB maybe 75% is used by the game logs. The logs, stored as a sequence of commands, are not a particularly efficient form of encoding the game data. Simple lossless compression could easily compress them by 80-90%. Luckily disk is cheap (this server still has 600GB free), so this should never become a real issue.

Another consequence of a log replay system is that any change in the game evaluation might break existing games. That change might be a bugfix for a place where the effect of a move was miscomputed, it might be extra validation to prevent illegal moves of some kind, cheating prevention, or something else entirely. This is not a theoretical possibility. Basically every single game evaluation change I make, there are already multiple affected games. No matter how elementary a rule is, somebody has already broken it.

Obviously in a stored state implementation changes like this don't matter. The current state is the current state no matter what. But in a log replay system you need to have some story on how to deal with retroactive changes. I can think of the following strategies:

• Punt: Don't make any changes at all.
• Ignore: Just make the change, and don't worry about games breaking or the results changing part way through.
• Delete: Just delete any games that would be broken.
• Fixups: Find all games where the old and new behavior differ, and change the appropriate logs in such a way that the results with the new log and version will be the same as the result with the original log and old version. This change could be manual or automated.
• Versioning: Each game file carries a version number. When making a breaking change, keep both the original and new code paths, and choose one of the two based on the version number. Any newly created games use the new version number and get the fixes, existing games keep their original version number and the original behavior.
• Positive options: Conditionalize the behavior on an option. Turn that option on for new games, as well as any existing games for which the new and old versions behave the same.
• Negative options: Conditionalize the old behavior on an option. Turn that option on only for existing games where the results for old and new versions differ. Never turn the option on for newly created games.

During the lifespan of the site I've used most of these at one time or another. The 'ignore' strategy was appropriate a couple of times (for changes where I decided that the the new behavior was always acceptable, such as situations where a player had ended up overpaying for an action). The 'delete' strategy would be exceptional, the only situations where I used it were games that were aborted, and one case of a single game being completely unsalvageable due to bug abuse by a player. The 'fixup' strategy has the nice benefit that it avoids introducing a new code path, and was my default choice early on. But at this point it'd be an unacceptable amount of manual work, and it's not readily automatable. Especially with the relatively freeform input from the command language. My next default was 'positive options', but after about 3-4 of those I switched to 'negative options'. Positive options had a slightly more complicated rollout procedure, and also permanently clutter up all games, confusing people. ("What's this `strict-darkling-sh` option?").

None of these options are good, in this instance a log replay model does introduce some major costs either to the developer (who has to do extra work) or the users (who have some games screwed up or completely lost).

But it's not all bad! A log replay model makes testing much easier. First, it'd be very easy to write test cases since there is a very natural serialization format for games already, the command language. I don't actually write explicit tests for TM, but for example at work we need absurd amount of infrastructure for making it easy to write unit tests for TCP/IP packet handling. This kind of design gives the test cases for free. Likewise a Age of Steam implementation I was once doodling around with had lots of test cases, but even with the reasonably friendly format (protocol buffers) they were an absolute pain to write due to the boilerplate.

If I don't write unit tests, how do I test? Mostly by side by side testing; I have a small script that runs every single game in the database against both the new and the previous version. It munges the results a bit removing known harmless diffs, and then displays any changes from game to game. I can then look at those games, and decide whether it's indicating some kind of a problem with my change, an expected result of my change, or a problem of some sort in the game. It also acts as a great regression test that prevents failures from creeping in, and is the source of data for finding the games that would be broken by a game, so that one of the fixes discussed in the previous section can be applied.

This has been one of my favorite forms of testing for a long time, and works tremendously well in a case like Online TM where we have access to all games ever played. Thinking specifically of digital boardgames, it's also a model that wouldn't work well without a replayable log. The only problem is, as alluded to above, the CPU usage. Right now a full `diffgame` run takes about 90 minutes of CPU time on a rather beefy machine. Even with parallelization it's not a fast feedback cycle. (Makes me kind of miss being able to just casually run a sxs test on a thousand machines).

### Conclusion

I'm afraid this ended up longer than intended, despite only covering one design decision. It's also a design decision that I feel is overall a win. You'll have to wait for the next post for the embarrassing technical missteps.

## How buying a SSL certificate broke my entire email setup

### Everything looks ok to me, the error must be on your end!

Earlier this week I got a couple of emails from different people, both telling me that they'd just created a new user account, but hadn't yet received the validation email. In both cases my mail server logs showed that the destination SMTP server had rejected the incoming message due to a DNS error while trying to resolve the hostname part of the envelope sender. Since I knew very well that my DNS setup was working at the time, I was already ready to reply with something along the lines of "It's just some kind of a transient error somewhere, just wait a while and it'll sort itself out".

But I decided to check the outgoing mail queue just in case. It contained 2700 messages, going around 5 days back, most with error messages that looked at least a little bit DNS-related.

Oops.

Now, 5 days for this server is usually something like 50-60k outgoing email messages, so those 2700 queued messages represented a pretty decent chunk of traffic. The mail logs suggested that the errors had started weeks ago, around November 12th. And indeed while an `A` query for the hostname was working fine, a `MX` query returned no results.

### I didn't touch anything, it just broke!

I was completely sure that the `MX` setup used to work just fine. And I had not done any kind of DNS changes at all for at least a year. Any computer setup will rot eventually, but it shouldn't be happening quite this fast.

Wait... Was that date November 12th? That's when I bought a new SSL certificate through my registrar, who is also doing my DNS hosting. Hmm... And I even chose to use DNS-based domain ownership validation rather than the 'email a confirmation code to hostmaster@example.com' method, and allowed my registrar to automatically create and delete the temporary authentication record.

Ok, so technically I did make a change, even if it was just to authorize another system to make an automated DNS configuration change on my behalf. But clearly my registrar must have screwed up these automated config changes, and completely deleted the `MX` record along the way!

### That config looks valid, just apply it!

Well kind of, but not really.

When I logged into the DNS management UI yesterday, it turned out that the `MX` record was still there, but it was marked with a validation error complaining about a conflict with the `CNAME`. When I did the original configuration, I'd set up the relevant host with both `MX` and `CNAME` records. That is apparently not best practice and can cause problems with some mail servers. And who am I to argue with that, even if it had seemingly worked for the past year.

I changed the `CNAME` to an appropriate `A` record, the validation error was cleared, and as if by magic my queries were now working. 5 minutes later the outgoing mail queue was draining rapidly.

So clearly my service provider had added some helpful functionality to prevent bogus DNS setups from being exported. The old zone files would still be there working properly during the transition period, but the next time the user would make changes they'd need to fix the setup before it'd be exported.

That's a reasonable design, and I'm sure it would work marvelously most of the time. But in this case the zone file export was actually triggered by an automated process, so there was no way for me to notice that the configuration was now considered erroneous, or to fix it. The DNS host was serving a mostly functional zone; it was just missing one record, and even that one record only mattered for a fraction of my outgoing mail making the problem even harder to spot. (Just a fraction of my mail, since it looks like most mail servers either don't validate the sender domain, or validate it with some other kind of query).

There's a bit of guesswork involved in the last couple of paragraphs. The error can no longer be replicated, since management UI will no longer allow me to create a setup analogous to the original. So it's hard to be completely certain of the mechanisms on that management UI's side of the story. But I'm still fairly confident that this is at least a pretty close approximation of what happened. The timings of me buying the certificate and the start of a spike in DNS-related mail delivery errors match up way too well for any other explanation to be credible.

### Of course frobnicating the wibblerizer could break the dingit! Everyone knows that...

There's all kinds of morals one could draw from this story. Proper monitoring would have detected this immediately, the registrar should have accounted for this corner case, I should maybe not have the default assumption that the other party is at fault when something breaks, you should always check the results of any kind of automated config change when it's done for the first time, and probably many other excellent lessons in either life or systems engineering.

But really I'm just telling this story because I find the endpoints in the chain of causality completely hilarious. In a sensible world my action A really should not have led to the final result B, but it did. It's unfortunate that the title of this blog post ended up looking like linkbait of the worst kind, when it's actually the best 10 word summary I could think of :-)

## A brief history of Online Terra Mystica

### What's this Online Terra Mystica thing?

For the last couple of years my main hobby hacking project (over a thousand commits, and probably an order of magnitude more time spent on it than all other non-work projects combined) has been an asynchronous multiplayer web implementation of the brilliant board game Terra Mystica (Feuerland Spiele, 2012). At the moment it's roughly 2/3 Perl, 1/3 Javascript, and uses Postgres as the data storage.

It's been a fairly successful project for something that was originally intended as a one-off. The usage statistics at the end of November 2014 are:

• Almost 6000 registered users
• About 1200 monthly active users (as in playing at least one game; not passive use like looking at the statistics pages).
• 14000 moves executed on a normal weekday (10000 on weekends)
• 16500 games either ongoing or finished.
• Bi-monthly online TM tournament run by Daniel Åkerlund with 400+ players.
• 1038 commits as of this writing.

This was not supposed to be a general use program. It was originally a one night hack to help keep track of a hand-moderated play-by-forum game of TM, which was obviously headed for failure due to the massive amount of errors people were making while describing their moves in natural language or when manually tracking their resources in the game.

From there the project snowballed, slowly gathering features including just about everything I ever marked in the TODO as being 'out of scope'. Since I often had only very limited amounts of time to work on this, and my expectation was always that the interest in the site would soon fizzle out, the project management method was to always get the maximum short-term bang for the buck.

A project whose direction is literally guided by 'what can I get done in the next two hours' is of course massively path dependent; the early decisions made with very little consideration had outsized influence on where the site ended up. Sometimes the expedient gambles on 'do the simplest possible thing' failed, and the results were just rubbish. At other times things ended up at a slightly odd local maximum. And in some rare cases the gamble turned out to produce wonderful and unexpected results.

### Timeline

Future posts will discuss the actual lessons learned; what didn't work and what did work - both in the mechanics of programming and in the peculiarities of online boardgames. But in this one let's just have a look at the history of the site, how long it took for it to get features that one might consider absolutely necessary, and how amazingly bad user experience people are willing to put up with when it's the only way they can play their favorite game online.

Feel free to skip past the bulleted list if you get bored, it's still a bit long even if I include only changes I consider fairly major (indeed, a lot has to get filtered out given it's 1000+ commits).

2012

• December - Early January: The smallest program that did anything useful related to a game. I'd enter moves into a text file and run the script to produce the final game state as JSON. This JSON was rendered to HTML + Canvas by some Javascript code that was half ripped off from an old project. There was some minimal rules checking and automation, and support for only 5 out of the 14 factions in the game. Users of the current site might want to see the old look.
2013
• January: A rudimentary dynamic web site, implemented simply as a wrapper CGI script around the JSON generator script. After that a clumsy web-based editor was added for game files (a textarea that could be used to edit specific files in a git repository, no authentication except for each game having a random 160 bit identifier as part of the URL). This allowed other people to moderate their own games, as long as I created a game for them and sent the link with the secret embedded. Players would post / email a natural language description of their move to the moderator, who would then enter the moves into the admin tool using the correct syntax. Amazingly some 20 games were run using this insane system, while by all rights the project should have died there.

This version of the software had automation for resolving the effect most game events, but did very little validation to notice completely invalid moves.
• February: Added an ability to easily rewind the game state back to any time in history, to help with post-game strategy analysis. Also added a way for players to enter their own moves (a textarea in the main game view, a preview button and a save button, and some verification to make sure they could only enter their own moves). Again there was no real authentication here, just links with an embedded faction token derived from the per-game secret key.
• March: The hackiest email integration in the world: Store the email addresses players in the same text file with the commands. After a player has entered a move, the software would create a mailto: link with prefilled subject, content and receivers (the other players). The player would clicks on the mailto: link, the email loads up in their mailer (even GMail), and they'd press send.

Compute and display a VP projection on the last round assuming no further moves, to give players some idea of who is really winning.
• April: I continued to resist adding any user management or authentication. But my friend Gareth wanted a better way to manage his ongoing games than a spreadsheet, and wrote a small App Engine site into which players entered their secret game URLs. His site then used my site's API to figure out which games the player needed to act in. And it went even a bit further, by embedding the move entry UI into the same app.

Finally, this month also saw the addition of a statistics page on how often each faction was winning (since balance was a hot topic on the BGG forums of Terra Mystica right from the start), and soon after a list of achieved high scores for each faction and player count.
• May: This month mostly introduced all kinds of stricter validation, as the reduced barrier to entry for playing was causing significantly more illegal moves to be entered (early on players were enthusiasts of the game and thus had good knowledge of the rules; at this point people started to learn the game through the site, which was quite scary).

The main new feature of the month was the 'planner', an alternate text entry box which could be used to enter commands arbitrarily far into the future, and check that the moves are valid and what kind of effect they have. This is useful for example for checking that you have sufficient resources for making certain moves without manual computation. Another use is leaving 'notes to self', so that the player doesn't need to re-evaluate the board for every single move. (Some people were suddenly playing tens of games at a time, so this was a real problem).
• June-August: This time period saw only minor fixes and improvements from the user's point of view. There was a bit of infrastructure work behind the scenes, such as moving the actual game moves into the database, though they still remained just plaintext.
• October: The mini expansion for TM was released at the Spiel fair in Essen. I implemented the new features the very next morning in lobby of my hotel at Essen, with a ChromeBook, a ssh connection to the production server, and and the world's worst WiFi. After some reflection I decided not to make the change visible to the public before getting back home and a more reliable work environment :-)
• November: I finally made the site automatically send email notifications, rather than require players to jump through the fragile mailto: hoops to let other players know whose turn it is. Replacement of the mailto-style notification of moves also required the addition of an in-site chat feature for communication.
• December: Another consequence of the real email support from the previous month was that players no longer needed to expose their email addresses to other players. This finally made it possible to allow players to create 'public games' that anyone can join, rather than only play people with whom they've done some kind of an out-of band email address exchange. (At this point 1500+ games had been started, amazing how far such a kludgy system could go).

At the time 25-30% of moves were being entered from smartphones or tablets. But the move entry interface was typing commands like `'convert 2pw to 2c. upgrade d3 to tp'` into a text box. What's wrong with this picture? :-) In the month we finally got a slightly friendlier UI, though the textual command representation still remained the canonical one.

The site finally got a ranking system: a multi-iteration version of the ELO algorithm, which computed not only player strengths but also faction strengths, and credited good results with the weaker factions more than good results with the strong ones.

Finally, in very late December I went on a big refactoring spree to move the game from CGI scripts to a more persistent application server (FCGI with Plack and CGI::PSGI, but no framework). Eradicating all global data and all modification of literal data structures was way too much work, those were not corners worth cutting in the first place.

The new UI went live a year from starting the project (almost exactly; from December 22nd 2012 to December 21st 2013), and is the point where I'd consider the site to be actually usable by mere mortals.
2014
• February: Support for variant maps, for testing parts of the upcoming Terra Mystica expansion for the designers. I also added a map editor that could import map definitions from Lode's TM AI, which the design team had been using for the map. The online playtest team proceeded to play 100 games with different map versions before the expansion finally went to print.
• April: A bunch of work on the expansion, which was still being kept under wraps. So the support for the new final scoring types and four of the new factions was not visible to most users at this time.

The main user-visible change was automatically dropping players from games after a week of inactivity, to support the inaugural season of the online Terra Mystica tournament. People's irritation about others playing slowly had been constant ever since the addition of public games (95% of my games are private with a few separate groups of friends, so I'm pretty isolated from this myself). Unfortunately this change appears did not appear to help enough.

This month also saw the addition of individual profile pages, showing all kinds of statistics for each player (games started, finished, performance with given factions, performance and play counts against specific opponents, etc).
• September:The next attempt at reducing the anguish caused by slow players was to allow setting shorter move timers than the default one week (from 12 hours to 14 days). Lots of people started 12 hour deadline games, and moved on to complaining about so many people dropping out. Sometimes you just can't win.
• October:Public support for the two new expansion maps, as well as the new final scoring types.
• November:Public support for all six new factions from the expansion, as well as the variable turn order variant.

I find it interesting that it really did basically take a year of real time (and maybe 2 months of hacking time) before the implementation was in a shape where I would've thought about publishing it. And there's no way I'd put that amount of time into a project like this up front. Usually these projects are active for a couple of weekends before getting abandoned; fun parts are done but all the hard work of making it really usable remains.

In this case people were eager to use even the incredibly crude early versions, so I got over that hump very quickly. And at that point every incremental improvement to the site was affecting tens, hundreds, or thousands of people. This is of course always more motivating than working on polishing the perfect piece of software that nobody is using.

There were many architectural and design decisions done along the way that I ended up deeply regretting, and which cost me lots of time later on. But without all those early shortcuts there would've been no implementation at all. Easily the best example of Worse is Better that I've been personally involved with.

## HTTPS is taking over in mobile (over 35% of traffic in some networks)

### Summary

This is a post in two parts. The first part discusses the current state and historical trends of HTTPS adoption in traffic carried over mobile networks. The second part explains why this change might well be beneficial to mobile operators, for whom the transition from unencrypted to encrypted traffic appears potentially harmful.

### What's the traffic share of HTTPS?

Based on measurements from a number of mobile operators around the world, a typical 3G or LTE network currently carries around 20-25% HTTPS/SSL traffic. In some networks we've seen SSL traffic shares of over 35%. The transition has been very rapid, and given the the technological and social trends, it seems likely to continue. (For example just this week's announcment of Let's Encrypt, which removes any remaining barriers to entry, could matter a lot for the long tail of websites.)

When we did the first deployments of Teclo's product for mobile TCP optimization around 4 years ago, the traffic mix was consistently dominated by HTTP. At the time a typical split would be something like 95% HTTP, 2-3% SSL, and the remaining 2-3% would be split among other kinds of TCP traffic and UDP. (I have heard that operators in some geographical regions are seeing much higher levels of UDP traffic than that).

Over time, there was at best a very slow trend for SSL replacing HTTP, maybe a percentage point a year, such that 5% would have been a typical proportion of SSL traffic even 2 years later. But in 2013 something changed, and by the end of the year the average network was 10-15% SSL. The share has been increasing rapidly ever since. In the most extreme case I'm aware of, SSL traffic grew from X% to (X+10)% of the total volume in three months between us doing a trial in late 2013 and a final deployment in early 2014.

(Note: All the above numbers are specifically for downlink traffic. Uplink traffic is much more encryption-heavy. The SSL traffic share for uploads can easily be over 70%. But my experience is that in mobile networks downlink traffic volumes are 10x higher than uplink volumes.)

These numbers don't exactly line up with other reports I've seen. For example the numbers from Sandvine suggest much lower SSL shares, somewhere in the 6-8% range. My guess is that in that study some SSL traffic is being categorized as specific application traffic based on IP addresses. Since these things can vary a lot between countries and networks, I'd be very interested in hearing of any other public sources for this kind of information.

### HTTP handling in mobile networks

So what does the rise of SSL mean for mobile networks? Theoretically it could mean nothing; it's all just TCP in the end. But actually many operator networks contain amazing amounts of HTTP-specific or HTTP-aware nodes. Caching, legal intercept / request logging, DPI, video compression or pacing, image compression, ad insertion, header enrichment, and probably many other types that I'm not familiar with. The presence of all these nodes makes perfect sense in a world where 95% of the traffic is HTTP.

To a first approximation none of these boxes are going to be useful in an encrypted world. The arguments in favor of HTTPS (example) concentrate mostly on why encrypted traffic is a win for both the end user and the content provider. You will after all not find many users who are happy at being tracked through header enrichment, or content providers who like having ads (or other content) inserted into their websites by a middlebox.

Now, of course from an operator's point of view most of these network elements either made business sense at some point (for example reducing infrastructure costs through caching and compression), or they are legally required. One could therefore expect that the effective obsoletion of these boxes would be harmful to the operator. And that might be true, I certainly don't know enough to estimate how important each of these devices is to the bottom line.

But I do know a bit about TCP performance problems in mobile networks. And at least in that area I believe that many operators should be very happy to see a move away from HTTP. Because in an astoundingly high percentage of networks, the single network element that's most harmful to network performance is some kind of HTTP proxy. Below is a list of some of the problems introduced by HTTP-handling core network nodes that we've seen, many in multiple networks:

• No support for TCP window scaling on one or both of directions of traffic.
• Retransmits of tens of kilobytes of data instead of one segment on a retransmit timout (which commonly happen in radio networks). 1.5% of the radio capacity of this network was wasted on the spurious retransmits from this node.
• Using New Reno as the congestion control algorithm in networks with high levels of random packet loss.
• An initial TCP congestion window of 2, with no configuration options to increase it.
• No MTU clamping on the mobile side of the proxy, resulting in an effective MSS higher than 1380 bytes. This causes the GTP tunneling on the Gn interface to split each full size TCP packet into multiple UDP packets, which can cause up to 5% waste of backhaul capacity.
• Mismatching MTU sizes on the mobile and internet facing interfaces, causing a high proportion of tiny packets mixed in with maximum size packets, due to the packet sizes on the two interfaces not being multiples of each other. In the best case this introduces just a small amount of header overhead. In the worst case it introduces massive amounts of reordering.
• Mysterious delays of up to half a second between GET request being received by the terminating proxy, and being forwarded to the server. (Possibly due to a name resolution step that a well-done transparent proxy shold not need in the first place).
• Stricter or buggy HTTP header parsing in middlebox, causing a request that would be accepted by both client and server to be rejected.

Some of these problems would have been absolutely crippling to network performance, with connections bypassing the middlebox being twice as fast as those going through it. Others would just have been wasting resources but in a way that'd be very hard to track down.

## TCP is harder than it looks

In theory it's quite easy to write software that deals with TCP at the packet level, whether it's a full TCP stack or just a packet handling application that needs to be TCP-aware. The main RFCs aren't horribly long or complicated, and specify things in great detail. And while new extensions appear regularly, the option negotiation during the initial handshake ensures that you can pick and choose which extensions to support. (Of course some of the extensions are in effect mandatory -- a TCP stack without support for selective acknowledgements or window scaling will be pretty miserable.)

If all you want to do is to e.g. load the frontpage of Google, writing a TCP stack can be a one afternoon hack. And even with a larger scope, if all you need to do is to connect to arbitrary machines running vanilla Linux, BSD, or Windows on the same internal network, you can fairly quickly validate both interoperability and performance.

But since TCP looks so easy to implement, there are a lot of implementations around. Some full TCP stacks, some TCP mangling middleboxes, and some that are simply trying to track the state of TCP connections such as firewalls.

At work our TCP stack doesn't just need to interoperate with just a limited number of top operating systems. We handle hundreds of terabytes of traffic every day, with a traffic mix that's not really under our control. In practice it's completely arbitrary traffic, dealing with any device that could possibly get connected to a cellular network or any device that might have a public IP. Under those circumstances you basically have to be bug-compatible with everything.

There's some well established folklore about which areas tend to be buggy in other systems, and that you thus need to be particularly careful with. For example TCP option ordering and alignment is a common source of such problems, to the extent that at some point you might as well just use the same exact option placement as Linux or Windows, on the assumption that even the sloppiest firewall vendor will have tested at least against those systems!

Zero windows are another frequent source of grief, to such lengths that at multiple mobile operators the technical staff have quizzed us extensively on our use of zero windows. I don't quite know why zero windows have that reputation, but we have definitely seen that class of problems in the wild occasionally (for example the FreeBSD problem from a few years back was very annoying).

But here's a new one we saw recently, which was good for an afternoon of puzzling and is a case that I hadn't heard any scary stories about. A customer reported failures for a certain website when using our TCP implementation, but a success when using a standard one. Not consistently though, there were multiple different failure / succcess cases.

Sometimes we were seeing connections hanging right after the handshake; the SYNACK would have no options at all set (a big red flag), advertised a zero window, and the server would never reply to any zero window probes or otherwise open any window space:

```19:53:40.384444 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [S], seq 2054608140, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
19:53:40.779236 IP x.x.x.x.443 > 10.0.1.110.34098: Flags [S.], seq 3403190647, ack 2054608141, win 0, length 0
19:53:40.885177 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [S], seq 2054608140, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
19:53:41.189576 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [.], ack 1, win 29200, length 0
19:53:41.189576 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [.], ack 1, win 29200, length 0
19:53:42.189892 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [.], ack 1, win 64000, length 0
19:53:43.391186 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [.], ack 1, win 64000, length 0
19:53:44.832112 IP 10.0.1.110.34098 > x.x.x.x.443: Flags [.], ack 1, win 64000, length 0
```
Other times the SYNACK would be a lot more reasonable looking, and the connection would work fine:
```19:29:16.457114 IP 10.0.1.110.33842 > x.x.x.x.443: Flags [S], seq 1336309505, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
19:29:17.264497 IP x.x.x.x.443 > 10.0.1.110.33842: Flags [S.], seq 2619514903, ack 1336309506, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 6], length 0
19:29:17.264556 IP 10.0.1.110.33842 > x.x.x.x.443: Flags [.], ack 1, win 229, length 0
19:29:17.265665 IP 10.0.1.110.33842 > x.x.x.x.443: Flags [P.], seq 1:305, ack 1, win 229, length 304
19:29:18.059278 IP x.x.x.x.443 > 10.0.1.110.33842: Flags [.], ack 305, win 995, length 0
19:29:18.087425 IP x.x.x.x.443 > 10.0.1.110.33842: Flags [.], seq 1:1461, ack 305, win 1000, length 1460
```
And there were also occasions where we'd get back two SYNACKs with different sequence numbers, which of course didn't always work too well:
```19:37:41.677890 IP 10.0.1.110.33933 > x.x.x.x.443: Flags [S], seq 2689636737, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
19:37:41.877046 IP 10.0.1.110.33933 > x.x.x.x.443: Flags [S], seq 2689636737, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
19:37:42.076611 IP x.x.x.x.443 > 10.0.1.110.33933: Flags [S.], seq 3107565270, ack 2689636738, win 0, length 0
19:37:42.275471 IP x.x.x.x.443 > 10.0.1.110.33933: Flags [S.], seq 3109157454, ack 2689636738, win 0, length 0
```

You might be able to guess the problem just from the above traces, but actually verifying it required quite a few attempts with slightly tweaked parameters to find the boundary conditions. Who knew that there are systems around that can't handle receiving a duplicate SYN? The three different behaviors seem to correspond with no SYN being retransmitted, the retransmission arriving to the middlebox before it emits a SYNACK, and the retransmission arriving after the middlebox has emitted a SYNACK.

The middlebox was located in Australia, but most likely that IP was just a loadbalancer, transparent reverse proxy, or some similar form of traffic redirection with a real final destination somewhere in the US. When being accessed from Europe, this resulted in an aggregate RTT of something like 450-550ms. Our TCP implementation has a variable base SYN retransmit timout, and in this case it was roughly 500ms. So most of the time the page load would fail with our TCP stack, but succeed with an off the shelf one that had a SYN retransmit timeout of 1 second.

(I said above that I had not heard any scary stories on this, which of course does not mean those scary stories don't exist. After figuring out the root cause, it was easy enough to find more reports of connection breakage due to SYN retransmits, for example this one involving satellites).

It's easy to see how the developers of that unidentified piece of traffic redirecting kit missed a bug in this bit of functionality. Outside of 2G cellular connections, satellites communications, or networks suffering from extreme buffer bloat, it's rare to see RTTs that are long enough to trigger a SYN retransmit. Heck, in this case we were most likely talking of the packets going round the world the long way around.

But from my point of view this is a particularly annoying bug. It is by definition triggered before we have any information at all on the other end of the connection, so there's no possible heuristic we could use to conditionally disable the offending feature just for hosts that are at risk. The options are either to tell a customer that some traffic won't work (which is normally unacceptable, even if the root cause is undeniably on the other end), or to water down a useful feature a little bit to at least fail no more often than the "competition" does.

And it's the slowly aggregating pile of cases like this that makes dealing with TCP hard in practice, no matter how simple it looks to start with.

## Comparison of Intel and CloudFlare zlib patches

These days I care a bit about zlib deflate speed for work. Last week CloudFlare made some noise about having 'rewritten gzip' for massive performance improvements. Which was perhaps a bit exaggerated, since it turns out to mostly be vectorization changes similar to those included in the earlier zlib patches from Intel.

The changes from Intel are somewhat more extensive, since they include two completely new deflate strategies ('quick' for use at level 1, 'medium' for use at levels 4-6). There's a higher risk of bugs with such changes than with 'just' vectorizing existing scalar code. But those new strategies can be disabled at compile time, so even if that's a worry it would've been nice if the latter changes had been built on top of the earlier ones. (The most likely reason is probably not knowing about it, but then it becomes a case of a single web search saving a few days of programming. Intel's changes were pretty well publicized last year and have been going through the zlib mailing list).

Anyway, since there are now two separate zlib performance forks, I thought it'd be useful to compare them a bit.

The summary is that CloudFlare's fork is much faster with decompression and a little faster with high compression levels, while Intel's is faster with low and medium compression levels.

It seems likely that one could get the best of both worlds. At least a superficial analysis suggests that many of the changes don't conflict in principle (even if they often conflict in practice, due to messing with the same areas in the code). For example the decompression speedups in CloudFlare's version appear to come mostly from a SSE-optimized CRC32 implementation. Intel's version also includes similar optimized CRC32 code, but as a separate entry point used only for compression (doing a combined memory copy and checksum).

As for production use, the new compression strategies in Intel's version had some early bugs. They are fixed now, and at least in our internal testing we haven't seen any new problems. So we'll probably switch to it in our next release. CloudFlare's changes are more conservative in that sense. Unfortunately there's a license time-bomb in that version, since it now includes a GPLv2 source file. That matters to me, but of course won't be an issue for some.

### Details

The test script checks out and compiles the specified zlib versions, compiles them, and then uses the generated `minigzip` binaries to compress and decompress a few test files, possibly at different compression levels. When testing decompression, all zlib versions are tested using the same compressed file, generated using the baseline version at the default compression level.

Each benchmark run (of 10 compressions or 200 decompressions) was repeated 5 times, with different zlib versions being interleaved to eliminate any systematic biases (e.g. system load or thermal throttling). The numbers reported in the following tables are the mean and the standard error of the execution times.

The test corpus included English text in HTML format (The Count of Monte Crisco), a x86-64 executable, a basically uncompressable jpeg file, and the pixel data from a RGB image after png compression filtering had been applied.

The benchmarks were compiled using gcc 4.8.2 and run on a i7-4770, using the following command line.

```CFLAGS='-msse4.2 -mpclmul -O3' perl bench.pl --output-format=json --output-file=results.json --compress-levels=1,3,5,7,9 --compress-iters=10 --decompress-iters=200 --recompile
```

(Note the CFLAGS if you want to rerun the test. At least on my machine the configure script in CloudFlare's version doesn't autodetect SSE 4.2 / PCLMULQDQ support without those flags.)

### Results

#### Compression level 1

This is essentially a comparison between the original 'fast' deflate strategy and the new 'quick' one. Which is to say, these results aren't very comparable at all. Basically the Intel version adds a completely new point to the performance vs. compression ratio curve (trading a fairly big amount of compression for an even bigger speedup). It seems like a worthwhile option to have.

 baseline cloudflare intel compress executable -1 (10 iterations) Compression ratio 0.37 0.37 0.46 Execution time 0.84s±0.00 (100%) 0.59s±0.00 (70%) 0.30s±0.01 (36%) compress html -1 (10 iterations) Compression ratio 0.39 0.39 0.54 Execution time 0.44s±0.00 (100%) 0.33s±0.00 (73%) 0.22s±0.02 (49%) compress jpeg -1 (10 iterations) Compression ratio 1.00 1.00 1.05 Execution time 0.68s±0.01 (100%) 0.58s±0.00 (85%) 0.25s±0.00 (36%) compress pngpixels -1 (10 iterations) Compression ratio 0.17 0.17 0.23 Execution time 0.51s±0.00 (100%) 0.34s±0.00 (66%) 0.18s±0.00 (35%)

#### Compression level 3

This is very close to a like-for-like comparison, since all library versions are using the same strategy ('fast') with the same parameters. The results aren't identical, but the achieved compression ratios are essentially the same. Intel's version is marginally faster.

 baseline cloudflare intel compress executable -3 (10 iterations) Compression ratio 0.35 0.35 0.36 Execution time 1.21s±0.01 (100%) 0.79s±0.00 (64%) 0.77s±0.01 (63%) compress html -3 (10 iterations) Compression ratio 0.36 0.36 0.35 Execution time 0.70s±0.00 (100%) 0.54s±0.00 (76%) 0.46s±0.00 (66%) compress jpeg -3 (10 iterations) Compression ratio 1.00 1.00 1.00 Execution time 0.68s±0.00 (100%) 0.59s±0.00 (86%) 0.60s±0.00 (87%) compress pngpixels -3 (10 iterations) Compression ratio 0.15 0.15 0.16 Execution time 0.95s±0.00 (100%) 0.61s±0.00 (63%) 0.48s±0.00 (50%)

#### Compression level 5

This is a comparison of the new 'medium' strategy to the old 'slow'. The compression ratios are essentially the same. Intel's version is significantly faster on the compressible data, slower on the uncompressable data.

 baseline cloudflare intel compress executable -5 (10 iterations) Compression ratio 0.33 0.33 0.34 Execution time 1.76s±0.01 (100%) 1.20s±0.00 (68%) 0.99s±0.01 (56%) compress html -5 (10 iterations) Compression ratio 0.34 0.34 0.33 Execution time 1.08s±0.00 (100%) 0.85s±0.00 (79%) 0.57s±0.01 (52%) compress jpeg -5 (10 iterations) Compression ratio 1.00 1.00 1.00 Execution time 0.69s±0.00 (100%) 0.61s±0.00 (88%) 0.75s±0.00 (108%) compress pngpixels -5 (10 iterations) Compression ratio 0.14 0.14 0.14 Execution time 1.35s±0.01 (100%) 0.85s±0.00 (62%) 0.64s±0.00 (47%)

#### Compression level 7

Another like-for-like comparison, this time with the 'slow' strategy. Both optimized versions are noticeably faster than the original, but CloudFlare's version is marginally faster.

 baseline cloudflare intel compress executable -7 (10 iterations) Compression ratio 0.33 0.33 0.33 Execution time 3.90s±0.01 (100%) 2.62s±0.01 (67%) 2.73s±0.01 (69%) compress html -7 (10 iterations) Compression ratio 0.33 0.33 0.33 Execution time 1.89s±0.01 (100%) 1.58s±0.01 (83%) 1.66s±0.01 (87%) compress jpeg -7 (10 iterations) Compression ratio 1.00 1.00 1.00 Execution time 0.69s±0.00 (100%) 0.61s±0.00 (88%) 0.61s±0.01 (88%) compress pngpixels -7 (10 iterations) Compression ratio 0.13 0.13 0.13 Execution time 4.09s±0.01 (100%) 2.74s±0.00 (67%) 3.17s±0.01 (77%)

#### Compression level 9

Basically the same as level 7.

 baseline cloudflare intel compress executable -9 (10 iterations) Compression ratio 0.33 0.33 0.33 Execution time 9.88s±0.01 (100%) 7.49s±0.01 (75%) 7.72s±0.01 (78%) compress html -9 (10 iterations) Compression ratio 0.33 0.33 0.33 Execution time 2.94s±0.01 (100%) 2.55s±0.00 (86%) 2.63s±0.01 (89%) compress jpeg -9 (10 iterations) Compression ratio 1.00 1.00 1.00 Execution time 0.69s±0.00 (100%) 0.61s±0.01 (89%) 0.60s±0.00 (87%) compress pngpixels -9 (10 iterations) Compression ratio 0.12 0.12 0.12 Execution time 27.42s±0.03 (100%) 20.19s±0.04 (73%) 22.06s±0.02 (80%)

#### Decompression

CloudFlare's version decompresses a lot faster than either of the other two.

 baseline cloudflare intel decompress executable (200 iterations) Execution time 5.48s±0.01 (100%) 4.71s±0.02 (85%) 5.51s±0.02 (100%) decompress html (200 iterations) Execution time 3.43s±0.04 (100%) 3.01s±0.07 (87%) 3.42s±0.04 (99%) decompress jpeg (200 iterations) Execution time 2.43s±0.12 (100%) 1.39s±0.25 (57%) 2.48s±0.15 (102%) decompress pngpixels (200 iterations) Execution time 3.68s±0.03 (100%) 2.97s±0.03 (80%) 3.70s±0.03 (100%)

## Summary

An awesome, somewhat addictive, fairly dark and finished free novel about people with superpowers, published as a very long web serial. I recommend this to anyone who generally likes SF, and either has plenty of free time or sufficient self control to only read a few chapters a day.

## Longer review

Worm by Wildbow (John McCrae) is a superhero web serial that Christophe pointed me at. He sold it to me roughly as "it's about superheroes who use their powers in a mostly intelligent and often surprising manner". Which was vague enough to spoil nothing and specific enough to make me have a look.

I came to work the following day extremely blurry-eyed, having read until the morning and only having time for a a couple of hours of sleep before I had to be back at work. And that was kind of the pattern for the next couple of weeks, because this thing is long, and sufficiently compelling that most of my free time was spent reading it. It's all the more impressive for being written by one guy, part time, updating 2-3 times a week without fail, over a couple of years.

There's a couple of separate bits to Worm that I find interesting. First, there's the question of the work itself. And then there's the publishing / business model. I'll discuss these in that order.

### The Work

The world building is absolutely top notch. Incredibly imaginative, and mostly internally consistent. I like the characters (even when I don't like the persona, I like the characterization). A decent comparison would be the early Vorkosigan books by Bujold. I love the way the protagonist's angle of figuring out creative ways to abuse her superpower, and her continuing annoyance at other superheros not doing it. It's got a bit of Harry Potter and the Methods of Rationality feel to it (but doesn't suffer from the many weaknesses that HP:MoR has as a story rather than as a rationalist screed).

McCrae is very skilled at setting up cliffhangers. That's probably a requirement for the web serial format where the most important part is to make sure people come back regularly for their next fix. But it's really dangerous for a completed work. It never feels like there's a satisfying place to stop :-)

As a particular rarity, I love the ending. So many epic scifi or fantasy tomes either end up with a whimper, or end up with supposedly cosmic nonsense that I just can't make any sense of. Worm avoids this completely. It builds up to an awesome ending sequence, which is clearly foreshadowed right from the beginning but that I never would've guessed would actually happen. And then it even resolves everything in a very satisfying and fitting way. I kind of wish the epilogues hadn't been written, or at least that I hadn't read them. But I will say no more about that, and understand why the epilogues exist and why some (most?) people would like them, it's purely a matter of taste.

There are some bad parts as well.

The book has not been professionally edited. I have no major complaints about the language, though there are some annoying verbal ticks, and I'm not a huge fan of the longer dialogues. As a more serious matter many parts of the story could be tightened significantly. One friend commented that during particularly combat heavy sequences he'd start just reading the first sentence of each paragraph, because the action felt like it was moving at one second per 5 paragraphs. I don't agree entirely, but some sections were a bit more of a slog. It feels like a ruthless editor could do the book a big service by cutting away the right 20%.

It's also a very dark and story. Can you imagine something bad happening to someone? There's a pretty good chance it'll happen to a major character, or happened as part of their backstory. Good things happen, but much less often. It's not casual cruelty, nor any kind of misery porn (unlike e.g. Song of Ice and Fire seemed to devolve to). But it is a bleak and potentially quite depressing world. Definitely wouldn't let kids read this.

But anyway, overall I really like it, and recommend it with the above caveats.

There's been a lot of talk about what's going to happen to the publishing industry in the future (see for example the related posts on the blog of Charlie Stross, both by him and guest bloggers). And there have been all kinds of experiments in that space. Worm is at the extreme end of those experiments.

This is the first web serial I've ever come across that felt like an actual business rather than hobbyist fanfiction. (I gather from some interviews that it wasn't set up as such, but just a way for the author to force themselves to stick to one project). It maybe should not be a surprise that the model exists. A bunch of people appear to be able to make a living from long running webcomics (geez, have I really been reading Sluggy Freelance for almost half my life?). Why wouldn't it be possible for a novelist to do the same?

One major difference in the business model is that Worm ran entirely on donations, not on ads. This is quite admirable. Another difference is that from the outside it feels like the amount of continuous work going into Worm must have been much higher than for the typical "3 panes, 6 times a week" webcomic. And finally, while I don't know the full financials of the operation, part of the donations are done over the recurring donation system Patreon, and those donations are public. Currently it's at a recurring \$1200 per month. Maybe people are uncomfortable with that method (I certainly was, and used PayPal instead). But at least based on the public information, the work/donations combination doesn't really look sustainable, which is a real shame.

But then again, authors being published for the first time through traditional channels are unlikely to make a living just by that either. And while it seems likely that the ARPU of a web serial must be pretty low, it must be way easier to reach new readers than in traditional publishing. I buy very little fiction from new authors these days. There's just no point since I already have a big backlog of books from authors I'm a fan of. From my point of view a publisher's brand gives absolutely no marketing benefit, or even credibility, to a new author. The only way I'll find new authors is through positive word of mouth, and a web serial is clearly much more viral than a book, whether physical or e-book, as long as the author can hook the reader within the first 10 minutes.

This feels like the future. It might be a depressing future, but then again that's what every kind of business feels like these days.

McCrae has already started a new serial Pact.