In the previous post in this series, I promised to discuss in detail some of the positive and negative consequences of the less conventional design choices of my online Terra Mystica implementation. If you have no idea of what that is, reading at least the intro of that post might be a good idea. This post will just deal with one design choice, but it's the elephant in the room: the command language.

The canonical internal representation of a game in my TM implementation is a sequence of rows, each describing some number of player actions specified in an ad hoc mini language, or administrative commands that change the game setup in some way (for example setting game options, or dropping a player from the game partway through). This is what it might look like:

yetis: action ACT4
cultists: upgrade E6 to TE
cultists: +FAV6
giants: Leech 3 from cultists
giants: pass BON4
yetis: Leech 2 from cultists
cultists: +WATER
dragonlords: Decline 2 from cultists
dragonlords: dig 1. build G6
yetis: send p to EARTH
cultists: action FAV6. +AIR
dragonlords: pass BON7
yetis: upgrade E7 to TE. +FAV11
giants: Leech 3 from yetis
dragonlords: Leech 2 from yetis
cultists: Leech 2 from yetis

That's a short excerpt from the middle of a random game. A full game generally runs for about 400 rows.

What do I mean by this being the canonical internal representation? Only a few parts of the game state are actually persisted separately in the DB; these are things that might almost qualify as metadata, such as whose turn it is to move, whether the game is still running, and what the final rankings of a finished game were. But in general the only way to find out the current state of the game is to evaluate the whole sequence of commands from start to finish. This is in fact done for almost every operation on the site (viewing a game, previewing a move, saving a move, viewing or editing the game in admin mode, and so on).
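To make the replay idea concrete, here is a minimal Python sketch. It is not the site's actual code; the `Row`, `parse_row` and `replay` names are made up for illustration, it assumes every row has a `faction:` prefix, and the "state" is just a list of each faction's command verbs rather than a real evaluation of the rules.

```python
from dataclasses import dataclass


@dataclass
class Row:
    faction: str
    commands: list


def parse_row(line):
    """Split 'faction: cmd. cmd' into a faction name and a list of commands."""
    faction, _, rest = line.partition(":")
    commands = [part.strip() for part in rest.split(".") if part.strip()]
    return Row(faction.strip(), commands)


def replay(lines):
    """Rebuild the (toy) state from scratch by evaluating every row in order."""
    state = {}
    for line in lines:
        row = parse_row(line)
        for command in row.commands:
            # Placeholder for real rule evaluation: record the command's first word.
            state.setdefault(row.faction, []).append(command.split()[0])
    return state


log = [
    "dragonlords: dig 1. build G6",
    "yetis: send p to EARTH",
    "yetis: upgrade E7 to TE. +FAV11",
]
print(replay(log))
# {'dragonlords': ['dig', 'build'], 'yetis': ['send', 'upgrade', '+FAV11']}
```

The point of the sketch is only that nothing but the log needs to be stored: calling `replay` on the same rows always yields the same state.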

In addition to being the canonical internal representation, the command language is also the canonical user interface; the fundamental operation players do is enter new rows into the command sequence. Often this is done by writing the commands manually, though there are GUI shortcuts of one form or another available for almost all operations.

This might sound like a slightly insane way of doing things, but it does have some benefits as well. I've made several digital board game adaptations of varying levels of completeness over the years, used dozens of others, and this solution hits the closest to my personal sweet spot.

A taxonomical diversion

Before discussing the fallout of this design decision in more detail, it's probably useful to do a quick tour of some of the main axes in the design space. (I'm of course just describing the extremes, while in the real world most examples would fall on a continuum).

First, there's the question of the interaction model which might be abstract or skeuomorphic. In a skeuomorphic design the player doing input on a computer would still be mimicking the actions of someone playing the game with physical pieces and no computer assistance.

In an abstract design the player would only input the parts of the move that are necessary to uniquely distinguish it from other possible moves, with any bookkeeping and mandatory intermediate steps being carried out automatically. Likewise in a skeuomorphic design the software provides information through the same methods as the original physical game, while an abstract design will automate some of the mechanical parsing of the game state. Or even just the question of using the graphical assets of the original game, generally optimized for sales, versus using digital-first assets optimized for clarity.

As an example of this axis, in the 18xx series of games a substantial amount of playtime is spent computing the exact routes of a number of trains on a complex rail network. I'm aware of three solutions that are actually in use, and there is a fourth plausible one, in order from least to most abstract:

  • The user manually decides on the routes, computes their values with no computer assistance, and those values are used with no validation. Examples: ps18xx, early versions of Rails.
  • The user enters valid routes through a user interface. The software computes the values of the routes, and distributes the income from the company appropriately. Example: rr18xx.
  • In games with requirements that all routes must be optimal, the software could compute an optimal route but only for the purpose of rejecting any manually computed suboptimal ones. Examples: None. (Though it's similar to what's done in the SlothNinja implementation of Indonesia, a game that probably counts as an honorary 18xx)
  • The software automatically finds an optimal set of routes and computes their values. Examples: The ancient DOS-based 1830 from Simtex, recent versions of Rails.

My own tastes run toward maximum abstraction; I've rarely if ever seen a digital boardgame conversion that needed to be more skeuomorphic. But this is not a universal view. There are definitely people who will refuse to play a conversion that does not use the same graphics as the physical version. Others will strenuously argue against automatic finding of optimal routes in 18xx, on the basis that evaluating routes is a core skill in the game when making decisions about route building, and that this skill can only be acquired by getting sufficient practice in manual route computation.

A second axis is the internal representation, which could be based on either log replay or stored state. In a log replay system the game is stored as a series of steps from the starting setup to the current state. In a stored state system the game is stored as the current values of all pieces of the game. How much money does every player have, which round is it right now, what's in this exact space on the map, and so on.

A third axis is the input model. Moves could be entered either through direct or indirect manipulation. In a system using direct manipulation, the player would for example see a graphical display of a map and be able to click or drag on a unit to enter a move for it. In an indirect system the player observes the game state in one place, and enters their moves using some completely unrelated system.

I think most digital boardgames use a direct input model, but there are also a fair number that have a menu-driven system of some sort. The only examples I know of that go a bit further with indirection by providing a command language are my ancient Paths of Glory mapper and the even older Diplomacy PBEM judges. If you have other examples, I'd love to hear of them.

Direct manipulation is often, but not always, linked to excessive skeuomorphism in the interaction model. For example I find it almost painful to play most Vassal modules, with their hyper-direct interaction model of dragging and dropping counters around, manually drawing cards from a deck or rolling dice. Digital boardgames are not the same medium as physical boardgames, and should play to their unique strengths. But these are in fact orthogonal concerns, and there's no reason why a direct manipulation model couldn't also provide useful input and computational abstractions.

Whew, so much for the theory. In this taxonomy Online Terra Mystica is pretty far toward the abstract end, and is fully in the log replay camp. While it has a half-hearted attempt at adding some direct manipulation concepts to the UI, it started off as an indirect system and deep inside that's what it is. It also chooses to merge the input format and the log format into one entity. So what does this mean?

Feature set

Perhaps the signature feature of the site is the planner. This tool allows the player to enter an arbitrarily long sequence of actions - all the way to the end of the game - and see what the effects would be. Are all the moves valid? Are there sufficient resources available to do all of this? Oh, I don't have enough resources? Well what if I do this on round 5, and delay that action to round 6. In cases where the plan fundamentally depends on the opponents doing something, it's possible for the plan to also contain arbitrary resource adjustments. And finally, since the command language supports comments, these plans can be properly documented so that when you return to them in a day or two, you can remember why you wanted to do these particular moves.

I think this feature is intrinsically linked to the command language as a user interface, and it might actually be unique. There are some games with other kinds of interfaces that allow you to play the game forward, and then undo / rewind / reload. But simply being able to play the game forward is not sufficient to make this a useful tool. It's only the ease of inserting, reordering and deleting moves that makes it possible to use this as a matter of course, rather than only under the most exceptional circumstances.

A somewhat related feature is undo. Inflexibility in allowing moves to be taken back is the bane of many forms of digital boardgames. When playing a game face to face, most groups will generally allow at least some level of taking back moves. In some cases all moves are final immediately (this has always been the primary problem of the otherwise brilliant implementation of Brass at Order of the Hammer). In some other implementations there are distinct checkpoints, for example BGO's Through the Ages allows undoing back to the start of your full turn, but no other rollbacks (clicking 'finish turn' is final, as is any kind of action during an auction or war resolution). These two are, I believe, examples of undo being limited for design reasons. At rr18xx meanwhile rollbacks are possible until the previous action of each player. Here my understanding is that the overriding issue is technical, as the rollback is essentially a full restore to a previous database snapshot, and there are resource constraints on how many snapshots can be kept.

The solution Online TM takes to this is to grant the creator of the game arbitrary powers to edit the history at will: the admin mode. They can do more than undo the last move or couple of moves. If a mistake was made three moves back, they can go and fix it (and they can fix it without forcing the intervening moves to be redone). This feature is fully tied to a log replay mode of operation. While more limited forms of undoing could be implemented as a reverse log replay from the end state or through state snapshots, this more complete form depends on the log being directly editable. And realistically the log also needs to be the input format; it would not be reasonable to expect the admin to be able to edit a more formal log representation correctly (whether the log format is XML, protocol buffers, JSON, or something else). But in the case where the log format and the move input system match, just playing the game has taught the game admin the necessary skills.
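A toy illustration of why an editable log makes this kind of fix cheap. The `gain N` / `spend N` commands and the starting balance are invented for the example and have nothing to do with the real command language; the idea is just that an old row is edited in place and the whole log is replayed, which also re-validates every later row.

```python
def replay(log, start_coins=5):
    """Replay hypothetical 'gain N' / 'spend N' rows; raise on an invalid row."""
    coins = start_coins
    for number, command in enumerate(log, start=1):
        verb, amount = command.split()
        amount = int(amount)
        if verb == "gain":
            coins += amount
        elif verb == "spend":
            if amount > coins:
                raise ValueError(f"row {number}: cannot spend {amount}, only {coins} left")
            coins -= amount
        else:
            raise ValueError(f"row {number}: unknown command {command!r}")
    return coins


log = ["spend 3", "gain 2", "spend 1", "gain 4"]

fixed = list(log)
fixed[0] = "spend 2"  # correct a mistake made several rows back

print(replay(log))    # 7
print(replay(fixed))  # 8, with rows 2-4 left exactly as they were entered
```

If the edit had made a later row illegal, the replay would raise at that row instead of silently producing a broken state.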

This is a very nice feature for friendly games. It does have downsides though, more on that later in the section on the social implications.

There's also a potential, as yet unimplemented, feature of pre-programmed actions that people frequently ask for. "I know exactly what I want to do next turn, why can't I just pre-enter my move". This would be a pretty interesting thing for speeding up games, but to my mind would not be conducive to good play. Circumstances change, often in ways you did not anticipate at all. The only way this could be even remotely usable would be if the language was extended to have some kind of conditional execution. And that's a can of worms I'm not interested in opening, and I suspect also a bridge too far for 99% of my users.

It's worth noting that many of the above features are closely tied to a game with no randomness (or at most setup randomness) and no hidden information. As such their existence is something of an anti-feature, preventing other additions to the game.

For a non-hypothetical example, I'm currently thinking about how to implement the faction auction variant from the TM expansion. A full open auction in the beginning would be painfully slow. The most obvious, though still slightly imperfect, solution is a series of blind second price auctions. But this is not a good fit for the site's existing design. The problem is that the blind bid introduces momentary hidden information into the game, and it's possible for that information to leak through either the preview or admin modes. For example the admin could wait for everyone else to bid, peek into the log and see everyone else's bids, and then bid in such a way as to force the winner to pay the maximum amount.
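For readers unfamiliar with second price auctions, here is a small sketch of the mechanism and of the leak described above. The player names and bid values are made up, and the tie-breaking rule (earliest entry wins) is an assumption of the sketch, not something taken from the expansion rules.

```python
def second_price_winner(bids):
    """Return (winner, price): the highest bidder pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # sorted() is stable, so among equal bids the earliest entry ranks first.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


honest = {"alice": 7, "bob": 4, "carol": 2}
print(second_price_winner(honest))  # ('alice', 4)

# If carol can read the log before bidding, she can bid just below alice
# and push the price alice pays up to almost the maximum.
leaked = {"alice": 7, "bob": 4, "carol": 6}
print(second_price_winner(leaked))  # ('alice', 6)
```

The mechanism only works as intended if no bid is visible before all of them are in, which is exactly what a fully readable log cannot guarantee.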

UX

The most obvious UX consequence of using a command language is that it tends to be harder to learn. The following quote, said partly in jest, certainly contains a kernel of truth:

... has done a bang-up job providing a PBEM Terra Mystica experience that includes just enough extra layers of complexity via the interface and game administration tools to keep TM as confusing as ever, long after you master the actual game!

Non-natural languages are simply not a mode of human-computer interaction that most people are comfortable with in this day and age. It actually continues to amaze me that I could get non-programmers to play using this implementation at all. Is it possible to evaluate how big a hurdle this has been for people? The best number I can come up with is that around 20% of the players who joined at least one game never finished even one game without dropping out. Note that these are players who have already jumped through hoops such as email validation during account registration. It's possible that there's some other issue besides the UI that's a problem for these players, but it does seem like the most likely candidate.

A smaller problem is that it essentially forces the introduction of a move preview. For those who haven't played the game, when entering moves you need to first enter the moves, then click 'preview', check that the results match what you want, and finally click 'save' to commit the moves. In a game that uses a direct manipulation paradigm, a preview could be skipped. But with a more obscure UI like here, it's absolutely essential since the move might not have had the intended effect. Whether it's doing the entirely wrong move, picking the wrong tile, building on the wrong location, etc. Even with a preview step somebody will request a rollback on average once or twice a game.

So why do I call this a problem? Because despite my best efforts, especially new players will frequently forget to 'save', leaving the game in a limbo state where they think they've done their move, until some other player gets impatient. (To mitigate this a little, the system will automatically do a 'preview' when using the GUI tools to generate the commands rather than type them. Unfortunately performance problems make it unfeasible to trigger continuous parsing + updates when typing).

A horrible mistake I made in the design of the language was the lack of (mandatory) turn delimiters. Originally my implementation treated each row as a complete turn. This caused more confusion than any other part of the command language. In the end I ended up writing a lot of very complicated code for automatically detecting the turn breaks in a command stream.

But that wasn't actually good enough, since there are valid command streams where the splitting is ambiguous. One example is the tunneling ability of the dwarves, where "transform E10. build E10" can be read either as a single move or as two separate ones. I had to make an arbitrary choice on that (basically the behavior now is greedy: as many commands as possible are stuffed into the same move). So I had to include the done command to allow players to disambiguate in the few cases where it's needed. This is still supremely confusing for people. All of this could have been avoided by taking this into account right at the start.
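The greedy behavior and the role of `done` can be sketched roughly as follows. This is a simplification under loud assumptions: `split_moves` and `toy_rule` are hypothetical names, and the rule deciding whether a command may join the current move is a stand-in that only knows about the transform-then-build case, not the real detection logic.

```python
def split_moves(commands, can_extend):
    """Greedily pack commands into moves; an explicit 'done' forces a break."""
    moves, current = [], []
    for command in commands:
        if command == "done":
            if current:
                moves.append(current)
            current = []
        elif current and not can_extend(current, command):
            # The command cannot join the current move, so start a new one.
            moves.append(current)
            current = [command]
        else:
            current.append(command)
    if current:
        moves.append(current)
    return moves


def toy_rule(move, command):
    """Hypothetical rule: a build may join a move that only transformed that hex."""
    verb, hex_id = command.split()
    return verb == "build" and move == [f"transform {hex_id}"]


print(split_moves(["transform E10", "build E10"], toy_rule))
# [['transform E10', 'build E10']]
print(split_moves(["transform E10", "done", "build E10"], toy_rule))
# [['transform E10'], ['build E10']]
```

Without `done` the two commands are always merged; with it the player can say that the transform was a complete move on its own.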

Finally, one very surprising outcome is that having a compact vocabulary for game actions makes it much easier to display a useful player-readable log of what happened in the game. The typical user-visible log is structured as natural language, and so verbose as to be hard to read especially when trying to piece together the flow of the game after the fact. It's easy to see why that design choice is made, but it's not necessary when all players are almost by definition going to know how to read a more compact representation.

Likewise this makes it really easy to display a concise summary of what has happened in the game since the player last looked at it (done both in the notification emails and the 'recent moves' tab of games).

Social issues

The unlimited admin access to games has a dark side. Admin malfeasance is rare but I do get about one complaint a month about it. Typically these are games where the admin changes their own moves after others have already moved, rolls the game back by a huge amount, takes over entirely for another player (for example by forcibly passing them), applies different standards to allowing others to undo vs. doing it themselves, and so on.

This is the kind of drama that I really do not want to deal with, but the general solution is to just mark the game as unrated, and let the players sort out between themselves whether and how the game will continue. And it is a bit of a miracle that it hasn't yet become a more widespread problem, as one might expect to happen for the anonymity + internet combo. If it does ever become intolerable, the solution will almost certainly be to disable admin mode entirely for public games. The TM tournament has already shown that it's at least workable, even if people do occasionally get a little bit screwed by the 'no manual administration' policy.

One consequence of a command language is that everything needs to be named. The map needs to have a coordinate system, every component needs an identifier of some sort, and every interaction needs a short and snazzy name. Old school wargames will do this as a matter of course. Of course every hex has an id! Of course the cards are both numbered and uniquely titled! But not so much for eurogames.

The naming we ended up with on the site is far from optimal, and caused yet more drama due to non-online players feeling excluded from conversations. (If you want to know more, you can see an explanation for where the names came from, and why they won't change). That bit is unfortunate. But at least I actually find real value in having convenient shorthands available for everything, when discussing the game, whether when theorycrafting or conducting some tabletalk on IRC during a game.

Implementation issues

The obvious problem for a log replay system is performance. Replaying a full game, which is done for almost every operation, can take around 0.15 seconds in the current implementation, with no obvious low-hanging fruit to fix. At current traffic levels server load is not a problem, but I would start to get worried if usage increased by a factor of 10. As discussed above, there are features I'm unwilling to implement due to CPU load concerns. And it is actually causing real development pain for testing (see below).

It's hard to say exactly how much of the CPU load is related to command parsing, a step that could be avoided with the use of a more structured log format. Some crude profiling suggests that the parsing takes only 5-10% of the runtime, certainly nowhere near enough to warrant using a different format.

A rewrite in a language with higher performance implementations than Perl would almost certainly give a factor of 10 improvement on the actual game evaluation code, moving the bottlenecks to IO. But a full rewrite is not in the cards.

Another potential implementation worry is storage. The current DB size is about 250MB. Unlike CPU usage, this is a cost that accumulates over time. Out of that 250MB maybe 75% is used by the game logs. The logs, stored as a sequence of commands, are not a particularly efficient form of encoding the game data. Simple lossless compression could easily compress them by 80-90%. Luckily disk is cheap (this server still has 600GB free), so this should never become a real issue.
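A quick way to check a compression claim like this on your own data is to run a log through a general-purpose compressor. The sketch below uses Python's `zlib` on a few rows from the excerpt at the top of the post, repeated to simulate a longer log. Repetition makes the sample far more compressible than a real game would be, so the number it prints says nothing about the actual logs; the `saving` helper is a name made up for the example.

```python
import zlib


def saving(text, level=9):
    """Return the fraction of bytes saved by zlib-compressing the text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    # Lossless: decompressing must give back the exact original bytes.
    assert zlib.decompress(packed) == raw
    return 1 - len(packed) / len(raw)


rows = [
    "yetis: action ACT4",
    "cultists: upgrade E6 to TE",
    "giants: Leech 3 from cultists",
    "dragonlords: dig 1. build G6",
]
sample = "\n".join(rows * 100)  # artificial: real logs are much less repetitive
print(f"{saving(sample):.0%} smaller")
```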

Another consequence of a log replay system is that any change in the game evaluation might break existing games. That change might be a bugfix for a place where the effect of a move was miscomputed, it might be extra validation to prevent illegal moves of some kind, cheating prevention, or something else entirely. This is not a theoretical possibility. For basically every single game evaluation change I make, there are already multiple affected games. No matter how elementary a rule is, somebody has already broken it.

Obviously in a stored state implementation changes like this don't matter. The current state is the current state no matter what. But in a log replay system you need to have some story on how to deal with retroactive changes. I can think of the following strategies:

  • Punt: Don't make any changes at all.
  • Ignore: Just make the change, and don't worry about games breaking or the results changing part way through.
  • Delete: Just delete any games that would be broken.
  • Fixups: Find all games where the old and new behavior differ, and change the appropriate logs in such a way that the results with the new log and version will be the same as the result with the original log and old version. This change could be manual or automated.
  • Versioning: Each game file carries a version number. When making a breaking change, keep both the original and new code paths, and choose one of the two based on the version number. Any newly created games use the new version number and get the fixes, existing games keep their original version number and the original behavior.
  • Positive options: Conditionalize the behavior on an option. Turn that option on for new games, as well as any existing games for which the new and old versions behave the same.
  • Negative options: Conditionalize the old behavior on an option. Turn that option on only for existing games where the results for old and new versions differ. Never turn the option on for newly created games.
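As a rough sketch of the 'negative options' strategy from the list above: the old behavior survives only behind an option, and that option is switched on solely for existing games whose result would otherwise change. Everything here is hypothetical, including the `legacy-allow-debt` option name, the toy rule (a payment larger than the balance used to be clamped to zero and is now rejected), and the starting balance.

```python
LEGACY = "legacy-allow-debt"  # hypothetical option name


def evaluate(costs, options=frozenset(), coins=5):
    """Pay each cost in order; overspending is an error unless the legacy option is set."""
    for cost in costs:
        if cost > coins:
            if LEGACY not in options:
                raise ValueError(f"cannot pay {cost} with {coins} coins")
            coins = 0  # old, buggy behavior kept only for flagged games
        else:
            coins -= cost
    return coins


def options_for_existing_game(costs):
    """Turn the legacy option on only if old and new behavior differ for this game."""
    old_result = evaluate(costs, {LEGACY})
    try:
        unchanged = evaluate(costs) == old_result
    except ValueError:
        unchanged = False
    return set() if unchanged else {LEGACY}


print(options_for_existing_game([2, 2]))  # set()
print(options_for_existing_game([4, 3]))  # {'legacy-allow-debt'}
```

Newly created games never get the option, so they always go through the stricter code path, and unaffected old games are not cluttered with an option they do not need.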

During the lifespan of the site I've used most of these at one time or another. The 'ignore' strategy was appropriate a couple of times (for changes where I decided that the new behavior was always acceptable, such as situations where a player had ended up overpaying for an action). The 'delete' strategy would be exceptional; the only situations where I used it were games that were aborted, and one case of a single game being completely unsalvageable due to bug abuse by a player. The 'fixup' strategy has the nice benefit that it avoids introducing a new code path, and was my default choice early on. But at this point it'd be an unacceptable amount of manual work, and it's not readily automatable, especially with the relatively freeform input from the command language. My next default was 'positive options', but after about 3-4 of those I switched to 'negative options'. Positive options had a slightly more complicated rollout procedure, and also permanently clutter up all games, confusing people. ("What's this strict-darkling-sh option?").

None of these options are good. In this instance a log replay model does introduce some major costs, either to the developer (who has to do extra work) or to the users (who have some games screwed up or completely lost).

But it's not all bad! A log replay model makes testing much easier. First, it'd be very easy to write test cases since there is a very natural serialization format for games already, the command language. I don't actually write explicit tests for TM, but for example at work we need an absurd amount of infrastructure for making it easy to write unit tests for TCP/IP packet handling. This kind of design gives the test cases for free. Likewise an Age of Steam implementation I was once doodling around with had lots of test cases, but even with the reasonably friendly format (protocol buffers) they were an absolute pain to write due to the boilerplate.

If I don't write unit tests, how do I test? Mostly by side by side testing; I have a small script that runs every single game in the database against both the new and the previous version. It munges the results a bit, removing known harmless diffs, and then displays any changes from game to game. I can then look at those games, and decide whether it's indicating some kind of a problem with my change, an expected result of my change, or a problem of some sort in the game. It also acts as a great regression test that prevents failures from creeping in, and is the source of data for finding the games that would be broken by a change, so that one of the strategies discussed above can be applied.
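The shape of such a side by side run can be sketched in a few lines. This is a guess at the general structure, not the actual script: `side_by_side`, `run`, the two toy evaluators and the in-memory `games` dictionary are all stand-ins, and the `harmless` callback represents the step that filters out known harmless diffs.

```python
def run(evaluate, log):
    """Evaluate one game, turning failures into comparable result values."""
    try:
        return ("ok", evaluate(log))
    except Exception as error:
        return ("error", str(error))


def side_by_side(games, old_eval, new_eval, harmless=lambda before, after: False):
    """Return (game_id, before, after) for every game whose result changed."""
    changed = []
    for game_id, log in sorted(games.items()):
        before, after = run(old_eval, log), run(new_eval, log)
        if before != after and not harmless(before, after):
            changed.append((game_id, before, after))
    return changed


def old_eval(log):
    return sum(log)


def new_eval(log):
    # The "new version" adds validation that the old one lacked.
    if any(entry < 0 for entry in log):
        raise ValueError("negative entry")
    return sum(log)


games = {"game1": [1, 2], "game2": [3, -1]}
report = side_by_side(games, old_eval, new_eval)
print(report)
# [('game2', ('ok', 2), ('error', 'negative entry'))]
```

The resulting list doubles as the set of games that need one of the retroactive-change strategies applied before the new version goes live.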

This has been one of my favorite forms of testing for a long time, and works tremendously well in a case like Online TM where we have access to all games ever played. Thinking specifically of digital boardgames, it's also a model that wouldn't work well without a replayable log. The only problem is, as alluded to above, the CPU usage. Right now a full diffgame run takes about 90 minutes of CPU time on a rather beefy machine. Even with parallelization it's not a fast feedback cycle. (Makes me kind of miss being able to just casually run a sxs test on a thousand machines).

Conclusion

I'm afraid this ended up longer than intended, despite only covering one design decision. It's also a design decision that I feel is overall a win. You'll have to wait for the next post for the embarrassing technical missteps.