In an earlier post in this series, I promised to discuss in detail some of the
positive and negative consequences of the less conventional design
choices of my online Terra Mystica implementation. If you have no idea of what that is,
reading at least the intro of that post might be a good idea. This
post will just deal with one design choice, but it's the elephant in
the room: the command language.
The canonical internal representation of a game in my TM
implementation is as a sequence of rows, each describing either some number
of player actions specified in an ad hoc mini language,
or administrative commands that change the game setup
in some way (for example setting game options, or dropping a
player from the game partway through). This is what it might look like:
yetis: action ACT4
cultists: upgrade E6 to TE
giants: Leech 3 from cultists
giants: pass BON4
yetis: Leech 2 from cultists
dragonlords: Decline 2 from cultists
dragonlords: dig 1. build G6
yetis: send p to EARTH
cultists: action FAV6. +AIR
dragonlords: pass BON7
yetis: upgrade E7 to TE. +FAV11
giants: Leech 3 from yetis
dragonlords: Leech 2 from yetis
cultists: Leech 2 from yetis
That's a short excerpt from the middle of a random game. A full game
generally runs for about 400 rows.
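To make the row format concrete, here is a minimal Python sketch of splitting such rows into a faction and a list of commands. It assumes only what the excerpt shows: a `faction:` prefix and commands separated by periods. The names `Row` and `parse_row` are made up for illustration, administrative rows and comments are not handled, and this is not the site's actual parser.

```python
from typing import NamedTuple


class Row(NamedTuple):
    faction: str
    commands: list[str]


def parse_row(line: str) -> Row:
    """Split 'faction: cmd. cmd' into its parts (toy grammar, not the real one)."""
    faction, sep, rest = line.partition(":")
    if not sep:
        raise ValueError(f"missing 'faction:' prefix in {line!r}")
    # Commands within a row are separated by periods in the excerpt.
    commands = [c.strip() for c in rest.split(".") if c.strip()]
    return Row(faction.strip(), commands)


excerpt = """\
yetis: action ACT4
dragonlords: dig 1. build G6
yetis: upgrade E7 to TE. +FAV11
"""
rows = [parse_row(line) for line in excerpt.splitlines()]
for row in rows:
    print(row.faction, row.commands)
```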
What do I mean by this being the canonical internal representation?
Only a few parts of the game state are actually persisted separately
in the DB; these are things that might almost qualify as metadata,
such as whose turn it is to move, whether the game is still running, and what
the final rankings of a finished game were. But in general the only
way to find out the current state of the game is to evaluate the whole
sequence of commands from start to finish. This is in fact done for
almost every operation on the site (viewing a game, previewing a move,
saving a move, viewing or editing the game in admin mode, and so on).
In addition to being the canonical internal representation, the
command language is also the canonical user interface; the fundamental
operation players do is enter new rows into the command
sequence. Often this is done by writing the commands manually, though
there are GUI shortcuts of one form or another available for almost
everything.
This might sound like a slightly insane way of doing things, but it
does have some benefits as well. I've made several digital board game
adaptations of varying levels of completeness over the years, used
tens of other ones, and this solution hits the closest to my personal
preferences.
A taxonomical diversion
Before discussing the fallout of this design decision in more detail,
it's probably useful to do a quick tour of some of the main axes in
the design space. (I'm of course just describing the extremes, while
in the real world most examples would fall on a continuum).
First, there's the question of the interaction model, which might be
abstract or skeuomorphic. In
a skeuomorphic design the player doing input on a computer would still
be mimicking the actions of someone playing the game with physical
pieces and no computer assistance.
In an abstract design the player
would only input the parts of the move that are necessary to uniquely
distinguish it from other possible moves, with any bookkeeping and
mandatory intermediate steps being carried out automatically. Likewise
in a skeuomorphic design the software provides information through the
same methods as the original physical game, while an abstract design
will automate some of the mechanical parsing of the game state. Or
even just the question of using the graphical assets of the original
game, generally optimized for sales, versus using digital-first assets
optimized for clarity.
As an example of this axis, in
the 18xx series of
games a substantial amount of playtime is spent computing the exact
routes of a number of trains on a complex rail network. I'm aware of
three solutions that are actually in use, and there is a fourth
plausible one, in order from least to most abstract:
- The user manually decides on the routes, computes their values with no computer assistance, and those values are used with no validation. Examples: ps18xx, early versions of Rails.
- The user enters valid routes through a user interface. The software computes the values of the routes, and distributes the income from the company appropriately. Example: rr18xx.
- In games that require all routes to be optimal, the software could compute an optimal route but only for the purpose of rejecting any manually computed suboptimal ones. Examples: None. (Though it's similar to what's done in the SlothNinja implementation of Indonesia, a game that probably counts as an honorary 18xx)
- The software automatically finds an optimal set of routes and computes their values. Examples: The ancient DOS-based 1830 from Simtex, recent versions of Rails.
My own tastes run toward maximum abstraction; I've rarely if ever seen
a digital boardgame conversion that needed to be more skeuomorphic.
But this is not a universal view. There are definitely people who will
refuse to play a conversion that does not use the same graphics as the
physical version. Or who will strenuously argue against automatic
finding of optimal routes in 18xx, on the basis that evaluating
routes is a core skill in the game when making decisions about route
building, and that skill can only be acquired by getting sufficient
practice in manual route computation.
A second axis is the internal representation, which could be based on
either log replay or stored state. In a log replay
system the game is stored as a series of steps from the starting setup to
the current state. In a stored state system the game is stored as the
current values of all pieces of the game. How much money does every
player have, which round is it right now, what's in this exact space
on the map, and so on.
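The difference between the two representations can be sketched in a few lines of Python. Everything here is a toy assumption made for illustration (the `State` fields, the tuple-shaped steps, the `apply` and `replay` names); it is not taken from any real implementation. A log replay system persists `log` and recomputes the state on demand, while a stored state system would persist the resulting `State` itself.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    coins: dict[str, int] = field(default_factory=dict)
    round_no: int = 1


# Hypothetical steps: ("gain", player, n), ("pay", player, n), ("next_round",)
def apply(state: State, step: tuple) -> None:
    kind = step[0]
    if kind == "gain":
        _, player, amount = step
        state.coins[player] = state.coins.get(player, 0) + amount
    elif kind == "pay":
        _, player, amount = step
        if state.coins.get(player, 0) < amount:
            raise ValueError(f"{player} cannot pay {amount}")
        state.coins[player] -= amount
    elif kind == "next_round":
        state.round_no += 1
    else:
        raise ValueError(f"unknown step {step!r}")


def replay(log: list[tuple]) -> State:
    """Log replay: the current state only exists as the result of this loop."""
    state = State()
    for step in log:
        apply(state, step)
    return state


log = [("gain", "yetis", 15), ("pay", "yetis", 6),
       ("next_round",), ("gain", "giants", 15)]
current = replay(log)
print(current)
```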
A third axis is the input model. Moves could be entered either through
direct or indirect manipulation. In a system using
direct manipulation, the player would for example see a graphical
display of a map and be able to click or drag on a unit to enter a move
for it. In an indirect system the player observes the game state in
one place, and enters their moves using some completely unrelated
mechanism.
I think most digital boardgames use a direct input model, but there
are also a fair number that have a menu-driven system of some sort.
The only examples I know of that go a bit further with indirection by
providing a command language are my ancient Paths of Glory mapper and
the even more ancient PBEM judges. If you have other examples, I'd love
to hear of them.
Direct manipulation is often, but not always, linked to excessive
skeuomorphism in the interaction model. For example I find it almost
painful to play most Vassal modules, with their hyper-direct
interaction model of dragging and dropping counters around, manually
drawing cards from a deck or rolling dice. Digital boardgames are not
the same medium as physical boardgames, and should play to their unique
strengths. But these are in fact orthogonal concerns, and there's no
reason why a direct manipulation model couldn't also provide
useful input and computational abstractions.
Whew, so much for the theory. In this taxonomy Online Terra Mystica is
pretty far toward the abstract end, and is fully in the log replay
camp. While it has a half-hearted attempt at adding some direct
manipulation concepts to the UI, it started off as an indirect system
and deep inside that's what it is. It also chooses to merge the input
format and the log format into one entity. So what does this mean?
Perhaps the signature feature of the site is the planner. This
tool allows the player to enter an arbitrarily long sequence of
actions - all the way to the end of the game - and see what the
effects would be. Are all the moves valid? Are there sufficient
resources available to do all of this? Oh, I don't have enough
resources? Well what if I do this on round 5, and delay that action to
round 6. In cases where the plan fundamentally depends on the
opponents doing something, it's possible for the plan to also contain
arbitrary resource adjustments. And finally, since the command
language supports comments, these plans can be properly documented so
that when you return to them in a day or two, you can remember why
you wanted to do these particular moves.
I think this feature is intrinsically linked to the command language as
a user interface, and it might actually be unique. There are some
games with other kinds of interfaces that allow you to play the game
forward, and then undo / rewind / reload. But simply being able to
play the game forward is not sufficient to make this a useful
tool. It's only the ease of inserting, reordering and deleting moves
that makes it possible to use this as a matter of course, rather than
only under the most exceptional circumstances.
A somewhat related feature is undo. Inflexibility in allowing
moves to be taken back is the bane of many forms of digital
boardgames. When playing a game face to face, most groups will
generally allow at least some level of taking back moves. In some
cases all moves are final immediately (this has always been the
primary problem of the otherwise brilliant implementation
of Brass at Order of the
Hammer). In other implementations there are distinct points past which
moves become final; for example BGO's Through the
Ages allows undoing back to the start of your full turn, but no
other rollbacks (clicking 'finish turn' is final, as is any kind of
action during an auction or war resolution). These two are, I believe,
examples of undo being limited for design reasons. At rr18xx meanwhile
rollbacks are possible until the previous action of each player. Here
my understanding is that the overriding issue is technical, as the
rollback is essentially a full restore to a previous database
snapshot, and there are resource constraints on how many snapshots can
be kept.
The solution Online TM takes to this is to grant the creator of the
game arbitrary powers to edit the history at will, the admin
mode. Not only can they undo the last move or couple of moves; if
there was a mistake made three moves back, they can go and fix it (and
they can fix it without forcing the intervening moves to be
redone). This feature is fully tied to a log replay mode of
operation. While more limited forms of undoing could be implemented as
a reverse log replay from the end state or through state snapshots,
this more complete form depends on the log being directly editable.
And realistically the log also needs to be the input format; it would
not be reasonable to expect the admin to be able to edit a more
formal log representation correctly (whether the log format is XML,
protocol buffers, JSON, or something else). But in the case where the
log format and the move input system match, just playing the game
has taught the game admin the necessary skills.
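The reason an editable log makes this kind of repair cheap can be shown with a toy Python sketch. The row syntax (`player: +3`) and the `evaluate` function are invented for this example and have nothing to do with the real command language; the point is only that fixing an old row means editing one line and replaying, with the later rows revalidated for free.

```python
def evaluate(rows: list[str]) -> dict[str, int]:
    """Replay toy rows of the form 'player: +3' or 'player: -2' (made-up syntax)."""
    coins: dict[str, int] = {}
    for number, row in enumerate(rows, start=1):
        player, _, delta = row.partition(":")
        player = player.strip()
        total = coins.get(player, 0) + int(delta)
        if total < 0:
            raise ValueError(f"row {number}: {player} would go to {total} coins")
        coins[player] = total
    return coins


history = ["yetis: +5", "giants: +4", "yetis: -3", "giants: -4"]
before = evaluate(history)

# The admin fixes a mistake three rows back. The later rows are left
# untouched and are simply re-evaluated on top of the corrected row.
edited = history.copy()
edited[1] = "giants: +6"
after = evaluate(edited)
print(before, after)
```

If the correction makes a later row illegal, the replay raises at that row, which tells the admin exactly which intervening move needs to be redone.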
This is a very nice feature for friendly games. It does have downsides
though, more on that later in the section on the social implications.
There's also a potential, as yet unimplemented, feature that people
frequently ask for: pre-programmed actions.
"I know exactly what I want to do next turn, why can't I just
pre-enter my move". This would be a pretty interesting thing for
speeding up games, but to my mind would not be conducive for good
play. Circumstances change, often in ways you did not anticipate at
all. The only way this could be even remotely usable would be if the
language was extended to have some kind of conditional execution. And
that's a can of worms I'm not interested in opening, and I suspect also a
bridge too far for 99% of my users.
It's worth noting that many of the above features are closely tied to
a game with no randomness (or at most setup randomness) and no
hidden information. As such their existence is something of an
anti-feature, preventing other additions to the game.
For a non-hypothetical example, I'm currently thinking about how to
implement the faction auction variant from the TM expansion. A
full open auction in the beginning would be painfully slow. The most
obvious, though still slightly imperfect, solution is a series
of blind second
price auctions. But this is not a good fit for the site's existing
design. The problem is that the blind bid introduces momentary hidden
information into the game, and it's possible for that information to
leak through either the preview or admin modes. For example the admin
could wait for everyone else to bid, peek into the log and see
everyone else's bids, and then bid in such a way as to force the
winner to pay the maximum amount.
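The leak is easy to demonstrate with a small Python sketch of a single sealed-bid second price auction. The function name, the tie-breaking rule and the bid values are all made up for illustration, and what the bids are denominated in is left open.

```python
def second_price_winner(bids: dict[str, int]) -> tuple[str, int]:
    """Return (winner, price): the highest bidder wins, pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    # Sort by bid, highest first; ties are broken by name, an arbitrary choice.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


sealed = {"yetis": 7, "giants": 4, "cultists": 5}
print(second_price_winner(sealed))   # yetis win and pay 5

# An admin who peeked at the sealed bids can bid just under the leader.
# The winner is unchanged, but the price they pay goes up.
peeked = dict(sealed, dragonlords=6)
print(second_price_winner(peeked))   # yetis still win, but now pay 6
```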
The most obvious UX consequence of using a command language is that
it tends to be harder to learn. The following quote, said partly
in jest, certainly contains a kernel of truth:
... has done a bang-up job providing a PBEM Terra Mystica experience that includes just enough extra layers of complexity via the interface and game administration tools to keep TM as confusing as ever, long after you master the actual game!
Non-natural languages are simply not a mode of human computer
interaction that most people are comfortable with in this day and age.
It actually continues to amaze me that I could get non-programmers to
play using this implementation at all. Is it possible to evaluate how
big a hurdle this has been for people? The best number I can come up
with is that around 20% of the players who joined at least one game
never finished even one game without dropping out. Note that these
are players who have already jumped through hoops such as email
validation during account registration. It's possible that there's
some other issue beside the UI that's a problem for these players,
but it does seem like the most likely candidate.
A smaller problem is that it essentially forces the introduction of a
move preview. For those who haven't played the game, when entering
moves you need to first enter the moves, then click 'preview', check
that the results match what you want, and finally click 'save' to
commit the moves. In a game that uses a direct manipulation paradigm,
a preview could be skipped. But with a more obscure UI like here, it's
absolutely essential since the move might not have had the intended
effect. Whether it's doing the entirely wrong move, picking the wrong
tile, building on the wrong location, etc. Even with a preview step
somebody will request a rollback on average once or twice a game.
So why do I call this a problem? Because despite my best efforts,
especially new players will frequently forget to 'save', leaving the
game in a limbo state where they think they've done their move, until
some other player gets impatient. (To mitigate this a little, the
system will automatically do a 'preview' when using the GUI tools to
generate the commands rather than type them. Unfortunately
performance problems make it unfeasible to trigger continuous parsing
+ updates when typing).
A horrible mistake I made in the design of the language was the lack
of (mandatory) turn delimiters. Originally my implementation treated
each row as a complete turn. This caused more confusion than any other
part of the command language. In the end I ended up writing a lot of
very complicated code for automatically detecting the turn breaks in
a command stream.
But that wasn't actually good enough; there are valid command streams
where the splitting is ambiguous. One example is the tunneling ability of dwarves, where
transform E10. build E10 could be read as either one move or two. I had to make an arbitrary
choice on that (basically the behavior now is greedy, as many commands
as possible are stuffed into the same move). So I had to include the
done command to allow players to disambiguate in the few
cases where it's needed. This is still supremely confusing for
people. All of this could have been avoided by taking this into account
right at the start.
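A greedy splitter with an explicit delimiter can be sketched in Python as follows. The rule used to decide when a move is full (at most one command from a small set of "main" verbs) is a simplification invented for this example, as are the names `MAIN_VERBS` and `split_moves`; the real detection code is described above only as very complicated. Only the greedy policy and the `done` command come from the text.

```python
# Hypothetical rule for this sketch: a move holds at most one of these verbs.
MAIN_VERBS = {"build", "upgrade", "pass", "send", "action"}


def split_moves(commands: list[str]) -> list[list[str]]:
    """Greedily pack commands into moves; 'done' forces a break."""
    moves: list[list[str]] = []
    current: list[str] = []
    used_main = False
    for cmd in commands:
        verb = cmd.split()[0].lower()
        if verb == "done":
            # Explicit delimiter: close the current move, if any.
            if current:
                moves.append(current)
            current, used_main = [], False
            continue
        is_main = verb in MAIN_VERBS
        if is_main and used_main:
            # A second main action cannot fit, so a new move starts here.
            moves.append(current)
            current, used_main = [], False
        current.append(cmd)
        used_main = used_main or is_main
    if current:
        moves.append(current)
    return moves


print(split_moves(["transform E10", "build E10"]))
print(split_moves(["transform E10", "done", "build E10"]))
```

Without the delimiter both commands land in the same move; with `done` between them the player gets two separate moves.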
Finally, one very surprising outcome is that having a compact
vocabulary for game actions makes it much easier to display
a useful player-readable log of what happened in the game. The
typical user-visible log is structured as natural language, and so
verbose as to be hard to read especially when trying to piece together
the flow of the game after the fact. It's easy to see why that design
choice is made, but it's not necessary when all players are almost by
definition going to know how to read a more compact representation.
Likewise this makes it really easy to display a concise summary of
what has happened in the game since the player last looked at it (done
both in the notification emails and the 'recent moves' tab of games).
The unlimited admin access to games has a dark side. Admin
malfeasance is rare but I do get about one complaint a month about
it. Typical complaints involve the admin changing their own moves
after others have already taken moves, rolling the game back by a huge
amount, taking over entirely for another player (for example forcibly
passing them), applying different standards to allowing others to undo
vs. doing it themselves, and so on.
This is the kind of drama that I really do not want to deal with, but
the general solution is to just mark the game as unrated, and let the
players sort out between themselves whether and how the game will
continue. And it is a bit of a miracle that it hasn't yet become a
more widespread problem, as one might expect to happen for the
anonymity + internet combo. If it does
ever become intolerable, the solution will almost certainly be to
disable admin mode entirely for public games. The TM tournament has
already shown that it's at least workable, even if people do occasionally
get a little bit screwed by the 'no manual administration' policy.
One consequence of a command language is that everything needs to be
named. The map needs to have a coordinate system, every component
needs an identifier of some sort, and every interaction needs a short
and snazzy name. Old school wargames will do this as a matter of
course. Of course every hex has an id! Of course the cards are both
numbered and uniquely titled! But not so much for eurogames.
The naming we ended up with on the site is far from optimal, and
caused yet more drama due to non-online players feeling excluded from
conversations. (If you want to know more, you can see an explanation of
where the names came from, and why they won't change). That bit is
unfortunate. But at least I actually find real value in having
convenient shorthands available for everything, when discussing the
game, whether when theorycrafting or conducting some tabletalk on IRC
during a game.
The obvious problem for a log replay system
is performance. Replaying a full game, which is done for almost
every operation, can take around 0.15 seconds in the current
implementation, with no obvious low hanging fruit to fix. On the
implementation, with no obvious low hanging fruit to fix. At the
current traffic levels server load is not a problem, but I would start
to get worried if usage increased by a factor of 10. As discussed
above, there are features I'm unwilling to implement due to CPU load
concerns. And it is actually causing real development pain for testing.
It's hard to say exactly how much of the CPU load is related to
command parsing, a step that could be avoided with the use of a
more structured log format. Some crude profiling suggests that the
parsing takes only 5-10% of the runtime, certainly nowhere near enough
to warrant using a different format.
A rewrite in a language with higher performance implementations than Perl
would almost certainly give a factor of 10 improvement on the actual
game evaluation code, moving the bottlenecks to IO. But a full rewrite
is not in the cards.
Another potential implementation worry is storage. The current
DB size is about 250MB. Unlike CPU usage, this is a cost that
accumulates over time. Out of that 250MB maybe 75% is used by the game
logs. The logs, stored as a sequence of commands, are not a
particularly efficient form of encoding the game data. Simple lossless
compression could easily compress them by 80-90%. Luckily disk is
cheap (this server still has 600GB free), so this should never become
a real issue.
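How well command logs compress is easy to check with Python's standard `zlib` module. The sample below just repeats a few rows from the excerpt earlier in the post, which makes it far more repetitive than a real game, so the ratio it prints says nothing about the 80-90% estimate; measuring that would require running the same few lines over real logs.

```python
import zlib

# A few rows from the excerpt, repeated to get a log-sized sample.
# Repetition exaggerates the savings compared to a real game log.
sample_rows = [
    "yetis: action ACT4",
    "cultists: upgrade E6 to TE",
    "giants: Leech 3 from cultists",
    "giants: pass BON4",
    "dragonlords: dig 1. build G6",
    "yetis: send p to EARTH",
]
raw = "\n".join(sample_rows * 30).encode("utf-8")

packed = zlib.compress(raw, level=9)

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(packed) == raw

saved = 1 - len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({saved:.0%} saved)")
```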
Another consequence of a log replay system is that any change in the
game evaluation might break existing games. That change might be a
bugfix for a place where the effect of a move was miscomputed, it
might be extra validation to prevent illegal moves of some kind,
cheating prevention, or something else entirely. This is not a
theoretical possibility. For basically every single game evaluation change
I make, there are already multiple affected games. No matter how
elementary a rule is, somebody has already broken it.
Obviously in a stored state implementation changes like this don't
matter. The current state is the current state no matter what. But in
a log replay system you need to have some story on how to deal with
retroactive changes. I can think of the following strategies:
- Punt: Don't make any changes at all.
- Ignore: Just make the change, and don't worry about games breaking or the results changing part way through.
- Delete: Just delete any games that would be broken.
- Fixups: Find all games where the old and new behavior differ, and change the appropriate logs in such a way that the results with the new log and version will be the same as the result with the original log and old version. This change could be manual or automated.
- Versioning: Each game file carries a version number. When making a breaking change, keep both the original and new code paths, and choose one of the two based on the version number. Any newly created games use the new version number and get the fixes, existing games keep their original version number and the original behavior.
- Positive options: Conditionalize the behavior on an option. Turn that option on for new games, as well as any existing games for which the new and old versions behave the same.
- Negative options: Conditionalize the old behavior on an option. Turn that option on only for existing games where the results for old and new versions differ. Never turn the option on for newly created games.
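As a rough illustration of the last strategy, here is a Python sketch of a negative option. The rule change (rejecting an overspend that old code silently clamped to zero), the option name `legacy-clamp-overspend`, and the `Game`, `pay` and `roll_out` names are all hypothetical, chosen only to show the shape of the approach.

```python
from dataclasses import dataclass, field
from typing import Callable

LEGACY = "legacy-clamp-overspend"   # hypothetical option name


@dataclass
class Game:
    name: str
    options: set[str] = field(default_factory=set)


def pay(game: Game, coins: int, cost: int) -> int:
    """New rule: overspending is an error. Old rule: clamp the balance at zero."""
    if cost > coins:
        if LEGACY in game.options:
            return 0   # old code path, kept only for flagged games
        raise ValueError(f"{game.name}: cannot pay {cost} with {coins} coins")
    return coins - cost


def roll_out(existing: list[Game], result_differs: Callable[[Game], bool]) -> None:
    """Turn the option on only for existing games whose results would change."""
    for game in existing:
        if result_differs(game):
            game.options.add(LEGACY)


games = [Game("affected"), Game("unaffected")]
# The lambda stands in for a real old-versus-new comparison of each game.
roll_out(games, lambda g: g.name == "affected")
new_game = Game("created-after-fix")   # new games never get the option

print(pay(games[0], coins=2, cost=3))  # old behavior preserved for this game
```

Only the flagged game carries any trace of the change, which is the main practical difference from positive options.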
During the lifespan of the site I've used most of these at one time or
another. The 'ignore' strategy was appropriate a couple of times (for
changes where I decided that the new behavior was always
acceptable, such as situations where a player had ended up overpaying
for an action). The 'delete' strategy would be exceptional; the only
situations where I used it were games that were aborted, and one case
of a single game being completely unsalvageable due to bug abuse by a
player. The 'fixup' strategy has the nice benefit that it avoids
introducing a new code path, and was my default choice early on. But
at this point it'd be an unacceptable amount of manual work, and it's
not readily automatable, especially with the relatively freeform input
from the command language. My next default was 'positive options', but
after about 3-4 of those I switched to 'negative options'. Positive
options have a slightly more complicated rollout procedure, and also
permanently clutter up all games, confusing people. ("What's this
None of these options are good; in this instance a log replay model
does introduce some major costs either to the developer (who has to do
extra work) or the users (who have some games screwed up or completely
deleted).
But it's not all bad! A log replay model makes testing much
easier. First, it'd be very easy to write test cases since there is a
very natural serialization format for games already, the command
language. I don't actually write explicit tests for TM, but for
example at work we need an absurd amount of infrastructure for making it
easy to write unit tests for TCP/IP packet handling. This kind of
design gives the test cases for free. Likewise an Age of Steam
implementation I was once doodling around with had lots of test cases,
but even with the reasonably friendly format (protocol buffers) they
were an absolute pain to write due to the boilerplate.
If I don't write unit tests, how do I test? Mostly by side by side
testing; I have a
script that runs every single game in the database against both
the new and the previous version. It munges the results a bit removing
known harmless diffs, and then displays any changes from game to
game. I can then look at those games, and decide whether it's
indicating some kind of a problem with my change, an expected result
of my change, or a problem of some sort in the game. It also acts as
a great regression test that prevents failures from creeping in, and
is the source of data for finding the games that would be broken by
a change, so that one of the fixes discussed in the previous section
can be applied.
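The core of such a side by side run fits in a short Python sketch. This is not the actual script: the `side_by_side` function, the evaluator signature and the toy evaluators are assumptions made for illustration, and the `ignore_keys` parameter is a stand-in for the step that removes known harmless diffs.

```python
from typing import Callable, Iterable

Evaluator = Callable[[list[str]], dict]


def side_by_side(games: dict[str, list[str]], old: Evaluator, new: Evaluator,
                 ignore_keys: Iterable[str] = ()) -> dict[str, tuple[dict, dict]]:
    """Return {game_id: (old_result, new_result)} for games whose results differ."""
    ignored = set(ignore_keys)

    def run(evaluate: Evaluator, rows: list[str]) -> dict:
        try:
            result = evaluate(rows)
        except Exception as exc:   # a crash is also a result worth diffing
            return {"error": f"{type(exc).__name__}: {exc}"}
        return {k: v for k, v in result.items() if k not in ignored}

    diffs = {}
    for game_id, rows in games.items():
        before, after = run(old, rows), run(new, rows)
        if before != after:
            diffs[game_id] = (before, after)
    return diffs


# Toy evaluators: the new version adds validation that the old one lacked.
def old_eval(rows: list[str]) -> dict:
    return {"rows": len(rows), "version": 1}


def new_eval(rows: list[str]) -> dict:
    if any("overpay" in row for row in rows):
        raise ValueError("illegal move")
    return {"rows": len(rows), "version": 2}


games = {"g1": ["a: pass"], "g2": ["a: overpay", "b: pass"]}
diffs = side_by_side(games, old_eval, new_eval, ignore_keys=["version"])
print(diffs)
```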
This has been one of my favorite forms of testing for a long time, and
works tremendously well in a case like Online TM where we have access
to all games ever played. Thinking specifically of digital boardgames,
it's also a model that wouldn't work well without a replayable log.
The only problem is, as alluded to above, the CPU usage. Right now a
diffgame run takes about 90 minutes of CPU time on a
rather beefy machine. Even with parallelization it's not a fast
feedback cycle. (Makes me kind of miss being able to just casually run
a sxs test on a thousand machines).
I'm afraid this ended up longer than intended, despite only covering
one design decision. It's also a design decision that I feel is
overall a win. You'll have to wait for the next post for the
embarrassing technical missteps.