The origins of XXX as FIXME

Posted on 2017-04-17 in General

The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far to the hacky end of the spectrum, and perhaps even outright broken).

It's a bit of an odd and non-obvious string though, unlike FIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:

  • MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
  • Early Unix (probably the most influential codebase that's ever existed)
  • Some kind of DEC thing (because really, all the world was a PDP)

... Continue reading ...

Computing multiple hash values in parallel with AVX2

Posted on 2017-03-19 in General

I wanted to compute some hash values in a very particular way, and couldn't find any existing implementations. The special circumstances were:

  • The keys are short (not sure exactly what size they'll end up, but almost certainly in the 12-40 byte range).
  • The keys all of the same length.
  • I know the length at compile time.
  • I have a batch of keys to process at once.

Given the above constraints, it seems obvious that doing multiple keys in a batch with SIMD could speed thing up over computing each one individually. Now, typically small data sizes aren't a good sign for SIMD. But that's not the case here, since the core problem parallelizes so neatly.

After a couple of false starts, I ended up with a version of xxHash32 that computes hash values for 8 keys at the same time using AVX2. The code is at parallel-xxhash.

... Continue reading ...

I've been writing ring buffers wrong all these years

Posted on 2016-12-13 in General

So there I was, implementing a one element ring buffer. Which, I'm sure you'll agree, is a perfectly reasonable data structure.

It was just surprisingly annoying to write, due to reasons we'll get to in a bit. After giving it a bit of thought, I realized I'd always been writing ring buffers "wrong", and there was a better way.

... Continue reading ...

The hidden cost of QUIC and TOU

Posted on 2016-12-01 in Networking

Application specific UDP-based protocols have always been around, but with traffic volumes that are largely rounding errors. Recently the idea of using UDP has become a lot more respectable. IETF has started the ball rolling on standardizing QUIC, Google's UDP-based combination of TCP+TLS+HTTP/2. And Facebook published Linux kernel patches to add an encrypted UDP encapsulation of TCP, TOU (Transports over UDP). On a very high level, the approaches are dramatically different.

... Continue reading ...

Ratas - A hierarchical timer wheel

Posted on 2016-07-27 in General

Last week I needed a timer wheel for a hobby project. That's a data structure that's been reimplemented over and over in the last three decades, but for various reasons I couldn't get excited by any of the freely available ones. Obviously this means that one more implementation was needed, hence Ratas - a hierarchical timer wheel. Unfortunately my vacation ran out before I could get back to the original project, but that's the nature of yak shaving.

In this post I'll first explain briefly what timer wheels are - you might want to read one of the references instead if you've got the time - and then go into more detail on why I wrote a new one.

... Continue reading ...

The many ways of handling TCP RST packets

Posted on 2016-02-01 in Networking

What could be a simpler networking concept than TCP's RST packet? It just crudely closes down a connection, nothing subtle about it. Due to some odd RST behavior we saw at work, I went digging in RFCs to check what's the technically correct behavior and in different TCP implementations to see what's actually done in practice.

... Continue reading ...

json-to-multicsv - Convert hierarchical JSON to multiple CSV files

Posted on 2016-01-12 in General, Perl

Introduction

json-to-multicsv is a little program to convert a JSON file to one or more CSV files in a way that preserves the hierarchical structure of nested objects and lists. It's the kind of dime a dozen data munging tool that's too trivial to talk about, but I'll write a bit anyway for a couple of reasons.

The first one is that I spent an hour looking for an existing tool that did this and didn't find one. Lots of converters to other formats, all of which seem to assume the JSON is effectively going to be a list of records, but none that supported arbitrary nesting. Did I just somehow manage to miss all the good ones? Or is this truly something that nobody has ever needed to do?

Second, this is as good an excuse as any to start talking a bit about some patterns in how command line programs get told what to do (I'd use the word "configured", except that's not quite right).

... Continue reading ...

A rating system for asymmetric multiplayer games

Posted on 2015-11-18 in Games

Introduction

A couple of years ago I wrote a quick and dirty rating system for a online boardgame site I run. It wasn't particularly well thought out, but it did the job. Some discussion about the system made me revisit it, with two years of hindsight and orders of magnitude more data.

How well does the system actually work, and how predictive are the ratings? There are some obvious tweaks to the system — would implementing them make things better or worse? Would anything be gained from switching to a more principled (but more complicated) approach. For this last bit, I used Microsoft's TrueSkill as the benchmark. It has some desirable properties and appears to be the gold standard of team based rating systems right now.

The code and the data are available on GitHub in my rating-eval repository.

... Continue reading ...

Flow disruptor - a deterministic per-flow network condition simulator

Posted on 2015-10-01 in Networking

Introduction

I finally got around to open sourcing flow disruptor, a tool I wrote at work late last year. What does it do? Glad you asked! Flow disruptor is a deterministic per-flow network condition simulator.

To unpack that description a bit, per-flow means that the network conditions are simulated separately for each TCP connection rather than on the link layer. Deterministic means that we normalize as many network conditions as possible (e.g. RTT, bandwidth), and any changes in those conditions happen at preconfigured times rather than randomly. For example the configuration could specify that the connection experiences a packet loss exactly 5s after it was initiated, and then a packet loss every 1s after that. Or that packet loss happens at a specifed level of bandwidth limit based queueing.

You can check the Github repo linked above for the code and for documentation on e.g. configuration. This blog post is more on why this tool exists and why it looks the way it does.

... Continue reading ...

The most obsolete infrastructure money could buy - my worst job ever

Posted on 2015-09-01 in General

Today marks the 10th anniversary of the most bizarre, and possibly the saddest, job I ever took.

The year was 2005. My interest in writing a content management system in Java for the company that bought our startup had been steadily draining away, while my real passion was working on compilers and other programming language infrastructure (mostly SBCL). One day I spotted a job advert looking for compiler people, which was a rare occurrence in that time and place. I breezed through the job interview, but did not ask the right questions and ignored a couple of warning signs. Oops.

It turned out to be a bit of an adventure in retrocomputing.

... Continue reading ...