The mystery of the hanging S3 downloads

Posted on 2017-07-20 in Networking

A coworker was experiencing a strange problem with their Internet connection at home. Large downloads from most sites worked fine. The exception was that downloads from Amazon S3 would get up to a good speed (500Mbps), stall completely for a few seconds, restart for a while, stall again, and eventually hang completely. The problem seemed to be specific to S3; downloads from generic AWS VMs were ok.

What could be going on? It shouldn't be a problem with the ISP, or anything south of that: after all, connections to other sites were working. It should not be a problem between the ISP and Amazon, or there would have been problems with AWS too. But it also seems very unlikely that S3 would have a trivially reproducible problem causing large downloads to hang. It's not like this is some minor use case of the service.

If it had been a problem with e.g. viewing Netflix, one might suspect some kind of targeted traffic shaping. But an ISP throttling or forcibly closing connections to S3 but not to AWS in general? That's just silly talk.

The normal troubleshooting tips like reducing the MTU didn't help either. This sounded like a fascinating networking whodunit, so I couldn't resist butting in after hearing about it through the grapevine.

... Continue reading ...

I don't want no 'wantarray'

Posted on 2017-07-18 in Perl

A while back, I got a bug report for json-to-multicsv. The user was getting the following error for any input file, including the one used as an example in the documentation:

    , or } expected while parsing object/hash, at character offset 2 (before "n")

The full facts of the matter were:

  • The JSON parser was failing on the third character of the file.
  • That was also the end of the first line in the file. (I.e. the first line of the JSON file contained just the opening bracket).
  • The user was running it on Windows.
  • The same input file worked fine for me on Linux.

... Continue reading ...

The origins of XXX as FIXME

Posted on 2017-04-17 in General

The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far to the hacky end of the spectrum, and perhaps even outright broken).

It's a bit of an odd and non-obvious string though, unlike FIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:

  • MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
  • Early Unix (probably the most influential codebase that's ever existed)
  • Some kind of DEC thing (because really, all the world was a PDP)

... Continue reading ...

Computing multiple hash values in parallel with AVX2

Posted on 2017-03-19 in General

I wanted to compute some hash values in a very particular way, and couldn't find any existing implementations. The special circumstances were:

  • The keys are short (not sure exactly what size they'll end up, but almost certainly in the 12-40 byte range).
  • The keys are all of the same length.
  • I know the length at compile time.
  • I have a batch of keys to process at once.

Given the above constraints, it seems obvious that doing multiple keys in a batch with SIMD could speed things up over computing each one individually. Now, typically small data sizes aren't a good sign for SIMD. But that's not the case here, since the core problem parallelizes so neatly.

After a couple of false starts, I ended up with a version of xxHash32 that computes hash values for 8 keys at the same time using AVX2. The code is at parallel-xxhash.
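To make the batching idea concrete, here is a rough scalar sketch in Python of hashing several equal-length keys in lockstep, one "lane" per key. It is not the parallel-xxhash code and not the full xxHash32 algorithm: `mix_round` is only an accumulate-rotate-multiply step in the style of the xxHash32 round, there is no finalization, and the names `hash_one` and `hash_batch` are made up for illustration. It also assumes the key length is a multiple of 4 bytes.

```python
MASK32 = 0xFFFFFFFF
PRIME32_1 = 0x9E3779B1  # multiplicative constants used by xxHash32
PRIME32_2 = 0x85EBCA77


def rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def mix_round(acc, word):
    """One accumulate-rotate-multiply step (illustrative, not a full hash)."""
    acc = (acc + word * PRIME32_2) & MASK32
    acc = rotl32(acc, 13)
    return (acc * PRIME32_1) & MASK32


def hash_one(key, seed=0):
    """Reference version: consume one key, 4 bytes at a time."""
    acc = seed
    for i in range(0, len(key), 4):
        acc = mix_round(acc, int.from_bytes(key[i:i + 4], "little"))
    return acc


def hash_batch(keys, seed=0):
    """Lockstep version: every lane performs the same step on its own key."""
    length = len(keys[0])
    if any(len(k) != length for k in keys):
        raise ValueError("all keys must have the same length")
    lanes = [seed] * len(keys)
    for i in range(0, length, 4):
        # Gather the i-th word of every key, then apply one step to all lanes.
        words = [int.from_bytes(k[i:i + 4], "little") for k in keys]
        lanes = [mix_round(a, w) for a, w in zip(lanes, words)]
    return lanes
```

Because every key has the same length, the outer loop runs the same number of times for every lane and never branches per key. That is what lets each inner list comprehension map onto a single vector instruction operating on 8 lanes of 32 bits.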

... Continue reading ...

I've been writing ring buffers wrong all these years

Posted on 2016-12-13 in General

So there I was, implementing a one element ring buffer. Which, I'm sure you'll agree, is a perfectly reasonable data structure.

It was just surprisingly annoying to write, due to reasons we'll get to in a bit. After giving it a bit of thought, I realized I'd always been writing ring buffers "wrong", and there was a better way.
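For context, one common textbook formulation is sketched below. This is an assumption about what a conventional ring buffer looks like, not the code from the post, and the class name `WrappedIndexRing` is hypothetical. It keeps a read index and a write index that are both wrapped to the array size, so `read == write` has to mean "empty" and one slot always goes unused.

```python
class WrappedIndexRing:
    """Ring buffer with read/write indices wrapped to the array size."""

    def __init__(self, slots):
        self.buf = [None] * slots
        self.read = 0
        self.write = 0

    def is_empty(self):
        return self.read == self.write

    def is_full(self):
        # One slot is sacrificed so that "full" differs from "empty".
        return (self.write + 1) % len(self.buf) == self.read

    def push(self, item):
        if self.is_full():
            return False
        self.buf[self.write] = item
        self.write = (self.write + 1) % len(self.buf)
        return True

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty ring buffer")
        item = self.buf[self.read]
        self.read = (self.read + 1) % len(self.buf)
        return item
```

With this scheme an array of N slots holds at most N-1 elements, so a one-slot array can never hold anything: it is empty and full at the same time.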

... Continue reading ...

The hidden cost of QUIC and TOU

Posted on 2016-12-01 in Networking

Application-specific UDP-based protocols have always been around, but with traffic volumes that are largely rounding errors. Recently the idea of using UDP has become a lot more respectable. The IETF has started the ball rolling on standardizing QUIC, Google's UDP-based combination of TCP+TLS+HTTP/2. And Facebook published Linux kernel patches to add an encrypted UDP encapsulation of TCP, TOU (Transports over UDP). On a very high level, the approaches are dramatically different.

... Continue reading ...

Ratas - A hierarchical timer wheel

Posted on 2016-07-27 in General

Last week I needed a timer wheel for a hobby project. That's a data structure that's been reimplemented over and over in the last three decades, but for various reasons I couldn't get excited by any of the freely available ones. Obviously this means that one more implementation was needed, hence Ratas - a hierarchical timer wheel. Unfortunately my vacation ran out before I could get back to the original project, but that's the nature of yak shaving.

In this post I'll first explain briefly what timer wheels are - you might want to read one of the references instead if you've got the time - and then go into more detail on why I wrote a new one.
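As a rough illustration of the basic idea, here is a minimal single-level timer wheel in Python. It is only a sketch and not the Ratas API: the class name `SimpleTimerWheel` and its methods are made up, and it only supports delays shorter than one full turn of the wheel.

```python
class SimpleTimerWheel:
    """Single-level timer wheel: one slot per tick, timers bucketed by slot."""

    def __init__(self, num_slots):
        self.slots = [[] for _ in range(num_slots)]
        self.now = 0  # current time, in ticks

    def schedule(self, delay, callback):
        """Run callback after `delay` ticks (1 <= delay < num_slots)."""
        if not 1 <= delay < len(self.slots):
            raise ValueError("delay must be between 1 and num_slots - 1 ticks")
        self.slots[(self.now + delay) % len(self.slots)].append(callback)

    def advance(self, ticks=1):
        """Move time forward, firing every timer in each slot that is reached."""
        for _ in range(ticks):
            self.now += 1
            index = self.now % len(self.slots)
            # Detach the slot first so callbacks can safely schedule new timers.
            due, self.slots[index] = self.slots[index], []
            for callback in due:
                callback()
```

Scheduling and firing are both constant-time per timer, since a timer's slot is computed directly from its expiry tick. A hierarchical wheel extends the range by stacking several such wheels at progressively coarser resolutions and moving timers down to a finer wheel as their expiry approaches.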

... Continue reading ...

The many ways of handling TCP RST packets

Posted on 2016-02-01 in Networking

What could be a simpler networking concept than TCP's RST packet? It just crudely closes down a connection, nothing subtle about it. Due to some odd RST behavior we saw at work, I went digging in the RFCs to check what the technically correct behavior is, and in different TCP implementations to see what's actually done in practice.

... Continue reading ...

json-to-multicsv - Convert hierarchical JSON to multiple CSV files

Posted on 2016-01-12 in General, Perl

Introduction

json-to-multicsv is a little program to convert a JSON file to one or more CSV files in a way that preserves the hierarchical structure of nested objects and lists. It's the kind of dime-a-dozen data munging tool that's too trivial to talk about, but I'll write a bit anyway for a couple of reasons.

The first one is that I spent an hour looking for an existing tool that did this and didn't find one. Lots of converters to other formats, all of which seem to assume the JSON is effectively going to be a list of records, but none that supported arbitrary nesting. Did I just somehow manage to miss all the good ones? Or is this truly something that nobody has ever needed to do?

Second, this is as good an excuse as any to start talking a bit about some patterns in how command line programs get told what to do (I'd use the word "configured", except that's not quite right).

... Continue reading ...

A rating system for asymmetric multiplayer games

Posted on 2015-11-18 in Games

Introduction

A couple of years ago I wrote a quick and dirty rating system for an online boardgame site I run. It wasn't particularly well thought out, but it did the job. Some discussion about the system made me revisit it, with two years of hindsight and orders of magnitude more data.

How well does the system actually work, and how predictive are the ratings? There are some obvious tweaks to the system; would implementing them make things better or worse? Would anything be gained from switching to a more principled (but more complicated) approach? For this last bit, I used Microsoft's TrueSkill as the benchmark. It has some desirable properties and appears to be the gold standard of team based rating systems right now.

The code and the data are available on GitHub in my rating-eval repository.

... Continue reading ...