What's wrong with pcap filters?

Posted on 2015-05-18 in Networking

Introduction

I recently watched a video of a great talk on the early days of pcap by Steve McCanne. The bit on how the filtering language was designed - around the 26 minute mark but you might want to start at 20 minutes if you're unfamiliar with BPF - was one of the best stories about creating a new "little language" I've heard.

But that got me thinking a bit. This language is a tool that I use daily, that I'm generally happy with, but that also drives me absolutely crazy sometimes. This post is an attempt to look at some classes of problems that the pcap filtering language fails on, why those deficiencies exist, and why I continue using it even despite the flaws.

Just to be clear, libpcap is an amazing piece of software. It was originally written for one purpose, and it really is my fault that I end up too often using it for a different one. There's three very different use cases that I have for a packet filtering language (others may have more).

  • Small and simple filters to pick out a specific slice of traffic (single protocol, single flow, or single host). I believe it's fair to say that this is what the language was originally designed for.
  • Potentially complex filters for classifying traffic with real-time constraints and with no state, usually when using the filters for configuration rather than as an exploratory tool. This is where pcap is clumsy even when it generally works.
  • Offline analysis at higher protocol layers that'd benefit also benefit from tracking the high level protocol state between packets. You can sometimes coerce pcap to work for this use case, but it's super-awkward. It's also worth noting that features that are beneficial for this use case would not be welcome in the others. (Being able to run an arbitrary PCRE regexp on the packet payload? Great when doing offline analysis, unacceptable for real-time classification).

I try to do the third case with tools better suited for that, and only have a couple of complaints (e.g. VLAN support) on on first case. Mostly the pain comes from the middle case. So as we start the tour of annoyances, keep in mind that I'll often complain about a tool not doing a job it wasn't meant for.

... Continue reading ...

"It's like an OkCupid for voting" - the Finnish election engines

Posted on 2015-05-11 in General

Have I ever told you about the time I built an "OkCupid for elections" for the communists?

No? That's strange, I tend to get good mileage out of that story during election season. Unfortunately for the story to make any sense, you'll need a bit of absolutely fascinating background information on how elections work in Finland, and especially how websites that tell people whom to vote for became an integral part of it.

... Continue reading ...

A Monte Carlo simulation of Red7

Posted on 2015-03-30 in Games, Lisp

Red7 is a very clever little card game, and one of my favorite 2014 releases. But I have wondered about the density of meaningful decisions in the game. Sometimes it doesn't feel like you have all that much agency, and are just hanging on in the game with a single valid move every time it's your turn.

So here's some automated exploration of what a game of Red7 actually looks like from a statistical point of view. The method used here is a pure Monte Carlo simulation, with the players choosing randomly from the set of their valid moves.

Why a Monte Carlo simulation? I started trying to do a full game tree for a given starting setup but to my surprise the game tree is actually too large for that to be feasible; 2 weeks of computation even for a single two player game and a lot of optimization. The branching factor is just much bigger than it feels like when playing the game.

... Continue reading ...

Can't even throw code across the wall - on open sourcing existing code

Posted on 2015-03-19 in General

Starting a new project as open source feels like the simplest thing in the world. You just take the minimally working thing you wrote, slap on a license file, and push the repo to Github. The difficult bit is creating and maintaining a community that ensures long term continuity of the project, especially as some contributors leave and new ones enter. But getting the code out in a way that could be useful to others is easy.

Things are different for existing codebases, in ways that's hard to appreciate if you haven't tried doing it. Code releases that are made with no attempt to create a community around it and which aren't kept constantly in sync with the proprietary version are derided as "throwing the code across the wall". But getting even to that point can be non-trivial.

... Continue reading ...

Podcast on mobile TCP optimization

Posted on 2015-03-14 in Networking

I was recently a guest on Ivan Pepelnjak's (ipspace.net) Software Gone Wild podcast, talking about TCP acceleration in mobile networks, as well as whining in general about how much radio networks suck ;-) Thanks a lot to Ivan for the opportunity, it was fun!

You can listen to the podcast episode here.

Command languages as game user interfaces

Posted on 2014-12-08 in Games, Perl

In the previous post in this series, I promised to discuss in detail some of the positive and negative consequences of the less conventional design choices of my online Terra Mystica implementation. If you have no idea of what that is, reading at least the intro of that post might be a good idea. This post will just deal with one design choice, but it's the elephant in the room: the command language.

The canonical internal representation of a game in my TM implementation is as a sequence of rows, each describing a some number of player actions specified in an ad hoc mini language, or administrative commands that change the game setup in some way (for example setting game options, or dropping a player from the game partway through). This is what it might look like:

yetis: action ACT4
cultists: upgrade E6 to TE
cultists: +FAV6
giants: Leech 3 from cultists
giants: pass BON4
yetis: Leech 2 from cultists
cultists: +WATER
dragonlords: Decline 2 from cultists
dragonlords: dig 1. build G6
yetis: send p to EARTH
cultists: action FAV6. +AIR
dragonlords: pass BON7
yetis: upgrade E7 to TE. +FAV11
giants: Leech 3 from yetis
dragonlords: Leech 2 from yetis
cultists: Leech 2 from yetis

That's a short excerpt from the middle of a random game. A full game generally runs for about 400 rows.

... Continue reading ...

How buying a SSL certificate broke my entire email setup

Posted on 2014-12-05 in General, Networking

Everything looks ok to me, the error must be on your end!

Earlier this week I got a couple of emails from different people, both telling me that they'd just created a new user account, but hadn't yet received the validation email. In both cases my mail server logs showed that the destination SMTP server had rejected the incoming message due to a DNS error while trying to resolve the hostname part of the envelope sender. Since I knew very well that my DNS setup was working at the time, I was already ready to reply with something along the lines of "It's just some kind of a transient error somewhere, just wait a while and it'll sort itself out".

But I decided to check the outgoing mail queue just in case. It contained 2700 messages, going around 5 days back, most with error messages that looked at least a little bit DNS-related.

Oops.

... Continue reading ...

A brief history of Online Terra Mystica

Posted on 2014-11-27 in Games, Perl

What's this Online Terra Mystica thing?

For the last couple of years my main hobby hacking project (over a thousand commits, and probably an order of magnitude more time spent on it than all other non-work projects combined) has been an asynchronous multiplayer web implementation of the brilliant board game Terra Mystica (Feuerland Spiele, 2012). At the moment it's roughly 2/3 Perl, 1/3 Javascript, and uses Postgres as the data storage.

It's been a fairly successful project for something that was originally intended as a one-off. The usage statistics at the end of November 2014 are:

  • Almost 6000 registered users
  • About 1200 monthly active users (as in playing at least one game; not passive use like looking at the statistics pages).
  • 14000 moves executed on a normal weekday (10000 on weekends)
  • 16500 games either ongoing or finished.
  • Bi-monthly online TM tournament run by Daniel Åkerlund with 400+ players.
  • 1038 commits as of this writing.

This was not supposed to be a general use program. It was originally a one night hack to help keep track of a hand-moderated play-by-forum game of TM, which was obviously headed for failure due to the massive amount of errors people were making while describing their moves in natural language or when manually tracking their resources in the game.

... Continue reading ...

HTTPS is taking over in mobile (over 35% of traffic in some networks)

Posted on 2014-11-20 in Networking

Summary

This is a post in two parts. The first part discusses the current state and historical trends of HTTPS adoption in traffic carried over mobile networks. The second part explains why this change might well be beneficial to mobile operators, for whom the transition from unencrypted to encrypted traffic appears potentially harmful.

What's the traffic share of HTTPS?

Based on measurements from a number of mobile operators around the world, a typical 3G or LTE network currently carries around 20-25% HTTPS/SSL traffic. In some networks we've seen SSL traffic shares of over 35%. The transition has been very rapid, and given the the technological and social trends, it seems likely to continue. (For example just this week's announcment of Let's Encrypt, which removes any remaining barriers to entry, could matter a lot for the long tail of websites.)

... Continue reading ...

TCP is harder than it looks

Posted on 2014-11-11 in Networking

In theory it's quite easy to write software that deals with TCP at the packet level, whether it's a full TCP stack or just a packet handling application that needs to be TCP-aware. The main RFCs aren't horribly long or complicated, and specify things in great detail. And while new extensions appear regularly, the option negotiation during the initial handshake ensures that you can pick and choose which extensions to support. (Of course some of the extensions are in effect mandatory -- a TCP stack without support for selective acknowledgements or window scaling will be pretty miserable.)

If all you want to do is to e.g. load the frontpage of Google, writing a TCP stack can be a one afternoon hack. And even with a larger scope, if all you need to do is to connect to arbitrary machines running vanilla Linux, BSD, or Windows on the same internal network, you can fairly quickly validate both interoperability and performance.

... Continue reading ...