Great cat-and-mouse story of MS Messenger's interoperability with AIM. I don't know if the punchline is how AOL finally made a change MS could not emulate (by exploiting a buffer overflow in AIM to do remote code execution to craft a response packet), or how MS bungled the PR response to that.
Another classic description of early-2000s organizational dysfunction at MS.
Random access vs. radix sort.
The Cardinals vs. the Ordinals, on where the boundaries between decades are. What does it say that despite reading so much nitpicking about this around 2000, I never knew that the ISO 8601 definition of decade is the one that's ostensibly wrong?
Avoid wasting time on fetching the secondary bucket by maintaining a bloom filter of keys that required falling back to the secondary.
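To make the idea concrete, here is a minimal sketch under my own assumptions: a table with fixed-size primary buckets, an unbounded secondary area, and a small bloom filter recording which keys overflowed. The names (`BloomFilter`, `TwoChoiceTable`, `secondary_fetches`) and the SHA-256 based hashing are hypothetical illustration choices, not taken from the linked work.

```python
import hashlib


def _h(salt, key, modulus):
    """Deterministic hash of (salt, key) reduced to [0, modulus)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def add(self, key):
        for i in range(self.num_hashes):
            self.bits |= 1 << _h(i, key, self.num_bits)

    def might_contain(self, key):
        return all((self.bits >> _h(i, key, self.num_bits)) & 1
                   for i in range(self.num_hashes))


class TwoChoiceTable:
    """Assumes distinct keys; no deletion or update support."""

    def __init__(self, num_buckets=8, bucket_size=2):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.primary = [[] for _ in range(num_buckets)]
        self.secondary = [[] for _ in range(num_buckets)]
        self.overflowed = BloomFilter()
        self.secondary_fetches = 0  # how often a lookup touched the secondary

    def insert(self, key, value):
        bucket = self.primary[_h("primary", key, self.num_buckets)]
        if len(bucket) < self.bucket_size:
            bucket.append((key, value))
            return
        # Primary bucket is full: fall back and remember that we did.
        self.secondary[_h("secondary", key, self.num_buckets)].append((key, value))
        self.overflowed.add(key)

    def get(self, key):
        for k, v in self.primary[_h("primary", key, self.num_buckets)]:
            if k == key:
                return v
        if not self.overflowed.might_contain(key):
            return None  # definitely never fell back; skip the second fetch
        self.secondary_fetches += 1
        for k, v in self.secondary[_h("secondary", key, self.num_buckets)]:
            if k == key:
                return v
        return None
```

A lookup for a missing key only pays for the secondary fetch on a bloom filter false positive, so the filter size is the knob that trades memory against wasted fetches.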
A linear probing hash table for batch purposes that can basically get 100% occupancy by having a dense hash-key sorted array as the primary, and a bit-array with popcount tricks to find the index into the dense array. It's not at all obvious to me why this works; it seems like for any reasonable hash code length the bitmap has to be wasting tremendous amounts of memory. Especially considering their bitmap encoding seems wasteful (two bits per possible hashcode; seems like you could get it very close to one bit without any more memory accesses).
But the benchmarks claim it works, so...
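For my own understanding, a rough sketch of the bitmap-plus-popcount idea: linear probing happens in a bitmap of virtual slots, and the number of set bits below a slot gives the index into the dense array. This is a simplification under my assumptions (distinct keys, one big integer as the bitmap, popcount recomputed on every lookup); a real implementation would presumably store precomputed prefix counts per bitmap word. `PopcountTable` is a made-up name.

```python
class PopcountTable:
    """Batch-built table: a bitmap of virtual slots plus a dense array."""

    def __init__(self, items, num_slots):
        items = list(items)
        # At least one slot must stay empty so that probing terminates.
        assert num_slots > len(items)
        self.num_slots = num_slots
        self.bitmap = 0
        placed = {}
        for key, value in items:
            slot = hash(key) % num_slots
            while (self.bitmap >> slot) & 1:  # linear probing in the bitmap
                slot = (slot + 1) % num_slots
            self.bitmap |= 1 << slot
            placed[slot] = (key, value)
        # The dense array holds entries ordered by virtual slot, with no gaps.
        self.dense = [placed[slot] for slot in sorted(placed)]

    def _rank(self, slot):
        """Number of occupied virtual slots strictly below `slot`."""
        below = self.bitmap & ((1 << slot) - 1)
        return bin(below).count("1")

    def get(self, key):
        slot = hash(key) % self.num_slots
        while (self.bitmap >> slot) & 1:
            k, v = self.dense[self._rank(slot)]
            if k == key:
                return v
            slot = (slot + 1) % self.num_slots
        return None
```

The dense array is always 100% full; all the slack lives in the bitmap, which costs one bit per virtual slot in this sketch.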
Schema-less flatbuffers. The HN thread turned into a delightful pissing match between protobuf implementors.
Converting a (part of a) row-order query engine to batched column-order.
This seems to be the patient zero for database engines optimizing for branch mispredicts by batching operations on homogeneous data?
Protobuf DOM that works entirely in-place on the original input string.
A level generator for 2-D platformers built on a rhythm-based model of player behavior, derived from an analysis of existing platformer games.
Analyzing the Super Mario Bros. levels using a framework of repeating patterns.
Linkbait title, but this was a good data-oriented design walk-through for a non-gaming use case. (I'm still trying to understand what counts as DoD and what doesn't, since it sometimes feels like 90% of it is common sense.)
Slides for the above.
It's a great feature, and I would never have guessed how it works.
SIMD-ified parsing of the Google-style MSB-bit-1 varints. Don't think it'll work for one little project I've been doodling around with, due to being optimized for decoding multiple small varints with one call.
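For reference, this is the plain scalar version of that varint format as I understand it: seven payload bits per byte, least significant group first, with the top bit set on every byte except the last. It's only the baseline that a SIMD decoder would be competing against, not the SIMD technique itself; the function names are mine.

```python
def encode_varint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set the MSB
        else:
            out.append(byte)         # final byte: MSB clear
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one varint starting at buf[pos]; return (value, next_pos).

    Raises IndexError if the buffer ends in the middle of a varint.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

The data-dependent loop exit on the MSB is exactly the kind of branch that decoding several small varints in one call tries to get rid of.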
An algorithm for computing a difficulty rating for a nonogram.
A comparison of a bunch of nonogram solvers and the implementation strategies.
The strategies a human would use for solving nonograms.
A good network debugging story. Why are a couple of content providers / CDNs responding to some customers of an operator with an RST, while other customers of the same operator had no problems?
Archaeology into an embarrassing-looking security issue in OpenBSD (x86-only). How a series of superficially safe-looking transformations to a struct definition first created a tiny crack, and then later widened the hole.
A tagged pointer setup for detecting buffer overflows with no branches or extra memory accesses.
Actually just the opposite of the title. A lovely story on why sometimes you actually need to write software to build a product, not just snap together some lego blocks.
Mmap is so great! Just do a single system call, and all the IO is hidden behind the scenes. Oh, wait. Turns out that sometimes that transparency isn't what you want at all. (Specifically, sometimes you want non-blocking IO. Or as it happens, I found this post while looking for stuff on the MAP_POPULATE | MAP_NONBLOCK combination that Linux once supported.)
Robin Hood hash tables with linear probing are just sorted arrays. What a great way to think about it.
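A small sketch of why that holds, under a few assumptions of mine: the home slot is a monotonic function of the hash (the top bits), ties are broken by hash order, and the table has a spill area at the end instead of wrapping around. `RobinHoodSet` and the CRC32 hash are illustration choices, not anything from the linked post.

```python
import zlib


class RobinHoodSet:
    def __init__(self, capacity=16, spill=16):
        self.capacity = capacity
        # Extra slots past the end, so probing never has to wrap around.
        self.slots = [None] * (capacity + spill)

    @staticmethod
    def _hash(key):
        return zlib.crc32(key.encode())  # unsigned 32-bit value

    def _home(self, h):
        # Monotonic in h: a larger hash never gets an earlier home slot.
        return (h * self.capacity) >> 32

    def add(self, key):
        cur = (self._hash(key), key)
        pos = self._home(cur[0])
        while True:
            if pos >= len(self.slots):
                raise OverflowError("spill area exhausted")
            occupant = self.slots[pos]
            if occupant is None:
                self.slots[pos] = cur
                return
            if occupant == cur:
                return  # already present
            if occupant > cur:
                # The occupant is no further from its home than we are:
                # take its slot and carry it forward instead.
                self.slots[pos], cur = cur, occupant
            pos += 1

    def __contains__(self, key):
        target = (self._hash(key), key)
        pos = self._home(target[0])
        while pos < len(self.slots):
            occupant = self.slots[pos]
            if occupant is None or occupant > target:
                return False  # sorted order lets us stop early
            if occupant == target:
                return True
            pos += 1
        return False
```

Reading the non-empty slots from left to right yields the entries in sorted hash order, so an insert is really an insertion into a sorted array that happens to have gaps.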
> This makes it surprisingly hard to answer a very simple question: what is the fastest join algorithm in 2015? In this paper we will try to develop an answer. We start with an end-to-end black box comparison of the most important methods. Afterwards, we inspect the internals of these algorithms in a white box comparison. We derive improved variants of state-of-the-art join algorithms by applying optimizations like software write-combine buffers, various hash table implementations, as well as NUMA-awareness in terms of data placement and scheduling.
> Zobrist hashing starts by randomly generating bitstrings for each possible element of a board game, i.e. for each combination of a piece and a position (in the game of chess, that's 12 pieces × 64 board positions, or 14 x 64 if a king that may still castle and a pawn that may capture en passant are treated separately). Now any board configuration can be broken up into independent piece/position components, which are mapped to the random bitstrings generated earlier. The final Zobrist hash is computed by combining those bitstrings using bitwise XOR.
And obviously the punchline is that, given a board state and its hash, recomputing the hash of a successor board state is simply XORing the original hash with the bitstrings of the moved piece: once for its position before the move, once for its position after.
Using genetic programming to choose when to use which heuristic for directing an iterative-deepening A* search. Though I have to say that the game trees this paper talks about seem ludicrously small for 2009 (1.5M states).
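Going back to the Zobrist hashing item for a moment, the incremental update is short enough to sketch. Assumptions of mine: 12 piece types, 64 squares, a board represented as a dict from square to piece index, a fixed RNG seed, and only non-capturing moves.

```python
import random

PIECES = 12
SQUARES = 64

# One random 64-bit string per (piece, square) pair, generated once up front.
_rng = random.Random(2015)
ZOBRIST = [[_rng.getrandbits(64) for _ in range(SQUARES)]
           for _ in range(PIECES)]


def full_hash(board):
    """Hash a whole board given as {square: piece_index}."""
    h = 0
    for square, piece in board.items():
        h ^= ZOBRIST[piece][square]
    return h


def hash_after_move(h, piece, src, dst):
    """Update a hash for a non-capturing move of `piece` from src to dst."""
    # XOR the piece out of its old square and into its new one.
    return h ^ ZOBRIST[piece][src] ^ ZOBRIST[piece][dst]
```

A capture would need one more XOR to remove the captured piece from the destination square. Since XOR is its own inverse, the same function also undoes a move when called with `src` and `dst` swapped.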
Running multiple unrelated algorithms for Sokoban solving in parallel, on the assumption that the different algorithms have different blind spots. You get better average and worst cases by giving each of x algorithms 1/x of the CPU than by running just one with 100%. (Assuming the selection is diverse enough, of course.)
Also had some ideas about having the different algorithms exchange information, but that seems very complicated and didn't seem to pan out.
> However, there isn't a Morrowind speedrun category where someone tries to become the head of all factions. For all its critical acclaim and its great story, most of quests in Morrowind are basically fetch-item or kill-this-person and there aren't many quests that require anything else. But planning such a speedrun route could still be extremely interesting for many reasons.
Some really neat stuff about treating speedrunning as a search/optimization problem. I was a little bit annoyed by the parts where the story strays from that, and the author instead uses human intuition to e.g. select which set of quests to do or which skills to train. Also, part 2.
Two things you don't see often in CS. Trying to replicate a systems design paper, and publishing a negative result. And also showing just how many crucial details can get left out in a systems description that makes it impossible to actually implement. And when you do implement it and don't get the hoped for performance, what then? Obviously more and more optimizations that the original system probably didn't have.
It's kind of interesting to read the original paper's HN comments after this.
> Assuming typical game theory for the jerks, here’s what the thinking would have been: I was a jerk too, and my real goal here was not to actually solve a problem, but was to leverage SIMD either to usurp the people who led parallel programming models in the compiler group or to advance some other nefarious agenda.
A personal retrospective on the development of ispc (a compiler for a shader-programming style C dialect for x86-64). What a great story of big-company intrigue and dysfunction. I'm reloading this site daily to check for new installments.
Proxying traffic through home Wifi routers that expose UPnP to the internet. (I'd heard of malicious proxying through home routers, but I'd thought they were compromised devices rather than just misconfigured ones).
The sophistication of modern reverse engineering tools is pretty amazing.
> When a typical Silicon Valley company decides to "sell off their assets" that generally means office chairs, white boards, and the occasional espresso machine. Not test equipment, test fixtures, extra parts, and tools.
Using the closure of what was apparently a famous electronics scrap store to reflect on how Silicon Valley changed in the last couple of decades.
Expose SACKs directly to the encoder. Always use the latest fully ACKed frame as the keyframe. (Plus other things, but that felt like the interesting insight to me.)
A worthy new entry in the popular "why Go sucks" genre.
> I definitely found the answer to my question about why so few graphical Kaypro programs exist. The Kaypro’s graphics are awful – it’s a text-mode machine with graphics bolted on as a box-checking exercise. That being said, the development experience was surprisingly nice and it was a lot of fun to go through the exercise of actually making a functional game for a machine slightly older than me.
On the performance and power efficiency of Xeons vs. Qualcomm's server chips on SIMD workloads. My basic assumption on CF's tech blog posts is that they're 90% PR. But this does have hard numbers, and they're pretty surprising ones (specifically the power usage / unit of work numbers, though I wish they had raw power usage as well).
> This post will focus on types of tech debt I’ve seen during my time working at Riot, and a model for discussing it that we’re starting to use internally. If you only take away one lesson from this article, I hope you remember the “contagion” metric discussed below.
John Cowan linked to this in a comment on my post on tagged pointers. It's a very comprehensive look at datatype implementations in (mostly) Lisp implementations.
(Is this by the same Stan Shebs who wrote XConq back in the day?)
There are a bunch of different reasons why packets might get corrupted in flight. This research identifies signals for distinguishing between those cases (+ congestion-induced packet loss) and recommends specific maintenance tasks to fix the problems.
A design for a key-value store for update-heavy applications.
A critique of the GDPR as a piece of legislation from somebody who a) appears to be a privacy activist, b) works as a GDPR DPO.
Another story of debugging and mitigating a problem in a closed source program.
> This is a long-ish entry posted after multiple discussions were had on the nature of having or not having bounded mailbox in Erlang.
Multi-use software can be used in bad ways. If you sell such software to authoritarian governments (or government-controlled companies), it'd be good to have controls on exactly what they can do. Obviously that doesn't work if the system is arbitrarily scriptable, but few systems are.
But what really offends me about this article is just what garbage the Procera traffic rewriting implementation clearly was.
Just what it says in the title. Stories about genetic algorithms etc. generating unexpected results.
Reading through this, I kept thinking that I'd pretty recently read about someone else using TCP congestion control for RPC queue management. And indeed I had, it was this post by Evan Jones. First time this linkblog actually did what I intended it for! ;-)
The inner workings of an Android OAuth-token stealing botnet. [part 2] [part 3]
Not actually the death of the sampling theorem. But an absolutely brutal takedown of some dodgy signal processing research. The punchline:
> As so often, one does have to ask: How did these dramatic claims get through peer review? Given the obvious conflict with the Sampling Theorem, weren’t some eyebrows raised in the process? Who reviewed these submissions anyway? Well, I did. For a different journal, where the manuscript ultimately got rejected.
A tool to transform JSON to a line-based format, where each line is prefixed with a path. And a tool to transform from that format back to JSON. Such a clever idea.
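The core of the trick fits in a few lines. This sketch uses a line format I made up for illustration (the path as a JSON array, a tab, then the leaf value as JSON), not the linked tool's actual syntax, and the reverse direction assumes that array elements arrive in index order.

```python
import json


def flatten(value, path=()):
    """Yield one 'path<TAB>leaf' line per leaf (or empty container)."""
    if isinstance(value, (dict, list)) and value:
        items = value.items() if isinstance(value, dict) else enumerate(value)
        for key, child in items:
            yield from flatten(child, path + (key,))
    else:
        yield json.dumps(list(path)) + "\t" + json.dumps(value)


def unflatten(lines):
    """Rebuild the document; integer path components mean list indexes."""
    root = None
    for line in lines:
        path_text, value_text = line.split("\t", 1)
        path, value = json.loads(path_text), json.loads(value_text)
        if not path:
            return value  # the whole document was a single leaf
        if root is None:
            root = [] if isinstance(path[0], int) else {}
        node = root
        for key, next_key in zip(path, path[1:]):
            empty = [] if isinstance(next_key, int) else {}
            if isinstance(node, list):
                if key == len(node):
                    node.append(empty)
            else:
                node.setdefault(key, empty)
            node = node[key]
        if isinstance(node, list):
            node.append(value)
        else:
            node[path[-1]] = value
    return root
```

Every line carries its full path, so line-oriented tools like grep and diff work on the output, and any filtered subset of lines that keeps list indexes contiguous can be turned back into JSON.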
Procedural map generation using (cleverly designed) Wang tiles.
The DNS protocol design is becoming increasingly detached from the practice, leading to increasingly complex and bug-prone features.
The new version resolution algorithm for Dart's package manager, with special emphasis on error messages. The contrast between this and the recent work on Go package versioning is pretty interesting.