Juho Snellman's Weblog

LLMs are cheap

Posted on 2025-06-02 in General

This post is making a point - generative AI is relatively cheap - that might seem so obvious it doesn't need making. I'm mostly writing it because I've repeatedly had the same discussion in the past six months where people claim the opposite. Not only is the misconception still around, but it's not even getting less frequent. This is mainly written to have a document I can point people at, the next time it repeats.

It seems to be a common, if not a majority, belief that Large Language Models (in the colloquial sense of "things that are like ChatGPT") are very expensive to operate. This then leads to a ton of innumerate analyses about how AI companies must be obviously doomed, as well as a myopic view on how consumer AI businesses can/will be monetized.

It's an understandable mistake, since inference was indeed very expensive at the start of the AI boom, and those costs were talked about a lot. But inference has gotten cheaper even faster than models have gotten better, and nobody has an intuition for something becoming 1000x cheaper in two years. It just doesn't happen. It doesn't help that the common pricing model ("$ per million tokens") is very hard to visualize.

So let's compare LLMs to web search. I'm choosing search as the comparison since it's in the same vicinity and since it's something everyone uses and nobody pays for, not because I'm suggesting that ungrounded generative AI is a good substitute for search.

Web Environment Integrity vs. Private Access Tokens - They're the same thing!

Posted on 2023-07-25 in General

I've seen a lot of discussions in the last week about the Web Environment Integrity proposal. Quite predictably from the moment it got called things like "DRM for the web", people have been arguing passionately against it on HN, Github issues, etc. The basic claims seem to be that it's going to turn the web into a walled garden, kill ad blockers, kill all small browsers, kill all small operating systems, kill accessibility tools like screen readers, etc.

The Web Environment Integrity proposal is basically:

A website can request an attestation from the browser
The browser forwards the attestation requests to an attester
The attester checks properties like hardware and software integrity
If they check out, the attester creates a token and signs it with its private key.
The attester hands off the signed token to the browser, which in turn sends it to the website.
The website checks that the token was signed by a trusted attester

Here's a funny thing I suspect few of those commenters know: A very similar mechanism already exists on the web, and is already deployed in production browsers (Safari), operating systems (iOS, OS X), and hosting infrastructure (Cloudflare, Fastly). That mechanism is Private Access Tokens / Privacy Pass.

Here's what PATs (as deployed by Apple, and on by default) do to the best of my understanding:

A website can request an attestation from the browser
The browser forwards the attestation requests to an attester
The attester checks properties like hardware and software integrity.
If they check out, the attester calls the website's trusted token issuer
The issuer checks whether to trust the attester and whether the information passed by the attester is sufficient, and then issues a token signed by its private key
The attester hands off the signed token to the browser, which passes it to the website.
The website checks that the token was signed by a trusted token issuer

This launching was hailed in the tech press as a win for privacy and security, not as an attempt to kill accessibility tools or build a walled garden. [1]

You might notice that the basic operating model of the two protocols is almost exactly the same. So is their intended use. From the "DRM for websites" perspective, I don't think there is a difference.

With both WEI and PATs, the website would be able to ask Apple to verify that the request is coming from a genuine non-jailbroken iPhone running Safari, and block the ones running Firefox on Linux. And in both, the intent is not for the API to be used for that kind of outright blocking.

Neither lists e.g. checking whether the browser is running an ad blocker extension as a use case. Both would have just the same technical capabilities for making that kind of thing happen, by just having the attester check for it, and I bet that in both cases the attester would be equally unmotivated in actually providing that kind of attestation.

It's also not that PATs would somehow make it easier for people to spin up new attesters for small or new platforms. Want to run your own attester for PATs? You could, but the issuers you care about will not trust it. [2]

Now, the technologies aren't quite identical, but the distinctions are subtle and would just matter for exactly the kind of anti-abuse work that both of the proposals were ostensibly meant for. The big one is the WEI proposal including the ability to content-bind the attestation to a specific operation. It's a feature anyone trying to use a feature like this for abuse prevention would think is needed, but that adds no power to the theorized "DRM for the web" use case. There is also a more obvious difference between the two, with whether the attester and issuer are the same entity or split. But that too is irrelevant in the discussion on how the technology could be misused. [3]

In principle there could also be differences in the exact things that the APIs allow attesting for. But neither standard defines the exact set of attestations, just the mechanisms.

Given the DRM narrative would have worked exactly the same for the two projects, why such a different reception? I can only think of two differences, both social rather than technical.

One is that the PAT (and related Privacy Pass) draft standards were written in the IETF and are dense standardese. There was no plaintext explainer. Effectively nobody outside of the internet standardization circles read those drafts, and if they had they wouldn't have known whether they needed to be outraged or not. The first time it actually broke through to the public was when Apple implemented it.

The other is the framing. PATs were sold to the public exclusively as a way of seeing fewer captchas. Who wouldn't want fewer captchas? WEI was pitched as a bunch of fairly abstract use cases and mostly from the perspective of the service provider, not for how it'd improve the user experience by reducing the need for invasive challenges and data collection.

This isn't the first time I've seen two attempts at a really similar project, with one getting lauded while the other gets trashed for something that's common to both. But it is the one where the two things are the most similar, and it feels like it should be instructive somehow.

If the takeaway is that standards proposals should be opaque and kept away from the public for as long as possible, before being launched straight to prod based on a draft spec, that'd be bad. If it's that standard proposals should be carefully written to highlight the benefit for the end user, even starting from the first draft, that's probably pretty good? And if it's that only Apple can launch any browser features without a massive backlash, it seems pretty damn bad.

[1] Just to be clear, the one significant HN discussion on PATs had similar arguments about it being DRM, so my claim is not that absolutely everyone loved PATs. But it didn't actually get traction as a hacker cause celebre, and as far as I can see the general media coverage was broadly positive.

[2] What's the process for getting Cloudflare or Fastly to trust a non-Apple attester anyway? I can't find any documentation.

[3] The split version seems kind of superior for deployment, since it means each site needs to only care about a single key (their chosen issuer). This makes e.g. the creation of a new attester a lot more tractable. You only need to convince half a dozen issuers to trust your new attester and ingest the keys, not try to sign up every single website in the world one by one.

A monorepo misconception - atomic cross-project commits

Posted on 2021-07-21 in General

In articles and discussions about monorepos, there's one frequently alleged key benefit: atomic commits across the whole tree let you make changes to both a library's implementation and the clients in a single commit. Many authors even go as far to claim that this is the only benefit of monorepos.

I like monorepos, but that particular claim makes no sense! It's not how you'd actually make backwards incompatible changes, such as interface refactorings, in a large monorepo. Instead the process would be highly incremental, and more like the following:

Push one commit to change the library, such that it supports both the old and new behavior with different interfaces.
Once you're sure the commit from stage 1 won't be reverted, push N commits to switch each of the N clients to use the new interface.
Once you're sure the commits from stage 2 won't be reverted, push one commit to remove the old implementation and interface from the library.

Writing a procedural puzzle generator

Posted on 2019-05-14 in Games

This blog post describes the level generator for my puzzle game Linjat. The post is standalone, but might be a bit easier to digest if you play through a few levels. The source code is available; anything discussed below is in src/main.cc.

A rough outline of this post:

Linjat is a logic game of covering all the numbers and dots on a grid with lines.
The puzzles are procedurally generated by a combination of a solver, a generator, and an optimizer.
The solver tries to solve puzzles the way a human would, and assign a score for how interesting a given puzzle is.
The puzzle generator is designed such that it's easy to change one part of the puzzle (the numbers) and have other parts of the puzzle (the dots) get re-organized such that the puzzle remains solvable.
A puzzle optimizer repeatedly solves levels and generates new variations from the most interesting ones that have been found so far.

Optimizing a breadth-first search

Posted on 2018-07-23 in Games

A couple of months ago I finally had to admit I wasn't smart enough to solve a few of the levels in Snakebird, a puzzle game. The only way to salvage some pride was to write a solver, and pretend that writing a program to do the solving is basically as good as having solved the problem myself. The C++ code for the resulting program is on Github. Most of what's discussed in the post is implemented in search.h and compress.h. This post deals mainly with optimizing a breadth-first search that's estimated to use 50-100GB of memory to run on a memory budget of 4GB.

There will be a follow up post that deals with the specifics of the game. For this post, all you need to know is that that I could not see any good alternatives to the brute force approach, since none of the usual tricks worked. There are a lot of states since there are multiple movable or pushable objects, and the shape of some of them matters and changes during the game. There were no viable conservative heuristics for algorithms like A* to narrow down the search space. The search graph was directed and implicit, so searching both forward and backward simultaneously was not possible. And a single move could cause the state to change in a lot of unrelated ways, so nothing like Zobrist hashing was going to be viable.

A back of the envelope calculation suggested that the biggest puzzle was going to have on the order of 10 billion states after eliminating all symmetries. Even after packing the state representation as tightly as possible, the state size was on the order of 8-10 bytes depending on the puzzle. 100GB of memory would be trivial at work, but this was my home machine with 16GB of RAM. And since Chrome needs 12GB of that, my actual memory budget was more like 4GB. Anything in excess of that would have to go to disk (the spinning rust kind).

Numbers and tagged pointers in early Lisp implementations

Posted on 2017-09-04 in Lisp, History

There was a bit of discussion on HN about data representations in dynamic languages, and specifically having values that are either pointers or immediate data, with the two cases being distinguished by use of tag bits in the pointer value:

If there's one takeway/point of interest that I'd recommend looking at, it's the novel way that Ruby shares a pointer value between actual pointers to memory and special "immediate" values that simply occupy the pointer value itself [1].
This is usual in Lisp (compilers/implementations) and i wouldn't be surprised if it was invented on the seventies once large (i.e. 36-bit long) registers were available.

I was going to nitpick a bit with the following:

The core claim here is correct; embedding small immediates inside pointers is not a novel technique. It's a good guess that it was first used in Lisp systems. But it can't be the case that its invention is tied into large word sizes, those were in wide use well before Lisp existed. (The early Lisps mostly ran on 36 bit computers.)

It seems more likely that this was tied into the general migration from word-addressing to byte-addressing. Due to alignment constraints, byte-addressed pointers to word-sized objects will always have unused bits around. It's harder to arrange for that with a word-addressed system.

But the latter part of that was speculation, maybe I should try to check the facts first before being tediously pedantic? Good call, since that speculation was wrong. Let's take a tour through some early Lisp implementations, and look at how they represented data in general, and numbers in particular.

Why PS4 downloads are so slow

Posted on 2017-08-19 in Networking, Games

Game downloads on PS4 have a reputation of being very slow, with many people reporting downloads being an order of magnitude faster on Steam or Xbox. This had long been on my list of things to look into, but at a pretty low priority. After all, the PS4 operating system is based on a reasonably modern FreeBSD (9.0), so there should not be any crippling issues in the TCP stack. The implication is that the problem is something boring, like an inadequately dimensioned CDN.

But then I heard that people were successfully using local HTTP proxies as a workaround. It should be pretty rare for that to actually help with download speeds, which made this sound like a much more interesting problem.

The mystery of the hanging S3 downloads

Posted on 2017-07-20 in Networking

A coworker was experiencing a strange problem with their Internet connection at home. Large downloads from most sites worked fine. The exception was that downloads from a Amazon S3 would get up to a good speed (500Mbps), stall completely for a few seconds, restart for a while, stall again, and eventually hang completely. The problem seemed to be specific to S3, downloads from generic AWS VMs were ok.

What could be going on? It shouldn't be a problem with the ISP, or anything south of that: after all, connections to other sites were working. It should not be a problem between the ISP and Amazon, or there would have been problems with AWS too. But it also seems very unlikely that S3 would have a trivially reproducible problem causing large downloads to hang. It's not like this is some minor use case of the service.

If it had been a problem with e.g. viewing Netflix, one might suspect some kind of targeted traffic shaping. But an ISP throttling or forcibly closing connections to S3 but not to AWS in general? That's just silly talk.

The normal troubleshooting tips like reducing the MTU didn't help either. This sounded like a fascinating networking whodunit, so I couldn't resist butting in after hearing about it through the grapevine.

I don't want no 'wantarray'

Posted on 2017-07-18 in Perl

A while back, I got a bug report for json-to-multicsv. The user was getting the following error for any input file, including the one used as an example in the documentation:

    , or } expected while parsing object/hash, at character offset 2 (before "n")

The full facts of the matter were:

The JSON parser was failing on the third character of the file.
That was also the end of the first line in the file. (I.e. the first line of the JSON file contained just the opening bracket).
The user was running it on Windows.
The same input file worked fine for me on Linux.

The origins of XXX as FIXME

Posted on 2017-04-17 in History

The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far to the hacky end of the spectrum, and perhaps even outright broken).

It's a bit of an odd and non-obvious string though, unlike FIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:

MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
Early Unix (probably the most influential codebase that's ever existed)
Some kind of DEC thing (because really, all the world was a PDP)