Next Token Prediction is a Misleading Term

I’m fed up of hearing about how LLMs are next token predictors, and therefore they <cannot do some task> <aren’t really doing cognition> <are just guessing>.

There are lots of philosophical objections, but fundamentally, framing AI as next token predictors in the first place is just misleading and inaccurate. Here’s why LLMs aren’t naive next token predictors.

Continue reading

The Lightyear Race

Earth, proudly twinned with Rigel, went the joke.

When intelligent life on Rigel was first discovered, it was a sensation. The newly christened Really Improbably Large Interferometer had just picked up on the barest traces of radio signals from the planet. In mystery, there was speculation, possibility, hope.

As investigation progressed, every bit of information was a revelation. Rigel has an oxygen atmosphere! Rigelians must have discovered radio pretty recently! Rigel must get fantastic meteor showers from a billion-year-old lunar collision!

But excitement dimmed to the brutal reality of physics. At 1137 light years away, Rigel was to be looked at, not touched. We scanned, listened, and analysed every scrap of information. We broadcast, lasered and launched every signal and probe, of welcome and jubilation. But that was the end of it; there was nothing left to do but wait. We had to make do with a millennium-old image of our neighbors, who couldn’t yet have received any indication of Earth’s existence.

We quickly realized we were still alone in the universe. For what does “alone” mean, if not “no one to talk to”?


We felt proud of our neighbors, as we watched them develop. At this distance, we could tell little of their biology or culture. But we could see telltale signs of technology. With the unobstructed view from our orbital factories, we saw their atmosphere clear as they transitioned from fossil fuels and nuclear power to clean fusion. By the time our terraforming of Mars was complete, we saw their first space flight.

The Rigelians must have been a cautious or incurious species. They had a head start on us in time, but their technology progressed slowly. By the time we’d have a chance to meet, it would be on our terms.

The Union of Man was forty planets strong when we saw the first signs of their own interstellar expansion. Both our space empires were burgeoning in all directions, as raw resources fed into automated factories that fed into the next line of probes and terraformers. And that’s when things started to go amiss.

Fusion drives enabled low persistent acceleration for years with a well-provisioned ship. Longer trip times enabled higher top speeds, and technology developments from the heartland could be beamed out to ships to enable them to improve on the fly. Over larger time scales, expansion became bottlenecked by the speed of light.

We found their progress surprising. As they got closer, we could resolve individual ships from the distinctive plumes they emit. But their ships were always a bit faster than you’d expect. Then, a lot faster. They managed to cross 300 light years towards us, nearly a third of the distance, in 250 years!

It wasn’t some undiscovered faster-than-light technology that enabled this. Their ships were still stuck at tiny accelerations that built up over years. It took their fastest probes 550 years to cross that 300 ly distance, travelling for much of it above 60% of the speed of light. But as the destination was 300 light-years closer, we had 300 years less delay receiving the signals at the end of the journey than at the start. That difference in delay appeared to us as the journey happening more than twice as fast on our telescope’s receivers as it happened in real time, a distortion even larger than the relativistic effects the travellers were experiencing.

In other words, we thought they were 1000 light-years away, far from any possible interaction with us. And indeed they were… one thousand years ago. The closer a ship gets to the speed of light, travelling directly towards an observer, the smaller the delay between the light that heralds its arrival and the arrival itself. With enough speed, you can chase the light bubble and arrive practically at the same time.

In fact, the only parts of a journey that give any advance warning at all are the start and end, when you are travelling slowly enough. A thousand light years or a million would give roughly the same advance notice. By the time we saw them travel 300 ly of the 1000, we had only 80 years left until their arrival at the final destination.


At least, it would have been, if they had planned to come to a stop on arrival. The Rigelians had other plans. Once they became aware of our existence, they altered flight paths, to travel half the journey, then go dark. For 100 years, probes had been cruising invisibly near the speed of light, targeting Earth and every plausible nearby planet in the light cone.

Without active emissions beyond subtle nudges, these objects were impossible to detect until they started interacting with interstellar dust, emitting hard X-rays that kept barely ahead of the missiles themselves. The notice time was measured in minutes; then they struck the planet’s surface with the accumulated kinetic energy of the constant acceleration. No message passed from planet to planet could outspeed the missiles themselves.

At a stroke, the Union of Man found every planetary outpost razed to the ground. The Rigelians themselves have yet to arrive, but our remnants stand no chance against such advance planning.

We broadcast this last message as a lesson and a warning. But a futile one: by the time you receive it, it’s already too late.

Good Software Doesn’t Double Check

I’ve been wrestling with the balance between vibe coding and high-touch manual coding these days. At the level of quality I’m trying to output, you cannot delegate everything to agents. But you want to delegate as much as possible.

To do this effectively means developing a new set of coding smells. The original coding smells were quick tells of typical coding problems. They indicated muddled thinking, structural problems, or technical debt. But the sort of errors agents make are different and need a new set of nasal receptors. That is, for the next few months until a new crop of tools makes current wisdom obsolete again.

Continue reading

Infinite Random Rectangles – the Poisson Rect process

Previously, we looked at how to sample points randomly at a given density across an infinite plane.

It’s harder than it sounds, as I was looking for an algorithm that was not biased by the size/shape of the chunks used to calculate it.

Today let’s extend that to filling the infinite plane with random non-overlapping rectangles. As before, that means finding a deterministic chunked algorithm that we can prove is unaffected by the choice of chunking.

Continue reading