I’m in the midst of doing the MATS program, which has kept me super busy, but that didn’t stop me working on resolving the most important question of our time: What Hogwarts House does your chatbot belong to?
My Failed AI Safety Research Projects (Q1/Q2 2025)
This year I’ve been on sabbatical, and have spent my time upskilling in AI Safety. Part of that is doing independent research projects in different fields.
Some of those items have resulted in useful output, notably A Toy Model of the U-AND Problem, Do No Harm? and SAEs and their Variants.
And then there are others where I’ve just failed fast and moved on.
I’ve detailed those projects that still have something to say, even if it’s mostly negative results.
Find them on LessWrong.
A Technique of Pure Reason
Looking a little ahead, I think LLMs are going to stop being knowledgeable, articulate chatbots first and foremost. Instead we’ll see more efficient models that are weaker in those areas than current models but relatively stronger at reasoning: pure-reasoner models. The rest will be bolted on via tool use and other scaffolding.
TransformerLens Quick Reference
TransformerLens is a Python library for Mechanistic Interpretability. It’s got some great tutorials… but they are all kinda verbose. Here’s a cheatsheet of all the common things you’ll want from the library. Click the links for more details.
Computational Superposition in a Toy Model of the U-AND Problem
I’ve been working on some AI Safety research. It’s kinda dense for a blog, so I’m hosting it elsewhere.
It’s an investigation into how ML models do boolean logic at the most fundamental level. Under an assumption of feature sparsity, which is common for large models, certain patterns appear.
Running Tracery bots with LLMs
Tracery bots were a fun, simple way of making generative texts. They are basically an easy way to specify generative grammars via a simple JSON file format. There used to be a horde of fun little Tracery bots on Twitter until API changes shut them all down.
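To give a flavour of what a generative grammar in JSON looks like, here is a minimal sketch in Python. It is not the Tracery library itself: the grammar contents and the `expand` helper are made up for illustration, and it only handles plain `#symbol#` substitution, not the modifiers and actions that real Tracery supports.

```python
import json
import random
import re

# A tiny grammar in Tracery's JSON style: each symbol maps to a list of
# alternatives, and "#symbol#" marks a place to expand another symbol.
# (Illustrative grammar, not taken from any real bot.)
GRAMMAR_JSON = """
{
  "origin": ["The #adjective# #animal# #verb#."],
  "adjective": ["sleepy", "curious"],
  "animal": ["cat", "owl"],
  "verb": ["naps", "watches"]
}
"""


def expand(grammar, symbol, rng):
    """Pick one rule for `symbol` and recursively expand any #tags# in it."""
    rule = rng.choice(grammar[symbol])
    return re.sub(r"#(\w+)#", lambda m: expand(grammar, m.group(1), rng), rule)


grammar = json.loads(GRAMMAR_JSON)
rng = random.Random(0)  # seeded so repeated runs give the same sentence
sentence = expand(grammar, "origin", rng)
print(sentence)
```

Every output is built only from phrases the author wrote into the grammar, which is where the control comes from.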



Nowadays, you can prompt a chatbot to get whatever you want. But that lacks the same charm, and it doesn’t give you the control you’d want for something unleashed on the internet. Let’s do something about that.
Generating Tilesets with Stable Diffusion
Recently I’ve been playing around more with gen AI techniques. I thought I’d try to generate a set of tiles that all connect together. It’s harder than it sounds – Stable Diffusion is hard to control, so there’s no easy way to get a set of images that are fully consistent with one another.
I’ve developed a technique for doing it that I’ll call Non-Manifold Diffusion as it involves doing diffusion over a set of patches that interlock to form a non-manifold surface.
My Mental Model of AI Creativity – Creativity Kiki
I went to some lectures on the future of science in games recently, and the keynote speaker was Tommy Thompson, a well-known AI expert in the game dev space.
Of course, by AI, he didn’t mean the modern sort that dominates the news. His focus is AI for games, which is algorithmic and rarely involves any ML component. Still, he spoke about the challenges the industry faces regarding Image Generators, LLMs and so on. He specifically called LLMs “stochastic parrots”, which I found disappointing. Imho it’s an incredibly misleading model of what LLMs are capable of and is usually deployed to downplay their abilities and belittle them. But it’s a common view, particularly in creative industries.
So what is a better model? It’s clear that they are not that smart in most ways we consider important, but they do have some interesting capabilities. Here’s a model I use that I feel gives a better intuition for what they can and cannot do.
Utilitarian Decision-Making in Models – Evaluation and Steering
An Uncanny Moat

Back in the early days of computer animation, the technology at the time really struggled with realism. The first cartoons were necessarily abstract, or cartoony.
As time progressed, the technology caught up. CGI now can be all but indistinguishable from real life. But there was a brief period, as seen in films like The Polar Express or Final Fantasy: The Spirits Within, when the artists aimed for realism and didn’t quite get there.
These films were often critically panned. Eventually, it became clear that the cause was quite deep in the human psyche. These films were realistic enough that we’d mentally classify the characters as real humans, but not so realistic that they actually looked normal. On an instinctive level, people reject these imposters far harder than more stylised graphics that don’t have the pretence of reality.
This phenomenon is known as the Uncanny Valley and has influenced visual design of fake people in films, robots, games etc.
For a time, the recent crop of image generators and LLMs were in the same boat. Twisted people with the wrong number of fingers or teeth were a common source of derision. People are still puzzling over chatbots that can speak very coherently and yet make wild mistakes with none of the inner light you might expect from a real conversationalist.
Now, or at least very soon, AI threatens to cross that valley and advance up the gentle hills on the opposite side. Not only are we faced with a disinformation storm like nothing before, but AI is going to start challenging exactly how we consider personhood itself.
This is something we need to fight, in addition to all the other worries about AI. I don’t want to get into philosophical weeds about whether LLMs could be considered moral patients. But I think our society and thinking are structured around a clear human/non-human divide. Chatbots threaten to unravel that.