More Accelerated Game of Life

2025-09-21 | Boris

I got a good comment on my previous article about implementing the Game of Life in CUDA, pointing out that I was leaving a lot of performance on the table by only considering a single step at a time.

Their point was that my implementations were bound by the speed of DRAM. An A40 can send 696 GB/s from DRAM to the cores, and my setup required sending at least one bit in each direction per cell per step, which worked out at 1.4 ms per step.
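As a rough back-of-envelope sketch of that bound: the grid size below (N = 2¹⁶) is an assumption carried over from the earlier article, and the function name is made up for illustration. With these inputs the result lands in the same range as the figure above; the exact value depends on the grid size and unit conventions you assume.

```python
def min_step_time_ms(n, bandwidth_bytes_per_s):
    """Lower bound on the time per step when DRAM traffic is the bottleneck."""
    bits_moved = 2 * n * n        # one bit read and one bit written per cell
    bytes_moved = bits_moved / 8
    return bytes_moved / bandwidth_bytes_per_s * 1e3  # seconds -> milliseconds


A40_DRAM_BANDWIDTH = 696e9  # bytes per second, as quoted in the text
N = 2 ** 16                 # assumed grid side length

step_ms = min_step_time_ms(N, A40_DRAM_BANDWIDTH)
print(f"{step_ms:.2f} ms per step")
```

Any scheme that keeps several steps in on-chip memory before writing back divides this traffic by the number of steps fused together.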

But DRAM is the slowest memory on a graphics card. The L1 cache is hosted inside each Streaming Multiprocessor, much closer to where the calculations occur. It’s tiny, but has an incredible bandwidth – potentially 67 TB/s. Fully utilizing this is nigh impossible, but even with mediocre utilization we’d do far better than before.

Continue reading →

The Culture Novels as a Dystopia

2025-09-14 (updated 2025-09-16) | Boris

A couple of people have mentioned to me: “we need more fiction examples of positive AI superintelligence – utopias like the Culture novels”. And they’re right, AI can be tremendously positive, and some beacons lit into the future could help make that come around.

But one of my hobbies is “oppositional reading” – deliberately interpreting novels counter to the obvious / intended reading. And it’s not so clear to me that the Culture is all it is cracked up to be.

Continue reading →

Accelerated Game Of Life with CUDA / Triton

2025-09-11 (updated 2025-09-21) | Boris

Let’s look at implementing Conway’s Game of Life using a graphics card. I want to experiment with different libraries and techniques, to see how to get the best performance. I’m going to start simple, and get increasingly complex as we dive in.

The Game Of Life is a simple cellular automaton, so should be really amenable to GPU acceleration. The rules are simple: Each cell in the 2D grid is either alive or dead. At each step, count the alive neighbours of the cell (including diagonals). If the cell is alive, it remains alive if 2 or 3 neighbours are alive. Otherwise it dies. If the cell is dead, it comes to life if exactly 3 neighbours are alive. These simple rules cause an amazing amount of emergent complexity which has been written about copiously elsewhere.
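To make the rules concrete, here is a minimal pure-Python sketch of a single step. It is only a reference for the rules, not one of the GPU implementations; the name `life_step` is made up for illustration, and leaving the boundary cells untouched is a simplifying assumption.

```python
def life_step(grid):
    """Return the next generation of a square grid of 0/1 cells."""
    n = len(grid)
    new = [row[:] for row in grid]  # boundary cells are copied unchanged
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            # Count the eight surrounding cells, diagonals included.
            neighbours = sum(
                grid[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if grid[y][x] == 1:
                new[y][x] = 1 if neighbours in (2, 3) else 0
            else:
                new[y][x] = 1 if neighbours == 3 else 0
    return new


# A "blinker": three cells in a row flip between horizontal and vertical.
blinker = [[0] * 5 for _ in range(5)]
blinker[2][1] = blinker[2][2] = blinker[2][3] = 1
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Each cell depends only on its own 3×3 neighbourhood from the previous step, which is why the update parallelises so naturally.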

For simplicity, I’ll only consider N×N grids, and skip calculations on the boundary. I ran everything with an A40, and I’ll benchmark performance at N=2¹⁶. For now, we’ll store each cell as 1 byte, so this array is 2¹⁶ × 2¹⁶ bytes, which equates to 4 GB of data.
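The size works out as follows. This is just the arithmetic spelled out; the bit-packed figure at the end is an extra comparison that assumes one bit per cell instead of one byte.

```python
N = 2 ** 16
cells = N * N                  # 2**32 cells in the grid

bytes_per_cell = 1
total_bytes = cells * bytes_per_cell
total_gib = total_bytes / 2 ** 30

# Hypothetical alternative layout: pack eight cells into each byte.
packed_mib = (cells // 8) / 2 ** 20

print(f"{cells} cells, {total_gib:.0f} GiB at one byte per cell")
print(f"{packed_mib:.0f} MiB at one bit per cell")
```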

All code is shared in the GitHub repo.

Continue reading →

Claude is a Ravenclaw

2025-07-04 (updated 2025-07-07) | Boris

I’m in the midst of doing the MATS program which has kept me super busy, but that didn’t stop me working on resolving the most important question of our time: What Hogwarts House does your chatbot belong to?

Continue reading →

My Failed AI Safety Research Projects (Q1/Q2 2025)

2025-06-19 | Boris

This year I’ve been on sabbatical, and have spent my time upskilling in AI Safety. Part of that is doing independent research projects in different fields.

Some of those items have resulted in useful output, notably A Toy Model of the U-AND Problem, Do No Harm? and SAEs and their Variants.

And then there are others where I’ve just failed fast and moved on.

I’ve detailed those projects that still have something to say, even if it’s mostly negative results.

Find them on LessWrong.

A Technique of Pure Reason

2025-06-04 (updated 2025-06-19) | Boris

Looking a little ahead into the future, I think LLMs are going to stop being focused on being knowledgeable, articulate chatbots. Instead they’ll be more efficient models that are weaker in these areas than current models but relatively stronger at reasoning: pure-reasoner models. The rest will be bolted on via tool use and other scaffolding.

TransformerLens Quick Reference

2025-03-29 (updated 2025-05-07) | Boris

TransformerLens is a Python library for Mechanistic Interpretability. It’s got some great tutorials… but they are all kinda verbose. Here’s a cheatsheet of all the common things you’ll want from the library. Click the links for more details.

Continue reading →

Computational Superposition in a Toy Model of the U-AND Problem

2025-03-27 | Boris

I’ve been working on some AI Safety research. It’s kinda dense for a blog, so I’m hosting elsewhere.

It’s an investigation into how ML models do boolean logic at the most fundamental level. Under an assumption of feature sparsity, which is common for large models, certain patterns appear.

Read on Less Wrong

Running Tracery bots with LLMs

2025-03-08 | Boris

Tracery bots were a fun, simple way of making generative texts. They are basically an easy way to specify generative grammars via a simple JSON file format. There used to be a horde of fun little Tracery bots on Twitter until API changes shut them all down.
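For a flavour of the format, here is a toy grammar in the Tracery style, with a tiny expander written in Python. This is a simplified sketch rather than the Tracery library itself: it only handles plain `#symbol#` substitution, and the grammar contents and the `expand` helper are made up for illustration.

```python
import random
import re

# A Tracery-style grammar: each symbol maps to a list of possible expansions,
# and "#name#" inside a rule refers to another symbol.
grammar = {
    "origin": ["#greeting#, #name#!"],
    "greeting": ["Hello", "Good evening"],
    "name": ["Elizabeth", "Darcy"],
}


def expand(symbol, grammar, rng):
    """Pick a random rule for `symbol` and recursively expand its #tags#."""
    rule = rng.choice(grammar[symbol])
    return re.sub(
        r"#(\w+)#",
        lambda match: expand(match.group(1), grammar, rng),
        rule,
    )


sentence = expand("origin", grammar, random.Random(0))
print(sentence)
```

Every output is drawn from a fixed, enumerable set of sentences, which is where the control over what gets posted comes from.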

Darcy is saying "You are well-educated, but no cromulent enough to intrigue me"

Nowadays, you can prompt a chatbot to get whatever you want. But that lacks the same charm, and it doesn’t give you the control you’d want for something unleashed on the internet. Let’s do something about that.

Continue reading →

Generating Tilesets with Stable Diffusion

2025-02-04 (updated 2025-02-28) | Boris

Recently I’ve been playing around more with gen AI techniques. I thought I’d try to generate a set of tiles that all connect together. It’s harder than it sounds – Stable Diffusion is hard to control, so there’s no easy way to get a set of images that are fully consistent with one another.

I’ve developed a technique for doing it that I’ll call Non-Manifold Diffusion as it involves doing diffusion over a set of patches that interlock to form a non-manifold surface.

Continue reading →