UnSearch

Research

Here you’ll find our latest research projects and publications.

Transformers Use Causal World Models in Maze-Solving Tasks

Using sparse autoencoders and attention analysis, we discover and intervene on world models in maze-solving transformers.

Structured World Representations in Maze-Solving Transformers

We train transformers on mazes and use linear probing to show that they form internal representations of the entire maze; we also find evidence for Adjacency Heads, which attend to valid "next moves".

A Configurable Library for Generating and Manipulating Maze Datasets

The paper accompanying our maze-dataset library, which is publicly available and was used in our first two papers.
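
As a quick illustration, here is a minimal sketch of generating a small dataset with maze-dataset, using the configuration names from the library's README (`MazeDatasetConfig`, `MazeDataset.from_config`, `LatticeMazeGenerators.gen_dfs`); these may differ in newer versions, so check the documentation for the current API.

```python
# Minimal sketch: generate a small maze dataset with the maze-dataset library.
# Names (MazeDatasetConfig, MazeDataset, LatticeMazeGenerators) are taken from
# the library's README and may differ in newer versions -- check the docs.
from maze_dataset import MazeDataset, MazeDatasetConfig
from maze_dataset.generation import LatticeMazeGenerators

# Configure 10 mazes on a 5x5 lattice, generated via randomized depth-first search.
cfg = MazeDatasetConfig(
    name="example",
    grid_n=5,
    n_mazes=10,
    maze_ctor=LatticeMazeGenerators.gen_dfs,
)

# Build (or load from cache) the dataset described by the config.
dataset = MazeDataset.from_config(cfg)
print(f"generated {len(dataset)} mazes")
```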

Understanding Mesa-optimization Using Toy Models

A LessWrong post detailing and motivating our research agenda.