UnSearch


Transformers Use Causal World Models in Maze-Solving Tasks

Using sparse autoencoders and attention analysis, we discover and intervene on world models in maze-solving transformers

Structured World Representations in Maze-Solving Transformers

We train transformers to solve mazes and use linear probing to show that they form internal representations of the entire maze; we also find evidence for "Adjacency Heads" that attend to valid next moves