Transformers Use Causal World Models in Maze-Solving Tasks
Using sparse autoencoders and attention analysis, we discover and intervene on world models in maze-solving transformers. (Blog Post)
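
The post's actual setup isn't reproduced here, but as a rough illustration of the kind of sparse autoencoder trained on transformer activations in this line of work, a minimal PyTorch sketch follows. Every detail (dimensions, names, the L1 coefficient, the stand-in activations) is an assumption for illustration, not taken from the post.

```python
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder: reconstructs activations through an
    overcomplete ReLU feature layer; an L1 penalty encourages sparsity."""

    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(feats)          # reconstructed activation
        return recon, feats


# Hypothetical usage on stand-in activations (all shapes are assumptions).
sae = SparseAutoencoder(d_model=128, d_features=1024)
acts = torch.randn(64, 128)  # batch of residual-stream activations
recon, feats = sae(acts)

# Reconstruction error plus L1 sparsity penalty on the features.
loss = ((acts - recon) ** 2).mean() + 1e-3 * feats.abs().mean()
loss.backward()  # in training, this gradient would feed an optimizer step
```

Once trained, individual feature directions can be inspected, and interventions of the kind the post describes amount to editing `feats` before decoding and patching the result back into the forward pass.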
Structured World Representations in Maze-Solving Transformers
We train transformers on mazes and use linear probing to show that they form internal representations of the entire maze, and find evidence for Adjacency Heads which attend to valid "next moves".
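
The paper's probing setup is likewise not reproduced here; as a sketch of what linear probing for maze structure might look like, the snippet below trains a logistic-regression probe to predict whether a given maze connection is open from a layer's activations. The activations and labels are synthetic placeholders; with real data, held-out accuracy well above chance is the evidence that the property is linearly decodable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: one activation vector per maze, and a binary label for
# whether a particular edge (connection between two cells) is open.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 512))      # [n_mazes, d_model], stand-in activations
labels = rng.integers(0, 2, size=1000)   # stand-in "edge open?" labels

X_train, X_test, y_train, y_test = train_test_split(
    acts, labels, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# With random stand-in data this hovers near 0.5; on real activations,
# accuracy well above chance would indicate a linear representation.
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```

Repeating such a probe for every edge in the maze is one way to test whether the model represents the entire maze, rather than only the cells along the solution path.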