Structured World Representations in Maze-Solving Transformers
We train transformers on mazes and use linear probing to show that they form internal representations of the entire maze. We also find evidence for "Adjacency Heads", attention heads that attend to valid "next moves".
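Linear probing means training a simple linear classifier on a model's internal activations to predict some property of the input, such as whether two maze cells are connected. If a linear readout succeeds on held-out data, that property is linearly decodable from the activations. The sketch below is a minimal illustration under stated assumptions. It uses synthetic activations rather than a real trained transformer. `sample_activations`, the 16-dimensional width, and the "cells connected" label are hypothetical stand-ins. Logistic regression is one common probe choice and is not necessarily the authors' setup. A random-label control is included because probe accuracy only means something relative to chance.

```python
import math
import random

DIM = 16  # hypothetical width of the probed activation vector


def random_unit_vector(dim, rng):
    """A fixed direction in activation space that (by assumption) encodes the feature."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_activations(n, direction, rng, signal=2.0):
    """Hypothetical synthetic stand-in for transformer activations (not real model data).

    Each example has a binary label, e.g. "maze cells A and B are connected",
    written into the activation as +/- signal along `direction`, plus
    isotropic Gaussian noise.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        x = [rng.gauss(0.0, 1.0) + sign * signal * d for d in direction]
        data.append((x, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(data, dim, lr=0.1, epochs=100):
    """Fit a logistic-regression probe (weights w, bias b) by full-batch gradient descent."""
    w = [0.0] * dim
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            # Gradient of the cross-entropy loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples where the probe's thresholded prediction matches the label."""
    correct = 0
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(data)


def relabel_randomly(data, rng):
    """Control task: replace every label with a coin flip independent of x."""
    return [(x, rng.randint(0, 1)) for x, _ in data]


rng = random.Random(0)
direction = random_unit_vector(DIM, rng)
train = sample_activations(300, direction, rng)
test = sample_activations(200, direction, rng)

w, b = train_linear_probe(train, DIM)
probe_acc = probe_accuracy(w, b, test)

# A probe trained and evaluated on random labels should stay near chance (0.5).
cw, cb = train_linear_probe(relabel_randomly(train, rng), DIM)
control_acc = probe_accuracy(cw, cb, relabel_randomly(test, rng))

print(f"probe accuracy: {probe_acc:.3f}, control accuracy: {control_acc:.3f}")
```

To apply this to a real model, you would replace `sample_activations` with activations cached from the transformer at a chosen layer and token position. You would then compare the probe's held-out accuracy against a control like the one above. A large gap between the two suggests the property is genuinely encoded and is not an artifact of probe capacity.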
