“[T]here is no technique that would allow us to lay out in any satisfactory way what kinds of knowledge, reasoning, or goals a model is using when it produces some output.” – Sam Bowman
Most recent ML papers open with a long account of how Transformers have been incredibly successful across a huge variety of tasks. Capabilities are advancing rapidly, but our understanding of how Transformers do what they do remains limited. Recognizing this gap, our project boldly contributes to the field of mechanistic interpretability. In particular, we focus on search and goal representations in Transformers.