Machine Psychology

A lot of recent LLM research has focused on prompting techniques: surface-level manipulations that nudge the model to improve its own reasoning through methods like chain-of-thought (CoT) prompting or self-reflection. As I read through these papers, it all felt incredibly intuitive — almost suspiciously intuitive. And then I realized why: it didn’t feel like I was reading an ML paper.
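
To make “surface-level manipulation” concrete, here is a minimal sketch of what CoT prompting amounts to in code. `query_model` is a hypothetical placeholder for whatever LLM API you use, not a real library call; the entire technique is just appending an instruction to the input text.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stub: send `prompt` to an LLM and return its completion."""
    return "(model response)"  # replace with a real API call


question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Plain prompt: the model answers directly (and often trips on this one).
plain_answer = query_model(question)

# CoT prompt: same model, same question, plus one sentence asking it to
# reason out loud first. No weights change; only the input does.
cot_answer = query_model(
    question + "\nLet's think step by step, then state the final answer."
)
```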

I stand by what might be a controversial take: CoT and self-reflection papers should be classified as psychology research rather than CS research. Sure, everything leading up to the implementation of a model’s underlying architecture is CS. But playing with the inputs and outputs of existing models, coaching the model like a human, running evals on its thought process, observing and manipulating how a system responds to various stimuli (prompts) under different conditions... isn’t that basically behavioral experimentation on LLMs? It’s less about improving the model’s underlying structure and more about optimizing how it interacts with us. Sorta like therapy or personal coaching sessions, but for machines. Maybe we need a new field: call it machine psychology. I’d wager there’s at least some genuinely interesting work to be done probing the mental models of working AI agents.
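
In that spirit, a toy “machine psychology” experiment might look like the sketch below: hold the task fixed, vary only the prompt condition (the stimulus), and collect repeated responses for behavioral coding. Everything here is illustrative; `query_model` is the same hypothetical stub as above, and the condition templates are made up.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stub: send `prompt` to an LLM and return its reply."""
    return "(model response)"  # replace with a real API call


# Prompt conditions: the only thing that varies between experimental arms.
CONDITIONS = {
    "baseline":   "{task}",
    "cot":        "{task}\nThink step by step before answering.",
    "reflection": "{task}\nAnswer, then critique your answer and revise it.",
}


def run_experiment(task: str, trials: int = 5) -> dict[str, list[str]]:
    """Collect `trials` responses per condition, like repeated trials per subject."""
    return {
        name: [query_model(template.format(task=task)) for _ in range(trials)]
        for name, template in CONDITIONS.items()
    }


results = run_experiment("How many r's are in 'strawberry'?")
```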