about

I am an MSc student in Logic and Computation at the University of Amsterdam, working with the CALM Lab. Before that, I studied Philosophy, Logic and Scientific Method at LSE. My research asks whether large language models possess human-interpretable semantic concepts, or whether the structure we observe in their representations is something fundamentally different from how humans carve up meaning.

research

I study faithful concept representation in large language models through causal abstraction. Causal methods can reliably locate a subspace that is causally efficacious: intervene on it, and the model's behavior changes as predicted. Yet causal sufficiency says nothing about what that subspace actually encodes. This is the interpretability gap I work on. A causally efficacious direction can masquerade as a human concept while encoding something machine-specific, and the open question is whether we can name what we have found in our own vocabulary at all.

publications

contact