about
I am an MSc student in Logic and Computation at the University of Amsterdam, working with the CALM Lab. Before that, I studied Philosophy, Logic and Scientific Method at LSE. My research asks whether large language models possess human-interpretable semantic concepts, or whether the structure we observe in their representations is something fundamentally different from how humans carve up meaning.
research
I study faithfulness in large language models — whether what we observe in a model's internals genuinely reflects how the model behaves. Specifically, I distinguish concept faithfulness (whether a representation encodes the concept it appears to encode) from reasoning faithfulness (whether the model's outputs actually follow from those representations).
publications
- Three Desiderata for Faithfulness in Machine Learning Explanations: The Case for Causal Abstraction
- Causal Sufficiency Without Semantic Alignment: How Causal Subspaces Can Masquerade as Semantic Concepts
- The Geometry of Metaphor
- AntropoScore