Mette Friis Andersen

about

I am an MSc student in Logic and Computation at the University of Amsterdam, working with the CALM Lab. Before that, I studied Philosophy, Logic and Scientific Method at LSE. My research asks whether large language models possess human-interpretable semantic concepts, or whether the structure we observe in their representations is something fundamentally different from how humans carve up meaning.

research

I study faithfulness in large language models: whether what we observe in a model's internals actually reflects how it behaves. Specifically, I distinguish concept faithfulness (whether a representation genuinely encodes the concept it appears to encode) from reasoning faithfulness (whether the model's outputs actually follow from those representations).

publications

Three Desiderata for Faithfulness in Machine Learning Explanations: The Case for Causal Abstraction
Mette Friis Andersen, Maria Heuss, Ana Lucic · NeurIPS 2025 Workshop on Mechanistic Interpretability

Causal Sufficiency Without Semantic Alignment: How Causal Subspaces Can Masquerade as Semantic Concepts
Mette Friis Andersen et al. · under review

The Geometry of Metaphor
Mette Friis Andersen et al. · under review at EMNLP

AntropoScore
Mette Friis Andersen et al. · in preparation
