The paper 'Looking Inward: Language Models Can Learn About Themselves by Introspection' explores the concept of introspection in large language models (LLMs). The authors define introspection as the ability of an LLM to acquire knowledge about its own internal states that is not derived from its training data. This capability could enhance model interpretability and potentially allow models to self-report on their internal states, such as subjective feelings or desires.