Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models

Computer Science > Computation and Language. Submitted on 23 Jan 2026.

Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the geometric relationships between Big Five personality steering directions. We study steering vectors extracted from two model families (LLaMA-3-8B and Mistral-8B) and apply a range of geometric conditioning schemes, from unconstrained directions to soft and hard orthonormalisation. Our results show that personality steering directions exhibit substantial geometric dependence: steering one trait consistently induces changes in others, even when linear overlap is explicitly removed. While hard orthonormalisation enforces geometric independence, it does not eliminate cross-trait behavioural effects and can reduce steering strength.
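One common way to inspect the "geometric relationships" between steering directions is a pairwise cosine-similarity matrix. The sketch below is illustrative only and is not taken from the paper: the vectors are random stand-ins with an artificial shared component, and the names `TRAITS`, `cosine_matrix`, `hidden_dim`, and `shared` are hypothetical.

```python
import numpy as np

# Big Five trait labels; the vectors below are synthetic stand-ins,
# not steering vectors extracted from any real model.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


rng = np.random.default_rng(0)
hidden_dim = 64  # assumed toy dimensionality, far smaller than a real LLM

# Add a shared component to every direction so the toy vectors overlap,
# mimicking (by assumption) correlated trait directions.
shared = rng.normal(size=hidden_dim)
vectors = rng.normal(size=(len(TRAITS), hidden_dim)) + 0.8 * shared

sims = cosine_matrix(vectors)
for name, row in zip(TRAITS, sims):
    print(f"{name:>18}: " + " ".join(f"{value:+.2f}" for value in row))
```

Off-diagonal entries far from zero would indicate linear overlap between trait directions; a matrix close to the identity would indicate nearly independent directions.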

Article Summaries:

A recent study on large language models (LLMs) challenges the assumption that personality traits can be steered independently. The researchers examined Big Five trait steering vectors in LLaMA‑3‑8B and Mistral‑8B, applying geometric conditioning schemes that range from unconstrained directions to soft and hard orthonormalisation. They found that steering one trait consistently alters others, even after linear overlap is removed. Hard orthonormalisation enforces geometric independence, but it does not eliminate cross‑trait behavioural effects and can weaken steering strength. The results suggest that personality directions occupy a slightly coupled subspace, which limits the ability to control traits in isolation.
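To make the idea of soft versus hard orthonormalisation concrete, here is a minimal sketch under stated assumptions. The source does not specify the paper's conditioning schemes, so this uses symmetric (Löwdin-style) orthogonalisation with a hypothetical interpolation parameter `alpha`: `alpha=0` leaves the unit-normalised directions unchanged and `alpha=1` makes them exactly orthonormal. The function name `soften_overlap` and the synthetic vectors are also assumptions.

```python
import numpy as np


def soften_overlap(vectors: np.ndarray, alpha: float) -> np.ndarray:
    """Reduce overlap between row vectors by applying G^(-alpha/2).

    G is the Gram matrix of the unit-normalised rows. The Gram matrix of
    the result is G^(1 - alpha), so alpha=1 yields orthonormal rows.
    Assumes the rows are linearly independent (G is positive definite).
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    gram = unit @ unit.T
    eigvals, eigvecs = np.linalg.eigh(gram)
    transform = eigvecs @ np.diag(eigvals ** (-alpha / 2.0)) @ eigvecs.T
    return transform @ unit


rng = np.random.default_rng(0)
# Five synthetic, deliberately overlapping directions (not real model data).
vectors = rng.normal(size=(5, 64)) + 0.8 * rng.normal(size=64)

unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
soft = soften_overlap(vectors, alpha=0.5)  # partial decorrelation
hard = soften_overlap(vectors, alpha=1.0)  # full orthonormalisation

# Cosine between each hard-orthonormalised direction and its original.
# Both sets of rows have unit norm, so the row-wise dot product is a cosine.
alignment = np.sum(hard * unit, axis=1)
print("alignment with original directions:", np.round(alignment, 3))
```

An alignment below 1 means the orthonormalised direction has rotated away from the original one, which is one plausible geometric reason a conditioned vector might steer less strongly. Note that geometric orthogonality says nothing about downstream behaviour, which is consistent with the summary's point that cross-trait effects can persist.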

Sources:

https://arxiv.org/abs/2602.15847