SOMtime the World Ain't Fair: Violating Fairness Using Self-Organizing Maps

Computer Science > Artificial Intelligence [Submitted on 20 Feb 2026]

Abstract: Unsupervised representations are widely assumed to be neutral with respect to sensitive attributes when those attributes are withheld from training. We show that this assumption is false. Using SOMtime, a topology-preserving representation method based on high-capacity Self-Organizing Maps, we demonstrate that sensitive attributes such as age and income emerge as dominant latent axes in purely unsupervised embeddings, even when explicitly excluded from the input. On two large-scale real-world datasets (the World Values Survey across five countries and the Census-Income dataset), SOMtime recovers monotonic orderings aligned with withheld sensitive attributes, achieving Spearman correlations of up to 0.85. By comparison, PCA and UMAP typically remain below 0.23 (with a single exception reaching 0.31), and t-SNE and autoencoders achieve at most 0.34. Furthermore, unsupervised segmentation of SOMtime embeddings produces demographically skewed clusters, demonstrating downstream fairness risks without any supervised task. These findings establish that "fairness through unawareness" fails at the representation level for ordinal sensitive attributes and that fairness auditing must extend to unsupervised components of machine learning pipelines.
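
The audit metric named in the abstract, a Spearman correlation between an embedding coordinate and a withheld attribute, can be illustrated with a minimal sketch. This is not the SOMtime method: the `axis` values below are a hypothetical stand-in for a coordinate read off a trained map, the data are synthetic, and all function names are assumptions made for illustration.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic audit: the attribute is never given to the "model",
# yet a coordinate that varies monotonically with it still exposes it.
rng = random.Random(0)
age = [rng.randint(18, 90) for _ in range(500)]            # withheld attribute
axis = [a ** 0.5 + rng.gauss(0, 0.3) for a in age]         # hypothetical embedding coordinate
unrelated = [rng.gauss(0, 1) for _ in age]                 # coordinate independent of age

rho_leaky = spearman(axis, age)
rho_unrelated = spearman(unrelated, age)
print(f"leaky axis: {rho_leaky:.2f}, unrelated axis: {rho_unrelated:.2f}")
```

Because Spearman correlation depends only on rank order, it picks up the monotonic (here nonlinear) relationship that the abstract describes as a recovered ordering, which is why it suits ordinal attributes such as age or income bands.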

Article Summaries:

A new study shows that unsupervised machine-learning representations can still encode sensitive attributes, even when those attributes are omitted from training. Using SOMtime, a method based on high-capacity Self-Organizing Maps, researchers found that age and income emerge as dominant latent axes in purely unsupervised embeddings of two large datasets: the World Values Survey and the Census-Income dataset. SOMtime produced Spearman correlations of up to 0.85 with the withheld attributes, far exceeding PCA, UMAP, t-SNE, and autoencoders, none of which exceeded 0.34. Unsupervised clustering of the SOMtime embeddings also yielded demographically skewed groups, highlighting downstream fairness risks. The work argues that “fairness through unawareness” is insufficient and that auditing must extend to unsupervised components. Code is publicly available.
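
One simple way to check whether clusters are demographically skewed is to compare each cluster's share of a group with that group's overall share. The sketch below uses toy data and hypothetical function names; it is an assumed illustration of such a check, not the paper's evaluation procedure.

```python
from collections import Counter, defaultdict


def group_share_by_cluster(cluster_labels, group_labels):
    """Map each cluster to the fraction of its members in each demographic group."""
    counts = defaultdict(Counter)
    for c, g in zip(cluster_labels, group_labels):
        counts[c][g] += 1
    return {
        c: {g: n / sum(ctr.values()) for g, n in ctr.items()}
        for c, ctr in counts.items()
    }


def max_share_gap(cluster_labels, group_labels, group):
    """Largest absolute gap between a cluster's share of `group` and the overall share."""
    overall = sum(1 for g in group_labels if g == group) / len(group_labels)
    shares = group_share_by_cluster(cluster_labels, group_labels)
    return max(abs(s.get(group, 0.0) - overall) for s in shares.values())


# Toy example: cluster assignments made without access to the age band.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
age_band = ["young", "young", "young", "old", "old", "old", "old", "young"]

shares = group_share_by_cluster(clusters, age_band)
gap = max_share_gap(clusters, age_band, "young")
print(shares, gap)
```

A gap near zero means every cluster mirrors the population; a large gap means that acting on cluster membership alone would treat demographic groups differently, even though no sensitive attribute was ever supplied.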

Sources:

https://arxiv.org/abs/2602.18201