A flaw in using pretrained protein language models in protein-protein interaction inference models

• Nature Machine Intelligence, published online 13 February 2026; doi:10.1038/s42256-025-01176-7
• The use of pretrained protein language models (pLMs) is growing rapidly.
• However, Szymborski and Emad find that pretrained pLMs can be a source of data leakage in protein-protein interaction inference, leading to inflated performance scores.

Sources:

https://www.nature.com/articles/s42256-025-01176-7