Abstract:
Intrinsically disordered proteins and regions (collectively IDRs) are found across all kingdoms of life and have critical roles in virtually every eukaryotic cellular process[1]. IDRs exist in a broad ensemble of structurally distinct conformations. This structural plasticity facilitates diverse molecular recognition and function[2,3,4]. Here we combine advances in physics-based force fields with the power of multi-modal generative deep learning to develop STARLING, a framework for rapid generation of accurate IDR ensembles and ensemble-aware representations from sequence. STARLING supports environmental conditioning across ionic strengths and demonstrates proof of concept for the interpolative ability of generative models beyond their training domain. Moreover, we enable ensemble refinement under experimental constraints using a Bayesian maximum-entropy reweighting scheme.
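The reweighting step named above can be illustrated with a short Python sketch of the general Bayesian/maximum-entropy (BME) approach. This is not STARLING's own code. It assumes that an observable such as the radius of gyration has already been back-calculated for every generated conformer. The function name `bme_reweight`, the confidence parameter `theta` and the synthetic data are all hypothetical. Each conformer's prior weight is multiplied by exp(-λ·F), and λ is found by minimizing a convex dual objective, so that reweighted averages move toward the experimental values while the ensemble stays close to the prior.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def bme_reweight(calc, exp_vals, exp_err, theta=1.0, w0=None):
    """Hypothetical BME-style reweighting sketch (not STARLING's implementation).

    calc:     (n_frames, n_obs) observables back-calculated for each conformer
    exp_vals: (n_obs,) experimental ensemble averages
    exp_err:  (n_obs,) experimental uncertainties (sigma)
    theta:    trust in the prior ensemble (larger = weights move less)
    w0:       prior weights (uniform if None)
    Returns normalized weights, one per conformer.
    """
    calc = np.asarray(calc, dtype=float)
    exp_vals = np.asarray(exp_vals, dtype=float)
    sigma2 = np.asarray(exp_err, dtype=float) ** 2
    n_frames, n_obs = calc.shape
    if w0 is None:
        w0 = np.full(n_frames, 1.0 / n_frames)
    log_w0 = np.log(np.asarray(w0, dtype=float))

    def normalized(lam):
        # w_j is proportional to w0_j * exp(-lambda . F_j); normalize in log space
        log_w = log_w0 - calc @ lam
        log_z = logsumexp(log_w)
        return np.exp(log_w - log_z), log_z

    def gamma(lam):
        # Dual objective: log Z + lambda . F_exp + (theta / 2) * sum(lambda^2 * sigma^2)
        w, log_z = normalized(lam)
        value = log_z + lam @ exp_vals + 0.5 * theta * np.sum(lam**2 * sigma2)
        # Gradient: F_exp - <F>_w + theta * lambda * sigma^2
        grad = exp_vals - w @ calc + theta * lam * sigma2
        return value, grad

    res = minimize(gamma, np.zeros(n_obs), jac=True, method="L-BFGS-B")
    return normalized(res.x)[0]


# Usage with synthetic data: 1,000 hypothetical per-conformer Rg values (Angstrom)
rng = np.random.default_rng(0)
rg = rng.normal(30.0, 5.0, size=(1000, 1))
w = bme_reweight(rg, exp_vals=[32.0], exp_err=[0.5], theta=1.0)
print("prior mean Rg:", rg[:, 0].mean())
print("reweighted mean Rg:", w @ rg[:, 0])
```

At the optimum, each reweighted average equals the experimental value plus theta·σ²·λ. A larger `theta` therefore trusts the generated ensemble more and changes the weights less. A smaller `theta` fits the data more tightly, at the cost of concentrating weight on fewer conformers.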
Article Summaries:
- A new computational framework, STARLING, has been introduced to generate accurate ensembles of intrinsically disordered protein regions (IDRs) from sequence alone. STARLING blends physics-based force fields with multi-modal generative deep learning, allowing rapid sampling of IDR conformations under varying ionic conditions and enabling interpolation beyond its training data. The method incorporates Bayesian maximum-entropy reweighting to refine ensembles with experimental constraints. Beyond characterization, STARLING's latent representations support rapid searches for biophysical “look-alikes” and shorten sequence-design workflows from weeks to seconds, making large-scale IDR studies more accessible. The authors benchmarked STARLING against existing experimental data, demonstrating its accuracy and its potential to aid hypothesis generation in IDR research.
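The “look-alike” search mentioned in the summary can be framed as nearest-neighbor retrieval in an embedding space. The sketch below assumes that each IDR has already been encoded as a fixed-length vector, for example by pooling a per-residue representation. The source does not describe how STARLING produces or compares its representations, so the function `top_k_lookalikes`, the choice of cosine similarity and the random library are illustrative assumptions.

```python
import numpy as np


def top_k_lookalikes(query, library, k=5):
    """Hypothetical sketch: return indices and cosine similarities of the k
    library embeddings most similar to the query (k >= 1), best match first."""
    q = np.asarray(query, dtype=float)
    lib = np.asarray(library, dtype=float)
    # L2-normalize so that a dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    lib = lib / np.linalg.norm(lib, axis=1, keepdims=True)
    sims = lib @ q
    k = min(k, len(sims))
    # argpartition finds the k largest similarities without a full sort
    idx = np.argpartition(-sims, k - 1)[:k]
    # Sort only those k candidates, highest similarity first
    idx = idx[np.argsort(-sims[idx])]
    return idx, sims[idx]


# Usage: a hypothetical library of 10,000 pooled 64-dimensional embeddings
rng = np.random.default_rng(1)
library = rng.normal(size=(10_000, 64))
query = library[42] + 0.01 * rng.normal(size=64)  # slightly perturbed copy of entry 42
idx, sims = top_k_lookalikes(query, library, k=3)
print(idx, sims)
```

For very large libraries, an approximate nearest-neighbor index would usually replace the brute-force matrix product. The ranking logic stays the same.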
Sources: