
The Diffuse Project is developing new methods for collecting and modeling protein dynamics. One of the most exciting aspects of this emerging frontier is that the details of a protein’s conformational ensemble are often already hidden in plain sight, embedded within the experimental data used to generate static structures. We hypothesize that if we can learn directly from this raw experimental data, we may be able to reconstruct a macromolecule’s conformational ensemble more accurately.

At present, ensembles are often modeled separately against Bragg peaks (the sharp, well-defined signals from crystallography) and against diffuse scattering (the information-rich “background” patterns), and the resulting models rarely agree. We aim to change that by moving to a paradigm in which we learn directly from experimental data, integrating both Bragg and diffuse scattering to create consistent, physically grounded models of a macromolecule’s ensemble.

Our approach begins by improving how experimental data is integrated into modeling. We are building tools that incorporate both Bragg and diffuse data into the loss functions and validation metrics used for optimization and machine learning, and we are improving algorithms for ensemble modeling directly from Bragg data. We are also working towards machine learning algorithms that train directly on experimental data rather than using it only in the loss function.
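To make the idea of a loss function that sees both data types concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the project's actual tooling: the function names (`bragg_residual`, `diffuse_residual`, `joint_loss`), the choice of a scale-fitted least-squares term for Bragg amplitudes, the use of one minus the Pearson correlation for diffuse intensities, and the weight `w_diffuse` are all assumptions made for this example.

```python
import numpy as np


def bragg_residual(f_obs, f_calc):
    """Normalized least-squares residual between observed and calculated
    Bragg amplitudes, after fitting a single overall scale factor.
    (Hypothetical helper; the metric choice is an assumption.)"""
    f_obs = np.asarray(f_obs, dtype=float).ravel()
    f_calc = np.asarray(f_calc, dtype=float).ravel()
    # Least-squares optimal scale that maps f_calc onto f_obs.
    scale = np.dot(f_obs, f_calc) / np.dot(f_calc, f_calc)
    return np.sum((f_obs - scale * f_calc) ** 2) / np.sum(f_obs ** 2)


def diffuse_residual(i_obs, i_calc):
    """One minus the Pearson correlation between observed and calculated
    diffuse intensities, so a perfect linear match gives 0.
    (Hypothetical helper; the metric choice is an assumption.)"""
    i_obs = np.asarray(i_obs, dtype=float).ravel()
    i_calc = np.asarray(i_calc, dtype=float).ravel()
    return 1.0 - np.corrcoef(i_obs, i_calc)[0, 1]


def joint_loss(f_obs, f_calc, i_obs, i_calc, w_diffuse=1.0):
    """Weighted sum of the Bragg and diffuse terms.
    w_diffuse is an illustrative weight, not a recommended value."""
    return bragg_residual(f_obs, f_calc) + w_diffuse * diffuse_residual(i_obs, i_calc)
```

In this sketch a candidate ensemble would be scored by computing its predicted Bragg amplitudes and diffuse intensities and passing them to `joint_loss` together with the measured values, so that a model cannot lower the loss by fitting one signal while ignoring the other.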

Ultimately, we envision a representation learning framework that dissolves the boundaries between experimental modalities, bringing Bragg, diffuse, and other structural data types into a single, shared space. Within this unified representation, molecular dynamics simulations informed by diffuse data will flow seamlessly into Bragg-based training and inference, allowing the strengths of each approach to amplify the other. By enabling AI models to learn jointly from heterogeneous datasets, we can unlock new levels of predictive accuracy, reveal hidden relationships between data types, and open the door to true cross-modality discovery in structural biology.
