<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://diffuse.science/feed.xml" rel="self" type="application/atom+xml" /><link href="https://diffuse.science/" rel="alternate" type="text/html" /><updated>2026-03-09T22:21:30+00:00</updated><id>https://diffuse.science/feed.xml</id><title type="html">The DiffUSE Project</title><subtitle>The DiffUSE Project</subtitle><entry><title type="html">Shake It Up!</title><link href="https://diffuse.science/post/shake-it-up/" rel="alternate" type="text/html" title="Shake It Up!" /><published>2026-02-22T00:00:00+00:00</published><updated>2026-02-22T00:00:00+00:00</updated><id>https://diffuse.science/post/shake-it-up</id><content type="html" xml:base="https://diffuse.science/post/shake-it-up/"><![CDATA[<h2 id="md-simulations-of-changes-in-diffuse-scattering-depending-on-ligand-binding"><em>MD simulations of changes in diffuse scattering depending on ligand binding</em></h2>

<figure class="third ">
  
    
      <a href="/assets/images/posts/2026-02-23/Mac1_NoADPr.png" title="MD diffuse with CHES (buffer molecule)">
          <img src="/assets/images/posts/2026-02-23/Mac1_NoADPr.png" alt="MD diffuse with CHES" />
      </a>
    
  
    
      <a href="/assets/images/posts/2026-02-23/Mac1_WithADPr.png" title="MD diffuse with ADPr">
          <img src="/assets/images/posts/2026-02-23/Mac1_WithADPr.png" alt="MD diffuse with ADPr" />
      </a>
    
  
    
      <a href="/assets/images/posts/2026-02-23/Mac1_DeltaADPr.png" title="Difference ADPr - CHES">
          <img src="/assets/images/posts/2026-02-23/Mac1_DeltaADPrAniso.png" alt="Difference ADPr - CHES" />
      </a>
    
  
  
    <figcaption>MD simulations of Mac1 diffuse scattering change depending on ligand binding. <em>Left</em>. Mac1 with CHES (buffer molecule). <em>Center</em>. Mac1 with ADPr. <em>Right</em>. Difference.
</figcaption>
  
</figure>

<h2 id="what-did-we-find">What did we find?</h2>

<p>In a previous <a href="/post/in_the_cloud/">post</a> I talked about our plan to perform MD simulations of Mac1 crystals under different conditions. We had just started an MD simulation of Mac1 in complex with ADP-ribose (ADPr), and were waiting for it to finish. We weren’t sure how different it would be from the <a href="/post/lets-dance/">initial simulations</a> of Mac1 without ADPr, in which a CHES buffer occupies the ADPr binding site.</p>

<p>Now we know more. Visual comparison of the images above shows that the rich anisotropic diffuse features in the maps with CHES buffer (left) and with ADPr (center) in the binding pocket have similar patterns of peaks and troughs that differ in detail. Subtracting one map from the other reveals the differences more clearly (right). Quantitative analysis shows that the variations in the difference map are comparable in strength to the intensities in either individual map. This result is encouraging, as it indicates that such differences might be observed in experiments.</p>
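
<p>As a rough illustration of this kind of comparison, here is a minimal sketch that subtracts one map from another and compares the spread of the difference to the spread of each input. It assumes each map is a plain dictionary from (h, k, l) to intensity and uses the standard deviation as the measure of "strength"; the function name and the toy numbers are hypothetical and are not taken from our analysis scripts.</p>

```python
from statistics import pstdev


def difference_strength(map_a, map_b):
    """Subtract map_a from map_b and compare the spread of the difference
    to the spread of each input map.

    Maps are dicts from (h, k, l) to intensity (an assumed layout);
    only grid points present in both maps are used.
    """
    shared = sorted(set(map_a) & set(map_b))
    diff = {hkl: map_b[hkl] - map_a[hkl] for hkl in shared}
    sd_diff = pstdev([diff[hkl] for hkl in shared])
    ratio_a = sd_diff / pstdev([map_a[hkl] for hkl in shared])
    ratio_b = sd_diff / pstdev([map_b[hkl] for hkl in shared])
    return diff, ratio_a, ratio_b


# Toy values only, not simulation output.
ches = {(0, 0, 1): 1.0, (0, 1, 0): 3.0}
adpr = {(0, 0, 1): 3.0, (0, 1, 0): 1.0}
diff, ratio_ches, ratio_adpr = difference_strength(ches, adpr)
```

<p>A ratio near 1 would mean the difference map varies about as strongly as the input map itself. In practice one would probably apply this to the anisotropic component of each map rather than to the raw intensities.</p>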

<h2 id="its-a-trap">It’s a trap!</h2>

<p>Along the way we encountered a common pitfall in making these kinds of comparisons: due to an indexing ambiguity in the P4<sub>3</sub> space group, the structures of Mac1 used for simulations with and without ADPr were solved using different definitions of the lattice vectors, with the <em>h</em> and <em>k</em> axes swapped and the <em>l</em> axis reversed (compare PDB IDs <a href="https://www.rcsb.org/structure/7TX0">7TX0</a> and <a href="https://www.rcsb.org/structure/7TX3">7TX3</a>). The diffUSE modeling team worked out how to make the simulated diffuse maps consistent at our recent <a href="/posts/allhands/">all hands meeting</a>, enabling us to perform a controlled comparison of the simulations.</p>
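
<p>The reindexing itself is simple to state. The sketch below applies the mapping (h, k, l) → (k, h, −l) to a map stored as a dictionary keyed by Miller indices. The dictionary layout and the function name are assumptions made for illustration; this is not the code the modeling team used.</p>

```python
def reindex_swap_hk_flip_l(diffuse_map):
    """Return a copy of the map with each (h, k, l) moved to (k, h, -l).

    diffuse_map: dict from (h, k, l) integer tuples to intensity
    (an assumed layout, chosen to keep the example dependency-free).
    """
    return {(k, h, -l): value for (h, k, l), value in diffuse_map.items()}


# Toy example: a single grid point.
original = {(1, 2, 3): 5.0}
reindexed = reindex_swap_hk_flip_l(original)
```

<p>Applying the function twice returns the original map. Swapping two axes and reversing the third has determinant +1, so it is a proper rotation (a twofold about the <em>a</em>+<em>b</em> diagonal) and does not change the handedness of the axes.</p>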

<h2 id="what-next">What next?</h2>

<p>The next step is to compare both of these simulations with data recently collected at CHESS (see <a href="https://diffuse.science/logbook/beamtime/20251105-chess/">logbook</a>), in one of a series of diffUSE beam times that are expected to yield a large number of datasets. These runs have already revealed that diffuse data are <a href="/posts/allhands/">reproducible between CHESS and ALS beamlines</a>. Data from Mac1 +/- ADPr are now in the processing pipeline; we’re eager to see how Mac1 diffuse scattering changes upon ligand binding, and whether MD simulations can help explain what we see.</p>

<hr />

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name>Michael Wall</name><email>mewall00@gmail.com</email></author><category term="post" /><category term="diffuse scattering" /><category term="molecular dynamics" /><category term="modeling" /><category term="open science" /><category term="meta" /><summary type="html"><![CDATA[MD simulations of changes in diffuse scattering depending on ligand binding]]></summary></entry><entry><title type="html">DiffUSE January 2026 Retreat: From Coast to Coast, Diffuse Scattering Reproduces</title><link href="https://diffuse.science/posts/allhands/" rel="alternate" type="text/html" title="DiffUSE January 2026 Retreat: From Coast to Coast, Diffuse Scattering Reproduces" /><published>2026-02-02T00:00:00+00:00</published><updated>2026-02-02T00:00:00+00:00</updated><id>https://diffuse.science/posts/allhands</id><content type="html" xml:base="https://diffuse.science/posts/allhands/"><![CDATA[<div class="notice" style="font-style: italic;">
DiffUSE is a Radial Project by <a href="https://astera.org">Astera</a>. This initiative aims to make diffuse X-ray scattering a routine tool for understanding protein dynamics in basic biology and drug discovery.
</div>

<h2 id="why-this-retreat-mattered"><strong>Why This Retreat Mattered</strong></h2>

<p>In late January, the DiffUSE Project team gathered in person for our first progress meeting at Astera’s headquarters in Emeryville, California. Since our October online meeting, every team has made substantial progress.</p>

<p>The retreat brought together team members working on data collection, data processing, molecular dynamics simulations, machine learning modeling, infrastructure, and open science to assess progress against our six-month goals and chart the path forward.</p>

<p>Perhaps the most exciting development is a deceptively simple one: diffuse scattering data collected at CHESS (Cornell) and ALS (Berkeley) are reproducible. This cross-country validation marks a critical step toward making diffuse scattering a routine tool for structural biology.</p>

<h2 id="what-have-we-accomplished-since-october"><strong>What Have We Accomplished Since October?</strong></h2>

<h3 id="data-collection"><strong>Data Collection</strong></h3>

<p>Kara Zielinski (Fraser Lab, UCSF) reported on an intensive fall data collection campaign:</p>

<ul>
  <li><strong>9 beamtimes</strong> since project inception across two synchrotrons (CHESS and ALS)</li>
  <li><strong>18 participants</strong> contributed to data collection</li>
  <li><strong>7 protein systems</strong>: Mac1, NrdE, Lysozyme, DNA fibers, ATCase, Insulin, and Huwe1</li>
  <li><strong>129 “good” datasets</strong> collected (no data collection errors)</li>
</ul>

<p>The team systematically explored experimental perturbations:</p>

<ul>
  <li><strong>Temperature</strong>: Data collected at 100K (cryo), 220-275K (intermediate), and 310-315K (elevated), though sample handling for intermediate temperatures requires further optimization</li>
  <li><strong>Ligands</strong>: Mac1 + ADPr (11 datasets from CHESS and ALS combined) and Mac1 + small molecule “opener” (6 datasets from ALS)</li>
  <li><strong>Radiation damage mitigation</strong>: Vector scans implemented at CHESS to spread dose across radiation-sensitive samples like NrdE and ATCase</li>
</ul>

<p>Beamline-specific improvements included:</p>

<ul>
  <li><strong>ALS</strong>: Explored dose dependence, wavelength effects, and exposure time optimization; addressed collimator ring scatter issues at 14 keV</li>
  <li><strong>CHESS</strong>: Continued X-ray aperture optimization for background reduction</li>
</ul>

<p><img src="/assets/images/posts/2026-02-02/2026_crystals_spm_allhands.png" alt="2026_crystals_spm_allhands" title="Initial protein systems tested by the diffUSE team" />
<em>Slide shared at the DiffUSE project’s retreat showcasing crystals of initially tested protein systems.</em></p>

<h3 id="data-processing"><strong>Data Processing</strong></h3>

<p>Steve Meisburger (Cornell/CHESS) presented major advances in data processing tools and a landmark reproducibility result.</p>

<p><strong><a href="https://github.com/diff-use/mdx2">mdx2</a></strong> is an open-source software package for processing and analyzing diffuse X-ray scattering data. Development has accelerated with a new team (Steve Meisburger, Justin Biel, Joseph Lee) and modern development practices, including version control, issue tracking, and code review. Version 10.3 was released in December 2025 with:</p>

<ul>
  <li>Containerized deployment via <code class="language-plaintext highlighter-rouge">conda install -c conda-forge mdx2</code></li>
  <li>Jupyter Lab environment integration</li>
  <li>Live processing capability on Voltage Park during beam times</li>
</ul>

<p><strong>Reference datasets from CHESS</strong> now span multiple systems (Mac1, NrdE, DNA, ATCase, Insulin) with systematic tracking through integration, merging, and fine map generation stages.</p>

<p><strong>The headline result</strong>: Diffuse scattering is reproducible between CHESS and ALS. Side-by-side comparisons of Mac1 diffuse maps from both beamlines show consistent features, validating that the signal is robust across different detector systems, beam profiles, and facilities. This East-meets-West reproducibility is foundational for any future multi-site data collection campaigns.</p>

<p><img src="/assets/images/posts/2026-02-02/reproducible_ds_allhands.png" alt="Scattering comparison across beamlines" title="DiffUSE Scattering is reproducible across coasts!" />
<em>Slides shared at the DiffUSE project’s retreat showcasing reproducibility across beamlines.</em></p>

<p>Additional findings from DNA crystal analysis revealed that correlated disorder differs between room temperature and 100K conditions, even when the static structures appear similar—and that diffuse signal extends beyond the Bragg resolution limit, suggesting untapped information content.</p>

<p>A <strong>Galaxy platform prototype</strong> was demonstrated, pointing toward a vision of “Cryosparc for diffuse,” making diffuse data processing accessible through a GUI with integrated workflows and interactive visualizations.</p>

<h3 id="molecular-dynamics-simulations"><strong>Molecular Dynamics Simulations</strong></h3>

<p>Mike Wall presented substantial progress on crystallographic MD simulations.</p>

<p><strong>Apo Mac1 baseline results</strong> show exceptional agreement between simulation and experiment:</p>

<ul>
  <li>Total correlation coefficient: CC = 0.96</li>
  <li>Anisotropic correlation coefficient: CC = 0.56</li>
  <li>Simulation: 2×2×2 supercell with OPC3 waters (279,004 atoms), neutron crystal structure 7TX3, 1100 ns unrestrained trajectory</li>
</ul>
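
<p>The gap between the two coefficients above is easier to see with a small example. The sketch below assumes that the anisotropic component is what remains after subtracting the mean intensity of each resolution shell, and that every grid point already carries a shell label. The function names, the shell labels, and the toy numbers are all hypothetical; this is not mdx2 code.</p>

```python
from collections import defaultdict
from math import sqrt


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def anisotropic_part(intensities, shells):
    """Subtract from each point the mean intensity of its resolution shell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, shell in zip(intensities, shells):
        sums[shell] += value
        counts[shell] += 1
    return [v - sums[s] / counts[s] for v, s in zip(intensities, shells)]


# Toy data: two shells, two points each (not real intensities).
shells = [0, 0, 1, 1]
simulated = [10.0, 12.0, 20.0, 22.0]
measured = [11.0, 9.0, 21.0, 19.0]

cc_total = pearson_cc(simulated, measured)
cc_aniso = pearson_cc(anisotropic_part(simulated, shells),
                      anisotropic_part(measured, shells))
```

<p>In this toy case the total CC is high because both maps share the same shell-to-shell trend, while the anisotropic CC is negative because the within-shell variations are opposite. This is why the anisotropic coefficient is the more demanding test.</p>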

<p><strong>MD optimization methods</strong> are advancing on two fronts:</p>

<ul>
  <li><strong>Enrichment</strong>: Selectively removing MD frames to increase diffuse correlation</li>
  <li><strong>Reweighting</strong>: Using JAX to optimize frame weights via differentiable Pearson CC maximization. Initial test on experimental diffuse data achieved CC = 0.97 with 47,150 reflections to 3.5 Å resolution (<a href="https://github.com/diff-use/sampleworks">work</a> by Karson Chrispens, documented in a <a href="https://diffuse.science/posts/jax_refine/">DiffUSE blog post</a>)</li>
</ul>
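
<p>To give a flavor of the enrichment idea, here is a minimal sketch of a greedy frame-removal loop. It makes a strong simplifying assumption: each frame contributes a vector of intensities and the model map is their plain average. The first-improvement greedy search, the function names, and the <code class="language-plaintext highlighter-rouge">min_frames</code> guard are illustrative choices, not a description of the method actually used. The reweighting approach, which relies on JAX automatic differentiation, is not shown.</p>

```python
from math import sqrt


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_map(frames, keep):
    """Average the per-frame intensity vectors of the kept frames."""
    return [sum(frames[i][j] for i in keep) / len(keep)
            for j in range(len(frames[0]))]


def enrich(frames, target, min_frames=1):
    """Drop frames one at a time while doing so raises the CC to the target."""
    keep = list(range(len(frames)))
    best = pearson_cc(mean_map(frames, keep), target)
    improved = True
    while improved and len(keep) > min_frames:
        improved = False
        for i in list(keep):
            trial = [j for j in keep if j != i]
            cc = pearson_cc(mean_map(frames, trial), target)
            if cc > best + 1e-12:  # accept the first removal that helps
                keep, best, improved = trial, cc, True
                break
    return keep, best


# Toy data: three "frames" of three intensities each.
frames = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.5], [3.0, 1.0, 2.0]]
target = [1.0, 2.0, 3.0]
kept, cc = enrich(frames, target, min_frames=2)
```

<p>The <code class="language-plaintext highlighter-rouge">min_frames</code> guard matters: without it, a greedy search like this can shrink the ensemble to a single frame that happens to match the target, which is overfitting rather than enrichment.</p>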

<p><strong>Ligand perturbations</strong> are now being simulated: Mac1 + ADPr shows distinct diffuse patterns compared to baseline Mac1, with protonation state variations (ASP157 → ASH157) under investigation.</p>

<p><strong>Second system</strong>: Dihydrofolate reductase (DHFR, PDB: 7FPV) is being developed as a generalization target, expanding beyond the Mac1 test case.</p>

<p><strong>Simulated diffraction</strong> capabilities using nanoBragg (James Holton) enable validation of data processing pipelines—simulated diffuse intensity can be extracted using mdx2, closing the loop between simulation and experiment.</p>

<p><img src="/assets/images/posts/2026-02-02/DHFR_allhands.png" alt="Simulated diffuse intensity of PDB 7FPV" title="A new protein system is being tested for simulated diffuse intensity, PDB 7FPV" />
<em>Slides shared at the DiffUSE project’s retreat showcasing simulation results of a new protein system, Dihydrofolate reductase (DHFR, PDB: 7FPV).</em></p>

<h3 id="machine-learning-modeling"><strong>Machine Learning Modeling</strong></h3>

<p>Marcus Collins presented the ML modeling roadmap focused on using experimental data to reveal hidden protein conformations.</p>

<p><strong>Key insight</strong>: Current ML structure predictors (AlphaFold3-like models, including Boltz-2, Protenix, RF3) do not reliably predict alternate conformations (altlocs) even with multiple random seeds, indicating they have not learned about underlying ensembles. This gap motivates developing density-guided ensemble generation (Sampleworks).</p>

<p><strong>Density guidance approach</strong>: The team is implementing training-free guidance from experimental density maps (2Fo-Fc), using the difference between experimental and calculated maps to steer diffusion model sampling toward conformations consistent with crystallographic data. Early results are promising but mixed: Boltz-2 with density guidance can capture both altlocs in some test cases like PTP1B (6B8X), though performance varies across systems.</p>

<p><strong>Sampleworks pipeline</strong> is being built as a plug-and-play guidance framework to use different structure prediction models, experimental data, and guidance strategies.</p>

<ul>
  <li>Model wrappers implemented for RF3, Protenix, Boltz-1, and Boltz-2 (MD and X-ray modes)</li>
  <li>Initial test set of ~50 structures from PDB prepared with altlocs; electron density maps being generated</li>
  <li>Evaluation metrics: RSCC, LDDT, clash scores, backbone and sidechain geometry</li>
</ul>

<p><strong>Water modeling</strong> emerges as a critical challenge for advancing to reciprocal space. Our first attempt is to improve the modeling of explicit solvent. Current models achieve ~0.3 precision/recall at 0.5 Å—insufficient for improving Rwork/Rfree. The team is exploring flow-matching approaches and evaluating whether a single unified model or separate protein/water models will be more effective. Ordered waters coupled to protein altlocs are particularly important targets.</p>
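
<p>For readers unfamiliar with this metric, the sketch below shows one way precision and recall at a distance cutoff could be computed. It assumes greedy one-to-one matching of predicted waters to reference waters within 0.5 Å. The matching rule, function name, and coordinates are hypothetical, and the team’s evaluation code may define the match differently.</p>

```python
from math import dist


def water_precision_recall(predicted, reference, cutoff=0.5):
    """Greedy one-to-one matching of predicted to reference water positions.

    predicted, reference: lists of (x, y, z) coordinates in angstroms.
    A prediction counts as a hit if an unmatched reference water lies
    within the cutoff; each reference water can be matched only once.
    """
    unmatched = list(reference)
    hits = 0
    for p in predicted:
        near = [r for r in unmatched if cutoff >= dist(p, r)]
        if near:
            unmatched.remove(min(near, key=lambda r: dist(p, r)))
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy coordinates, not real waters.
predicted = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, 0.0)]
reference = [(0.3, 0.0, 0.0), (5.0, 5.0, 5.6), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
precision, recall = water_precision_recall(predicted, reference)
```

<p>Here only the first predicted water lies within 0.5 Å of a reference water, so one of three predictions is a hit and one of four reference waters is recovered.</p>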

<h3 id="infrastructure-and-publishing"><strong>Infrastructure and Publishing</strong></h3>

<p>Justin Biel presented the computational infrastructure supporting DiffUSE, built around a three-pillar model: Data, Compute, and Publishing.</p>

<p><strong>Compute Infrastructure</strong> uses Voltage Park as the backbone:</p>

<ul>
  <li>H100 SXM5 GPUs available via bare metal (8× GPU configurations)</li>
  <li>Two usage patterns supported:
    <ul>
      <li><strong>Workspaces</strong>: Interactive environments for experimental work, debugging, and visualization</li>
      <li><strong>Workflows</strong>: Hardened, scalable pipelines for production analysis</li>
    </ul>
  </li>
  <li>The DiffUSE web app now provides resource checkout, visibility into running resources, and SSH/Jupyter access</li>
  <li>Custom container management enables workspace pausing and environment customization</li>
  <li>Workflow orchestration via Prefect and Docker</li>
</ul>

<p><strong>Data Infrastructure</strong> centers on the DiffUSE web app:</p>

<ul>
  <li><strong>Storage</strong>: Core Backblaze storage (S3-compatible) with OSN bucket integration for beamline data</li>
  <li><strong>Access</strong>: Automatic mounting to Voltage Park resources, plus web app download, CLI, Python SDK, and API</li>
  <li><strong>Metadata</strong>: Experiments have artifacts, optional markdown content (like logbook entries), relationships to other experiments, and tags</li>
  <li><strong>Automation</strong>: Beam-trip data automatically triggers experiment registration; dataset files populate metadata fields</li>
  <li><strong>Governance</strong>: Standards compliance checking, staging-to-public workflows, DOI attachment decisions</li>
</ul>

<p><strong>Publishing workflow</strong> discussions focused on:</p>

<ul>
  <li>When to stage data privately vs. make everything open immediately</li>
  <li>When to attach DOIs (content should be largely immutable)</li>
  <li>External database destinations: SBGrid Databank, PDB, Zenodo</li>
</ul>

<h3 id="open-science"><strong>Open Science</strong></h3>

<p>Prachee Avasthi (Head of Open Science, Astera) led a discussion on publishing expectations and open science practices. <strong>Discussion</strong> explored barriers to sharing, evidence of downstream reuse, orphan artifacts without ideal homes, and prioritization of unaddressed data sharing issues.</p>

<h2 id="reflections-on-our-distributed-model"><strong>Reflections on Our Distributed Model</strong></h2>

<p>This retreat underscored how DiffUSE’s distributed structure works. By embedding team members across institutions (Cornell, UCSF, Berkeley Lab, and beyond) we maintain direct access to beamlines, computational expertise, and scientific communities that would be impossible to replicate in a single location. The “Diffuse East ≈ Diffuse West” result is itself a product of this model: data collected by different teams at facilities 2,500 miles apart, processed with shared tools, yielding consistent results. Our infrastructure investments (the DiffUSE web app, Voltage Park compute, standardized containerized environments) bridge the geographic gaps, allowing a scientist at Cornell to spin up the same analysis environment as a colleague in California.</p>

<p>The in-person retreat revealed how much asynchronous collaboration had already accomplished: sessions focused on integration and next steps rather than on catching people up. Open science practices (shared logbooks, blog posts, open repositories) keep everyone aligned between meetings. The challenge ahead is scaling this approach: as we add systems, datasets, and collaborators, maintaining the coherence that makes distributed work effective will require continued investment in documentation, automation, and the human connections that make a dispersed team feel like one group working toward a shared goal.</p>

<p>The science described here represents the output of a significant and coordinated resource investment. Since DiffUSE’s start in July, Astera has committed <span>$3.2M</span> to stand up the project: <span>$2.63M</span> in research grants distributed directly to our partner labs at Fraser Lab/LBL, Ando Lab/CHESS, and Wankowicz Lab, <span>$567K</span> in Astera personnel and contractor support, and <span>$30K</span> in computational infrastructure. On top of this, CHESS contributed an estimated <span>$700K</span> in beamtime, bringing the total resource investment to roughly <span>$3.9M</span>. Looking ahead, an additional <span>$2.4M</span> is projected for 2026 as the project scales toward its core scientific goals.</p>

<hr />

<p><img src="/assets/images/posts/2026-02-02/diffuse_demo_datamanagement.png" alt="The DiffUSE App is currently under development" title="the DiffUSE App, currently under development" />
<em>A screenshot from our data management infrastructure, demonstrated at the retreat. This is in active development with Prophet Town and Voltage Park.</em></p>

<p><img src="/assets/images/posts/2026-02-02/2026_allhands_pres.png" alt="Mike Wall presents progress on MD optimization of diffuse scattering" title="Mike Wall presents progress on MD optimization of diffuse scattering" />
<em>Mike Wall presents progress on MD optimization of diffuse scattering to a full house at the Astera Institute.</em></p>

<hr />

<h2 id="whats-next"><strong>What’s Next?</strong></h2>

<h3 id="data-collection-3-month-goals"><strong>Data Collection (3-month goals)</strong></h3>

<ul>
  <li>Collect data on additional systems; collect lysozyme at ALS</li>
  <li>Optimize sample handling for intermediate temperatures (oil-based approaches)</li>
  <li>Explore serial crystallography approaches (chip types, small wedges, crystal size variation)</li>
  <li>Continue investigating cryo options (traditional, NANUQ, high-pressure cryocooling)</li>
</ul>

<h3 id="data-processing-2026-goals"><strong>Data Processing (2026 goals)</strong></h3>

<ul>
  <li>Improve mdx2 performance (~2× speedup)</li>
  <li>Implement GOODVIBES and DISCOBALL in Python (JAX)</li>
  <li>Fully explore serial crystallography processing</li>
  <li>Deploy on Ando lab Galaxy server; add mdx2 tools</li>
  <li>Develop “Cryosparc for diffuse” project roadmap</li>
</ul>

<h3 id="md-simulations"><strong>MD Simulations</strong></h3>

<ul>
  <li>Continue model/data comparisons and refine MD models (protonation states, parameterization)</li>
  <li>Expand to new systems and additional ligand/mutation perturbations</li>
  <li>Explore how MD optimizations can support other DiffUSE activities (ML modeling, diffraction image simulation, data processing validation)</li>
</ul>

<h3 id="ml-modeling"><strong>ML Modeling</strong></h3>

<ul>
  <li>Scale up Sampleworks evaluation across initial test set</li>
  <li>Improve water prediction models (retrain SuperWater with better data, explore flow matching vs. diffusion)</li>
  <li>Quantify water model precision requirements by systematically perturbing well-supported waters</li>
  <li>Progress toward reciprocal space/Bragg peak guidance, ultimately targeting diffuse data guidance</li>
</ul>

<h3 id="infrastructure"><strong>Infrastructure</strong></h3>

<ul>
  <li>Finalize containerized workspace management with pause/resume capability</li>
  <li>Expand workflow orchestration options</li>
  <li>Refine data governance workflows for staging → public → external database publication</li>
</ul>

<h3 id="open-science-1"><strong>Open Science</strong></h3>

<ul>
  <li>Address identified barriers to sharing</li>
  <li>Establish timelines for DOI attachment and external database deposition</li>
  <li>Continue documentation through blog posts and logbooks</li>
</ul>

<p>Special thanks to Astera for hosting the retreat in Emeryville.</p>

<hr />

<h2 id="glossary"><strong>Glossary</strong></h2>

<table>
  <tr>
   <td><strong>Acronym</strong>
   </td>
   <td><strong>Definition</strong>
   </td>
  </tr>
  <tr>
   <td>ADPr
   </td>
   <td>Adenosine diphosphate ribose (a ligand)
   </td>
  </tr>
  <tr>
   <td>ALS
   </td>
   <td>Advanced Light Source (synchrotron at Lawrence Berkeley National Laboratory)
   </td>
  </tr>
  <tr>
   <td>API
   </td>
   <td>Application Programming Interface
   </td>
  </tr>
  <tr>
   <td>ASH
   </td>
   <td>Protonated aspartic acid residue
   </td>
  </tr>
  <tr>
   <td>ASP
   </td>
   <td>Aspartic acid residue
   </td>
  </tr>
  <tr>
   <td>ATCase
   </td>
   <td>Aspartate Transcarbamylase (enzyme)
   </td>
  </tr>
  <tr>
   <td>CC
   </td>
   <td>Correlation Coefficient
   </td>
  </tr>
  <tr>
   <td>CHESS
   </td>
   <td>Cornell High Energy Synchrotron Source
   </td>
  </tr>
  <tr>
   <td>CLI
   </td>
   <td>Command Line Interface
   </td>
  </tr>
  <tr>
   <td>DHFR
   </td>
   <td>Dihydrofolate Reductase (enzyme)
   </td>
  </tr>
  <tr>
   <td>DOI
   </td>
   <td>Digital Object Identifier
   </td>
  </tr>
  <tr>
   <td>GPU
   </td>
   <td>Graphics Processing Unit
   </td>
  </tr>
  <tr>
   <td>GUI
   </td>
   <td>Graphical User Interface
   </td>
  </tr>
  <tr>
   <td>JAX
   </td>
   <td>Just After eXecution (Google's autodiff/ML library for Python)
   </td>
  </tr>
  <tr>
   <td>keV
   </td>
   <td>Kiloelectronvolt (unit of X-ray energy)
   </td>
  </tr>
  <tr>
   <td>LDDT
   </td>
   <td>Local Distance Difference Test (structure quality metric)
   </td>
  </tr>
  <tr>
   <td>Mac1
   </td>
   <td>Macrodomain 1 (SARS-CoV-2 nonstructural protein 3)
   </td>
  </tr>
  <tr>
   <td>MD
   </td>
   <td>Molecular Dynamics
   </td>
  </tr>
  <tr>
   <td>ML
   </td>
   <td>Machine Learning
   </td>
  </tr>
  <tr>
   <td>NrdE
   </td>
   <td>Ribonucleotide Reductase class Ib alpha subunit (enzyme)
   </td>
  </tr>
  <tr>
   <td>ns
   </td>
   <td>Nanoseconds
   </td>
  </tr>
  <tr>
   <td>OPC3
   </td>
   <td>Optimal Point Charge 3-point water model
   </td>
  </tr>
  <tr>
   <td>OSN
   </td>
   <td>Open Storage Network
   </td>
  </tr>
  <tr>
   <td>PDB
   </td>
   <td>Protein Data Bank
   </td>
  </tr>
  <tr>
   <td>PTP1B
   </td>
   <td>Protein Tyrosine Phosphatase 1B (enzyme)
   </td>
  </tr>
  <tr>
   <td>RF3
   </td>
   <td>RoseTTAFold 3 (structure prediction model)
   </td>
  </tr>
  <tr>
   <td>Rfree
   </td>
   <td>Free R-factor (crystallographic validation metric)
   </td>
  </tr>
  <tr>
   <td>Rwork
   </td>
   <td>Working R-factor (crystallographic refinement metric)
   </td>
  </tr>
  <tr>
   <td>RSCC
   </td>
   <td>Real Space Correlation Coefficient
   </td>
  </tr>
  <tr>
   <td>S3
   </td>
   <td>Simple Storage Service (cloud storage protocol)
   </td>
  </tr>
  <tr>
   <td>SBGrid
   </td>
   <td>Structural Biology Software Grid (consortium)
   </td>
  </tr>
  <tr>
   <td>SDK
   </td>
   <td>Software Development Kit
   </td>
  </tr>
  <tr>
   <td>SSH
   </td>
   <td>Secure Shell (network protocol)
   </td>
  </tr>
  <tr>
   <td>UCSF
   </td>
   <td>University of California, San Francisco
   </td>
  </tr>
</table>

<hr />

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name></name></author><category term="posts" /><category term="meta" /><summary type="html"><![CDATA[A report on our January 2026 all-hands meeting]]></summary></entry><entry><title type="html">In the Cloud</title><link href="https://diffuse.science/post/in_the_cloud/" rel="alternate" type="text/html" title="In the Cloud" /><published>2025-11-15T00:00:00+00:00</published><updated>2025-11-15T00:00:00+00:00</updated><id>https://diffuse.science/post/in_the_cloud</id><content type="html" xml:base="https://diffuse.science/post/in_the_cloud/"><![CDATA[<figure class="half ">
  
    
      <a href="/assets/images/posts/Clouds.jpg" title="Cloudy diffuse features in the sky">
          <img src="/assets/images/posts/Clouds.jpg" alt="Cloudy diffuse features in the sky" />
      </a>
    
  
    
      <a href="/assets/images/posts/DiffuseClouds.png" title="MD simulation of cloudy diffuse features">
          <img src="/assets/images/posts/DiffuseClouds.png" alt="MD simulation of cloudy diffuse features" />
      </a>
    
  
  
    <figcaption>(Left) Cloudy diffuse features in the sky. (Right) MD simulation of cloudy diffuse features.
</figcaption>
  
</figure>

<h2 id="diffuse-scattering-in-the-cloud">Diffuse Scattering in the Cloud</h2>

<p>While out on a walk, as I looked up at the sky, a certain cloud formation (above left) reminded me of the <em>l</em> = 0 slice through the MD simulation of Mac1 diffuse scattering (above right). That got me thinking about the next steps for the diffUSE MD simulations (which are, of course, being performed using <a href="https://www.voltagepark.com">cloud computing resources</a>).</p>

<p>As described in the <a href="/posts/allhands/">Quarterly All Hands Meeting</a> post, we recently shared our short-term plans for the various components of the diffUSE project. We’ve already performed baseline comparisons of crystalline MD simulations of Nsp3 macrodomain (Mac1) to diffuse scattering data (see the <a href="/post/3-2-1-contact/">3-2-1 Contact</a> post). Now we want to improve the models and see what happens in the simulations when we make changes. With this in mind, we’re planning to: (1) improve the current MD model of Mac1; (2) simulate Mac1 crystals under different conditions; and (3) develop a model of a new system.</p>

<p>Thinking about (2), I contacted James Fraser to chat about what to do for the next MD simulations of Mac1. We decided to look at Mac1 in complex with ADP-ribose. This choice is timely, as Kara Zielinski just collected diffUSE diffraction data from crystals of this complex at the Cornell High-Energy Synchrotron Source (CHESS), during a recent trip from UCSF to Nozomi Ando’s lab at Cornell.</p>

<p>What will the MD simulation of diffuse scattering from crystals of Mac1 in complex with ADPr look like? Probably a lot like the ones we’ve done already, with some small changes. We’re planning to analyze the differences and find out what happens to the dynamics when different ligands bind. But we don’t really know yet what we’ll see. These moments of suspense are very common in science, but they’re absent from the stories we usually tell in the literature. The open science model we’re using on the diffUSE project enables us to document these periods of uncertainty as a part of the public narrative of the project. It feels kind of liberating.</p>

<hr />

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name>Michael Wall</name><email>mewall00@gmail.com</email></author><category term="post" /><category term="diffuse scattering" /><category term="molecular dynamics" /><category term="clouds" /><category term="planning" /><category term="open science" /><category term="meta" /><summary type="html"><![CDATA[Next steps in diffUSE MD simulations]]></summary></entry><entry><title type="html">Quarterly All-Hands Meeting Summary</title><link href="https://diffuse.science/posts/allhands/" rel="alternate" type="text/html" title="Quarterly All-Hands Meeting Summary" /><published>2025-11-10T00:00:00+00:00</published><updated>2025-11-10T00:00:00+00:00</updated><id>https://diffuse.science/posts/allhands</id><content type="html" xml:base="https://diffuse.science/posts/allhands/"><![CDATA[<h2 id="why-this-quarter-mattered"><strong>Why this quarter mattered</strong></h2>

<p>We hadn’t all gathered since our June kick-off meeting, so in mid-October we met (online) with all members of the diffUSE project to discuss our overall goals for each project team, progress made over the first few months, and goals for the next three months. We emphasized how the different pieces of the project integrate to build methods, data, models, and encodings so the community can routinely use diffuse scattering in basic biology and drug discovery.</p>

<h2 id="what-have-we-completed-in-the-first-few-months"><strong>What have we completed in the first few months?</strong></h2>

<h3 id="data-collection"><strong>Data collection:</strong></h3>

<ul>
  <li>We have collected ambient-temperature datasets from CHESS for <a href="https://diffuse.science/logbook/beamtime/20251008-chess/">lysozyme</a>, <a href="https://diffuse.science/logbook/beamtime/20251015-chess/">macrodomain</a>, <a href="https://diffuse.science/logbook/beamtime/20250924-chess/">NrdE</a>, and <a href="https://diffuse.science/logbook/beamtime/20251015-chess/">DNA fibers</a>.</li>
  <li>We have collected data at ALS on <a href="https://diffuse.science/logbook/beamtime/20250701-als/">Mac1</a> and <a href="https://diffuse.science/logbook/beamtime/20251015-17-als/">Huwe1</a> using humidity boxes and watershed sleeves with controlled transmission and beam size.</li>
  <li>We have played around with temperature modulation to probe dose dependence and mosaicity effects with <a href="https://diffuse.science/diffuse-shipping/">samples shipped</a> from UCSF to Cornell.</li>
  <li>We implemented standardized background frames, uniform sleeve lengths, and precise humidity control to enhance map quality and cross-beamline comparability with an eye toward <a href="https://diffuse.science/posts/windows/">future</a> multi-site data collection campaigns.</li>
  <li>We have documented all collection procedures on the <a href="https://diffuse.science/logbook/beamtime/">diffUSE website logbooks</a>.</li>
</ul>

<h3 id="data-processing"><strong>Data processing:</strong></h3>

<ul>
  <li>
    <p>Developing xia2.multiplex for automated data merging, <em>mdx2</em> for data extraction, and comprehensive data quality control (QC) workflows.</p>
  </li>
  <li>
    <p>Building graphical user interfaces (GUIs) for <em>mdx2</em> to improve usability and accessibility.</p>
  </li>
  <li>
    <p><a href="https://diffuse.science/next-steps-macrodomain/">Identifying and resolving</a> bugs that arise when multiple users concurrently process the same datasets.</p>
  </li>
  <li>
    <p><a href="https://diffuse.science/posts/jax_refine/">Implementing differentiable refinement</a> by treating molecular dynamics (MD) frame weights as trainable parameters in a Pearson correlation–based objective function.</p>
  </li>
</ul>

<h3 id="machine-learning-modeling"><strong>Machine Learning Modeling:</strong></h3>

<ul>
  <li>Developing <a href="https://diffuse.science/posts/modeling/">pipeline scaffolds</a> to integrate experimental structural data directly into generative model training and evaluation.</li>
  <li>Creating quantitative metrics for assessing and benchmarking ensemble data.</li>
  <li>Building a generative water model that learns to predict water molecule positions from protein structure, designed for future integration into broader generative modeling frameworks.</li>
</ul>

<h3 id="simulations"><strong>Simulations:</strong></h3>

<ul>
  <li>We <a href="https://diffuse.science/post/3-2-1-contact/">simulated</a> a Mac1 2×2×2 supercell with OPC3 waters and 279,004 atoms; the simulation reaches 150 ns per day on Voltage Park. With refined masking and resampling, total CC is 0.96 and anisotropic CC is 0.56 on the H8 dataset, which sets a clear target for larger supercells and ligand or mutant comparisons.</li>
  <li>Taylor completed his rotation developing a <a href="https://diffuse.science/posts/diffuse_rotation/">simulator</a>.</li>
</ul>

<h3 id="encoding"><strong>Encoding:</strong></h3>

<ul>
  <li>We continue to <a href="https://diffuse.science/posts/encoding/">advocate</a> for conformational and compositional heterogeneity-encoding strategies.</li>
  <li>We have developed a <a href="https://diffuse.science/posts/multi_to_ens/">script</a> to translate between encodings for multiconformer and ensemble representations.</li>
  <li>We are working on developing a standalone script and a COOT integration script for conformational heterogeneity.</li>
</ul>

<h3 id="infrastructure-and-open-science"><strong>Infrastructure and Open Science:</strong></h3>

<ul>
  <li>We have access to Voltage Park compute and S3 storage via the command line, which will make sharing maps and models easier.</li>
  <li><a href="https://diffuse.science/posts/">16 blog posts</a> and 6 beamtime <a href="https://diffuse.science/logbook/">logbooks</a> to date!</li>
</ul>

<h2 id="what-is-up-for-the-next-3-months"><strong>What is up for the next 3 months?</strong></h2>

<p><strong>Data Collection:</strong> Catalog all existing data, fill gaps, complete background series at ALS, finalize hardened collection procedures that travel across LBL and CHESS, and post collection reports on the site. Develop shared checklists to coordinate and standardize future data-collection cycles.</p>

<p><strong>Data Processing:</strong> Converge on a single, documented workflow, generate preliminary maps for all ALS and CHESS datasets, produce fine maps for GOODVIBES and DISCOBALL, stand up a CHESS 2026-1 pipeline, and publish processing reports on the site.</p>

<p><strong>Modeling:</strong> Ship an initial pipeline that accepts maps for guided sampling.</p>

<p><strong>Encoding:</strong> Land final working group approval, publish the schema and examples, and connect the web app to our catalog so processed maps, models, and metadata are searchable and shareable.</p>

<p><strong>Infrastructure and sharing science</strong>: More blog posts!</p>

<p>We will meet in the Bay Area in January and report back more after that.</p>]]></content><author><name>James Fraser</name><email>jfraser@fraserlab.com</email></author><category term="posts" /><category term="meta" /><summary type="html"><![CDATA[A report on our October 2025 all-hands meeting]]></summary></entry><entry><title type="html">3-2-1 Contact: Comparing MD simulations with diffuse data</title><link href="https://diffuse.science/post/3-2-1-contact/" rel="alternate" type="text/html" title="3-2-1 Contact: Comparing MD simulations with diffuse data" /><published>2025-10-20T00:00:00+00:00</published><updated>2025-10-20T00:00:00+00:00</updated><id>https://diffuse.science/post/3-2-1-contact</id><content type="html" xml:base="https://diffuse.science/post/3-2-1-contact/"><![CDATA[<h2 id="overview">Overview</h2>

<figure class="third ">
  
    
      <a href="/assets/images/posts/h8_md_view.png" title="Slice through the MD simulated diffuse map">
          <img src="/assets/images/posts/h8_md_view.png" alt="Slice through the MD simulated diffuse map" />
      </a>
    
  
    
      <a href="/assets/images/posts/h8_initial_processing.png" title="Slice through the diffuse data with initial processing">
          <img src="/assets/images/posts/h8_initial_processing.png" alt="Slice through the diffuse data with initial processing" />
      </a>
    
  
    
      <a href="/assets/images/posts/h8_final_processing.png" title="Slice through the diffuse data with final processing">
          <img src="/assets/images/posts/h8_final_processing.png" alt="Slice through the diffuse data with final processing" />
      </a>
    
  
  
    <figcaption>First diffUSE project comparisons of MD simulations with diffuse data. Slices through the anisotropic diffuse map are visualized in the <em>l</em> = 0 plane. (Left) MD simulation. (Center) Data with initial processing. (Right) Data with refined processing.
</figcaption>
  
</figure>

<h2 id="overview-1">Overview</h2>

<p>This post describes the <strong>first systematic comparisons</strong> between molecular dynamics (MD)–derived diffuse scattering and experimental measurements on the diffUSE project. 
These early analyses establish a baseline for evaluating how well MD simulations reproduce the observed isotropic and anisotropic features of the data. They also suggest possible improvements in the way we compare MD simulations to diffuse data.</p>

<hr />

<h2 id="initial-comparisons">Initial Comparisons</h2>

<p>Data from the first diffUSE experiments were used for comparisons (see <a href="/posts/wetfeet/">Getting our feet wet</a> post). We used the <strong>H8 dataset</strong>, which was collected at a medium radiation dose (see <a href="https://diffuse.science/logbook/20250624-als831-macrodomain-analysis/">preliminary analysis of the diffuse scattering</a>).</p>

<p>The H8 dataset was originally sampled on a grid using <strong>2×2×4</strong> points per integer <em>hkl</em> value. To align with the output of the 2×2×2 supercell MD simulation, it was resampled onto a <strong>2×2×2</strong> grid for direct comparison.</p>

<p>This initial processing yielded a correlation with the MD simulation of 0.88 for the total diffuse intensity, and a correlation of 0.32 for just the anisotropic component.</p>

<hr />

<h2 id="refinement-and-improvements">Refinement and Improvements</h2>

<p>The MD simulations were performed using a 2×2×2 supercell, which is too small to include contributions from long-range correlations in the lattice. We therefore wondered whether the agreement of the MD with the diffuse data might be diminished in the immediate neighborhood of the Bragg peak, which is associated with lattice vibrations (<a href="https://doi.org/10.1038/s41467-023-36734-3">GOODVIBES</a> is specifically designed to accurately model this part of the signal). We also noticed certain outlier intensity values in the data (specifically, negative values), and wished to avoid using those in comparisons.</p>

<p>Another round of data processing was performed with these ideas in mind. First, Steve re-processed the diffraction images, sampling more finely onto a <strong>4×4×4</strong> grid. The intensities at integer <em>hkl</em>, corresponding to the immediate neighborhood of the Bragg peaks, were then masked out prior to downsampling to 2×2×2. Negative intensities were additionally masked out.</p>

<p>The revised processing improved the agreement with the MD substantially:</p>

<ul>
  <li><strong>Total correlation:</strong> 0.96</li>
  <li><strong>Anisotropic correlation:</strong> 0.56</li>
</ul>

<hr />

<h2 id="outlook">Outlook</h2>

<p>We now have our first assessment of the agreement of MD simulations with diffuse data collected on the diffUSE project. This is our baseline for improving the MD models. We also identified specific regions of the diffuse map – close to the Bragg peaks – where the MD might be currently lacking. These regions of the model might be improved by using a larger supercell for the MD.</p>

<hr />

<p><em>This post was initially drafted in ChatGPT based on a Slack exchange between Steve Meisburger and Michael Wall, and was rewritten and posted by Michael Wall on October 20, 2025.</em></p>

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name>Michael Wall</name><email>mewall00@gmail.com</email></author><category term="post" /><category term="diffuse scattering" /><category term="molecular dynamics" /><category term="data processing" /><category term="mdx2" /><category term="meta" /><summary type="html"><![CDATA[Insights from the first comparisons of MD simulations to diffuse data]]></summary></entry><entry><title type="html">Ensemble &amp;lt;-&amp;gt; Multiconformer Model Conversion</title><link href="https://diffuse.science/posts/multi_to_ens/" rel="alternate" type="text/html" title="Ensemble &amp;lt;-&amp;gt; Multiconformer Model Conversion" /><published>2025-10-17T00:00:00+00:00</published><updated>2025-10-17T00:00:00+00:00</updated><id>https://diffuse.science/posts/multi_to_ens</id><content type="html" xml:base="https://diffuse.science/posts/multi_to_ens/"><![CDATA[<h2 id="representing-conformational-heterogeneity">Representing conformational heterogeneity</h2>
<p>Our structural biology techniques capture an enormous amount of conformational heterogeneity that is often lost in the transition from experimental data to deposited models. Part of this loss stems from a lack of sufficiently sophisticated algorithmic methods, which is an active area of development in this project and elsewhere. Still, an equally important factor is how we choose to encode structural heterogeneity in the models themselves.</p>

<p>In the majority of structures deposited in the Protein Data Bank (PDB), conformational heterogeneity is represented only in a harmonic sense, through atomic displacement parameters (B-factors) or translation–libration–screw (TLS) parameters. These parameters can be incorporated into a single structural model, describing the amplitude and anisotropy of atomic fluctuations around a mean position. However, they do not encode anharmonic or discrete conformational variability. To capture non-harmonic conformational heterogeneity, models have emerged that explicitly include multiple atomic coordinate sets. Two dominant strategies have emerged from X-ray crystallography and cryo-EM: multiconformer models and multi-model (ensemble) models[1].</p>

<p>A multiconformer model represents conformational diversity locally, without duplicating the entire macromolecule. When a region of the electron density is well described by a single conformation, only one set of coordinates with appropriate B-factors is modeled. When the density indicates discrete alternative conformations, such as side chain rotamers or backbone flips, the relevant atoms are copied and assigned alternate location (ALTLOC) identifiers in the PDB file. We have previously demonstrated that this modeling approach can yield substantial improvements in fitting to experimental data and also reduce geometric distortions and eliminate many rotamer outliers[2].</p>

<p>Multi-model approaches model heterogeneity by encoding multiple complete copies of the system, which can sometimes more effectively capture structural motions like backbone shifts. However, ensembles containing tens to hundreds of models can lead to a high parameter-to-data ratio. The most common ensemble models used in Bragg peak analysis today are time-averaged ensembles, generated by molecular dynamics simulations. These ensembles are restrained by time-averaged X-ray structure factors to produce a large number of models, often hundreds, each representing a snapshot from a single trajectory [3]. Further, crystalline MD simulations are currently the best model to describe the diffuse data[4].</p>

<p>Converting between the two representations is a non-trivial task, as the two model types are generated differently and contain different information. Still, there are many situations where converting between them is needed. For example, the primary method we use to model and represent diffuse data is molecular dynamics, which generates ensemble models. As described by members of this project and others, these MD models correspond poorly with the Bragg peak data, and there are currently no approaches to further refine an MD model against Bragg peak data. One way we can represent and refine such a model against Bragg peaks is with multiconformer models, which are compatible with traditional refinement software and allow for manual manipulation.</p>

<h2 id="converting-multiconformer-to-ensemble-models">Converting multiconformer to ensemble models</h2>
<p>In general, the number of ways to combine single-residue conformations grows exponentially. For this reason, enumerating all combinations becomes infeasible once a structure contains many residues with AltLoc conformations. Previously, Gutermuth et al. proposed an algorithm to convert multiconformer models into ensemble models (AltLocEnumerator)[5]. This fast branch-and-bound algorithm generates valid alternative protein structure conformations from the AltLoc annotations. The algorithm searches for compatible residue conformations, maximizing the probabilities of conformational states by scoring the AltLoc occupancy values. 
We aimed to convert a qFit multiconformer PDB into an ensemble structure (PDB: 5iu1). We attempted multiple options, including enumerating all combinations, optimizing the occupancy score, and changing the number of models, but all of them resulted in multiple models that were almost always overlapping (distribution of all heavy atom RMSD shown below).</p>

<p><img width="900" height="675" alt="rmsd_protein" src="https://github.com/user-attachments/assets/4e8e1c18-d566-49d7-90a0-bcc5e1df5f28" /></p>

<p>We then decided to make five different models (as qFit models contain up to 5 altlocs per residue) using the option ‘altlocid’, which provided us with models that were separated by a more realistic RMSD (distribution of all heavy atom RMSD shown below).</p>

<p><code class="language-plaintext highlighter-rouge">AltLocEnumerator --file 5i1u_final_qFit.pdb --altlocid A</code></p>

<p><img width="900" height="675" alt="5ilu_altloc_rmsd_protein" src="https://github.com/user-attachments/assets/fb2242ea-c4dc-421b-adb6-4ad2bc127e69" /></p>

<p>This command needs to be repeated for each altloc ID, and the resulting models should then be concatenated into a single file, with each model wrapped in MODEL/ENDMDL records. Of note, this algorithm is not open source, but it is available to academics for free.</p>
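
<p>As a rough sketch of that concatenation step (not part of AltLocEnumerator itself), the per-altloc outputs can be wrapped as numbered models in Python. The function name and the file names in the comment are hypothetical, and the sketch assumes each run produced one single-model PDB file. The PDB format closes each model with an <code class="language-plaintext highlighter-rouge">ENDMDL</code> record.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def concatenate_models(pdb_texts):
    """Wrap each single-model PDB text in MODEL/ENDMDL records."""
    coordinate_records = ("ATOM", "HETATM", "ANISOU", "TER")
    lines = []
    for number, text in enumerate(pdb_texts, start=1):
        records = text.splitlines()
        if number == 1:
            # Keep the unit cell record once, taken from the first file.
            lines.extend(r for r in records if r.startswith("CRYST1"))
        # The model serial number is right-justified in columns 11-14.
        lines.append(f"MODEL     {number:>4}")
        lines.extend(r for r in records if r.startswith(coordinate_records))
        lines.append("ENDMDL")
    lines.append("END")
    return "\n".join(lines) + "\n"

# Hypothetical usage, assuming one output file per altloc ID:
# texts = [open(f"model_{altloc}.pdb").read() for altloc in "ABCDE"]
# open("ensemble.pdb", "w").write(concatenate_models(texts))
</code></pre></div></div>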

<h2 id="converting-ensemble-models-to-multiconformer">Converting ensemble models to multiconformer</h2>

<p>There was no existing tool to convert an ensemble model to a multiconformer model, prompting us to design one. We did this using <a href="https://github.com/ExcitedStates/qfit-3.0">qFit</a>. Our approach systematically collapses an ensemble by iterating over each residue across all models and clustering equivalent residues. Taking the first model as the reference, we assign each subsequent residue to an existing cluster if its RMSD to the cluster centroid is within 1 Å (default parameter). If the RMSD exceeds this threshold, a new conformation is created.  We then used the relabel function in qFit, which uses simulated annealing (SA) optimization of a Lennard-Jones potential to reassign altloc labels, ensuring that conformers of different residues/ligands have consistent altloc labels. Note that while we collapsed many conformers, many residues still have 26+ conformers, meaning these cannot be represented with the historic PDB format.</p>
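
<p>The clustering step described above can be sketched in a few lines of Python. This is an illustration rather than the qFit implementation: the function names are hypothetical, each conformation is assumed to be a list of [x, y, z] coordinates in the same atom order, and assigning a conformation to the nearest cluster within the cutoff (rather than the first match) is an assumption.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))

def centroid(members):
    """Average coordinates, atom by atom, over the members of a cluster."""
    return [
        [sum(axis) / len(members) for axis in zip(*atoms)]
        for atoms in zip(*members)
    ]

def cluster_residue(conformations, cutoff=1.0):
    """Greedy clustering; the conformation from the first model seeds cluster 1."""
    clusters = []
    for coords in conformations:
        distances = [rmsd(coords, centroid(c)) for c in clusters]
        if not distances or min(distances) > cutoff:
            clusters.append([coords])  # becomes a new alternate conformation
        else:
            clusters[distances.index(min(distances))].append(coords)
    return clusters
</code></pre></div></div>

<p>Each resulting cluster corresponds to one alternate conformation of the residue, so the number of clusters is the number of altlocs that residue would need.</p>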

<p>While we can create a multiconformer model, a few issues remain. 1) We do not have a correct occupancy for any residue (all residues are currently assigned occupancy of 0.50), 2) There may be issues in the geometry of backbone atoms due to removing conformations on a residue level. The other thing to note is that this is currently incredibly slow (~45 minutes for 400 residues with 70 models). While imperfect, the multiconformer models enable us to feed this into other algorithms, such as refinement or qFit.</p>

<p>This tool is available in the <a href="https://github.com/ExcitedStates/qfit-3.0">qFit repository</a> as <code class="language-plaintext highlighter-rouge">multimodel_2_multiconformer.py</code>, which requires only an input PDB.</p>

<h2 id="references">References</h2>

<ol>
  <li>Woldeyes RA, Sivak DA, Fraser JS. E pluribus unum, no more: from one crystal, many conformations. Curr Opin Struct Biol. 2014;28: 56–62.</li>
  <li>Wankowicz SA, Ravikumar A, Sharma S, Riley BT, Raju A, Flowers J, et al. Uncovering Protein Ensembles: Automated Multiconformer Model Building for X-ray Crystallography and Cryo-EM. bioRxiv. 2024. doi:10.1101/2023.06.28.546963</li>
  <li>Burnley BT, Afonine PV, Adams PD, Gros P. Modelling dynamics in protein crystal structures by ensemble refinement. Elife. 2012;1: e00311.</li>
  <li>Wall ME. Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering. IUCrJ. 2018;5: 172–181.</li>
  <li>Gutermuth T, Sieg J, Stohn T, Rarey M. Modeling with Alternate Locations in X-ray Protein Structures. J Chem Inf Model. 2023;63: 2573–2585.</li>
</ol>]]></content><author><name>Stephanie Wankowicz</name><email>stephanie@wankowiczlab.com</email></author><category term="posts" /><category term="meta" /><summary type="html"><![CDATA[Converting between ensemble and multiconformer models]]></summary></entry><entry><title type="html">Optimizing Molecular Dynamics Weights with Machine Learning Tools</title><link href="https://diffuse.science/posts/jax_refine/" rel="alternate" type="text/html" title="Optimizing Molecular Dynamics Weights with Machine Learning Tools" /><published>2025-10-16T00:00:00+00:00</published><updated>2025-10-16T00:00:00+00:00</updated><id>https://diffuse.science/posts/jax_refine</id><content type="html" xml:base="https://diffuse.science/posts/jax_refine/"><![CDATA[<p>In our latest round of diffuse scattering experiments, we ran into an intriguing optimization problem that feels a lot like training a neural network.</p>

<hr />

<h2 id="the-scientific-setup">The Scientific Setup</h2>

<p>For each 3D pixel in reciprocal space (indexed by <strong>h</strong>), we have:</p>

<ul>
  <li><strong>Observed data</strong>, $y(h)$ from experiment</li>
  <li><strong>Predicted data</strong>, $x(h)$ computed from molecular dynamics (MD) trajectories</li>
</ul>

<p>We evaluate agreement using the <strong>Pearson correlation coefficient</strong>:</p>

\[\mathrm{CC} =
\frac{\langle x \cdot y \rangle - \langle x \rangle \langle y \rangle}
{\sqrt{(\langle x^2 \rangle - \langle x \rangle^2)(\langle y^2 \rangle - \langle y \rangle^2)}}\]

<p>Each prediction $x(h)$ is derived from <strong>structure factors</strong> $F(h, t)$ across time points in the MD simulation:</p>

\[x(h) = \langle F(h)^2 \rangle_t - \langle F(h) \rangle_t^2\]

<p>The goal is to assign <strong>weights</strong> $w_t$ to each time point to maximize $\mathrm{CC}$:</p>

\[x'(h) = \sum_t w_t F(h,t)^2 - \left(\sum_t w_t F(h,t)\right)^2\]

<p>If we can find optimal weights, we can identify which regions of the trajectory best match experimental reality — potentially distinguishing “good” frames from those that detract from agreement.</p>
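
<p>To make the objective concrete, here is a minimal pure-Python sketch of the reweighted prediction $x'(h)$ and the Pearson $\mathrm{CC}$ defined above. It assumes real-valued structure factor amplitudes and weights that sum to one; the function names are illustrative and are not taken from the prototype.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def weighted_diffuse(frames, weights):
    """x'(h) = sum_t w_t F(h,t)^2 - (sum_t w_t F(h,t))^2 for every pixel h.

    frames[t][h] holds F(h, t); weights[t] holds w_t (assumed to sum to 1).
    """
    prediction = []
    for h in range(len(frames[0])):
        mean_square = sum(w * f[h] ** 2 for w, f in zip(weights, frames))
        mean = sum(w * f[h] for w, f in zip(weights, frames))
        prediction.append(mean_square - mean ** 2)
    return prediction

def pearson_cc(x, y):
    """Pearson correlation written with the same averages as the formula."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return (mean_xy - mean_x * mean_y) / math.sqrt(var_x * var_y)
</code></pre></div></div>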

<hr />

<h2 id="community-brainstorming">Community Brainstorming</h2>

<p><strong>Steve</strong> suggested asking whether $\mathrm{CC}$ is the right target — perhaps a likelihood might better capture the physics.</p>

<p><strong>Karson Chrispens</strong> proposed leveraging machine learning frameworks like <strong>JAX</strong> or <strong>PyTorch</strong> to treat the weights as trainable parameters.<br />
By backpropagating through the Pearson correlation, an optimizer like Adam could efficiently learn the optimal weights.</p>

<p><strong>James Holton</strong> suspected this approach could outperform traditional non-linear least-squares optimization and shared example MTZ datasets for testing.</p>

<p><strong>Steve</strong> also mentioned using a <strong>genetic algorithm</strong> if the weights were binary ($0$ or $1$), though he acknowledged the continuous formulation might not have a unique minimum.</p>

<hr />

<h2 id="prototyping-the-optimizer">Prototyping the Optimizer</h2>

<p>Karson quickly implemented a JAX-based prototype using <strong>reciprocalspaceship</strong> for MTZ I/O and <strong>optax</strong> for optimization.<br />
The loss function was simply $-\mathrm{CC}$, and weights were constrained to $(0, 1)$ via a sigmoid transform.</p>
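
<p>The same idea can be sketched without any ML framework. In the toy version below, finite-difference gradient ascent stands in for JAX automatic differentiation and the optax Adam optimizer, so it illustrates the sigmoid parameterization rather than the prototype itself. Normalizing the weights to sum to one, the step size, and all function names are assumptions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weights_from(params):
    """Sigmoid keeps each raw weight in (0, 1); normalizing is an assumption."""
    raw = [sigmoid(p) for p in params]
    return [r / sum(raw) for r in raw]

def cc_for(params, frames, observed):
    """Pearson CC between the reweighted prediction x'(h) and the data y(h)."""
    w = weights_from(params)
    x = [
        sum(wt * f[h] ** 2 for wt, f in zip(w, frames))
        - sum(wt * f[h] for wt, f in zip(w, frames)) ** 2
        for h in range(len(observed))
    ]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(observed) / n
    cov = sum(a * b for a, b in zip(x, observed)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in observed) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

def optimize(frames, observed, steps=100, rate=0.2, eps=1e-4):
    """Finite-difference gradient ascent on CC, keeping the best point seen."""
    params = [0.0] * len(frames)  # equal starting weights
    best_params, best_cc = list(params), cc_for(params, frames, observed)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            grad.append((cc_for(up, frames, observed)
                         - cc_for(down, frames, observed)) / (2 * eps))
        params = [p + rate * g for p, g in zip(params, grad)]
        cc = cc_for(params, frames, observed)
        if cc > best_cc:
            best_params, best_cc = list(params), cc
    return weights_from(best_params), best_cc
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">optimize(frames, observed)</code> returns the normalized weights and the best $\mathrm{CC}$ found. An autodiff framework replaces the inner finite-difference loop with a single gradient call, which is what makes the approach practical for thousands of frames.</p>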

<p>When tested on toy datasets and real MTZ files, the optimizer:</p>

<ul>
  <li>Successfully recovered <strong>50:50</strong> weights for mixtures of two “ground-truth” structures</li>
  <li>Produced sensible intermediate values when one or both inputs were “wrong”</li>
  <li>Converged robustly from different initializations</li>
</ul>

<p>Example output for a ground-truth mixture:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Final weights: [0.46, 0.54]
Final CC: 1.0000

</code></pre></div></div>

<p>And for mismatched data:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
Final weights: [0.76, 0.24]
Final CC: 0.78

</code></pre></div></div>

<hr />

<h2 id="discussion">Discussion</h2>

<p><strong>Marcus Collins</strong> noted that this approach resembles computing <strong>Boltzmann-like factors</strong> for each configuration and suggested PyTorch could be an equally good (and more common) platform.<br />
He also cautioned that Pearson $\mathrm{CC}$ may not be the optimal objective function.</p>

<p>Karson confirmed that JAX runs efficiently on GPUs and planned to scale the approach to larger datasets by stacking multiple MTZ files.</p>

<hr />

<h2 id="where-this-might-go-next">Where This Might Go Next</h2>

<p>This prototype demonstrates that <strong>gradient-based optimization</strong> can efficiently identify the contribution of different MD frames to observed diffuse scattering patterns.<br />
Future directions include:</p>

<ul>
  <li>Expanding to full MD trajectories with thousands of frames</li>
  <li>Experimenting with alternate objectives (e.g., likelihood, cross-entropy)</li>
  <li>Incorporating <strong>crystal symmetry</strong> and <strong>resolution weighting</strong></li>
  <li>Exploring physical interpretations of the resulting weights</li>
</ul>

<hr />

<h2 id="code-and-data">Code and Data</h2>

<p>Karson’s implementation, <code class="language-plaintext highlighter-rouge">pearson_target.py</code>, is available <a href="https://github.com/k-chrispens/simulation_timeseries_optim">here</a>, and the test MTZ data can be downloaded from<br />
<a href="http://bl831.als.lbl.gov/~jamesh/pickup/diffUSE_CC_opt_test.tgz">here</a>.</p>

<hr />

<p><strong>TL;DR:</strong><br />
By treating MD frame weights as trainable parameters in a differentiable Pearson correlation objective, we can use ML optimizers like Adam to rapidly identify which parts of a trajectory best explain experimental diffuse scattering — turning a brute-force search into a smooth, data-driven optimization problem.</p>]]></content><author><name>James Holton, with contributions from Karson Chrispens, Steve, and Marcus Collins</name></author><category term="posts" /><category term="diffuse scattering" /><category term="molecular dynamics" /><category term="optimization" /><category term="machine learning" /><summary type="html"><![CDATA[Using gradient-based optimization to identify the most physically relevant portions of MD trajectories by maximizing agreement with diffuse scattering data.]]></summary></entry><entry><title type="html">Taylor’s Diffuse Rotation</title><link href="https://diffuse.science/posts/diffuse_rotation/" rel="alternate" type="text/html" title="Taylor’s Diffuse Rotation" /><published>2025-09-24T00:00:00+00:00</published><updated>2025-09-24T00:00:00+00:00</updated><id>https://diffuse.science/posts/diffuse_rotation</id><content type="html" xml:base="https://diffuse.science/posts/diffuse_rotation/"><![CDATA[<h2 id="an-atypical-phd-start-for-an-atypical-student-in-biophysics">An Atypical PhD Start for an Atypical Student in Biophysics</h2>

<p>I walked into Jaime Fraser’s office on a Friday in June, shortly before my early summer rotation at UCSF was about to start. As a physicist by training, I found his work on conformational entropy in proteins fascinating. But he quickly turned our conversation towards a topic I had never heard of before: diffuse X-Ray crystallography and somehow using it to solve protein ensembles and dynamics. “We’re kicking off the initiative at a conference on Monday, you should come!”</p>

<p>Come I did. While I was very lost on the specifics of most of the science during those two days, I could sense the excitement. Whole labs had traveled from across the country to attend, as well as esteemed scientists from more than one national laboratory. They were proposing a new way to do science: more open, more flexible, and faster than we have come to expect from the traditional academy. So I did what any 1st-year PhD student would do given the opportunity to join such a project: I took it and ran with it.</p>

<p>There was one small problem with joining a structural biology lab on the pretense of innovating in the theory of X-Ray crystallography during a summer rotation: I hadn’t touched diffraction since my bachelor’s degree. I was also fresh off a master’s degree at Utrecht University and a follow-up internship at the University of Vienna, both in theoretical physics. So I started at the beginning.</p>

<p>I was already reading the basics of Thomson scattering of a free electron before the end of day 2 at the diffUSE kick-off conference. “OH, it’s just the oscillation of the electron under electromagnetic radiation that emits scattered rays of the same frequency”. I’m not sure why but I always expected there to be a lot more quantum mechanics involved in scattering. As is often the case, the classical understanding of even very small systems is very much good enough.</p>

<p>The beginning of my rotation was nose to the grindstone in the form of review papers and textbooks, including some from the middle of the last century. I needed to understand classical structural X-Ray crystallography before I even knew how to ask Jaime what diffuse X-Ray crystallography is. Finally, after a few weeks, I knew what a Bragg peak and a structure factor were, and I had actually worked through the derivation of what I consider the central result of crystallographic theory: the electron density of the unit cell (and therefore the molecular structure of the contents of the unit cell) is the Fourier Transform of the scattered structure factors in diffraction space (and vice versa!). So what about diffuse?</p>

<p><img src="/assets/images/posts/structure_scattering.png" alt="Structural crystallography" title="Easy " />
<em>Example scattering image (right) from a perfect crystal (left) with 6-atom hexagonal molecules at each lattice site. Structural information is contained in the sharp reflections (referred to as Bragg peaks) of the scattered X-Rays. The scattering image is devoid of diffuse scattering. Fig. 5.5, Blundell and Johnson (1976).</em></p>

<h2 id="diffuse-x-ray-scattering">diffUSE X-Ray Scattering</h2>

<p>Now this central tenet is only 100% correct if each unit cell of a crystal is identical, but as every structural biologist knows, it is nigh impossible for a molecule as complicated as a protein to crystallize into the exact same form in each and every unit cell throughout a crystal. Besides the possibility of ligand-bound complexes, the protein itself can adopt subtly different folds from one unit cell to the next. These are known as the protein’s conformers, and they result from small adjustments to a protein’s structure like bond rotations. All of the conformers available to a protein are known as its ensemble.</p>

<p>So do Bragg peaks still give you the protein structure when all different kinds of conformers might be present in a protein crystal? Yes and no - the structure you get from analyzing the scattering pattern is the averaged structure due to the averaged structure factors between all the unit cells. Yet the signal is not actually the same as if every unit cell were replaced with this averaged structure; the fact that the crystal is composed of different conformers is encoded in the scattering image as an unphased diffuse signal between the Bragg peaks.</p>

<p><img src="/assets/images/posts/insulin_diffuse.png" alt="Insulin" />
<em>X-Ray scattering image of insulin. The signal between the sharp Bragg peaks is the diffuse scattering. The isotropic ring is due to water’s presence in the crystal as a solvent, but non-isotropic diffuse signal is due in part to insulin’s conformational ensemble. Meisburger and Ando (2023).</em></p>

<p>The dominant term contributing to this diffuse signal is known as Guinier’s Equation:</p>

\[I_{\text{diffuse}}=I_{avg}-|F_{avg}|^2\]

<p>$I$ is the intensity reading on the detector and $F$ is the complex structure factor. These values are averaged (a phased average in the case of the structure factor) over each and every unit cell at each spot on the X-Ray detector. If each unit cell is truly identical, then the average intensity is the same as the square of the average structure factor and the diffuse signal is exactly zero. Now if you have ever learned that the intensity is the complex square of the structure factor, then you can see Guinier’s Equation in a new light (transitioning to the angle brackets notation for ensemble averages):</p>

\[I_{\text{diffuse}}=\langle|F|^2\rangle-|\langle F\rangle|^2\]

<p>The diffuse signal is really the statistical variance in the structure factor over every unit cell. James Holton, an expert beamline scientist at the Lawrence Berkeley National Laboratory, has a great interpretation of this variance <a href="https://bl831.als.lbl.gov/~jamesh/diffuse_scatter/">up on his blog</a>: the diffuse signal allows you to differentiate between correlated motions of the crystal’s unit cells. Now if we determine the correlated motions between the various conformers of a protein, <strong>we are really solving for the available dynamics of that protein!</strong> This is what makes <em>diffuse</em> protein X-Ray crystallography a big deal, and what the diffUSE initiative aims to accomplish. Much of protein behavior requires dynamics: catalysis, bond formation, active transport - solving the diffuse problem will allow for a whole new insight into the correlated dynamics that allow proteins to perform their functions.</p>
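
<p>As a minimal sketch of that variance (not the pipeline described in this post), the NumPy snippet below evaluates Guinier’s Equation for a toy ensemble. The function name <code class="language-plaintext highlighter-rouge">guinier_diffuse</code>, the array shapes, and the random “structure factors” are all assumptions made for illustration; a real calculation would start from structure factors computed for each conformer’s model.</p>

```python
import numpy as np

def guinier_diffuse(structure_factors):
    """Return mean(|F|^2) - |mean(F)|^2 along axis 0 (the conformer axis)."""
    F = np.asarray(structure_factors, dtype=complex)
    mean_intensity = np.mean(np.abs(F) ** 2, axis=0)    # average of the intensities
    bragg_intensity = np.abs(np.mean(F, axis=0)) ** 2   # phased average, then squared
    return mean_intensity - bragg_intensity

# Toy data (hypothetical): 200 "conformers" sampled at 5 reciprocal-space points.
rng = np.random.default_rng(0)
base = rng.normal(size=5) + 1j * rng.normal(size=5)

# Case 1: every unit cell is identical.
identical = np.tile(base, (200, 1))

# Case 2: each conformer's structure factors are slightly perturbed.
noise = rng.normal(size=(200, 5)) + 1j * rng.normal(size=(200, 5))
perturbed = identical + 0.1 * noise

diffuse_identical = guinier_diffuse(identical)
diffuse_perturbed = guinier_diffuse(perturbed)

print("identical cells:", diffuse_identical)
print("perturbed cells:", diffuse_perturbed)
```

<p>With identical unit cells the two averages agree and the result is zero (up to floating-point rounding). Once the conformers differ, the result is the per-point variance of $F$ over the ensemble, which is never negative.</p>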

<p><img src="/assets/images/posts/ATPase_conformers.jpg" alt="" />
<em>Two conformational states of the F1-ATPase complex. This energy-providing molecular machine found in the mitochondria is a famous example of motion-induced function in molecular biology. Sobti, M. et al.(2024).</em></p>

<h2 id="so-what-did-i-actually-do">So what did I actually do?</h2>

<p>After learning the essential background theory, I still had to actually do something to show for the rotation! Innovating on a theory or model of physical phenomena is hard work, but we are lucky as 21st century scientists. We can use our ample computational resources to simulate the results of an experiment that are predicted from a proposed model. We then compare these simulations with real experimental observations. If the simulations and the observations look the same, the model likely has explanatory power. If they do not look very similar, then the model needs some work.</p>

<p>So this was the guiding question of my rotation: Can we simulate diffuse scattering from an ensemble of protein conformers and process it as if it were real data? If so, the simulated data can be compared to real data to see how well our current models of diffuse scattering fit the real thing (recall that I said Guinier’s Equation is the <em>dominant</em> term in the diffuse signal merely due to the presence of distinct conformers, but other terms become not insignificant when there are correlation lengths of and between certain conformers).</p>

<p>James Holton was one of my main scientific mentors for this project, and he had already spent years developing extremely accurate, realistic simulations of diffraction datasets. So my initial task was to understand his simulation scripts and help modify them to account for diffuse scattering due to multiconformer crystals. In order to do so, I needed to familiarize myself with a whole suite of computational crystallographic tools that had been communally developed for decades (CCP4 and Phenix chief among them). I also needed to learn how to read (and write a few lines of!) tcsh command line scripts, James H.’s preferred scripting language. A totally new language to me, but the good news about command line scripting is that there are only so many layers of complexity that can be associated with each line of code. James helped me get set up to work on the LBL Beamline cluster, and after some googling (as well as LLM queries), I was able to understand what I needed to.</p>

<p>James had already scripted the computation of the diffuse scattering from the presence of two conformers, but we wanted to see what the diffuse signal looks like for whole ensembles of proteins, so we modified the script to account for more than just two. That meant parallelization to keep the computation time reasonable. After some great mentorship from James and the reverse-engineering of some of his other parallelized programs, we came up with a working script.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<p>The next step was to process these simulated datasets and see if we could actually extract the diffuse signal from them. In order to do so, I needed to utilize another specially designed software tool, mdx2 - a diffuse scattering signal extractor and processor developed by Steve Meisberger and Nozomi Ando at Cornell University. I recall asking Steve if there was some tutorial that I could access, and he pointed me to an awesome paper written by him and Nozomi that not only linked to a GitHub repository of step-by-step Jupyter tutorials, but also included a detailed guide on how to install the necessary Python packages to run mdx2 right in the paper itself! “Whoa!”, I thought. “So that’s what open science is really about.” I was able to get it working with our datasets, and managed to extract the <a href="https://diffuse.science/posts/davinci/">Davinci Dude pattern</a> from one of our simulated diffraction sets as a proof of concept.</p>

<p><img src="/assets/images/posts/Davinci_real_rec.png" alt="" />
<em>Fully simulated diffraction image (left) of the two conformers of 1aho which generate the “Davinci Dude” diffuse scattering featured in James H.’s post. A slice of the extracted diffuse signal in reciprocal space (right) was generated by processing a dataset of these diffraction images using mdx2.</em></p>

<p>We moved on to proper ensemble modelling with the SARS-CoV-2 NSP3 macrodomain in its 7kqo P43 crystal form. Using the <code class="language-plaintext highlighter-rouge">ensemble_refinement</code> function from Phenix, we created a 200-conformer pseudoensemble (James H. ran the refinement for 3 days to get it), generated an associated diffraction dataset, and processed it with mdx2 to extract the diffuse signal.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>

<p><img src="/assets/images/posts/7kqo_3models.png" alt="" />
<em>Three representative conformers of the 200-conformer pseudoensemble of 7kqo we used to generate a diffraction image dataset. They are colored green, red, and tan, respectively.</em></p>

<p><img src="/assets/images/posts/7kqo_series.png" alt="" />
<em>Simulated diffuse scattering (left). Full diffraction image (center). Slice of the extracted diffuse signal in reciprocal space (right). All from the 200-conformer pseudoensemble of 7kqo.</em></p>

<p>In essence, the result of my summer project is a functional pipeline for simulating the diffuse X-Ray scattering of a multiconformer protein ensemble and then processing it through mdx2. These simulated signals are now ready to be compared to real signals, and the model of diffuse scattering iterated on. Sadly, my involvement in the story ends here as I need to move on to my next rotation. I’m very grateful for getting to do such awesome science and learning so much under the guidance of James Holton and Jaime Fraser! I would also like to thank the IMSD program and the diffUSE initiative for making this work possible and for helping me get off to a great start at UCSF!</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p><a href="https://github.com/jmholton/altloc_diffuse">Github repo</a> with the scripts used to generate the datasets. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><code class="language-plaintext highlighter-rouge">phenix.ensemble_refinement</code> was co-developed between Lawrence Berkeley Nat’l Lab and B. Burnley and Piet Gros from Utrecht University! <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>twomack</name></author><category term="posts" /><category term="meta" /><summary type="html"><![CDATA[What Taylor got done during his rotation.]]></summary></entry><entry><title type="html">Sample Shipping</title><link href="https://diffuse.science/diffuse-shipping/" rel="alternate" type="text/html" title="Sample Shipping" /><published>2025-09-10T00:00:00+00:00</published><updated>2025-09-10T00:00:00+00:00</updated><id>https://diffuse.science/diffuse-shipping</id><content type="html" xml:base="https://diffuse.science/diffuse-shipping/"><![CDATA[<p>One of the goals of the Diffuse Project is to collect diffuse scattering from many different samples. To support this goal, the Fraser Lab conducted a thorough inventory of our freezers to identify all available proteins that can be readily crystallized. In addition, we shipped some pre-grown crystals to evaluate their ability to withstand shipping conditions. In total, five different samples, SARS-CoV-2 Mac1, CHIKV Nsp3 WT, CHIKV Nsp3 D31N, FABLE, and Huwe1, were sent to the Ando Lab to ensure that high-quality crystals are available in preparation for future beamtime experiments.</p>

<h2 id="shipping-method">Shipping Method</h2>

<p>Two types of shipments were prepared. The first was a dry ice box containing four samples stored frozen in solution. For one of these samples, we also included a frozen seed stock essential for crystallization. The second shipment utilized a Crystal Positioning Systems
<a href="https://www.crystalpositioningsystems.com/product/thermal-shipper/">thermal shipper</a>, which maintains a consistent temperature of 20 degrees Celsius. In this box, we provided stocks of protein buffers and precipitant components to maximize reproducibility. Additionally, we included a 96-well crystallization plate containing pre-grown crystals of the fifth sample, Huwe1. We utilized foam inserts to hold the plate for safe transport.</p>

<p>By shipping both frozen proteins and a crystallization plate, we aim to determine which shipping method is more effective for sample preparation and crystal quality.</p>

<h2 id="looking-ahead">Looking Ahead</h2>

<p>The next steps involve the Ando Lab assessing the quality of the shipped crystals to determine whether crystallization plates are a feasible option for sample transport. The Ando Lab will also use the shipped protein to grow new crystals. While we anticipate crystallization to be largely reproducible, some optimization may be necessary to produce crystals of sufficient size for diffuse scattering experiments, especially if plate formats and drop sizes differ from our established methods.</p>

<p>The Fraser Lab remains available to provide troubleshooting support and to ship additional protein as needed. Once we establish confidence in our shipping methods and crystal growth reproducibility, we will also prepare shipments of ligands to conduct soaking experiments.</p>

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name>Kara Zielinski</name><email>kara.zielinski@ucsf.edu</email></author><category term="meta" /><summary type="html"><![CDATA[Sample preparation for future beamtimes]]></summary></entry><entry><title type="html">The Barrier of Barrier Materials</title><link href="https://diffuse.science/posts/windows/" rel="alternate" type="text/html" title="The Barrier of Barrier Materials" /><published>2025-09-10T00:00:00+00:00</published><updated>2025-09-10T00:00:00+00:00</updated><id>https://diffuse.science/posts/windows</id><content type="html" xml:base="https://diffuse.science/posts/windows/"><![CDATA[<p><img src="/assets/images/posts/crowfoot.png" alt="Crowfoot" title="Crowfoot Cell" /></p>

<p>Protein crystals are wet and tiny. Because of this, they dry out in air very fast, and this drying out crushes the lattice and ruins the diffraction. The first person to realize this was Dorothy Crowfoot (who later won the Nobel Prize under her married name: Dorothy Hodgkin). Pictured above is the sample holder she used. Glass flake is an excellent moisture barrier, but it is hard to work with and also glass has relatively high scattering and absorption of X-rays. This scattering makes X-ray background on the detector that can easily bury the faint diffuse-scatter signal from the protein crystal that we are trying to measure. Many different ways of mounting protein crystals have been developed over the years, and each has its compromises. For diffUSE, we will need to really push this technology to make high quality diffuse scatter data a routine undertaking.</p>

<p>The central challenge of window materials for protein crystals is not just the X-ray background, but the permeability. In general, thinner stuff gives you less background, but also dries out the crystals faster. Permeability of a thin film is hard to measure, but the potato chip industry has already done a lot of work on this (Rob Thorne says to google: “barrier materials”). Nearly all polymer-plastic based stuff has the same compromise because plastic is held together by van der Waals forces, and these easily open up to let gas molecules through via thermal motions. A “pile of worms” on the molecular level.  Covalent solids, like soda lime glass, are ~1e6x less permeable, but with ~10x more scattering/absorption per unit thickness.  I have looked for large sheets of 1-micron thick glass, but nobody seems to make them. You can get broken glass of this thickness and it is very cheap: “glass flake” is a paint additive. I have some, but it is hard to work with.</p>

<p>Recent work in the serial crystallography world has had good results with “SOS” chips, which use 2.5 µm thick Etnom foil (Chemplex). This will have very low background, but variability between crystals has been observed and is still being worked out.</p>

<p>You can also get very low permeability with metal foils. Metallized mylar is a popular barrier material, and you only need nanometers of metal to form a good barrier. Downside is you can’t see through it with optical microscopes, but if you are doing blind, grid-scanning serial data collection then it doesn’t matter anyway.  I, however, would like to do the data collection a bit smarter and try to center the crystals in the beam.  But that is my bias.</p>

<p>From the X-ray point of view, the background level is very predictable. This is because the cross sections of scattering and absorption depend only on the element, so all you need to do is calculate how many atoms are in the beam and you got your answer. Materials that have been rolled, stretched or pulled will have some orientation bias and therefore fiber diffraction. But all that does is move the scattered photons around in reciprocal space; the number of scattered photons depends on the elemental composition and little else.</p>
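
<p>A rough sketch of that bookkeeping in Python, assuming a uniform slab that fills the beam footprint. The function name, the 10 µm water layer, and the 100 µm square beam are illustrative assumptions, not values from any real mount. Turning the atom count into scattered photons would still require tabulated cross sections, which are deliberately left out here.</p>

```python
AVOGADRO = 6.02214076e23  # formula units per mole (exact SI value)

def molecules_in_beam(thickness_um, beam_width_um, beam_height_um,
                      density_g_per_cm3, molar_mass_g_per_mol):
    """Count formula units inside the illuminated volume of a uniform slab."""
    um_to_cm = 1e-4
    volume_cm3 = ((thickness_um * um_to_cm)
                  * (beam_width_um * um_to_cm)
                  * (beam_height_um * um_to_cm))
    moles = volume_cm3 * density_g_per_cm3 / molar_mass_g_per_mol
    return moles * AVOGADRO

# Hypothetical example: a 10 um layer of liquid water in a 100 um x 100 um beam.
water_molecules = molecules_in_beam(
    thickness_um=10, beam_width_um=100, beam_height_um=100,
    density_g_per_cm3=1.0, molar_mass_g_per_mol=18.015,
)
oxygen_atoms = water_molecules        # one O per H2O
hydrogen_atoms = 2 * water_molecules  # two H per H2O

print(f"{water_molecules:.3e} water molecules in the beam")
```

<p>Because the count scales linearly with thickness, halving the layer halves the number of atoms in the beam, whatever the material’s internal structure.</p>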

<p>Experience at Cornell has settled on “free mounting” as a solution, but I have found this is not broadly general. About half of all crystals in my experience do not like being exposed to surface tension. The “HARE” chip solution from T-REXX at Hamburg is attractive because the silicon substrate makes a good scatterguard, but you need to surround it with a “rain forest” humidity container. That said, the T-REXX folks have been optimizing it for years. I need to follow up with David von Stetten to talk about the details.</p>

<p>Something else I learned at the most recent Serial Workshop was that people are having good luck with agarose as a carrier for fragile crystals (rubisco). That made me realize: we want whatever is touching the crystals to be softer than the crystals themselves. And I mean “softer” in the Mohs hardness scale kind of way. We don’t want the mount to “scratch” the protein. Agarose also has the advantage that it is easily cast, and cast materials tend to have no fiber diffraction.  The trick will be getting it thin and manipulable in a reproducible way.</p>

<p>I want to stress: hardness of the window material is going to be really important. Consider that you not only want the window to be thin, but close to all the crystals.  This is because any gap between the windows and the crystal will no doubt be filled with liquid, and that will contribute to the background. Crystals never all grow in exactly the same size and shape, so there is always going to be a compromise between the size of the crystal-window gap for the smaller crystals, and how many of the big crystals are going to be in contact with the windows. If the windows are harder than the crystal, then the crystal “loses”. It gets smooshed. And those were all your biggest crystals.</p>

<p>If we have a layer of soft, amorphous material coating hard windows, then it will move out of the way as the crystal is pressed into it, filling the gaps between them. In this situation the overall thickness of the material in the beam will be fixed from shot to shot, with the only variability being the fraction of the window-window gap that is filled with crystal vs filled with gel. We could then calculate exactly how much gel is in the beam and quantify its contribution to background. I predict this will be a very valuable feature.</p>

<p>Single-crystal windows I haven’t looked at in a while, but I think the SAXS people are the best ones to turn to for such things.  We learned early on that diamond windows suck.  Too much diffuse scatter from defects in the lattice.  They can make “single” crystal diamond, but the mosaicity is pretty poor and the grain boundaries make DS.  Beryllium is better.  In fact, you want to use etched beryllium.  This is because when they roll the Be foil you get tiny rocks and bits of the rollers stuck in the surface. These pits and defects make diffuse scatter. If you dissolve the surface away in acid it makes for a lower background window.  These days, of course, they coat all the Be foils with plastic. The plastic makes it a crappier window, but it is more acceptable to our friends in charge of chemical safety.</p>

<p>That said, mica has met with a lot of success as a SAXS window.  Ilme Schlichting described to me how to thin it down by pulling off layers with Scotch tape.  You look for a color change that indicates it is only a few hundred nm thick.  Then it is a good window.  Just fragile.</p>

<p>I also have to admit it has been a while since we looked into these things.  It may well be that single-crystal diamond has improved over the years.  The only way to find out, however, is to buy some and try it. The companies have no idea, but last I checked were interested in the results.</p>

<p>Also, the ultimate thin covalent solid window is graphene.  Jeney Wierman tells me she is still traumatized from working with it in grad school, but something she said that stuck with me is that a 1 micron backing material is fine. That also means, I think, that you don’t have to have just one layer of graphene. Ten would probably be ok.  Martin Caffrey experimented years ago with graphene-impregnated plastics, but I think he went back to glass.  Still. A graphene filled plastic that you spin cast and then UV-set should have a uniform background, and could be very very thin and water tight.  You can get “bulk” graphene pretty cheap these days.  Something I’d like to test.</p>

<p>@Steve found a paper comparing 10 µm quartz to various polymers and glasses: https://pmc.ncbi.nlm.nih.gov/articles/PMC6057835/. Looks promising, provided you avoid hitting a Bragg condition of the quartz.</p>

<p>This reminds me of a big caveat that I’m not sure this work addressed: self-absorption. If all you do is look at the amount of background on the detector, then depleted uranium will look like a good window material: no photons on the detector at all.</p>
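
<p>To make the caveat concrete, here is a crude single-scattering slab model in Python. Every number in it is a made-up placeholder chosen only to contrast a weakly absorbing window with a strongly absorbing one; none of them are measured properties of any real material, and the function name <code class="language-plaintext highlighter-rouge">window_report</code> is hypothetical. The model also assumes, as a simplification, that scattered photons are attenuated by the same factor as the direct beam.</p>

```python
import math

def window_report(thickness_um, attenuation_length_um, scatter_fraction_per_um):
    """Crude slab model: how much background is made, and how much gets out."""
    # Fraction of the direct beam that survives the window (Beer-Lambert law).
    transmission = math.exp(-thickness_um / attenuation_length_um)
    # Fraction of incident photons scattered inside the slab (thin-slab approximation).
    scattered = scatter_fraction_per_um * thickness_um
    # Simplifying assumption: scattered photons are absorbed like the direct beam.
    detected_background = scattered * transmission
    return {
        "transmission": transmission,
        "scattered": scattered,
        "detected_background": detected_background,
    }

# Placeholder inputs (NOT real material data): same thickness, very different absorption.
light = window_report(thickness_um=10, attenuation_length_um=1000.0,
                      scatter_fraction_per_um=1e-4)
heavy = window_report(thickness_um=10, attenuation_length_um=0.5,
                      scatter_fraction_per_um=1e-3)

for name, report in (("light", light), ("heavy", heavy)):
    print(name, {key: f"{value:.3e}" for key, value in report.items()})
```

<p>In this toy comparison the strongly absorbing window puts far less background on the detector even though it scatters more, but only because it also swallows nearly all of the beam that was supposed to reach the crystal. Background has to be judged alongside transmission, not on its own.</p>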

<p>Another important principle to remember is that the scattering cross section of an atom is a fixed quantity. It is independent of the structure of the material it is in: crystal, amorphous, gas, or otherwise. A given number of oxygen atoms in the beam scatters a knowable number of photons. The only thing the structure of the material does is push those photons around on the detector.  So, at the end of the day, the only way a window can be “X-ray transparent” is to be thin. And light.  And it would be great if it is also cheap and easy to work with.</p>

<p>Looking forward to everyone else’s thoughts and to learning how to do this right!</p>

<script src="https://giscus.app/client.js" data-repo="diff-use/diff-use.github.io" data-repo-id="R_kgDOPO07gg" data-category="General" data-category-id="DIC_kwDOPO07gs4CtV5I" data-mapping="title" data-strict="0" data-reactions-enabled="1" data-emit-metadata="0" data-input-position="bottom" data-theme="light" data-lang="en" crossorigin="anonymous" async="">
</script>

<noscript>Please enable JavaScript to view comments.</noscript>]]></content><author><name>James Holton</name><email>jmholton@lbl.gov</email></author><category term="posts" /><category term="meta" /><summary type="html"><![CDATA[It is hard to have happy crystals and low background at the same time]]></summary></entry></feed>