Software for accelerating molecular systems engineering

Summary

Astera is in the early stages of incubating a project to develop open-source software to accelerate molecular systems engineering.

Today, molecular design software is fragmented and lends itself to analysis of small isolated molecules rather than synthesis of multi-scale, composite, functional nanosystems. This situation holds back progress towards more advanced forms of nanotechnology. The project we are incubating would aim to change that. 

Chemists have software for analyzing small molecules and reactions in quantum detail, designers of protein and DNA nanostructures have their own specialized design tools, and other specialized tools exist for emerging molecular building blocks such as spiroligomers. Non-quantum molecular dynamics simulations are possible but cumbersome to set up, and they don’t readily support design. The barrier to entry is therefore very high for anyone who wants to design and test, or simply envision, future molecular nanosystems that combine many of these elements. We don’t have anything analogous to AutoCAD or SolidWorks for “general” molecular systems engineering and atomically precise nanomachine design. Fortunately, making a much more general and useful molecular systems engineering CAD tool doesn’t require any research breakthroughs, just good software engineering.

The goal of this project is to enable the next generation of molecular machine design. This includes near-term work combining DNA, proteins, specialized bio-conjugation chemistries and other building blocks as platforms for further development of nanoscale, atomically precise molecular structures. It should also include improved support for work on scanning-probe-based pathways to positional chemistry, and for more speculative and theoretical work that envisions nanosystems we can’t yet build in any way.

The software to support this will have to focus on integrating many existing modeling tools in the context of design. It will have to specifically enable designing composite nanosystems that combine different kinds of building blocks at different scales for different functions, like structural scaffolds, mechanical positioners, reversible or allosteric binders, and precise Angstrom-scale covalent chemistry. Only by doing so might one make faster progress towards the still-preliminary vision of “atomically precise productive nanosystems” laid out in roadmaps such as the one published by Battelle in 2007.

What if you wanted to design a new kind of artificial ribosome that could leverage mechanical placement to synthesize new kinds of molecular chains, or even a molecular 3D printer that could act like a 2D or 3D version of what the ribosome does for 1D polymers (steps toward “positional chemistry”)? There isn’t any unified software that pulls from quantum chemistry, molecular dynamics, DNA and protein design, and other modules as needed, and puts them together in an easy-to-use interface that supports rapid design iteration.

This project deliberately strikes out in new design-oriented directions, motivated by a departure from the mainstream in chemistry and biology and a move towards ambitious visions of new nanotechnology systems. At the same time, it aims to create robust and accessible software that will be useful for practitioners in existing fields like DNA and protein nanotechnology, metal-organic frameworks, synthetic chemical molecular machines (work on which won a Nobel Prize in 2016), and other fields. The most similar past effort was Nanoengineer-1, nearly 15 years ago, but it didn’t get far before shutting down after the 2008 financial market crash, and in the DNA nanotechnology field it has been mostly superseded by narrower software developed directly by experimental practitioners, such as caDNAno. Meanwhile, there are commercial suites that allow both drawing molecules or crystals and calculating various classical and quantum properties and relaxations, but their closed-source approach restricts which kinds of modules and frameworks can be used.

The launch of this ambitious, forward-looking project will first require finding a leader for the role described below.

Astera is a non-profit operating foundation dedicated to finding and developing high-leverage technologies that can lead to massive returns for humanity.

More formal sketch of the project:

Accelerating Progress in Molecular Systems Engineering Through Software Tools

The absence of effective software support for molecular systems engineering (MSE)[1] has delayed transformative physical and medical technologies,[2] but development of an effective MSE platform would remove this barrier to progress. Although modern computational chemistry software provides adequate tools for modeling complex molecular systems, current toolsets lack broad support for design. As a result, progress in MSE lags far behind its potential, and even forward-looking experimentalists don’t know what to build next. A well-funded open-source platform that combines existing modeling software with extensible sets of design tools would open the door to swift progress.

 

Current software is inadequate for MSE

Currently available computational chemistry software supports molecular modeling and visualization, not molecular systems engineering, and users are expected to be computational chemists, not design engineers. The lack of adequate, accessible tools has impeded the development of MSE as a field.

Why isn’t science enough? Imagine that we could study ancient architecture, but design only by imitation. “Architectural science” might thrive, but architecture itself would stagnate. Studying and imitating ancient biomolecules is likewise no substitute for creative, clean-sheet molecular engineering.[3] Like architecture, MSE will require tools for sketching, analyzing, and refining novel designs.

 

Current software can support modeling for MSE

Engineering requires iterative cycles of design and modeling.[4] At a systems level, molecular design tasks mirror their macroscale counterparts, while molecular modeling is what scientific software already does. The hard work of atomistic modeling, in both molecular dynamics (MD) and quantum chemistry (QC), has already been done, and the necessary capabilities are available in free or open-source software. Modeling developed for science provides ample support for design.
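The design-and-model cycle can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Design` record, the `toy_model` scoring function (a stand-in for a real MD or QC engine), and the 0.34 nm target spacing are assumptions chosen only to make the loop runnable, not part of any existing platform.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Design:
    """Hypothetical design with one adjustable parameter: a spacing in nm."""
    spacing_nm: float


def toy_model(design: Design) -> float:
    """Stand-in for a modeling engine: lower score is better.

    Assumption: a quadratic penalty around an arbitrary 0.34 nm target.
    """
    target_nm = 0.34
    return (design.spacing_nm - target_nm) ** 2


def propose(design: Design, step: float) -> List[Design]:
    """Design step: propose two neighbouring variants of the current design."""
    return [Design(design.spacing_nm - step), Design(design.spacing_nm + step)]


def iterate_design(
    initial: Design,
    model: Callable[[Design], float],
    max_cycles: int = 100,
    step: float = 0.05,
    tol: float = 1e-6,
) -> Tuple[Design, List[Tuple[Design, float]]]:
    """Alternate design proposals and model evaluations until no variant helps."""
    current, score = initial, model(initial)
    history = [(current, score)]
    for _ in range(max_cycles):
        best = min(propose(current, step), key=model)  # modeling step
        best_score = model(best)
        if best_score < score:
            # Modeling says the variant performs better: adopt it.
            current, score = best, best_score
            history.append((current, score))
        else:
            # No improvement: refine the proposals with a smaller step.
            step /= 2
            if step < tol:
                break
    return current, history


final, history = iterate_design(Design(spacing_nm=1.0), toy_model)
print(f"final spacing: {final.spacing_nm:.4f} nm after {len(history) - 1} accepted changes")
```

In a real platform the `model` callable would wrap an external engine and the proposals would come from a designer or a design tool; the loop structure is the only point being illustrated.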

 

Potential design tools extend from basic to advanced

A basic yet powerful set of readily implemented operations:

  • Sketch and edit 3D shapes to guide fine-grained, atomistic design.
  • Create, edit, and manipulate objects with atomistic structures.
  • Fit objects to shapes while respecting atomistic constraints.
  • Autofill atomistic structures to bridge gaps and link components.
  • Perform MD simulations with design-specific forces and constraints.
  • Apply QC methods to model mechanically guided chemical transformations.
  • Check results for violations of design and model-validity constraints.

Examples of extended functionality of varying scope and difficulty:

  • Basic operations (above) applied to accessible protein building blocks.
  • Multi-scale, multi-method modeling (QC, MD, and coarse-grained models).
  • Functional modeling at the level of components and operations.
  • Development and use of descriptive ML models to speed computation.[5]
  • Development and use of generative ML models to automate design.[6]
  • Engineering workflows transposed into the molecular domain.
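To make the multi-scale item in the list above more concrete, here is a small sketch of one common way to pass from an atomistic to a coarse-grained description: replacing each group of atoms with a single bead at the group's center of mass. The `coarse_grain` function and the tuple-based atom representation are illustrative assumptions, and real coarse-grained models also need effective interactions between beads, which this sketch does not address.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Particle = Tuple[float, Vec3]  # (mass, position); units are up to the caller


def coarse_grain(
    atoms: Sequence[Particle],
    groups: Sequence[Sequence[int]],
) -> List[Particle]:
    """Map each group of atom indices to one bead at the group's center of mass."""
    beads: List[Particle] = []
    for group in groups:
        if not group:
            raise ValueError("each group must contain at least one atom index")
        total_mass = sum(atoms[i][0] for i in group)
        center = tuple(
            sum(atoms[i][0] * atoms[i][1][k] for i in group) / total_mass
            for k in range(3)
        )
        beads.append((total_mass, center))
    return beads


atoms = [
    (1.0, (0.0, 0.0, 0.0)),
    (3.0, (4.0, 0.0, 0.0)),
    (2.0, (0.0, 2.0, 0.0)),
]
# Atoms 0 and 1 become one bead; atom 2 becomes its own bead.
beads = coarse_grain(atoms, groups=[[0, 1], [2]])
print(beads)
```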

 

Development can support (and draw support from) science and education

Computational chemistry software is typically difficult to use,[7] limiting use by both students and scientists. By its very nature, an easily accessible MSE platform will provide students and scientists with easy access to modeling tools; by the same token, contributions by and for scientists will extend the toolset for engineering. And, of course, today’s students will become tomorrow’s engineers.

 

Sketch of a project

An MSE platform development project can and should:

  • Follow precedents for successful, well-funded open-source projects.
  • Build on general-purpose tools for modeling and visualization.[8]
  • Provide transparent access to existing physical modeling engines.[9]
  • Provide tools for describing, refining, and testing designs.
  • Provide plugin interfaces for contributed design and modeling tools.
  • Invite use by both novices and experts.
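One way to picture the plugin interfaces and engine access listed above is a small registry behind a shared interface. The sketch below is only an assumption about what such an interface might look like: `ModelingEngine`, `register`, `get_engine`, and the toy engine are hypothetical names, and a real adapter would call out to an external MD or QC package instead of computing a toy energy.

```python
from typing import Dict, Protocol, Sequence, Tuple

Coordinates = Sequence[Tuple[float, float, float]]


class ModelingEngine(Protocol):
    """Hypothetical interface that every engine plugin would implement."""
    name: str

    def energy(self, coords: Coordinates) -> float:
        ...


_REGISTRY: Dict[str, ModelingEngine] = {}


def register(engine: ModelingEngine) -> ModelingEngine:
    """Add an engine plugin to the registry under its declared name."""
    if engine.name in _REGISTRY:
        raise ValueError(f"engine {engine.name!r} is already registered")
    _REGISTRY[engine.name] = engine
    return engine


def get_engine(name: str) -> ModelingEngine:
    """Look up a registered engine; raises KeyError for unknown names."""
    return _REGISTRY[name]


class ToyPairRepulsion:
    """Toy plugin: sums 1/d^2 over all pairs of points (not a physical model)."""
    name = "toy-repulsion"

    def energy(self, coords: Coordinates) -> float:
        total = 0.0
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                total += 1.0 / d2
        return total


register(ToyPairRepulsion())

# Design tools depend only on the interface, not on a specific engine.
engine = get_engine("toy-repulsion")
print(engine.energy([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```

Keeping design tools dependent on the interface alone is what would let contributed engines be swapped in without changing the tools that use them.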

Phase 0: Define objectives

  • Consult domain experts and potential users re. desired capabilities.
  • Establish criteria and metrics for core and extended functionality.
  • Consult experienced developers re. development strategies.

Phase 1: Develop high-quality foundations

  • Hire a project leader and architect and build a development team.
  • Implement core functionality in an open framework.

Phase 2: Work with an initial user/contributor community

  • Invite early users and contributors (there is ample interest).
  • Refine and extend core functionality in response to user feedback.

Phase 3: Expand the range of users and applications

  • Promote adoption of the platform for scientific applications.
  • Promote development of MSE expertise and capabilities.
  • Promote gamification of MSE design tasks (see Appendix).

 

The project will need to recruit experienced software architects and engineers who can develop a high-level software architecture and who can:

  • Link an open-ended range of currently available molecular modeling tools to support interoperable workflows.
  • Apply good software engineering principles to create a robust and extensible system.
  • Enable rapid design and analysis of novel molecular systems, some of which are accessible to experimental implementation.
  • Lead the creation of a smooth, unified user interface for molecular system modeling and design, built on state-of-the-art game engines and designed to provide a fast onramp for non-expert users of tools for molecular modeling and design. This interface must be dramatically more intuitive, usable, unified and fluid than those currently available to the scientific community, and must enable both visualization and design.
  • Create delightful user experiences on top of complex software back-ends.
  • Provide access to complex, bespoke open-source scientific software developed by computational chemists, making these tools useful to both novice and expert users.
  • Create an infrastructure allowing scripting and the flexible creation and integration of new plugins by external parties.

 

APPENDIX: Gamifying MSE

To identify paths toward transformative applications, including both long-term aims and intermediate goals, it will be necessary to explore designs that are physically realistic but not yet implementable. This kind of work is beyond the scope of experimental science and practical engineering, but within the scope of curiosity-driven exploration, which is to say, “play”.

Gamification can build directly on core MSE functionality and boost the development of MSE as a field. Successfully gamified tasks in Minecraft and FoldIt! have strong parallels with MSE.

 

Designing protein-based systems overlaps with tasks in “FoldIt!”

FoldIt! is a competitive,[10] gamified protein-folding application that combines computational modeling with human creativity. Structure predictions by FoldIt! citizen-science teams have earned a place in the scientific literature.[11]

Foldit screenshot illustrating tools and visualizations.

An MSE platform that incorporates existing tools for protein design could provide FoldIt!-like functionality for MSE.

 


 

Footnotes:

1: Molecular systems engineering, in the sense used here, involves the design (and potentially fabrication) of functional, multi-component molecular structures, e.g., molecular machines.

2: E.g., in medicine, artificial exosomes that employ sensors and computation to guide biological targeting and interventions at the cellular level; in physical technology, steps toward scalable, high-throughput atomically precise manufacturing.

3: Advances in software for biomolecular engineering (proteins, DNA) are impressive, but narrow.

4: Design proposes structures; modeling describes how they will perform, guiding next steps in design.

5: Emerging machine learning (ML) methods can closely approximate QC results with speedups of multiple orders of magnitude (e.g., see Hu, Weihua, et al. “Forcenet: A graph neural network for large-scale quantum calculations.” arXiv preprint arXiv:2103.01436 (2021), and Schütt, K. T., et al. “Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions.” Nature Communications 10.1 (2019): 1-10). An open, well-architected MSE platform could facilitate the development and rapid adoption of ML methods.

6: E.g., Wu, Zachary, et al. “Protein sequence design with deep generative models.” Current Opinion in Chemical Biology 65 (2021): 18-27.

7: E.g., ASCII file and command line interfaces are common.

8: E.g., game engines can provide state-of-the-art graphical interfaces for both design and interactive visualization of 3D systems, and have been applied to similar scientific visualization tasks.

9: E.g., physics engines for quantum chemistry, such as GAMESS, Psi4, ORCA, and SIESTA, and for large-scale molecular dynamics, such as GROMACS, LAMMPS, and NAMD.

10: See Top Groups.

11: Cooper, Seth, et al. “Predicting protein structures with a multiplayer online game.” Nature 466.7307 (2010): 756-760.