
Data-Driven Chemistry

Like most scientists, chemists are drowning in data from laboratory experiments and from calculations. We are developing machine-learning tools to automate the analysis of quantum chemistry calculations. Another area in need of automation is the development of quantitative structure-property relationships, particularly for flexible molecules.

Collaborators

Matt Sigman (Utah), Tom Rovis (Columbia), Steven Fletcher (Oxford)

Key Papers

Predicting Lewis Acidity: Machine Learning the Fluoride Ion Affinity of p-Block Atom-based Molecules.

Sigmund, L. M.; Sowndarya, S. S. V.; Albers, A.; Erdmann, P.; Paton, R. S.; Greb, L. Angew. Chem. Int. Ed. 2024, DOI: 10.1002/anie.202401084

Combining mechanistic and statistical models for predicting reaction outcomes in organic synthesis.

Gallegos, L. C. Colorado State University 2023

Regiodivergent Nucleophilic Fluorination under Hydrogen Bonding Catalysis: A Computational and Experimental Study.

Horwitz, M. A.; Dürr, A. B.; Afratis, K.; Chen, Z.; Soika, J.; Christensen, K. E.; Fushimi, M.; Paton, R. S.; Gouverneur, V. J. Am. Chem. Soc. 2023, 145, 9708–9717

Multi-objective goal-directed optimization of de novo stable organic radicals for aqueous redox flow batteries.

Sowndarya, S. S. V.; Law, J.; Tripp, C.; Duplyakin, D.; Skordilis, E.; Biagioni, D.; Paton, R. S.; St. John, P. C. Nat. Mach. Intell. 2022, 4, 720–730

Mechanistic Studies Yield Improved Protocols for Base-Catalyzed anti-Markovnikov Alcohol Addition Reactions.

Luo, C.; Alegre-Requena, J. V.; Sujansky, S. J.; Pajk, S.; Gallegos, L. C.; Paton, R. S.; Bandar, J. S. J. Am. Chem. Soc. 2022, 144, 9586–9596

Homologation of Electron-Rich Benzyl Bromide Derivatives via Diazo C–C Bond Insertion.

Modak, A.; Alegre-Requena, J. V.; Lescure, L.; Rynders, K. J.; Paton, R. S.; Race, N. J. J. Am. Chem. Soc. 2022, 144, 86–92

A Quantitative Metric for Organic Radical Persistence Using Thermodynamic and Kinetic Features.

Sowndarya, S. S. V.; St. John, P. C.; Paton, R. S. Chem. Sci. 2021, 12, 13158–13166

Real-time Prediction of 1H and 13C Chemical Shifts with DFT accuracy using a 3D Graph Neural Network.

Guan, Y.; Sowndarya, S. S. V.; Gallegos, L. C.; St. John, P. C.; Paton, R. S. Chem. Sci. 2021, 12, 12012–12026

CASCADE.

CASCADE stands for ChemicAl Shift CAlculation with DEep learning. It is a stereochemically aware graph network for predicting NMR chemical shifts. The model was trained on 8,000 DFT structures, followed by transfer learning with experimental spectra. A web server provides CASCADE predictions from SMILES input or from structures drawn in the graphical interface. An automated workflow performs 3D structure embedding and MMFF conformer searching. The full ensemble of optimized conformations is then passed to a trained graph neural network to predict the NMR chemical shifts (in ppm) for C and H atoms. The underlying training datasets and the Python code to run CASCADE from the command line have been made available.
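The description above does not say how per-conformer predictions are combined into a single shift per atom. One common choice, assumed here purely for illustration, is Boltzmann weighting by relative conformer energy. The sketch below is not CASCADE's code: the function names, the 298.15 K default, and the example numbers are all hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(rel_energies_kcal, temperature=298.15):
    """Return normalized Boltzmann populations for conformer energies (kcal/mol)."""
    e_min = min(rel_energies_kcal)  # shift so the lowest conformer has energy 0
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in rel_energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_shifts(conformer_shifts, rel_energies_kcal, temperature=298.15):
    """Average per-atom shifts (ppm) over conformers using Boltzmann weights.

    conformer_shifts: one list of per-atom shifts for each conformer,
    all in the same atom order (assumed to come from some predictor).
    """
    weights = boltzmann_weights(rel_energies_kcal, temperature)
    n_atoms = len(conformer_shifts[0])
    return [
        sum(w * shifts[a] for w, shifts in zip(weights, conformer_shifts))
        for a in range(n_atoms)
    ]


if __name__ == "__main__":
    # Made-up numbers: two conformers, three atoms each.
    shifts = [[25.0, 70.0, 128.0], [27.0, 68.0, 129.0]]
    energies = [0.0, 1.0]  # kcal/mol, relative
    print(ensemble_shifts(shifts, energies))
```

Shifting the energies by their minimum before exponentiating keeps the exponentials well behaved when absolute energies are large.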

[GitHub]

DBSTEP.

DBSTEP is a Python package for obtaining DFT-Based Steric Parameters from three-dimensional chemical structures. It can parse the outputs of most computational chemistry programs as well as other common molecular structure file formats. Steric properties can be obtained either exactly or on a Cartesian grid; the grid approach allows a molecular isodensity surface to be featurized (DBSTEP can process wavefunction files) instead of relying on classical atomic radii. Currently, traditional Sterimol parameters (L, Bmin, Bmax) and percent buried volume are implemented, as well as our novel steric parameter vectors Sterimol2vec and vol2vec. The package can be used from the command line or imported into a Python script as part of a computational workflow that collects steric parameters.
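To make the grid-based idea concrete, here is a minimal standard-library sketch of a percent buried volume calculation on a Cartesian grid using classical atomic radii. It is an independent illustration, not DBSTEP's API or implementation: the name `percent_buried_volume`, the unscaled Bondi radii table, the 3.5 Å probe sphere, and the 0.1 Å spacing are all assumptions made for this example.

```python
import math

# Bondi van der Waals radii in angstroms (unscaled; an assumption of this sketch).
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "F": 1.47}


def percent_buried_volume(atoms, center=(0.0, 0.0, 0.0),
                          sphere_radius=3.5, spacing=0.1):
    """Estimate the percentage of a sphere occupied by atomic spheres.

    atoms: list of (element, x, y, z) tuples with coordinates in angstroms.
    Grid points inside the probe sphere are counted as buried when they
    fall within the van der Waals radius of at least one atom.
    """
    cx, cy, cz = center
    n = int(math.ceil(sphere_radius / spacing))
    # Precompute squared radii so the inner loop avoids square roots.
    spheres = [(x, y, z, BONDI_RADII[el] ** 2) for el, x, y, z in atoms]
    r2 = sphere_radius ** 2
    total = 0
    buried = 0
    for i in range(-n, n + 1):
        dx = i * spacing
        for j in range(-n, n + 1):
            dy = j * spacing
            for k in range(-n, n + 1):
                dz = k * spacing
                if dx * dx + dy * dy + dz * dz > r2:
                    continue  # grid point lies outside the probe sphere
                total += 1
                px, py, pz = cx + dx, cy + dy, cz + dz
                if any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= rr
                       for x, y, z, rr in spheres):
                    buried += 1
    return 100.0 * buried / total


if __name__ == "__main__":
    # A single carbon atom at the sphere center as a sanity check.
    print(percent_buried_volume([("C", 0.0, 0.0, 0.0)]))
```

For a single carbon atom at the center, the estimate should approach the analytic value 100·(1.70/3.5)³ ≈ 11.5% as the grid spacing shrinks. Replacing the radius test with a lookup into an electron-density grid would be one way to move from classical radii toward an isodensity surface.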

[GitHub]