Learning Causal Models from Molecular Dynamics Data

Date:

Molecular dynamics (MD) simulations are a well-established technique for investigating the molecular mechanisms that underlie chemical reactivity in complex (bio)molecular systems, the factors that determine their spectral properties, or the physical forces that drive biologically relevant processes. However, the lack of proper tools to analyze and extract unbiased information from the inherently high-dimensional data these simulations produce is one of the main drawbacks limiting their applicability and predictive power. Correlation analyses are very often used to infer cause-effect relationships between electronic or structural features and molecular properties. Because the validity of such correlation-only strategies for causal analysis rests on very strong assumptions, they cannot in general be considered legitimate, and they can lead to a biased interpretation of the information gained from the simulations. In this talk, I will show how a combined approach can overcome these limitations and predict molecular properties from structural MD data: machine learning techniques first reduce the dimensionality of the data extracted from the simulations, and causal inference algorithms then identify the underlying causal structure of the reduced dataset.
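
To make the concern about correlation-only analyses concrete, here is a minimal sketch on synthetic data. It is not the method presented in the talk; it only illustrates the kind of conditional-independence check (here, a linear partial correlation) that constraint-based causal discovery algorithms rely on. The variables `z`, `x`, and `y`, the coefficients, and the helper `partial_corr` are all hypothetical: `z` stands in for a hidden collective motion that drives both a structural feature `x` and a molecular property `y`, with no direct link between `x` and `y`.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both.

    Hypothetical helper for illustration; assumes linear relationships.
    """
    design = np.column_stack([np.ones(len(z)), z])  # intercept + conditioning variable
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


rng = np.random.default_rng(0)
n_frames = 5000  # assumed number of simulation frames (synthetic here)

# Synthetic "trajectory": z drives both x and y; x does not cause y.
z = rng.normal(size=n_frames)
x = 0.9 * z + 0.3 * rng.normal(size=n_frames)
y = 0.9 * z + 0.3 * rng.normal(size=n_frames)

marginal = float(np.corrcoef(x, y)[0, 1])   # strong, but not causal
conditional = partial_corr(x, y, z)         # close to zero once z is accounted for

print(f"corr(x, y)     = {marginal:.3f}")
print(f"corr(x, y | z) = {conditional:.3f}")
```

In this toy setting the marginal correlation between `x` and `y` is large even though neither causes the other, while the correlation conditioned on `z` nearly vanishes. A correlation-only reading would wrongly link the feature to the property; a causal discovery procedure would use tests of this kind to remove the spurious edge.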