According to the well-known structure-function paradigm, conformational equilibrium plays a major role in molecular recognition. Therefore, a deep understanding of the conformational profile of small organic molecules is an essential prerequisite to modern computer-assisted drug design. However, a thorough analysis and a meaningful representation of the conformational landscape of drug-like molecules remain a challenge. The thermodynamic equilibrium of conformational states can be described in terms of a probability density function (PDF) defined in the space of the relevant degrees of freedom of the system. In principle, this PDF could be estimated by traditional histogram methods, which are, however, hampered by several limitations when the space has more than two or three variables. Here, we present an unsupervised parametric fitting procedure based on cluster analysis, aimed at estimating the PDF in the conformational space of small drug-like molecules with low sensitivity to data dimensionality. Data are represented in the dihedral space of the molecule and clustered using a simple adaptation of the standard k-means algorithm for periodic data. In the final step of the analysis, the PDF is derived as a linear combination of multivariate circular Gaussian distributions. We show that, by exploiting the analytic properties of Gaussian distributions, the proposed approach makes it possible to analyze the conformational ensemble in higher-dimensional spaces, with several advantages over histogram-based methods. The posterior analysis of the PDF also helps identify a minimal subset of variables able to provide a meaningful representation of the conformational space. We tested our approach on alanine dipeptide, alanine tetrapeptide, and rilpivirine, with satisfactory results compared to standard histogram-based methods and to those based on chemical intuition.
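The two computational steps named above (k-means adapted to periodic data, followed by a PDF built as a weighted sum of Gaussians in dihedral space) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the optional `init` argument, the small regularization added to each covariance, and the choice to approximate each circular Gaussian by an ordinary Gaussian evaluated on wrapped angular deviations (reasonable only when clusters are narrow compared with 2π) are all assumptions made for illustration.

```python
import numpy as np

def wrap(delta):
    """Map angular differences (radians) into [-pi, pi)."""
    return (delta + np.pi) % (2.0 * np.pi) - np.pi

def circular_mean(angles):
    """Per-column circular mean of an (n, d) array of angles."""
    return np.arctan2(np.sin(angles).mean(axis=0), np.cos(angles).mean(axis=0))

def assign(X, centers):
    """Label each sample with the nearest center in wrapped Euclidean distance."""
    diff = wrap(X[:, None, :] - centers[None, :, :])  # shape (n, k, d)
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def periodic_kmeans(X, k, init=None, n_iter=100, seed=0):
    """k-means on angles: wrapped distances and circular means (assumed variant)."""
    rng = np.random.default_rng(seed)
    if init is None:
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = circular_mean(members)
        converged = np.allclose(wrap(new_centers - centers), 0.0)
        centers = new_centers
        if converged:
            break
    return centers, assign(X, centers)

def fit_mixture(X, centers, labels):
    """Weight, mean and covariance of each cluster (illustrative estimator)."""
    d = X.shape[1]
    comps = []
    for j, mu in enumerate(centers):
        dev = wrap(X[labels == j] - mu)
        n_j = len(dev)
        # Small diagonal term (an assumption) keeps the covariance invertible.
        cov = dev.T @ dev / max(n_j, 1) + 1e-6 * np.eye(d)
        comps.append((n_j / len(X), mu, cov))
    return comps

def mixture_pdf(theta, comps):
    """Evaluate the weighted sum of Gaussians at one point of dihedral space."""
    total = 0.0
    for w, mu, cov in comps:
        dev = wrap(theta - mu)
        norm = np.sqrt((2.0 * np.pi) ** len(mu) * np.linalg.det(cov))
        total += w * np.exp(-0.5 * dev @ np.linalg.solve(cov, dev)) / norm
    return total
```

With an `(n, d)` array `X` of dihedral angles in radians, `centers, labels = periodic_kmeans(X, k)` followed by `comps = fit_mixture(X, centers, labels)` gives a density that can be evaluated anywhere with `mixture_pdf(theta, comps)`. Because every deviation is wrapped before use, a cluster straddling the ±π boundary is treated as a single cluster, which is the point of adapting k-means to periodic data.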
ASJC Scopus subject areas
- Computer Science Applications
- Physical and Theoretical Chemistry