Enumerating common molecular substructures
- Published
- Accepted
- Subject Areas
- Bioinformatics, Computational Science
- Keywords
- molecular dynamics simulations, subgraph enumeration, molecular graphs
- Copyright
- © 2017 Engler et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Enumerating common molecular substructures. PeerJ Preprints 5:e3250v1 https://doi.org/10.7287/peerj.preprints.3250v1
Abstract
Finding and enumerating common molecular substructures is an important task in cheminformatics, where small molecules are often modeled as molecular graphs. We introduce the problem of enumerating all maximal k-common molecular fragments of a pair of molecular graphs. A k-common fragment is a common connected induced subgraph that consists of a common core and a common k-neighborhood. It is thus a generalization of the NP-hard task to enumerate all maximal common connected induced subgraphs (MCCIS) of two graphs, which corresponds to the k = 0 case.
We extend the MCCIS enumeration algorithm by Ina Koch and apply algorithm engineering techniques to solve practical instances fast for the general k > 0 case, which is relevant, for example, for automatically generating force field topologies for molecular dynamics (MD) simulations. We find that our methods achieve good performance on a real-world benchmark of all-against-all comparisons of 255 molecules.
Our software is available under the LGPL open source license at https://github.com/enitram/mogli .
Author Comment
This is an article which has been accepted for the GCB 2017 Conference