From Daylight's SMARTS page:
https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
4.2 Bond Primitives
Various bond symbols are available to match connections between atoms. A missing bond symbol is interpreted as "single or aromatic".
In practice, most tools don't strictly honor this Daylight convention, and that's mostly okay. By a strict reading of the Daylight documentation, a SMARTS query of c1ccccc1 (benzene) would be interpreted as having a single-or-aromatic bond between each pair of adjacent atoms, with each atom itself required to have at least one aromatic bond somewhere. This is impractical to specify in a molfile.
Typically, when a tool produces a SMARTS/SMILES pattern with aromatic atoms (e.g. c) but unspecified bonds between those aromatic atoms (e.g. cc), the common interpretation is that the unspecified bond is aromatic (e.g. c:c). Similarly, when a tool produces a SMARTS/SMILES pattern with aliphatic atoms (e.g. C) but an unspecified bond (e.g. CC), the common interpretation is an implied single bond (e.g. C-C). These conventions are widely used even though they present some problems.
The compromise solution requires a modification to Daylight's statement:
A missing bond symbol BETWEEN ATOMS WHERE AT LEAST ONE ATOM HAS A QUERY FEATURE is interpreted as "single or aromatic".
That is, it's fine for explicit non-query atoms to imply the bonds between them. But if at least one atom is a query atom AND the SMARTS pattern does not specify a bond type, the bond should be interpreted as single-or-aromatic. For example:
| Ambiguous SMARTS | Equivalent to |
|---|---|
| cc | c:c |
| CC | C-C |
| C[#6] | C-,:[#6] |
| C[*] | C-,:[*] |
| [#6,#7][#6] | [#6,#7]-,:[#6] |
Here a "query atom" is any atom specified as an atom list (including a list of 1 element) or an atom wildcard.
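
To make the proposed rule concrete, here is a minimal Python sketch that reproduces the table above. It is an illustration, not an implementation from any existing toolkit: the names `is_query_atom`, `implied_bond`, and `expand_implicit_bonds` are hypothetical, it only handles linear patterns with no explicit bonds, branches, or ring closures, and it assumes a bracket atom counts as a "query atom" when it contains `#`, `,`, or `*`. It also assumes that an unspecified bond between one aromatic and one aliphatic non-query atom is single, which the text above does not state.

```python
import re

# One atom token: a bracket atom, a bare wildcard, or a bare organic-subset symbol.
ATOM = re.compile(r"\[[^\]]*\]|\*|Cl|Br|[BCNOPSFI]|[bcnops]")
AROMATIC_SYMBOLS = set("bcnops")


def is_query_atom(token):
    """Assumption: a wildcard, or a bracket atom containing '#', ',' or '*'."""
    if token == "*":
        return True
    return token.startswith("[") and any(ch in token for ch in "#,*")


def is_aromatic(token):
    """True for non-query atoms written with a lowercase (aromatic) symbol."""
    return token.strip("[]")[:1] in AROMATIC_SYMBOLS


def implied_bond(left, right):
    """Bond symbol implied by a missing bond under the proposed rule."""
    if is_query_atom(left) or is_query_atom(right):
        return "-,:"  # single or aromatic
    if is_aromatic(left) and is_aromatic(right):
        return ":"  # both atoms explicitly aromatic
    return "-"  # assumption: any other non-query pair is single


def expand_implicit_bonds(smarts):
    """Make every implicit bond in a linear SMARTS pattern explicit."""
    tokens = ATOM.findall(smarts)
    if not tokens or "".join(tokens) != smarts:
        raise ValueError("only linear patterns with implicit bonds are supported")
    parts = [tokens[0]]
    for left, right in zip(tokens, tokens[1:]):
        parts.append(implied_bond(left, right))
        parts.append(right)
    return "".join(parts)


if __name__ == "__main__":
    for pattern in ["cc", "CC", "C[#6]", "C[*]", "[#6,#7][#6]"]:
        print(pattern, "->", expand_implicit_bonds(pattern))
```

Run as a script, this prints the five rows of the table, e.g. `C[#6] -> C-,:[#6]`. A pattern such as c1ccccc1 is rejected with a `ValueError` because ring closures are outside the scope of the sketch.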