Group encoding in CurlySMILES uses the structural unit annotation format. A structural unit annotation consists of a bond symbol (-, =, #, :, & or ~) enclosed in curly braces. Encoding a terminal group in CurlySMILES requires exactly one such annotation, anchored at the atom of the group that carries the formally open bond. For example, the methyl (-CH3), amino (-NH2), hydroxy (-OH), and fluoro (-F) groups are written C{-}, N{-}, O{-}, and F{-}, respectively.

As in the original SMILES language, the number of hydrogen atoms attached to a non-hydrogen atom is derived from normal valence assumptions. An open single, double, or triple bond "substitutes" one, two, or three hydrogen atoms, respectively. The notation N{=}, for example, represents an imino group (=NH), in which each valence of the double bond formally replaces a hydrogen atom of the parent molecule, ammonia (NH3).
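
The hydrogen-count rule can be sketched in a few lines of Python for single-atom groups from the organic subset. This is an illustration, not a CurlySMILES parser: the valence table uses the usual SMILES normal valences, the function name `implicit_hydrogens` is hypothetical, and only the bond symbols -, = and # are handled, because the remaining symbols do not map to a simple integer bond order in this sketch.

```python
import re

# Normal valences of the SMILES organic subset, lowest first.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

# Only -, = and # are handled here; ':', '&' and '~' are left out of this sketch.
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

# One organic-subset atom followed by one annotation, e.g. 'N{=}'.
GROUP_PATTERN = re.compile(r"^(Cl|Br|[BCNOPSFI])\{([-=#])\}$")


def implicit_hydrogens(group: str) -> int:
    """Return the implicit hydrogen count of a single-atom terminal group."""
    match = GROUP_PATTERN.match(group)
    if match is None:
        raise ValueError(f"unsupported group notation: {group!r}")
    element, bond = match.groups()
    order = BOND_ORDER[bond]
    # Take the lowest normal valence that can accommodate the open bond;
    # whatever valence is left over is filled with hydrogen atoms.
    for valence in NORMAL_VALENCES[element]:
        if valence >= order:
            return valence - order
    raise ValueError(f"{element} cannot carry an open bond of order {order}")


for notation in ("C{-}", "N{-}", "O{-}", "F{-}", "N{=}"):
    print(notation, "->", implicit_hydrogens(notation), "H")
```

Running it reports 3, 2, 1, 0 and 1 hydrogen atoms for C{-}, N{-}, O{-}, F{-} and N{=}, matching -CH3, -NH2, -OH, -F and =NH above.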

For atoms that do not belong to the organic subset (B, C, N, O, P, S, F, Cl, Br, and I), the atom is written in square brackets and the number of attached hydrogen atoms is specified explicitly inside them. For example, the silyl group (-SiH3) is encoded as [SiH3]{-}. Formally charged groups are encoded in the same manner: [NH3+]{-} represents the ammonium group of a monosubstituted ammonium cation.
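
For bracket atoms no hydrogen count is derived: the written count is taken as given. A simple consistency check can still be sketched, namely that the explicit hydrogens plus the open bond add up to a plausible valence for the element and its charge. The helper names and the deliberately tiny valence table below (Si with 4, N+ with 4, neutral N with 3) are illustrative assumptions, not part of CurlySMILES.

```python
import re

# Bracket atom: element, optional H count, optional charge, then one annotation.
BRACKET_PATTERN = re.compile(
    r"^\[(?P<element>[A-Z][a-z]?)(?P<h>H\d*)?(?P<charge>[+-]\d*)?\]\{(?P<bond>[-=#])\}$"
)
BOND_ORDER = {"-": 1, "=": 2, "#": 3}

# Illustrative (element, charge) -> allowed total valences; far from complete.
EXPECTED_VALENCES = {
    ("Si", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
}


def parse_bracket_group(group: str) -> dict:
    """Split a bracketed terminal group such as '[NH3+]{-}' into its parts."""
    m = BRACKET_PATTERN.match(group)
    if m is None:
        raise ValueError(f"unsupported group notation: {group!r}")
    h_text = m.group("h")
    # 'H' alone means one hydrogen; no H at all means zero.
    hydrogens = 0 if h_text is None else int(h_text[1:] or "1")
    charge_text = m.group("charge")
    if charge_text is None:
        charge = 0
    else:
        magnitude = int(charge_text[1:] or "1")
        charge = magnitude if charge_text[0] == "+" else -magnitude
    return {
        "element": m.group("element"),
        "hydrogens": hydrogens,
        "charge": charge,
        "bond_order": BOND_ORDER[m.group("bond")],
    }


def is_plausible(group: str) -> bool:
    """Compare explicit H count plus open bond order with the illustrative table."""
    parts = parse_bracket_group(group)
    total = parts["hydrogens"] + parts["bond_order"]
    allowed = EXPECTED_VALENCES.get((parts["element"], parts["charge"]), set())
    return total in allowed


print(parse_bracket_group("[SiH3]{-}"))
print(is_plausible("[SiH3]{-}"), is_plausible("[NH3+]{-}"), is_plausible("[SiH2]{-}"))
```

The last check fails because [SiH2]{-} would leave silicon with only three bonds; a practical checker would need a much fuller valence table.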
There are no restrictions on group size. The following examples illustrate the encoding of terminal groups that contain more than one non-hydrogen atom:
Cyclopent-2-enyl group: C1=CCCC1{-}
3-Isoquinolyl group: n1c{-}cc2ccccc2c1
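
Because the annotation only marks the open bond, deleting it from the two examples leaves ordinary SMILES strings for the parent molecules, cyclopentene (C1=CCCC1) and isoquinoline (n1ccc2ccccc2c1), with the substituted hydrogen restored implicitly on the organic-subset anchor atom. The Python sketch below checks that a terminal group carries exactly one annotation and strips it. It simplifies the full CurlySMILES grammar by treating any brace pair that holds a single bond symbol as a structural unit annotation; the function name `parent_smiles` is hypothetical, and for bracket atoms such as [SiH3]{-} stripping does not restore a hydrogen, since their count is explicit.

```python
import re

# A structural unit annotation: exactly one bond symbol inside curly braces.
ANNOTATION = re.compile(r"\{[-=#:&~]\}")


def parent_smiles(group: str) -> str:
    """Remove the single annotation of a terminal group and return the rest."""
    annotations = ANNOTATION.findall(group)
    if len(annotations) != 1:
        raise ValueError(
            f"a terminal group needs exactly one annotation, found {len(annotations)}"
        )
    return ANNOTATION.sub("", group)


for group in ("C1=CCCC1{-}", "n1c{-}cc2ccccc2c1"):
    print(group, "->", parent_smiles(group))
```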
Non-terminal groups are bonded to more than one other structural unit. Encoding such groups in CurlySMILES requires a corresponding number of structural unit annotations, as demonstrated for multivalent groups.

Reference

A. Drefahl: CurlySMILES: a chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3:1. doi: 10.1186/1758-2946-3-1.



