An atomic node code (ANC) encodes a single node in the hydrogen-suppressed molecular graph. A node consists of one non-hydrogen atom together with the hydrogen atoms attached to it. SMILES and CurlySMILES use the same format: a node is represented either by its bare atomic symbol or by a square bracket atomic code (SQC).
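
The two forms can be told apart mechanically. The following Python sketch classifies a single ANC string. It assumes a simplified SQC grammar (optional isotope, element symbol, optional hydrogen count, optional charge) and omits the chirality, atom-class, and aromatic lowercase forms that full SMILES also allows. The names `classify_anc` and `parse_charge` are hypothetical.

```python
import re

# Organic subset of SMILES: the only elements that may appear as bare symbols.
# Aromatic lowercase forms (c, n, o, ...) are left out of this sketch.
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}

# Simplified SQC: [isotope? element Hcount? charge?]. Chirality and atom-class
# fields of full SMILES are deliberately not handled.
SQC = re.compile(r"\[(\d+)?([A-Z][a-z]?)(H\d*)?([+-]\d*)?\]")


def parse_charge(text):
    """Turn '+', '-', '+2', '-3' (or None) into an integer charge."""
    if not text:
        return 0
    magnitude = int(text[1:]) if len(text) > 1 else 1
    return magnitude if text[0] == "+" else -magnitude


def classify_anc(code):
    """Return a dict describing one ANC, or raise ValueError."""
    if code in ORGANIC_SUBSET:
        # Bare symbol: hydrogens are implied, not stated.
        return {"form": "bare", "element": code}
    match = SQC.fullmatch(code)
    if match is None:
        raise ValueError(f"not an ANC under this sketch's grammar: {code!r}")
    isotope, element, hydrogens, charge = match.groups()
    if hydrogens is None:
        h_count = 0  # no H field: no attached hydrogens
    elif hydrogens == "H":
        h_count = 1  # 'H' alone means one hydrogen
    else:
        h_count = int(hydrogens[1:])
    return {
        "form": "sqc",
        "element": element,
        "isotope": int(isotope) if isotope else None,
        "hydrogens": h_count,
        "charge": parse_charge(charge),
    }


print(classify_anc("C"))       # bare node: methane
print(classify_anc("[SiH4]"))  # silane has to use the SQC
```

A bare `Si` is rejected by this sketch, because Si is outside the organic subset and therefore has no bare form.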

SQC encoding is the default. Only elements that belong to the so-called organic subset may be written as bare symbols without brackets, and only if the number of attached hydrogens brings the atom to the lowest normal valence consistent with its explicit bonds. The organic subset comprises B, C, N, O, P, S, F, Cl, Br, and I.
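
The hydrogen rule for bare symbols amounts to a small lookup. The sketch below assumes the normal valences given in the Daylight SMILES documentation (B 3; C 4; N 3 or 5; O 2; P 3 or 5; S 2, 4 or 6; halogens 1). The parameter `explicit_bond_order_sum` stands for the total order of the atom's explicit bonds (a double bond counts 2); it and the function name `implicit_hydrogens` are hypothetical.

```python
# Normal valences of the organic-subset elements (assumed from the Daylight
# SMILES theory manual).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, explicit_bond_order_sum):
    """Hydrogens implied for a bare organic-subset symbol."""
    if element not in NORMAL_VALENCES:
        raise ValueError(f"{element} is not in the organic subset; use an SQC")
    # Pick the lowest normal valence that can accommodate the explicit bonds.
    for valence in NORMAL_VALENCES[element]:
        if valence >= explicit_bond_order_sum:
            return valence - explicit_bond_order_sum
    # Explicit bonds exceed every normal valence: no hydrogens are implied.
    return 0


print(implicit_hydrogens("C", 0))  # bare C with no bonds: methane
print(implicit_hydrogens("N", 4))  # bond order sum 4 moves N up to valence 5
```

Taking the *lowest* valence that fits is what makes a bare symbol unambiguous: for nitrogen with a bond order sum of 4, valence 3 is too small, so valence 5 applies and one hydrogen is implied.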

In the absence of brackets, the attached hydrogens are implied. For example, the notations C and P represent methane and phosphine, respectively; their SQC-based equivalents are [CH4] and [PH3]. Silane and arsine, in contrast, must always be encoded as [SiH4] and [AsH3], since Si and As do not belong to the organic subset. Trichlorosilane can be encoded as Cl[SiH](Cl)Cl, which uses the SQC only for the atom outside the organic subset; the fully bracketed notation [Cl][SiH]([Cl])[Cl] is equally valid. Isotopically labelled atoms and formally charged atoms must always be written in SQC notation.
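
The equivalence between bare and bracketed notations can be checked by rewriting every bare symbol as an SQC with its implied hydrogens made explicit. The sketch below is limited to a toy grammar: organic-subset symbols, bracket atoms, branches, and implicit single bonds. Ring closures, bond symbols, and aromatic atoms are not handled. The valence table follows the Daylight SMILES convention, and the names `to_sqc_form`, `implied_h`, and `bracket` are hypothetical.

```python
import re

# Normal valences of the organic subset (assumed from Daylight SMILES).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

# Tokens of the toy grammar: bracket atom, two-letter halogen,
# one-letter organic-subset symbol, or a branch parenthesis.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI()]")


def implied_h(element, degree):
    # Single bonds only, so the bond order sum equals the neighbour count.
    for valence in NORMAL_VALENCES[element]:
        if valence >= degree:
            return valence - degree
    return 0


def bracket(symbol, h):
    """Write an SQC with an explicit hydrogen count."""
    if h == 0:
        return f"[{symbol}]"
    return f"[{symbol}H]" if h == 1 else f"[{symbol}H{h}]"


def to_sqc_form(smiles):
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        # Something was skipped, e.g. a bare 'Si' or 'As', which needs brackets.
        raise ValueError(f"unsupported or invalid notation: {smiles!r}")

    degree = {}        # token index -> number of bonded neighbours
    branch_stack = []  # atoms to return to after a closing parenthesis
    prev = None        # index of the atom the next atom bonds to
    for i, tok in enumerate(tokens):
        if tok == "(":
            if prev is None:
                raise ValueError("branch without a preceding atom")
            branch_stack.append(prev)
        elif tok == ")":
            if not branch_stack:
                raise ValueError("unbalanced parenthesis")
            prev = branch_stack.pop()
        else:
            degree[i] = 0
            if prev is not None:
                degree[prev] += 1
                degree[i] += 1
            prev = i
    if branch_stack:
        raise ValueError("unbalanced parenthesis")

    out = []
    for i, tok in enumerate(tokens):
        if tok in ("(", ")") or tok.startswith("["):
            out.append(tok)  # branches and existing SQCs are kept as written
        else:
            out.append(bracket(tok, implied_h(tok, degree[i])))
    return "".join(out)


print(to_sqc_form("Cl[SiH](Cl)Cl"))  # -> [Cl][SiH]([Cl])[Cl]
```

Applied to the examples above, `to_sqc_form("C")` gives `[CH4]` and `to_sqc_form("P")` gives `[PH3]`, while a bare `Si` raises an error because it is not a legal bare symbol.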

In a CurlySMILES notation, an ANC may be followed by an atom-anchored annotation (AAA), such as a stereodescriptor, a structural unit annotation, a group environment annotation, a molecular detail annotation, or an operational annotation.
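
Because an AAA is tied to the ANC it follows, a parser can attach each annotation to its node while scanning. The sketch below assumes, in line with the language's name, that an AAA is written as a curly-brace group directly after its ANC. It does not interpret the annotation content; `a1` and `a2` are placeholder payloads, not real CurlySMILES annotations, and the function name `anchored_annotations` is hypothetical. Nested braces and input validation are out of scope.

```python
import re
from typing import Optional

# Toy grammar (hypothetical): an ANC, i.e. a bracket atom or an organic-subset
# symbol, optionally followed by one non-nested {...} group that is treated here
# as that atom's anchored annotation. Branch parentheses are simply skipped.
ANC_WITH_AAA = re.compile(r"(\[[^\]]+\]|Cl|Br|[BCNOPSFI])(\{[^{}]*\})?")


def anchored_annotations(notation: str) -> list[tuple[str, Optional[str]]]:
    """Pair each ANC with the annotation text that directly follows it, if any."""
    pairs = []
    for match in ANC_WITH_AAA.finditer(notation):
        anc, braced = match.groups()
        # Strip the braces; the annotation content itself is not interpreted.
        pairs.append((anc, braced[1:-1] if braced else None))
    return pairs


print(anchored_annotations("C{a1}C[SiH3]{a2}"))
# -> [('C', 'a1'), ('C', None), ('[SiH3]', 'a2')]
```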
