Axeleratio

Material notations
By Axel Drefahl

Introduction

MM! material notations are based on the CurlySMILES language. This language provides the format in which users perform search and participants add new material information. MM! employs a subset of CurlySMILES to formulate a material by its composition and structural attributes: the stoichiometric formula notation (SFN) and the square bracket atomic code (SQC). SFN and SQC input can be refined by using state and shape annotations and/or miscellaneous interest annotations. These input formats are explained in the following sections. Further information on CurlySMILES is available at the project site and in the open-access publication in the Journal of Cheminformatics [doi: 10.1186/1758-2946-3-1].


Annotation-free SFN

An SFN is similar to an empirical or molecular formula: it consists of a sequence of atomic symbols with appropriate integer subscripts that express the material composition. Atomic symbols may occur in any order and at multiple places. However, it is recommended to follow the commonly used symbol order. For example, zirconium dioxide should be entered as ZrO2 to be consistent with the formula ZrO₂, in which the symbol for the oxygen atoms occurs at the end, as it does in metal oxide formulae.

Atomic symbols can be grouped by enclosing them with round brackets. A stoichiometric integer follows the closing round bracket, when required; for example:

Cu3(CO3)2(OH)2 ..... azurite, Cu₃(CO₃)₂(OH)₂
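As a rough illustration of how bracket groups and their subscripts combine, the following Python sketch expands an annotation-free, uncharged SFN into element counts. It is not MM!'s own parser: the name count_atoms is hypothetical, and the sketch only checks that tokens look like atomic symbols, not that they are real elements.

```python
import re
from collections import Counter

# An atomic symbol (capital letter plus optional lowercase letter),
# a round bracket, or an integer subscript.
TOKEN = re.compile(r"[A-Z][a-z]?|\(|\)|\d+")


def count_atoms(sfn: str) -> Counter:
    """Expand an annotation-free, uncharged SFN into element counts."""
    tokens = TOKEN.findall(sfn)
    if "".join(tokens) != sfn:
        raise ValueError(f"unsupported characters in {sfn!r}")
    stack = [Counter()]  # one counter per open bracket level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            stack.append(Counter())
            continue
        if tok.isdigit():
            raise ValueError(f"misplaced integer in {sfn!r}")
        # Symbols and closing brackets may carry an integer subscript.
        mult = 1
        if i < len(tokens) and tokens[i].isdigit():
            mult = int(tokens[i])
            i += 1
        if tok == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {sfn!r}")
            group = stack.pop()
            for element, n in group.items():
                stack[-1][element] += n * mult
        else:
            stack[-1][tok] += mult
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {sfn!r}")
    return stack[0]


print(dict(count_atoms("Cu3(CO3)2(OH)2")))  # {'Cu': 3, 'C': 2, 'O': 8, 'H': 2}
```

The stack holds one counter per open bracket, so a subscript after a closing bracket multiplies every count collected inside that group.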

When an SFN represents an ionic species, a charge notation (n+) or (n-) with n ≥ 1 is placed at the end of the notation; for example:

Bi5(4+) ..... pentabismuth(4+) cation, Bi₅⁴⁺
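A minimal sketch, assuming the charge notation is always the last token of the SFN, splits it off and returns the charge as a signed integer. The name split_charge is hypothetical.

```python
import re

# Charge notation "(n+)" or "(n-)" with n >= 1, anchored at the end of the SFN.
CHARGE = re.compile(r"\(([1-9]\d*)([+-])\)$")


def split_charge(sfn: str) -> tuple[str, int]:
    """Return the SFN without its charge notation and the signed charge (0 if none)."""
    m = CHARGE.search(sfn)
    if m is None:
        return sfn, 0
    n = int(m.group(1))
    sign = 1 if m.group(2) == "+" else -1
    return sfn[: m.start()], sign * n


print(split_charge("Bi5(4+)"))  # ('Bi5', 4)
```

Because the pattern requires a sign inside the brackets, ordinary groups such as (OH)2 are never mistaken for a charge.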

An isotope label is encoded as a caret followed by the label integer (the mass number), placed directly in front of an atomic symbol:

^2H2O ..... deuterium oxide ([2H]2-water), ²H₂O
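The caret label can be handled by a small tokenizer. The sketch below assumes a flat SFN without round brackets or charge notation; labeled_atoms and render are hypothetical helpers, and render only approximates the typeset formula with Unicode superscripts and subscripts.

```python
import re
from typing import List, Optional, Tuple

# Optional isotope label (caret + integer), atomic symbol, optional subscript.
ATOM = re.compile(r"(?:\^(\d+))?([A-Z][a-z]?)(\d*)")

SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPT = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def labeled_atoms(sfn: str) -> List[Tuple[Optional[int], str, int]]:
    """Split a flat SFN into (mass number or None, symbol, count) triples."""
    atoms = []
    pos = 0
    while pos < len(sfn):
        m = ATOM.match(sfn, pos)
        if m is None:
            raise ValueError(f"cannot parse {sfn!r} at position {pos}")
        label, symbol, count = m.groups()
        atoms.append((int(label) if label else None, symbol, int(count) if count else 1))
        pos = m.end()
    return atoms


def render(atoms: List[Tuple[Optional[int], str, int]]) -> str:
    """Approximate the typeset formula with Unicode super- and subscripts."""
    parts = []
    for label, symbol, count in atoms:
        prefix = str(label).translate(SUPERSCRIPT) if label is not None else ""
        suffix = str(count).translate(SUBSCRIPT) if count > 1 else ""
        parts.append(prefix + symbol + suffix)
    return "".join(parts)


print(labeled_atoms("^2H2O"))          # [(2, 'H', 2), (None, 'O', 1)]
print(render(labeled_atoms("^2H2O")))  # ²H₂O
```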

Annotated SFN

An SFN enclosed by {* and } forms a CurlySMILES component notation, which can be annotated by state and shape annotations and/or miscellaneous interest annotations. For example, the state and shape annotation {am} specifies an amorphous material:

{*Al2O3}{am} ..... amorphous aluminum oxide

A crystalline polymorph is specified using the state and shape annotation marker (SSAM) cr, followed by appropriate annotation dictionary entries such as a phase name:

{*Al2O3}{crphn=corundum} ..... aluminum oxide in the crystalline corundum phase,

or including the space group notation:

{*Al2O3}{crphn=corundum;spg=-32/m} ..... crystalline aluminum oxide specified by phase name and space group.
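The annotated SFN examples above can be read mechanically. This sketch assumes that the SSAM is always the first two letters of the annotation (as with am and cr here) and that the remaining dictionary entries are key=value pairs separated by semicolons; parse_component is a hypothetical name, and nested annotations are not handled.

```python
import re
from typing import Dict, Tuple

# "{*SFN}{annotation}" with no nested braces.
COMPONENT = re.compile(r"\{\*([^{}]+)\}\{([^{}]*)\}")


def parse_component(notation: str) -> Tuple[str, str, Dict[str, str]]:
    """Split an annotated SFN into (SFN, SSAM, dictionary entries)."""
    m = COMPONENT.fullmatch(notation)
    if m is None:
        raise ValueError(f"not an annotated SFN component: {notation!r}")
    sfn, annotation = m.groups()
    # Assumption: the SSAM is the first two letters of the annotation.
    ssam, rest = annotation[:2], annotation[2:]
    entries = {}
    for item in filter(None, rest.split(";")):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"entry without '=': {item!r}")
        entries[key] = value
    return sfn, ssam, entries


print(parse_component("{*Al2O3}{crphn=corundum;spg=-32/m}"))
# ('Al2O3', 'cr', {'phn': 'corundum', 'spg': '-32/m'})
```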

Plain and Annotated SQC Nodes

The SQC is a special type of the atomic node code (ANC), a SMILES and CurlySMILES format for encoding atomic nodes of hydrogen-suppressed molecular graphs. Because MM! allows the entry of bare SFNs, unbracketed ANC input would be ambiguous; CO, for example, denotes carbon monoxide as an SFN but methanol as a SMILES string. MM! therefore requires atomic nodes to be entered as SQC. SQCs are used to represent chemical elements, isotopically labeled atoms and chemical species, including cations and anions, that contain no more than one non-hydrogen atom; for example:

[C] ..... carbon (as single atom or material)
[OH2] ..... water
[NH4+] ..... ammonium cation
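To make the one-non-hydrogen-atom rule concrete, the following sketch parses a plain SQC into isotope, element symbol, hydrogen count and charge. It assumes SMILES-style bracket syntax (an optional mass number before the symbol, an H count, and a charge written as +, -, +n or -n); parse_sqc and AtomicNode are hypothetical names. Codes with a second heavy atom, such as [CO], are rejected.

```python
import re
from typing import NamedTuple, Optional

# [mass number?][symbol][H count?][charge?] inside square brackets.
SQC = re.compile(r"\[(\d+)?([A-Z][a-z]?)(?:H(\d*))?(?:([+-])(\d*))?\]")


class AtomicNode(NamedTuple):
    isotope: Optional[int]
    symbol: str
    hydrogens: int
    charge: int


def parse_sqc(code: str) -> AtomicNode:
    """Parse an SQC containing exactly one non-hydrogen atom."""
    m = SQC.fullmatch(code)
    if m is None:
        raise ValueError(f"not a single-atom SQC: {code!r}")
    iso, symbol, h, sign, n = m.groups()
    hydrogens = 0 if h is None else int(h or 1)  # a bare "H" means one hydrogen
    charge = 0 if sign is None else (1 if sign == "+" else -1) * int(n or 1)
    return AtomicNode(int(iso) if iso else None, symbol, hydrogens, charge)


for code in ("[C]", "[OH2]", "[NH4+]"):
    print(code, parse_sqc(code))
```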

An SQC notation represents either a single atom or molecule or a material composed thereof, depending on context. CurlySMILES provides a rich annotation grammar to specify contextual details. For example, the carbon SQC can be annotated to specify a structural modification (allotrope) in which carbon is known to exist:

[C]{crall=diamond} ..... crystalline allotrope diamond
[C]{sdall=graphite} ..... solid allotrope graphite
[C]{alall=graphene} ..... atomic layer (al) of graphene
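Because these allotrope annotations share one pattern (an SSAM followed by an all= entry), they can be generated and read back with a few lines of code. The marker table below covers only the three markers in these examples, and allotrope_notation and describe are hypothetical helpers, not part of MM! or any CurlySMILES tooling.

```python
# Hypothetical marker table covering only the three SSAMs used above.
SSAM_NAMES = {"cr": "crystalline", "sd": "solid", "al": "atomic layer"}


def allotrope_notation(element: str, ssam: str, allotrope: str) -> str:
    """Build an annotated SQC such as [C]{crall=diamond}."""
    if ssam not in SSAM_NAMES:
        raise ValueError(f"unknown SSAM {ssam!r}")
    return f"[{element}]{{{ssam}all={allotrope}}}"


def describe(notation: str) -> str:
    """Turn an allotrope notation back into a short English phrase."""
    node, brace, annotation = notation.partition("{")
    if not brace or not annotation.endswith("}"):
        raise ValueError(f"no annotation in {notation!r}")
    annotation = annotation[:-1]
    ssam, entry = annotation[:2], annotation[2:]
    key, _, value = entry.partition("=")
    if key != "all" or ssam not in SSAM_NAMES:
        raise ValueError(f"not an allotrope annotation: {notation!r}")
    return f"{SSAM_NAMES[ssam]} allotrope {value} of {node}"


print(allotrope_notation("C", "cr", "diamond"))  # [C]{crall=diamond}
print(describe("[C]{alall=graphene}"))           # atomic layer allotrope graphene of [C]
```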

Annotations can assign a role to atoms and ions, for example as impurities or dissolved species in a medium:

[Cu]{IMc=[Si]} ..... copper impurity in silicon
[NH4+]{dsc=O} ..... ammonium ions dissolved in water

CurlySMILES provides a format to encode a mononuclear complex by presenting the metal center as SQC and the ligands as a list of SMILES notations within an annotation. For example, cis-dichlorobis(dibenzyl sulfido-κS)platinum(II) can be encoded as:

[Pt+2]{+Lc=[Cl-]{2}.c1ccccc1CS{!I}Cc2ccccc2{2};rcg=cis}
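The ligand list can be unpacked with a bracket-aware split. This sketch assumes that a trailing {n} on a ligand denotes its multiplicity (consistent with "dichloro" and "bis" in the name), that ligands are separated by top-level dots and that annotation entries are separated by top-level semicolons; other inner annotations such as {!I} are left inside the ligand string. parse_complex and split_top_level are hypothetical names.

```python
import re
from typing import Dict, List, Tuple


def split_top_level(text: str, sep: str) -> List[str]:
    """Split on sep, ignoring separators nested inside brackets or braces."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_complex(notation: str) -> Tuple[str, List[Tuple[str, int]], Dict[str, str]]:
    """Return (metal-center SQC, [(ligand, count)], remaining annotation entries)."""
    center, brace, body = notation.partition("{")
    if not brace or not body.endswith("}"):
        raise ValueError(f"no annotation in {notation!r}")
    entries = dict(item.split("=", 1) for item in split_top_level(body[:-1], ";"))
    ligands = []
    for ligand in split_top_level(entries.pop("+Lc"), "."):
        # Assumption: a trailing {n} gives the number of such ligands.
        m = re.fullmatch(r"(.*?)\{(\d+)\}", ligand)
        ligands.append((m.group(1), int(m.group(2))) if m else (ligand, 1))
    return center, ligands, entries


center, ligands, extra = parse_complex(
    "[Pt+2]{+Lc=[Cl-]{2}.c1ccccc1CS{!I}Cc2ccccc2{2};rcg=cis}"
)
print(center, ligands, extra)
```

Tracking nesting depth keeps the split from breaking inside [Cl-] or {!I}; a plain str.split would not be safe once ligands carry their own annotations.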

Note that the current version of MM! does not perform search by complete annotation matching; instead, it accounts for annotation content in the way search results are organized and presented. Sniplinks associated with the structure of the last example are found by typing [Pt+2] as a query and are then listed in the domain of coordination compounds.
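The search behavior described here can be approximated by matching on the SQC in front of the first annotation brace and grouping hits by domain. The record layout, the domain names and the sample entries below are hypothetical; this is an illustration under those assumptions, not MM!'s implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sniplink records: (notation, domain, title).
SNIPLINKS: List[Tuple[str, str, str]] = [
    ("[Pt+2]{+Lc=[Cl-]{2}.c1ccccc1CS{!I}Cc2ccccc2{2};rcg=cis}",
     "coordination compounds",
     "cis-dichlorobis(dibenzyl sulfido-κS)platinum(II)"),
    ("[Pt]", "elements", "platinum metal"),
    ("[C]{crall=diamond}", "allotropes", "diamond"),
]


def core_sqc(notation: str) -> str:
    """Return the SQC in front of the first annotation brace."""
    return notation.split("{", 1)[0]


def search(query: str, records: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Match on the core SQC only, ignoring annotations, and group hits by domain."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for notation, domain, title in records:
        if core_sqc(notation) == query:
            grouped[domain].append(title)
    return dict(grouped)


print(search("[Pt+2]", SNIPLINKS))
```

Annotation content plays no part in the match itself; it only influences how hits are filed, here through the domain field.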