Thursday, 24 October 2013

SMILES!!!

It is not just an expression. Actually, almost none of us knew that this term could have another meaning.

SO, WHAT IS SMILES??

Simplified Molecular-Input Line-Entry System 




ITS FUNCTION??

ITS FOUNDER??

 David Weininger at the USEPA Mid-Continent Ecology Division Laboratory in Duluth (1980).


ASSISTED BY??
  • Gilman Veith and Rose Russo (USEPA) and Albert Leo and Corwin Hansch (Pomona College)
  •  Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system

PROJECT'S FUNDER??



MODIFIED BY??


Here are some examples of SMILES notation:



SMILES        NAME
CC            ethane
O=C=O         carbon dioxide
C#N           hydrogen cyanide
CCN(CC)CC     triethylamine
CC(=O)O       acetic acid
C1CCCCC1      cyclohexane
c1ccccc1      benzene

SMILES Specification Rules

SMILES notation consists of a series of characters containing no spaces. Hydrogen atoms may be omitted (hydrogen-suppressed graphs) or included (hydrogen-complete graphs). Aromatic structures may be specified directly or in Kekulé form.
There are five generic SMILES encoding rules, corresponding to specification of atoms, bonds, branches, ring closures, and disconnections.
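To make the five rules a little more concrete, here is a minimal sketch of a tokenizer that labels each piece of a SMILES string with the rule it belongs to. The function name `tokenize` and the group names are my own choices for illustration, and the pattern only covers the simple cases described in this post (no stereochemistry or isotopes).

```python
import re

# One alternative per encoding rule: atoms, bonds, branches,
# ring closures, and disconnections.
TOKEN = re.compile(
    r"(?P<atom>\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops])"
    r"|(?P<bond>[-=#:])"
    r"|(?P<branch>[()])"
    r"|(?P<ring>%\d{2}|\d)"
    r"|(?P<dot>\.)"
)


def tokenize(smiles):
    """Split a SMILES string into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN.match(smiles, pos)
        if match is None:
            # Spaces and unknown characters are rejected.
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("CC(=O)O"))
```

Because SMILES contains no spaces, a string such as "C C" raises a ValueError in this sketch.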

3.2.1 Atoms

Atoms are represented by their atomic symbols: this is the only required use of letters in SMILES. Each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ]. The second letter of two-character symbols must be entered in lower case. Elements in the "organic subset" B, C, N, O, P, S, F, Cl, Br, and I may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds. "Lowest normal valences" are B (3), C (4), N (3,5), O (2), P (3,5), S (2,4,6), and 1 for the halogens. Atoms in aromatic rings are specified by lower case letters, e.g., aliphatic carbon is represented by the capital letter C, aromatic carbon by lower case c. Since attached hydrogens are implied in the absence of brackets, the following atomic symbols are valid SMILES notations.

C     methane              (CH4)
P     phosphine            (PH3)
N     ammonia              (NH3)
S     hydrogen sulfide     (H2S)
O     water                (H2O)
Cl    hydrochloric acid    (HCl)
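The implied hydrogen count can be worked out from the "lowest normal valence" rule above. The following is only a sketch under that rule: the table `NORMAL_VALENCES` copies the valences listed in this post, while the function name `implicit_hydrogens` and its parameter are hypothetical names chosen for illustration.

```python
# Normal valences of the "organic subset", as listed in the text.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, explicit_bond_order_sum=0):
    """Hydrogens implied for an unbracketed atom.

    Picks the lowest normal valence that is at least as large as the
    sum of the explicit bond orders, then fills the rest with hydrogens.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= explicit_bond_order_sum:
            return valence - explicit_bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


for symbol in ("C", "P", "N", "S", "O", "Cl"):
    print(symbol, implicit_hydrogens(symbol))
```

With no explicit bonds this gives 4, 3, 3, 2, 2 and 1 hydrogens, matching the six examples in the list above.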
Atoms with valences other than "normal" and elements not in the "organic subset" must be described in brackets.

[S]     elemental sulfur
[Au]    elemental gold
Within brackets, any attached hydrogens and formal charges must always be specified. The number of attached hydrogens is shown by the symbol H followed by an optional digit. Similarly, a formal charge is shown by one of the symbols + or -, followed by an optional digit. If unspecified, the number of attached hydrogens and charge are assumed to be zero for an atom inside brackets. Constructions of the form [Fe+++] are synonymous with the form [Fe+3]. Examples are:

[H+]      proton
[Fe+2]    iron (II) cation
[OH-]     hydroxyl anion
[Fe++]    iron (II) cation
[OH3+]    hydronium cation
[NH4+]    ammonium cation
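The bracket rules can be checked with a small parser. This is a rough sketch that only understands the form described above (element symbol, optional H count, optional charge); the names `BRACKET_ATOM` and `parse_bracket_atom` are made up for this example, and isotopes, chirality and aromatic symbols are deliberately left out.

```python
import re

# [symbol][H count][charge], e.g. [NH4+], [Fe+3], [Fe+++]
BRACKET_ATOM = re.compile(
    r"^\[(?P<symbol>[A-Z][a-z]?)"
    r"(?P<h>H\d*)?"
    r"(?P<charge>\+\d+|-\d+|\++|-+)?\]$"
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, formal_charge) for one bracket atom."""
    match = BRACKET_ATOM.match(text)
    if match is None:
        raise ValueError(f"not a simple bracket atom: {text!r}")

    h = match.group("h")
    if h is None:
        hydrogens = 0              # unspecified means zero inside brackets
    elif h == "H":
        hydrogens = 1              # the digit is optional
    else:
        hydrogens = int(h[1:])

    charge_text = match.group("charge")
    if charge_text is None:
        charge = 0
    else:
        sign = 1 if charge_text[0] == "+" else -1
        digits = charge_text[1:]
        # "+3" carries a digit; "+++" is counted by its length.
        magnitude = int(digits) if digits.isdigit() else len(charge_text)
        charge = sign * magnitude

    return match.group("symbol"), hydrogens, charge


print(parse_bracket_atom("[Fe+++]") == parse_bracket_atom("[Fe+3]"))
```

The last line prints True, which is the "[Fe+++] is synonymous with [Fe+3]" statement from the text.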

3.2.2 Bonds

Single, double, triple, and aromatic bonds are represented by the symbols -, =, #, and :, respectively. Adjacent atoms are assumed to be connected to each other by a single or aromatic bond (single and aromatic bonds may always be omitted). Examples are:

CC        ethane                (CH3CH3)
C=O       formaldehyde          (CH2O)
C=C       ethene                (CH2=CH2)
O=C=O     carbon dioxide        (CO2)
COC       dimethyl ether        (CH3OCH3)
C#N       hydrogen cyanide      (HCN)
CCO       ethanol               (CH3CH2OH)
[H][H]    molecular hydrogen    (H2)
For linear structures, SMILES notation corresponds to conventional diagrammatic notation except that hydrogens and single bonds are generally omitted. For example, 6-hydroxy-1,4-hexadiene can be represented by many equally valid SMILES, including the following three:

Structure: CH2=CH-CH2-CH=CH-CH2-OH

Valid SMILES:
C=CCC=CCO
C=C-C-C=C-C-O
OCC=CCC=C
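The three notations above differ only in explicit single bonds and in reading direction. Here is a small sketch showing that; `drop_explicit_single_bonds` is a hypothetical helper, it is only meant for non-aromatic strings, and reversing the text is a shortcut that works here only because the chain is unbranched, has no rings, and every atom symbol is a single letter.

```python
def drop_explicit_single_bonds(smiles):
    """Remove '-' bond symbols, leaving '-' charges inside brackets alone."""
    out = []
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        if ch == "-" and not in_bracket:
            continue  # an omitted bond between adjacent atoms is read as single
        out.append(ch)
    return "".join(out)


short_form = drop_explicit_single_bonds("C=C-C-C=C-C-O")
print(short_form)        # C=CCC=CCO
print(short_form[::-1])  # OCC=CCC=C
```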

3.2.3 Branches

Branches are specified by enclosing them in parentheses, and can be nested or stacked. In all cases, the implicit connection to a parenthesized expression (a "branch") is to the left. Examples are:

CCN(CC)CC               Triethylamine
CC(C)C(=O)O             Isobutyric acid
C=CC(CCC)C(C(C)C)CCC    3-propyl-4-isopropyl-1-heptene
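The "connection is to the left" rule is easy to follow with a stack: an opening parenthesis remembers the current atom, and a closing parenthesis goes back to it. The sketch below assumes only unbracketed atoms, the bond symbols -, = and #, and parentheses; `parse_branched` and `BOND_ORDERS` are names invented for this example.

```python
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_branched(smiles):
    """Return (atoms, bonds) where bonds are (from_index, to_index, order)."""
    atoms, bonds = [], []
    stack = []       # atoms that open branches are remembered here
    prev = None      # the atom the next atom will attach to
    order = 1        # bond order for the next bond (single by default)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()   # return to the atom left of the branch
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        else:
            symbol = smiles[i:i + 2] if smiles[i:i + 2] in ("Cl", "Br") else ch
            atoms.append(symbol)
            index = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, index, order))
            prev = index
            order = 1
            i += len(symbol) - 1
        i += 1
    return atoms, bonds


atoms, bonds = parse_branched("CCN(CC)CC")
nitrogen = atoms.index("N")
print(sum(1 for a, b, _ in bonds if nitrogen in (a, b)))  # 3
```

The nitrogen of triethylamine ends up with three neighbours, one for each ethyl group.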

3.2.4 Cyclic Structures

Cyclic structures are represented by breaking one bond in each ring. The bonds are numbered in any order, designating ring opening (or ring closure) bonds by a digit immediately following the atomic symbol at each ring closure. This leaves a connected non-cyclic graph which is written as a non-cyclic structure using the three rules described above. Cyclohexane is a typical example: C1CCCCC1.

There are usually many different, but equally valid descriptions of the same structure, e.g., the following SMILES notations for 1-methyl-3-bromo-cyclohexene-1:

Many other notations may be written for the same structure, deriving from different ring closures. SMILES does not have a preferred entry on input; although (a) above may be simplest, others are just as valid.
A single atom may have more than one ring closure. This is illustrated by the structure of cubane in which two atoms have more than one ring closure:

Generation of SMILES for cubane: C12C3C4C1C5C4C3C25.
If desired, digits denoting ring closures can be reused. As an example, the digit 1 is used twice in the following specification:

O1CCCCC1N1CCCCC1
The ability to reuse ring closure digits makes it possible to specify structures with 10 or more rings. Structures that require more than 10 ring closures to be open at once are exceedingly rare. If necessary or desired, higher-numbered ring closures may be specified by prefacing a two-digit number with a percent sign (%). For example, C2%13%24 is a carbon atom with ring closures 2, 13, and 24.
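Ring closure digits can be handled with a small lookup table: the first time a label appears it is opened on the current atom, and the second time it is closed with a bond back to that atom, after which the label is free for reuse. This is just a sketch assuming single-letter atoms and no branches or bond symbols; `parse_rings` is a hypothetical name.

```python
def parse_rings(smiles):
    """Return (atoms, bonds, open_rings) for a SMILES with ring closures."""
    atoms, bonds = [], []
    open_rings = {}   # ring label -> index of the atom that opened it
    prev = None
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch.isalpha():
            atoms.append(ch)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1))  # chain bond
            prev = len(atoms) - 1
            i += 1
            continue
        if ch == "%":
            label = int(smiles[i + 1:i + 3])  # two-digit label after %
            i += 3
        elif ch.isdigit():
            label = int(ch)
            i += 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
        if label in open_rings:
            # Second occurrence: close the ring and free the label.
            bonds.append((open_rings.pop(label), prev))
        else:
            open_rings[label] = prev
    return atoms, bonds, open_rings


atoms, bonds, _ = parse_rings("C12C3C4C1C5C4C3C25")
print(len(atoms), len(bonds))  # 8 12
```

For cubane this gives 8 atoms and 12 bonds, with every carbon bonded to three others. For the single atom C2%13%24 the third return value lists the labels 2, 13 and 24 as still open.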

3.2.5 Disconnected Structures

Disconnected compounds are written as individual structures separated by a "." (period). The order in which ions or ligands are listed is arbitrary. There is no implied pairing of one charge with another, nor is it necessary to have a net zero charge. If desired, the SMILES of one ion may be imbedded within another as shown in the example of sodium phenoxide.

Matching pairs of digits following atom specifications imply that the atoms are bonded to each other. The bond may be explicit (bond symbol and/or direction preceding the ring closure digit) or implicit (a nondirectional single or aromatic bond). This is true whether or not the bond ends up as part of a ring.

Adjacent atoms separated by a dot (.) are not bonded to each other. This is true whether or not the atoms are in the same connected component.
For example, C1.C1 specifies the same molecule as CC (ethane).
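This also means that simply splitting a SMILES string at each dot does not always give the separate molecules. The sketch below counts connected components with a small union-find instead, assuming single-letter atoms, single-digit ring closures and no branches; `count_components` is a name chosen for illustration.

```python
def count_components(smiles):
    """Count the connected components described by a simple SMILES string."""
    parent = []   # union-find forest, one entry per atom

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    open_rings = {}
    prev = None
    for ch in smiles:
        if ch == ".":
            prev = None              # the next atom is NOT bonded to the last one
        elif ch.isalpha():
            parent.append(len(parent))
            current = len(parent) - 1
            if prev is not None:
                union(prev, current)
            prev = current
        elif ch.isdigit():
            if ch in open_rings:
                union(open_rings.pop(ch), prev)   # matching digits imply a bond
            else:
                open_rings[ch] = prev
        else:
            raise ValueError(f"unsupported character {ch!r}")
    return len({find(i) for i in range(len(parent))})


print(count_components("C.C"))    # 2
print(count_components("C1.C1"))  # 1
```

C.C is two separate methane molecules, while C1.C1 is a single component because the matching digits bond the two carbons together.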

If SMILES has managed to catch your attention, do click this to know more.
