Department of Mathematics, University of Bayreuth, D95440 Bayreuth, Germany
Email: wieland@btm2d1.mat.unibayreuth.de
In the automatization of molecular structure elucidation, generators for constitutional and configurational isomers are indispensable tools. Sophisticated computer programs of this kind are based on a mathematical theory of the construction of discrete structures. First, we will describe the basic principles for the generation of constitutional isomers, starting with the notions of group actions and ending up with recent improvements with respect to hybridization constraints. The second part is devoted to the generation of all configurational isomers to a given structural isomer. The underlying mathematical concepts as well as the techniques for computing spatial realizations are discussed. The outlined theory is implemented in the program system MOLGEN, which provides a fast and efficient tool for structure elucidation, running under DOS, MSWindows, OS/2 and SUNSolaris.
A common maxim of all approaches is that to be useful in everyday's work in a laboratory, a structure generator must fulfil at least two requirements: a) It must be exhaustive, i.e. it must generate all isomers corresponding to the given data. b) It must be irredundant, i.e. no isomers should occur twice. In order to meet these requirements, sophisticated algorithms based on mathematical theory must be applied. The efficiency of the programs depends on how the problem of redundancy is treated; algorithms using algebraic techniques generally excel over pure combinatorial methods.
For the algorithmic treatment, chemical compounds must be modeled on a certain level of abstraction. The following levels are commonly used:
Figure 2. Configurational isomers of tartaric acid.
Let G denote a finite group and X a finite set. An action of G on X is described by a mapping
The action defines substructures on G and X:
A labeled mgraph with p vertices is an element f from X to {0,1,...,m} with X:=p^[2]:= {{i,k} 1 lower or equal i<k lower or equal p}, where f({i,k})=j means that the pair {i,k} of vertices connected by an edge of degree j; if j=0, the vertices are not connected. To derive all (unlabeled) multigraphs we have to compute the orbits of the action of the symmetric group Sp on X to {0,1,...,m}. Early algorithms tried to determine the orbits by comparing each potential representative to all previously found. The expenditure on such computations, however, is enormous. A better way is to calculate canonical representatives only, e.g. representatives that are maximal within their orbits.
By writing f in X to {0,1,...,m} as a vector f=/f(x1),f(x2),.../ we can formulate the basic algorithm as follows:
The data contains two elements:
This approach bears several advantages:
Restriction  Number of isomers 

  2,558,517 
max. double bonds  2,434,123 
no OO  360,594 
Hdistribution (2,1,1,0,0,0,1,1,1,1,0,0) 
35,058 
Multiplicities (Csp3,Csp3,Csp3,Csp2,Csp2,Csp2, Osp3,Osp3,Osp3,Osp3,Osp3,Osp2) 
990 
(The current version of the user interface in MOLGEN (vide infra) does not allow the input of hydridizations. This is to be incorporated together with further developments of the generator [23].)
With this algorithm the total variety of structures corresponding to the given data can be computed  regardless as to whether they can be synthesized or even exist in reality. As opinions of what is stable and what is not are subject to changes over the years, the check on plausibility is left to the user. We include, however, a list of rather unlikely substructures (like a 4ring with an sphybridized carbon or a 3ring with three sp2hybridized carbons) to cull isomers thought to be unrealistic. Since this list is composed from data files, it can freely be altered according to the individual needs.
If we take dioxin C12O2H4Cl4, for example, with the skeleton
A number of approaches [2528] have been reported, which are all more or less effective. What we want to discuss here, are the mathematical foundations.
The generation of configurational isomers after the algorithm by Nourse et al. [25] is performed in three steps:
The automorphism group is constructed in our algorithm in two steps: First the vertices of the graph are partitioned such that there cannot be automorphisms that map vertices from different partitions onto one another. Next all possible permutations within the partitions are checked as to whether they are also automorphisms of the graph. The result is stored as a SIMS chain [29].
To be able to write the ligands of a center as (ordered) tuples, the
environment of the center is considered as locally placed in space. Although
this is easily incorporated in the algorithm via the initial numbering of the
vertices, it is a useful method for interpreting changes of the order of the
ligands as changes of the configuration (depending on the sign).
As an example we consider the following structure:
Table 2. The CSG of the structure III, where e:=(0)(1) and i:=(01).
(e,e,e,e,e,e,e,e);  1 
(e,e,i,e,e,i,e,e);  1 
(e,i,e,e,e,e,e,i);  1 
(e,i,i,e,e,i,e,i);  1 
(i,e,e,i,e,e,e,e);  (12)(57) 
(i,i,e,i,e,e,e,i);  (12)(57) 
(i,e,i,i,e,i,e,e);  (12)(57) 
(i,i,i,i,e,i,e,i);  (12)(57) 
(e,i,i,e,e,e,e,e);  (03)(46) 
(e,i,e,e,e,i,e,e);  (03)(46) 
(e,e,i,e,e,e,e,i);  (03)(46) 
(e,e,e,e,e,i,e,i);  (03)(46) 
(i,i,i,i,e,e,e,e);  (03)(12)(46)(57) 
(i,e,i,i,e,e,e,i);  (03)(12)(46)(57) 
(i,i,e,i,e,i,e,e);  (03)(12)(46)(57) 
(i,e,e,i,e,i,e,i);  (03)(12)(46)(57) 
So the permutations of the automorphism group must be extended by mappings which represent the ability of a ligand permutation to change the configuration of the center. These mappings will be called inversions. (In Nourse's notation [30] these are called superscripts.) They are determined as follows:
(0, 0, 0, 0, 0, 0, 0, 0)  (0, 1, 0, 0, 0, 0, 0, 1)  
(0, 0, 0, 0, 0, 0, 0, 1)  (0, 1, 0, 0, 0, 0, 1, 0)  
(0, 0, 0, 1, 0, 0, 0, 0)  (0, 1, 0, 1, 0, 0, 0, 0)  
(0, 0, 0, 1, 0, 0, 0, 1)  (0, 1, 0, 1, 0, 0, 0, 1)  
(0, 0, 0, 1, 0, 0, 1, 0)  (0, 1, 0, 1, 0, 0, 1, 0)  
(0, 0, 0, 1, 0, 0, 1, 1)  (0, 1, 0, 1, 0, 0, 1, 1)  
(0, 0, 1, 1, 0, 0, 0, 0)  (0, 1, 1, 1, 0, 0, 0, 0)  
(0, 0, 1, 1, 0, 0, 0, 1)  (0, 1, 1, 1, 0, 0, 0, 1)  
(0, 1, 0, 0, 0, 0, 0, 0)  (0, 1, 1, 1, 0, 0, 1, 0) 
So we can define the configuration symmetry group (CSG) as C(g,b) := { (E;f)  f in A(g,b)}, and it proves [28] to be a subgroup of the wreath product [14,15] of the S2 over {1,...,s} with A(g,b), if there are s stereocenters in the molecule.
The CSG of the cyclobutanderivative III is listed in table 2. The stereocenters are numbered.
After the construction of C(g,b), we are able to verify the potential stereocenters by the following rule: if there is a permutation in the automorphism group whose restriction to the ligands of a potential stereocenter is odd, this atom is a real stereocenter only if the nonfixed ligands themselves contain at least one stereocenter. The direction of the search should be from the outer to the inner regions of the molecule in order to reduce the expenditure and to avoid overlooking a center.
For the actual generation of the stereoisomers, we remark that with s stereocenters we get all essentially different configurational isomers by computing a transversal of C(,)\\{1,...,s} to {0,1}. This transversal can be computed with (an enhanced version of) the basic algorithm from section 2.
A list with the stereoisomers of III as (0,1)sequences is shown in table 3. As any structural isomer has one or more configurational isomers, the numbers of possible stereoisomers of a set of constitutional isomers can be computed, as shown in table 4.
It is also interesting to compare the sizes of both sets of isomers with each other. In fig. 3 the ratios of stereoisomers per structural isomer are printed. While the numbers of constitutional isomers of CnHm have peaks at 4 to 8 hydrogens, the ratio decreases monotonously as the number of hydrogens increases.
As discussed above, this algorithm also generates all isomers without regard to their realizability. The evaluation of the results is left to the user.
The generation algorithm in the form discussed above can only handle stereocenters of valence 3 or 4. It is, however, possible to extend it to higher valences [31].
CiHj  2  4  6  8  10  12  14  16  18  20  22  

3 




              
4 





            
5 






          
6 







        
7 








      
8 









    
9 










  
10 











Figure 3. Ratios of stereoisomers per structural isomer
This can be done by automatic model builders [3234] or by geometric construction starting from reference coordinates, e.g. obtained by a molecular mechanics calculation [35]. We followed the second way, basing our method on earlier approaches [27, 36].
The first step is to identify the configuration in the reference placement. This is performed by geometric analysis: Let a stereocenter (position X) have its neighbours in the positions X1, X2, X3 and X4. Calculate n := X1X2 x X1X3. If n.(X1X) <0, return 1; else return 0. The double bonds are examined analogously.
Next the ligands at the centers are reflected. In the recursive procedure four cases are considered:
IV
Its four stereoisomers are shown in fig. 4. Another example, now for case 3, is provided by the spirocompound V. There are 8 configurational isomers of this molecule; two of them are drawn in fig. 5
V
Figure 4. Stereoisomers of structure IV
Figure 5. Configurations (0,0,0,0) and (0,1,0,1) of the eight stereoisomers of structure V
As the users of such software differ considerably, MOLGEN is available [37] for DOS, Microsoft Windows, IBM OS/2 and Sun SOLARIS/Motif in different versions. Demos are accessible from the WorldWide Web document http://btm2xd.mat.unibayreuth.de/molgen.
A prominent application of structure generating programs are automated structure elucidation systems. At present MOLGEN is incorporated in SpecInfo [38]. There a given spectrum is interpreted via a large spectra database, yielding a set of substructures. MOLGEN generates candidates based thereon, which can afterwards be verified.
But a structure generator can also be useful in chemical education [19,20], e.g. to visualize the variety of isomerism, in pharmacology [39], and various other fields.
We are grateful to the German Federal Ministry of Education, Science, Research and Technology (BMBF) for financial support.