CCL: Summary: Topological indices and active sites

Dear netters,

Last week I posted the question cited below. Thank you for your

contributions and for your interest in a summary. Most of the

messages appear below. (Some were written in German; I have tried to

translate their main points.)

My deep thanks go to all who responded.

Best regards,

Thomas Wieland


Original question:

It has been shown that topological indices are well-suited for describing

similarity/dissimilarity of molecules. If two compounds

have close values of a number of indices, they can be regarded as similar.

The same way correlations with physico-chemical properties are established.

In the beginning, experimental results are known for a number of structures.

By calculating several topological indices for these molecules and performing

some kind of statistical analysis (e.g. regression analysis) a correlation

between the experimental values and the values of the topological indices can be established.


But often the direction of research is rather for _complementarity_

than for _similarity_. That means that an active site of a pharmacophore is given

and a receptor/inhibitor is desired. I am convinced that topological indices

can also be used for this task of "de novo" design, since many decisive effects

like hydrophobicity or charge distribution can already be modelled. However,

I am not sure what exactly this usage would look like. How can the active site

be analyzed to obtain constraints (or limits) for the values of the topological

indices of the descriptor?

I have already put this question to a number of experts, but without substantial

results. A related, but somewhat easier problem is: If I have the topological

structure of a compound, how can I determine if there is any conformation of

this molecule that matches with a given (i.e. given as 3D) active site? What

kind of software should I use to check the binding affinities (and how)?

These problems also have a strong relationship to combinatorial chemistry,

where the quality of a library could be evaluated by these means.
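The similarity idea in the question (compounds whose index values lie close together count as similar) can be sketched as follows. This is only an illustration: the choice of the Wiener and Randic indices, the unscaled Euclidean distance, and all function names are assumptions made for the example, not part of the original posting. In practice the descriptors would be scaled first, since otherwise the index with the largest numerical range dominates.

```python
from collections import deque
from math import sqrt


def wiener_index(adj):
    """Sum of shortest-path lengths (in bonds) over all atom pairs."""
    total = 0
    for start in range(len(adj)):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


def randic_index(adj):
    """Sum of 1/sqrt(deg(u) * deg(v)) over all bonds u-v."""
    return sum(
        1.0 / sqrt(len(adj[u]) * len(adj[v]))
        for u in range(len(adj))
        for v in adj[u]
        if u < v
    )


def index_distance(adj_a, adj_b):
    """Euclidean distance between (Wiener, Randic) descriptor vectors."""
    a = (wiener_index(adj_a), randic_index(adj_a))
    b = (wiener_index(adj_b), randic_index(adj_b))
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hydrogen-suppressed carbon skeletons as adjacency lists.
n_butane = [[1], [0, 2], [1, 3], [2]]
isobutane = [[1], [0, 2, 3], [1], [1]]
n_pentane = [[1], [0, 2], [1, 3], [2, 4], [3]]

print(wiener_index(n_butane), wiener_index(isobutane), wiener_index(n_pentane))
# By these two (unscaled) indices, n-butane is closer to isobutane
# than to n-pentane.
print(index_distance(n_butane, isobutane) < index_distance(n_butane, n_pentane))
```
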

From: "David Clark" <>

In answer to the latter part of your question:

A related, but somewhat easier problem is: If I have the topological

structure of a compound, how can I determine if there is any conformation of

this molecule that matches with a given (i.e. given as 3D) active site? What

kind of software should I use to check the binding affinities (and how)?

I guess that the first part is really a docking problem. So you would need

to generate 3D coordinates for the molecule and then use some (flexible)

docking procedure to dock the molecule into the active site. There are a

number of programs that will carry out such a docking, see for instance:

author = "Jones, G., Willett, P. and Glen, R.C.",

title = "Molecular Recognition of Receptor Sites Using a Genetic Algorithm with a Description of Desolvation",

journal = "Journal of Molecular Biology",

year = "1995",

volume = "245",

pages = "43--53",

author = "Gehlhaar, D.K., Verkhivker, G.M., Rejto, P.A., Sherman, C.J., Fogel, D.B., Fogel, L.J. and Freer, S.T.",

title = "Molecular Recognition of the Inhibitor AG-1343 by HIV-1 Protease: Conformationally Flexible Docking by Evolutionary Programming",

journal = "Chemistry and Biology",

year = "1995",

volume = "2",

pages = "317--324",

Given that a molecule can satisfy the binding site's constraints, it is then

possible to estimate its binding affinity in a number of ways. Some progress

has been made in this area in the last few years. A good review was published

not long ago:

author = "Ajay and Murcko, M.A.",

title = "Computational Methods to Predict Binding Free Energy in Ligand-Receptor Complexes",

journal = "Journal of Medicinal Chemistry",

year = "1995",

volume = "38",

pages = "4953--4967",

From: "Peter Slickers" <>

[Translated from the German:]

Commercial modelling programs such as Insight and especially Sybyl have

modules for performing QSAR, CoMFA or conformational searches. Since I do

not use these methods myself, I do not know them particularly well. In any

case, the pharmaceutical literature holds a great mass of publications on

QSAR and CoMFA. Offhand I could only lay hands on one review by van

Gunsteren, where you could find further references (van Gunsteren WF,

King PM, Mark AE. 1994. Quarterly Reviews of Biophysics, 27: 435-481).

From: "Prof. Curt M. Breneman" <breneman@XRAY.CHEM.RPI.EDU>

You have certainly touched upon an area of research which is under intense

scrutiny at the present time! I also believe, as you do, that it should be

possible to use QSAR/QSPR descriptors of some type to generate good

complementary models of binding sites. Additionally, I agree that this kind

of work will greatly facilitate database searching projects and combinatorial

library evaluation. As a result of this interest, I have been involved with

a fairly large team of researchers from academia, government and industry

which now goes by the name of the "Eastman Kodak Scientific Computing Team"

in honor of our major sponsor. This group of 30 diverse scientists from

all over the country has been working on this problem for about eight years

now, and we're getting close to a satisfactory solution. The key to the

problem has been to bring together mathematicians and statisticians who can

help to generate and evaluate new statistical and heuristic methods for

qualifying dataset models, and the incorporation of new electron-density

based molecular descriptors which work in concert with some of the older

"traditional" 2D and 3D topological indices. The new electronic indices were

a product of our new Transferable Atom Equivalent electron density reconstruction

code, which enables users to generate molecular electron densities and associated

properties with approximately ab initio quality (it is calibrated against the

HF/6-31+G* theoretical model). These electronic properties are then used to

generate a set of novel descriptors (see the November 1996 issue of J. Comp.

Chem. for details when it comes out).

The whole problem seems to have been too large and long-term for any one

company to commit labor and resources, so it took a combination of interested

parties to get things going. We now believe that we are quite close to

our goal of being able to generate active molecules in a de novo design

mode based upon a set of relatively noisy data. We will also be in a position

to qualify our predictions and place error limits on them with a good degree

of certainty. Stay tuned for more. Publication limitations have prevented

much of this information from appearing in the journals until recently.

From: "Amanda Harwood :)" <AirPunky@AOL.COM>

In response to your molecular modeling question, have you tried MacroModel?

From: "Dimitris Agrafiotis" <dimitris@3DP.COM>

A semantic remark. A 'topological' index is a property of the molecular

graph, not the conformation. 'Topography' is what you are interested in.

Which indices did you have in mind?
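The distinction drawn here can be made concrete with a small sketch. It assumes the Wiener number as the topological index and, as its through-space counterpart, the sum of pairwise Euclidean distances; both function names are chosen for this example only, and the coordinates are invented for illustration rather than taken from any optimized geometry.

```python
from collections import deque
from itertools import combinations
from math import dist


def topological_index(adj):
    """Wiener number: sum of through-bond distances over all atom pairs."""
    total = 0
    for start in range(len(adj)):
        seen = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        total += sum(seen.values())
    return total // 2  # each pair was counted from both ends


def topographic_index(coords):
    """Sum of through-space (Euclidean) distances over all atom pairs."""
    return sum(dist(p, q) for p, q in combinations(coords, 2))


# One molecular graph: a four-carbon chain (hydrogen-suppressed).
chain = [[1], [0, 2], [1, 3], [2]]

# Two hypothetical conformations of that chain. The coordinates are made up
# for illustration (arbitrary units), not optimized geometries.
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.5, 1.4, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (2.5, 1.4, 1.4)]

# The graph index needs no coordinates, so it is the same for every conformer.
print(topological_index(chain))
# The coordinate-based index changes with the conformation.
print(topographic_index(extended), topographic_index(folded))
```
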

As for property prediction software, any pattern recognition or regression

software will do the job. ADAPT from Peter Jurs' group is tailored for

chemical applications, but I am not sure it can deal with the problem you

described above.
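As a minimal sketch of the regression step mentioned above, the following fits a straight line relating one index to one property by ordinary least squares. The property values are synthetic, generated from a known line so that the fit can be checked; they are not measurements, and the function name fit_line is chosen for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Wiener numbers of the n-alkanes C4 to C7, from W = n(n^2 - 1)/6.
wiener = [10, 20, 35, 56]

# SYNTHETIC property values generated from a known line (not measurements),
# so that the fit can be checked against the generating parameters.
prop = [2.0 * w + 1.0 for w in wiener]

slope, intercept = fit_line(wiener, prop)

# Use the fitted line to predict the property for n-octane (W = 84).
predicted_c8 = slope * 84 + intercept
print(slope, intercept, predicted_c8)
```

With real data the same fit would of course leave residuals, and several indices would normally enter a multiple regression rather than a single one.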

You may contact Prof. Peter C. Jurs at PCJ@PSUVM.PSU.EDU. He's the

author (actually, the author's advisor) of ADAPT. Hope that helps.

... it's clear that knowing the graph

you know everything because, in theory, every molecular property can be

computed from it. The issue is one of encoding. The 3D conformation of

a molecule is a property that can be derived from the graph, but it's

a far more powerful encoding of it for certain applications. This is

one of the most difficult problems in pattern recognition; to enhance

the signal-to-noise ratio and information content of the data

representation. If your signal is diluted, the decision boundaries

become fuzzy, and your classifier's performance is severely compromised.

From: "Hugo Kubinyi" <>

[Summary of the German message:]

Although they model similarity to some extent, topological indices

are not unique. Sometimes a change in the chain length does not affect the

biological effect at all, sometimes it affects it enormously.
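The non-uniqueness can be checked on a small case. The sketch below assumes the Wiener number as the index (the message does not name one) and compares the carbon skeletons of two heptane isomers, 2,4-dimethylpentane and 3-ethylpentane; the helper names are chosen for this example only.

```python
from collections import deque


def adjacency(n_atoms, bonds):
    """Build an adjacency list from a list of (u, v) bonds."""
    adj = [[] for _ in range(n_atoms)]
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def wiener_index(adj):
    """Sum of shortest-path lengths (in bonds) over all atom pairs."""
    total = 0
    for start in range(len(adj)):
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


# Carbon skeletons of two heptane isomers (atoms numbered from 0).
dimethylpentane_24 = adjacency(7, [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (3, 6)])
ethylpentane_3 = adjacency(7, [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)])

# The degree sequences differ, so the two graphs are not isomorphic ...
print(sorted(len(nb) for nb in dimethylpentane_24))  # [1, 1, 1, 1, 2, 3, 3]
print(sorted(len(nb) for nb in ethylpentane_3))      # [1, 1, 1, 2, 2, 2, 3]
# ... yet the Wiener number is the same for both.
print(wiener_index(dimethylpentane_24), wiener_index(ethylpentane_3))  # 48 48
```
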

Worse still, electronic and steric properties can be represented only

incompletely. Many tricks have been tried, but most of the time the

biological data are coarsely forced into a regression equation, losing

many degrees of freedom. This criticism is laid out in [1].

It is extremely difficult to draw conclusions about the 3D structure of the

binding site from a collection of active and inactive analogs. For more

details on de novo design and reviews see [2].

Estimating the binding affinity from the structure of a

protein-ligand complex is a largely unsolved problem that several

research projects are working on.

In summary, one can say that the problem is highly complicated. While the

mathematics is manageable, much basic physico-chemical knowledge about the

interior of a protein in solution is missing. For an introduction see the

forthcoming book [3].

[1] Hugo Kubinyi, QSAR: Hansch Analysis and Related Approaches, VCH,

Weinheim, 1993

[2] Hugo Kubinyi, ed., 3D QSAR in Drug Design. Theory, Methods and

Applications, ESCOM, Leiden, 1993

[3] Hans-Joachim Boehm, Gerhard Klebe and Hugo Kubinyi, Wirkstoffdesign.

Der Weg zum Arzneimittel, Spektrum Akademischer Verlag,

Heidelberg, 1996

From: "Dan Pernich" <>

Topological indices have been shown to be good descriptors for molecular

similarity in some applications. I believe the best use of this technique is

for analyzing diversity and library building, rather than for de novo design

and bioisosterism. This is due to the nature of the indices. In diversity

analysis, one is interested in an "average" similarity or dissimilarity to a

group of other compounds, with many (billions?) of chemical

features/shapes/properties/behaviors/conformation/etc being taken into account

in doing the comparison. This is exactly what the indices do: they sum up hundreds

and thousands of little bits of information to make each index, each of which

contains information not about fragments or little locations on the molecule,

but rather on the molecule as a whole. In inhibitor design, in contrast, one

is interested in specific interactions at exact locations irrespective of what

else the molecule is doing (in simple cases) and I think indices have a harder

time describing molecules in this sense. In other words, two molecules can be

very dissimilar, and both be excellent inhibitors of a protein, if certain key

interactions can occur. In contrast, two somewhat similar molecules can have

greatly different inhibitor properties if one or two of the key interactions

are missing.

Correlations have been found between indices and inhibition. In most cases I

believe these to be due to a correlation between the indices and the important

interactions between inhibitor and protein that exist in the analog series

studied, rather than being due to a fundamental relationship between inhibition

and the indices themselves. This lack of fundamental cause makes extrapolation

to other series even more risky than usual, and I believe will make it very

difficult to do de novo design using indices.

From: "Dr Susan M Boyd" <>

For this question, could I suggest that some modules of our Cerius2 DDW software may

be applicable to such a problem (to some extent). There is a section of

the software which will calculate a Receptor Surface Model over a series of

active analogues (see Hahn and Rogers, J Med Chem, 1995, vol 38, pp 2091-2102).

It is thus possible to take a series of compounds of unknown activity and


place these within the receptor surface model. Certain energy parameters will be

output for each structure. For example, the strain energy placed upon the molecule

in order to constrain it within the surface, and a theoretical binding energy.

For very conformationally flexible molecules, it would be possible to generate

a series of conformers prior to the calculation, using the Confirm algorithm (same as

in Catalyst - see Smellie, Teig & Towbin, J Comput Chem, 1995, vol 16, no 2,

pp 171-187).

There is no reason why this method could not be adapted to use a 'real' molecular

surface instead of the theoretical receptor surface model. If people would be

interested in such methods, I would be very pleased to hear from them.

From: "Weifan Zheng" <weifan@GIBBS.OIT.UNC.EDU>

If you are interested in Molecular Similarity, you could use either 2D

indices (graph theoretical) or 3D (topographical) indices. As for

property prediction, any QSAR method (Multiple Linear Regression, PLS,

BP-ANN, CP-ANN etc.) will do. The following references might be of

interest to you.

Frau, J. (Journal of computer-aided molecular 04/01/96)

On the electrostatic and steric similarity of lactam...

Leherte, L. (Journal of computer-aided molecular 02/01/96)

Similarity and complementarity of molecular shapes: A...

Measures, P.T. (Journal of computer-aided molecular 08/01/95)

Applications of momentum-space similarity.

Blaney, F.E. (Journal of molecular graphics. 06/01/95)

Molecular surface comparison. 2. Similarity of elect...

Willett, Peter (Journal of chemical information and 03/01/96)

Similarity Searching in Files of Three-Dimensional C...

Kearsley, Simon K. (Journal of chemical information and 01/01/96)

Chemical Similarity Using Physiochemical Property De...

Sheridan, Robert P. (Journal of chemical information and 01/01/96)

Chemical Similarity Using Geometric Atom Pair Descrip...

Hall, Lowell H. (Journal of chemical information and 11/01/95)

Molecular Similarity Based on Novel Atom-Type Electr...

Basak, Subhash C. (Journal of chemical information and 05/01/95)

Molecular Similarity and Estimation of Molecular Pro...

Mezey, Paul G. (Journal of chemical information a... 05/01/95)

Shape Group Analysis of Molecular Similarity: Shape

Downs, Geoffrey M. (Journal of chemical information a... 09/01/94)

Similarity Searching and Clustering of Chemical-Stru...

Judson, Philip N. (Journal of chemical information and 07/01/94)

Structural Similarity Searching Using Descriptors De...

Mezey, Paul G. (Journal of chemical information and 03/01/94)

Iterated Similarity Sequences and Shape ID Numbers f...

Rouvray, D.H. (Journal of chemical information a... 03/01/94)

Similarity Studies. 1. The Necessity for Analogies i...

Basak, Subhash C. (Journal of chemical information and 03/01/94)

Application of Graph Theoretical Parameters in Quant...

Fisanick, William (Journal of chemical information and 01/01/94)

Similarity Searching on CAS Registry Substances.2.2D...

Bath, Peter A. (Journal of chemical information and 01/01/94)

Similarity Searching in Files of Three-Dimensional C...

Wild, David J. (Journal of chemical information and 01/01/94)

Similarity Searching in Files of Three-Dimensional C...

Drefahl, Axel (Journal of chemical information and 11/01/93)

Similarity-Based Search and Evaluation of Environmet...

Ponec, Robert (Journal of chemical information and 11/01/93)

Similarity Approach to Chemical Reactivity. A Simple...

Bradley, Mary (Journal of chemical information and 09/01/93)

Deducing Molecular Similarity Using Binding Sites.

Fisanick, William (Journal of chemical information and 07/01/93)

Experimental System for Similarity and 3D Searching

Good, A.C. (Journal of chemical information and 01/01/93)

Rapid Evaluation of Shape Similarity Using Gaussian

Judson, Philip N. (Journal of chemical information and 11/01/92)

Structural Similarity Searching Using Descriptors De...

Mezey, Paul G. (Journal of chemical information a... 11/01/92)

Shape-Similarity Measures for Molecular Bodies: A Th...

Allan, Neil L. (Journal of chemical information and 11/01/92)

A Momentum-Space Approach to Molecular Similarity.

Barnard, J.M. (Journal of chemical information a... 11/01/92)

Clustering of Chemical Structures on the Basis of Tw...

Takahashi, Yoshimasa (Journal of chemical information and 11/01/92)

Automatic Identification of Molecular Similarity Usi...

Rouvray, D.H. (Journal of chemical information and 11/01/92)

Definition and Role of Similarity Concepts in the Ch...

Hicks, Martin G. (Journal of chemical information and 11/01/92)

Similarity and the Beilstein Information System: Sea...

Artymiuk, Peter J. (Journal of chemical information and 11/01/92)

Similarity Searching in Databases of Three-Dimension...

Randic, Milan (Journal of chemical information and 11/01/92)

Similarity Based on Extended Basis Descriptors.

Perry, Nicholas C. (Journal of chemical information and 11/01/92)

Database Searching on the Basis of Three-Dimensional...

Fisanick, William (Journal of chemical information and 11/01/92)

Similarity Searching on CAS Registry Substances. 1.

Hagadone, T. R. (Journal of chemical information a... 09/01/92)

Molecular Substructure Similarity Searching: Efficie...

Good, A.C. (Journal of chemical information and 05/01/92)

Utilization of Gaussian Functions for the Rapid Eval...