Krisztina Boda, A. Peter Johnson
School of Chemistry
University of Leeds
Leeds, LS2 9JT
United Kingdom
Presentation held at:
2nd Joint Sheffield Conference on Chemoinformatics
April 2001, Sheffield, United Kingdom
ABSTRACT
One deficiency of de novo molecular structure design programs is that, after a structure generation process that may be very demanding of computer resources, many of the solutions produced may not be synthetically accessible. The CAESA program attempts to overcome this deficiency by post-generation scoring and ranking according to an estimate of synthetic accessibility, but this approach is inefficient: large numbers of structures are generated and then pruned in a computationally demanding process.
The approach used in SynSPROUT, a new variant of SPROUT, is to build synthetic constraints into the structure generation process itself. It starts with a library of readily available starting materials, which are used both in the initial docking process and in a build-up process that permits only joins corresponding exactly to a chemical reaction defined in a user-created knowledge base.
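The reaction-constrained build-up idea can be illustrated with a minimal sketch. This is not SynSPROUT's implementation: the `Fragment` and `Reaction` types, the functional-group labels, the two example rules and the four-compound library are all hypothetical, and docking and 3D geometry are ignored entirely. The sketch only shows how a join is accepted when, and only when, a rule in the knowledge base matches the functional groups on both partners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical knowledge-base rule: joins group_a to group_b."""
    name: str
    group_a: str
    group_b: str


@dataclass(frozen=True)
class Fragment:
    """Hypothetical starting material, reduced to its functional groups."""
    name: str
    groups: frozenset


# Illustrative user-created knowledge base (assumed, not from the source).
KNOWLEDGE_BASE = [
    Reaction("amide formation", "carboxylic_acid", "amine"),
    Reaction("ester formation", "carboxylic_acid", "alcohol"),
]

# Illustrative library of starting materials (assumed, not from the source).
LIBRARY = [
    Fragment("benzoic acid", frozenset({"carboxylic_acid"})),
    Fragment("aniline", frozenset({"amine"})),
    Fragment("ethanol", frozenset({"alcohol"})),
    Fragment("toluene", frozenset()),
]


def allowed_joins(left, right, rules):
    """Return the rules whose two groups are present, one on each partner."""
    return [
        r for r in rules
        if (r.group_a in left.groups and r.group_b in right.groups)
        or (r.group_b in left.groups and r.group_a in right.groups)
    ]


def grow(skeleton, library, rules):
    """One build-up step: list every (skeleton, reaction, fragment) join
    that is backed by a rule; joins without a matching rule are never made."""
    return [
        (skeleton.name, r.name, frag.name)
        for frag in library
        for r in allowed_joins(skeleton, frag, rules)
    ]


joins = grow(LIBRARY[0], LIBRARY, KNOWLEDGE_BASE)
```

With these assumed inputs, growing from benzoic acid yields only the amide join with aniline and the ester join with ethanol; toluene, which carries no matching group, is never attached.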
The current version of the program works well with medium-sized databases of starting materials. For large databases such as ACD, the combinatorial nature of the structure generation process means that even the recently developed parallel version would be too slow, and work in hand is geared to overcoming this problem. The presentation will provide an overview of the problems encountered and some solutions, together with examples of the system in action.
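A crude count gives a feel for the combinatorial growth. The sketch below is an assumption-laden upper bound, not a measurement of SynSPROUT: it supposes that every starting material can be joined through every reaction at every step, and it ignores docking, multiple attachment sites and any pruning. The function name and the numbers used are purely illustrative.

```python
def max_candidates(n_fragments, n_reactions, n_joins):
    """Upper bound on build-up sequences: choose a first fragment, then at
    each join choose any fragment and any reaction (assumed all valid)."""
    return n_fragments * (n_fragments * n_reactions) ** n_joins


# Illustrative numbers only: the same two joins over a 10x larger library.
small = max_candidates(10, 2, 2)
large = max_candidates(100, 2, 2)
```

Under these assumptions the bound grows as the library size raised to the power of the number of joins plus one, so a tenfold larger library gives a thousandfold larger bound for two joins.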