Chemical Literature Data Extraction: Bond crossing in single and multiple structures

Florence Kam, R. W. Simpson, C. Tonnelier, Tibor Venczel, A. Peter Johnson
School of Chemistry
University of Leeds
Leeds, LS2 9JT
United Kingdom
Presentation held at:
1992 Chemical Information Conference
1992, Annecy, France

Converting a scanned image of a page of chemical structure diagrams (with accompanying text) into a set of connection tables is one of the primary aims of the CLiDE project. These connection tables can be used in a variety of computer-based applications, such as building and maintaining databases. The image is decomposed into graphics and text components, which are further analysed to find the lines, wedges, and chemical text strings. In an interpretation phase, the connection tables for the molecules are built from these items. Correct interpretation of the chemical bonding in the image is often hampered by the constraints of representing a three-dimensional molecule in two dimensions, where one bond may be drawn over another. A method of identifying and successfully dealing with these situations is described. A related situation, in which a bond is drawn crossing a ring to imply an undetermined point of attachment, is also addressed. Examples are presented to illustrate these situations, and the rules implemented to handle these structures within the CLiDE program are discussed.
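
To make the bond-crossing situation concrete, the following is a minimal Python sketch of one way to flag candidate crossings among line segments extracted from an image. It is an illustration under stated assumptions, not the CLiDE method: the names `orientation`, `crosses_in_interior`, and `find_bond_crossings` and the tolerance `eps` are hypothetical, and each line is assumed to be given simply as a pair of end points. Two lines are reported only when they intersect strictly inside both segments, so lines that merely meet at a shared end point (an ordinary atom position) are not treated as crossings.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def orientation(p: Point, q: Point, r: Point) -> float:
    """Twice the signed area of triangle p, q, r (sign gives the turn direction)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def crosses_in_interior(a: Segment, b: Segment, eps: float = 1e-9) -> bool:
    """True if segments a and b intersect at a point strictly inside both.

    Touching at an end point gives a zero orientation and is rejected,
    so bonds that meet at an atom are not counted as crossing.
    """
    d1 = orientation(b[0], b[1], a[0])
    d2 = orientation(b[0], b[1], a[1])
    d3 = orientation(a[0], a[1], b[0])
    d4 = orientation(a[0], a[1], b[1])
    # End points of each segment must lie on opposite sides of the other.
    return d1 * d2 < -eps and d3 * d4 < -eps


def find_bond_crossings(lines: List[Segment]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of lines that cross in their interiors."""
    return [
        (i, j)
        for i in range(len(lines))
        for j in range(i + 1, len(lines))
        if crosses_in_interior(lines[i], lines[j])
    ]


# Hypothetical input: two diagonals that cross, plus a line sharing an end point.
example_lines = [
    ((0.0, 0.0), (2.0, 2.0)),
    ((0.0, 2.0), (2.0, 0.0)),
    ((2.0, 2.0), (4.0, 2.0)),
]
crossings = find_bond_crossings(example_lines)
```

In this sketch a flagged pair would be kept as two separate bonds in the connection table, with no atom created at the intersection point. Deciding whether a crossing instead denotes a bond drawn across a ring with an undetermined point of attachment would need further rules that are not shown here.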