Aniko T. Valko, A. Peter Johnson
Presentation held at:
248th ACS National Meeting
CINF, Hunting for Hidden Treasures: Chemistry Text Mining in Patents and Other Documents
August 2014, Philadelphia, PA, USA
ABSTRACT
We present an enhanced version of CLiDE, a long-term project aimed at detecting chemical structure diagrams rendered in images and converting these diagrams into chemical connection tables. The enhancement was achieved by introducing a feedback mechanism into CLiDE's interpretation process. This mechanism uses a series of domain-specific and spatial rules to identify drawing features that convey a complex or ambiguous meaning. Once such a feature is found, CLiDE automatically corrects the structural information that is being compiled and passed through the subsequent interpretation steps.
This enhancement has a considerable effect on CLiDE's accuracy in reconstructing chemical structures and auto-detecting interpretation errors. A detailed study of CLiDE's performance on a large validation corpus will be presented. The validation corpus will include benchmark sets created by other projects and a set of non-Markush structures collected from patent documents.
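The feedback mechanism described above can be illustrated with a minimal Python sketch. It is not CLiDE's implementation: the `Structure` and `Rule` types, the `run_with_feedback` function, and the single toy rule (merging two parallel single bonds into one double bond) are all hypothetical names chosen for illustration. The sketch only shows the general pattern of checking rules after each interpretation step and correcting the structure before it reaches the next step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Structure:
    """Toy stand-in for the structural information compiled during interpretation."""
    atoms: List[str] = field(default_factory=list)
    bonds: List[Tuple[int, int, int]] = field(default_factory=list)  # (atom, atom, order)
    warnings: List[str] = field(default_factory=list)


@dataclass
class Rule:
    """Hypothetical rule: a detector plus the correction applied when it fires."""
    name: str
    matches: Callable[[Structure], bool]
    correct: Callable[[Structure], None]


def run_with_feedback(structure, steps, rules):
    """Run each interpretation step, then check every rule and correct in place."""
    for step in steps:
        step(structure)
        for rule in rules:
            if rule.matches(structure):
                rule.correct(structure)
                structure.warnings.append(rule.name)  # record for later error review
    return structure


def has_parallel_single_bonds(s):
    """True if two single bonds connect the same pair of atoms."""
    pairs = [frozenset(b[:2]) for b in s.bonds if b[2] == 1]
    return len(pairs) != len(set(pairs))


def merge_parallel_bonds(s):
    """Merge bonds between the same atom pair by summing their orders."""
    merged = {}
    for a, b, order in s.bonds:
        key = (min(a, b), max(a, b))
        merged[key] = merged.get(key, 0) + order
    s.bonds = [(a, b, order) for (a, b), order in merged.items()]


# Assumed toy steps: a real system would derive these from the image.
def detect_atoms(s):
    s.atoms = ["C", "C", "O"]


def detect_bonds(s):
    s.bonds = [(0, 1, 1), (1, 0, 1), (1, 2, 1)]  # two parallel lines between atoms 0 and 1


rule = Rule("parallel single bonds merged", has_parallel_single_bonds, merge_parallel_bonds)
result = run_with_feedback(Structure(), [detect_atoms, detect_bonds], [rule])
```

In this sketch the rule fires after the bond detection step, so `result.bonds` holds one double bond and one single bond, and `result.warnings` records which rule was applied. Keeping a log of fired rules is one possible way to support the auto-detection of interpretation errors mentioned in the abstract.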