Chemistry Division
Special Libraries Association


Reports on the Chemical Reaction Database session

Chemistry Reaction Databases: an Introduction
Wednesday June 13, 2001: 1-4 pm
Reported by Robert Powers, Jet Propulsion Laboratory Library

The "Chemistry Reaction Databases" session was divided into two parts, with a total of five speakers: Dr. Gunter Grethe of MDL Information Systems, Roger Schenck of Chemical Abstracts Service, Matthew Kellett of ISI, Guido F. Herrmann of Science of Synthesis/Houben-Weyl, and Bob Snyder of MDL Information Systems. The speakers gave introductory talks on searching for chemical reaction information and provided vendor information on their systems.

Session I

Gary Wiggins of Indiana University moderated the first session.

1) The first speaker was Dr. Gunter Grethe of MDL Information Systems, who talked about "Reaction Information Retrieval Problems and Rewards". Dr. Grethe highlighted some of the problems chemists have in locating chemical reaction information. There are currently ~15-20 million reactions in a wide variety of chemical reaction databases (CAS React, Chem React, CrossFire Plus, etc.). Dr. Grethe's assessment is that working chemists are under-utilizing this reaction data. Why? Unlike molecule queries, which are relatively straightforward to conduct, reaction searches often require the interpretation of a synthetic chemist. In addition, the working chemist often has neither the time nor the training to mine this data. Dr. Grethe viewed the role of the "information intermediary" as primarily that of a teacher/trainer: information professionals should educate chemists about what systems are available and show them how to use these systems.

Synthetic chemists are often looking for the best methodology to solve a reaction problem at hand. In order to do this they need a system that has the following characteristics:

Dr. Grethe then reviewed four types of reaction queries.

Preparation searching is relatively straightforward using tools such as CrossFire Web. Transformation and reaction condition questions, however, often return too many results for the search to be useful to the chemist. Dr. Grethe demonstrated several tools used in various MDL products to help refine reaction searches. One of these tools is "Reaction Classification", which is based on reaction centers. MDL licenses InfoChem's RCP program to classify all of its reaction databases. This program classifies reactions based on changes to reaction centers and their immediate vicinity. Software can use these classifications to cluster reactions by type, manage post-processing of large hit lists, improve queries for transformation searches, and link information from different sources.
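As a rough illustration of how a classification code can tame a large hit list, the sketch below groups hits by a class label and lists the largest groups first. The function name, the reaction identifiers, and the class labels are all invented for this example; they are not output from InfoChem's program or from any MDL product.

```python
from collections import defaultdict


def cluster_by_class(hits):
    """Group (reaction_id, class_code) pairs by their class code."""
    clusters = defaultdict(list)
    for reaction_id, class_code in hits:
        clusters[class_code].append(reaction_id)
    return dict(clusters)


# Hypothetical hit list: identifiers and class labels are made up.
hits = [
    ("r1", "C-O bond formation"),
    ("r2", "C-C bond formation"),
    ("r3", "C-O bond formation"),
]

clusters = cluster_by_class(hits)

# Show the largest clusters first so the chemist can pick a reaction type.
for class_code, ids in sorted(clusters.items(), key=lambda kv: -len(kv[1])):
    print(class_code, len(ids), ids)
```

Browsing a handful of clusters in place of every individual hit is the kind of post-processing the talk described.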

2) The next speaker was Roger Schenck of CAS, who spoke about using CAS products to search reaction data. Roger reviewed the type and amount of reaction information located in the CA Plus and CAS React search services. CA Plus has about 2.7 million records that have some information on synthesis, going back to 1947. The CAS React database contains more than 200,000 documents, which cover about 3.9 million reactions, and spans the period from 1985 to the present. Roger then talked about using STN Express and SciFinder for reaction information.

3) Matthew Kellett of ISI talked about the "ISI Chemistry Server: a versatile source of structured organic chemistry information". The ISI Chemistry Server consists of two products, the Reaction Center and the Compound Center. The Reaction Center is a database of single- and multi-step synthetic methods, which includes reaction conditions, product yields, and catalysts, as well as bibliographic information. The user can search the Reaction Center via structure/substructure searching, data searching, or by searching the bibliographic information. The Reaction Center consists of two databases: the Current Chemical Reactions database contains about 535,000 reactions, covering the literature from 1985 to the present, and the Institut National de la Propriete Industrielle (INPI) database contains about 139,000 reactions, covering the literature from 1840 to 1986.

Session II

The moderator for the second session was Jennifer Kostelnik of Yale University's Sterling Chemistry Library.

4) Guido Herrmann of Georg Thieme Verlag presented information on Thieme's new edition of Houben-Weyl Methods of Organic Chemistry, which is titled Science of Synthesis, Houben-Weyl Methods of Molecular Transformations. The new edition will be published from 2000 to 2010. It will consist of between 30,000 and 40,000 pages, and will cover about 15,000 synthetic methods and 150,000 reactions.

In addition to the print version, Science of Synthesis is designed as a web-based system that will be available over the Internet.

Data is arranged in a hierarchical structure, allowing users to browse through their area of interest as they would a book. Information is classified by category (e.g., organometallics, hetarenes, etc.), volume, product class (e.g., organometallic complexes of chromium), product subclass, method (e.g., by α-H elimination from alkyl complexes), and variation (e.g., ligand addition).
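The six levels can be pictured as a nested tree that is walked one level at a time. The sketch below is only an illustration: the volume and subclass names are placeholders, and placing the report's separate examples on a single branch is an assumption made for the example, not a statement about how Science of Synthesis actually organizes them.

```python
# Hierarchy levels named in the report, outermost first.
LEVELS = ["category", "volume", "product class",
          "product subclass", "method", "variation"]

# Toy tree of nested dicts keyed by entry name.
# "volume (placeholder)" and "subclass (placeholder)" are invented names.
tree = {
    "organometallics": {
        "volume (placeholder)": {
            "organometallic complexes of chromium": {
                "subclass (placeholder)": {
                    "by α-H elimination from alkyl complexes": {
                        "ligand addition": {},
                    },
                },
            },
        },
    },
}


def browse(node, path):
    """Follow `path` down the tree and return the entry names found there."""
    for name in path:
        node = node[name]
    return sorted(node)


# Walk from the top level down to the variations of one method.
path = [
    "organometallics",
    "volume (placeholder)",
    "organometallic complexes of chromium",
    "subclass (placeholder)",
    "by α-H elimination from alkyl complexes",
]
print(browse(tree, []))
print(browse(tree, path))
```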

In addition to hierarchical browsing, search engines provide direct access to the relevant methods. Keyword, full-text, substructure, structure, and reaction searches are available. The web version uses ISIS/Draw for inputting structures for searching. Many of the article references have hypertext links to PDF versions of the full-text documents.

5) The last speaker was Robert W. Snyder, Ph.D., Director of Chemistry Marketing at MDL Information Systems, who talked about "Solving the reaction retrieval problem". Dr. Snyder started with an overview of reaction informatics.

Reaction informatics consists of the following attributes:

Dr. Snyder then talked about reaction retrieval resources available through MDL Information Systems. These resources include:

Database Name | Primary Focus | Features

Dr. Snyder ran through several example reaction substructure searches (RSS), using the RXNBROWS software to access MDL databases. Synthetic chemists will often look first for the specific reaction, and then expand the search by looking for more and more generic forms of the reaction. Reaction schemes can be displayed for many of the MDL databases. These schemes give a concise representation of the overall synthetic strategy in an article, and they are convenient for locating the starting materials, intermediates, and end products in a reaction.

Dr. Snyder finished his presentation with a description of automatic search features that have been incorporated into the RXNBROWS software. The end user provides a specific full reaction of interest. The automatic search then performs a series of reaction searches, each one less specific than the last, until a relevant example is retrieved from the database. The search tries an exact match first, then the same transformation (narrow, medium, broad), then a reaction substructure search, and failing that, a similarity search (ranked from 100... 0). This search process emulates the thought process of an experienced synthetic chemist.
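The fallback order described above can be sketched as a list of search tiers tried from most to least specific. This is a minimal sketch, not the RXNBROWS implementation: the function names, the query string, and the toy search functions are hypothetical stand-ins, and only the tier order comes from the talk.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical search function type: takes a query, returns hit identifiers.
SearchFn = Callable[[str], List[str]]


def automatic_search(
    query: str, tiers: List[Tuple[str, SearchFn]]
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order and stop at the first one that returns hits."""
    for name, search in tiers:
        hits = search(query)
        if hits:
            return name, hits
    return None, []


# Toy stand-ins for real database searches (invented for this example).
def no_hits(query: str) -> List[str]:
    return []


def substructure_hits(query: str) -> List[str]:
    return ["rxn-17", "rxn-42"]


def similarity_hits(query: str) -> List[str]:
    return ["rxn-99"]


# Tier order as described in the talk, most specific first.
tiers = [
    ("exact match", no_hits),
    ("transformation (narrow)", no_hits),
    ("transformation (medium)", no_hits),
    ("transformation (broad)", no_hits),
    ("reaction substructure search", substructure_hits),
    ("similarity search", similarity_hits),
]

level, hits = automatic_search("hypothetical query", tiers)
print(level, hits)
```

In this toy run the similarity tier is never reached, because the substructure tier already returned hits. Stopping at the first successful tier keeps the results as close as possible to the reaction the user drew.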


Comments to:
Susanne J. Redalje
Chemistry Division
(206)543-2070(voice)
(206)543-3863(fax)
curie@u.washington.edu

Copyright©2001 SLA All Rights Reserved
This page updated August 2001