ChemDig: new approaches to chemically significant indexing and searching of distributed web collections

Georgios V Gkoutos, Christopher Leach, Henry S Rzepa

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


We describe an extension of the ht://Dig robot-based internet indexing and search engine that retrieves information held in a variety of molecular data formats, as defined by chemical MIME types. This is achieved by invoking chemical meta-parsers, software agents designed to provide key meta-data about the content of external chemical files. This meta-data can include, for example, the derived molecular formula, molecular mass and atom connection table (SMILES) where the content of the file allows this, as well as other types of content such as author information and supplied keywords. These terms can be automatically added to the searchable terms, and the search outputs can be automatically linked via database requests to other external databases containing chemical information. We report our experience in applying this robot to indexing five different remote sites. We discuss different mechanisms for storing and searching for the chemical content, ranging from simple keyword-based searches qualified by chemically significant boolean terms, through chemical similarity searches, to our experiments in creating more highly structured content that expresses the chemical data using XML-based markup, with XSLT transforms used for filtering, searching and rendering the information.
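
As an illustration only, and not the ChemDig implementation, the kind of work a chemical meta-parser does can be sketched in Python. The sketch assumes an XYZ coordinate file (chemical MIME type chemical/x-xyz) and derives a molecular formula and molecular mass that an indexer could add to its searchable terms. The names parse_xyz, hill_formula and meta_terms are hypothetical, and the mass table covers only a few elements with approximate values.

```python
from collections import Counter

# Approximate standard atomic masses (g/mol); a deliberately tiny,
# illustrative table, not a complete periodic table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_xyz(text):
    """Parse an XYZ file: atom count, comment line, then 'El x y z' lines."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    title = lines[1].strip()
    symbols = [line.split()[0].capitalize() for line in lines[2:2 + n_atoms]]
    return title, Counter(symbols)


def hill_formula(counts):
    """Hill order: C then H if carbon is present, all others alphabetically."""
    order = [el for el in ("C", "H") if el in counts] if "C" in counts else []
    order += sorted(el for el in counts if el not in order)
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


def meta_terms(text):
    """Return hypothetical index terms derived from the file content."""
    title, counts = parse_xyz(text)
    mass = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {"title": title, "formula": hill_formula(counts), "mass": round(mass, 3)}


WATER = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

print(meta_terms(WATER))
```

A real meta-parser would also need to handle formats in which hydrogens are implicit, and to emit its terms in whatever form the indexing engine expects; both are left out of this sketch.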
Original language: English
Pages (from-to): 656-666
Number of pages: 11
Journal: New Journal of Chemistry
Issue number: 5
Publication status: Published - 2002


  • search engine
  • ht://Dig
  • chemical MIME types
  • metadata


