Abstract
We report a set of tools, intended for use in conjunction with a robot-based Internet indexing engine, that convert non-conforming HTML collections into well-formed and valid XHTML documents. The tools can, inter alia, correct invalid syntax in embedded RasMol scripts and extract chemical meta-information from normally inaccessible document components, including transcluded chemical files. An index built from the transformed documents can improve the quality of searches carried out in a chemical context.
Original language | English |
---|---|
Pages (from-to) | 635-638 |
Number of pages | 4 |
Journal | New Journal of Chemistry |
Volume | 25 |
Issue number | 4 |
Publication status | Published - 2001 |