Java source code, version 2.2, 2004.04.05
Description
A program to find molecular substructures and discriminative fragments in a database of molecule descriptions. The algorithm is based on the Eclat algorithm for frequent item set mining.
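As a rough illustration of the Eclat idea mentioned above, the following is a minimal sketch of Eclat for plain item sets, not the fragment miner itself. It stores, for each item, the set of transaction ids that contain it, and grows item sets depth-first by intersecting these id sets. The class name EclatSketch, the method mine, and the toy database are assumptions made for this example only; the actual program works on molecule descriptions and extends fragments rather than item sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of Eclat on item sets (not part of the program). */
public class EclatSketch {

    /** Returns every item set with support >= minSupport, mapped to its support. */
    public static Map<String, Integer> mine(List<List<String>> transactions, int minSupport) {
        // Vertical layout: for each item, the ids of the transactions containing it.
        TreeMap<String, BitSet> tidLists = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            for (String item : transactions.get(tid)) {
                tidLists.computeIfAbsent(item, k -> new BitSet()).set(tid);
            }
        }
        // Keep only the frequent single items, in a fixed (alphabetical) order.
        List<String> items = new ArrayList<>();
        List<BitSet> tids = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : tidLists.entrySet()) {
            if (e.getValue().cardinality() >= minSupport) {
                items.add(e.getKey());
                tids.add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        extend("", items, tids, minSupport, result);
        return result;
    }

    /** Depth-first search: extend the prefix by each item, intersecting id sets. */
    private static void extend(String prefix, List<String> items, List<BitSet> tids,
                               int minSupport, Map<String, Integer> result) {
        for (int i = 0; i < items.size(); i++) {
            String itemSet = prefix.isEmpty() ? items.get(i) : prefix + " " + items.get(i);
            result.put(itemSet, tids.get(i).cardinality());
            List<String> nextItems = new ArrayList<>();
            List<BitSet> nextTids = new ArrayList<>();
            // Only items later in the order are tried, so no item set is built twice.
            for (int j = i + 1; j < items.size(); j++) {
                BitSet common = (BitSet) tids.get(i).clone();
                common.and(tids.get(j));
                if (common.cardinality() >= minSupport) {
                    nextItems.add(items.get(j));
                    nextTids.add(common);
                }
            }
            if (!nextItems.isEmpty()) {
                extend(itemSet, nextItems, nextTids, minSupport, result);
            }
        }
    }

    public static void main(String[] args) {
        // Toy database of four transactions (made up for this example).
        List<List<String>> db = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("a", "b"),
            Arrays.asList("a", "c"),
            Arrays.asList("b", "c"));
        System.out.println(mine(db, 2));
    }
}
```

With a minimum support of 2, the toy database yields the three single items and the three pairs, while the set of all three items is pruned because only one transaction contains it.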
Call the program without any arguments to get a list of options. See the shell script run (included in the source package) for an example of how to invoke the program. The example input files made available above (also contained in the data directory in the source package) show the input format.
A sister page with further explanations and a worked example is available at the ALTANA Chair of Applied Computer Science (M.R. Berthold) of the University of Konstanz.
This program was developed in cooperation with Tripos, Inc., Data Analysis Research Lab, South San Francisco, CA, USA.
Details about the application and the algorithm can be found in these papers:
- Mining Molecular Fragments: Finding Relevant Substructures of Molecules
Christian Borgelt and Michael R. Berthold
IEEE International Conference on Data Mining (ICDM 2002, Maebashi, Japan), 51-58
IEEE Press, Piscataway, NJ, USA 2002
icdm_02.pdf (112 kb) icdm_02.ps.gz (69 kb) (8 pages)
- Large Scale Mining of Molecular Fragments with Wildcards
Heiko Hofer, Christian Borgelt, and Michael Berthold
Proc. 5th International Symposium on Intelligent Data Analysis (IDA 2003, Berlin, Germany), 380-389
Springer-Verlag, Heidelberg, Germany 2003
ida_03.pdf (187 kb) ida_03.ps.gz (125 kb) (10 pages)
- Finding Discriminative Molecular Fragments
Christian Borgelt, Heiko Hofer, and Michael Berthold
Workshop Information Mining - Navigating Large Heterogeneous Spaces of Multimedia Information
German Conference on Artificial Intelligence, Hamburg, Germany 2003
wsim_03.pdf (303 kb) wsim_03.ps.gz (143 kb) (13 pages)