University of Tübingen, Chair of Cognitive Systems (Lehrstuhl Kognitive Systeme), Prof. Dr. Zell

COFEA: High-speed search methods using the Compressed Feature Matrix

The Compressed Feature Matrix (CFM) is a feature-based molecular descriptor that enables fast, adaptive similarity search, pharmacophore development and substructure search. The first part of the descriptor is a feature vector listing the biochemical or physicochemical features that occur in the described molecule. The user can define which structural patterns are assigned to which feature types. The second part is a distance matrix that relates the features to one another by their pairwise distances. Depending on the purpose, the matrix can be generated from topological or from Euclidean molecular data, so the molecule can be encoded in either two or three dimensions.
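
The two-part structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not COFEA's implementation. The feature assignment is passed in as hypothetical (atom index, feature type) pairs. The 2D variant uses bond-count (shortest-path) distances between the atoms carrying the features. The 3D variant uses Euclidean distances between given feature coordinates. The names `build_cfm` and `euclidean_cfm` are hypothetical.

```python
import math
from collections import deque


def topological_distances(n_atoms, bonds, source):
    """Breadth-first search: number of bonds from `source` to every atom."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [None] * n_atoms  # None marks atoms not reachable from source
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_cfm(n_atoms, bonds, atom_features):
    """2D sketch: (feature_vector, distance_matrix) from a molecular graph.

    atom_features: hypothetical list of (atom_index, feature_type) pairs,
    standing in for the user-defined pattern-to-feature assignment.
    """
    feature_vector = [ftype for _, ftype in atom_features]
    matrix = []
    for atom_i, _ in atom_features:
        d = topological_distances(n_atoms, bonds, atom_i)
        matrix.append([d[atom_j] for atom_j, _ in atom_features])
    return feature_vector, matrix


def euclidean_cfm(features_xyz):
    """3D sketch: features_xyz is a list of (feature_type, (x, y, z))."""
    feature_vector = [ftype for ftype, _ in features_xyz]
    matrix = [[math.dist(p, q) for _, q in features_xyz] for _, p in features_xyz]
    return feature_vector, matrix


# Toy chain of five atoms 0-1-2-3-4 with three labelled features.
fv, m = build_cfm(5, [(0, 1), (1, 2), (2, 3), (3, 4)],
                  [(0, "donor"), (2, "aromatic"), (4, "acceptor")])
print(fv)  # ['donor', 'aromatic', 'acceptor']
print(m)   # [[0, 2, 4], [2, 0, 2], [4, 2, 0]]
```

Because the matrix is indexed by features rather than atoms, its size depends only on how many features the chosen feature set detects, not on the size of the molecule.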

Similarity search

Unlike a conventional distance matrix, the CFM is built on features rather than atoms. Each feature type can be weighted separately according to its (estimated) contribution to the biological effect of the molecule. The similarity evaluation can therefore be adapted to particular ligand sets as well as to specific classification requirements. In particular, the CFM makes it possible to base the similarity calculation on small, characteristic parts of molecules that are independent of the molecular scaffold. This makes the CFM suitable not only for conventional similarity evaluation but also for techniques such as lead hopping and scaffold hopping.
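
To illustrate how per-feature weights can steer a similarity score, the sketch below compares two descriptors. It counts (feature type, feature type, distance) triples and computes a weighted Tanimoto-style ratio over them. This scoring scheme is a stand-in chosen for illustration, because the source does not specify COFEA's similarity function. Descriptors are assumed to be (feature vector, distance matrix) pairs with integer topological distances. `pair_profile` and `weighted_similarity` are hypothetical names.

```python
from collections import Counter


def pair_profile(feature_vector, matrix):
    """Count (type_a, type_b, distance) triples over all feature pairs i < j."""
    profile = Counter()
    n = len(feature_vector)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((feature_vector[i], feature_vector[j]))
            profile[(a, b, matrix[i][j])] += 1
    return profile


def weighted_similarity(cfm1, cfm2, weights):
    """Weighted Tanimoto-style similarity in [0, 1] (an assumed scheme, not COFEA's).

    weights maps feature type -> weight; types missing from it get weight 1.0.
    A pair's weight is the product of its two feature-type weights.
    """
    p1, p2 = pair_profile(*cfm1), pair_profile(*cfm2)
    num = den = 0.0
    for key in p1.keys() | p2.keys():
        a, b, _ = key
        w = weights.get(a, 1.0) * weights.get(b, 1.0)
        num += w * min(p1[key], p2[key])  # Counter returns 0 for absent keys
        den += w * max(p1[key], p2[key])
    return num / den if den else 1.0


a = (["donor", "aromatic", "acceptor"], [[0, 2, 4], [2, 0, 2], [4, 2, 0]])
b = (["donor", "aromatic", "halogen"], [[0, 2, 3], [2, 0, 1], [3, 1, 0]])
print(weighted_similarity(a, b, {}))  # 0.2
print(weighted_similarity(a, b, {"donor": 2.0, "aromatic": 2.0,
                                 "acceptor": 0.5, "halogen": 0.5}))  # 0.5
```

The two toy molecules share only the donor/aromatic pair at distance 2. Up-weighting those two feature types therefore raises the score from 0.2 to 0.5, regardless of how the rest of each molecule looks. For 3D (Euclidean) descriptors, the distances would first need to be binned, because exact equality of floating-point distances rarely holds.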

Similarity search characteristics

  • The CFM-based similarity search can be performed either as a conventional molecule-to-molecule comparison or with a pharmacophore model as the target structure.
  • COFEA provides two different ways of calculating pharmacophores, each supporting both 2D and 3D evaluation.
  • The average search speed is around 1,200 compounds/second, i.e. 1,000,000 compounds in less than 15 minutes.
  • The CFM-based similarity search is fast enough for interactive use, even on large databases.
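
The quoted throughput translates into a database scan time by simple division. The snippet below checks the figure for 1,000,000 compounds at 1,200 compounds/second, assuming the search speed stays constant over the whole run.

```python
def scan_seconds(n_compounds, compounds_per_second):
    """Time to search a database at a constant throughput (an idealization)."""
    return n_compounds / compounds_per_second


t = scan_seconds(1_000_000, 1200)
print(f"{t:.0f} s = {t / 60:.1f} min")  # 833 s = 13.9 min
```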

Substructure search

Whereas common substructure descriptors only allow screening for predefined patterns, the CFM permits a true substructure (subgraph) search, provided that every element of the query substructure is described by the selected feature set. Compared with graph-based search methods, the CFM-based matrix algorithm proved to be up to several hundred times faster. When the CFM is used as the basis for a preliminary substructure screen, the speed-up reaches three orders of magnitude. The CFM-based substructure search is therefore fast enough for interactive use, even when several hundred thousand compounds are evaluated.
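
A matrix-based substructure search can be sketched as a backtracking search over the two feature distance matrices. Each query feature is assigned to an unused target feature of the same type. An assignment is kept only if every pairwise distance between already-assigned query features matches the corresponding target distance. This is a generic illustration that assumes exact (topological) distances and the same hypothetical (feature vector, distance matrix) layout. The source does not describe COFEA's actual matrix algorithm or its optimizations, and `substructure_match` is a hypothetical name.

```python
def substructure_match(query, target):
    """Return a dict {query index: target index}, or None if no placement exists.

    query, target: (feature_vector, distance_matrix) pairs. A query feature may
    map only to an unused target feature of the same type, and every pair of
    mapped features must have equal distances in both matrices.
    """
    q_types, q_dist = query
    t_types, t_dist = target
    mapping = []  # mapping[k] = target index assigned to query feature k
    used = set()

    def extend(k):
        if k == len(q_types):
            return True  # every query feature has been placed
        for t in range(len(t_types)):
            if t in used or t_types[t] != q_types[k]:
                continue
            # Compare distances to all previously placed query features.
            if all(q_dist[k][j] == t_dist[t][mapping[j]] for j in range(k)):
                mapping.append(t)
                used.add(t)
                if extend(k + 1):
                    return True
                mapping.pop()  # backtrack
                used.remove(t)
        return False

    return dict(enumerate(mapping)) if extend(0) else None


target = (["donor", "aromatic", "acceptor"], [[0, 2, 4], [2, 0, 2], [4, 2, 0]])
print(substructure_match((["acceptor", "aromatic"], [[0, 2], [2, 0]]), target))  # {0: 2, 1: 1}
print(substructure_match((["donor", "acceptor"], [[0, 3], [3, 0]]), target))     # None
```

Working on feature matrices rather than full atom graphs keeps the search space small. This is consistent with the speed-ups reported above, although this sketch makes no claim to reproduce them.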

Substructure search characteristics

  • Since the feature set is user-defined, the search results can be tailored to particular requirements, e.g. a certain biological effect.
  • With a feature set suited to common pharmaceutical problems, the average search speed is between 30,000 and 100,000 compounds/second, i.e. 1,000,000 compounds in roughly 10 to 33 seconds.
  • COFEA can exclude compounds with an unsuitable feature composition in advance.
  • With this pre-filtering, the search speed can increase to as much as 250,000 compounds/second, i.e. 1,000,000 compounds in about 4 seconds.
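
The pre-exclusion of compounds by feature composition can be illustrated with a simple count check. A compound that has fewer features of some type than the query cannot contain the query, so it can be skipped before any matrix comparison. The sketch assumes a hypothetical database of (compound id, feature vector) pairs. `could_contain` and `screen` are illustrative names, not COFEA functions.

```python
from collections import Counter


def could_contain(query_types, target_types):
    """Cheap pre-filter: the target needs at least as many features of every
    type as the query has, otherwise no substructure match is possible."""
    need = Counter(query_types)
    have = Counter(target_types)
    return all(have[t] >= n for t, n in need.items())


def screen(query_types, database):
    """Yield ids of compounds that survive the pre-filter.

    database: iterable of (compound_id, feature_vector) pairs (hypothetical
    layout). Only survivors would go on to the full matrix comparison.
    """
    for cid, types in database:
        if could_contain(query_types, types):
            yield cid


db = [("c1", ["donor", "aromatic", "acceptor"]),
      ("c2", ["donor", "donor"]),
      ("c3", ["aromatic", "acceptor", "acceptor"])]
print(list(screen(["aromatic", "acceptor"], db)))  # ['c1', 'c3']
```

The check costs only a pass over each feature vector, so discarding compounds this way is much cheaper than running the matrix comparison on them.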


Badreddin Abolmaali, Tel.: +49 7071 29-78979, abolmaali at

Last modified: 19.03.2018, 18:46 CET, RA-Webmaster.
© 2001-2005 University of Tübingen