Functional Clustering of Genes Using the Gene Ontology


With the advent of high-throughput methods, researchers can produce large amounts of biological data. When analyzing such data, biologists often end up with long lists of differentially expressed, up- or down-regulated, or co-expressed genes that need to be further characterized. This creates the need for a functional grouping of genes. The aim of this project is to develop methods that produce a functional categorization of genes based on biological annotation.


Our method can be split into two parts. First, we calculate functional distances or similarities between the genes of the dataset. Then, we apply standard or specialized clustering methods to group the genes by function.

The Gene Ontology

The Gene Ontology (GO) is one of the most important ontologies within the bioinformatics community and is developed by the GO Consortium. It is specifically intended for annotating gene products with a consistent, controlled and structured vocabulary. Gene products include, for instance, sequences in databases as well as measured expression profiles. The GO represents terms in a directed acyclic graph (DAG) covering three orthogonal taxonomies or aspects: molecular function, biological process, and cellular component. The GO graph consists of over 18,000 terms represented as nodes, which are connected by relationships represented as edges. Terms are allowed to have multiple parents as well as multiple children. Two different kinds of relationship exist: the is-a relationship and the part-of relationship.
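To illustrate the graph structure described above, the following minimal sketch stores a toy DAG as a mapping from each term to its parents and collects all ancestors of a term. The term names and the `PARENTS` table are invented for illustration and are not real GO identifiers; is-a and part-of edges are not distinguished here.

```python
# Toy DAG: each term maps to its parents (is-a or part-of edges).
# Term names are hypothetical; they are not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
    "dna_kinase": ["dna_binding", "kinase"],  # a term with two parents
}


def ancestors(term, parents=PARENTS):
    """Return the term itself plus every term reachable via parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen
```

Because a term may have several parents, the traversal keeps a `seen` set so that shared ancestors such as `root` are visited only once.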


For the functional clustering of genes, it was necessary to develop functional similarity or distance measures. All these measures are based on the Gene Ontology. For GO terms, the following measures were implemented: Resnik similarity, Lin similarity, and Jiang distance. For genes, we developed:
  • Feature vector based distances
  • Reduced Feature vector based distances
  • Optimal Assignment based kernels
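The three term-level measures named above are commonly defined through the information content IC(t) = -log p(t), where p(t) is the fraction of annotations that fall on term t or any of its descendants. The sketch below follows these textbook definitions on a toy DAG; it is not the project's implementation, and the term names, `PARENTS`, and `DIRECT_COUNTS` are hypothetical. The gene-level measures listed above are not covered.

```python
import math

# Hypothetical toy DAG: term -> list of parents. Not real GO identifiers.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
    "dna_kinase": ["dna_binding", "kinase"],
}

# Hypothetical number of genes annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0, "binding": 2, "catalysis": 2,
    "dna_binding": 4, "kinase": 6, "dna_kinase": 2,
}


def ancestors(term):
    """Return the term itself plus all terms reachable via parent edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = -log p(t); an annotation also counts for every ancestor."""
    counts = dict.fromkeys(PARENTS, 0)
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            counts[anc] += n
    total = counts["root"]
    return {t: -math.log(c / total) for t, c in counts.items()}


IC = information_content()


def resnik_similarity(t1, t2):
    """IC of the most informative common ancestor of t1 and t2."""
    return max(IC[t] for t in ancestors(t1) & ancestors(t2))


def lin_similarity(t1, t2):
    """Resnik similarity normalized by the IC of both terms."""
    denom = IC[t1] + IC[t2]
    if denom == 0:
        return 1.0  # convention chosen here: both terms carry no information
    return 2 * resnik_similarity(t1, t2) / denom


def jiang_distance(t1, t2):
    """Jiang-Conrath distance: zero for identical terms."""
    return IC[t1] + IC[t2] - 2 * resnik_similarity(t1, t2)
```

In this toy graph, `dna_binding` and `kinase` share only the root, so their Resnik similarity is zero, while `dna_kinase` and `dna_binding` share the more informative ancestor `dna_binding`.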

Clustering Algorithms

Several different clustering algorithms were implemented or specifically developed for this purpose:
  • Spectral Clustering
  • k-means
  • Hierarchical Clustering (Average Linkage, Single Linkage, Complete Linkage, Ward Linkage)
  • Cluster Algorithms based on Memetic Algorithms and Minimum Spanning Trees
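As a small illustration of how a clustering method from the list above can work on precomputed functional distances, the following sketch implements plain average-linkage agglomerative clustering in pure Python. It is a simplified stand-in, not the project's code; the function name `average_linkage` and the distance matrix `DIST` are assumptions made up for this example.

```python
def average_linkage(dist, n_clusters):
    """Merge the two clusters with the smallest average pairwise distance
    until only n_clusters remain. dist is a symmetric list-of-lists matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical functional distances between four genes (indices 0 to 3).
DIST = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]

groups = average_linkage(DIST, n_clusters=2)
```

With this matrix, genes 0 and 1 end up in one group and genes 2 and 3 in the other. Single or complete linkage would replace the average by the minimum or maximum pairwise distance.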


This project is funded as part of an explorative project by:

NGFN bmb+f


Nora Speer, Tel.: (07071) 29-78987, nspeer at