Parallel Sequence Algorithms for PC-Clusters

In modern molecular biology, computers are used for many different tasks. Probably the most frequent application is searching for similarities in DNA or amino-acid sequences. Classical tools like FASTA and BLAST provide fast and efficient search in large databases of annotated sequences when the query sequence is known. The search problem becomes much more complicated if the query cannot be precisely defined but only described in fuzzy terms, and if the search is to be performed in unannotated and possibly non-coding regions of a DNA sequence. Because the query is fuzzy, efficient string-matching heuristics cannot be applied directly, and the computational effort is much higher.
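To illustrate why a fuzzy query is more expensive than an exact one, the following minimal Python sketch scans a DNA sequence for a pattern that may contain IUPAC ambiguity codes and may differ from the sequence in a limited number of positions. It is not the project's algorithm: the function name `fuzzy_find`, the parameter `max_mismatches`, and the choice of IUPAC codes as the model of fuzziness are assumptions made for illustration only. Every window of the sequence has to be compared with the pattern, so the cost grows with sequence length times pattern length.

```python
# Standard IUPAC nucleotide ambiguity codes. Using them to model a "fuzzy"
# query is an assumption for this sketch, not something stated by the project.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def fuzzy_find(sequence, pattern, max_mismatches=0):
    """Return (position, mismatches) for each window that matches the pattern.

    A window matches if at most `max_mismatches` of its bases fall outside
    the set of bases allowed by the corresponding pattern symbol.
    """
    hits = []
    m = len(pattern)
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for offset, symbol in enumerate(pattern):
            if sequence[start + offset] not in IUPAC[symbol]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, give up on this window
        else:
            # Inner loop finished without break: the window is a hit.
            hits.append((start, mismatches))
    return hits
```

For example, `fuzzy_find("ACGTACGA", "ACGN")` reports the windows at positions 0 and 4, because `N` accepts any base.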

Kepler Cluster

To be useful in practice, the response time must be within seconds or, in the worst case, minutes. Intelligently distributing the computation over a number of machines can reduce the response time sufficiently. In this project, parallel sequence analysis algorithms are developed and combined with advanced heuristics, such as neural networks and evolutionary algorithms, for non-deterministic, high-probability pattern matching. The algorithms are planned to run as a service on Kepler (shown in the photo above), a highly parallel cluster of 98 dual Pentium III PC nodes with a Myrinet interconnect, located at the University of Tübingen. Envisioned applications are differential analysis of staphylococcal genomes and molecular phylogeny of development-relevant transcription factors.
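The idea of distributing the search can be sketched as follows. This is a hedged illustration and not the project's implementation: the sequence is cut into chunks that overlap by the pattern length minus one, so that no match spanning a chunk boundary is lost, and each chunk is searched independently. Worker threads from Python's standard `concurrent.futures` module stand in for cluster nodes here; on a real cluster the chunks would be sent to separate machines, for example via message passing. The names `split_with_overlap`, `search_chunk`, and `parallel_search`, and the simple mismatch-counting matcher, are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_with_overlap(sequence, n_chunks, pattern_len):
    """Cut the sequence into (chunk, offset) pairs.

    Consecutive chunks overlap by pattern_len - 1 characters, so every
    window of length pattern_len lies entirely inside exactly one chunk.
    """
    size = max(1, pattern_len, -(-len(sequence) // n_chunks))  # ceiling division
    overlap = pattern_len - 1
    return [
        (sequence[start:start + size + overlap], start)
        for start in range(0, len(sequence), size)
    ]


def search_chunk(task):
    """Find all windows in one chunk with at most max_mismatches mismatches."""
    chunk, offset, pattern, max_mismatches = task
    m = len(pattern)
    hits = []
    for i in range(len(chunk) - m + 1):
        distance = sum(a != b for a, b in zip(chunk[i:i + m], pattern))
        if distance <= max_mismatches:
            hits.append((offset + i, distance))  # report the global position
    return hits


def parallel_search(sequence, pattern, max_mismatches=0, workers=4):
    """Search all chunks concurrently and merge the hits in position order."""
    tasks = [
        (chunk, offset, pattern, max_mismatches)
        for chunk, offset in split_with_overlap(sequence, workers, len(pattern))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(search_chunk, tasks))
    return sorted(hit for part in parts for hit in part)
```

Because each window belongs to exactly one chunk, the merged result equals that of a single sequential scan, and the work per node shrinks roughly in proportion to the number of chunks.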

Additional resources

Java Web Start prototype (requires Java 1.4 or higher)
Documentation (PDF file, in German)

GUI screenshot
A screenshot of the GUI for sequence analysis

Project Members


Igor Fischer, Phone: +49 7071 29-77176,