Schröder, Adrian and Eichner, Johannes and Supper, Jochen and Eichner, Jonas and Wanke, Dierk and Henneges, Carsten and Zell, Andreas

Predicting DNA-Binding Specificities of Eukaryotic Transcription Factors

PLoS ONE vol. 5 (2010), no. 11, Public Library of Science, pp. e13876

Abstract

Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, are experimentally characterized only for a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.

Downloads and Links

[doi] [pdf]

BibTeX

@article{Schroeder2010,
  author = {Schr\"oder, Adrian and Eichner, Johannes and Supper, Jochen and Eichner,
	Jonas and Wanke, Dierk and Henneges, Carsten and Zell, Andreas},
  title = {{Predicting DNA-Binding Specificities of Eukaryotic Transcription
	Factors}},
  journal = {PLoS ONE},
  publisher = {Public Library of Science},
  year = {2010},
  volume = {5},
  pages = {e13876},
  number = {11},
  month = nov,
  abstract = {Today, annotated amino acid sequences of more and more transcription
	factors (TFs) are readily available. Quantitative information about
	their DNA-binding specificities, however, is hard to obtain. Position
	frequency matrices (PFMs), the most widely used models to represent
	binding specificities, are experimentally characterized only for
	a small fraction of all TFs. Even for some of the most intensively
	studied eukaryotic organisms (i.e., human, rat and mouse), roughly
	one-sixth of all proteins with an annotated DNA-binding domain have
	been characterized experimentally. Here, we present a new method
	based on support vector regression for predicting quantitative DNA-binding
	specificities of TFs in different eukaryotic species. This approach
	estimates a quantitative measure for the PFM similarity of two proteins,
	based on various features derived from their protein sequences. The
	method is trained and tested on a dataset containing 1,239 TFs with
	known DNA-binding specificity, and used to predict specific DNA target
	motifs for 645 TFs with high accuracy.},
  doi = {10.1371/journal.pone.0013876},
  pdf = {http://www.cogsys.cs.uni-tuebingen.de/publikationen/2010/schroederTFBSPrediction.pdf},
  url = {https://doi.org/10.1371/journal.pone.0013876}
}