Strauch, Martin and Supper, Jochen and Spieth, Christian and Wanke, Dierk and Kilian, Joachim and Harter, Klaus and Zell, Andreas

A Two-Step Clustering for 3-D Gene Expression Data Reveals the Main Features of the Arabidopsis Stress Response

Journal of Integrative Bioinformatics vol. 4 (2007), no. 1


Abstract

We developed an integrative approach for discovering gene modules, i.e. genes that are tightly correlated under several experimental conditions, and applied it to a three-dimensional Arabidopsis thaliana microarray dataset. The dataset consists of approximately 23000 genes responding to 9 abiotic stress conditions at 6-9 different points in time. Our approach aims at finding relatively small and dense modules lending themselves to a specific biological interpretation. In order to detect gene modules within this dataset, we employ a two-step clustering process. In the first step, a k-means clustering on one condition is performed, which is subsequently used in the second step as a seed for the clustering of the remaining conditions. To validate the significance of the obtained modules, we performed a permutation analysis and determined a null hypothesis to compare the module scores against, providing a p-value for each module. Significant modules were mapped to the Gene Ontology (GO) in order to determine the participating biological processes. As a result, we isolated modules showing high significance with respect to the p-values obtained by permutation analysis and GO mapping. In these modules we identified a number of genes that are either part of a general stress response with similar characteristics under different conditions (coherent modules), or part of a more specific stress response to a single stress condition (single response modules). We also found genes clustering within several conditions, which are, however, not part of a coherent module. These genes have a distinct temporal response under each condition. We call the modules they are contained in individual response modules (IR).
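
The permutation analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the module score or the permutation scheme, so everything below is an assumption made for illustration: the score is taken to be the mean pairwise Pearson correlation within a module, the null distribution is built from random gene sets of the same size, and the names `module_score` and `permutation_p_value` as well as the toy data are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def module_score(profiles, genes):
    """Assumed score (not from the paper): mean pairwise correlation in the module."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)


def permutation_p_value(profiles, module, n_perm=1000, seed=0):
    """Empirical p-value: share of random same-size gene sets scoring >= the module."""
    rng = random.Random(seed)
    all_genes = list(profiles)
    observed = module_score(profiles, module)
    hits = sum(
        module_score(profiles, rng.sample(all_genes, len(module))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (made up): 60 genes x 6 time points; g0..g4 share one temporal pattern.
rng = random.Random(42)
pattern = [0.0, 1.0, 3.0, 2.0, 1.0, 0.5]
profiles = {}
for i in range(60):
    if i < 5:
        profiles[f"g{i}"] = [v + rng.gauss(0, 0.1) for v in pattern]
    else:
        profiles[f"g{i}"] = [rng.gauss(0, 1) for _ in pattern]

tight_module = [f"g{i}" for i in range(5)]
p = permutation_p_value(profiles, tight_module, n_perm=500)
print(f"score={module_score(profiles, tight_module):.3f}, p={p:.4f}")
```

With these toy profiles, the five co-regulated genes should score far above random gene sets and therefore receive a small p-value. A real analysis would substitute the paper's own module score and null model.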


Downloads and Links

doi: 10.2390/biecoll-jib-2007-54
pdf: http://journal.imbio.de/articles/pdf/jib-54.pdf


BibTeX

@article{2007_38,
  author = {Strauch, Martin and Supper, Jochen and Spieth, Christian and Wanke,
	Dierk and Kilian, Joachim and Harter, Klaus and Zell, Andreas},
  title = {A Two-Step Clustering for {3-D} Gene Expression Data Reveals the Main
	Features of the {Arabidopsis} Stress Response},
  journal = {Journal of Integrative Bioinformatics},
  year = {2007},
  volume = {4},
  number = {1},
  abstract = {We developed an integrative approach for discovering gene modules,
	i.e. genes that are tightly correlated under several experimental
	conditions, and applied it to a three-dimensional Arabidopsis thaliana
	microarray dataset. The dataset consists of approximately 23000 genes
	responding to 9 abiotic stress conditions at 6-9 different points
	in time. Our approach aims at finding relatively small and dense
	modules lending themselves to a specific biological interpretation.
	In order to detect gene modules within this dataset, we employ a
	two-step clustering process. In the first step, a k-means clustering
	on one condition is performed, which is subsequently used in the
	second step as a seed for the clustering of the remaining conditions.
	To validate the significance of the obtained modules, we performed
	a permutation analysis and determined a null hypothesis to compare
	the module scores against, providing a p-value for each module. Significant
	modules were mapped to the Gene Ontology (GO) in order to determine
	the participating biological processes. As a result, we isolated
	modules showing high significance with respect to the p-values obtained
	by permutation analysis and GO mapping. In these modules we identified
	a number of genes that are either part of a general stress response
	with similar characteristics under different conditions (coherent
	modules), or part of a more specific stress response to a single
	stress condition (single response modules). We also found genes clustering
	within several conditions, which are, however, not part of a coherent
	module. These genes have a distinct temporal response under each
	condition. We call the modules they are contained in individual response
	modules (IR).},
  doi = {10.2390/biecoll-jib-2007-54},
  url = {http://journal.imbio.de/articles/pdf/jib-54.pdf}
}