Wrzodek, Clemens

Inference and integration of biochemical networks with multilayered omics data

Ph.D. thesis, University of Tuebingen, Verlag Dr. Hut, Sternstraße 18, München, Tübingen, Germany, 2013


Abstract

The technological advances of the last decade have led to numerous studies focusing on genome-wide transcriptomic changes. This is reflected in a large number of publications that compare the gene expression levels of healthy and other samples. However, the detailed molecular mechanisms that cause these changes remain largely unknown. The different types of cancer are prominent examples: altered transcriptomic expression profiles have been known for a long time, and researchers now focus on identifying the causes of, and treatments against, these alterations. Epigenetic effects, such as DNA methylation, have been associated with most types of cancer. Integrating both omics layers (transcriptomic and epigenomic) has made it possible to explain varied expression levels for some genes. Despite these promising results, automated methods for integrating multiple omics datasets are still at an early stage of development. More omics layers need to be analyzed integratively, and more approaches need to be developed that help researchers find causes for varied expression levels. For this purpose, systems biology methods are required that interpret biological relationships as an overall system and help to identify complex interactions. However, the biochemical networks that underlie those systems must be reconstructed before most systems biology methods can be applied or interactions can be identified. Thus, approaches for the reconstruction of biochemical networks, and novel methods that help identify complex relationships between different omics layers, are first described in this thesis. It thereby focuses on the inference of biochemical networks and their subsequent integration with multilayered omics data.

Biological pathways are a widely used source of biochemical networks. Unfortunately, they are mostly stored as inaccurate and incomplete qualitative descriptions in proprietary formats and thus cannot be used directly with systems biology methods. This thesis presents a method that automatically fixes those issues and translates pathways to well-defined computational models. As a result of this work, 142,050 revised models of metabolic and non-metabolic pathways have been generated and deposited in the popular BioModels repository for the Systems Biology Markup Language (SBML).

The ModuleMaster method is a second approach for the generation of biochemical networks. More specifically, transcriptional regulatory networks are inferred from combinations of experimental gene expression data and a priori knowledge. To this end, gene expression data are clustered and regulatory transcription factors are determined. The inferred interactions between the clusters and the transcription factors span the final network.

The established biochemical networks are used for a joint visualization of multilayered omics data. For this purpose, it is discussed how different omics layers can be integrated into a single dataset. On this basis, a methodology has been developed that integratively depicts messenger RNA, microRNA, DNA methylation, and protein data in a single network. Besides this integration technique, several further integrated omics data analysis methods are presented in this thesis. The concept of gene-set enrichment analysis is extended to multilayered omics data, and methods that ease a joint inspection of these high-dimensional datasets are presented. Finally, a machine learning approach is described that links epigenetic profiles to genetic features. This approach is used to discover the mechanisms and features that induce changes in the epigenome.



BibTeX

@phdthesis{Wrzodek2013PhD,
  author = {Wrzodek, Clemens},
  title = {Inference and integration of biochemical networks with multilayered
	omics data},
  school = {University of Tuebingen},
  year = {2013},
  address = {T\"ubingen, Germany},
  month = jun,
  abstract = {The technological advances of the last decade have led to numerous
	studies focusing on genome-wide transcriptomic changes. This is
	reflected in a large number of publications that compare the gene
	expression levels of healthy and other samples. However, the detailed
	molecular mechanisms that cause these changes remain largely unknown.
	The different types of cancer are prominent examples: altered
	transcriptomic expression profiles have been known for a long time,
	and researchers now focus on identifying the causes of, and treatments
	against, these alterations. Epigenetic effects, such as DNA methylation,
	have been associated with most types of cancer. Integrating both
	omics layers (transcriptomic and epigenomic) has made it possible
	to explain varied expression levels for some genes. Despite these
	promising results, automated methods for integrating multiple omics
	datasets are still at an early stage of development. More omics layers
	need to be analyzed integratively, and more approaches need to be
	developed that help researchers find causes for varied expression
	levels. For this purpose, systems biology methods are required that
	interpret biological relationships as an overall system and help
	to identify complex interactions. However, the biochemical networks
	that underlie those systems must be reconstructed before most systems
	biology methods can be applied or interactions can be identified.
	Thus, approaches for the reconstruction of biochemical networks,
	and novel methods that help identify complex relationships between
	different omics layers, are first described in this thesis. It thereby
	focuses on the inference of biochemical networks and their subsequent
	integration with multilayered omics data. Biological pathways are
	a widely used source of biochemical networks. Unfortunately, they
	are mostly stored as inaccurate and incomplete qualitative descriptions
	in proprietary formats and thus cannot be used directly with systems
	biology methods. This thesis presents a method that automatically
	fixes those issues and translates pathways to well-defined computational
	models. As a result of this work, 142,050 revised models of metabolic
	and non-metabolic pathways have been generated and deposited in the
	popular BioModels repository for the Systems Biology Markup Language
	(SBML). The ModuleMaster method is a second approach for the generation
	of biochemical networks. More specifically, transcriptional regulatory
	networks are inferred from combinations of experimental gene expression
	data and a priori knowledge. To this end, gene expression data are
	clustered and regulatory transcription factors are determined. The
	inferred interactions between the clusters and the transcription
	factors span the final network. The established biochemical networks
	are used for a joint visualization of multilayered omics data. For
	this purpose, it is discussed how different omics layers can be integrated
	into a single dataset. On this basis, a methodology has been developed
	that integratively depicts messenger RNA, microRNA, DNA methylation,
	and protein data in a single network. Besides this integration technique,
	several further integrated omics data analysis methods are presented
	in this thesis. The concept of gene-set enrichment analysis is extended
	to multilayered omics data, and methods that ease a joint inspection
	of these high-dimensional datasets are presented. Finally, a machine
	learning approach is described that links epigenetic profiles to
	genetic features. This approach is used to discover the mechanisms
	and features that induce changes in the epigenome.},
  isbn = {978-3-8439-1116-0},
  keywords = {Integrator, InCroMAP, integration of omics data, machine learning,
	omics data, JSBML, MARCAR, metabolic modeling, gene-regulatory network,
	ModuleMaster, path2models, SBML, BioPAX, KEGG, KEGGtranslator, CpG
	island, HepatoSys, NGFN-II, Spher4Sys},
  publisher = {Verlag Dr.~Hut, Sternstra{\ss}e 18, M\"unchen},
  url = {http://www.dr.hut-verlag.de/978-3-8439-1116-0.html}
}