@phdthesis{Wrzodek2013PhD, author = {Wrzodek, Clemens}, title = {Inference and integration of biochemical networks with multilayered omics data}, school = {University of T\"ubingen}, year = {2013}, address = {T\"ubingen, Germany}, month = jun, abstract = { The technological advances of the last decade have led to numerous studies focusing on genome-wide transcriptomic changes. This is also reflected in the large number of publications that compare the gene expression levels of healthy and diseased samples. However, the detailed molecular mechanisms that cause these changes remain largely unknown. The different types of cancer are popular examples: their altered transcriptomic expression profiles have been known for a long time, and researchers nowadays focus on identifying the causes of, and treatments against, these alterations. Epigenetic effects, such as DNA methylation, have been found to be associated with most types of cancer. Integrating transcriptomic and epigenomic data has made it possible to explain the altered expression levels of some genes. Despite these promising results, automated methods for the integration of multiple omics datasets are still at an early stage of development. More omics layers need to be analyzed integratively, and more approaches need to be developed that help researchers find the causes of altered expression levels. For this purpose, systems biology methods are required that interpret biological relationships as an overall system and help to identify complex interactions. However, the biochemical networks that underlie those systems must be reconstructed before most systems biology methods can be applied or interactions can be identified. Thus, this thesis describes approaches for the reconstruction of biochemical networks, as well as novel methods that help to identify complex relationships between different omics layers. 
The thesis focuses on the inference of biochemical networks and their subsequent integration with multilayered omics data. Biological pathways are a widely used source of biochemical networks. Unfortunately, they are mostly stored as inaccurate and incomplete qualitative descriptions in proprietary formats and thus cannot be used directly with systems biology methods. This thesis presents a method that automatically fixes these issues and translates pathways into well-defined computational models. As a result of this work, 142,050 revised models of metabolic and non-metabolic pathways have been generated and added to the popular BioModels repository, encoded in the Systems Biology Markup Language (SBML). The ModuleMaster method is a second approach for the generation of biochemical networks. More specifically, it infers transcriptional regulatory networks from combinations of experimental gene expression data and a priori knowledge. To this end, the gene expression data are clustered and regulatory transcription factors are determined for the resulting clusters. The inferred interactions between clusters and transcription factors span the final network. The established biochemical networks are used for a joint visualization of multilayered omics data. For this purpose, the thesis discusses how different omics layers can be integrated into a single dataset. On this basis, a methodology has been developed that integratively depicts messenger RNA, microRNA, DNA methylation, and protein data in a single network. Besides this integration technique, several further methods for integrated omics data analysis are presented in this thesis. The concept of gene-set enrichment analysis is extended to multilayered omics data, and methods that facilitate the joint inspection of these high-dimensional datasets are presented. Finally, a machine learning approach is described that links epigenetic profiles to genetic features. 
This approach is used to discover the mechanisms and features that induce changes in the epigenome.}, isbn = {978-3-8439-1116-0}, keywords = {Integrator, InCroMAP, integration of omics data, machine learning, omics data, JSBML, MARCAR, metabolic modeling, gene-regulatory network, ModuleMaster, path2models, SBML, BioPAX, KEGG, KEGGtranslator, CpG island, HepatoSys, NGFN-II, Spher4Sys}, publisher = {Verlag Dr.~Hut, Sternstra{\ss}e 18, M\"unchen}, url = {http://www.dr.hut-verlag.de/978-3-8439-1116-0.html} } @article{Wrzodek2013a, author = {Wrzodek, Clemens and B\"uchel, Finja and Ruff, Manuel and Dr\"ager, Andreas and Zell, Andreas}, title = {Precise generation of systems biology models from {KEGG} pathways}, journal = {BMC Systems Biology}, year = {2013}, volume = {7}, pages = {15}, number = {1}, month = jan, abstract = {Background: The KEGG PATHWAY database provides a plethora of pathways for a wide variety of organisms. All pathway components are directly linked to other KEGG databases, such as KEGG COMPOUND or KEGG REACTION. Therefore, the pathways can be extended with an enormous amount of information and provide a foundation for initial structural modeling approaches. As a drawback, KGML-formatted KEGG pathways are primarily designed for visualization purposes and often omit important details for the sake of a clear arrangement of their entries. Thus, a direct conversion into systems biology models would produce incomplete and erroneous models. Results: Here, we present a precise method for processing and converting KEGG pathways into initial metabolic and signaling models encoded in the standardized community pathway formats SBML (Levels 2 and 3) and BioPAX (Levels 2 and 3). 
This method involves correcting invalid or incomplete KGML content, creating complete and valid stoichiometric reactions, translating relations to signaling models, and augmenting the pathway content with various kinds of information, such as cross-references to Entrez Gene, OMIM, UniProt, ChEBI, and many more. Finally, we compare several existing conversion tools for KEGG pathways and show that the conversion from KEGG to BioPAX does not involve a loss of information, whilst lossless translations to SBML can only be performed using SBML Level 3, including its recently proposed qualitative models and groups extension packages. Conclusions: Building correct BioPAX and SBML signaling models from the KEGG database is a unique characteristic of the proposed method. Further, there is no other approach that is able to appropriately construct metabolic models from KEGG pathways, including correct reactions with stoichiometry. The resulting initial models, which contain valid and comprehensive SBML or BioPAX code and a multitude of cross-references, lay the foundation to facilitate further modeling steps.}, doi = {10.1186/1752-0509-7-15}, issn = {1752-0509}, keywords = {KEGG, KGML, SBML, BioPAX, modeling, systems biology, qualitative modeling, quantitative modeling, converter, comparison}, pdf = {http://www.biomedcentral.com/content/pdf/1752-0509-7-15.pdf}, url = {http://www.biomedcentral.com/1752-0509/7/15} }