E-Cyanobacterium.org: A Web-Based Platform for Systems Biology of Cyanobacteria




models traversing already through many branches of biology [3,12,17,19].

Unfortunately, cyanobacteria models are strongly under-represented in these

repositories, probably because of the natural differences that exist between common bacteria and phototrophic bacteria. There exist several online tools presenting biological networks or genome knowledge [1,4,12]. However, due to the enormous complexity of biological processes, it is a challenge to develop tools presenting biological networks alongside executable or mathematical models. Focusing

on domain-specific organisms allows us to integrate the knowledge and present it

in a concise and understandable form. This has already been demonstrated for well-known model organisms such as E. coli [14] and C. elegans [7]. Nevertheless, these resources do not couple the presented knowledge with modelling.

In this tool paper, we present an online platform for cyanobacteria processes, e-cyanobacterium.org (footnote 2). The platform integrates several dedicated tools and is

distinct in the following aspects: formal rule-based representation of biochemical interactions facilitated by cyanobacteria biochemical entities; repository of

kinetic models providing basic analysis tools online (simulation, custom data sets,

basic static analysis); integration of models within the rule-based description and

export to SBML [10]; storage, maintenance and presentation of experimental data;

content visualisation (graphical presentation of models, biochemical space and

modelling/experimental data).

The presented release of e-cyanobacterium.org is implemented as an extension of our general database tool-kit — the so-called comprehensive modelling

platform [15]. Key updates are the formal rule-based language BCSL, a wet-lab experiments module, and improved analysis of models. Most importantly, several

well-annotated and curated models developed by the consortium are provided

within e-cyanobacterium.org including the formalisation of the respective part

of biochemical space in BCSL.



2 Web Platform Overview



The platform consists of several dedicated modules (Fig. 1) all connected to a

central module – Biochemical Space (BCS) [16] – that is the backbone of the

platform. BCS provides a formal description of the biological problem and is based on the hierarchy of selected biological processes. It is accompanied by schemas representing relevant biological processes in the context of cyanobacteria. For each process, relevant models, chemical entities, and rules are presented. The presentation of every process includes detailed information and links to

relevant internal and external sources.

2.1 Biochemical Space



Biochemical Space is constructed from a hierarchy of entities interacting in rules formally specified in Biochemical Space Language (BCSL). The main advantage of BCSL is the adoption of the most important aspects of rule-based features while still keeping the syntax human-readable and accessible to communities outside computer science. On the one hand, BCSL has executable semantics that allows basic analysis and consistency checking. On the other hand, the language includes constructs for detailed biological annotation reflecting the known bioinformatics databases and ontologies. BCSL is developed with the consideration of new extensions in SBML level 3. Once SBML rule-based support is actively used in tools, our platform will provide relevant export filters.

Fig. 1. Scheme of interconnections between modules of E-cyanobacterium.org platform.

Footnote 2: http://www.e-cyanobacterium.org.


Rule description provides details of rules, explicit enumeration of substrates

and products, and available annotations. Moreover, the rule is schematically

visualised (Sect. 3.2) and presented with appropriate links to the process hierarchy.

Entity description provides information about associated models and rules.

Links to the process hierarchy and to external sources (UniProt [2], KEGG [12], Gene Ontology [1], etc.) are available.

2.2 Model Repository



Model repository is a collection of implemented mathematical models describing particular parts of biological processes. Every model is represented as a set

of ordinary differential equations generated from the model reaction network.

Models are integrated within BCS. In particular, each model component should

be related to some BCS entity and each model reaction should be related to

some BCS rule. Moreover, a model is associated with some parameter value sets (data sets) that enable simulation (Sect. 3.3) in particular biologically relevant scenarios. Additionally, several basic non-parameter-specific static analysis techniques (Sect. 3.4) based on model stoichiometry are also provided.

An implemented model then includes complete biological annotation

(Sect. 3.1) of all components and reactions that is provided by mapping to

BCS. This might help to find connections and overlaps among models. Further, an implemented model can be exported to SBML (level 2).
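
The generation of ordinary differential equations from a reaction network can be illustrated with a minimal Python sketch. It assumes mass-action kinetics and a fixed-step Euler integrator; the two-reaction network, the species names and the rate constants are hypothetical and are not taken from the repository, and the platform itself delegates simulation to Octave or COPASI.

```python
# Hypothetical toy network: A -> B (k = 0.5) and B -> A (k = 0.25).
# Each reaction is (substrates, products, rate constant) with stoichiometric coefficients.
REACTIONS = [
    ({"A": 1}, {"B": 1}, 0.5),
    ({"B": 1}, {"A": 1}, 0.25),
]


def derivatives(state):
    """Mass-action right-hand side of the ODE system generated from REACTIONS."""
    dxdt = {species: 0.0 for species in state}
    for substrates, products, k in REACTIONS:
        rate = k
        for species, coeff in substrates.items():
            rate *= state[species] ** coeff
        for species, coeff in substrates.items():
            dxdt[species] -= coeff * rate
        for species, coeff in products.items():
            dxdt[species] += coeff * rate
    return dxdt


def simulate(initial, t_end, dt=0.001):
    """Integrate with explicit Euler steps (illustrative, not a production solver)."""
    state = dict(initial)
    for _ in range(int(round(t_end / dt))):
        d = derivatives(state)
        for species in state:
            state[species] += dt * d[species]
    return state


# The initial conditions play the role of a "data set" in the sense used above.
final = simulate({"A": 1.0, "B": 0.0}, t_end=40.0)
# final["A"] approaches the analytical steady state k2 / (k1 + k2) = 1/3.
```

Keeping the network as data and deriving the equations from it mirrors the idea that each model reaction can be related to a rule and each component to an entity.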

Currently, the repository contains two models describing the circadian clock (Miyoshi et al. 2007 [18], Hertel et al. 2013 [8]) and a kinetic model of metabolism (Jablonsky et al. 2014 [11]). Two other models present unpublished results: dynamics of carbon fluxes (Müller et al.) and photosynthesis (Plyusnina et al.).



2.3 Experiments Repository



Experiments repository is a tool for storage and presentation of time-series data

from wet-lab experiments. Every experiment is grounded in a precise description (device, medium, organism, etc.) and appropriate annotations. Experiments are structured: several time series can be attached to a single experiment.

Every time series targets a specific list of measured substances together with

time stamps of the individual measurements. Data can be imported/exported in

simple text formats. Time series are visualised in a chart (Sect. 3.3).

Registered users can add their own experiments while keeping the selected

experiments private or public. For repeated experiments, annotations and details

can be cloned from previously inserted experiments.



3 Website Features



E-cyanobacterium.org provides support for modelling and analysis of biological

systems. The most important fact is that all these parts are integrated within

the Biochemical Space. Therefore it is possible to reveal non-trivial relations

between biochemical substances and models.

3.1 Annotation



Annotation is an important task in modelling of biological systems. This is how

biological knowledge is mapped to the mathematical description. Our platform

considers annotation as an inherent and compulsory part of the modelling procedure. The following aspects of annotation are supported:

– BCS creation and maintenance,

– model annotation by mapping to BCS,

– experiment annotation.

BCS is being continuously extended and revised by the consortium.

Researchers supplying their models are required to integrate the models within

the current BCS. This gives good feedback to BCS maintainers. In the experiments repository module, emphasis is given to description of conditions under

which an experiment has been performed, which is important for interpretation

of the data as well as the possibility of reconstructing the measurements.

3.2 Visualisation



Static visualisation is provided by means of schemas showing the process hierarchy with most important objects from BCS. Graphical schemes are supplied

with detailed information on the visualised elements of BCS. This is achieved in

the information panel below the scheme. An accompanying feature is filtering

displayed data to All or Visible elements. The former option extends the content to all relevant entities and rules that take part in the displayed process but are not necessarily visualised (e.g., small “universal” molecules such as ATP and ADP). Moreover, there is a special visualisation framework for complex networks such as metabolism. This widget allows one to zoom in on a specific part of the complex network. This feature is currently employed for the metabolic map of cyanobacteria. Owing to requests from the consortium, we have decided to employ manually handled visualisation. In the next release, we plan to provide automatically generated SBGN representations of the schemes.

Rule details (Sect. 2.1) are enriched by means of automatically generated

visualisation. For every rule, a graphical scheme that displays all its substrates

and products is generated.

3.3 Simulation



With every implemented model (Sect. 2.2) it is possible to execute simulations.

Registered users can change initial conditions and parameters of the model and

set the simulation options (the numerical solver and its parameters). To apply

such settings in the simulation, they have to be saved in terms of a custom

dataset (public or private). The simulation chart is generated for all available

datasets and the platform GUI allows one to switch between them. The chart is

interactive and allows one to change the axis type, zoom in on selected curves, and show a focused value on a curve.

Simulation data are accessible by exporting them in several data formats.

Similar options are also available in charts visualising experimental data in the

experiments repository. Additionally, the simulated model including the selected

dataset can be downloaded as an Octave file [6]. The platform also allows one to use different externally called numerical solvers for simulation. In

particular, model administrators can select either Octave (default), COPASI [9],

or remote simulation with COPASI web services [13].

3.4 Static Analysis



At the current stage, there are three static analysis tasks available. Matrix analysis produces the incidence matrix, Conservation analysis computes mass conservation relations (moiety conservation), and Modes analysis produces elementary flux modes. For these tasks, the well-established third-party tool COPASI [9] is used, which allows one to download the task data as an SBRML file [5].
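
As an illustration of what a conservation analysis computes, the sketch below finds moiety conservation relations as the left null space of a stoichiometric matrix (rows are species, columns are reactions). It uses plain Gaussian elimination over exact fractions; this is a generic textbook approach and is not claimed to be the algorithm used by COPASI. The four-species network is hypothetical.

```python
from fractions import Fraction


def left_null_space(matrix):
    """Return basis vectors c such that c^T * matrix = 0 (rows = species)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Augment every row with an identity row that records the row combinations.
    rows = [
        [Fraction(v) for v in matrix[i]] + [Fraction(int(i == j)) for j in range(n_rows)]
        for i in range(n_rows)
    ]
    pivot_row = 0
    for col in range(n_cols):
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            if factor != 0:
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # Rows whose stoichiometric part vanished encode conserved combinations.
    return [row[n_cols:] for row in rows[pivot_row:]]


# Hypothetical network: R1: ATP + Glc -> ADP + G6P, R2: ADP -> ATP.
SPECIES = ["ATP", "ADP", "Glc", "G6P"]
STOICHIOMETRY = [
    [-1, 1],   # ATP
    [1, -1],   # ADP
    [-1, 0],   # Glc
    [1, 0],    # G6P
]

relations = left_null_space(STOICHIOMETRY)
# Each relation lists one coefficient per species; here ATP + ADP and
# Glc + G6P are the two conserved totals.
```

Because such relations depend only on stoichiometry, they hold for any parameter values, which is why this kind of analysis is described as non-parameter-specific.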



4 Conclusion



E-cyanobacterium.org provides several features that contribute to production

and presentation of models targeting cyanobacteria. The principal aim is to interlink biological knowledge with the benefits of computational systems biology

tools. This is enabled by means of a novel formal notation – Biochemical Space –

which allows us to integrate computational models with the biological knowledge

and wet-lab experiments.






For future work, we plan to improve the mapping between mathematics and

biology, to enhance the website with more analysis tools (currently we are implementing an interface for existing property monitoring and robustness analysis

tools such as [20]), and to automatise the comparison of models against experimental data. Moreover, the biochemical space of cyanobacteria is continuously being extended and improved with interactive visualisations of reaction networks based on the formal description provided in Biochemical Space Language.



References

1. Ashburner, M., et al.: Gene ontology: tool for the unification of biology. Gene

Ontology Consortium, Nat. Genet. 25(1), 25–29 (2000)

2. Bateman, A., et al.: Uniprot: a hub for protein information. Nucleic Acids Res.

43(D1), D204–D212 (2015)

3. Beard, D.A., et al.: CellML metadata standards, associated tools and repositories.

Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 367(1895), 1845–1867 (2009)

4. Croft, D., et al.: The reactome pathway knowledgebase. Nucleic Acids Res. 42(D1),

D472–D477 (2014)

5. Dada, J.O., et al.: SBRML: a markup language for associating systems biology

data with models. Bioinformatics 26(7), 932–938 (2010)

6. Eaton, J.W., et al.: GNU Octave version 3.0.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform,

Seattle (2009)

7. Harris, T.W., et al.: Wormbase: a comprehensive resource for nematode research.

Nucleic Acids Res. 38(suppl 1), D463–D467 (2010)

8. Hertel, S., et al.: Revealing a two-loop transcriptional feedback mechanism in the

cyanobacterial circadian clock. PLoS Comput. Biol. 9(3), 1–16 (2013)

9. Hoops, S., et al.: COPASI - a complex pathway simulator. Bioinformatics 22(24),

3067–3074 (2006)

10. Hucka, M., et al.: The systems biology markup language (SBML): a medium for

representation and exchange of biochemical network models. Bioinformatics 19(4),

524–531 (2003)

11. Jablonsky, J., et al.: Multi-level kinetic model explaining diverse roles of isozymes

in prokaryotes. PLoS ONE 9(8), 1–8 (2014)

12. Kanehisa, M., et al.: KEGG as a reference resource for gene and protein annotation.

Nucleic Acids Res. 44(D1), D457–D462 (2016)

13. Kent, E., et al.: Condor-COPASI: high-throughput computing for biochemical networks. BMC Syst. Biol. 6(1), 1–13 (2012)

14. Keseler, I.M., et al.: EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41(D1), D605–D612 (2013)

15. Klement, M., et al.: A comprehensive web-based platform for domain-specific biological models. Electron. Notes Theoret. Comput. Sci. 299, 61–67 (2013)

16. Klement, M., et al.: Biochemical space: a framework for systemic annotation of

biological models. In: Proceedings of the 5th International Workshop on Interactions Between Computer Science and Biology (CS2Bio-14). Electronic Notes in

Theoretical Computer Science, vol. 306, pp. 31–44 (2014)

17. Le Novère, N., et al.: Biomodels database: a free, centralized database of curated,

published, quantitative kinetic models of biochemical and cellular systems. Nucleic

Acids Res. 34, D689–D691 (2006)






18. Miyoshi, F., et al.: A mathematical model for the kai-protein-based chemical oscillator and clock gene expression rhythms in cyanobacteria. J. Biol. Rhythms 22(1),

69–80 (2007)

19. Olivier, B.G., et al.: Web-based kinetic modelling using jws online. Bioinformatics

20(13), 2143–2144 (2004)

20. Rizk, A., et al.: A general computational method for robustness analysis with

applications to synthetic gene networks. Bioinformatics 25(12), i169–i178 (2009)



PREMER: Parallel Reverse Engineering of Biological Networks with Information Theory

Alejandro F. Villaverde 1,2,4 (B), Kolja Becker 3, and Julio R. Banga 4

1 Department of Systems and Control Engineering, University of Vigo, Galicia, Spain
2 Centre for Biological Engineering, University of Minho, Braga, Portugal
3 Modelling of Biological Networks, Institute of Molecular Biology gGmbH, Mainz, Germany
4 Bioprocess Engineering Group, Instituto de Investigacións Mariñas (IIM-CSIC), Vigo, Galicia, Spain
afvillaverde@iim.csic.es



Abstract. A common approach for reverse engineering biological networks from data is to deduce the existence of interactions among nodes

from information theoretic measures. Estimating these quantities in a

multidimensional space is computationally demanding for large datasets.

This hampers the application of elaborate algorithms – which are crucial

for discarding spurious interactions and determining causal relationships –

to large-scale network inference problems. To alleviate this issue we

have developed PREMER, a software tool which can automatically run

in parallel and sequential environments, thanks to its implementation

of OpenMP directives. It recovers network topology and estimates the strength and causality of interactions using information theoretic criteria, and allows the incorporation of prior knowledge. A preprocessing module takes care of imputing missing data and correcting outliers if needed. PREMER (https://sites.google.com/site/premertoolbox/) runs on Windows, Linux and OSX; it is implemented in Matlab/Octave and Fortran 90, and it does not require any commercial software.

Keywords: Network inference · Information theory · Parallel computing

1 Introduction



Many biological systems can be meaningfully represented as networks, that is, as a set of nodes (variables) connected by links (interactions). In the context of cellular networks the nodes are molecular entities such as genes, transcription factors, proteins, metabolites, and so on [7]. The network inference problem consists of learning the interconnection structure of the nodes, using as data the values of the variables (e.g. their expression levels or concentrations) at different situations and/or time instants. The concept of mutual information [12] can be used as a statistical measure for estimating the strength of the (possibly nonlinear) relations among nodes from a dataset. Indirect interactions, which take place when an entity A exerts an influence on C by means of an intermediate entity B (i.e. A → B → C), are difficult to detect, because a spurious interaction may be deduced (not only A → B and B → C, but also A → C). The difficulty of discriminating between them increases when dealing with higher-order interactions, involving four or more entities. Although a few methods can cope with this issue, their application to large-scale problems is computationally costly [6], especially when dealing with time-series data. One such method, MIDER [13], is a general purpose network inference tool which takes into account time delays. It distinguishes between direct and indirect interactions using entropy reduction [9] and assigns directionality to the predicted links using transfer entropy [11]. It is implemented in Matlab, a widely used programming environment which nevertheless has some drawbacks, mainly (i) the need to buy commercial licenses, and (ii) low computational efficiency compared to other languages.

© Springer International Publishing AG 2016
E. Bartocci et al. (Eds.): CMSB 2016, LNBI 9859, pp. 323–329, 2016.
DOI: 10.1007/978-3-319-45177-0_21


Here we present PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a tool that overcomes these issues. It includes an

advanced Fortran 90 implementation of the MIDER procedures, which allows for

faster computations than Matlab. Additionally, the use of OpenMP directives enables it to run seamlessly in parallel environments, thus allowing for further

speedups in performance. Results obtained on different datasets show that PREMER can be orders of magnitude faster than MIDER. Additionally, PREMER’s

Matlab code is fully compatible with the free Octave environment. Furthermore,

PREMER offers two important additional capabilities. One is the ability to take prior knowledge into account, allowing the user to specify that a particular interaction is known to be non-existent. This is of particular importance in applications such

as gene regulatory network (GRN) inference, where only a subset of the genes —

the transcription factors, TFs — can regulate other genes. The second one is the

ability to handle datasets with missing values and/or outliers, using statistical

techniques to impute new values which are coherent with the latent structure

of the data. PREMER’s work-flow is depicted in Fig. 1. More details about the

methodology are given in the supplementary information (user’s manual).
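
The role of mutual information, and the problem of spurious links in a chain A → B → C, can be made concrete with a small Python sketch. It assumes already-discretised data and a simple plug-in (frequency-count) entropy estimate, which is simpler than the adaptive partitioning used by PREMER. Conditioning on B is shown here as a standard way of exposing an indirect link; it is not the entropy reduction procedure of MIDER or PREMER. All data values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def conditional_mutual_information(xs, ys, zs):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (
        entropy(list(zip(xs, zs)))
        + entropy(list(zip(ys, zs)))
        - entropy(list(zip(xs, ys, zs)))
        - entropy(zs)
    )


# Hypothetical data: a chain A -> B -> C and an unrelated node D.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = list(a)                      # B copies A
c = list(b)                      # C copies B, so A reaches C only through B
d = [0, 1, 0, 1, 0, 1, 0, 1]     # independent of A in this sample

mi_ac = mutual_information(a, c)                  # 1 bit: looks like a link A - C
mi_ad = mutual_information(a, d)                  # 0 bits: no relation
cmi_ac_b = conditional_mutual_information(a, c, b)  # 0 bits once B is known
```

The pairwise value for A and C is as large as for a direct link, which is why pairwise mutual information alone cannot separate direct from indirect interactions.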



2 Implementation and Availability



PREMER is provided as a set of Matlab/Octave scripts and an executable file

which carries out the core computations. It has a number of options, which can

be tuned by editing the main file, runPremer. Executables are provided for Windows, Linux and OSX, along with the Fortran (F90) source code, which can be compiled to run on any operating system. The executable can also be invoked

from the command line, thus avoiding the need for Matlab/Octave. A key feature

of PREMER is its ability to run sequentially or in parallel. Parallelization has

been implemented using OpenMP directives [3] and is entirely transparent to the

user, who only needs to specify the number of threads in the main file. Mutual information and multidimensional entropies are estimated using an adaptive partitioning algorithm inspired by [2]. The PREMER toolbox is released under the free and open source GNU GPLv3. It is available at https://sites.google.com/site/premertoolbox/. Its use does not require any commercial software.

Fig. 1. Work-flow of the PREMER algorithm. First, a data curation module imputes missing data [4] and detects and corrects outliers, thus allowing the use of faulty datasets. Then PREMER calculates the distance between every possible pair of variables d(X, Y) for several time delays. To this end it estimates the entropies of all variables H(∗), as well as the joint entropies H(∗, ∗) and the mutual information I(∗, ∗) of all pairs of variables. The user can choose to estimate also the multi-dimensional joint entropies of 3 and 4 variables (H(∗, ∗, ∗), H(∗, ∗, ∗, ∗)), in order to use them in the subsequent entropy reduction step. The aim of this step is to determine whether all the variation in a variable Y can be explained by the variation in another variable X or, more generally, in a set of variables $\mathbf{X}$ [9]. By iterating through cycles of adding a variable X that reduces $H(Y \mid X, \mathbf{X})$ until no further reductions are obtained, the entropy reduction step yields the complete set of variables that control the variation in Y. Finally, directions are assigned to the links using transfer entropy, $T_{X \to Y}$, a nonsymmetric measure of causality [11] calculated from time-lagged conditional entropies.
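
The transfer entropy mentioned in the caption of Fig. 1 can be sketched in Python for discrete series. The version below assumes a single time lag and plug-in entropy estimates, and computes T(X→Y) = H(Y_t | Y_{t-1}) − H(Y_t | Y_{t-1}, X_{t-1}) from joint entropies. It is an illustration of the measure, not PREMER's Fortran implementation, and the binary series are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def transfer_entropy(source, target):
    """Lag-1 transfer entropy from source to target, in bits.

    T = H(Y_t, Y_{t-1}) - H(Y_{t-1}) - H(Y_t, Y_{t-1}, X_{t-1}) + H(Y_{t-1}, X_{t-1})
    """
    y_now = target[1:]
    y_past = target[:-1]
    x_past = source[:-1]
    return (
        entropy(list(zip(y_now, y_past)))
        - entropy(y_past)
        - entropy(list(zip(y_now, y_past, x_past)))
        + entropy(list(zip(y_past, x_past)))
    )


# Hypothetical data: y follows x with a delay of one time step.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0] + x[:-1]

t_xy = transfer_entropy(x, y)  # large: the past of x determines y
t_yx = transfer_entropy(y, x)  # small: the past of y adds little about x
```

The asymmetry between the two values is what allows a direction to be assigned to a link; with short series the smaller value is generally not exactly zero because of estimation bias.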



3 Selected Experimental Results



We tested PREMER on the same set of seven benchmark problems that was used

for assessing the performance of MIDER. It has been shown elsewhere [13] that

MIDER performs well compared with other state-of-the-art methods in terms

of precision and recall of the inferred networks. We found that PREMER predicts the same networks as MIDER (in examples without missing data or prior

information) achieving large reductions in computation times, as shown in Fig. 2.

Panel A plots the accelerations obtained with PREMER’s sequential implementation (i.e. using only one processor) with respect to MIDER. The most computationally costly problems give rise to the largest speed-ups: for example, for

benchmark B7 with 3 entropy reduction rounds the computation time decreases

from 42 h to roughly 1.5 h. This improvement is obtained using a single processor; additional speed-ups can be achieved in a parallel environment, as shown in

panel B. The combined effect of code acceleration and parallel speed-up results

in very significant reductions in computation time. For example, using a current

12-core desktop PC (hardware detailed in the caption of Fig. 2), PREMER runs

up to 170 times faster than MIDER.
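
The relation between the quantities quoted above (acceleration, parallel speed-up and efficiency) can be worked through in a few lines of Python. The 42 h and 1.5 h figures are taken from the text. Treating the 170-fold figure as belonging to the same B7 run, and dividing by 12 cores, are assumptions made purely for illustration, so the derived parallel factor and efficiency should not be read as reported results.

```python
def speedup(t_reference, t_new):
    """Ratio of reference run time to new run time."""
    return t_reference / t_new


def efficiency(parallel_speedup, n_workers):
    """Parallel efficiency: achieved speed-up divided by the ideal (linear) one."""
    return parallel_speedup / n_workers


# Sequential acceleration for B7 with 3 entropy reduction rounds (from the text).
sequential_acceleration = speedup(42.0, 1.5)          # 28x

# ASSUMPTION: the "up to 170 times" figure refers to the same run on 12 cores.
combined_acceleration = 170.0
implied_parallel_speedup = combined_acceleration / sequential_acceleration
implied_efficiency = efficiency(implied_parallel_speedup, 12)
```

Under that assumption the combined acceleration factorises into a code-level gain of 28 and a parallel gain of roughly 6, which corresponds to an efficiency of about one half on 12 cores.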



[Figure 2. Panel A, "PREMER vs. MIDER: acceleration": speedup against wall clock time in seconds (log scale) for benchmarks B1–B7. Panel B, "Parallel vs. sequential F90: speedups & efficiency": speedup and efficiency against number of processors, with the ideal (linear) speedup and the ideal efficiency (=1) shown for reference.]


Fig. 2. [A]: Accelerations achieved by PREMER w.r.t. MIDER, for benchmarks B1–

B7 of [13]. For every benchmark three points are plotted, depending on the number

of entropy reduction rounds performed: 1, 2, or 3. [B]: Speed-up and efficiency of

the parallel vs sequential versions of PREMER (benchmark B7, 3 entropy reduction

rounds). Results obtained in a multi-core PC running Windows 7 64-bit with 16 GB

RAM and 12 cores, 2 processors/core, Intel Xeon 2.30 GHz. (Color figure online)



By going through several entropy reduction rounds it is possible to discover

additional links, but in this process errors may appear: since the accuracy of

every network inference method is limited by the information content of the

data, some of the extra links can in fact be false positives. Therefore in many

cases there is a trade-off between precision and recall: increasing the number of

entropy reduction rounds leads to increased recall and decreased precision, and

vice versa. Table 1 shows this trade-off for the average of the seven benchmark

problems considered in [13].

Table 1. Trade-offs between precision and recall for different numbers of entropy reduction rounds. The values shown are the averages of the seven benchmark problems (B1–B7) considered in [13].

Entropy reduction rounds     1        2        3
Average precision (B1–B7)    0.7676   0.6958   0.6311
Average recall (B1–B7)       0.5267   0.5676   0.5819
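
For readers unfamiliar with the two metrics in Table 1, the following Python sketch shows how precision and recall are computed from a predicted and a true set of links. The edge sets are hypothetical and only mimic the qualitative effect of an extra entropy reduction round (more true links found, but also a false positive); they are not derived from the benchmarks.

```python
def precision_recall(predicted, true):
    """Precision = TP / predicted links, recall = TP / true links."""
    true_positives = len(predicted & true)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    return precision, recall


# Hypothetical ground-truth network with four directed links.
TRUE_EDGES = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}

# Hypothetical predictions after one and after two rounds.
round_1 = {("A", "B"), ("B", "C")}
round_2 = round_1 | {("C", "D"), ("A", "C")}   # one new true link, one spurious

p1, r1 = precision_recall(round_1, TRUE_EDGES)   # (1.0, 0.5)
p2, r2 = precision_recall(round_2, TRUE_EDGES)   # (0.75, 0.75)
```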



Finally, we illustrate the performance improvement that can be obtained by

taking prior knowledge into account. With this aim we create a benchmark network with GeneNetWeaver [10] consisting of 18 genes, out of which only 11 are

considered transcription factors, and we generate time course data of the expression of each gene at 24 different time points. We evaluate the performance of PREMER using two different modes of including information (removing interactions

a priori or a posteriori ) and we compare it to other network inference methods






Fig. 3. Incorporating prior knowledge into network inference algorithms: (a) Reasoning

behind excluding interactions from gene regulatory networks. Only transcription factors

effectively serve as regulators in the network, hence interactions from the effector genes

to transcription factors or to other effectors can be excluded. (b) Regulatory interaction matrix returned by PREMER without incorporating prior knowledge. The heat

map scale represents the strength of the predicted interaction (0 = no interaction, 1 =

strongly predicted interaction). (c) Regulatory interaction matrix returned by PREMER incorporating prior knowledge by removing excluded interactions a posteriori.

(d) Regulatory interaction matrix returned by PREMER incorporating prior knowledge by removing excluded interactions a priori. (e) Comparison of several network

inference methods without incorporating prior knowledge. (f ) Comparison of network

inference methods incorporating prior knowledge either a posteriori (post) or a priori

(prior). In (e) and (f) horizontal dashed lines indicate the theoretical performance of a

random classifier.
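
The a posteriori mode described in the caption of Fig. 3 can be sketched as a simple masking of the predicted interaction matrix. The sketch assumes that rows index the regulating gene and columns the regulated gene; this convention, the gene names and the scores are hypothetical, and the a priori mode (excluding candidates before inference) is not shown.

```python
def remove_excluded(scores, genes, transcription_factors):
    """Zero every predicted interaction whose source gene is not a transcription factor.

    scores[i][j] is the predicted strength of the interaction genes[i] -> genes[j]
    (row = regulator is an assumed convention).
    """
    return [
        [s if regulator in transcription_factors else 0.0 for s in row]
        for regulator, row in zip(genes, scores)
    ]


# Hypothetical three-gene example: two transcription factors and one effector gene.
GENES = ["tf1", "tf2", "g3"]
TFS = {"tf1", "tf2"}
SCORES = [
    [0.0, 0.9, 0.4],   # tf1 -> *
    [0.2, 0.0, 0.7],   # tf2 -> *
    [0.8, 0.3, 0.0],   # g3  -> * (effector: cannot regulate)
]

filtered = remove_excluded(SCORES, GENES, TFS)
```

Only the outgoing links of the effector gene are removed; links from transcription factors to any gene, including the effector, are left untouched.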


