E-Cyanobacterium.org: A Web-Based Platform for Systems Biology of Cyanobacteria
models traversing already through many branches of biology [3,12,17,19].
Unfortunately, cyanobacteria models are strongly under-represented in these
repositories, probably because of the natural differences that exist between common bacteria and phototrophic bacteria. Several online tools present biological networks or genome knowledge [1,4,12]. However, due to the enormous complexity of biological processes, it is a challenge to develop tools presenting biological networks alongside executable or mathematical models. Focusing
on domain-speciﬁc organisms allows us to integrate the knowledge and present it
in a concise and understandable form. This has already been demonstrated for well-known model organisms such as E. coli or C. elegans. Nevertheless, these resources do not couple the presented knowledge with modelling.
In this tool paper, we present an online platform for cyanobacteria processes —
e-cyanobacterium.org. The platform integrates several dedicated tools and is
distinct in the following aspects: formal rule-based representation of biochemical interactions facilitated by cyanobacteria biochemical entities; repository of
kinetic models providing basic analysis tools online (simulation, custom data sets,
basic static analysis); integration of models within the rule-based description and
export to SBML; storage, maintenance and presentation of experimental data; content visualisation (graphical presentation of models, biochemical space, and experiments).
The presented release of e-cyanobacterium.org is implemented as an extension of our general database tool-kit, the so-called comprehensive modelling platform. Key updates are the formal rule-based language BCSL, the wet-lab
experiments module, and improved analysis of models. Most importantly, several
well-annotated and curated models developed by the consortium are provided
within e-cyanobacterium.org including the formalisation of the respective part
of biochemical space in BCSL.
Web Platform Overview
The platform consists of several dedicated modules (Fig. 1) all connected to a
central module, Biochemical Space (BCS), which is the backbone of the
platform. BCS provides a formal description of the biological problem and is based on a hierarchy of selected biological processes. It is accompanied by schemas representing relevant biological processes in the context of cyanobacteria. For each process, the relevant models, chemical entities, and rules are presented. Presentation of every process includes detailed information and links to
relevant internal and external sources.
Biochemical Space is constructed from a hierarchy of entities interacting in rules
formally speciﬁed in Biochemical Space Language (BCSL). The main advantage
of BCSL is the adoption of the most important aspects of rule-based features
Fig. 1. Scheme of interconnections between modules of E-cyanobacterium.org platform.
while still keeping the syntax human-readable and accessible to communities outside computer science. On the one hand, BCSL has executable semantics that
allows basic analysis and consistency checking. On the other hand, the language
includes constructs for detailed biological annotation reﬂecting the known bioinformatics databases and ontologies. BCSL is developed with the consideration
of new extensions in SBML level 3. Once SBML rule-based support becomes actively used in tools, our platform will provide relevant export filters.
Rule description provides details of rules, explicit enumeration of substrates
and products, and available annotations. Moreover, the rule is schematically
visualised (Sect. 3.2) and presented with appropriate links to the process hierarchy.
Entity description provides information about associated models and rules.
Links to the process hierarchy and to external sources (UniProt, KEGG, Gene Ontology, etc.) are available.
Model repository is a collection of implemented mathematical models describing particular parts of biological processes. Every model is represented as a set
of ordinary diﬀerential equations generated from the model reaction network.
Models are integrated within BCS. In particular, each model component should
be related to some BCS entity and each model reaction should be related to
some BCS rule. Moreover, a model is associated with parameter value sets (data sets) that enable simulation (Sect. 3.3) in particular biologically relevant scenarios. Additionally, several basic non-parameter-specific static analysis techniques (Sect. 3.4) based on model stoichiometry are provided.
An implemented model then includes complete biological annotation
(Sect. 3.1) of all components and reactions that is provided by mapping to
BCS. This might help to find connections and overlaps among models. Further, an implemented model can be exported to SBML (level 2).
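Since every model in the repository is represented as a set of ODEs generated from its reaction network, the idea can be illustrated with a minimal sketch (this is not platform code; the toy network, rate constants, and Euler integrator are hypothetical): a stoichiometric matrix plus mass-action kinetics yields the right-hand side, which is then integrated numerically.

```python
# Illustrative sketch (not e-cyanobacterium.org code): a toy reversible
# reaction A <-> B with hypothetical rate constants k1 (A -> B), k2 (B -> A).
# Rows of N are species [A, B]; columns are the two reactions.
N = [[-1.0,  1.0],
     [ 1.0, -1.0]]
k = [1.0, 0.5]
reactants = [[0], [1]]  # indices of species consumed by each reaction

def rhs(x):
    """dx/dt = N @ v(x) under mass-action kinetics."""
    v = []
    for j, kj in enumerate(k):
        rate = kj
        for s in reactants[j]:
            rate *= x[s]
        v.append(rate)
    return [sum(N[i][j] * v[j] for j in range(len(v))) for i in range(len(N))]

def simulate(x0, t_end, dt=1e-3):
    """Explicit Euler integration; adequate for this smooth toy system."""
    x = list(x0)
    for _ in range(int(t_end / dt)):
        dx = rhs(x)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x

# Starting from all mass in A, the system relaxes towards the steady state
# A = k2/(k1 + k2), B = k1/(k1 + k2), with A + B conserved along the way.
x_final = simulate([1.0, 0.0], t_end=20.0)
```

In the platform itself the integration is delegated to a dedicated solver (Octave or COPASI), but the mapping from reaction network to ODE system is the same.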
Currently, the repository contains two models describing circadian clock
(Miyoshi et al. 2007, Hertel et al. 2013) and a kinetic model of metabolism (Jablonsky et al. 2014). Two other models present unpublished results: dynamics of carbon fluxes (Müller et al.) and photosynthesis (Plyusnina et al.).
Experiments repository is a tool for storage and presentation of time-series data
from wet-lab experiments. Every experiment is well-grounded by precise description (device, medium, organism, etc.) and appropriate annotations. Experiments
are structured – several time series data can be attached to a single experiment.
Every time series targets a speciﬁc list of measured substances together with
time stamps of the individual measurements. Data can be imported/exported in
simple text formats. Time series are visualised in a chart (Sect. 3.3).
Registered users can add their own experiments while keeping the selected
experiments private or public. For repeated experiments, annotations and details
can be cloned from previously inserted experiments.
E-cyanobacterium.org provides support for modelling and analysis of biological
systems. Importantly, all these parts are integrated within the Biochemical Space, which makes it possible to reveal non-trivial relations between biochemical substances and models.
Annotation is an important task in modelling of biological systems. This is how
biological knowledge is mapped to the mathematical description. Our platform
considers annotation as an inherent and compulsory part of the modelling procedure. The following aspects of annotation are supported:
– BCS creation and maintenance,
– model annotation by mapping to BCS,
– experiment annotation.
BCS is being continuously extended and revised by the consortium.
Researchers supplying their models are required to integrate the models within
the current BCS. This gives good feedback to BCS maintainers. In the experiments repository module, emphasis is given to description of conditions under
which an experiment has been performed, which is important for interpretation
of the data as well as the possibility of reconstructing the measurements.
Static visualisation is provided by means of schemas showing the process hierarchy with most important objects from BCS. Graphical schemes are supplied
with detailed information on the visualised elements of BCS. This is achieved in
the information panel below the scheme. An accompanying feature is ﬁltering
displayed data to All or Visible elements. The former option extends the content to all relevant entities and rules that take part in the displayed process but
are not necessarily visualised (e.g., this applies to small “universal” molecules
such as ATP, ADP, etc.). Moreover, there is a special visualisation framework
for complex networks such as metabolism. This widget allows one to zoom in on a specific part of the complex network. This feature is currently employed for the metabolic map of cyanobacteria. In response to requests from the consortium, we have decided to employ manually curated visualisation. In the next release, we plan to provide automatically generated SBGN representations of the schemes.
Rule details (Sect. 2.1) are enriched by means of automatically generated
visualisation. For every rule, a graphical scheme that displays all its substrates
and products is generated.
With every implemented model (Sect. 2.2) it is possible to execute simulations.
Registered users can change initial conditions and parameters of the model and
set the simulation options (the numerical solver and its parameters). To apply such settings in a simulation, they have to be saved as a custom dataset (public or private). The simulation chart is generated for all available
datasets and the platform GUI allows one to switch between them. The chart is
interactive and allows one to change the axis type, zoom in the selected curves,
and show a focused value on a curve.
Simulation data can be exported in several data formats.
Similar options are also available in charts visualising experimental data in the
experiments repository. Additionally, the simulated model, including the selected dataset, can be downloaded as an Octave file. The platform also allows one to use different externally called numerical solvers for simulation. In particular, model administrators can select either Octave (default), COPASI, or remote simulation with COPASI web services.
At the current stage, three static analysis tasks are available: Matrix analysis produces the incidence matrix, Conservation analysis performs mass conservation (moiety conservation) analysis, and Modes analysis computes elementary flux modes. For these tasks, the well-established third-party tool COPASI is used, which allows one to download the task data by means of an SBRML file.
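The stoichiometry-based conservation analysis can be sketched as follows (an illustrative Python fragment, not the platform's COPASI-backed implementation; the toy network is hypothetical): mass conservation relations are left null vectors y of the stoichiometric matrix N, i.e. solutions of yᵀN = 0.

```python
from fractions import Fraction

def nullspace(M):
    """Basis of {x : M x = 0}, via Gauss-Jordan elimination over rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    m, n = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    basis = []
    for fc in (c for c in range(n) if c not in pivots):
        vec = [Fraction(0)] * n
        vec[fc] = Fraction(1)
        for i, pc in enumerate(pivots):
            vec[pc] = -rows[i][fc]
        basis.append(vec)
    return basis

# Hypothetical toy network A + B <-> C; rows of N are species [A, B, C],
# columns are the forward and backward reactions.
N = [[-1,  1],
     [-1,  1],
     [ 1, -1]]
# Conservation relations are left null vectors y with y^T N = 0,
# i.e. the null space of N transposed.
Nt = [list(col) for col in zip(*N)]
relations = nullspace(Nt)  # here: two independent conserved moieties
```

COPASI performs this analysis (and elementary flux mode computation) with specialised algorithms, but the underlying linear-algebra idea is the one shown here.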
E-cyanobacterium.org provides several features that contribute to production
and presentation of models targeting cyanobacteria. The principal effort is to interlink biological knowledge with the benefits of computational systems biology
tools. This is enabled by means of a novel formal notation – Biochemical Space –
which allows us to integrate computational models with the biological knowledge
and wet-lab experiments.
For future work, we plan to improve the mapping between mathematics and biology, to enhance the website with more analysis tools (currently we are implementing an interface to existing property-monitoring and robustness-analysis tools), and to automate the comparison of models against experimental data. Moreover, the Biochemical Space of cyanobacteria is continuously being extended and improved with interactive visualisations of reaction networks based on the formal description provided in the Biochemical Space Language.
References
1. Ashburner, M., et al.: Gene ontology: tool for the unification of biology. Gene
Ontology Consortium, Nat. Genet. 25(1), 25–29 (2000)
2. Bateman, A., et al.: UniProt: a hub for protein information. Nucleic Acids Res.
43(D1), D204–D212 (2015)
3. Beard, D.A., et al.: CellML metadata standards, associated tools and repositories.
Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 367(1895), 1845–1867 (2009)
4. Croft, D., et al.: The reactome pathway knowledgebase. Nucleic Acids Res. 42(D1),
5. Dada, J.O., et al.: SBRML: a markup language for associating systems biology
data with models. Bioinformatics 26(7), 932–938 (2010)
6. Eaton, J.W., et al.: GNU Octave version 3.0.1 manual: a high-level interactive language for numerical computations. CreateSpace Independent Publishing Platform,
7. Harris, T.W., et al.: WormBase: a comprehensive resource for nematode research.
Nucleic Acids Res. 38(suppl 1), D463–D467 (2010)
8. Hertel, S., et al.: Revealing a two-loop transcriptional feedback mechanism in the
cyanobacterial circadian clock. PLoS Comput. Biol. 9(3), 1–16 (2013)
9. Hoops, S., et al.: COPASI - a complex pathway simulator. Bioinformatics 22(24),
10. Hucka, M., et al.: The systems biology markup language (SBML): a medium for
representation and exchange of biochemical network models. Bioinformatics 19(4),
11. Jablonsky, J., et al.: Multi-level kinetic model explaining diverse roles of isozymes
in prokaryotes. PLoS ONE 9(8), 1–8 (2014)
12. Kanehisa, M., et al.: KEGG as a reference resource for gene and protein annotation.
Nucleic Acids Res. 44(D1), D457–D462 (2016)
13. Kent, E., et al.: Condor-COPASI: high-throughput computing for biochemical networks. BMC Syst. Biol. 6(1), 1–13 (2012)
14. Keseler, I.M., et al.: EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41(D1), D605–D612 (2013)
15. Klement, M., et al.: A comprehensive web-based platform for domain-specific biological models. Electron. Notes Theoret. Comput. Sci. 299, 61–67 (2013)
16. Klement, M., et al.: Biochemical space: a framework for systemic annotation of
biological models. In: Proceedings of the 5th International Workshop on Interactions Between Computer Science and Biology (CS2Bio-14). Electronic Notes in
Theoretical Computer Science, vol. 306, pp. 31–44 (2014)
17. Le Novère, N., et al.: BioModels Database: a free, centralized database of curated,
published, quantitative kinetic models of biochemical and cellular systems. Nucleic
Acids Res. 34, D689–D691 (2006)
18. Miyoshi, F., et al.: A mathematical model for the kai-protein-based chemical oscillator and clock gene expression rhythms in cyanobacteria. J. Biol. Rhythms 22(1),
19. Olivier, B.G., et al.: Web-based kinetic modelling using JWS Online. Bioinformatics
20(13), 2143–2144 (2004)
20. Rizk, A., et al.: A general computational method for robustness analysis with
applications to synthetic gene networks. Bioinformatics 25(12), i169–i178 (2009)
PREMER: Parallel Reverse Engineering
of Biological Networks with Information Theory
Alejandro F. Villaverde, Kolja Becker, and Julio R. Banga
Department of Systems and Control Engineering, University of Vigo, Galicia, Spain
Centre for Biological Engineering, University of Minho,
Modelling of Biological Networks, Institute of Molecular Biology gGmbH,
Bioprocess Engineering Group, Instituto de Investigacións Mariñas (IIM-CSIC),
Vigo, Galicia, Spain
Abstract. A common approach for reverse engineering biological networks from data is to deduce the existence of interactions among nodes
from information theoretic measures. Estimating these quantities in a
multidimensional space is computationally demanding for large datasets.
This hampers the application of elaborate algorithms – which are crucial
for discarding spurious interactions and determining causal relationships –
to large-scale network inference problems. To alleviate this issue we
have developed PREMER, a software tool which can automatically run
in parallel and sequential environments, thanks to its implementation
of OpenMP directives. It recovers network topology and estimates the strength and causality of interactions using information-theoretic criteria, and it allows the incorporation of prior knowledge. A preprocessing module takes care of imputing missing data and correcting outliers if needed. PREMER (https://sites.google.com/site/premertoolbox/) runs on Windows, Linux and OSX; it is implemented in Matlab/Octave and Fortran 90, and it does not require any commercial software.
Many biological systems can be meaningfully represented as networks, that is,
as a set of nodes (variables) connected by links (interactions). In the context of
cellular networks the nodes are molecular entities such as genes, transcription
factors, proteins, metabolites, and so on. The network inference problem consists of learning the interconnection structure of the nodes, using as data the
values of the variables (e.g. their expression levels or concentrations) at diﬀerent
situations and/or time instants. The concept of mutual information  can be
© Springer International Publishing AG 2016
E. Bartocci et al. (Eds.): CMSB 2016, LNBI 9859, pp. 323–329, 2016.
DOI: 10.1007/978-3-319-45177-0_21
used as a statistical measure for estimating the strength of the (possibly non-linear) relations among nodes from a dataset. Indirect interactions, which take place when an entity A exerts an influence on C by means of an intermediate entity B (i.e., A → B → C), are difficult to detect, because a spurious interaction may be deduced (not only A → B and B → C, but also A → C). The difficulty
of discriminating between them increases when dealing with higher-order interactions, involving four or more entities. Although a few methods can cope with
this issue, their application to large-scale problems is computationally costly, especially when dealing with time-series data. One such method, MIDER, is
a general purpose network inference tool which takes into account time delays.
It distinguishes between direct and indirect interactions using entropy reduction and assigns directionality to the predicted links using transfer entropy. It
is implemented in Matlab, a widely used programming environment which nevertheless has some drawbacks, mainly (i) the need to buy commercial licenses, and (ii) low computational efficiency compared to other languages.
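For readers unfamiliar with the underlying measure, a minimal plug-in estimate of mutual information from binned data can be sketched in Python (illustrative only; PREMER's actual estimator is the adaptive-partitioning Fortran implementation described below, and the data here are synthetic):

```python
import math
import random

def mutual_info(x, y, bins=8):
    """Plug-in MI estimate (in nats) from an equal-width 2-D histogram."""
    n = len(x)
    def bin_of(v, lo, hi):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    lx, hx, ly, hy = min(x), max(x), min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for xi, yi in zip(x, y):
        bx, by = bin_of(xi, lx, hx), bin_of(yi, ly, hy)
        joint[(bx, by)] = joint.get((bx, by), 0) + 1
        px[bx] += 1
        py[by] += 1
    mi = 0.0
    for (bx, by), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += (c / n) * math.log(c * n / (px[bx] * py[by]))
    return mi

# Synthetic data: y_dep is a noisy copy of x, y_ind is independent of x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(4000)]
y_dep = [xi + 0.2 * random.gauss(0, 1) for xi in x]
y_ind = [random.gauss(0, 1) for _ in range(4000)]
mi_dep = mutual_info(x, y_dep)
mi_ind = mutual_info(x, y_ind)  # close to zero up to estimation bias
```

The dependent pair yields a much larger estimate than the independent one; adaptive partitioning (as used in PREMER) reduces the bias that fixed-width binning introduces.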
Here we present PREMER (Parallel Reverse Engineering with Mutual information & Entropy Reduction), a tool that overcomes these issues. It includes an
advanced Fortran 90 implementation of the MIDER procedures, which allows for
faster computations than Matlab. Additionally, the use of OpenMP directives enables it to run seamlessly in parallel environments, thus allowing for further speedups in performance. Results obtained on different datasets show that PREMER can be orders of magnitude faster than MIDER. Moreover, PREMER's
Matlab code is fully compatible with the free Octave environment. Furthermore,
PREMER offers two important additional capabilities. One is the ability to take prior knowledge into account, allowing the user to specify that a particular interaction is known to be non-existent. This is of particular importance in applications such
as gene regulatory network (GRN) inference, where only a subset of the genes —
the transcription factors, TFs — can regulate other genes. The second one is the
ability to handle datasets with missing values and/or outliers, using statistical
techniques to impute new values which are coherent with the latent structure
of the data. PREMER’s work-ﬂow is depicted in Fig. 1. More details about the
methodology are given in the supplementary information (user’s manual).
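The a-priori use of prior knowledge can be illustrated with a small sketch (hypothetical gene names and scores; not PREMER code): links whose putative regulator is not a transcription factor are excluded from the inferred interaction matrix.

```python
# Hypothetical gene names, TF flags, and interaction scores, for illustration.
genes = ["g1", "g2", "g3", "g4"]
is_tf = {"g1": True, "g2": True, "g3": False, "g4": False}

# score[i][j]: strength of the putative link "gene i regulates gene j".
score = [[0.0, 0.9, 0.2, 0.7],
         [0.4, 0.0, 0.8, 0.1],
         [0.6, 0.3, 0.0, 0.5],
         [0.2, 0.6, 0.4, 0.0]]

def apply_prior(score, genes, is_tf):
    """Discard every link whose putative regulator is not a transcription factor."""
    return [[score[i][j] if is_tf[genes[i]] else 0.0
             for j in range(len(genes))]
            for i in range(len(genes))]

masked = apply_prior(score, genes, is_tf)  # rows of non-TF genes become zero
```

Applying the exclusion a priori (before inference) additionally spares the cost of estimating the excluded entries, whereas an a-posteriori filter merely zeroes them out afterwards.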
Implementation and Availability
PREMER is provided as a set of Matlab/Octave scripts and an executable ﬁle
which carries out the core computations. It has a number of options, which can
be tuned by editing the main ﬁle, runPremer. Executable ﬁles are provided for
Windows, Linux and OSX, and also as source code in Fortran (F90), which can
be compiled to run on any operating system. The executable can also be invoked
from the command line, thus avoiding the need for Matlab/Octave. A key feature
of PREMER is its ability to run sequentially or in parallel. Parallelization has
been implemented using OpenMP directives and is entirely transparent to the
user, who only needs to specify the number of threads in the main ﬁle. Mutual
information and multidimensional entropies are estimated using an adaptive partitioning algorithm inspired by earlier work.

Fig. 1. Work-flow of the PREMER algorithm. First, a data curation module imputes missing data and detects and corrects outliers, thus allowing the use of faulty datasets. Then PREMER calculates the distance d(X, Y) between every possible pair of variables for several time delays. To this end it estimates the entropies H(∗) of all variables, as well as the joint entropies H(∗, ∗) and the mutual information I(∗, ∗) of all pairs of variables. The user can choose to also estimate the multi-dimensional joint entropies of 3 and 4 variables (H(∗, ∗, ∗), H(∗, ∗, ∗, ∗)), in order to use them in the subsequent entropy reduction step. The aim of this step is to determine whether all the variation in a variable Y can be explained by the variation in another variable X or, more generally, in a set of variables X. By iterating through cycles of adding a variable X that reduces H(Y | X, X) until no further reductions are obtained, the entropy reduction step yields the complete set of variables that control the variation in Y. Finally, directions are assigned to the links using transfer entropy, T(X→Y), a non-symmetric measure of causality calculated from time-lagged conditional entropies.

The PREMER toolbox is released under
the free and open source GNU GPLv3. It is available at https://sites.google.
com/site/premertoolbox/. Its use does not require any commercial software.
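The transfer-entropy step that assigns link directions can be sketched with a simple plug-in estimator (illustrative Python only; PREMER's own implementation uses adaptive partitioning in Fortran, and this synthetic example merely shows that the true driving direction scores higher):

```python
import math
import random

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate (in nats) of T(X->Y) from equal-width binned series."""
    def disc(s):
        lo, hi = min(s), max(s)
        return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in s]
    xd, yd = disc(x), disc(y)
    n = len(x) - 1
    c_xyz, c_yz, c_yy, c_y = {}, {}, {}, {}
    for t in range(n):
        c_xyz[(yd[t+1], yd[t], xd[t])] = c_xyz.get((yd[t+1], yd[t], xd[t]), 0) + 1
        c_yz[(yd[t], xd[t])] = c_yz.get((yd[t], xd[t]), 0) + 1
        c_yy[(yd[t+1], yd[t])] = c_yy.get((yd[t+1], yd[t]), 0) + 1
        c_y[yd[t]] = c_y.get(yd[t], 0) + 1
    te = 0.0
    for (y1, y0, x0), c in c_xyz.items():
        # p(y1, y0, x0) * log( p(y1 | y0, x0) / p(y1 | y0) )
        te += (c / n) * math.log((c / c_yz[(y0, x0)]) /
                                 (c_yy[(y1, y0)] / c_y[y0]))
    return te

# Synthetic pair where x drives y with a one-step lag.
random.seed(1)
x = [random.random() for _ in range(3000)]
y = [0.5]
for t in range(1, len(x)):
    y.append(0.8 * x[t - 1] + 0.2 * random.random())
te_xy = transfer_entropy(x, y)  # driving direction
te_yx = transfer_entropy(y, x)  # spurious direction, near zero
```

Because transfer entropy is non-symmetric, comparing T(X→Y) against T(Y→X) is what lets the algorithm assign a direction to each inferred link.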
Selected Experimental Results
We tested PREMER on the same set of seven benchmark problems that was used
for assessing the performance of MIDER. It has been shown elsewhere that MIDER performs well compared with other state-of-the-art methods in terms of precision and recall of the inferred networks. We found that PREMER predicts the same networks as MIDER (in examples without missing data or prior information) while achieving large reductions in computation times, as shown in Fig. 2.
Panel A plots the accelerations obtained with PREMER’s sequential implementation (i.e. using only one processor) with respect to MIDER. The most computationally costly problems give rise to the largest speed-ups: for example, for
benchmark B7 with 3 entropy reduction rounds the computation time decreases
from 42 h to roughly 1.5 h. This improvement is obtained using a single processor; additional speed-ups can be achieved in a parallel environment, as shown in
panel B. The combined eﬀect of code acceleration and parallel speed-up results
in very signiﬁcant reductions in computation time. For example, using a current
12-core desktop PC (hardware detailed in the caption of Fig. 2), PREMER runs
up to 170 times faster than MIDER.
Fig. 2. [A]: Accelerations achieved by PREMER w.r.t. MIDER, for benchmarks B1–
B7 of . For every benchmark three points are plotted, depending on the number
of entropy reduction rounds performed: 1, 2, or 3. [B]: Speed-up and eﬃciency of
the parallel vs sequential versions of PREMER (benchmark B7, 3 entropy reduction
rounds). Results obtained in a multi-core PC running Windows 7 64-bit with 16 GB
RAM and 12 cores, 2 processors/core, Intel Xeon 2.30 GHz. (Color ﬁgure online)
By going through several entropy reduction rounds it is possible to discover
additional links, but in this process errors may appear: since the accuracy of
every network inference method is limited by the information content of the
data, some of the extra links can in fact be false positives. Therefore in many
cases there is a trade-oﬀ between precision and recall: increasing the number of
entropy reduction rounds leads to increased recall and decreased precision, and
vice versa. Table 1 shows this trade-oﬀ for the average of the seven benchmark
problems considered in .
Table 1. Trade-oﬀs between precision and recall for diﬀerent numbers of entropy reduction rounds. The values shown are the averages of the seven benchmark problems
(B1–B7) considered in .
Entropy reduction rounds      1       2       3
Average precision (B1–B7)     0.7676  0.6958  0.6311
Average recall (B1–B7)        0.5267  0.5676  0.5819
Finally, we illustrate the performance improvement that can be obtained by
taking prior knowledge into account. With this aim we create a benchmark network with GeneNetWeaver  consisting of 18 genes, out of which only 11 are
considered transcription factors, and we generate time course data of the expression of each gene at 24 diﬀerent time points. We evaluate the performance of PREMER using two diﬀerent modes of including information (removing interactions
a priori or a posteriori ) and we compare it to other network inference methods
Fig. 3. Incorporating prior knowledge into network inference algorithms: (a) Reasoning
behind excluding interactions from gene regulatory networks. Only transcription factors
eﬀectively serve as regulators in the network, hence interactions from the eﬀector genes
to transcription factors or to other eﬀectors can be excluded. (b) Regulatory interaction matrix returned by PREMER without incorporating prior knowledge. The heat
map scale represents the strength of the predicted interaction (0 = no interaction, 1 =
strongly predicted interaction). (c) Regulatory interaction matrix returned by PREMER incorporating prior knowledge by removing excluded interactions a posteriori.
(d) Regulatory interaction matrix returned by PREMER incorporating prior knowledge by removing excluded interactions a priori. (e) Comparison of several network
inference methods without incorporating prior knowledge. (f) Comparison of network inference methods incorporating prior knowledge either a posteriori (post) or a priori (prior). In (e) and (f) horizontal dashed lines indicate the theoretical performance of a