
8 Semantic Data Integration: Tools and Architectures


Fig. 8.2 UML class diagram (without properties) for describing hardware configuration used by engineering tools like ET9000 or SYCON.Net (based on Mordinyi et al. 2014)

the other tools will provide additional data records (about 30–40 % of the exchanged data, roughly 400 k data records in a typical case) that are shared with other specific engineering disciplines and also need to be managed.

Considering that about 20 different types of engineering tools from several

engineering disciplines will be required for the development of engineering plants,

the overall number of data records describing the engineering plant and shared

among the engineering tools is considerably higher.

In traditional automation systems engineering, synchronization of engineering discipline views typically occurs at fixed points in time (Winkler et al. 2011; Moser et al. 2011b). Since the intention is to adopt software engineering concepts and practices, it is recommended to make changes visible to the entire project (i.e., to propagate changes) once a specific and cohesive task/requirement has been realized. Assuming that such an engineering project lasts about 18 months on average and that project engineers make their changes available once a day, the overall number of data records exchanged, transformed, processed, and versioned during the project would be once again significantly higher.

Due to the distributed parallel activities required by engineering processes, there is a need for near-time analysis of the project progress based on real change data (Moser et al. 2011b). Based on current and systematically integrated data, project managers need an overview of the project progress between milestones. Relevant queries relate to: (a) operation types (insert, update, delete) to identify the volatility of common concepts, (b) specific components of the engineering plant, or (c) both within a given time frame. Based on Moser et al. (2011b), examples of relevant queries for, e.g., project managers are:


R. Mordinyi et al.

• Query 1: “What is the number of changes, deletions, and insertions during the whole project?”

• Query 2: “What is the number of changes, deletions, and insertions when

comparing two specific weeks?”

• Query 3: “Which components have been added, changed, or deleted on a week

basis during the project?”

• Query 4: “Which sub-components of a specific component have been added,

changed, or deleted on a week basis during the project?”

• Query 5: “How often has a specific common concept been updated during the whole project?”

• Query 6: “How often has a specific component been changed on a week basis

during the project?”
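To illustrate the kind of analysis behind these queries, the following Python sketch aggregates a hypothetical change log by operation type and calendar week. The record layout, component names, and dates are illustrative assumptions, not the project's actual data model.

```python
from collections import Counter
from datetime import date

# Hypothetical change log: each entry records an operation type,
# the affected component, and the commit date (illustrative only).
change_log = [
    {"op": "insert", "component": "Signal/S1", "date": date(2015, 3, 2)},
    {"op": "update", "component": "Signal/S1", "date": date(2015, 3, 4)},
    {"op": "delete", "component": "PLC/P7",    "date": date(2015, 3, 9)},
    {"op": "update", "component": "Signal/S1", "date": date(2015, 3, 10)},
]

def ops_per_week(log):
    """Query 3 style: count operation types per ISO calendar week."""
    counts = Counter()
    for entry in log:
        week = entry["date"].isocalendar()[1]
        counts[(week, entry["op"])] += 1
    return counts

def updates_of(log, component):
    """Query 6 style: number of updates of a specific component."""
    return sum(1 for e in log
               if e["component"] == component and e["op"] == "update")

print(ops_per_week(change_log))
print(updates_of(change_log, "Signal/S1"))  # 2
```

A project manager's dashboard would run such aggregations over the integrated change data rather than over a single tool's local records.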


Integration Requirements

In this sub-section, we describe the explicit integration requirements of data storages for automation systems development projects, based on the use case definition explained previously in this section and on the general engineering use case in Chap. 2. The requirements focus on three aspects, namely data insertion, data transformation, and data querying. We will use the previously presented case study as a typical example for explaining these requirements.

Data Insertion

Consistent project data in all the discipline-specific tools is an important aspect in the context of integrating engineering tools from heterogeneous engineering disciplines. Traditional automation systems engineering processes follow a basic sequential process structure with distributed parallel activities in specific project phases. However, these processes lack systematic feedback to earlier steps and suffer from inefficient change management and synchronization mechanisms. To improve these processes, local tool changes must be committed in order to make them available to all project participants and their tools. To minimize inconsistencies between the tools, the approach encourages engineers to commit their changes as often as possible. A commit can also be interpreted as a bulk operation referring to a set of data management operations. The bulk operation should be faster than the sequential execution of the commands it refers to. Nevertheless, the performance limits are set by the response time that users consider acceptable, since the operations are triggered by users. Furthermore, memory consumption has to be limited to a reasonable amount that does not harm the entire system when loading the ontology or when large bulk operations are executed. Additionally, it is important that the stored data can be easily accessed and is readable, in order to support data migration and to ensure data ownership. Finally, in order to facilitate maintenance of the system at the semantic level, semantic facilities have to be provided.
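The idea of a commit as a bulk operation can be sketched as follows; the class and method names are illustrative assumptions rather than an actual tool API. A commit collects a set of data management operations and applies them to the store in one call instead of one round trip per record.

```python
class Commit:
    """A set of data management operations submitted as one bulk action."""
    def __init__(self, author):
        self.author = author
        self.operations = []  # list of (op, payload) tuples

    def insert(self, record):
        self.operations.append(("insert", record))

    def update(self, record):
        self.operations.append(("update", record))

    def delete(self, record_id):
        self.operations.append(("delete", record_id))

class Store:
    """Minimal in-memory stand-in for the shared data storage."""
    def __init__(self):
        self.records = {}

    def apply(self, commit):
        """Apply all operations of a commit as one bulk action."""
        for op, payload in commit.operations:
            if op in ("insert", "update"):
                self.records[payload["id"]] = payload
            elif op == "delete":
                self.records.pop(payload, None)
        return len(commit.operations)

store = Store()
c = Commit(author="engineer-1")
c.insert({"id": "S1", "type": "Signal"})
c.insert({"id": "S2", "type": "Signal"})
c.delete("S2")
applied = store.apply(c)  # 3 operations in one bulk call
```

A real implementation would additionally batch the operations into a single database transaction, which is where the performance advantage over sequential execution comes from.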

Data Transformation

As previously stated, the first step towards a consistent view of an automation systems project is to commit local changes. However, every engineering tool uses its own concepts, which are not always compatible with other tools’ concepts. Therefore, during change propagation the data provided by the tools has to be transformed to the common concepts (i.e., assuming that there are defined mappings between common concepts and local tool concepts) and then propagated to the other tools. By transforming information provided by a specific tool to every other tool, information is shared among project participants and a consistent view is facilitated.

Basically, a transformation consists of transforming data from one ontology to another. Data transformations may vary from simple string operations, such as concat or substring, to complex operations where external services are needed or where transformation executions depend on specific attribute values. For a complete reference of the semantic mapping types used in engineering, we refer the readers to Chap. 6 of this book. It is essential that the transformation executions do not compromise the system performance, since several transformations have to be performed for propagating changes. In addition, the provided transformation techniques should make sure that memory consumption is kept to a minimum. The data storage is required to support mapping and transformation mechanisms like SPIN and EDOAL; it is favored to make use of them as built-in tools rather than external tools in order to minimize unnecessary additional maintenance efforts. Also, the mappings and transformed data need to be easily accessible and readable.
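A minimal sketch of such a mapping, using only the simple string operations mentioned above (concatenation and substring), is given below. The field names and the EPL record layout are hypothetical, chosen for illustration only.

```python
def epl_to_common(epl_record):
    """Map a hypothetical EPL tool record to a common 'signal' concept.

    Illustrates the simple string operations mentioned in the text:
    concatenation builds the common identifier, a substring extracts
    the cabinet number from the local wiring code.
    """
    return {
        # concat: common id = discipline prefix + local id
        "signal_id": "SIG-" + epl_record["local_id"],
        # substring: first two characters of the wiring code
        "cabinet": epl_record["wiring_code"][:2],
        "voltage": epl_record["voltage"],
    }

common = epl_to_common(
    {"local_id": "E042", "wiring_code": "C7-K12-3", "voltage": "24V"})
# common["signal_id"] == "SIG-E042", common["cabinet"] == "C7"
```

In the architectures discussed below, such rules would be expressed declaratively (e.g., in SPIN or EDOAL) rather than hard-coded.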

Data Query

Automation systems development projects typically manage a huge amount of data spread over a large number of heterogeneous data sources originating from different engineering disciplines. This characteristic hampers data analysis and querying across the project, which could lead to the emergence of inconsistencies in the project data. If such inconsistencies are not identified early, they may cause costly corrections during the commissioning phase or even failures during the operation phase.

In the interchange standardization approach, the usage of the common model as a mediator (Wiederhold 1992) is a common technique for querying purposes. In this case, the common concepts provide a single access point and a common conceptualization for querying local data sources. Thus, the query formulation becomes independent of the mediated local data sources. Individual users do not need to have detailed knowledge of all the concepts used in all project disciplines, and can focus instead on the common concepts that are relevant for the whole project.



Two important aspects for querying are performance and memory consumption. The acceptable response time for a query strongly depends on the specific application. If a big volume of data must be processed and the accuracy of results is crucial, a longer execution time might be acceptable, e.g., for consistency checking, an important aspect in the use case. However, other applications, like navigation to specific parts of the project or analysis of subsets of data (e.g., analyzing the created signals in a specific time period), require fast response times (e.g., 0.1–1 s) to keep the user’s attention focused (Nielsen 1993). Regarding memory consumption, since a large volume of data has to be managed, it is essential not to load all affected data into memory in order to avoid memory exceptions. Techniques for avoiding this include, e.g., indexing or query optimization (Gottlob et al. 2011).

The support for SPARQL (Pérez et al. 2006), currently the W3C standard for querying ontologies, is another important requirement for providing proper storage usability. Thus, SPARQL queries can be used for querying project data through the common concepts. SPARQL makes queries more compact and easier to describe, while reducing debugging time, since its syntax makes virtually all join operations implicit. In the case of data storages that exploit relational databases to manage local tool data, support for automated query transformation from SPARQL into SQL must be provided. Again, a built-in query transformation is favored over external tools in order to minimize additional maintenance efforts. In addition, inference capabilities have to be supported by the semantic storages, since they can considerably simplify the required queries.
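The join-free style of SPARQL triple patterns can be illustrated with a deliberately tiny sketch: common-concept triples held in memory and a naive pattern match standing in for a SPARQL engine. The vocabulary (ex:Signal, ex:changedIn) is an illustrative assumption, not the project's actual ontology.

```python
# Toy triple store over the common concepts (illustrative data).
triples = [
    ("ex:S1", "rdf:type", "ex:Signal"),
    ("ex:S1", "ex:changedIn", "ex:week10"),
    ("ex:S1", "ex:changedIn", "ex:week11"),
    ("ex:P7", "rdf:type", "ex:PLC"),
]

# The equivalent SPARQL (a Query-6-style count of changes of one
# component); shown as a string, not executed here:
SPARQL = """
SELECT (COUNT(?w) AS ?changes) WHERE {
  ex:S1 ex:changedIn ?w .
}
"""

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

changes = len(match(s="ex:S1", p="ex:changedIn"))  # 2
```

In an actual storage the same pattern would be evaluated by the SPARQL engine (or rewritten to SQL, as discussed for SWA-B below is not assumed here); the point is that the pattern expresses the join implicitly.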



Engineering Knowledge Base Software Architecture


This section illustrates the four introduced Engineering Knowledge Base

(EKB) software architecture variants in detail.


Software Architecture Variant A—Ontology Store

Fig. 8.3 Concepts and instances are stored in a single ontology store (based on Mordinyi et al.)

The first EKB software architecture variant (SWA-A) uses a single ontology component that stores and manages both concepts and individuals (see Fig. 8.3). Engineering tools use the component to insert, update, or delete data in their local tool data models, e.g., using the Sesame API. Mappings, implemented in SPARQL, describe the relation between two concepts, perform transformations on the provided data, and update the instance set of the targeted model. Queries formulated and executed by other engineering tools will retrieve the transformed data. Details on the process can be found in Moser (2009).

Versioning of instances is executed according to the publicly available changeset vocabulary. This vocabulary defines a set of terms for describing changes to resource descriptions.


Software Architecture Variant B—Relational

Database with RDF2RDB Mapper

The second EKB software architecture variant (SWA-B) manages and stores concepts and instances in two different storage components. As shown in Fig. 8.4, the

ontology store manages concepts only while individuals are stored in a relational

database. Engineering tools insert, update, or delete data (Step 1) using SQL within

their designated tables that reflect the same model as described in the ontology.

Adaptations on that table trigger a process (Step 2) that requests the ontology to

transform the change set to the models it is mapped to. The transformed change set

(Step 3) is used to update the corresponding tables of the identified models.

In case an engineering tool wants to request information from the system, it

formulates a SPARQL query based on the models described in the ontology (Step 4) and hands it over to the RDF2RDB Mapper (Step 5). After transforming the SPARQL

query to SQL, the mapper executes it on the relational database, and returns the

result to the application. Versioning of instances is performed according to the

schema as in SWA-A.
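The rewriting step performed by the RDF2RDB Mapper can be sketched as follows. A real mapper such as D2RQ derives table and column names from a mapping configuration; here the mapping dictionary is a hand-written assumption for a single triple pattern.

```python
# Hypothetical mapping: ontology property -> (table, column).
MAPPING = {
    "ex:voltage": ("signal", "voltage"),
    "ex:cabinet": ("signal", "cabinet"),
}

def rewrite(predicate):
    """Rewrite the triple pattern '?s <predicate> ?o' into SQL over the
    relational table that mirrors the ontology class."""
    table, column = MAPPING[predicate]
    return f"SELECT id AS s, {column} AS o FROM {table}"

sql = rewrite("ex:voltage")
# sql == "SELECT id AS s, voltage AS o FROM signal"
```

Real SPARQL-to-SQL translation additionally has to handle joins across several triple patterns, filters, and optional parts; this sketch only shows the basic table/column lookup idea.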


Changeset vocabulary: http://vocab.org/changeset/schema.html.



Fig. 8.4 Application of a mapping configuration and relational database (based on Mordinyi et al.)


Software Architecture Variant C—Graph Database


The third EKB software architecture variant (SWA-C) relies on graph databases. Mordinyi et al. (2015) provide details of how to represent complex models with this database concept as well as how to map ontology concepts onto a graph database schema for efficient execution of operations.

Since graph database implementations do not provide versioning per se, a versioning model had to be developed (Mordinyi et al. 2015). Figure 8.5 illustrates the schema used to handle multiple versions of an individual Sample. For each individual, the approach creates a current, a revision, and a history node in the database. Additionally, a Commit node is created that stores metainformation about the operation, such as a timestamp or the committer.

Current nodes (e.g., Sample) represent the latest state of an individual, including its attributes. They are created when an entity is inserted into the database, updated when an individual is changed, and removed when deletion of the entity is requested. A history node (e.g., SampleHistory) is created only when an entity is inserted. It is never removed and can therefore be used to query already deleted individuals. The history node enables access to all versions of a particular entity via the revision link(s). Those links can also be used to track changes. Revision nodes (e.g., SampleRevision) store the state of a particular individual at a specific point in time. Whenever an individual is changed, a new revision node is created and the current node is updated.
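The current/revision/history model of Fig. 8.5 can be sketched in plain Python, with dictionaries standing in for graph database nodes; the field names are assumptions for illustration.

```python
import itertools
import time

class VersionedStore:
    """Minimal sketch of the current/revision/history versioning model."""
    def __init__(self):
        self.current = {}   # entity id -> latest state (removed on delete)
        self.history = {}   # entity id -> history node (never removed)
        self._rev = itertools.count(1)

    def insert(self, eid, state, committer):
        self.history[eid] = {"revisions": []}
        self._new_revision(eid, state, committer)
        self.current[eid] = dict(state)

    def update(self, eid, state, committer):
        self._new_revision(eid, state, committer)
        self.current[eid] = dict(state)   # current node updated in place

    def delete(self, eid, committer):
        self.current.pop(eid)             # history node survives deletion
        self.history[eid]["revisions"].append(
            {"deleted": True, "commit": {"by": committer}})

    def _new_revision(self, eid, state, committer):
        # Each change creates a new revision node plus commit metadata.
        self.history[eid]["revisions"].append({
            "rev": next(self._rev),
            "state": dict(state),
            "commit": {"by": committer, "ts": time.time()},
        })

store = VersionedStore()
store.insert("Sample", {"v": 1}, "alice")
store.update("Sample", {"v": 2}, "bob")
store.delete("Sample", "alice")
# "Sample" is gone from current, but all revisions remain queryable.
```

This mirrors the behavior described above: deletion removes only the current node, while the history node keeps the full revision chain accessible.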

Fig. 8.5 Data Model for Versioning (based on Mordinyi et al. 2015)

Fig. 8.6 Concepts are stored in Ontology, while instances are managed in a NoSQL Graph Database (based on Mordinyi et al. 2015)

Figure 8.6 shows the EKB software architecture variant that uses an ontology component to store and manage concepts and a NoSQL graph database to store and version individuals. The back-end relies on the OrientDB implementation and

was selected because (a) it supports a multi-model approach (i.e., it combines graph

and document databases), (b) it provides an easy to use API and query language


OrientDB: http://www.orientdb.org.



similar to traditional SQL, and (c) it has out-of-the-box support for distributed

operations. The multi-model approach provided by OrientDB enables developers to

use it in a similar manner as the property graph model and to handle complex

(nested) data structures inside a single node (also called documents). OrientDB

supports two different query languages: Gremlin, as provided by the TinkerPop stack, and the proprietary OrientDB SQL.

In Step 1, the schema for the graph database, which is used by the instances, is derived from the ontology. Engineering tools insert, update, or delete data (Step 2) using the operations provided by the database within the schema description that reflects the same model as described in the ontology. Adaptations of nodes (i.e., the Commit node) trigger a process (Step 3) that requests the ontology to transform the change set to the models it is mapped to. The transformed change set (Step 4) is used to update the corresponding models. Versioning of instances is performed according to the versioning model described before.

As for requesting information from the system, the engineering tool formulates an SQL query based on the available schema (Step 5) and forwards it to the graph database. OrientDB is compatible with a subset of SQL ANSI-92, while for complex queries it provides additional SQL-like features.


Software Architecture Variant D—Versioning

Management System

In comparison to the aforementioned architecture variants, in which versioning has to be considered and implemented explicitly, the fourth EKB software architecture variant (SWA-D) inherently provides this property by making use of a versioning system like Git. Consequently, the architecture distinguishes between models and instances and stores them separately. While the versioning system is responsible for managing individuals, the ontology store copes with concepts. Each model is represented by a repository (see Fig. 8.7), and each individual is stored in a Turtle file. Turtle is a textual syntax for RDF, which makes it beneficial for versioning systems, as its textual form enables easy change tracking. Additionally, a mechanism like Apache Jena ARQ has to be introduced in order to enable querying of the RDF data stored in the repositories.
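The one-file-per-individual storage layout can be sketched as follows; the directory layout, file naming, and the tiny vocabulary are illustrative assumptions. Because each individual is a small plain-text Turtle file, a text-based versioning system can diff and track changes line by line.

```python
import os
import tempfile

def write_individual(repo_dir, ind_id, properties):
    """Serialize one individual as a small Turtle file (one file per
    individual, as in SWA-D)."""
    lines = ["@prefix ex: <http://example.org/> .", f"ex:{ind_id}"]
    # One 'predicate "value"' pair per line eases textual diffing.
    props = [f'    ex:{k} "{v}"' for k, v in sorted(properties.items())]
    lines.append(" ;\n".join(props) + " .")
    path = os.path.join(repo_dir, f"{ind_id}.ttl")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path

repo = tempfile.mkdtemp()  # stands in for a per-model Git repository
path = write_individual(repo, "S1", {"voltage": "24V", "cabinet": "C7"})
```

Changing a single property of S1 then changes a single line of S1.ttl, which is exactly what makes Git's change tracking and hooks applicable here.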

Fig. 8.7 Versioning system and an ontology-based store (based on Mordinyi et al. 2014)

As shown in Fig. 8.7, an engineering tool commits (Step 1) changes to its local repository, which are automatically pushed (Step 2) by a so-called hook to the master repository of the model. Triggered by a hook on the master repository of the model, the transaction manager identifies correlating models (Steps 3, 4) and requests the ontology to transform (Steps 5, 6) the change set as defined in the mappings between the concepts. Finally, the result is committed (Step 7) to the master repository of the affected models, from which the engineering tool pulls (Step 8) the latest changes.

Gremlin: https://github.com/tinkerpop/gremlin/wiki.

Apache TinkerPop: http://tinkerpop.incubator.apache.org/.

Turtle: http://www.w3.org/TR/turtle/.

Apache Jena ARQ: http://jena.apache.org/documentation/query/index.html.



In order to demonstrate the characteristics of the four different EKB software architecture variants, we evaluated performance and memory usage (disk space and memory consumption) in the context of two evaluation scenarios (see Sect. 8.5.1) and with respect to querying versioning information. The scenarios used for evaluation are a simplified version of the constellation of engineering tools illustrated in Sect. 8.3 and in Moser and Biffl (2012). The integration setup consists of three interconnected models (Serral et al. 2013): (a) a model representing electrical plans (EPL), (b) a model representing programmable logic control (PLC), and (c) the common concept signal, which defines a relation between the two aforementioned models. This means that, in case of a new EPL instance, the data is transformed,


propagated, and “inserted” as a new signal and a new PLC instance into the storage. In the same way, in case of a deletion or an update, the corresponding data records from the other models are removed or updated, respectively.


Evaluation Process and Setup

The intention of evaluation scenario 1 (ES-1) is to investigate the behavior of the architecture with respect to the operation types insert, update, and delete (see Fig. 8.8). The scenario assumes a fixed amount of data records (e.g., 100.000) in the system, while the number of data records added, deleted, and updated changes with each operation (Fig. 8.8, Op. 0–Op. 20). Operations reflect commits in the sense of a versioning system and are executed in sequence.

Please note that the fixed amount of data records refers to the latest project state and does not take versioning information, i.e., the history of operations, into consideration. With each commit, the amount of versioned information increases by the number of added, deleted, and updated data records of the previous commit. The scenario facilitates investigation of how the amount of data to be processed by an operation type affects the system, and how it compares to other operation types. With respect to the figure, after Operation 20, 1.05 million data records were added and updated, and 950.000 data records were removed.

On the other hand, the aim of evaluation scenario 2 (ES-2) is to evaluate how operations perform with increasing project size. As shown in Fig. 8.9, there are no delete operations. This means that with each commit new data is added, and the size of the project and the amount of versioning information increase. With respect to the figure, after Operation 0 there are 10.000 data records in the storage, while after Operation 13, 101.000 data records were processed (considering versioning information).

The evaluations were performed on a consumer laptop with an Intel® Core™ i7-3537U processor at 2 GHz, 10 GB RAM, and a 256 GB SSD hard disk, running Ubuntu 12.04 64 bit, OpenJDK 64 bit, and JRE 7.0_25 with a Java heap size of 8 GB RAM. For evaluating SWA-A, originally Bigdata version 1.2.3 was picked, but due to high memory consumption resulting in OutOfMemoryExceptions, both scenarios could only perform four operations. Instead, Sesame Native store v2.6.3 was chosen. In case of SWA-B, D2RQ (Bizer and Seaborne 2004) version 0.8.1 along with MySQL 5.5 was deployed; for SWA-C, OrientDB version 2.0.5 was selected; while in case of SWA-D, the version control system Git and Apache Jena ARQ 2.11.0 were selected. In the following, SWA-A, B, C, and D will refer to the concrete implementations of the described EKB software architectures.

Fig. 8.8 Amount and type of operations in evaluation scenario 1

Fig. 8.9 Amount and type of operations in evaluation scenario 2


Evaluation of Data Management Capabilities

This section provides the evaluation results of the two evaluation scenarios applied

to the software architecture variants.

Performance Results of Evaluation Scenario 1

Figure 8.10 illustrates the performance in terms of the time required for inserting, updating, and deleting a set of instances as defined by ES-1 for all four software architecture variants. The figure shows that SWA-B is the fastest, while SWA-C is the slowest. The reason is the different approach to how versioning is done. As described in Sect. 8.4.3, SWA-C stores information several times in order to keep revisions transparent to query formulation, while SWA-B stores it only once. Additional updates on graph structures lead to additional performance costs. From the figure we can also observe fluctuations in time in the case of SWA-A, while SWA-D provides an almost constant execution time. This is because at each operation there is always the full amount of files to be processed by the system.

Comparing the various architecture variants, using SWA-A over SWA-B increases the overall execution time by 2–3 times, and by around 3–6 times in the case of SWA-C. While SWA-D is slower than SWA-A or SWA-B, it is still faster than SWA-C. Nevertheless, the main drawback of SWA-D is the huge number of files the file system has to cope with. Although at Operation 20 Git had to process 2.2 million Turtle files, the system needed to manage an additional 24.1 million files for
