Construction of an Electronic Health Record System …
In this research, we constructed an Electronic Health Record System to support zoo veterinarians. The system covers all items of the paper animal health record, and we designed its input forms to match the layout of the paper record. As a result, we succeeded in reducing the learning cost for zoo veterinarians who use the electronic health record system. Moreover, referring to a past paper animal health record takes much time and effort, whereas with the electronic health record system a zoo veterinarian can readily access past animal health records. Digitalization also allows a zoo veterinarian to browse animal health records on various terminals. Therefore, zoo veterinarians can share information efficiently and effectively.
From now on, we plan to advance the research and development of the zoo business integrated management support system shown in Fig. 8, which includes the animal electronic diary management system, the animal management ledger system, and the feed management system. The zoo business integrated management support system consists of the following group of systems:
Animal Electronic Health Record System
Animal Electronic Diary Management System
Medical Treatment Electronic Diary System
Feed Management System
Animal Management Ledger System
Zoo Veterinarian Management System
Zoo Event Management System
Registration Information Management System
T. Oyanagi et al.
Fig. 8. Zoo Business Integrated Management Support System
Acknowledgments. The authors would like to thank N. Namae for overall assistance with the system construction. We also thank the Kamine Zoo staff for fruitful discussions and valuable suggestions.
References
1. Kamine Zoo: Kamine Zoo Annual Report (2014).
2. Hitachi City Assembly: Hitachi-city Administrative Activities Case Report, http://hitachi-grgiindan.jp/photonews/image/2010-hitachi.pdf, last viewed September 2016.
Workshop CADSA-2016: 4th International Workshop on Cloud and Distributed System Applications
An Architecture for processing of Heterogeneous Sources
Flora Amato, Giovanni Cozzolino, Antonino Mazzeo, Sara Romano
Abstract Different sources of information generate huge amounts of data every day. For example, consider social networks: the number of active users is impressive, they process and publish information in different formats, and the data are heterogeneous both in their topics and in the published media (text, video, images, audio, etc.). In this work, we present a general framework for event detection through the processing of heterogeneous data from social networks. The proposed framework implements techniques that users can exploit to detect malicious events on …
Flora Amato, Giovanni Cozzolino, Antonino Mazzeo, Sara Romano
DIETI - Università degli Studi di Napoli "Federico II", via Claudio 21, Naples, e-mail: flora.
© Springer International Publishing AG 2017
F. Xhafa et al. (eds.), Advances on P2P, Parallel, Grid, Cloud
and Internet Computing, Lecture Notes on Data Engineering
and Communications Technologies 1, DOI 10.1007/978-3-319-49109-7_65
F. Amato et al.
1 Introduction
Social media networks have emerged as powerful means of communication for people looking to share and exchange information on a wide variety of real-world events. These events range from popular, widely known ones (e.g., a concert by a popular music band) to smaller-scale, local events (e.g., a local social gathering, a protest, or an accident). Short messages posted on social media sites such as Twitter typically reflect these events as they happen. For this reason, the content of such sites is particularly useful for the real-time identification of real-world events. Social networks have become a virtually unlimited source of knowledge that can be used for scientific as well as commercial purposes. In fact, the analysis of online public information collected by social networks is becoming increasingly popular, not only for detecting particular trends, a problem close to classic market research, but also for solving problems of a different nature, such as identifying fraudulent behaviour, optimizing web sites, tracking the geographical location of particular users or, more generally, finding meaningful patterns in a certain set of data. In particular, the rich knowledge that has accumulated
in Twitter makes it possible to capture real-world events in real time. These event messages can provide a set of unique perspectives, regardless of the event type [1, 2], reflecting the points of view of users who are interested in, or even participate in, an event. In particular, for unplanned events (e.g., the Iran election protests, earthquakes), Twitter users sometimes spread news before the traditional news media. Even for planned events, Twitter users often post messages in anticipation of the event, which can lead to the early identification of interest in these events. Additionally, Twitter users often post information on local, community-specific events where traditional news coverage is low or nonexistent. Thus Twitter can be considered a collector of real-time information that public authorities could use as an additional source of warnings about event occurrences. In the last few years, particular interest has been given to the extraction and analysis of information from social media by the authorities and structures responsible for the protection of public order and safety. Increasingly, through the information posted on social media, public authorities are able to conduct a variety of activities, such as the prevention of terrorism and bioterrorism, the prevention of public order problems, and safety assurance during demonstrations with large participation.
The heterogeneity of information and the huge scale of data make the identification of events from Twitter a challenging problem. In fact, Twitter messages, or tweets, have a variety of content types, including personal updates not related to any particular real-world event, information about event happenings, retweets of messages that are of interest to a user, and so on. As an additional challenge, Twitter messages contain little textual information, being limited by design to 140 characters, and often exhibit low quality. Several research efforts have focused on identifying events in Twitter [6, 7]. For example, in recent years there has been much research on analyzing tweets to enhance health-related alerts following a natural calamity or a bioterrorist attack, which can prompt a rapid response from health authorities, as well as on disease monitoring for prevention. Previous work in this area includes validating the timeliness of Twitter by correlating tweets with real-world statistics of disease activity, e.g., Influenza-like Illness (ILI) rates [9, 10, 11], E. coli, cholera, or officially notified cases of dengue.
In this work, we present a general framework for event detection, starting from the heterogeneous processing of data coming from social network systems. The proposed system processes heterogeneous information in order to detect anomalies in the Twitter stream. Event-related anomalies are patterns in data that do not fit the expected normal behavior. Such anomalies might be induced in the data by malicious activity, for example cyber-intrusion or terrorist activity. The proposed framework aims at analyzing tweets and automatically extracting relevant event-related information in order to raise a signal as soon as malicious event activity is detected.
The remainder of the paper is structured as follows: in Section 2 we describe two examples that motivate our work; in Section 3 we present an overview of the framework along with the goals and requirements that are important for event detection from Twitter, and we describe the main processing stages of the system. Finally, in Section 4 we present some conclusions and future work.
An Architecture for processing of Heterogeneous Sources
2 Motivating example
Social networks such as Twitter have become a virtually unlimited source of knowledge that can be used for scientific as well as commercial purposes. Twitter users spread a huge amount of information on a wide range of “facts”. The analysis of the information collected by social networks is becoming increasingly popular for the detection of trends and fraudulent behavior or, more generally, for finding meaningful patterns in a certain set of data. This process of extracting and analyzing large amounts of data is commonly called data mining.
Particular attention has been paid to the analysis of the User-Generated Content (UGC) coming from Twitter, one of the most popular microblogging websites. As described in Section 1, Twitter textual data (i.e., tweets) can be analyzed to discover user thoughts associated with specific events, as well as aspects characterizing events according to user perception. Tweets are short, user-generated textual messages at most 140 characters long and publicly visible by default. For each tweet, a list of additional features describing the context in which it was posted (e.g., GPS coordinates, timestamp) is also available.
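The per-tweet context features just mentioned can be illustrated with a minimal sketch that turns one tweet, serialized in the Twitter v1.1 JSON layout, into a small record. It assumes that layout's `text`, `created_at` and `coordinates` fields (the latter a GeoJSON point, longitude first); the `TweetRecord` type, the `parse_tweet` helper and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class TweetRecord:
    """Hypothetical record holding a tweet's text and its context features."""
    text: str
    created_at: datetime
    # (latitude, longitude), or None when the tweet is not geotagged
    location: Optional[Tuple[float, float]]


def parse_tweet(raw: str) -> TweetRecord:
    """Parse one tweet encoded as a JSON string (Twitter v1.1 layout assumed)."""
    obj = json.loads(raw)
    # v1.1 timestamps look like "Thu Dec 10 18:30:00 +0000 2015"
    created = datetime.strptime(obj["created_at"], "%a %b %d %H:%M:%S %z %Y")
    location = None
    point = obj.get("coordinates")
    if point and point.get("type") == "Point":
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
        location = (lat, lon)
    return TweetRecord(text=obj["text"], created_at=created, location=location)


raw = json.dumps({
    "text": "Clashes in the city centre before the match",
    "created_at": "Thu Dec 10 18:30:00 +0000 2015",
    "coordinates": {"type": "Point", "coordinates": [14.2681, 40.8518]},
})
record = parse_tweet(raw)
print(record.created_at.isoformat())  # 2015-12-10T18:30:00+00:00
print(record.location)                # (40.8518, 14.2681)
```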
In this Section we show two examples of text mining on live tweets to monitor possible public security issues. In the first one we describe topic relevance; in the second one we analyze the results using realistic data collected during a sport event.
In recent years, especially since the events following Sept. 11, the fear of bioterrorism has been on the list of the most urgent healthcare concerns, together with long-standing health issues: cancer, AIDS, smoking or alcohol/drug abuse, heart disease, and the cost of healthcare and insurance. Since 2001, Gallup has asked Americans to name the most urgent health problem facing the U.S. Cost of and access to healthcare have generally topped the list in the last few years, while in the 1990s Americans most frequently mentioned diseases such as cancer and AIDS. Other health issues have appeared near the top of the list over the past 15 years, including in 2014, when Ebola was listed among the top three health concerns. This was most likely due to multiple Ebola outbreaks in West Africa and a few confirmed cases around the world, which prompted widespread media coverage of the disease. In 2015, less than 0.5% of Americans listed Ebola as the most urgent issue, as the threat of the virus had subsided. Similar short-lived spikes in the responses to this question have occurred for other recent health threats, including the H1N1/swine flu outbreak in 2009 and the anthrax and bioterrorism attacks in 2001.
In our first scenario, we consider as application domain a global health threat such as bioterrorism, so the aim of the system is to monitor the Twitter stream, detecting tweets containing given keywords related to bioterrorism attacks, like “attack”, “terror”, “anthrax”, “terrorism”, “jihadi”, etc. In this scenario we focus on gathering all the upcoming tweets about a particular event using Twitter’s Streaming API. Depending on the search term, we can gather a very large number of tweets within a few minutes. This is especially true for live events with worldwide coverage (World Cups, Academy Awards, Election Day, and so on). A working example that gathers all the new tweets with the #anthrax hashtag, or containing the word “anthrax”, shows that a large number of the collected tweets are not related to a real threat. In our case, for example, the word “anthrax” is also the name of a metal music band, so we need a technique to distinguish relevant tweets from irrelevant ones.
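The ambiguity of keyword tracking can be made concrete with a minimal sketch. It assumes a simplified case-insensitive token match (it does not reproduce the Streaming API's own `track` matching rules), and the keyword set and sample tweets are hypothetical.

```python
import re

# Hypothetical tracking keywords for the bioterrorism scenario
KEYWORDS = {"attack", "terror", "anthrax", "terrorism", "jihadi"}


def tokens(text):
    """Lower-case word tokens; a leading '#' is dropped so hashtags match too."""
    return re.findall(r"#?(\w+)", text.lower())


def matches(text, keywords=KEYWORDS):
    """True if any tracked keyword appears as a token of the tweet."""
    return any(tok in keywords for tok in tokens(text))


sample = [
    "Suspicious white powder, police fear an anthrax attack",
    "Tickets for the #Anthrax concert are sold out!",
    "Lovely sunny day at the beach",
]
for tweet in sample:
    print(matches(tweet), "|", tweet)
```

Both the first and the second tweet match, which is why a relevance classification step has to follow collection.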
For the second example, we consider a given event as the search domain, to make a first selection of relevant tweets on which we perform deeper textual analysis to detect the positive ones. A post hoc analysis of the information posted on Twitter shows that, after the occurrence of a given event, the number of tweets related to that event is influenced by two main factors:
1. density of users in different geographic areas;
2. popularity of the event in the different geographic areas.
To support this observation we consider, as a simplified example, a sport event as the given event; more specifically, we focus our attention on the soccer match between the SSC Napoli and Legia Varsavia (Legia Warsaw) teams, played on 10 December 2015 during the qualifying stages of the Europa League competition. Before the match, tension, acts of vandalism and violence broke out as rival hooligans fought in the city centre. The news was reported by the main national newspapers, as well as by the main online information sites. Obviously, this news was also reported by people and mass media on their …
We analyse the relevant tweets concerning this event to estimate user perception of the episodes of violence. Searching for the string “Napoli Legia Varsavia” and filtering tweets by date between 9 December and 11 December, we collect the total number of tweets concerning the event. Refining the result with the words “arresti” (arrests), “scontri” (clashes), “polizia” (police), etc., we obtain the number of tweets related to the public security disorders rather than to the soccer match. Table 1 summarizes the search results.
Table 1 Search results.
                   | Event                    | Public security concern
Search string      | Napoli Legia Varsavia    | Napoli Legia Varsavia scontri,
                   |                          | Napoli Legia Varsavia arresti,
                   |                          | Napoli Legia Varsavia polizia
Date of interest   | 9 December – 11 December | 9 December – 11 December
Number of tweets   |                          |
From this analysis we note that a considerable number of tweets are not related to the event itself, but rather to the acts of vandalism and violence that occurred during the social event. Moreover, the proportion between the total number of tweets and the number of tweets concerning the public security issue depends on the severity of the episodes.
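The search-and-refine procedure behind Table 1 can be sketched as follows, assuming the tweets are already stored locally as (date, text) pairs and that simple case-insensitive substring matching is enough. The toy tweets and the `count_tweets` helper are hypothetical, and the counts they produce are not the paper's results.

```python
from datetime import date

QUERY = "napoli legia varsavia"
SECURITY_TERMS = ("arresti", "scontri", "polizia")  # refinement words from the text
START, END = date(2015, 12, 9), date(2015, 12, 11)  # inclusive date window


def count_tweets(tweets):
    """Return (total, security) counts for matching tweets inside the date window."""
    total = security = 0
    for day, text in tweets:
        lowered = text.lower()
        if not (START <= day <= END and QUERY in lowered):
            continue
        total += 1
        if any(term in lowered for term in SECURITY_TERMS):
            security += 1
    return total, security


# Toy data only; not the data collected by the authors
toy = [
    (date(2015, 12, 10), "Napoli Legia Varsavia, che partita!"),        # "what a match!"
    (date(2015, 12, 10), "Napoli Legia Varsavia: scontri in centro"),   # "clashes downtown"
    (date(2015, 12, 10), "Napoli Legia Varsavia, polizia in piazza"),   # "police in the square"
    (date(2015, 12, 20), "Napoli Legia Varsavia scontri"),              # outside the window
]
total, security = count_tweets(toy)
print(total, security, security / total)
```

The ratio `security / total` is the proportion that, according to the analysis above, grows with the severity of the episodes.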
3 Framework description
In this work we present a general framework to be adopted for event detection from Twitter. In particular, we aim at detecting anomalies in the Twitter stream related to malicious activity, for example cyber-intrusion or terrorist activity. In this section we present an overview of the main processing stages of the framework along with the goals and requirements that are important for event detection from Twitter.
Fig. 1 Overview of the Twitter stream processing framework for event detection and alerting.
Event detection from Twitter messages must efficiently and accurately filter relevant information about events of specific interest, which is hidden within a large amount of insignificant information. The proposed framework aims at analyzing tweets and automatically extracting relevant event-related information in order to raise a signal as soon as malicious event activity is detected. It is worth noting that in this work we consider an event to occur in the Twitter stream whenever an anomaly bursts in the analyzed data stream. The Twitter stream processing pipeline consists of three main stages, as illustrated in Figure 1. The system aims at detecting user-defined phenomena from the Twitter stream in order to give alerts in case of event occurrence. In the following we describe the main stages of the proposed system.
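The three-stage structure of Fig. 1 can be pictured as a chain of Python generators, each stage consuming the output of the previous one over successive time windows. This is an architectural sketch only: the stage functions, the marker-based stand-in for the classifier and the fixed-threshold alert rule are hypothetical placeholders, not the authors' modules.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def collect_and_filter(messages: Iterable[str], keywords: Set[str]) -> Iterator[str]:
    """Stage 1: keep only messages that contain a user-defined keyword."""
    for text in messages:
        if any(k in text.lower() for k in keywords):
            yield text


def classify(tweets: Iterable[str], irrelevant_markers: Set[str]) -> Iterator[str]:
    """Stage 2: drop messages judged irrelevant (stand-in for a trained classifier)."""
    for text in tweets:
        if not any(m in text.lower() for m in irrelevant_markers):
            yield text


def pipeline(windows: Iterable[List[str]], keywords: Set[str],
             irrelevant_markers: Set[str], threshold: int) -> Iterator[Tuple[int, List[str]]]:
    """Stage 3: alert on every window whose relevant-message count reaches the threshold."""
    for i, window in enumerate(windows):
        relevant = list(classify(collect_and_filter(window, keywords), irrelevant_markers))
        if len(relevant) >= threshold:
            yield i, relevant


windows = [
    ["nice weather", "anthrax concert tonight"],
    ["anthrax found in mail", "anthrax attack feared", "anthrax concert tonight"],
]
alerts = list(pipeline(windows, {"anthrax"}, {"concert"}, threshold=2))
print(alerts)  # [(1, ['anthrax found in mail', 'anthrax attack feared'])]
```

Using generators keeps each stage independent, so a real classifier or burst detector could replace a placeholder without touching the other stages.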
3.1 Collection and Filtering
The first stage of the system is devoted to collecting and filtering tweets related to an event. It is based on the information extraction (IE) task. Information extraction is the process of automatically scanning text for information relevant to some interest, including extracting entities, relations and, most challenging, events (something that happened in a particular place at a particular time). It makes the information in the text more accessible for further processing. The increasing availability of online sources of information in the form of natural-language texts has increased the accessibility of textual information. The overwhelming quantity of available information has led to a strong interest in technology for processing text automatically in order to extract task-relevant information. The main task of information extraction is to automatically extract structured information from unstructured and/or semi-structured documents by exploiting different kinds of text analysis. These are mostly related to techniques of Natural Language Processing (NLP) and to cross-disciplinary perspectives including Statistical and Computational Linguistics, whose objective is to study and analyze natural language and its functioning through computational tools and models. Moreover, information extraction techniques can be combined with text mining and semantic technologies in order to detect relevant concepts from textual data, aiming at detecting events, indexing and retrieving information, and addressing long-term preservation issues. Standard approaches used for implementing IE systems rely mostly on:
• Hand-written regular expressions. Hand-coded systems often rely on extensive lists of people, organizations, locations, and other entity types.
• Machine Learning (ML) based systems. Since hand-annotating a corpus is costly, ML methods are used to automatically train an IE system to produce text annotations. These systems are mostly based on supervised techniques that learn extraction patterns from plain or semi-structured texts. Two types of ML systems can be distinguished:
– Classifier based. A part of a manually annotated corpus is used to train the IE system to produce text annotations.
– Active learning (or bootstrapping). In preparing for conventional supervised learning, one selects a corpus and annotates the entire corpus from beginning to end. The idea of active learning is to have the system select, for the user to annotate, the examples that are likely to be informative, i.e., likely to improve the accuracy of the model. Several examples of such IE systems have been described in the literature.
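The active-learning loop described in the last item can be sketched with pool-based uncertainty sampling, one common selection strategy (the cited systems may use others): each round retrains the model on the labelled examples and asks the annotator to label the pool items whose predicted relevance is closest to 0.5. The token-frequency scorer, the simulated `oracle` and all data are hypothetical.

```python
from collections import Counter


def train(labeled):
    """Count, per token, how often it occurs in relevant texts and in all labelled texts."""
    relevant, total = Counter(), Counter()
    for text, is_relevant in labeled:
        for tok in set(text.lower().split()):
            total[tok] += 1
            relevant[tok] += int(is_relevant)
    return relevant, total


def predict_proba(model, text):
    """Toy relevance probability: mean Laplace-smoothed per-token relevance ratio."""
    relevant, total = model
    toks = set(text.lower().split())
    if not toks:
        return 0.5
    return sum((relevant[t] + 1) / (total[t] + 2) for t in toks) / len(toks)


def active_learning(labeled, pool, oracle, rounds, batch_size):
    """Each round: retrain, then query the oracle on the least certain pool items."""
    labeled, pool = list(labeled), list(pool)  # work on copies of the inputs
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        pool.sort(key=lambda text: abs(predict_proba(model, text) - 0.5))
        queried, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(text, oracle(text)) for text in queried]
    return labeled


labeled = [("anthrax attack in the subway", True), ("anthrax new album review", False)]
pool = ["attack on the subway", "album of the year", "anthrax spores found"]


def oracle(text):
    """Simulated annotator (hypothetical): threat-related words mean relevant."""
    return "attack" in text or "spores" in text


result = active_learning(labeled, pool, oracle, rounds=2, batch_size=1)
print(len(result))  # 4
```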
In our work we are interested in detecting specified events, which relies on specific information and features known about the event, such as its type and description, provided by a domain expert of the event context. These features are exploited by adapting traditional information extraction techniques to the characteristics of Twitter messages. In the first stage, the tweets are collected by exploiting the REST API provided by Twitter, which allows programmatic access to read data from the stream. These APIs are customized in order to crawl data according to user-defined rules (keywords, hashtags, user profiles, etc.). In the filter module, the tweets are annotated with locations and temporal expressions using a series of language processing tools for tokenization, part-of-speech tagging, temporal expression extraction, and named entity recognition.
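A minimal sketch of the filter module's two duties, under stated assumptions: tweets are dictionaries in the Twitter v1.1 JSON layout (`text`, `user.screen_name`, `entities.hashtags`), the user-defined rules are a hypothetical `CrawlRules` structure, and annotation is reduced to a toy place-name gazetteer plus regular expressions for dates and clock times. These are much weaker stand-ins for the tokenizers, part-of-speech taggers and named entity recognizers mentioned above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CrawlRules:
    """Hypothetical user-defined crawling rules (all values in lower case)."""
    keywords: Set[str] = field(default_factory=set)
    hashtags: Set[str] = field(default_factory=set)  # without the leading '#'
    users: Set[str] = field(default_factory=set)     # screen names


def matches_rules(tweet: Dict, rules: CrawlRules) -> bool:
    """True if the tweet satisfies at least one keyword, hashtag or user rule."""
    text = tweet.get("text", "").lower()
    tags = {h["text"].lower() for h in tweet.get("entities", {}).get("hashtags", [])}
    user = tweet.get("user", {}).get("screen_name", "").lower()
    return (any(k in text for k in rules.keywords)
            or bool(tags & rules.hashtags)
            or user in rules.users)


GAZETTEER = {"naples", "warsaw"}                     # toy list of place names
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")  # e.g. 10/12/2015
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}\b")          # e.g. 18:30


def annotate(tweet: Dict) -> Dict[str, List[str]]:
    """Attach tokens, location mentions and temporal expressions to a tweet."""
    text = tweet.get("text", "")
    tokens = re.findall(r"\w+", text)
    return {
        "tokens": tokens,
        "locations": [t for t in tokens if t.lower() in GAZETTEER],
        "temporal": DATE_RE.findall(text) + TIME_RE.findall(text),
    }


rules = CrawlRules(keywords={"clashes"}, hashtags={"napolilegia"})
tweet = {
    "text": "Clashes in Naples at 18:30 on 10/12/2015",
    "user": {"screen_name": "someone"},
    "entities": {"hashtags": []},
}
if matches_rules(tweet, rules):
    print(annotate(tweet))
```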
To access Twitter data programmatically, it is necessary to register a trusted application, associated with a user account, that interacts with the Twitter APIs. When the application is registered, Twitter provides the required credentials (consumer keys) that can be used to authenticate the REST calls via the OAuth protocol. To create and manage REST calls, we integrate into the Filter Module a Python wrapper for the Twitter API, called Tweepy, which includes a set of classes enabling the interaction between the system and the Twitter stream. For example, the API class provides access to all the methods of the Twitter RESTful API. Each method can accept various parameters and returns responses.
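To illustrate what authenticating REST calls via OAuth involves, the sketch below computes an OAuth 1.0a HMAC-SHA1 request signature following RFC 5849 (Section 3.4), the kind of computation a wrapper such as Tweepy performs on the caller's behalf. The credentials, nonce and timestamp are placeholders, the v1.1 `statuses/filter` URL is only an example, and the `pct` and `oauth_signature` helpers are hypothetical; in practice the library should be left to do the signing.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value: str) -> str:
    """RFC 3986 percent-encoding: only unreserved characters stay unescaped."""
    return quote(value, safe="~")


def oauth_signature(method, url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature of a request (RFC 5849, Section 3.4)."""
    # 1. Percent-encode every name/value pair, then sort by encoded name and value.
    encoded = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in encoded)
    # 2. Signature base string: METHOD&encoded-base-URL&encoded-parameter-string.
    base = "&".join([method.upper(), pct(url), pct(param_string)])
    # 3. Signing key: encoded consumer secret and token secret joined by '&'.
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


params = {
    "track": "#anthrax",
    "oauth_consumer_key": "CONSUMER_KEY",
    "oauth_nonce": "NONCE",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1449705600",
    "oauth_token": "ACCESS_TOKEN",
    "oauth_version": "1.0",
}
sig = oauth_signature("POST", "https://stream.twitter.com/1.1/statuses/filter.json",
                      params, "CONSUMER_SECRET", "ACCESS_TOKEN_SECRET")
print(sig)
```

The resulting value would be sent, together with the other `oauth_*` parameters, in the request's `Authorization` header.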
To gather all the upcoming tweets about a particular event, we call the Streaming API, extending Tweepy’s StreamListener class in order to customise the way we process the incoming data. In Listing 1 we simply gather all the new tweets with the #anthrax hashtag:
Listing 1 Custom listener to gather all the new tweets with the #anthrax hashtag and save them in a file.
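# Tweepy 3.x API (StreamListener was merged into Stream in Tweepy 4.0)
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

auth = OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')  # placeholder credentials
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

class TwitterListener(StreamListener):
    def on_data(self, data):
        try:  # append each raw tweet (a JSON string) to the output file
            with open('anthrax.json', 'a') as f:
                f.write(data)
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True  # keep the connection open
    def on_error(self, status):
        print(status)  # HTTP status code returned by the Streaming API
        return status != 420  # returning False (rate limited) disconnects
twitter_stream = Stream(auth, TwitterListener())
twitter_stream.filter(track=['#anthrax'])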
All gathered tweets are then further filtered and passed to the Classification module for deeper textual processing [23, 24, 25].
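Before any further filtering, the file written by the listener has to be read back. A minimal sketch, assuming that each call to `on_data` appended one JSON message per line and that the stream may also deliver control messages (for example deletion notices) that have no `text` field; such messages, blank lines and truncated lines are skipped. The `load_tweets` helper is hypothetical.

```python
import io
import json
from typing import Dict, Iterable, List


def load_tweets(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON message per line, keeping only objects that look like tweets."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank separator lines
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or corrupted line, e.g. from an interrupted write
        if isinstance(message, dict) and "text" in message:
            tweets.append(message)
    return tweets


# In practice: with open('anthrax.json') as f: tweets = load_tweets(f)
sample = io.StringIO(
    '{"id": 1, "text": "anthrax scare downtown"}\n'
    '\n'
    '{"delete": {"status": {"id": 2}}}\n'
    '{"id": 3, "text": "truncated...\n'
)
print([t["id"] for t in load_tweets(sample)])  # [1]
```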
3.2 Classifying relevant information
The collected and annotated tweets are then classified, in the second stage, with respect to their relevance. As stated before, event detection from Twitter messages is a challenging task, since the relevant information about a specific event is hidden within a large amount of insignificant messages. Thus a classifier must efficiently and accurately filter relevant information. The classifier is trained by exploiting a domain-dependent thesaurus that includes a list of relevant terms for event detection, and it is responsible for filtering out irrelevant messages.
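One possible reading of training the classifier on a domain-dependent thesaurus is a multinomial Naive Bayes model whose vocabulary is restricted to the thesaurus terms, so that only domain words influence the decision. The sketch follows that reading, which is an assumption since the paper does not name the classifier; the thesaurus, training tweets and class labels are hypothetical.

```python
import math
import re
from collections import Counter


class ThesaurusNB:
    """Multinomial Naive Bayes over thesaurus terms only, with Laplace smoothing."""

    def __init__(self, thesaurus):
        self.vocab = {t.lower() for t in thesaurus}

    def _features(self, text):
        # Keep only the tokens that belong to the thesaurus
        return [t for t in re.findall(r"\w+", text.lower()) if t in self.vocab]

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(self._features(text))
        return self

    def predict(self, text):
        feats = self._features(text)
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.prior[c] / self.n)  # log prior
            for f in feats:  # add smoothed log-likelihood of each thesaurus term
                score += math.log((self.counts[c][f] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


thesaurus = {"anthrax", "attack", "spores", "powder", "police",
             "concert", "album", "tour", "band"}
texts = [
    "anthrax spores found in a letter, police on site",
    "white powder attack suspected, anthrax feared",
    "anthrax concert tonight, great band",
    "new anthrax album and world tour announced",
]
labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
model = ThesaurusNB(thesaurus).fit(texts, labels)
print(model.predict("police investigate anthrax powder"))  # relevant
print(model.predict("anthrax tour dates"))                 # irrelevant
```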
3.3 Generating alerts
Once the irrelevant tweets are discarded, the remaining ones must be suitably aggregated with respect to an event of interest. The third stage of the proposed system is aimed at detecting anomalies in the event-related set of tweets. Anomalies are patterns in data that do not fit the expected normal behavior. Anomalies might be induced in the data for a variety of reasons, such as malicious activity, for example cyber-intrusion or terrorist activity, but all of these reasons share a common characteristic: they are interesting to the analyst. Several anomaly detection techniques have been proposed in the literature, drawing on disciplines and approaches such as statistics, machine learning, data mining, information theory, and spectral theory.
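Anomaly detection on a tweet stream usually starts from an aggregated view of the event-related tweets. A minimal sketch, assuming the relevant tweets have already been reduced to numeric timestamps, counts them in fixed, non-overlapping windows; the window width and the `window_counts` helper are illustrative choices, not values from the paper.

```python
from collections import Counter
from typing import Iterable, List


def window_counts(timestamps: Iterable[float], start: float, width: float,
                  n_windows: int) -> List[int]:
    """Count events in windows [start + i*width, start + (i+1)*width), i < n_windows."""
    counts = Counter()
    for ts in timestamps:
        i = int((ts - start) // width)
        if 0 <= i < n_windows:  # events outside the observed span are ignored
            counts[i] += 1
    return [counts[i] for i in range(n_windows)]


# Toy stream: timestamps in seconds relative to the start of observation
print(window_counts([5, 30, 65, 70, 75, 200], start=0, width=60, n_windows=3))  # [2, 3, 0]
```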
In our proposed system, the set of event-related tweets is processed by the alert module, which is responsible for raising an alert in case of an anomaly within the collected tweets. The alert module implements a set of burst detection algorithms in order to generate alerts whenever an event occurs, that is, whenever an anomaly is detected in the analyzed data stream. The detected events are then displayed to the end user through the visualization module. An important aspect is that the analysis can be tuned by the end user for novelty detection, which aims at detecting previously unobserved (emergent, novel) patterns in the data, such as a new topic. Novel patterns are typically incorporated into the normal model after being detected. Moreover, in the tuning phase the parameters of the tweet classification module can be modified if the messages that raised an alert are considered not relevant to the event occurrence.
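As one concrete example of a burst detection algorithm the alert module could host (the paper does not name the algorithms it uses), the sketch below flags a window whose count exceeds the mean of all previous windows by more than `k` standard deviations. The parameters `k` and `min_history` play the role of the end-user tuning described above, and each window, alerted or not, is then added to the history, loosely mirroring the incorporation of novel patterns into the normal model. All names and values are hypothetical.

```python
import statistics
from typing import Iterable, List


def detect_bursts(counts: Iterable[int], k: float = 3.0, min_history: int = 3) -> List[int]:
    """Return indices of windows whose count exceeds the history mean by > k std devs."""
    history: List[int] = []
    alerts = []
    for i, c in enumerate(counts):
        if len(history) >= min_history:
            mean = statistics.mean(history)
            # A floor of 1.0 keeps a perfectly flat history from alerting on tiny changes
            std = max(statistics.pstdev(history), 1.0)
            if c > mean + k * std:
                alerts.append(i)
        history.append(c)  # alerted windows also become part of the normal model
    return alerts


print(detect_bursts([4, 5, 3, 4, 5, 30, 6, 4]))  # [5]
```

Raising `k` makes the detector less sensitive; with `k=30` the same series produces no alert.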