
Construction of an Electronic Health Record System …



6 Conclusion



In this research, we constructed an electronic health record system to support zoo veterinarians. The system covers all items of the paper-based animal health record, and its input forms follow the same layout as the paper record, which reduces the learning cost for a zoo veterinarian who starts using the system. Looking up a past animal health record on paper takes considerable time and effort, whereas the electronic health record system lets a zoo veterinarian access past records directly. Moreover, because the records are digitized, they can be browsed on various terminals. As a result, zoo veterinarians can share information efficiently and effectively.



7 Future Tasks



We plan to continue the research and development of the zoo business integrated management support system, which includes the animal electronic diary management system, the animal management ledger system, and the feed management system, as shown in Fig. 8. The zoo business integrated management support system consists of the following group of systems.

• Animal Electronic Health Record System
• Animal Electronic Diary Management System
• Medical Treatment Electronic Diary System
• Feed Management System
• Animal Management Ledger System
• Zoo Veterinarian Management System
• Zoo Event Management System
• Registration Information Management System






T. Oyanagi et al.



Fig. 8. Zoo Business Integrated Management Support System



Acknowledgments. The authors would like to thank N. Namae for assistance throughout the system construction. We also thank the Kamine Zoo staff for fruitful discussions and valuable suggestions.



References

1. Kamine Zoo : Kamine Zoo Annual Report (2014).

2. Hitachi City Assembly, Hitachi-city Administrative Activities Case Report, http://hitachi-grgiindan.jp/photonews/image/2010-hitachi.pdf, last viewed September 2016.



Part V



Workshop CADSA-2016: 4th International Workshop on Cloud and Distributed System Applications



An Architecture for processing of Heterogeneous Sources

Flora Amato, Giovanni Cozzolino, Antonino Mazzeo, Sara Romano



Abstract Different sources of information generate huge amounts of data every day. Consider social networks, for example: the number of active users is impressive; they process and publish information in different formats, and the data are heterogeneous in their topics and in the published media (text, video, images, audio, etc.). In this work, we present a general framework for event detection based on the processing of heterogeneous data from social networks. The proposed framework implements techniques that users can exploit to detect malicious events on Twitter.



1 Introduction

Social media networks have emerged as a powerful means of communication for people looking to share and exchange information on a wide variety of real-world events. These events range from popular, widely known ones (e.g., a concert by a popular music band) to smaller-scale, local events (e.g., a local social gathering, a protest, or an accident). Short messages posted on social media sites such as Twitter typically reflect these events as they happen. For this reason, the content of such sites is particularly useful for real-time identification of real-world events. Social networks have become a virtually unlimited source of knowledge that can be used for scientific as well as commercial purposes. In fact, the analysis of on-line public information collected by social networks is becoming increasingly popular, not only for detecting particular trends, which is close to the classic market research problem, but also for solving problems of a different nature, such as identifying fraudulent behaviour, optimizing web sites, tracking the geographical location of particular users or, more generally, finding meaningful patterns in a certain set of data. In particular, the rich knowledge that has accumulated in Twitter makes it possible to catch real-world events as they happen. These event messages can provide a set of unique perspectives, regardless of the event type [1, 2], reflecting the points of view of users who are interested in or even participate in an event. In particular, for unplanned events (e.g., the Iran election protests, earthquakes), Twitter users sometimes spread news before the traditional news media [3]. Even for planned events, Twitter users often post messages in anticipation of the event, which can lead to early identification of interest in these events. Additionally, Twitter users often post information on local, community-specific events for which traditional news coverage is low or nonexistent. Thus Twitter can be considered a collector of real-time information that public authorities could use as an additional source of warnings about event occurrences. In the last few years, the authorities and organizations responsible for protecting public order and safety have shown particular interest in extracting and analyzing information from social media. Increasingly, through the information posted on social media, public authorities are able to conduct a variety of activities, such as the prevention of terrorism and bio-terrorism, the prevention of public order problems, and the guarantee of safety during demonstrations with large participation.

Flora Amato, Giovanni Cozzolino, Antonino Mazzeo, Sara Romano
DIETI - Università degli Studi di Napoli "Federico II", via Claudio 21, Naples, e-mail: flora.amato,giovanni.cozzolino,mazzeo,sara.romano@unina.it

© Springer International Publishing AG 2017
F. Xhafa et al. (eds.), Advances on P2P, Parallel, Grid, Cloud and Internet Computing, Lecture Notes on Data Engineering and Communications Technologies 1, DOI 10.1007/978-3-319-49109-7_65


The heterogeneity of the information and the huge scale of the data make the identification of events from Twitter a challenging problem. In fact, Twitter messages, or tweets, have a variety of content types, including personal updates not related to any particular real-world event, information about event happenings, retweets of messages that are of interest to a user, and so on [4]. As an additional challenge, Twitter messages contain little textual information, being limited by design to 140 characters, and often exhibit low quality [5]. Several research efforts have focused on identifying events in Twitter [6, 7]. For example, in recent years much research has analyzed tweets to enhance health-related alerts [8] following a natural calamity or a bio-terrorist attack, which can require a rapid response from health authorities, as well as to monitor diseases for prevention. Previous work in this area includes validating the timeliness of Twitter by correlating tweets with real-world statistics of disease activity, e.g., Influenza-like Illness (ILI) rates [9, 10, 11], E. coli [12], cholera [13] or officially notified cases of dengue [14]. In this work, we present a general framework for event detection, starting from the processing of heterogeneous data coming from social network systems. The proposed system processes heterogeneous information in order to detect anomalies in the Twitter stream. Event-related anomalies are patterns in data that do not fit the pattern of expected normal behavior [15]. Such anomalies might be induced in the data by malicious activity, for example cyber-intrusion or terrorist activity. The proposed framework aims at analyzing tweets and automatically extracting relevant event-related information in order to raise a signal as soon as malicious event activity is detected.

The remainder of the paper is structured as follows: in Section 2 we describe two examples that motivate our work; in Section 3 we present an overview of the framework, along with the goals and requirements that are important for event detection from Twitter, and describe the main processing stages of the system. Finally, in Section 4 we present some conclusions and future work.






2 Motivating example

Social networks such as Twitter have become a virtually unlimited source of knowledge that can be used for scientific as well as commercial purposes. Twitter users spread a huge amount of information covering a wide range of “facts”. The analysis of the information collected by social networks is becoming increasingly popular for detecting trends and fraudulent behavior or, more generally, for finding meaningful patterns in a certain set of data. This process of extracting and analyzing large amounts of data is commonly called data mining.

Particular attention has been paid to the analysis of the User-Generated Content (UGC) coming from Twitter, which is one of the most popular microblogging websites. As described in Section 1, Twitter textual data (i.e., tweets) can be analyzed to discover user thoughts associated with specific events, as well as the aspects that characterize events according to user perception. Tweets are short, user-generated textual messages of at most 140 characters, publicly visible by default. For each tweet, a list of additional features describing the context in which it was posted (e.g., GPS coordinates, timestamp) is also available.

In this section we show two examples of text mining on live tweets to monitor possible public security issues. In the first one we discuss topic relevance; in the second one we analyze search results, using some realistic data collected around a sporting event.

In recent years, especially since the events following September 11, the fear of bioterrorism has been on the list of the most urgent healthcare concerns, together with long-standing health issues: cancer, AIDS, smoking, alcohol and drug abuse, heart disease, and the cost of healthcare and insurance. Gallup (http://www.gallup.com/) has asked Americans since 2001 to name the most urgent health problem facing the U.S. Cost of and access to healthcare have generally topped the list in the last few years, while in the 1990s Americans most frequently mentioned diseases such as cancer and AIDS. Other health issues have appeared near the top of the list over the past 15 years, including in 2014, when Ebola was listed among the top three health concerns. This was most likely due to multiple Ebola outbreaks in West Africa and a few confirmed cases around the world, which prompted widespread media coverage of the disease. In 2015, less than 0.5% of Americans listed Ebola as the most urgent issue, as the threat of the virus had subsided. Similar short-lived spikes in the responses to this question have occurred for other recent health threats, including the H1N1/swine flu outbreak in 2009 and the anthrax and bioterrorism attacks in 2001.
In our first scenario, we consider as the application domain a global health threat such as bioterrorism, so the aim of the system is to monitor the Twitter stream and detect tweets containing given keywords related to bioterrorism attacks, such as “attack”, “terror”, “anthrax”, “terrorism”, “jihadi”, etc. In this scenario we focus on gathering all incoming tweets about a particular event using Twitter's Streaming API. Depending on the search term, we can gather a very large number of tweets within a few minutes. This is especially true for live events with worldwide coverage (World Cups, Academy Awards, Election Day, and so on). A working example that gathers all new tweets with the #anthrax hashtag, or containing the word “anthrax”, shows that a large number of the collected tweets are not related to a real emerging threat. In our case, for example, the word “anthrax” also refers to a metal band, so we need a technique to distinguish relevant tweets from irrelevant ones.

In the second example, we use a given event as the search domain to make a first selection of relevant tweets, on which we then apply deeper textual analysis to detect the positive ones. A post hoc analysis of the information posted on Twitter shows that, after the occurrence of a given event, the number of tweets related to that event is influenced by two main factors:

1. density of users in different geographic areas;

2. popularity of the event in the different geographic areas.

To support this observation we consider, as a simplified example, a sporting event; more specifically, we focus on the soccer match between SSC Napoli and Legia Varsavia, played on 10 December 2015 during the qualifying stages of the Europa League competition. Before the match, tension, vandalism and violence broke out as rival hooligans fought in the city centre. The news was reported by the main national newspapers, as well as by the main on-line news sites. It was also reported by people and mass media on their Twitter profiles.

We analyse the relevant tweets concerning this event to estimate user perception of the episodes of violence. Searching for the string “Napoli Legia Varsavia” and filtering tweets by date between 9 December and 11 December, we collect the total number of tweets concerning the event. Refining the search with the words “arresti” (arrests), “scontri” (clashes), “polizia” (police), etc., we obtain the number of tweets related to the public security disorders rather than to the soccer match. Table 1 summarizes the results.

Table 1 Search results.

                   Event related            Public security concern
Date of interest:  9 December – 11 December (both columns)
Query string:      Napoli Legia Varsavia    Napoli Legia Varsavia scontri,
                                            Napoli Legia Varsavia arresti,
                                            Napoli Legia Varsavia polizia
Number of tweets:  709                      96



From this analysis we note that a considerable number of tweets are not related to the event itself, but rather to the acts of vandalism and violence that occurred around it. Moreover, the ratio between the number of tweets concerning public security issues and the total number of tweets depends on the severity of the episodes.
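As a small worked example, the ratio mentioned above can be computed from the two counts in Table 1. This is only a sketch: it assumes that the tweets returned by the refined queries are a subset of those returned by the base query, which the text does not state explicitly, and the function name `share` is illustrative.

```python
# Counts taken from Table 1 (search window: 9 December to 11 December).
event_related_tweets = 709    # query: "Napoli Legia Varsavia"
public_security_tweets = 96   # same query refined with "scontri", "arresti", "polizia"


def share(part: int, total: int) -> float:
    """Return part/total as a fraction, rejecting an empty result set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part / total


security_share = share(public_security_tweets, event_related_tweets)
print(f"{security_share:.1%} of the event-related tweets concern public security")
```

Under that assumption, roughly one tweet in seven about the match is about the disorders rather than the game itself.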



3 Framework description

In this work we present a general framework for event detection from Twitter. In particular, we aim at detecting anomalies in the Twitter stream related to malicious activity, for example cyber-intrusion or terrorist activity. In this section we present an overview of the main processing stages of the framework, along with the goals and requirements that are important for event detection from Twitter. Event detection from Twitter messages must efficiently and accurately filter relevant information about events of specific interest, which is hidden within a large amount of insignificant information. The proposed framework aims at analyzing tweets and automatically extracting relevant event-related information in order to raise a signal as soon as malicious event activity is detected. It is worth noting that in this work we consider an event to occur in the Twitter stream whenever an anomaly bursts in the analyzed data stream. The Twitter stream processing pipeline consists of three main stages, as illustrated in Figure 1. The system aims at detecting user-defined phenomena in the Twitter stream in order to issue alerts when an event occurs. In the following we describe the main stages of the proposed system.

Fig. 1 Overview of the Twitter stream processing framework for event detection and alerting.



3.1 Collection and Filtering

The first stage of the system collects and filters tweets related to an event. It is based on the information extraction task. Information extraction (IE) is the process of automatically scanning text for information relevant to some interest, including extracting entities, relations and, most challenging, events (something that happened in a particular place at a particular time) [16]. It makes the information in the text more accessible for further processing. The growing number of on-line sources in the form of natural-language texts has increased the accessibility of textual information. The overwhelming quantity of available information has led to a strong interest in technology for processing text automatically in order to extract task-relevant information. The main task of information extraction is to automatically extract structured information from unstructured and/or semi-structured documents by exploiting different kinds of text analysis. These are mostly related to techniques of Natural Language Processing (NLP) and to cross-disciplinary perspectives including Statistical and Computational Linguistics, whose objective is to study and analyze natural language and its functioning through computational tools and models. Moreover, information extraction techniques can be combined with text mining and semantic technologies in order to detect relevant concepts in textual data, with the aim of detecting events, indexing and retrieving information, and addressing long-term preservation issues [17], [18], [19]. Standard approaches to implementing IE systems rely mostly on:

• Hand-written regular expressions. Hand-coded systems often rely on extensive lists of people, organizations, locations, and other entity types.
• Machine Learning (ML) based systems. Hand-annotated corpora are costly, so ML methods are used to automatically train an IE system to produce text annotations. These systems are mostly based on supervised techniques that learn extraction patterns from plain or semi-structured texts. It is possible to distinguish two types of ML systems:
  – Classifier based. A part of a manually annotated corpus is used to train the IE system to produce text annotations [20].
  – Active learning (or bootstrapping). In conventional supervised learning, one selects a corpus and annotates it entirely, from beginning to end. Active learning instead has the system select, for the user to annotate, the examples that are likely to be informative, i.e., likely to improve the accuracy of the model [21].
Some examples of IE systems are [22].
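To make the first approach concrete, the following is a minimal sketch of hand-written extraction rules applied to a single tweet text. It is not the extraction component of the framework: the gazetteer `LOCATIONS`, the three patterns, and the function name `extract` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical gazetteer of locations; a hand-coded system would rely on
# far more extensive lists of people, organizations and places.
LOCATIONS = {"napoli", "naples", "warsaw", "rome"}

# Hand-written patterns: hashtags, and clock times such as "21:05".
HASHTAG_RE = re.compile(r"#(\w+)")
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):[0-5]\d\b")
TOKEN_RE = re.compile(r"\w+")


def extract(text: str) -> dict:
    """Annotate a tweet text with hashtags, clock times and known locations."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(text)],
        "times": [m.group(0) for m in TIME_RE.finditer(text)],
        "locations": sorted({t for t in tokens if t in LOCATIONS}),
    }


example = extract("Clashes in Napoli before the 21:05 kick-off #NapoliLegia")
print(example)
```

Rules of this kind are cheap to write and easy to inspect, but they only recognize what the lists and patterns anticipate, which is the limitation that motivates the ML-based systems in the second item.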

In our work we are interested in detecting specified events; this relies on specific information and features that are known about the event, such as its type and description, and that are provided by a domain expert for the event context. These features are exploited by adapting traditional information extraction techniques to the characteristics of Twitter messages. In the first stage, the tweets are collected using the REST API provided by Twitter, which allows programmatic access for reading data from the stream. The API calls are customized in order to crawl data according to user-defined rules (keywords, hashtags, user profiles, etc.). In the filter module, the tweets are annotated with locations and temporal expressions using a series of language processing tools for tokenization, part-of-speech tagging, temporal expression extraction, and named entity recognition.

To access Twitter data programmatically, it is necessary to register a trusted application, associated with a user account, that interacts with the Twitter APIs. When a trusted application is registered, Twitter provides the required credentials (consumer keys), which can be used to authenticate the REST calls via the OAuth protocol. To create and manage REST calls, we integrate into the Filter Module a Python wrapper for the Twitter API, called Tweepy, which includes a set of classes enabling the interaction between the system and the Twitter stream. For example, the API class provides access to all the methods of the Twitter RESTful API. Each method can accept various parameters and return responses.

To gather all incoming tweets about a particular event, we call the Streaming API, extending Tweepy's StreamListener class in order to customise the way we process the incoming data. In Listing 1 we simply gather all new tweets with the #anthrax hashtag:

Listing 1 Custom listener that gathers all new tweets with the #anthrax hashtag and saves them in a file

    # Imports as in Tweepy 3.x, where StreamListener is available.
    from tweepy import Stream
    from tweepy.streaming import StreamListener

    class TwitterListener(StreamListener):
        def on_data(self, data):
            # Append each raw JSON message to the output file.
            try:
                with open('anthrax.json', 'a') as f:
                    f.write(data)
                return True
            except BaseException as e:
                print("Error on_data: %s" % str(e))
            return True

        def on_error(self, status):
            # Print the error status and keep the stream running.
            print(status)
            return True

    # auth is the OAuth handler built from the application credentials.
    twitter_stream = Stream(auth, TwitterListener())
    twitter_stream.filter(track=['#anthrax'])

All gathered tweets are then further filtered and passed to the Classification module for deeper textual processing [23, 24, 25].
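Before any further filtering, the collected file has to be read back. The sketch below assumes that the file holds one JSON-encoded message per line and that tweet objects carry a `text` field; the helper names `load_tweets` and `texts` are hypothetical and are not part of Tweepy or of the framework.

```python
import json


def load_tweets(path: str) -> list:
    """Read a file with one JSON-encoded message per line, skipping bad lines."""
    tweets = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines between messages
            try:
                tweets.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # ignore truncated or corrupt records
    return tweets


def texts(tweets: list) -> list:
    """Keep the text of records that have one (other stream notices do not)."""
    return [t["text"] for t in tweets if isinstance(t, dict) and "text" in t]
```

A call such as `texts(load_tweets('anthrax.json'))` would then yield the plain message texts that the later stages work on. Skipping unreadable lines rather than failing is a deliberate choice here, since an append-only capture file can end with a partially written record.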



3.2 Classifying relevant information

In the second stage, the collected and annotated tweets are classified with respect to their relevance. As stated earlier in the paper, event detection from Twitter messages is a challenging task, since the relevant information about a specific event is hidden within a large number of insignificant messages. Thus a classifier must efficiently and accurately filter relevant information. The classifier is trained using a domain-dependent thesaurus that includes a list of relevant terms for event detection. It is responsible for filtering out irrelevant messages.
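The role of the thesaurus can be illustrated with a deliberately simple relevance scorer. This is a sketch under stated assumptions, not the trained classifier described above: the terms, their hand-set weights, the threshold and the function names are all hypothetical, and the negative weights are one possible way to handle the “anthrax” band ambiguity from Section 2.

```python
import re

# Hypothetical domain thesaurus: term -> weight. Positive weights point to
# the event of interest, negative weights to a known off-topic sense.
THESAURUS = {
    "attack": 2.0, "terror": 2.0, "terrorism": 2.0, "spores": 1.5,
    "exposure": 1.0, "anthrax": 0.5,
    "band": -2.0, "concert": -2.0, "album": -2.0, "tour": -1.5,
}


def relevance(text: str) -> float:
    """Sum the thesaurus weights of the lower-cased alphabetic tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(THESAURUS.get(tok, 0.0) for tok in tokens)


def is_relevant(text: str, threshold: float = 1.0) -> bool:
    """Keep a tweet only if its score reaches the (assumed) threshold."""
    return relevance(text) >= threshold


tweets = [
    "Anthrax spores found in mail, police fear terror attack",
    "Anthrax concert tonight, best metal band ever",
]
labels = [is_relevant(t) for t in tweets]
print(labels)
```

In a trained classifier the weights would be learned from annotated examples instead of being set by hand, with the thesaurus supplying the candidate features.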



3.3 Generating alerts

Once the irrelevant tweets have been discarded, those remaining must be suitably aggregated with respect to an event of interest. The third stage of the proposed system is aimed at detecting anomalies in the event-related set of tweets. Anomalies are patterns in data that do not fit the pattern of expected normal behavior. Anomalies might be induced in the data for a variety of reasons, such as malicious activity (for example cyber-intrusion or terrorist activity), but all of these reasons share the characteristic of being interesting to the analyst. The literature offers several techniques for anomaly detection, drawing on disciplines and approaches such as statistics, machine learning, data mining, information theory, and spectral theory.

In our proposed system, the set of event-related tweets is processed by the alert module, which is responsible for raising an alert when an anomaly appears within the collected tweets. The alert module implements a set of burst detection algorithms in order to generate an alert whenever an event occurs, that is, whenever an anomaly is detected in the analyzed data stream. The detected events are then displayed to the end user through the visualization module. An important aspect is that the analysis can be tuned by the end user for novelty detection, which aims at detecting previously unobserved (emergent, novel) patterns in the data, such as a new topic. The novel patterns are typically incorporated into the normal model after being detected. Moreover, in the tuning phase the parameters of the tweet classification module can be modified when the messages that raised an alert are considered not relevant to the event occurrence.
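The text does not say which burst detection algorithms the alert module implements. As one simple statistical possibility, the sketch below flags a time bin as a burst when its tweet count exceeds the mean of the preceding bins by more than k standard deviations. The function name, the window size, the factor k and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=5, k=3.0):
    """Return indices of bins whose count exceeds mean + k * std of the
    preceding `window` bins (a simple moving-threshold burst detector)."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        # Floor sigma at 1.0 so a flat history does not flag tiny changes.
        threshold = mu + k * max(sigma, 1.0)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Hypothetical number of relevant tweets per minute.
per_minute = [4, 5, 3, 4, 6, 5, 4, 40, 38, 6]
print(detect_bursts(per_minute))  # prints [7]
```

Only the onset of the burst (index 7) is reported: once the spike enters the history window it raises both the mean and the deviation, so the following high bin is no longer flagged. A detector meant to track the whole duration of an event would need to exclude flagged bins from the baseline.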


