3 Scenario 2: Considering Components' Size, Provider, and Type




J. Criado et al.



[Figure: transformed UI in scenario 1 and its architecture A21, with components S1, T2, L1, LL2 and M1]

Fig. 4. Transformed UI in scenario 1 and its architecture



initial UI must be restructured accordingly. Figure 5 shows the three alternatives that can be reached from A21. The architecture A31 replaces the previous map M1 by M2 and uses C3 to contain it. In addition, C3 also contains the previous LL2 and L2 components. The new map is resolved with an M1 component.

The second alternative, A32, replaces the initial map M1 by M3, a map that includes the layer list and legend functionality. The new map is of M1 type. The alternative A33 performs the same replacement as A31 but, in this case, the new map is of M2 type and it is contained in a C3 component.

In this transformation process, we want to show that the approach can be used not only for QAs, but also for constraints. In this case, we consider three constraints from Step 1 as drivers to choose a valid alternative for the stakeholders. As part of Step 2, the metrics used to measure the constraints are:

(a) tsize for components’ size constraints: the total size of the components must be minimized, and architectures with a value over 5 MB are rejected. Thus, we try to improve the response time of the browser by reducing the payload of the web components that must be initialized in the UI.

(b) hp for components’ provider constraints: the homogenization among components’ providers must be maximized because, in this scenario, UIs with a similar representation are preferred over UIs whose components have heterogeneous representations. Using the same provider does not guarantee the desired homogenization, but it becomes more likely when the same entity provides the components.

(c) htype for components’ type constraints: the homogenization among components’ types must be maximized because it is important to offer the maximum degree of consistency in the structure and representation of the UI’s components. Components of the same type offer their functionality in the same manner.
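One possible formalization of the three metrics, written to reproduce the values reported in Fig. 5, is sketched below. The notation is ours, not the authors': for an architecture A with components c_1, ..., c_n, s(c) is a component's size, p(c) its provider and t(c) its type. The htype definition in particular is an assumption inferred from the figure (a component counts when another component of the same type is present).

```latex
\begin{aligned}
\mathit{tsize}(A) &= \sum_{i=1}^{n} s(c_i), \qquad \text{reject } A \text{ if } \mathit{tsize}(A) > 5\,\mathrm{MB},\\
\mathit{hp}(A)    &= \max_{q}\ \frac{\lvert\{\, i : p(c_i) = q \,\}\rvert}{n},\\
\mathit{htype}(A) &= \frac{\lvert\{\, i : \exists\, j \neq i,\ t(c_j) = t(c_i) \,\}\rvert}{n}.
\end{aligned}
```

For example, A31 has three components from Provider 1 and four from Provider 2, which gives hp(A31) = max(3/7, 4/7) ≈ 0.5714.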

Regarding the alternatives of Fig. 5, each architecture achieves the best value for a different metric. In the case of tsize, A32 is the best alternative, with a total size of 925 KB. Focusing on hp, the best alternative is A31, because four of its seven components (M1, M2, L2 and LL2) come from the same provider. With respect to htype, the best alternative is A33, because four of its components (two M2 and two C3) share their type with another component of the architecture. Therefore, model transformation MT4 is chosen when hp is prioritized, whereas MT5 and MT6 are selected when tsize and htype are prioritized, respectively.
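The comparison above can be reproduced with a short script. This is a sketch under stated assumptions, not the authors' implementation: the sizes and providers come from Fig. 5, but the component list of each alternative is our reconstruction (chosen so that the totals and fractions match the figure), the htype reading is the assumed one, and 1 MB is taken as 1024 KB.

```python
from collections import Counter
from fractions import Fraction

# Sizes (KB) and providers of the components, as listed in Fig. 5.
SIZE_KB = {"S1": 400, "T2": 75, "C3": 100, "M1": 200,
           "M2": 200, "M3": 250, "L2": 120, "LL2": 50}
PROVIDER = {"S1": 1, "T2": 1, "C3": 1,
            "M1": 2, "M2": 2, "M3": 2, "L2": 2, "LL2": 2}

# Assumed component lists, reconstructed so that they reproduce the
# totals and fractions of Fig. 5 (a component may appear twice).
ARCHITECTURES = {
    "A31": ["S1", "T2", "C3", "M1", "M2", "L2", "LL2"],
    "A32": ["S1", "T2", "M1", "M3"],
    "A33": ["S1", "T2", "C3", "C3", "M2", "M2", "L2", "LL2"],
}

MAX_TSIZE_KB = 5 * 1024  # reject architectures over 5 MB (1 MB = 1024 KB assumed)


def tsize(arch):
    """Total size of the components in KB (to be minimized)."""
    return sum(SIZE_KB[c] for c in arch)


def hp(arch):
    """Largest share of components coming from a single provider."""
    per_provider = Counter(PROVIDER[c] for c in arch)
    return Fraction(max(per_provider.values()), len(arch))


def htype(arch):
    """Share of components whose type occurs more than once (assumed reading)."""
    per_type = Counter(arch)
    return Fraction(sum(n for n in per_type.values() if n > 1), len(arch))


def best_alternative(metric, minimize=False):
    """Pick the valid architecture that optimizes the prioritized metric."""
    valid = [name for name, arch in ARCHITECTURES.items()
             if tsize(arch) <= MAX_TSIZE_KB]
    pick = min if minimize else max
    return pick(valid, key=lambda name: metric(ARCHITECTURES[name]))


for metric, minimize in ((tsize, True), (hp, False), (htype, False)):
    print(metric.__name__, "->", best_alternative(metric, minimize))
```

Running it prints `tsize -> A32`, `hp -> A31` and `htype -> A33`, which agrees with the text's choice of MT5, MT4 and MT6 for those priorities. Fractions are used so that values such as 4/7 compare exactly.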

4.4 Discussion About Using the Approach in ENIA



Answering the research question stated in Sect. 1, we can say that considering QAs at run-time has improved the modifiability and flexibility of the architectures generated by model transformations in the ENIA case.

The main advantage of using metrics related to QAs and constraints in ENIA is the incorporation of quality information into the process of selecting the best transformation operation to apply during UI adaptation. This allows us to use information beyond the functional interfaces when solving the transformation process. Conversely, if these metrics are not applied, the transformation can generate architectures with drawbacks for their current use or for future modifications.

For example, in the first scenario (see Fig. 3), it is possible to obtain A22 as a solution instead of A21. In this case, we lose the capability of having a session component that can be resized. In contrast, our approach offers the ‘resizability’ of the components through the maximization of the rr metric. Even if rr is not given the highest priority, taking it into account during the adaptation still enriches the run-time transformation and improves the flexibility of the generated UIs.

[Figure: architecture A21 and the three alternatives A31, A32 and A33 reached from it through the model transformations MT4, MT5 and MT6, annotated with the following values]

Component sizes: S1: 400 KB, T2: 75 KB, C3: 100 KB, M1: 200 KB, M2: 200 KB, M3: 250 KB, L2: 120 KB, LL2: 50 KB
Providers: Provider 1: S1, T2, C3; Provider 2: M1, M2, M3, L2, LL2

A31: tsize = 1145 KB, hp = max(3/7, 4/7) = 0.5714, htype = 0/7 = 0.0
A32: tsize = 925 KB, hp = max(2/4, 2/4) = 0.5, htype = 0/4 = 0.0
A33: tsize = 1245 KB, hp = max(4/8, 4/8) = 0.5, htype = 4/8 = 0.5

Fig. 5. Transformation alternatives using three example constraints






With regard to future modifications, let us suppose that in scenario 2 none of the metrics is applied and, consequently, the generated transformation is equivalent to MT5 and the resulting architecture is A32. In this case, if the next adaptation step aims to remove the capability of selecting the layers to be displayed on the map (i.e., the LL provided interface), we face two options: (1) the component M3 must be modified to hide this interface and disable its functionality, or (2) the component M3 must be replaced by another map that does not include this functionality, such as M1 or M2. Both options require additional operations compared with starting the adaptation from architecture A31 or A33, in which case we would only have to remove the component LL2.

Despite these advantages, nothing is free in software engineering, and the performance of the QA-aware model transformation approach is an important trade-off that must be noted. Performance is related to the computation time needed to (a) build each transformation alternative, (b) execute them to obtain the resulting architectures, and (c) measure each architecture to decide which transformation alternative is the best in terms of the quality information. The cost of these three steps must be incorporated into the evaluation of the adaptation process described in [10]. Consequently, it may not be possible to evaluate a large number of alternatives at run-time, and the number of architectures evaluated may have to be limited to satisfy performance requirements.
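One simple way to limit the number of evaluated architectures is a time budget over steps (a) to (c). The sketch below assumes a single scalar quality score where higher is better; the function name and the `execute` and `measure` callables are hypothetical placeholders, not the interface of the adaptation process in [10].

```python
import time
from typing import Any, Callable, Iterable, Optional, Tuple


def select_within_budget(
    alternatives: Iterable[Any],
    execute: Callable[[Any], Any],
    measure: Callable[[Any], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[Any], int]:
    """Evaluate transformation alternatives until the time budget is spent.

    Returns the best alternative seen so far and how many were evaluated.
    """
    start = clock()
    best, best_score, evaluated = None, float("-inf"), 0
    # (a) If `alternatives` is a generator, each one is built only when requested.
    for alternative in alternatives:
        if clock() - start >= budget_s:  # budget spent: keep the best so far
            break
        architecture = execute(alternative)  # (b) obtain the resulting architecture
        score = measure(architecture)        # (c) quality score, higher is better
        evaluated += 1
        if score > best_score:
            best, best_score = alternative, score
    return best, evaluated
```

The injectable `clock` makes the behavior testable with a fake clock. Because unevaluated alternatives are simply skipped, the order in which alternatives are generated matters once the budget is tight.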



5 Conclusions and Future Work



It is well accepted in the software architecture community that QAs are the

most important drivers of architecture design [2]. Therefore, QAs should guide

the selection of alternative software architectures from a model transformation

process, considering the synergies and conflicts among them [5].

This work has analyzed how considering QAs at run-time can improve model transformation processes. Results in the ENIA case, a dashboard UI, show that using a quality-aware architectural transformation at run-time can improve architecturally significant QAs such as modifiability and flexibility. The main contribution of this paper is a quality-aware transformation approach at run-time, which consists of three steps: identifying relevant QAs and constraints, measuring them at run-time, and selecting the best alternative model transformation.

Future work extends in several directions. First, having shown in the ENIA case that quality-aware transformations can improve significant requirements in adaptive dashboard UIs, we can refine the presented set of metrics. Thus, we plan to work further on a reference set of QAs and their corresponding metrics for adaptive dashboard UIs derived from practice, and then provide guidelines for using those metrics (e.g., combinations of metrics). Second, more experiments and reports can be carried out in other adaptive domains besides dashboard UIs. Third, we will study the possibility of handling the QAs during the generation of the alternative architectures to reduce the number of variants. Finally, a formal validation process in terms of execution times and model checking of the generated architectures could improve the proposed approach.



Exploring Quality-Aware Architectural Transformations at Run-Time






Acknowledgments. This work was funded by the Spanish MINECO and the

Andalusian Government under TIN2013-41576-R and P10-TIC-6114 projects.



References

1. ACG: ENIA Project - Development of an intelligent web agent of environmental information. http://acg.ual.es/projects/enia/

2. Ameller, D., Ayala, C., Cabot, J., Franch, X.: Non-functional requirements in architectural decision making. IEEE Softw. 30(2), 61–67 (2013)

3. Ameller, D., Franch, X., Cabot, J.: Dealing with non-functional requirements in

model-driven development. In: RE 2010, pp. 189–198. IEEE (2010)

4. Bencomo, N., Blair, G.: Using architecture models to support the generation and

operation of component-based adaptive systems. In: Cheng, B.H.C., de Lemos, R.,

Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive

Systems. LNCS, vol. 5525, pp. 183–200. Springer, Heidelberg (2009)

5. Boehm, B.: Architecture-based quality attribute synergies and conflicts. In: SAM

2015, pp. 29–34. IEEE Press (2015)

6. Carney, D., Leng, F.: What do you mean by COTS? Finally, a useful answer. IEEE

Softw. 17(2), 83–86 (2000)

7. Carriere, J., Kazman, R., Ozkaya, I.: A cost-benefit framework for making architectural decisions in a business context. In: ICSE 2010, pp. 149–157. IEEE (2010)

8. Criado, J., Iribarne, L., Padilla, N., Ayala, R.: Semantic matching of components at run-time in distributed environments. In: Ciuciu, I., et al. (eds.) OTM 2015 Workshops. LNCS, vol. 9416, pp. 431–441. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26138-6_46

9. Criado, J., Martínez, S., Iribarne, L., Cabot, J.: Enabling the reuse of stored model

transformations through annotations. In: Kolovos, D., Wimmer, M. (eds.) ICMT

2015. LNCS, vol. 9152, pp. 43–58. Springer, Heidelberg (2015)

10. Criado, J., Rodríguez-Gracia, D., Iribarne, L., Padilla, N.: Toward the adaptation of component-based architectures by model transformation: behind smart user

interfaces. Softw. Pract. Exp. 45(12), 1677–1718 (2015)

11. Daniel, F., Matera, M.: Mashups - Concepts Models and Architectures. Springer,

Heidelberg (2014)

12. Insfran, E., Gonzalez-Huerta, J., Abrahão, S.: Design guidelines for the development of quality-driven model transformations. In: Petriu, D.C., Rouquette, N.,

Haugen, Ø. (eds.) MODELS 2010, Part II. LNCS, vol. 6395, pp. 288–302. Springer,

Heidelberg (2010)

13. ISO/IEC: ISO/IEC 25000. Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Guide to SQuaRE (2014)

14. Loniewski, G., Borde, E., Blouin, D., Insfran, E.: An automated approach for architectural model transformations. In: Escalona, M.J., Aragón, G., Linger, H., Lang,

M., Barry, C., Schneider, C. (eds.) Information System Development: Improving

Enterprise Communication, pp. 295–306. Springer, Switzerland (2014)

15. Martínez-Fernández, S., Ayala, C.P., Franch, X., Marques, H.M.: REARM: a reuse-based economic model for software reference architectures. In: Favaro, J., Morisio,

M. (eds.) ICSR 2013. LNCS, vol. 7925, pp. 97–112. Springer, Heidelberg (2013)

16. Ozkaya, I., Nord, R.L., Koziolek, H., Avgeriou, P.: Second international workshop

on software architecture and metrics. In: ICSE 2015, pp. 999–1000. IEEE Press

(2015)






17. Salehie, M., Tahvildari, L.: Self-adaptive software: landscape and research challenges. ACM Trans. Auton. Adapt. Syst. 4(2), 14:1–14:42 (2009)

18. Solberg, A., Oldevik, J., Aagedal, J.Ø.: A framework for QoS-aware model transformation, using a pattern-based approach. In: Meersman, R. (ed.) OTM 2004.

LNCS, vol. 3291, pp. 1190–1207. Springer, Heidelberg (2004)

19. Stevanetic, S., Javed, M.A., Zdun, U.: Empirical evaluation of the understandability of architectural component diagrams. In: WICSA 2014 Companion, pp. 4:1–4:8.

ACM, New York (2014)

20. Weyns, D., Ahmad, T.: Claims and evidence for architecture-based self-adaptation:

a systematic literature review. In: Drira, K. (ed.) ECSA 2013. LNCS, vol. 7957,

pp. 249–265. Springer, Heidelberg (2013)

21. Yu, J., Benatallah, B., Casati, F., Daniel, F.: Understanding mashup development.

IEEE Internet Comput. 12(5), 44–52 (2008)

22. Zimmermann, O.: Metrics for architectural synthesis and evaluation: requirements

and compilation by viewpoint: an industrial experience report. In: SAM 2015, pp.

8–14. IEEE Press (2015)



A Credibility and Classification-Based Approach

for Opinion Analysis in Social Networks

Lobna Azaza¹(✉), Fatima Zohra Ennaji², Zakaria Maamar³,
Abdelaziz El Fazziki², Marinette Savonnet¹, Mohamed Sadgal², Eric Leclercq¹,
Idir Amine Amarouche⁴, and Djamal Benslimane⁵

¹ LE2I Laboratory, Dijon University, Dijon, France
Lobna.Azaza@u-bourgogne.fr
² LISI Laboratory, University of Marrakech, Marrakesh, Morocco
³ Zayed University, Dubai, United Arab Emirates
⁴ USTHB, Algiers, Algeria
⁵ LIRIS Laboratory, University Claude Bernard Lyon 1, Villeurbanne, France



Abstract. There is an ongoing interest in examining users’ experiences made available through social media. Unfortunately, these experiences, such as reviews of products and/or services, are sometimes conflicting and thus do not help develop a concise opinion on these products and/or services. This paper presents a multi-stage approach that extracts and consolidates reviews after addressing specific issues such as user multi-identity and users' limited credibility. A system along with a set of experiments demonstrates the feasibility of the approach.

Keywords: Social media · Reputation · Credibility · Opinion · Multi-identity

1 Introduction



The democratization of the Internet and social media has sustained the growth of different areas ranging from education and health to trade and entertainment. Nowadays, users are “flooded” with a huge volume of content that needs to be filtered so that it can be used properly. Relying on others’ opinions seems to offer good solutions to certain challenges of this democratization. In this context, sentiment analysis has become an important research topic. Advanced techniques and systems should be of great value to those who need to collect and analyze opinions posted on the Web, with a focus on social networks. However, with the abundance of social content, reputation emerges as a decisive element in identifying the most relevant opinions, and hence in developing reputable sources of information over time. Opinions collected from social networks can be “polluted” by users in different ways. Owing to the large number of opinion sources, determining whether these opinions are subjective or objective remains hard [1]. Unfortunately, analyzing the collected data is not an easy task, because these data differ greatly from the traditional data we are familiar with. This contrast shows not only in the size of the extracted data, but also in their noisiness and lack of structure.

© Springer International Publishing Switzerland 2016
L. Bellatreche et al. (Eds.): MEDI 2016, LNCS 9893, pp. 303–316, 2016.
DOI: 10.1007/978-3-319-45547-1_24



1.1 Challenges



This paper tackles two specific challenges: opinion refinement in terms of collection and opinion analysis in terms of content.

Opinions collected from social networks can be “biased”. Some users behave maliciously while others provide subjective feedback. Their goal is to either promote or degrade the reputation of a product/service, sometimes on purpose. For instance, users can evaluate the same products several times from different accounts, or use different identifiers so that they do not reveal their real identities [2–4,10]. This leads to redundant and/or inconsistent opinions. Users can also give opinions about products/services that they have not experienced themselves; instead, they rely on their social friends’ opinions [8] and sometimes on word-of-mouth. Taking opinions into account without considering user credibility has a serious impact on product or service reputation. Hence, when assessing reputation based on opinions, we should first deal with users who have multiple identifiers and second define the credibility of users who express opinions.

An opinion is a judgment, a viewpoint, or a statement about matters commonly considered to be subjective. Thus, opinion mining consists of performing sentiment analysis (i.e., opinion extraction) on large amounts of heterogeneous data (Big Data). Several constraints need to be taken into account when extracting an opinion from sentences written by humans: spam and fake detection (spam occurs when a user misleads peers by posting reviews that contain false positive or malicious negative opinions), bipolar words (some words can have a negative or a positive meaning, as in dark soul versus dark chocolate), and negation (using negation words like “not” and “nor” along with certain prefixes/suffixes like “un-”, “dis-”, and “-less”).

1.2



Contributions



We tackle the above challenges by proposing an approach that extracts an entity’s (e.g., a product’s or a service’s) reputation from social networks. Our contributions include: (i) a model for opinion refinement that detects virtual users who refer to the same user and calculates users’ credibility; (ii) a model for analyzing the refined opinions; and (iii) a validation through experiments, based on random data generation and on varying the criteria considered in the detection of the true users.

The rest of the paper is organized as follows. Section 2 briefly discusses related work. Section 3 is dedicated to the proposed approach, including opinion refinement as a first step to clean the collected data and opinion analysis as a second step. In Sect. 4, we present some experimental results of the proposed system, followed by concluding remarks and future-work perspectives in Sect. 5.



2 Related Work



This section discusses opinion refinement and then opinion analysis. Both are necessary in any attempt to tap into the opportunities of social media.



2.1 Opinion Refinement



Due to the dynamic and open nature of the Web, discovering users with multiple identities is a concern. This concern is tackled in the literature, where different solutions are proposed, such as string-based, stylometric-based, time profile-based, and social network-based ones. These solutions compare different identities to measure the extent to which they refer to the same real user. Unfortunately, the experimental results of these solutions [2] are not satisfactory because they do not combine techniques when comparing users. Multi-identifier detection is also discussed in [3], where the proposed approach relies on supervised learning to determine whether a document is written by a certain author. In [4], multi-identifier users are identified based on a model of communication exchange in a public forum. Hung-Ching et al. observe that posts of multi-identifier users are not as frequent as those from single-identifier users [4]. These users’ posts are correlated and spread out over time. The proposed solution detects the identifiers whose posts display such statistical anomalies and identifies them as coming from multi-identifier users. In [10], Arjun et al. treat multi-identifiers as spammers and model “spamicity” as a latent variable, which allows exploiting various observed behavioral footprints of users. The intuition is that opinion spammers have different behavioral distributions than non-spammers.

Opinion diversity is also important when filtering opinions. The TIDY system [6] ensures source diversity by aggregating similar sources into a single virtual source, based on similarity metrics and learning techniques. Meanwhile, the Truth Finder system uses similarity metrics between sources to choose those that are least similar, and therefore the most representative of all evaluators [11]. Other approaches like [8] focus on source diversity to minimize the dependency between sources and the influence they have on each other. Diversifying sources reduces the number of samples and improves the ability to correctly distinguish facts from rumors.

Some approaches consider the evaluator’s credibility through various types of information to estimate their expertise relative to the subject of study. In fact, the credibility of evaluators is simply associated with their expertise [12]. Others argue that a user’s credibility must be based on evaluating their opinions: users are credible if peers find their opinions reliable. In [7], the authors propose algorithms for continuously updating the evaluations, based on a model that discriminates high-reputation reviewers from low-reputation reviewers.

2.2 Opinions Analysis



Broadly, each company uses two main sources of data: internal and external. Both internal sources (e.g., customer feedback collected from emails and call centers) and external sources (e.g., the Web) contain opinionated documents. Opinion classification aims to evaluate the mood expressed about a particular product or service.

Different solutions have been proposed to extract the polarity of opinionated data; they can be categorized into two main approaches. The first is the knowledge-based approach [23], which searches for word occurrences in the documents. It relies on a predefined dictionary viewed as a set of words with positive or negative sentiment orientations.
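A knowledge-based scorer can be sketched in a few lines. This is an illustration, not the method of [23]: the lexicon below is a hypothetical toy dictionary, and the negation rule (flip the orientation of the next few words after “not”, “nor”, “no” or “never”) is one simple way to handle the negation constraint from Sect. 1.1. It does not handle bipolar words or negating affixes such as “un-”.

```python
import re

# Hypothetical toy lexicon; a real system would load a predefined dictionary.
LEXICON = {"good": 1, "great": 1, "excellent": 1, "reliable": 1,
           "bad": -1, "poor": -1, "terrible": -1, "useless": -1}
NEGATIONS = {"not", "nor", "no", "never"}


def polarity(text: str, window: int = 3) -> int:
    """Sum word orientations, flipping those within `window` tokens after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negated_left = 0  # how many upcoming tokens are still under negation
    for tok in tokens:
        if tok in NEGATIONS:
            negated_left = window
            continue
        value = LEXICON.get(tok, 0)
        if negated_left > 0:
            value = -value
            negated_left -= 1
        score += value
    return score


def label(text: str) -> str:
    """Map the summed score to a polarity class."""
    s = polarity(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

For instance, `label("The battery is not good")` is classified as negative because “good” falls inside the negation window.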

The second approach [24] is based on supervised learning techniques. It trains a statistical classifier on pre-labeled texts and uses it to predict the sentiment orientation of new texts. In [14], a sentiment model is proposed in which two models are created to capture sentiment information and to predict sales performance. In [15], reviews are used to rank products and merchants, using statistical and heuristic-based models to estimate their true quality.
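As an example of the supervised approach, the sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing on pre-labeled texts. Naive Bayes is our choice of a common statistical classifier for illustration; [24] may use a different model, and the training sentences in the tests are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, c in zip(texts, labels):
            self.word_counts[c].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known word.
            lp = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    lp += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Working in log space avoids numeric underflow when texts are long; a production system would also need a held-out evaluation and richer features.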



3 Proposed Approach



Figure 1 illustrates the proposed approach for opinion collection from online

social networks along with the refinement of these opinions based on user credibility. Our approach includes:



Fig. 1. The proposed approach architecture



– A data extraction module: its goal is to extract the important and valuable data from the different social networks.

– An opinion refinement module: it aims to eliminate low-value data from the extracted data. To do so, two main tasks are performed: (i) detecting virtual users who refer to the same user, using different criteria including email addresses, profile names, opinion publication dates, and friends’ lists; we rely on Hierarchical Agglomerative Clustering (HAC) [13] to establish the uniqueness of users; (ii) assessing user credibility, taking into account users’ behaviors, the consistency of their opinions, and the influence of the virtual environment on users.

– An opinion analysis module: it analyzes the refined reviews in order to obtain the general opinion about the targeted product/service. This module includes subjectivity and polarity classification.



3.1 Data Extraction



Data collection refers to all the methods that aim to gather data from online social networks in order to provide an input data source for the system. However, the way this task is accomplished depends mainly on the source (Facebook, Twitter, blogs, etc.) and on how the information is represented on the Web. The results obtained from the execution are used to build the social network and are stored in the Hadoop Distributed File System (HDFS), from where they can be consulted, acquired, or reprocessed by another Hadoop application.

3.2 Opinion Refinement



The aim of refining and filtering opinions is to improve their quality before carrying out any processing, such as computing product reputation (Fig. 2). The opinion filtering module contains three components. The multi-identifier detection component establishes the identifiers of the same user. Afterwards, the opinion data representation component eliminates redundant opinions to produce a heterogeneous graph that connects users, products, evaluation sites, and evaluations together. Finally, the credibility component assigns credibility scores to users by exploiting the principles of opinion consistency, user inter-evaluation, and inter-influence. After each step of the process, the user profile database is updated with the new credibility scores.



Fig. 2. Opinions refinement



A. Multi-identifier Detection. Users of social networks sign up for different accounts for various reasons, such as malicious behavior, so that they can evaluate the same product many times and/or evaluate it from different Web sites. Hence, detecting virtual users who refer to the same person is critical to avoid redundant opinions.

Detection Criteria. In our approach, we propose a multi-identifier detection

model using the following criteria:

– E-mail address. Users, especially non-malicious ones, generally sign up for accounts on different social Web sites using the same e-mail address.






– Profile name. Assuming non-malicious behavior, users generally provide the same profile name on different social Web sites. In [5], Muniba et al. focus on finding similarities in profile names based on orthographic variations to detect multi-identifiers. In [2], Fredrik et al. use the Jaro-Winkler distance to compare profile names.

– Opinion publication dates. Users tend to post evaluations from different logins within a short time period [2]. The Euclidean distance has been used to calculate how similar the publication dates of two identifiers are.

– Friend list. It represents the list of followers or friends connected with

a certain user. Users mostly have similar friend lists in different social

networks [2].
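For the profile-name criterion, [2] relies on the Jaro-Winkler distance. The sketch below is the textbook Jaro-Winkler similarity (prefix scale 0.1, common prefix capped at four characters), written from scratch for illustration; it is not the code used in [2], and the corresponding distance is simply 1 minus the similarity.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused equal character inside the matching window.
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters appearing in a different order are half-transpositions.
    half_transpositions = 0
    k = 0
    for i, ch in enumerate(s1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if ch != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    m = matches
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler similarity: boosts pairs that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

In practice, profile names would typically be normalized (e.g., lowercased) before comparison, and a threshold on the similarity would decide whether two names are considered equal; that threshold is a tuning choice not given in the text.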

Description of the Multi-identifiers Detection Model. A multi-criteria

detection of multiple identifiers aims to generate classes of identifiers. Each

class represents identifiers that probably correspond to the same user. The

aforementioned criteria, taken alone, produce different classifications. Two

identifiers can belong to the same user according to one criterion and to

different users according to another:

1. To detect multi-identifiers using the email-address (respectively, profile name) criterion, we create a class for all the identifiers sharing the same email (respectively, profile name). Neither the number of classes generated by each criterion nor the population of these classes is the same.

2. Detection of multi-identifiers with the opinion publication date criterion. We consider two opinions to have the same publication date if the difference between their dates does not exceed a certain threshold. Detection of multi-identifiers via this criterion differs from the previous ones because each user may publish many opinions while having only one profile name and one email address. Therefore, multi-identifier detection via publication dates consists of first comparing the dates of each pair of identifiers i and j, and then calculating a similarity score ωij between i and j. Equation (1) shows how ωij is calculated, where X represents the number of publications in a common time interval for i and j, and Pi and Pj represent, respectively, the total number of publications of i and j. The higher the number of publications on common dates, the higher the probability that the two identifiers belong to the same user.

$$\omega_{ij} = \frac{2X}{\lvert P_i + P_j \rvert}, \qquad 0 \le \omega_{ij} \le 1 \tag{1}$$



The set of similarity scores between pairs of identifiers is represented in a similarity matrix (Fig. 3a). This matrix is then used by HAC techniques to generate classes of similar identifiers (Fig. 3b). HAC is an automatic data-analysis method applied to a set of individuals. Its purpose is to classify individuals with similar behaviors according to a similarity criterion defined in advance: the most similar individuals are put together in homogeneous groups. The classification is agglomerative, since it starts from a situation in which each individual forms a class on its own, and hierarchical, since it produces increasingly large classes.
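The publication-date criterion and the clustering step can be sketched together. Several details are assumptions of ours rather than the authors' choices: dates are day numbers, X is counted by a greedy one-to-one matching of publications whose dates differ by at most the threshold (which keeps ωij within [0, 1]), the HAC uses single linkage, and merging stops when the best inter-class similarity falls below a chosen cut-off. All function names are hypothetical.

```python
from itertools import combinations


def common_publications(dates_i, dates_j, threshold):
    """X: pairs of publications (one from i, one from j) whose dates differ by
    at most `threshold`, found by greedy one-to-one matching on sorted dates."""
    a, b = sorted(dates_i), sorted(dates_j)
    x = p = q = 0
    while p < len(a) and q < len(b):
        if abs(a[p] - b[q]) <= threshold:
            x += 1
            p += 1
            q += 1
        elif a[p] < b[q]:
            p += 1
        else:
            q += 1
    return x


def omega(dates_i, dates_j, threshold):
    """Similarity score of Eq. (1): 2X / (Pi + Pj)."""
    total = len(dates_i) + len(dates_j)
    if total == 0:
        return 0.0
    return 2 * common_publications(dates_i, dates_j, threshold) / total


def similarity_matrix(pubs, threshold):
    """Symmetric similarity matrix (as a dict) over all pairs of identifiers."""
    sim = {}
    for u, v in combinations(pubs, 2):
        sim[(u, v)] = sim[(v, u)] = omega(pubs[u], pubs[v], threshold)
    return sim


def hac(ids, sim, stop_at):
    """Single-linkage agglomerative clustering: start from singleton classes and
    merge the two most similar classes until the best link is below `stop_at`."""
    clusters = [{i} for i in ids]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for (x, cx), (y, cy) in combinations(enumerate(clusters), 2):
            link = max(sim[(u, v)] for u in cx for v in cy)  # single linkage
            if link > best:
                best, pair = link, (x, y)
        if best < stop_at:
            break
        x, y = pair
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x] |= merged
    return clusters
```

For example, with publications `{"u1": [1.0, 5.0, 9.0], "u2": [1.5, 5.2, 9.1], "u3": [20.0, 30.0]}` and a one-day threshold, u1 and u2 reach ωij = 1.0 and end up in the same class, while u3 stays alone. Other linkage criteria (complete, average) would plug into the same loop.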


