3 Scenario 2: Considering Components' Size, Provider, and Type
J. Criado et al.
Fig. 4. Transformed UI in scenario 1 and its architecture
initial UI must be restructured accordingly. Figure 5 shows the three alternatives
that can be reached from A21. The architecture A31 replaces the previous map
M1 by M2 and uses C3 for containing it. In addition, C3 also contains the previous
LL2 and L2 components. The new map is resolved with an M1 component.
The second alternative, A32, replaces the initial map M1 by M3, a map which
includes the layer list and legend functionality. The new map is of type M1. The
alternative A33 includes the same replacement as A31 but, in this case, the new
map is of type M2 and it is contained in a C3 component.
In this transformation process, we want to show that the approach can be
used not only for QAs, but also for constraints. In this case, we consider three
constraints from Step 1 as drivers to choose a valid alternative for the stakeholders.
As a part of Step 2, the metrics used to measure the constraints are:
(a) tsize for components’ size constraints – The total size of components must
be minimized and architectures with a value over 5 MB will be rejected.
Thus, we try to improve the response time of the browser by reducing the
payload of the web components that must be initialized in the UI.
(b) hp for components’ provider constraints – The homogenization among components’ providers must be maximized because, in this scenario, UIs with
similar representation are preferred over components with heterogeneous
representations. The use of the same provider does not guarantee the pursued homogenization, but the possibilities are greater if the entity providing
the components is the same.
(c) htype for components’ type constraints – The homogenization among components’ types must be maximized because it is important to oﬀer the maximum degree of consistency in the structure and representation of the UI’s
components. Therefore, components of the same type oﬀer their functionality in the same manner.
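For illustration, the three metrics can be sketched as follows. This is a minimal sketch, not the authors’ implementation: the component dictionaries, the attribute names (size_mb, provider, type), and the selection helper are assumptions; only the 5 MB rejection threshold and the minimize/maximize directions come from the description above.

```python
from collections import Counter

MAX_TOTAL_SIZE_MB = 5.0  # architectures above this threshold are rejected

def tsize(components):
    """Total payload size (MB) of the architecture's components."""
    return sum(c["size_mb"] for c in components)

def hp(components):
    """Provider homogenization: share of components from the dominant provider."""
    counts = Counter(c["provider"] for c in components)
    return max(counts.values()) / len(components)

def htype(components):
    """Type homogenization: share of components of the dominant type."""
    counts = Counter(c["type"] for c in components)
    return max(counts.values()) / len(components)

def best_alternative(alternatives, driver):
    """Pick the alternative that optimizes the prioritized constraint metric."""
    valid = {name: comps for name, comps in alternatives.items()
             if tsize(comps) <= MAX_TOTAL_SIZE_MB}
    if driver == "tsize":                      # minimized
        return min(valid, key=lambda n: tsize(valid[n]))
    metric = hp if driver == "hp" else htype   # maximized
    return max(valid, key=lambda n: metric(valid[n]))
```

With the Fig. 5 numbers, an architecture where four of its seven components share a provider yields hp = 4/7 ≈ 0.57.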
Regarding the alternatives of Fig. 5, each architecture achieves the best
value for a different metric. In the case of tsize, the value of 925 MB from A32
is the best alternative. Focusing on hp, the best alternative is the architecture
A31 because it gathers four components (M1, M2, L2 and LL2) from the same
provider out of a total of seven. With respect to htype, the best alternative is
A33, because it contains four components (two M2 and two C3) having elements
of the same type in the architecture. Therefore, model transformation MT4 is
chosen when hp is prioritized, whereas MT5 and MT6 are selected when tsize
and htype are prioritized, respectively.
Discussion About Using the Approach in ENIA
Answering the research question stated in Sect. 1, we can say that considering
QAs at run-time has improved the modifiability and flexibility of the
architectures generated by model transformations in the ENIA case.
The main advantage of using metrics related to QAs and constraints in ENIA
is the incorporation of quality information in the process of selecting the best
transformation operation that can be applied in UI adaptation. This allows us to
use additional information (beyond functional interfaces) when resolving the transformation process. In this sense, if these metrics are not applied, the transformation
can generate architectures which may present drawbacks for their current
use or future modifications.
For example, looking at the ﬁrst scenario (see Fig. 3), it is possible to obtain
A22 as a solution instead of A21. In this case, we are ‘losing’ the capability
of having a session component which can be resized. By contrast, using
our approach we are able to offer ‘resizability’ of the components through the
maximization of the rr metric. If we do not give the maximum priority to rr
but still take it into account in the adaptation, at least the transformation at
run-time will be enriched to improve the flexibility of the generated UIs.
Fig. 5. Transformation alternatives using three example constraints
With regard to future modifications, let us suppose that in scenario 2
none of the metrics is applied and, consequently, the generated transformation
is equivalent to MT5 and the resulting architecture is A32. In this case, if the next
adaptation step aims to remove the capability of selecting the layers to be
displayed on the map (i.e., the LL provided interface), we face two options: (1)
the component M3 must be modified to hide this interface and disable its
functionality, or (2) the component M3 must be replaced by another map which
does not include this functionality, such as M1 or M2. In both options, we
have to perform additional operations compared to those required when
starting the adaptation from the architectures A31 or A33, a scenario in which we
would only have to remove the component LL2.
Apart from these advantages, nothing is free in software engineering, and the
performance of the QA-aware model transformation approach is an important
trade-off that must be noted. Performance relates to the computation time
necessary to (a) build each transformation alternative, (b) execute it to obtain the resulting architecture, and (c) measure each architecture to decide which
transformation alternative is the best in terms of the quality information. The
cost of these three execution times must be incorporated into the evaluation of the
adaptation process described in , and, consequently, it may not be possible to
evaluate a large number of alternatives at run-time, so the number of
architectures evaluated has to be limited to satisfy performance requirements.
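The performance cost can be kept bounded by capping how many alternatives are built, executed, and measured per adaptation step. The sketch below is an assumption about how such a cap could be wired in; build, execute, and measure stand for the three phases (a)–(c) and are supplied by the caller.

```python
import time

MAX_ALTERNATIVES = 10  # bound evaluated variants to satisfy performance requirements

def evaluate_alternatives(candidates, build, execute, measure):
    """Build, execute, and measure a bounded number of transformation
    alternatives, returning the best-scoring one and the time spent."""
    start = time.perf_counter()
    best, best_score = None, float("-inf")
    for spec in candidates[:MAX_ALTERNATIVES]:
        transformation = build(spec)            # (a) build the alternative
        architecture = execute(transformation)  # (b) obtain the resulting architecture
        score = measure(architecture)           # (c) quality measurement
        if score > best_score:
            best, best_score = spec, score
    return best, time.perf_counter() - start
```

The returned elapsed time can feed directly into the evaluation of the adaptation process discussed above.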
Conclusions and Future Work
It is well accepted in the software architecture community that QAs are the
most important drivers of architecture design . Therefore, QAs should guide
the selection of alternative software architectures from a model transformation
process, considering the synergies and conﬂicts among them .
This work has analyzed how considering QAs at run-time can improve model
transformation processes. Results in the ENIA case, a dashboard UI, show
that using a quality-aware architectural transformation at run-time can improve
architecturally significant QAs such as modifiability and flexibility. The main contribution of this paper is a quality-aware transformation approach at run-time,
which consists of three steps: identifying relevant QAs and constraints, measuring them at run-time, and selecting the best alternative model transformation.
Future work spans several directions. First, having verified in the
ENIA case that quality-aware transformations can improve significant requirements in adaptive dashboard UIs, the presented set of metrics can be refined.
Thus, we plan to work further on a reference set of QAs and their corresponding
metrics for adaptive dashboard UIs drawn from practice, and then provide guidelines
for using those metrics (e.g., combinations of metrics). Second, more experiments can be conducted and reported in other adaptive domains besides dashboard
UIs. Third, we will study the possibility of handling the QAs during the generation of the alternative architectures to reduce the number of variants. Finally, a
formal validation process in terms of execution times and model checking of the
generated architectures could improve the proposed approach.
Exploring Quality-Aware Architectural Transformations at Run-Time
Acknowledgments. This work was funded by the Spanish MINECO and the
Andalusian Government under TIN2013-41576-R and P10-TIC-6114 projects.
References

1. ACG: ENIA Project - Development of an intelligent web agent of environmental
2. Ameller, D., Ayala, C., Cabot, J., Franch, X.: Non-functional requirements in architectural decision making. IEEE Softw. 30(2), 61–67 (2013)
3. Ameller, D., Franch, X., Cabot, J.: Dealing with non-functional requirements in
model-driven development. In: RE 2010, pp. 189–198. IEEE (2010)
4. Bencomo, N., Blair, G.: Using architecture models to support the generation and
operation of component-based adaptive systems. In: Cheng, B.H.C., de Lemos, R.,
Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive
Systems. LNCS, vol. 5525, pp. 183–200. Springer, Heidelberg (2009)
5. Boehm, B.: Architecture-based quality attribute synergies and conflicts. In: SAM
2015, pp. 29–34. IEEE Press (2015)
6. Carney, D., Long, F.: What do you mean by COTS? Finally, a useful answer. IEEE
Softw. 17(2), 83–86 (2000)
7. Carriere, J., Kazman, R., Ozkaya, I.: A cost-benefit framework for making architectural decisions in a business context. In: ICSE 2010, pp. 149–157. IEEE (2010)
8. Criado, J., Iribarne, L., Padilla, N., Ayala, R.: Semantic matching of components
at run-time in distributed environments. In: Ciuciu, I., et al. (eds.) OTM 2015
Workshops. LNCS, vol. 9416, pp. 431–441. Springer, Heidelberg (2015). doi:10.
9. Criado, J., Martínez, S., Iribarne, L., Cabot, J.: Enabling the reuse of stored model
transformations through annotations. In: Kolovos, D., Wimmer, M. (eds.) ICMT
2015. LNCS, vol. 9152, pp. 43–58. Springer, Heidelberg (2015)
10. Criado, J., Rodríguez-Gracia, D., Iribarne, L., Padilla, N.: Toward the adaptation of component-based architectures by model transformation: behind smart user
interfaces. Softw. Pract. Exp. 45(12), 1677–1718 (2015)
11. Daniel, F., Matera, M.: Mashups – Concepts, Models and Architectures. Springer,
Heidelberg (2014)
12. Insfran, E., Gonzalez-Huerta, J., Abrahão, S.: Design guidelines for the development of quality-driven model transformations. In: Petriu, D.C., Rouquette, N.,
Haugen, Ø. (eds.) MODELS 2010, Part II. LNCS, vol. 6395, pp. 288–302. Springer,
Heidelberg (2010)
13. ISO/IEC: ISO/IEC 25000. Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Guide to SQuaRE (2014)
14. Loniewski, G., Borde, E., Blouin, D., Insfran, E.: An automated approach for architectural model transformations. In: Escalona, M.J., Aragón, G., Linger, H., Lang,
M., Barry, C., Schneider, C. (eds.) Information System Development: Improving
Enterprise Communication, pp. 295–306. Springer, Switzerland (2014)
15. Martínez-Fernández, S., Ayala, C.P., Franch, X., Marques, H.M.: REARM: a reuse-based economic model for software reference architectures. In: Favaro, J., Morisio,
M. (eds.) ICSR 2013. LNCS, vol. 7925, pp. 97–112. Springer, Heidelberg (2013)
16. Ozkaya, I., Nord, R.L., Koziolek, H., Avgeriou, P.: Second international workshop
on software architecture and metrics. In: ICSE 2015, pp. 999–1000. IEEE Press
(2015)
17. Salehie, M., Tahvildari, L.: Self-adaptive software: landscape and research challenges. ACM Trans. Auton. Adapt. Syst. 4(2), 14:1–14:42 (2009)
18. Solberg, A., Oldevik, J., Aagedal, J.Ø.: A framework for QoS-aware model transformation, using a pattern-based approach. In: Meersman, R. (ed.) OTM 2004.
LNCS, vol. 3291, pp. 1190–1207. Springer, Heidelberg (2004)
19. Stevanetic, S., Javed, M.A., Zdun, U.: Empirical evaluation of the understandability of architectural component diagrams. In: WICSA 2014 Companion, pp. 4:1–4:8.
ACM, New York (2014)
20. Weyns, D., Ahmad, T.: Claims and evidence for architecture-based self-adaptation:
a systematic literature review. In: Drira, K. (ed.) ECSA 2013. LNCS, vol. 7957,
pp. 249–265. Springer, Heidelberg (2013)
21. Yu, J., Benatallah, B., Casati, F., Daniel, F.: Understanding mashup development.
IEEE Internet Comput. 12(5), 44–52 (2008)
22. Zimmermann, O.: Metrics for architectural synthesis and evaluation: requirements
and compilation by viewpoint: an industrial experience report. In: SAM 2015, pp.
8–14. IEEE Press (2015)
A Credibility and Classification-Based Approach
for Opinion Analysis in Social Networks
Lobna Azaza1(B) , Fatima Zohra Ennaji2 , Zakaria Maamar3 ,
Abdelaziz El Fazziki2 , Marinette Savonnet1 , Mohamed Sadgal2 , Eric Leclercq1 ,
Idir Amine Amarouche4 , and Djamal Benslimane5
LE2I Laboratory, Dijon University, Dijon, France
LISI Laboratory, University of Marrakech, Marrakesh, Morocco
Zayed University, Dubai, United Arab Emirates
USTHB, Algiers, Algeria
LIRIS Laboratory, University Claude Bernard Lyon 1, Villeurbanne, France
Abstract. There is an ongoing interest in examining users’ experiences
made available through social media. Unfortunately, these experiences,
like reviews on products and/or services, are sometimes conflicting and
thus do not help develop a concise opinion on these products and/or
services. This paper presents a multi-stage approach that extracts and
consolidates reviews after addressing specific issues such as user multi-identity and limited user credibility. A system along with a set of experiments demonstrates the feasibility of the approach.
Keywords: Social media · Reputation · Credibility · Opinion · Multi-identity

1 Introduction
The democratization of the Internet and social media has sustained the growth
of diﬀerent areas ranging from education and health to trade and entertainment. Nowadays users are “ﬂooded” with a huge volume of content that needs
to be filtered so that it can be used properly. Relying on others’ opinions seems
to offer good solutions to certain challenges of this democratization. In this
context, sentiment analysis has become an important research topic. Advanced
techniques and systems should be of great value to those who need to collect
and analyze opinions posted on the Web, with a focus on social networks. However, with the abundance of social content, reputation surfaces as a decisive element in identifying the most relevant opinions, and hence developing reputable
sources of information over time. Opinions collected from social networks can be
“polluted” by users in different ways. Owing to the large number of opinion
sources, determining whether these opinions are subjective or objective
remains hard . Unfortunately, analyzing the collected data is not an easy task
because such data differ greatly from the traditional data that we are
familiar with. This contrast shows not only in the size of the extracted
data, but also in its noisiness and lack of structure.
© Springer International Publishing Switzerland 2016
L. Bellatreche et al. (Eds.): MEDI 2016, LNCS 9893, pp. 303–316, 2016.
DOI: 10.1007/978-3-319-45547-1 24
L. Azaza et al.
This paper tackles two specific challenges: opinion refinement in terms of collection and opinion analysis in terms of content.
Opinions collected from social networks can be “biased”. Some users have
malicious behaviors while others provide subjective feedback. The objective is
often to purposely promote or degrade the reputation of a product/service.
For instance, users can evaluate the same products several times from
different accounts or use different identifiers so as not to reveal their real
identities [2–4,10]. This leads to redundant and/or inconsistent opinions. Users
can also provide opinions about products/services that they have not even experienced themselves. In fact, they rely on social friends’ opinions  and sometimes on word-of-mouth. Taking opinions into account without assessing user credibility has a serious
impact on product or service reputation. Hence, while assessing reputation based
on opinions, we should first deal with users who have multiple identifiers and
second define the credibility of the users who express opinions.
An opinion is a judgment, a viewpoint, or a statement about matters commonly considered to be subjective. Thus, the opinion mining paradigm consists of
performing sentiment analysis (i.e., opinion extraction) on heterogeneous
and large amounts of data (Big Data). Several constraints need to be taken
into account while extracting an opinion from sentences written by
humans: spam and fake detection (needed when a user misleads peers by
posting reviews that contain false positive or malicious negative opinions), bipolar words (some words can have both a negative and a positive meaning, like dark
soul versus dark chocolate), and negation (using negation words like “not” and
“nor” along with certain prefixes/suffixes like “un-”, “dis-”, and “-less”).
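The negation constraint, for instance, can be handled with a simple sign flip in a lexicon-based scorer. The mini-lexicon and the one-word negation scope below are illustrative assumptions; real systems use full sentiment dictionaries and wider negation scopes.

```python
# Hypothetical mini-lexicon; real systems use full sentiment dictionaries.
LEXICON = {"good": 1, "great": 1, "bad": -1, "malicious": -1, "false": -1}
NEGATORS = {"not", "nor", "never"}

def polarity(sentence):
    """Sum word polarities, flipping the sign of a word preceded by a negator."""
    score, negate = 0, False
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False
    return score
```

Bipolar words such as “dark” are deliberately absent here: a flat lexicon cannot capture their context dependence.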
We tackle the above challenges by proposing an approach that extracts an entity’s
(e.g., product or service) reputation from social networks. Our contributions
are manifold, including: (i) a model for opinion refinement that detects virtual
users who refer to the same user and calculates user credibility; (ii) a model
for analyzing the refined opinions; and (iii) a validation through experiments.
The validation is based on random data generation and on varying the different criteria
considered in the detection of the true users.
The rest of the paper is organized as follows. Section 2 discusses brieﬂy some
related work. Section 3 is dedicated to the proposed approach including opinions
reﬁnement as a ﬁrst step to clean collected data and opinion analysis as a second
step. In Sect. 4, we present some experimental results of the proposed system
followed by some concluding remarks and future-work perspectives in Sect. 5.
2 Related Work

This section discusses opinion refinement and then opinion analysis. Both are
deemed necessary in any exercise of tapping into the opportunities of social networks.
Due to the dynamic and open nature of the Web, discovering the same users with
multiple identities is a concern. This concern is tackled in the literature and
different solutions have been proposed, such as string-based, stylometric-based, time
profile-based, and social network-based ones. These solutions aim to compare different identities to measure to what extent they refer to the same real user.
Unfortunately, the experimental results of these solutions  are not satisfactory
because they do not combine techniques when comparing users. Multi-identifier detection is also discussed in , where an approach is proposed that relies
on supervised learning to determine whether a document is written by a certain
author. In , multi-identifier users are identified based on a model of communication exchange in a public forum. Hung-Ching et al. observe that posts of
multi-identifier users are not as frequent as those from single-identifier users .
These users’ posts are correlated and spread out over time. The proposed solution detects the identifiers whose posts display such statistical anomalies and
identifies them as coming from multi-identifier users. In , Arjun et al. treat
multi-identifier users as spammers, modeling “spamicity” as a latent variable and exploiting various observed behavioral footprints of users. The intuition is that opinion
spammers have different behavioral distributions than non-spammers.
Opinion diversity is also important when filtering opinions. The TIDY system  handles source diversity by aggregating similar sources into a single virtual source. Such aggregation is based on similarity metrics and learning techniques. Meanwhile, the Truth Finder system uses similarity metrics between sources
to choose those that are less similar, and therefore the most representative of all
evaluators . Other approaches like  focus on source diversity to minimize
dependency between sources and the influence that they have on each other.
Diversified sources reduce the number of samples and improve the ability to
correctly distinguish facts from rumors.
Some approaches consider the evaluator’s credibility through various types
of information to estimate the expertise relative to the subject of study. In
fact, the credibility of evaluators is simply associated with their expertise .
Others argue that a user’s credibility must be based on the evaluation of their opinions.
Users are credible if peers find their opinions reliable. In , the authors propose
algorithms for a continuous update of the evaluations, based on a model
that discriminates high-reputation reviewers from low-reputation ones.
Broadly, each company uses two main sources of data: internal and external.
Both internal sources (e.g., customer feedback collected from emails and call
centers) and external ones (e.g., the Web) contain opinionated documents. Opinion classification aims to evaluate the mood expressed about a particular product or service.
Different solutions have been proposed to extract the polarity of
opinionated data; they can be categorized into two main approaches. The knowledge-based approach  searches for word occurrences in the documents. It
relies on a predefined dictionary viewed as a set of words with positive or negative polarity.
The second approach  is based on supervised learning techniques. It aims
to train a statistical classifier on pre-labeled texts and use it to predict the
sentiment orientation of new texts. In , a sentiment model is presented in which two
models were created in order to capture sentiment information and to predict
sales performance. In , reviews were used to rank products and merchants
using statistic- and heuristic-based models for estimating their true quality.
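As an illustration of the supervised route, the sketch below hand-rolls a multinomial Naive Bayes classifier over bags of words. It is a toy stand-in, not one of the cited systems; the whitespace tokenization and Laplace smoothing are simplifying assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Minimal multinomial Naive Bayes over bag-of-words, for illustration only."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        words = text.lower().split()
        best_label, best_logp = None, float("-inf")
        total = sum(self.label_counts.values())
        for label in self.label_counts:
            logp = math.log(self.label_counts[label] / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                # Laplace smoothing keeps unseen words from zeroing the score.
                logp += math.log((self.word_counts[label][w] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Trained on a handful of labeled reviews, it predicts the sentiment orientation of new texts exactly as the paragraph above describes.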
Figure 1 illustrates the proposed approach for opinion collection from online
social networks along with the reﬁnement of these opinions based on user credibility. Our approach includes:
Fig. 1. The proposed approach architecture
– A data extraction module: its goal is to extract the important and valuable
data from the different social networks.
– An opinion refinement module: it aims to eliminate the valueless data from
the extracted set. To do so, two main tasks are performed: (i) detecting virtual users who refer to the same user using different criteria, including email
addresses, profile names, opinion publication dates, and friends’ lists; we rely
on Hierarchical Agglomerative Clustering (HAC)  to establish the uniqueness of users; and (ii) assessing user credibility, taking into account their behaviors, the consistency of opinions, and the influence of the virtual environment on them.
– An opinion analysis module: it analyzes the refined reviews in order
to derive the general opinion about the targeted product/service. This module
includes subjectivity and polarity classification.
Opinion Analysis in Social Networks
Data collection refers to all the methods that aim to gather data from online
social networks in order to provide an input data source for the system. The
way this task is accomplished depends especially on the source (Facebook,
Twitter, blogs, etc.) and on how the information is represented on the Web.
The results obtained from the execution are used to build the social network
and are stored in the Hadoop Distributed File System (HDFS), from where they
can be consulted, retrieved, or even reprocessed with another Hadoop application.
The aim of refining and filtering opinions is to improve their quality prior to
carrying out any processing, such as computing product reputation (Fig. 2). The opinion filtering module contains three components. The multi-identifier detection component
establishes the identifiers of the same user. Afterwards, the opinion data representation component eliminates redundant opinions to produce a heterogeneous
graph that connects users, products, evaluation sites, and evaluations together.
Finally, the credibility component assigns credibility scores to users by exploiting the principles of opinion consistency and user inter-evaluation and inter-influence. After each step of the process, the user profile database is updated
with new credibility scores.
Fig. 2. Opinions reﬁnement
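Restricting attention to the simplest criterion, the first two components of the refinement pipeline can be sketched as follows. The tuple layout of opinions and the e-mail-only grouping are illustrative assumptions; the full model combines several criteria and adds credibility scoring.

```python
def detect_multi_identifiers(users):
    """Group identifiers that share an e-mail address (simplest criterion)."""
    classes = {}
    for uid, email in users.items():
        classes.setdefault(email, []).append(uid)
    return list(classes.values())

def deduplicate(opinions, identity_classes):
    """Keep one opinion per (real user, product); later duplicates are dropped."""
    owner = {uid: i for i, cls in enumerate(identity_classes) for uid in cls}
    seen, kept = set(), []
    for user_id, product, rating in opinions:
        key = (owner[user_id], product)
        if key not in seen:
            seen.add(key)
            kept.append((user_id, product, rating))
    return kept
```

A second evaluation of the same product from a second account of the same user is thus discarded before any reputation computation.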
A. Multi-identifier Detection. It is known that users in social networks sign up
for different accounts for different reasons, such as malicious behavior, so that they can
evaluate the same product many times and/or evaluate the same product from
different Web sites. Hence, detecting virtual users who refer to the same person
is critical to avoid redundant opinions.
Detection Criteria. In our approach, we propose a multi-identiﬁer detection
model using the following criteria:
– E-mail address. Users, especially non-malicious ones, generally sign up for
accounts on different social Web sites using the same e-mail address.
– Profile name. Assuming the non-malicious behavior of users, they generally provide the same profile name on different social Web sites. In ,
Muniba et al. focus on finding similarities in profile names based on orthographic variations to detect multi-identifiers. In , Fredrik et al. use the
Jaro-Winkler distance to compare profile names.
– Opinion publication dates. Users tend to post evaluations from different
logins within a short time period . The Euclidean distance has been used
to calculate how similar the publication dates of two identifiers are.
– Friend list. It represents the list of followers or friends connected with
a certain user. Users mostly have similar friend lists on different social
networks.
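The criteria above can be combined into a single pairwise score. The sketch below is an assumption about one such combination: the weights are arbitrary, and difflib’s ratio is used as a dependency-free stand-in for the Jaro-Winkler distance mentioned above.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Stand-in for Jaro-Winkler: normalized edit similarity of profile names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def same_user_score(p1, p2, w_email=0.5, w_name=0.3, w_friends=0.2):
    """Weighted combination of the detection criteria (weights are illustrative)."""
    email = 1.0 if p1["email"] == p2["email"] else 0.0
    name = name_similarity(p1["name"], p2["name"])
    f1, f2 = set(p1["friends"]), set(p2["friends"])
    friends = len(f1 & f2) / len(f1 | f2) if f1 | f2 else 0.0  # Jaccard overlap
    return w_email * email + w_name * name + w_friends * friends
```

A high combined score suggests that two identifiers belong to the same real user even when no single criterion is conclusive on its own.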
Description of the Multi-identifier Detection Model. A multi-criteria
detection of multiple identifiers aims to generate classes of identifiers. Each
class represents identifiers that probably correspond to the same user. The
aforementioned criteria, taken alone, produce different classifications: two
identifiers can belong to the same user according to one criterion and to
different users according to another :
1. In order to detect multi-identifiers using the email-address/profile-name criterion, we create a class for all the identifiers sharing the same email address
(respectively, profile name). Neither the number of classes generated by each criterion nor the population of the classes is the same.
2. Detection of multi-identifiers with the opinion publication date criterion. We
consider two opinions to have the same publication date if the difference
between their dates does not exceed a certain threshold. Detection of multi-identifiers
via this criterion differs from the previous one because each user may
publish many opinions while having a single profile name and email
address. Therefore, the principle of multi-identifier detection via publication dates consists, first, of comparing the dates between each pair of identifiers i and j and then calculating a similarity score ωij between i and j.
Equation (1) indicates how ωij is calculated, where X represents the number of publications in a common time interval for i and j, and the variables Pi
and Pj represent, respectively, the total number of publications for i and
j. The higher the number of publications on common dates, the higher
the probability that the two identifiers are from the same user.
ωij = X / |Pi + Pj|,   0 ≤ ωij ≤ 1                        (1)
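A sketch of computing ωij over publication timestamps follows. Here X is approximated as the number of publication pairs falling within a shared time window; the day-based integer timestamps and the window threshold are illustrative assumptions.

```python
def similarity_score(pubs_i, pubs_j, threshold_days=1):
    """ω_ij = X / |P_i + P_j|: shared-window publications over total publications."""
    x = sum(1 for di in pubs_i for dj in pubs_j
            if abs(di - dj) <= threshold_days)  # publications in a common interval
    return x / (len(pubs_i) + len(pubs_j))

def similarity_matrix(publications):
    """Pairwise ω scores over a dict {identifier: [publication days]}."""
    ids = list(publications)
    return {(i, j): similarity_score(publications[i], publications[j])
            for a, i in enumerate(ids) for j in ids[a + 1:]}
```

Identifiers that always publish in the same windows score highest, matching the intuition stated above.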
The set of similarity scores between pairs of identifiers is represented
in a similarity matrix (Fig. 3a). This matrix is then used via HAC techniques
to generate classes of similar identifiers (Fig. 3b). HAC is automatic and is used
in the analysis of data from a set of individuals. Its purpose
is to classify individuals with similar behaviors according to a similarity criterion
defined in advance. The most similar individuals are put together in
homogeneous groups. The classification is agglomerative, as it starts from a
situation where each individual is alone in its own class, and hierarchical, as it produces increasingly large classes.
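The clustering step can be sketched as a minimal single-linkage HAC over the similarity matrix; the single-linkage choice and the merge threshold are assumptions, since the text does not fix them.

```python
def hac_classes(ids, sim, threshold=0.5):
    """Minimal single-linkage agglomerative clustering: start with singleton
    classes and repeatedly merge the most similar pair above the threshold."""
    classes = [{i} for i in ids]

    def link(c1, c2):  # single linkage: best similarity across the two classes
        return max(sim.get((a, b), sim.get((b, a), 0.0)) for a in c1 for b in c2)

    while len(classes) > 1:
        (x, y), best = max(
            (((i, j), link(classes[i], classes[j]))
             for i in range(len(classes)) for j in range(i + 1, len(classes))),
            key=lambda t: t[1])
        if best < threshold:
            break
        classes[x] |= classes[y]
        del classes[y]
    return classes
```

Each resulting class groups the identifiers that probably belong to the same real user, which is exactly the input the deduplication stage needs.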