3 Second Experiment: SMS Spam Filtering with Polarity Score


In this way, we identified the best 6 classifiers from Table 4 and combined each one with 56 different filter settings. We analyzed the results and selected the classifiers that obtained the best ten results in terms of accuracy. The next step was to apply those classifiers to the new datasets that we created using the sentiment classifiers. Those results are shown in Table 5.

Table 5. Comparison between the Top 10 results (FP = false positives; Acc = accuracy in %). Column groups correspond to the sentiment analyzer used: None, Tb 005, Tb 01 and Tb -005.

Spam classifier                    | None      | Tb 005    | Tb 01     | Tb -005
                                   | FP  Acc   | FP  Acc   | FP  Acc   | FP  Acc
NBMU.i.c.stwv.go.ngtok             | 28  98.85 | 36  98.73 | 36  98.73 | 35  98.74
NBMU.i.t.c.stwv.go.ngtok           | 27  98.82 | 17  98.60 | 16  98.71 |  8  98.76
NBM.i.t.c.stwv.go.ngtok            | 32  98.78 | 37  98.74 | 37  98.74 | 33  98.78
NBMU.i.t.c.stwv.go.ngtok.stemmer   | 23  98.78 | 36  98.71 | 36  98.71 | 34  98.74
NBM.c.stwv.go.wtok                 | 13  98.76 | 33  98.78 | 32  98.80 | 28  98.85
NBM.i.t.c.stwv.go.ngtok.stemmer    | 34  98.76 | 34  98.74 | 33  98.74 | 32  98.76
NBMU.c.stwv.go.wtok                | 13  98.76 | 17  98.60 | 16  98.71 |  8  98.76
CNB.i.t.c.stwv.go.ngtok.stemmer    | 37  98.73 | 28  98.85 | 28  98.85 | 27  98.82
NBM.i.c.stwv.go.ngtok              | 37  98.73 | 26  98.85 | 25  98.87 | 22  98.91
NBM.i.c.stwv.go.ngtok.stemmer      | 36  98.73 | 23  98.80 | 22  98.82 | 19  98.82



The table shows that a higher accuracy than in the previous experiment is obtained by applying the new filter settings to the original SMS dataset. Analyzing the data, we realize that in half of the cases polarity helps to improve the accuracy, and also that by applying the Bayesian Logistic Regression classifier to the dataset created with the TextBlob-005 classifier we improve on the best result. While without polarity the best result is 98.85 %, using the polarity an accuracy of 98.91 % is obtained.

Furthermore, in some cases where better accuracy is not obtained, polarity helps to reduce the number of false positives: an accuracy of 98.76 % with 8 false positives is obtained in two cases, down from 27 false positives in one case and from 13 in the other.
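For illustration, the sentiment-augmented datasets can be thought of as the original messages with an extra polarity feature attached before training the spam classifier. The minimal sketch below assumes the column labels Tb 005, Tb 01 and Tb -005 refer to TextBlob polarity thresholds of 0.05, 0.1 and -0.05; that reading of the labels, and the binary encoding of the feature, are our assumptions rather than details stated in this section.

```python
# Sketch: augment an SMS dataset with a TextBlob polarity feature.
# Assumption: "Tb 005" / "Tb 01" / "Tb -005" denote polarity thresholds
# of 0.05, 0.1 and -0.05 used to binarize the score; not confirmed here.
from textblob import TextBlob

def polarity_feature(message, threshold):
    """Return 1 if the TextBlob polarity of the message exceeds the threshold."""
    score = TextBlob(message).sentiment.polarity  # float in [-1.0, 1.0]
    return int(score > threshold)

def augment_dataset(messages, labels, threshold=0.05):
    """Yield (text, polarity_flag, label) rows for a spam classifier."""
    for text, label in zip(messages, labels):
        yield text, polarity_feature(text, threshold), label

if __name__ == "__main__":
    sms = [("Free entry in 2 a wkly comp to win FA Cup final tkts", "spam"),
           ("Sorry, I'll call later", "ham")]
    texts, labels = zip(*sms)
    for row in augment_dataset(texts, labels, threshold=0.05):
        print(row)
```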



5 Conclusions



This work shows that sentiment analysis can help to improve short message spam filtering. We have demonstrated that adding the polarity obtained from a sentiment analysis of the short text messages improves the result in terms of accuracy in most of the cases. Moreover, we have confirmed our hypothesis by obtaining better results with the polarity score than the top result without polarity (98.91 % versus 98.58 %). Although the difference in percentage does not seem relevant, if we take into account the amount of real SMS traffic, the improvement is significant.

In addition, a substantial improvement in terms of the number of false positive messages has been achieved in this work. For instance, during the first experiment the best accuracy with 0 false positives is obtained using polarity: 98.67 %.

Acknowledgments. This work has been partially funded by the Basque Department of Education, Language policy and Culture under the project SocialSPAM

(PI 2014 1 102).






Preliminary Study on Automatic Recognition of Spatial Expressions in Polish Texts

Michał Marcińczuk, Marcin Oleksy, and Jan Wieczorek

G4.19 Research Group, Department of Computational Intelligence,
Faculty of Computer Science and Management,
Wroclaw University of Technology, Wroclaw, Poland
{michal.marcinczuk,marcin.oleksy,jan.wieczorek}@pwr.edu.pl

Abstract. In this paper we cover the problem of spatial expression recognition in texts in Polish. A spatial expression is a text fragment which describes the relative location of two or more physical objects with respect to each other. The first part of the paper describes a Polish corpus annotated with spatial expressions and the inter-annotator agreement. In the second part we analyse the feasibility of spatial expression recognition by reviewing the relevant tools and resources for text processing for Polish. Then we present a knowledge-based approach which utilizes the existing tools and resources for Polish, including a morpho-syntactic tagger, shallow parsers, a dependency parser, a named entity recognizer, a general ontology, a wordnet and a wordnet-to-ontology mapping. We also present a dedicated set of manually created syntactic and semantic patterns for generating and filtering candidate spatial expressions. In the last part we discuss the results obtained on the reference corpus with the proposed method and present a detailed error analysis.

Keywords: Information extraction · Spatial expressions · Spatial relations

1 Introduction



Spatial information describes the physical location of an object in space. The location of the object can be encoded using absolute values in a coordinate system or by relative references to other entities in the space. The latter are called spatial relations. Spatial relations can be expressed directly by spatial expressions [1] or indirectly by a chain of semantic relations [2]. A comprehensive recognition of spatial relations between objects described in a text requires a complex chain of processing and reasoning, including morphological analysis of the text, recognition of object mentions, parsing, named entity recognition and classification, coreference resolution, and semantic relation recognition and interpretation. Thus, the feasibility and the quality of the task rely on the availability of certain tools and on their performance.

In this article we focus on spatial expressions in which the spatial information is encoded using a spatial preposition. According to our preliminary research this type of spatial relation is predominant in Polish texts (more than 53 % of all expressions), while the second most frequent group, in which the spatial information is carried by a verb, covers only 24 % of all instances.




2 Reference Corpora



To annotate the corpora we followed the guidelines for Task 3: Spatial Role Labeling [1]. In the current research we annotated only four types of elements: trajector (the localized object, henceforth TR), landmark (the object of reference, henceforth LM), spatial indicator (henceforth SI) and region (henceforth RE). The remaining elements (i.e. path, direction, motion) will be included in the future. Below is a brief description of the two sets of documents; Table 1 contains their statistics.

WGT — a set of 50 geographical texts from Polish Wikipedia (Wikipedia Geographical Texts). This type of article contains many spatial relations between objects. The set contains 17,407 tokens and 484 spatial expressions (one expression for every 36 tokens). The set was annotated by two linguists independently and the inter-annotator agreement was measured by means of the Dice coefficient (a minimal sketch of this computation follows the corpus descriptions). The agreement was 82 %. This set of documents was used to define the syntactic patterns (see Sect. 3.3) and the semantic constraints (see Sect. 3.4).

KPWr — 1,526 documents from the KPWr corpus [3]. The set contains 419,769 tokens and 2,581 spatial expressions (one expression for every 162 tokens). The documents were annotated by only one linguist. The set was used to evaluate our method and to perform an error analysis.
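The Dice agreement reported above is a set-overlap measure over the two annotators' marked spans. A minimal sketch of the computation is given below; representing annotations as (start, end, type) tuples matched exactly is our assumption, not a detail specified in the paper.

```python
# Sketch: Dice coefficient between two annotators' span sets.
# Spans are modelled as (start, end, type) tuples; the exact matching
# criterion used for WGT is an assumption here.
def dice(annotations_a: set, annotations_b: set) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|)."""
    if not annotations_a and not annotations_b:
        return 1.0
    overlap = len(annotations_a & annotations_b)
    return 2.0 * overlap / (len(annotations_a) + len(annotations_b))

a = {(0, 5, "TR"), (10, 14, "SI"), (15, 22, "LM")}
b = {(0, 5, "TR"), (10, 14, "SI"), (16, 22, "LM")}
print(f"{dice(a, b):.2f}")  # 0.67: two of the three spans match exactly
```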

Table 1. Statistics of the corpora annotated with spatial expressions

                      | WGT     | KPWr
Documents             | 50      | 1,526
Tokens                | 17,407  | 419,769
Spatial expressions   | 484     | 2,581
Annotations:
  Spatial object      | 1,212   | 5,050
  Spatial indicator   | 743     | 2,615
  Region              | 114     | 149

3 Recognition of Spatial Expressions

3.1 Procedure



We assume that the text will be preprocessed with a morphological tagger, a shallow parser, a dependency parser, a named entity recognizer and a word sense disambiguation tool (see Sect. 3.2). We will also use a wordnet for Polish, an ontology and a mapping between the wordnet and the ontology. We will use a set of syntactic patterns to identify spatial expression candidates, i.e., tuples containing a trajector (TR), a spatial preposition (SI), a landmark (LM) and optionally a region (RE) (see Sect. 3.3). In the last step, the set of generated candidates will be tested against a set of semantic constraints (see Sect. 3.4).
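The procedure is thus a generate-and-filter pipeline: syntactic patterns propose (TR, SI, LM[, RE]) tuples and semantic constraints discard most of them. A minimal sketch of that control flow follows; every name in it is an illustrative stand-in rather than part of the authors' implementation, and the toy constraint checks only on the preposition, whereas the real schemes operate on SUMO classes.

```python
# Sketch of the two-stage procedure: syntactic patterns generate
# (TR, SI, LM[, RE]) candidates, semantic constraints then filter them.
# All names are illustrative stand-ins, not the authors' code.
from typing import Callable, Iterable, NamedTuple, Optional

class Candidate(NamedTuple):
    trajector: str                 # TR
    indicator: str                 # SI, the spatial preposition
    landmark: str                  # LM
    region: Optional[str] = None   # RE, optional

def recognize(candidates: Iterable[Candidate],
              constraints: Iterable[Callable[[Candidate], bool]]):
    """Stage 2: keep a candidate if at least one cognitive scheme accepts it."""
    checks = list(constraints)
    return [c for c in candidates if any(check(c) for check in checks)]

# Toy run with one scheme that accepts the preposition "na" (on):
accept_na = lambda c: c.indicator == "na"
cands = [Candidate("kot", "na", "dachu"),          # "a cat on the roof"
         Candidate("Piotr", "przed", "godziną")]   # temporal, not spatial
print(recognize(cands, [accept_na]))
```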



3.2 Preprocessing



Different ways of expressing spatial relations require specialized tools and resources to make the task feasible. The basic text processing, which includes text segmentation, morphological analysis and disambiguation, can easily be performed with any of the existing taggers for Polish, i.e., WCRFT [4], Concraft [5] or Pantera [6]. The accuracy of the taggers is satisfactory and varies between 89–91 %.

First we need to identify the relevant entity mentions. The mentions can be: named entities, nominal phrases, pronouns and null verbs (verbs which do not have an explicit subject, cf. [7]). The spans of entity mentions can be recognized using a shallow parser for Polish, i.e., Spejd [8] with the NKJP grammar [9] or IOBBER [10]. Spejd recognizes a flat structure of nominal groups (NG) with their semantic and syntactic heads. A noun group preceded by a preposition is marked, together with the preposition, as a prepositional nominal group (PrepNG). Every noun and pronoun creates a separate nominal group. The only exception is a sequence of nouns, which is annotated as a single nominal group. IOBBER also recognizes a flat structure of nominal phrases (NP). A nominal phrase is defined as a phrase which is a subject or an object of a predicate-argument structure. This means that an NP can contain several NGs. For example, "mężczyzna siedzący w piwnicy" (a man sitting in the basement) is a single NP that contains two NGs: "mężczyzna" (a man) and "piwnicy" (the basement), the latter as part of the PrepNG "w piwnicy". Spejd combined with IOBBER can be used to identify expressions with a spatial preposition within a single NP. According to [11] the NKJP grammars evaluated on the NKJP corpus obtained 78 % precision and 81 % recall in the recognition of NGs, PrepNGs, NumNGs and PrepNumNGs. IOBBER evaluated on the KPWr corpus obtained 74 % precision and 74 % recall in the recognition of NPs [10].

Second we need to categorize the entities into physical and non-physical. For nominal phrases this can be done using a mapping between plWordNet [12] and the SUMO ontology [13]. The mapping contains more than 175,000 links between plWordNet synsets and SUMO concepts. Other types of mentions (i.e., named entities, pronouns and null verbs) require additional processing. Most named entities are not present in plWordNet, so they cannot be mapped onto SUMO through the mapping. However, they can be mapped via their categories, which can be recognized using one of the named entity recognition tools for Polish, i.e., Liner2 [14] or Nerf [5]. Liner2 with a coarse-grained model recognizing the top 9 categories obtained 73 % precision and 69 % recall, and with a fine-grained model with 82 categories 67 % and 59 %, respectively. Nevertheless, a mapping of named entity categories onto SUMO is required.
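In code, the physical/non-physical test reduces to a lookup: the synset chosen by word sense disambiguation, or the named-entity category assigned by the recognizer, is mapped to a SUMO concept and checked against a set of physical classes. The sketch below uses toy mapping tables as stand-ins for the plWordNet-to-SUMO mapping and a hypothetical named-entity-to-SUMO mapping; it only illustrates the shape of the lookup.

```python
# Sketch: classify a mention as physical or non-physical via SUMO.
# The mapping tables are toy stand-ins for the plWordNet-SUMO mapping
# (175,000+ links) and a NE-category-to-SUMO mapping; both are assumptions.
SYNSET_TO_SUMO = {"kot-1": "Animal", "dach-1": "Artifact", "godzina-1": "TimeDuration"}
NE_CATEGORY_TO_SUMO = {"city": "GeographicArea", "person": "Human"}
PHYSICAL_SUMO = {"Animal", "Artifact", "Human", "GeographicArea", "Plant"}

def sumo_concept(mention):
    """Resolve a mention to a SUMO concept via its synset or NE category."""
    if mention.get("synset") in SYNSET_TO_SUMO:
        return SYNSET_TO_SUMO[mention["synset"]]
    return NE_CATEGORY_TO_SUMO.get(mention.get("ne_category"))

def is_physical(mention) -> bool:
    return sumo_concept(mention) in PHYSICAL_SUMO

print(is_physical({"synset": "kot-1"}))       # True  (Animal)
print(is_physical({"synset": "godzina-1"}))   # False (TimeDuration)
print(is_physical({"ne_category": "city"}))   # True  (GeographicArea)
```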

3.3 Spatial Expression Patterns



Using the WGT dataset we generated a list of the most frequent syntactic patterns for spatial expressions. We used the phrases recognized by Spejd and IOBBER, i.e. noun groups and noun phrases. We have identified the following types of patterns:






1. TR and LM appear in the same noun phrase:
   – TR is followed directly by SI and LM, e.g. "a cat on the roof",
   – TR and LM are arguments of a participle, e.g. "a cat sitting on the roof (...)".
2. TR and LM appear in different noun phrases — they are arguments of the same verb:
   – TR and LM are single objects, e.g. "a cat is sitting on the roof",
   – TR and/or LM are lists of objects.

For the first group of patterns we consider every noun phrase (NP) containing all the elements in a certain order, i.e. a noun group (NG) as the TR, an optional participle, a preposition (SI), and a noun group as the LM with a potential RE (a minimal sketch of such a matcher is shown below). For the second group of patterns we used a dependency parser for Polish [15] and selected verbs with an attached noun group as the subject (TR) and a preposition with a noun group (LM).
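The following sketch illustrates only the simplest within-NP pattern: an NG followed by a PrepNG, with the preposition split off as the SI. The chunk representation is a deliberate simplification of the Spejd/IOBBER output (their real structures carry heads, grammatical cases and the optional participle), so this is an illustration rather than the authors' matcher.

```python
# Sketch: pattern "TR followed directly by SI and LM" inside one noun phrase.
# Chunks are simplified to (chunk_type, text) pairs; the real Spejd/IOBBER
# output is richer, so this only illustrates the idea.
def match_tr_si_lm(np_chunks):
    """Return (TR, SI, LM) triples found as NG + PrepNG sequences."""
    triples = []
    for i in range(len(np_chunks) - 1):
        kind, text = np_chunks[i]
        next_kind, next_text = np_chunks[i + 1]
        if kind == "NG" and next_kind == "PrepNG":
            preposition, _, landmark = next_text.partition(" ")
            triples.append((text, preposition, landmark))
    return triples

# "kot na dachu" (a cat on the roof) -> NG "kot", PrepNG "na dachu"
np = [("NG", "kot"), ("PrepNG", "na dachu")]
print(match_tr_si_lm(np))  # [('kot', 'na', 'dachu')]
```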

3.4 Semantic Constraints



The last step is the categorization of the expressions into spatial and non-spatial. For English, a common approach is to categorize the expressions on the basis of preposition categorization [16]. Since there are no resources or tools to recognize spatial prepositions for Polish, we decided to apply a knowledge-based approach, i.e., the candidate expressions are tested against a set of semantic constraints.

Information about the type of a spatial relation comes not only from the meaning of the preposition (spatial indicator). The lexemes referring to the TR and to the LM also influence the identification of the relation denoted in a text. We can use the same preposition (in a formal sense, i.e., in combination with the same grammatical case of a noun) to introduce information about spatial or non-spatial relations (e.g. time). For example:

1. Piotr siedział przed domem. (Piotr was sitting in front of the house.)
2. Piotr siedział przed godziną w biurze. (Piotr was sitting in the office an hour ago.)



Preposition: on (Pol. na)
Interpretation: Object TR is outside the LM, typically in contact with the external limit of the LM, applying pressure on it with its weight.
Example of usage: "książka leży na stole" (a book is on the table)
TR's semantic restrictions (SUMO classes): artifact, contentbearingobject, device, animal, plant, pottery, meat, preparedfood, chain
LM's semantic restrictions (SUMO classes): artifact, LandTransitway, boardorblock, boatdeck, shipdeck, stationaryartifact

Fig. 1. Schema #1 for the preposition ON (Pol. "NA")






The semantic restrictions on the TR and LM can be used to distinguish a specific meaning of the preposition due to a specific spatial cognitive pattern [17]. We described them using classes from the SUMO ontology, trying to capture the prototypical conceptualization of the patterns. The set of constraints contains over 170 cognitive schemes for spatial relations (including the specificity of the objects in the relation). For example, there are 18 schemes for the preposition "NA" (on). A sample schema for the preposition "NA" is presented in Fig. 1.
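Under one possible reading, the schema in Fig. 1 is a pair of allowed SUMO class sets keyed by the preposition, and a candidate passes if the classes of its TR and LM both fall inside those sets. The sketch below follows that reading; the data layout and the CamelCase spelling of the class names are ours, while the class lists are taken from the figure.

```python
# Sketch: testing a candidate against one cognitive schema (cf. Fig. 1).
# The schema layout is an assumption; the class names follow the figure.
SCHEMA_NA_1 = {
    "preposition": "na",
    "tr_classes": {"Artifact", "ContentBearingObject", "Device", "Animal",
                   "Plant", "Pottery", "Meat", "PreparedFood", "Chain"},
    "lm_classes": {"Artifact", "LandTransitway", "BoardOrBlock", "BoatDeck",
                   "ShipDeck", "StationaryArtifact"},
}

def schema_accepts(schema, preposition, tr_class, lm_class) -> bool:
    """Accept the candidate if preposition, TR class and LM class all match."""
    return (preposition == schema["preposition"]
            and tr_class in schema["tr_classes"]
            and lm_class in schema["lm_classes"])

# "książka leży na stole" (a book is on the table)
print(schema_accepts(SCHEMA_NA_1, "na", "ContentBearingObject", "Artifact"))  # True
print(schema_accepts(SCHEMA_NA_1, "na", "TimeDuration", "Artifact"))          # False
```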



4 Evaluation



The evaluation was performed on the KPWr corpus presented in Sect. 2.

4.1 Generation of Spatial Expression Candidates



The candidates of spatial expressions are recognized using the syntactic patterns presented in Sect. 3.3. We were able to recognize 44.58 % of all expressions with a precision of 11.12 %. At this stage the precision is not an issue (the candidates will be filtered in the second step). The problem is the low recall; however, we do not cover this problem in this article and have left it for future research.

4.2 Semantic Filtering of Candidates



Table 2 shows the impact of the semantic filtering of spatial expressions. The number of false positives was dramatically reduced and the precision increased from 11.12 % to 66.67 %. At the same time the number of true positives was lowered and the recall dropped from 44.58 % to 29.81 %. In the next two sections we discuss the reasons for the false positives and false negatives.
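For reference, the F-measure column in Table 2 is simply the harmonic mean of precision and recall; a two-line check reproduces both rows.

```python
# F1 = 2 * P * R / (P + R); reproduces the F-measure column of Table 2.
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.1112, 0.4458) * 100:.2f} %")  # 17.80 % — no semantic filtering
print(f"{f1(0.6667, 0.2981) * 100:.2f} %")  # 41.20 % — with semantic filtering
```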

Table 2. Complete evaluation

Semantic filtering | Precision | Recall  | F-measure
No                 | 11.12 %   | 44.58 % | 17.80 %
Yes                | 66.67 %   | 29.81 % | 41.20 %

4.3 Analysis of False Positives



We have carefully analysed more than 200 false positives in order to identify the sources of errors. We have identified the following types of errors, grouped into two categories (external and procedure):

– external errors — errors committed by the external tools:
  • WSD error (ca. 19 %) — the word sense disambiguation tool assigned an incorrect meaning to the trajector or landmark, which caused an error in mapping the noun to a SUMO concept. For example "szczyt długiej przerwy" (Eng. peak of the playtime) was interpreted as a mountain peak and mapped onto the LandArea concept.






  • Spejd error (ca. 13 %) — two adjacent noun groups were incorrectly joined into a single noun group. For example "przejmowanie przez [banki przestrzeni publicznej]" (Eng. acquiring by [banks the public space]) should be recognized as two separate noun groups, i.e. "przejmowanie przez [banki] [przestrzeni publicznej]" (Eng. acquiring by [banks] [the public space]). This error leads to an incorrect assignment of the phrase head.
  • Liner2 error (ca. 10 %) — the named entity recognition tool assigned an incorrect category to a named entity, for example a person was marked as a city.
  • dependency parser error (ca. 3 %) — the trajector and/or landmark were attached to an incorrect verb.
  • WCRFT error (ca. 1 %) — incorrect morphological disambiguation.
– procedure errors — errors committed by our procedure:
  • the meaning of the verb is required to interpret the expression (ca. 17 %) — this problem affects candidates generated with the dependency patterns (a verb with arguments). The interpretation of the expression depends on the meaning of the verb, for example "Adam stoi przed Martą" (Eng. Adam is standing in front of Marta) reflects a spatial relation but "Adam schował zeszyt przed Martą" (Eng. Adam hid a notebook from Marta) does not.
  • semantic filtering error (ca. 9 %) — this type of error is caused by too general cognitive schemes.
  • motion expressions (ca. 10 %) — the semantic constraints do not distinguish between static and motion expressions. In our experiment we focused only on static expressions, and motion ones were not annotated in the corpus.
  • TR is a modifier of the phrase head, not the head itself (ca. 10 %) — for example in the phrase "[szef rady osiedla] w [Sokolnikach]" (Eng. [the head of the council estate] in [Sokolniki]) the council estate should be the TR, not the head.
  • non-physical objects (ca. 5 %) — a non-physical object is recognized as a TR, for example "napis na ścianie" (Eng. writing on the wall).
  • inverse order of the TR and LM (ca. 3 %) — in most cases the TR is followed by a preposition and the LM. In some cases the order is shifted, for example "[ławki] w [galeriach handlowych] pod [schodami ruchomymi]" (Eng. [benches] in [shopping malls] under [the escalators]).

Nearly 46 % of the false positives were caused by errors committed by the external tools used in the text preprocessing (tagging, chunking, dependency parsing, named entity recognition, word sense disambiguation). It might be a laborious task to reduce them, as it requires improving tools which are already state of the art.

The remaining 54 % of the false positives are caused by our procedure of spatial expression recognition, and here there is still room for improvement. The largest group of errors is caused by the fact that we did not consider verbs in the semantic filtering. The preliminary experiment proved that the verbs should be included. The second largest group of errors is committed by the current set of semantic schemes, which are in some cases too general. The set of schemes needs to be revised.

4.4 Analysis of False Negatives



We also carefully analysed about 130 false negatives (candidates incorrectly discarded by the semantic filtering) to identify the main sources of errors. We identified the following groups of errors:

– WSD error (ca. 50 %) — a candidate was discarded because the WSD tool assigned an incorrect sense, which was later mapped onto a SUMO concept not present in the schemas,
– missing mapping to SUMO (ca. 15 %) — a candidate was discarded because the trajector and/or landmark were not mapped onto SUMO and the semantic filtering could not be applied,
– missing schema for semantic filtering (ca. 14 %) — a candidate was discarded due to a missing semantic schema,
– Liner2 error (ca. 11 %) — a candidate was discarded due to an incorrect proper name category assignment.

In the case of the false negatives, the majority (76 %) were caused by errors committed by the external tools. Only 15 % of the candidates were discarded due to missing semantic schemas.



5 Conclusions and Future Work



In this paper we discussed the problem of spatial expression recognition for the Polish language. We presented and evaluated a proof of concept for the recognition of spatial expressions in Polish using a knowledge-based two-stage approach. We focused on expressions containing a spatial preposition. The preliminary results are promising in terms of precision — 66.67 %. There is still room for improvement by revising the set of semantic schemes and including the semantics of the verbs. The main problem which still needs to be addressed is the recall of spatial expression candidates. The current set of patterns without semantic filtering was able to discover only 44.58 % of all expressions. A further 15 % of expressions are lost due to missing schemes for semantic filtering. The other way to improve the performance of spatial expression recognition is to improve the tools used in the preprocessing. However, this is a laborious task, as the tools already have state-of-the-art performance and the small errors committed by every single tool accumulate into a large number.

Acknowledgements. Work financed as part of the investment in the CLARIN-PL

research infrastructure funded by the Polish Ministry of Science and Higher Education.






References

1. Kolomiyets, O., Kordjamshidi, P., Bethard, S., Moens, M.: SemEval-2013 task 3: spatial role labeling. In: Second Joint Conference on Lexical and Computational Semantics (SEM). Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, USA. ACL, East Stroudsburg (2013)
2. LDC: ACE (Automatic Content Extraction) English Annotation Guidelines for Relations. Argument (2008)
3. Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., Wardyński, A.: KPWr: towards a free corpus of Polish. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey. European Language Resources Association (ELRA), May 2012
4. Radziszewski, A.: A tiered CRF tagger for Polish. In: Bembenik, R., Skonieczny, L., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intelligent Tools for Building a Scientific Information Platform. SCI, vol. 467, pp. 215–230. Springer, Heidelberg (2013)
5. Waszczuk, J.: Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language. In: Proceedings of COLING 2012, pp. 2789–2804, December 2012
6. Acedański, S.: A morphosyntactic Brill tagger for inflectional languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) IceTAL 2010. LNCS, vol. 6233, pp. 3–14. Springer, Heidelberg (2010)
7. Kaczmarek, A., Marcińczuk, M.: Heuristic algorithm for zero subject detection in Polish. In: Král, P., Matoušek, V. (eds.) TSD 2015. LNCS, vol. 9302, pp. 378–386. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24033-6_43
8. Przepiórkowski, A.: Powierzchniowe przetwarzanie języka polskiego. Problemy współczesnej nauki, teoria i zastosowania: Inżynieria lingwistyczna. Akademicka Oficyna Wydawnicza "Exit" (2008)
9. Głowińska, K.: Anotacja składniowa NKJP. In: Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.) Narodowy Korpus Języka Polskiego, pp. 107–127. Wydawnictwo Naukowe PWN, Warsaw (2012)
10. Radziszewski, A., Pawlaczek, A.: Large-scale experiments with NP chunking of Polish. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2012. LNCS, vol. 7499, pp. 143–149. Springer, Heidelberg (2012)
11. Radziszewski, A.: Metody znakowania morfosyntaktycznego i automatycznej płytkiej analizy składniowej języka polskiego. Ph.D. thesis, Politechnika Wrocławska, Wrocław (2012)
12. Maziarz, M., Piasecki, M., Szpakowicz, S.: Approaching plWordNet 2.0. In: Proceedings of the 6th Global Wordnet Conference, Matsue, Japan, January 2012
13. Pease, A., Niles, I., Li, J.: The suggested upper merged ontology: a large ontology for the semantic web and its applications. In: Working Notes of the AAAI-2002 Workshop on Ontologies and the Semantic Web (2002)
14. Marcińczuk, M., Kocoń, J., Janicki, M.: Liner2 — a customizable framework for proper names recognition for Polish. In: Bembenik, R., Skonieczny, L., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intelligent Tools for Building a Scientific Information Platform. SCI, vol. 467, pp. 231–254. Springer, Heidelberg (2013)
15. Wróblewska, A., Woliński, M.: Preliminary experiments in Polish dependency parsing. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) SIIS 2011. LNCS, vol. 7053, pp. 279–292. Springer, Heidelberg (2012)


