5 Experiment 2: Using Morphologically Reconstructed Forms





I.E. Onyenwe and M. Hepple



were classed as only either SUF (suffix) or PRE (prefix). In Pattern2, an “rV” tag was used for past tense suffixes (see footnote 3). In Pattern3, more morph-tags for suffixes were added to indicate grammatical functions (see Table 4 for a list of the morph-tags). In Pattern4, the prefix and stem were collapsed into one part (e.g. changing “a/PRE bi.a/VSI_XS kwa/LSUF” to “abi.a/VSI_XS kwa/LSUF”), eliminating the “PRE” tag. Morph-tags serve as important clues for disambiguation (Tables 3 and 4).
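The Pattern4 collapse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `parse`, `collapse_prefix` and `render` are hypothetical, the `form/TAG` notation follows Table 3, and the tag is written as `VSI_XS` on the assumption that the tag name contains an underscore.

```python
def parse(segmented):
    """Split 'form/TAG form/TAG ...' into (form, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in segmented.split()]


def collapse_prefix(morphs):
    """Pattern4-style collapse: glue each PRE morph onto the morph that follows it."""
    collapsed = []
    pending = ""
    for form, tag in morphs:
        if tag == "PRE":
            pending += form          # remember the prefix, drop its PRE tag
        else:
            collapsed.append((pending + form, tag))
            pending = ""
    return collapsed


def render(morphs):
    """Write (form, tag) pairs back in 'form/TAG' notation."""
    return " ".join(f"{form}/{tag}" for form, tag in morphs)


pattern3 = "a/PRE bi.a/VSI_XS kwa/LSUF"
pattern4 = render(collapse_prefix(parse(pattern3)))
print(pattern4)  # abi.a/VSI_XS kwa/LSUF
```

Words without a prefix pass through unchanged, so the same function could be applied to every reconstructed word.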

Table 3. Sample words morphologically reconstructed into stems and affixes.

Word form     FnTBL initial state                  FnTBL truth state

Pattern1 (PRE+SUF)
nwukwasi.     nwu/ROOT kwasi./SUF                  nwu/VSI_XS kwasi./SUF
nwukwara      nwu/ROOT kwa/SUF ra/SUF              nwu/VrV_XS kwa/SUF ra/SUF
nwukwasi.ri.  nwu/ROOT kwasi./SUF ri./SUF          nwu/VrV_XS kwasi./SUF ri./SUF
i.no.donwu    i./PRE no./ROOT do/SUF nwu/SUF       i./PRE no./VIF_XS do/SUF nwu/SUF
abi.akwara    a/PRE bi.a/ROOT kwa/SUF ra/SUF       a/PRE bi.a/VPP_XS kwa/SUF ra/SUF
nu.ru.kwanu.  nu./ROOT ru./SUF kwa/SUF nu./SUF     nu./VSI_XS ru./SUF kwa/SUF nu./SUF
enwechaghi.   e/PRE nwe/ROOT cha/SUF ghi./SUF      e/PRE nwe/VSI_XS cha/SUF ghi./SUF

Pattern2 (adds “rV” to Pattern1) and Pattern3 (adds all morph-tags)
nwukwasi.     nwu/ROOT kwasi./LSUF                 nwu/VSI_XS kwasi./LSUF
nwukwara      nwu/ROOT kwa/rSUF ra/rV              nwu/VrV_XS kwa/rSUF ra/rV
nwukwasi.ri.  nwu/ROOT kwasi./rSUF ri./rV          nwu/VrV_XS kwasi./rSUF ri./rV
i.no.donwu    i./PRE no./ROOT do/iSUF nwu/iSUF     i./PRE no./VIF_XS do/iSUF nwu/iSUF
abi.akwara    a/PRE bi.a/ROOT kwa/eSUF ra/APP      a/PRE bi.a/VPP_XS kwa/eSUF ra/APP
nu.ru.kwanu.  nu./ROOT ru./xSUF kwa/xSUF nu./LSUF  nu./VSI_XS ru./xSUF kwa/xSUF nu./LSUF
enwechaghi.   e/PRE nwe/ROOT cha/xSUF ghi./NEG     e/PRE nwe/VSI_XS cha/xSUF ghi./NEG



We applied FnTBL and SLLT to the morphologically reconstructed data (here referring to the latter as SLLT2, to differentiate it from its earlier use in Experiment 1). Note that the reconstructed representations of individual words are presented in isolation, so the systems cannot exploit contextual information from neighbouring words/tags (in contrast to Experiment 1). FnTBL was chosen for its effective pattern induction method, and SLLT because it outperformed the other systems in Experiment 1. SLLT2 was simply trained directly on the reconstructed data. For FnTBL, we intervene to specify a particular initial state for TBL, in which the stem is given the initial tag “ROOT”. Hence, TBL should generate only rules that, based on the morphological context, replace a ROOT tag with a final tag, the latter being a POS tag for a complete inflected unknown word. Results are shown in Table 2 under Experiment 2.
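As a rough illustration of the initial state described above, the following Python sketch derives an initial-state line from a truth-state line by replacing the stem's POS tag with “ROOT”. The function name `to_initial_state` and the `AFFIX_TAGS` set are assumptions made for this example (the set simply collects the affix tags that appear in Tables 3 and 4), and tags are written with underscores; this is not the authors' preprocessing code.

```python
# Affix tags as listed in Tables 3 and 4; any other tag is assumed to mark the stem.
AFFIX_TAGS = {"PRE", "SUF", "rV", "LSUF", "xSUF", "eSUF", "iSUF", "rSUF", "APP", "NEG", "INFL"}


def to_initial_state(truth_state):
    """Replace the stem's POS tag with ROOT and leave affix tags untouched."""
    morphs = [item.rsplit("/", 1) for item in truth_state.split()]
    return " ".join(
        f"{form}/{tag if tag in AFFIX_TAGS else 'ROOT'}" for form, tag in morphs
    )


print(to_initial_state("nwu/VrV_XS kwa/rSUF ra/rV"))
# nwu/ROOT kwa/rSUF ra/rV
print(to_initial_state("i./PRE no./VIF_XS do/iSUF nwu/iSUF"))
# i./PRE no./ROOT do/iSUF nwu/iSUF
```

Both outputs match the corresponding “FnTBL initial state” entries in Table 3.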

3 Here, “rV” means the letter r followed by any vowel (a, e, u, o, i, i., o., u.) attached to a word in Igbo, as in “bi.ara” (came), “ko.ro.” (told), “riri” (ate), “nwuru” (shone), etc. It is a past tense marker if attached to an active verb, and indicates stative/passive meaning if attached to a stative verb [3]. It is therefore an important cue for predicting past tense verbs or verbs having the applicative meaning “APP”.



Predicting Morphologically-Complex Unknown Words in Igbo






Table 4. Morph-tags and meanings

Tag/marker  Meaning
APP         Applicative
NEG         Negative
INFL        Inflection for perfect tense
rV          Inflection for past tense
LSUF        Last suffix marker for morphologically-inflected simple verb
xSUF        Suffix within morphologically-inflected simple verb
eSUF        Suffixes within morphologically-inflected participle
iSUF        Suffixes within morphologically-inflected infinitive
rSUF        Suffixes within morphologically-inflected past tense verb

5 Discussion



Table 5 illustrates how the root and affixes serve as important cues for predicting the tags of morphologically-complex unknown words. The “Initial tag” column is the FnTBL initial state, “Transformation process” shows the tag predicted after applying the transformational rules (the indexes of the rules that fired appear next to the stems), and “Final tag” is the tag that FnTBL returns for the morphologically-complex unknown word. In Example 1 of Table 5, the word “begorochaa” refers to the “perching activity of a group of birds”, and is an inflected simple verb (VSI_XS) with “be” as the stem. Two transformational rules fired to transform its initial tag “ROOT” to the final tag “VSI_XS”. The first change is made by Rule 0, a generic rule that changes the ROOT tag to the VrV (past tense verb) tag, provided there is a suffix within the [+1,+2] window. This rule is ordered first in the rule list, as it has the highest correction score over the training data. Rule 1 applies next, changing VrV to “VSI_XS” because xSUF and LSUF occur with inflected simple verbs. In other examples, Rule 2 changes VrV to VrV_XS (inflected past tense verb) because of the rSUF tags that occur in past tense verbs; Rules 3 and 4 change VrV and VSI_XS to the VPP_XS (inflected participle) tag whenever the tag preceding the stem is PRE; Rule 5 changes VPP_XS to VPERF (perfect tense verb) due to the presence of INFL; Rule 6 changes VPP_XS to VIF_XS (inflected infinitive verb) due to the i. prefix; and Rule 36 changes VPERF to VPERF_XS (inflected perfect tense) due to xSUF.
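The ordered rule application described here can be illustrated with a small Python sketch. It is not FnTBL itself: rule numbers follow the traces in Table 5, but the rule conditions are simplified guesses chosen to be consistent with the descriptions above and with the five examples in Table 5 (for instance, Rule 1 is assumed to require an LSUF after the stem). The names `RULES`, `tag_word` and the condition helpers are hypothetical, and tags are written with underscores.

```python
SUFFIX_TAGS = {"SUF", "rV", "LSUF", "xSUF", "eSUF", "iSUF", "rSUF", "APP", "NEG", "INFL"}


def follows(tag_set):
    """Condition: some morph after the stem carries a tag in tag_set."""
    return lambda forms, tags, i: any(t in tag_set for t in tags[i + 1:])


def suffix_in_window(forms, tags, i):
    """Condition: a suffix tag occurs at position +1 or +2 from the stem."""
    return any(t in SUFFIX_TAGS for t in tags[i + 1:i + 3])


def after_prefix(forms, tags, i):
    """Condition: the morph immediately before the stem is tagged PRE."""
    return i > 0 and tags[i - 1] == "PRE"


def after_i_prefix(forms, tags, i):
    """Condition: the stem is preceded by the infinitive prefix i./i."""
    return after_prefix(forms, tags, i) and forms[i - 1] in {"i", "i."}


# (rule id, tag to change, new tag, condition); order matters, as in TBL.
RULES = [
    (0, "ROOT", "VrV", suffix_in_window),
    (1, "VrV", "VSI_XS", follows({"LSUF"})),
    (2, "VrV", "VrV_XS", follows({"rSUF"})),
    (3, "VrV", "VPP_XS", after_prefix),
    (4, "VSI_XS", "VPP_XS", after_prefix),
    (5, "VPP_XS", "VPERF", follows({"INFL"})),
    (6, "VPP_XS", "VIF_XS", after_i_prefix),
    (36, "VPERF", "VPERF_XS", follows({"xSUF"})),
]


def tag_word(segmented):
    """Apply the rules in order to the stem; return (word/TAG, ids of fired rules)."""
    pairs = [item.rsplit("/", 1) for item in segmented.split()]
    forms = [form for form, _ in pairs]
    tags = [tag for _, tag in pairs]
    stem = tags.index("ROOT")
    fired = []
    for rule_id, source, target, condition in RULES:
        if tags[stem] == source and condition(forms, tags, stem):
            tags[stem] = target
            fired.append(rule_id)
    return "".join(forms) + "/" + tags[stem], fired


print(tag_word("be/ROOT go/xSUF ro/APP chaa/LSUF"))  # ('begorochaa/VSI_XS', [0, 1])
print(tag_word("e/PRE che/ROOT kwa/xSUF la/INFL"))   # ('echekwala/VPERF_XS', [0, 3, 5, 36])
```

Because each rule rewrites the tag left by an earlier rule, the final tag depends on rule order, which is why the rule with the highest correction score (Rule 0) is applied first.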

The accuracy scores of both experiments are shown in the Experiment 1 and Experiment 2 columns of Table 2. The “PRE+SUF” column corresponds to the Pattern1 variation. The accuracy scores are substantial: FnTBL did better than SLLT2 in all cases and outperformed every tagger in Experiment 1 except SLLT. The “PRE+SUF+rV” column shows the Pattern2 variation: the performance of SLLT2 and FnTBL generally improves, and FnTBL scored better than the majority of the taggers in Experiment 1. The “All” column corresponds to Pattern3, which tests the prospect of paradigmatic tagging, where meaningful tags for affixes are added to indicate their grammatical functions. This gave the best scores of 90.44 %, 91.99 % and 88.46 % for FnTBL, which are several points better than the scores achieved by the taggers used in Experiment 1 (see Table 2).

Table 5. Examples of transformational rules generated by FnTBL. The numbers are the identity numbers of the rules that fired.

           Initial tag         Transformation process            Final tag

Example 1  be ROOT VSI_XS      be VSI_XS VSI_XS | 0 1            begorochaa/VSI_XS
           go xSUF xSUF        go xSUF xSUF
           ro APP APP          ro APP APP
           chaa LSUF LSUF      chaa LSUF LSUF

Example 2  kpo. ROOT VrV_XS    kpo. VrV_XS VrV_XS | 0 2          kpo.chibidoro/VrV_XS
           chi rSUF rSUF       chi rSUF rSUF
           bi rSUF rSUF        bi rSUF rSUF
           do rSUF rSUF        do rSUF rSUF
           ro rV rV            ro rV rV

Example 3  e PRE PRE           e PRE PRE                         ekpochapu./VPP_XS
           kpo ROOT VPP_XS     kpo VPP_XS VPP_XS | 0 3
           cha eSUF eSUF       cha eSUF eSUF
           pu. eSUF eSUF       pu. eSUF eSUF

Example 4  i. PRE PRE          i. PRE PRE                        i.kpo.cha/VIF_XS
           kpo. ROOT VIF_XS    kpo. VIF_XS VIF_XS | 0 1 4 6
           cha LSUF LSUF       cha LSUF LSUF

Example 5  e PRE PRE           e PRE PRE                         echekwala/VPERF_XS
           che ROOT VPERF_XS   che VPERF_XS VPERF_XS | 0 3 5 36
           kwa xSUF xSUF       kwa xSUF xSUF
           la INFL INFL        la INFL INFL

Finally, the “All(-PRE)” column, for Pattern4, verifies the strength of the prefix as a predictive feature for unknown words, given that it is only one character long. Comparing columns “All(-PRE)” and “All” shows that there are losses in accuracy relative to column “All” for FnTBL (e.g. about 9.0 on IgbTNT). This is contrary to English, where adding prefix features had a negative effect on the accuracy for unknown words [11]. Surprisingly, the accuracy of SLLT2 increased while the FnTBL scores decreased. However, an experiment on IgbTMT using the SLLT tagger’s technique for handling unknown words shows that using only suffix features gave an accuracy of 77.26 %, and that adding prefix features improved the accuracy on the morphologically-complex words by 9.22 %. The increase in SLLT2’s accuracy can be explained by the ambiguity of “PRE”: the “PRE” tag indicates a prefix whether it is “i./i” for the infinitive or “a/e” for participles and simple verbs, so collapsing the prefix with the stem removes this ambiguity. Statistical taggers require a large amount of data to disambiguate this case properly.



6 Conclusion



We have shown that linguistically-informed segmentation into stems and associated affixes is effective for predicting unknown inflected words in Igbo. Through morphological reconstruction, inflected words are represented in a machine-learnable pattern that exploits morphological characteristics during tagging to handle unknown words. The performance of FnTBL, which inductively learns linguistic patterns, shows that our method is better than methods that automatically identify suffix-string cues (e.g. for use in POS tagging) by extracting the last n letters of words to serve as a proxy for actual linguistic affixes. The standard method of using arbitrary suffix strings as cues is challenged by the complexity of morphologically-complex unknown words in the language. In Igbo, a single root can produce a large number of word-forms through affixes of varying lengths, ranging from 1 to 5, which may concatenate with a stem in variable orders.
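To make the contrast concrete, the short Python sketch below compares last-n-letter suffix strings with the linguistically-informed segmentation of one word from Table 3. The function name `suffix_strings` and the cut-off `max_len=4` are assumptions chosen for illustration, not settings taken from any particular tagger.

```python
def suffix_strings(word, max_len=4):
    """Last-n-letter cues, as used by taggers that treat word endings as affix proxies."""
    return [word[-n:] for n in range(1, min(max_len, len(word)) + 1)]


word = "nwukwara"
# Segmentation taken from Table 3 (Pattern3 initial state).
segmentation = [("nwu", "ROOT"), ("kwa", "rSUF"), ("ra", "rV")]

print(suffix_strings(word))
# ['a', 'ra', 'ara', 'wara']
print([form for form, tag in segmentation if tag != "ROOT"])
# ['kwa', 'ra']
```

Only “ra” coincides with an actual affix; “ara” and “wara” cut across morpheme boundaries, and the suffix “kwa” never appears as a cue on its own.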

In future work, it will be important to perform full morphological analysis of Igbo. This experiment excludes some inflected classes (such as nouns), since including them would require full morphological analysis, which is beyond the scope of this research. Morphological analysis of compound verbs and the exploitation of information from n neighbouring words are also left out. These omissions hide some information that is important for NLP tasks, and point towards building large-scale computational morphologies for Igbo.

Acknowledgments. We acknowledge the financial support of the Tertiary Education Trust Fund, Nigeria, and Nnamdi Azikiwe University (NAU), Nigeria. Many thanks to Dr. Uchechukwu Chinedu of the linguistics department, NAU, for his very helpful discussion.



References

1. Brants, T.: TnT: a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231 (2000)
2. Brill, E.: Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput. Linguist. 21, 543–565 (1995). MIT Press, Cambridge
3. Emenanjo, N.E.: Elements of Modern Igbo Grammar: A Descriptive Approach. Oxford University Press, Ibadan (1978)
4. Halácsy, P., Kornai, A., Oravecz, C.: HunPos: an open source trigram tagger. In: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pp. 209–212 (2007)
5. Kupiec, J.: Robust part-of-speech tagging using a hidden Markov model. J. Comput. Speech Lang. 6(3), 225–242 (1992)
6. Ngai, G., Florian, R.: Transformation-based learning in the fast lane. In: Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, pp. 1–8 (2001)
7. Onyenwe, I.E., Uchechukwu, C., Hepple, M.: Part-of-speech tagset and corpus development for Igbo, an African language. In: Proceedings of LAW VIII - 8th Linguistic Annotation Workshop 2014 in conjunction with COLING 2014, Dublin, Ireland, 23–24 August 2014, pp. 93–98. Association for Computational Linguistics (2014)
8. Onyenwe, I.E., Hepple, M., Uchechukwu, C., Ezeani, I.: Use of transformation-based learning in annotation pipeline of Igbo, an African language. In: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, p. 24 (2015)
9. Ratnaparkhi, A., et al.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 1, pp. 133–142 (1996)
10. Samuelsson, C.: Morphological tagging based entirely on Bayesian inference. In: 9th Nordic Conference on Computational Linguistics (2013)
11. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, pp. 173–180 (2003)



Morphosyntactic Analyzer for the Tibetan Language: Aspects of Structural Ambiguity

Alexei Dobrov¹, Anastasia Dobrova², Pavel Grokhovskiy¹ (corresponding author), Nikolay Soms², and Victor Zakharov¹

¹ Saint-Petersburg State University, Saint-Petersburg, Russia
{a.dobrov,p.grokhovskiy,v.zakharov}@spbu.ru
² LLC “AIIRE”, Saint-Petersburg, Russia
{adobrova,nsoms}@aiire.org



Abstract. The paper deals with the development of a morphosyntactic analyzer for the Tibetan language. It aims to create a consistent formal grammatical description (formal grammar) of the Tibetan language, including all grammar levels of the language system from morphosyntax (syntactics of morphemes) to the syntax of composite sentences and supra-phrasal entities. Syntactic annotation was created on the basis of morphologically tagged corpora of Tibetan texts. The peculiarity of the annotation consists in combining the immediate constituents structure and the dependency structure. A basic grammar module describing Tibetan grammatical categories, their possible values, and restrictions on their combination has been created. Types of tokens and their grammatical features form the basis of the formal grammar being produced, allowing the linguistic processor to build syntactic trees of various kinds. Methods of avoiding redundant structural ambiguity are proposed.

Keywords: Corpus linguistics · Tibetan language · Morphosyntactic analyzer · Tokenization · Immediate constituents · Dependency grammar · Natural language processing



1 Introduction



In order to build a morphosyntactic analyzer of Tibetan texts it is necessary to

create a formal grammar, which includes all levels of the grammatical system

of the Tibetan language from morphosyntax (syntactics of morphemes) to the

syntax of sentences and supra-phrasal units.

The few currently available studies of the Tibetan language mainly analyze Tibetan morphology, the only notable exception being “The Classical Tibetan language” by Stephen Beyer [1], which also includes an extensive presentation of Tibetan syntax. Still, this work does not fully describe the Tibetan system of syntactic units and is often speculative in character, since its conclusions are not supported by textual corpora.

© Springer International Publishing Switzerland 2016
P. Sojka et al. (Eds.): TSD 2016, LNAI 9924, pp. 215–222, 2016.
DOI: 10.1007/978-3-319-45510-5_25

The current project has the following objectives:






1. To create a system of syntactical annotation of Tibetan texts, including the

information about Tibetan grammatical categories, their possible values, and

restrictions on their combinations;

2. To develop a formal grammatical module of the open natural language

processing system, which is able to perform a complete morphological and

syntactic analysis of Tibetan texts;

3. To annotate a corpus of Tibetan texts syntactically.

The language processing tools developed allow automatic markup procedures to be used for further extension of the corpus.

The project uses an innovative approach to syntactic analysis, combining the immediate constituents structure (CS) and the dependency structure (DS). Such a combination was first proposed in [2], but the mathematical model available at the time did not allow it to be implemented as an algorithm. This study takes advantage of the AIIRE linguistic processor (Artificial Intelligence-based Information Retrieval Engine), which is one of the most successful computer implementations of combined CS and DS analysis [3]. Still, applying it to the Tibetan language requires new research on Tibetan syntax.



2 The Project’s Corpora Resources and Software



The project’s database comprises two corpora of the Tibetan language developed at Saint-Petersburg University. The Basic Corpus of the Tibetan Classical Language includes texts in a variety of classical Tibetan literary genres. The Corpus of Indigenous Tibetan Grammar Treatises consists of the most influential grammar works, the earliest of them supposedly dating back to the 7th-8th centuries. Both corpora are provided with metadata and morphological annotation. The corpora comprise 34,000 and 48,000 tokens, respectively. Tibetan texts are represented both in Tibetan Unicode script and in a standard Latin transliteration [4].

The open-source AIIRE linguistic processor is used for the project. AIIRE implements the method of inter-level interaction proposed by Tseitin in 1985 [5], which ensures effective rule-based ambiguity resolution.

The principle of inter-level interaction helps to minimize the combinatorial explosion, which is very important for NLP software. Formal grammar analysis produces a considerable amount of ambiguity, especially when ellipsis is possible. The principle of inter-level interaction, as implemented in the AIIRE linguistic processor, allows upper-level constraints to be applied to lower-level ambiguity, and thus reduces the number of combinations produced.

The architecture of AIIRE and its text analysis algorithms allow this technology to be applied to languages of different types in the form of independent language modules, while the analysis algorithms themselves are language-independent. Besides the modules for Russian, modules for Arabic and Abkhaz were previously created, and the present project aims at developing a module for Tibetan, which is well known for the absence of formally marked word boundaries and for the ambiguity of word segmentation as such.



3 Representation of Tibetan Morphological Structures in AIIRE



The linguistic processor needs to recognize all the relevant linguistic units in the input text. For inflectional languages the input units are easy to identify as word forms, separated by spaces, punctuation marks, etc. This is not the case for Tibetan, as there are no universal symbols that separate the input string into words or morphemes.

The module developed for Tibetan segments the input string into morphemes using the Aho-Corasick algorithm, which finds all substrings of the input string that occur in a given dictionary. The algorithm builds a tree describing a finite state machine whose terminal nodes correspond to complete character strings of elements (in this case, morphemes) from the input dictionary.

The language module contains a dictionary of morphemes, so the tree can be created in advance, at the build stage of the language module. At runtime the linguistic processor loads the tree as a component of an executable module, which keeps its initialization time to a minimum.
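The dictionary-based matching step can be sketched as a small, self-contained Aho-Corasick implementation in Python. This is a textbook version written for illustration and is not AIIRE code: the function names `build_automaton` and `find_all` are hypothetical, the toy dictionary uses Latin transliteration, and the input is written without syllable separators as a simplification. The entries “dus”, “gcig” and “na” come from the example phrase discussed in this paper; “du” and “cig” are added only to show overlapping matches.

```python
from collections import deque


def build_automaton(morphemes):
    """Build goto/fail/output tables once, e.g. at module build time."""
    goto, fail, out = [{}], [0], [[]]
    for morpheme in morphemes:
        state = 0
        for ch in morpheme:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(morpheme)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]  # inherit shorter matches
    return goto, fail, out


def find_all(text, automaton):
    """Return (start index, morpheme) for every dictionary entry found in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for morpheme in out[state]:
            matches.append((i - len(morpheme) + 1, morpheme))
    return matches


automaton = build_automaton(["dus", "du", "gcig", "cig", "na"])
print(sorted(find_all("dusgcigna", automaton)))
# [(0, 'du'), (0, 'dus'), (3, 'gcig'), (4, 'cig'), (7, 'na')]
```

The scan reports every dictionary entry, including overlapping ones such as “du”/“dus”, and leaves the choice between competing segmentations to later stages of analysis.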

Two special files were created in order to analyze Tibetan morphology and morphonemics: the grammarDefines.py file determines the types of tokens, their properties and restrictions, while the atoms.txt file (the allomorphs dictionary) specifies the morpheme, the token type and the properties for each allomorph, in accordance with grammarDefines.py. For example, one entry in the allomorphs dictionary indicates that the dga’ allomorph is the basic allomorph of the dga’ morpheme, which is a verb root in the indicative mood, having no tense property and ending in a vowel.

The materials processed at the pilot stage make it possible to identify the following token types: v_suff (verbal suffix), punct (punctuation mark), p_dem_root (root of a demonstrative pronoun), n_root (noun root), p_pers_root (root of a personal pronoun), case_marker, v_root (verbal root), num_root (numeral root), p_def_root (attributive pronoun), fin (statement end marker). Each of these token types has its possible morphological and morphonemic features specified in the grammarDefines.py file. For example, the potential properties of the verbal root are the mood (indicative, imperative), the tense (present, past, future), the availability of the tense category (true/false) and the type of final phonemes, which defines the compatibility of the verbal root with suffix allomorphs. The restrictions for the verbal root require that the category of tense is available only if the respective parameter “has_tense” is set to “true” and the “mood” parameter is set to “indicative”.
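The verbal-root restriction just described could be expressed along the following lines. This Python sketch is purely illustrative: the class `VerbRoot`, its field names and the function `violations` are hypothetical and do not reproduce the actual contents of grammarDefines.py, which are not shown in this paper.

```python
from dataclasses import dataclass
from typing import Optional

MOODS = {"indicative", "imperative"}
TENSES = {"present", "past", "future"}


@dataclass(frozen=True)
class VerbRoot:
    allomorph: str
    mood: str
    has_tense: bool
    tense: Optional[str] = None   # None means that no tense is expressed
    final: str = "vowel"          # type of final phoneme (hypothetical encoding)


def violations(root):
    """List the restrictions that a verbal root entry breaks (empty if valid)."""
    problems = []
    if root.mood not in MOODS:
        problems.append(f"unknown mood: {root.mood}")
    if root.tense is not None:
        if root.tense not in TENSES:
            problems.append(f"unknown tense: {root.tense}")
        if not root.has_tense:
            problems.append("tense given although has_tense is false")
        if root.mood != "indicative":
            problems.append("tense given outside the indicative mood")
    return problems


# The dga' root from the text: indicative, no tense property, ends in a vowel.
dga = VerbRoot(allomorph="dga'", mood="indicative", has_tense=False)
print(violations(dga))  # []

bad = VerbRoot(allomorph="x", mood="imperative", has_tense=False, tense="past")
print(violations(bad))
# ['tense given although has_tense is false', 'tense given outside the indicative mood']
```

Keeping the restrictions declarative in this way means that an invalid dictionary entry can be rejected when the language module is built rather than during parsing.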

These types of tokens and their grammatical features form the basis of the formal grammar being developed, allowing the linguistic processor to build syntactic treebanks of various structures.






Case markers in Tibetan, unlike those of inflected languages, function as postpositions rather than as suffix morphemes, and the model that best corresponds to Tibetan morpheme order represents nominal phrases as followed by a case marker in postposition. The case marker takes the final position after all the modifiers of the nominal phrase, including numerals and pronouns. The order of English morphemes is the opposite here: the phrase dus gcig na is translated as at (na, locative) one (gcig) time (dus). Both the numeral and the nominal phrase it modifies may be further modified: the numeral may be complex, and the nominal phrase may be modified by an adjective, a participle, etc. Thus, it seems reasonable to interpret the case marker in Tibetan morphosyntax as a major constituent rather than a dependent one, corresponding, for example, to prepositions in prepositional groups in English and Russian.

Nominal phrases followed by a postpositive case marker may have structures of any complexity, including those modified by complex participle clauses, sometimes without a head, as shown in Fig. 1. Such nominal phrases are often proper names or epithets (in this case it is the Tibetan morpheme-for-morpheme rendering of the Sanskrit name of the Indian city of Śravasti), the head constituent “city” being omitted due to its semantic redundancy. In this example, there is a participle verbal phrase modified by an adverbial (circumstance), which is expressed by a terminative nominal phrase (TerminativeNP), where the terminative case marker follows the masdar nominal phrase (MasdarNP), itself expressed by a verbal root (V Root). The masdar nominal phrase omits the nominalizer, which is typical for Tibetan complex verbal nouns, including proper names: the nominalizer may be omitted both in participles and in masdars, and the authors have not yet identified the precise rules for such omission. Literally translated, the given example reads as follows:
to hear + nominalizer omitted (missing in the tree) +
for (terminative) +
to exist + nominalizer +
in (locative)
That is, in Existing-To-Be-Heard (where Existing-To-Be-Heard is the name of the city).

Fig. 1. Locative NP exemplified by a participle clause with a terminative adverbial modifier

The above-mentioned features of Tibetan morphosyntax cause a considerable amount of ambiguity when Tibetan text is processed by a computer: due to ellipsis, every verbal root can be treated as a participle or a masdar within a personal name, and each modifier can be treated as a separate proper name. As in other languages, adverbials (circumstances) and complements can receive ambiguous interpretations if there are several recursive verbal phrases.



4 Avoiding Redundant Structural Ambiguity: Undocumented Restrictions on Tibetan Syntax



Ambiguity of formal syntactic structures is often produced not by the intrinsic polysemy of linguistic units, but by the combinatorial redundancy of the formal grammar itself. Precisely in these fairly frequent cases, ambiguity of formal structures reveals a lack of accuracy in conventional informal descriptions of the language, and serves as a clue for choosing among several possible ways to make these descriptions more specific.

As for Tibetan grammar, this ongoing study has already revealed some formal ambiguity cases of this kind. The examples below show how just three cases of description opacity can produce a combinatorial explosion in quite a short sentence (‘The story about this is heard by me / The one who told this is heard by me / Those who told this are heard by me’).

First, none of the existing descriptions of Tibetan grammar, including the most detailed one [1], strictly specifies whether Tibetan predicates can be omitted. It is known that link verbs are omitted in composite nominal predicates, but it is not clear whether the whole predicative VP can be omitted, as in Russian, or whether it is obligatory in every sentence, as in English.

As Fig. 2 shows, allowing predicate ellipsis makes the analyzed sentence about 5.3 times more ambiguous (333 vs. 63 parse versions). Assuming predicate ellipsis produces not only obvious versions like ‘The story about this that I heard (zero predicate)’, but also quite odd hypotheses like ‘This (zero predicate), the story (zero predicate), (something) heard by me (zero predicate)’, which seem to be ungrammatical, along with many of their possible combinations.

Fig. 2. Number of parse versions depending on the way Tibetan syntax is formalized

Another problem of Tibetan syntax formalization that has proven to be quite important is the question of limitations on NP head ellipsis. Nouns are very often omitted in Tibetan, especially in proper names, but it is entirely unclear from existing linguistic descriptions whether such ellipsis is possible in adjacent NPs (e.g., ‘king Making-Victory’s grove - the (merchant) Giving-Food-To-the-Unprotected-Ones’s amusement park’). Figure 2 shows that allowing ellipsis in adjacent NPs makes the ambiguity level more than 2.3 times higher (63 vs. 27 options). This ellipsis, when allowed, produces ambiguity for each attribute of an NP, as attributes are postpositional in Tibetan and never have any markers to distinguish them from adjacent NPs. Ellipsis in such NPs can be prohibited by creating a separate constituent class for them. In general, ellipsis and related issues in Tibetan require more theoretical research than is currently available in, for example, [1,6,7].

One of the most important difficulties in formalizing Tibetan grammar, however, is the problem of verbal tense. Tense is not expressed by any separate marker, but is denoted by the verbal root allomorph itself. The problem is that far from all Tibetan verbal roots have different allomorphs for different tenses; for many verbs, tense remains unexpressed in the sentence altogether. There are two options for dealing with this phenomenon in a formal grammar: (1) to build hypotheses for all three possible tenses for each verb root, or (2) to create different constituent classes (with and without the tense feature) for sentences, predicates, VPs, participle and masdar phrases, etc. The first option may seem attractive, as it makes the grammar shorter, but, as Fig. 2 shows, it makes the above-mentioned sentence 9 times more ambiguous (3 versions for each of 2 verbs make 3*3 = 9 combinations). The conclusion is therefore that the second option, under which such verbs and their phrases are not ambiguous in terms of tense but rather have no tense at all, is far more plausible for Tibetan.
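The arithmetic behind these ambiguity figures can be checked with a few lines of Python. The parse counts (27, 63, 333) and the two tense-less verb roots are taken from the text; the variable names are hypothetical and the snippet only recomputes the reported ratios, it does not run a parser.

```python
from itertools import product

TENSES = ("present", "past", "future")
verbs_without_tense_marking = 2  # the example sentence contains two such verb roots

# Option (1): one hypothesis per tense for every verb root.
tense_hypotheses = list(product(TENSES, repeat=verbs_without_tense_marking))
print(len(tense_hypotheses))  # 9

# Parse counts reported for the example sentence.
no_adjacent_np_ellipsis = 27
adjacent_np_ellipsis = 63
predicate_ellipsis = 333

print(round(adjacent_np_ellipsis / no_adjacent_np_ellipsis, 2))  # 2.33
print(round(predicate_ellipsis / adjacent_np_ellipsis, 2))       # 5.29
```

Each additional tense-less verb root would multiply the number of hypotheses under option (1) by three, which is why the growth is exponential in the number of such verbs.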


