EAGLES Synopsis and Comparison of Morphosyntactic Phenomena Encoded in Lexicons and Corpora


A Common Proposal and Applications to European Languages




Application to Bulgarian

Milena Slavcheva and Elena Paskaleva


0. Introduction

The set of morphosyntactic specifications used in this Application is the one encoded in the Bulgarian Morphological Lexicon produced at the Linguistic Modelling Laboratory (LML). The Bg. tags in the tables follow the notation of the Bulgarian module of INTEX, a corpus processing system, and are composed according to the format that system requires. The absence of a tag for a given value means that the value is the default in the system's finite-state automata representation. The tagset was also used in the LML high-speed (30,000 words/sec) morphological analyser.

The set of morphosyntactic specifications for Bulgarian includes feature-value pairs characterising synthetic forms, i.e., forms consisting of a single graphical word. The specifications are of two kinds: lexical, providing the invariable lexeme information for words, and grammatical, encoding the variable information carried by wordforms as the result of inflection.
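The default-value convention can be illustrated with a minimal Python sketch. It uses the Noun codes from the tables in section 1. The function name `decode_noun`, the two dictionaries, and the representation of a tag as a list of codes are assumptions made for illustration; they do not reproduce the actual INTEX notation.

```python
# Codes taken from the Noun tables in section 1.
# Assumption: a tag is passed in as a list of codes, not in real INTEX syntax.
NOUN_CODES = {
    "P": ("Type", "proper"),
    "m": ("Gender", "masculine"),
    "f": ("Gender", "feminine"),
    "n": ("Gender", "neuter"),
    "s": ("Number", "singular"),
    "p": ("Number", "plural"),
    "c": ("Number", "count"),
    "V": ("Case", "vocative"),
    "d": ("Definiteness", "definite"),
    "h": ("Definiteness", "short def. form"),
    "l": ("Definiteness", "full def. form"),
    "Hum": ("Animateness", "animate human"),
}

# Values that carry no tag in the tables are treated as defaults.
NOUN_DEFAULTS = {
    "Type": "common",
    "Definiteness": "indefinite",
    "Animateness": "inanimate or animate non-human",
}


def decode_noun(codes):
    """Expand a list of noun codes into feature-value pairs, filling in defaults."""
    features = dict(NOUN_DEFAULTS)
    for code in codes:
        attribute, value = NOUN_CODES[code]
        features[attribute] = value
    return features
```

For example, `decode_noun(["m", "s", "l"])` describes столът: no code is given for Type or Animateness, so the default values are filled in.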

1. Noun (N)

1.1. Level 1 Features

1.1.1. Type

Attribute Value Bg. example Bg. tag
Type common книга  
  proper Милена P

1.1.2. Gender

Attribute Value Bg. example Bg. tag
Gender masculine стол m
  feminine маса f
  neuter море n

1.1.3. Number

Attribute Value Bg. example Bg. tag
Number singular стол s
  plural столове p
l-spec count стола c
The language-specific (l-spec) count form is used after cardinal numerals and belongs to the wordform paradigm of masculine nouns that do not denote humans. It is homonymous with the short definite form of masculine nouns, so the two cannot be told apart in a text without reference to context. The count form is the reason the Animateness feature (see 1.3.2.) is present in the set of morphosyntactic specifications for the Noun in the Bulgarian lexicon.

1.2. Level 1a Features

1.2.1. Case

Attribute Value Bg. example Bg. tag
l-spec vocative родино V
Modern Bulgarian has no case system for nouns. Only the vocative form is still alive, but it is regarded as a case relic rather than as a true case form standing in opposition to the nominative or any other case. The vocative is now just a wordform in the paradigm of some nouns.

1.2.2. Countability

This feature is not applied to Bulgarian. It is implicitly present in the lexicon as a constraint on those nouns whose paradigms lack plural forms.

1.3. Language-specific Features (Level 2b)

1.3.1. Definiteness

Attribute Value Bg. example Bg. tag
Definiteness indefinite маса  
  definite масата d
  short def. form стола h
  full def. form столът l
Definiteness in Bulgarian is expressed by a “definite article” which behaves as an inflection. Masculine nouns which are definite and singular have either short or full definite form. The full “article” ending is attached to masculine singular nouns which function as sentence subjects. The value definite without a distinction for full or short def. form is necessary for feminine and neuter singular nouns, masculine singular nouns whose base form ends with a vowel, as well as for all plural nouns.

The principles of Definiteness as a linguistic phenomenon can be derived from the above description for the other part of speech categories where the Definiteness feature is present.
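The choice among the three definite values described in 1.3.1 can be sketched as a small Python function. The function name `definite_tag`, its parameters, and the set of vowel letters are assumptions made for illustration; the returned codes are the Bg. tags from the Definiteness table.

```python
# Assumption: these letters count as vowels at the end of a base form.
BULGARIAN_VOWELS = set("аъоуеиюя")


def definite_tag(gender, number, base_form, is_subject):
    """Return the Definiteness tag ('d', 'h' or 'l') for a definite noun form.

    gender: 'm', 'f' or 'n'; number: 's' or 'p' (codes from section 1).
    base_form: the indefinite singular form of the noun.
    is_subject: True if the noun functions as the sentence subject.
    """
    ends_in_vowel = base_form[-1].lower() in BULGARIAN_VOWELS
    if gender == "m" and number == "s" and not ends_in_vowel:
        # Masculine singular: full form for subjects, short form otherwise.
        return "l" if is_subject else "h"
    # Feminine and neuter singular, vowel-final masculine singular, all plurals.
    return "d"
```

Under these assumptions, `definite_tag("m", "s", "стол", True)` gives `l` (столът), while `definite_tag("f", "s", "маса", True)` gives `d` (масата).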

1.3.2. Animateness

This feature is relevant to the count form (see 1.1.3.) of masculine nouns: inanimate and animate non-human nouns are in the count form when modified by a cardinal numeral, while animate human nouns are used in the plural in the same word combination. So the opposition is inanimate and animate non-human vs. animate human, and the attribute-value set in the Animateness table is defined according to this distribution of Animateness and Humanness. For reasons of uniformity, all nouns in the Bulgarian lexicon are classified for the values of the Animateness feature.
Attribute Value Bg. example Bg. tag
Animateness inanimate стол  
  animate non-human кон  
  animate human учител Hum
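The rule described in 1.3.2 can be sketched in Python as follows. The function name and parameters are hypothetical. The sketch assumes that non-masculine nouns take the ordinary plural after a cardinal numeral, and it does not cover the numeral "one", which combines with a singular noun.

```python
def noun_form_after_cardinal(gender, animateness):
    """Return the Number code ('c' or 'p') of a noun modified by a cardinal numeral.

    gender: 'm', 'f' or 'n'.
    animateness: 'inanimate', 'animate non-human' or 'animate human'.
    """
    if gender == "m" and animateness != "animate human":
        return "c"  # count form, e.g. стола
    return "p"      # plural, e.g. учители
```

The single comparison against `animate human` mirrors the two-way opposition in the text: inanimate and animate non-human nouns behave alike, which is why they share the empty tag.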

2. Verb (V)

2.1. Level 1 Features

2.1.1. Type

Attribute Value Bg. example Bg. tag
Type personal питам  
  impersonal съмва NPR
  semi-impersonal вали SIP
  auxiliary съм AUX
The mapping between the EAGLES Level 1 subdivision of Type into main and auxiliary and the verbal subcategories in the Bulgarian lexicon is given in the following tree diagram:

The subclassification of verbs is made with respect to their role in the formation of analytic tense and mood forms and to their syntactic behaviour, and it is conditioned by the specific shape of the paradigm and by some peculiarities in the attachment of pronominal clitics. Personal verbs have a full paradigm, while impersonal verbs are used only in the third person singular forms and in the singular neuter participle form. The number of paradigm members of the semi-impersonal verbs lies between that of the personal and that of the impersonal ones. Many impersonal and semi-impersonal verbs obligatorily attach pronominal clitics which carry the information about verbal person. Some personal verbs attach the clitic reflexive personal pronoun as an obligatory element.
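The paradigm restriction on impersonal verbs can be sketched as a predicate over wordform descriptions. The function name and the dictionary keys (`verb_form`, `person`, `number`, `gender`) are assumptions made for illustration. Semi-impersonal verbs are left out because the text does not list their paradigm members.

```python
def impersonal_verb_allows(form):
    """Return True if a wordform description is licensed for an impersonal verb.

    form: dict with optional keys 'verb_form', 'person', 'number', 'gender',
    using the codes from section 2 ('3', 's', 'n', ...).
    """
    if form.get("verb_form") == "participle":
        # Only the singular neuter participle form is available.
        return form.get("number") == "s" and form.get("gender") == "n"
    # Otherwise only third person singular forms are available.
    return form.get("person") == "3" and form.get("number") == "s"
```

A personal verb would accept every description, so a full paradigm check would only need this predicate for verbs tagged NPR.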

2.1.2. Finiteness

Attribute Value Bg. example Bg. tag
Finiteness finite чета  
  non-finite чел  
The Finiteness feature with its values finite and non-finite fits well in the set of distinctions for the verbal forms. Finite and non-finite mark two sections in the verbal paradigm. The distribution of the verb forms (see 2.1.3.) in the two classes of finite and non-finite is the following:


2.1.3. Verb form / Mood

Attribute Value Bg. example Bg. tag
Verb form/Mood indicative четеш  
  imperative чети Z
  participle чел  
  gerund четейки D
In Bulgarian the only mood form expressed by an inflection within a single wordform is the positive imperative. Indicative is the default value in the morphosyntactic distinction.
The possible participle forms and their tags are given in the next table:
Tense Voice Example Tag
present active четящ N
aorist active чел O
imperfect active четял M
passive четен S

2.1.4. Tense

Attribute Value Bg. example Bg. tag
Tense present чета P
  aorist четох A
  imperfect четях I
The tense system in Bulgarian is very rich. The Tense values in the table above are those encoded in synthetic forms. The majority of tenses and moods are generated by analytic (multi-word) forms. The language-specific Tense values aorist and imperfect belong to the more general category of past tense if mapped to a general scheme:
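The statement that aorist and imperfect fall under a general past tense can be written as a simple lookup. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the keys are the Bg. tags from the Tense table.

```python
# Bg. Tense tag -> tense category in a general (language-independent) scheme.
GENERAL_TENSE = {
    "P": "present",  # present, e.g. чета
    "A": "past",     # aorist, e.g. четох
    "I": "past",     # imperfect, e.g. четях
}


def general_tense(bg_tag):
    """Map a Bulgarian synthetic Tense tag to its general tense category."""
    return GENERAL_TENSE[bg_tag]
```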

2.1.5. Person

Attribute Value Bg. example Bg. tag
Person first чета 1
  second четеш 2
  third чете 3

2.1.6. Number

Attribute Value Bg. example Bg. tag
Number singular четох s
  plural четохме p

2.1.7. Gender

Attribute Value Bg. example Bg. tag
Gender masculine ходил m
  feminine ходила f
  neuter ходило n
Gender applies to the participles.

2.2. Level 2a Features

2.2.1. Aspect

Attribute Value Bg. example Bg. tag
Aspect perfective прочета PF
  imperfective чета IPF
Aspect is a lexical feature present in the lexicon specifications and licenses some of the verbal paradigm members.

2.2.2. Voice

Attribute Value Bg. example Bg. tag
Voice active чел  
  passive четен  
In the present set of morphosyntactic specifications Voice applies only to the participles where it is indicated by inflections.

2.2.3. Clitic Attachment

The attachment of reflexive and non-reflexive pronominal clitics to different types of verbs has to be encoded in the lexical entry when verb + clitic forms a lexeme. Cases in which clitics have the status of complements and are interchangeable with nominal elements should be treated at the syntactic level.

2.2.4. Main-Verb Function

In the EAGLES proposal the values of this feature are transitive, intransitive, impersonal, and they are in subordinate position to the main value of Type. The subdivision of the main verbal type in the Bulgarian lexicon was considered in section 2.1.1. There is a separate feature Transitivity applied to the personal type of verbs.
Attribute Value Bg. example Bg. tag
Transitivity transitive чета t
  intransitive ходя i

2.2.5. Auxiliary Function

This feature is not present in the set of morphosyntactic specifications for Bulgarian.

2.3. Language-specific Features (Level 2b)

2.3.1. Definiteness

Attribute Value Bg. example Bg. tag
Definiteness indefinite чел  
  definite челата d
  short def. form челия h
  full def. form челият l
Definiteness is pertinent to the participles.

3. Adjective (A)

3.1. Level 1 Features

3.1.1. Type

At the EAGLES common level, the values of the Type feature, i.e., qualificative, possessive, ordinal, cardinal and indefinite, are defined in order to distinguish the two main groups of Qualificative and Indicative Adjectives, the latter comprising the last four values. The Indicative Adjectives are in fact Determiners in their adjectival function, or Adjectives which also have a pronominal function. This subdivision does not apply to the Bulgarian lexicon, which assigns all subtypes of the so-called Indicative Adjectives to other categories such as Pronouns and Numerals (see 4. Pronouns and 10. Numeral).

The Bulgarian lexicon instead distinguishes gradable from ungradable adjectives. This distinction overlaps to a great extent with the semantic distinction between qualitative adjectives (denoting a quality that can be graded, e.g. big, clever) and relative adjectives (identifying an object as belonging to a given class, e.g. educational). The feature Gradability licenses the generation of the comparative and superlative forms of adjectives in the lexicon, but it is not needed in corpus annotation.
Attribute Value Bg. example Bg. tag
Gradability gradable весел  
  ungradable аграрен  

3.1.2. Degree

In Bulgarian the comparative and superlative forms are combinations of an adjective with a special particle which behaves as a proclitic and is graphically attached to the adjective with a hyphen. The degree forms are considered analytic, and their generation and analysis are rule-based. Consequently, there is no Degree feature in the tagset for adjectives.
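A rule of this kind can be sketched in Python. The sketch assumes that the particles are по- (comparative) and най- (superlative), which the text does not name, and that the hyphen is a plain ASCII hyphen. The function name `split_degree` is hypothetical, and the sketch does not check whether the remaining adjective is marked as gradable in the lexicon.

```python
# Assumption: the degree particles and the degree each one marks.
DEGREE_PARTICLES = (
    ("по-", "comparative"),
    ("най-", "superlative"),
)


def split_degree(wordform):
    """Split a hyphenated degree form into (degree, adjective wordform).

    Forms without a degree particle are returned unchanged as 'positive'.
    """
    for particle, degree in DEGREE_PARTICLES:
        if wordform.startswith(particle):
            return degree, wordform[len(particle):]
    return "positive", wordform
```

The adjective part returned by `split_degree` can then be looked up in the lexicon with the ordinary adjective tagset, which is why no separate Degree tag is needed.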

3.1.3. Gender

Attribute Value Bg. example Bg. tag
Gender masculine умен m
  feminine умна f
  neuter умно n

3.1.4. Number

Attribute Value Bg. example Bg. tag
Number singular умен s
  plural умни p

3.2. Language-specific Features (Level 2b)

3.2.1. Definiteness

Attribute Value Bg. example Bg. tag
Definiteness indefinite умен  
  definite умната d
  short def. form умния h
  full def. form умният l

4. Pronouns (PRO)

4.1. Level 1 Features

4.1.1. Type

Attribute Value Bg. example Bg. tag
Type personal аз PER
  demonstrative този DEM
  relative който REL
  collective всички COL
  interrogative кой INT
  indefinite някой IDF
  negative никой NEG
  possessive мой POS
  reflexive свой RFL

4.1.2. Person

Attribute Value Bg. example Bg. tag
Person first аз 1
  second ти 2
  third той 3
This feature is relevant to the personal pronouns.

4.1.3. Gender

Attribute Value Bg. example Bg. tag
Gender masculine никой m
  feminine никоя f
  neuter никое n
For the personal pronouns, Gender is distinguished only in the third person singular forms.
Gender applies to all other types of pronouns.

4.1.4. Number

Attribute Value Bg. example Bg. tag
Number singular този s
  plural тези p
The feature Number applies to all types of pronouns.

4.1.5. Case

Attribute Value Bg. example Bg. tag
Case nominative той  
  accusative него A
  dative нему D
Case is a feature pertinent especially to the personal pronouns. There are also some accusative and dative masculine singular forms of the demonstrative, relative, collective, interrogative, indefinite and negative pronouns, which are gradually falling out of use.

4.1.6. Possessor

The EAGLES proposal introduces the feature Possessor in Level 1 which denotes the Number of the Possessor in the different forms of the possessive pronouns. In the Bulgarian lexicon there are three features characterising the Possessor: Possessor-Person, Possessor-Gender, Possessor-Number. The information about the Possessor belongs to the stem of pronouns. In the corpora annotation in INTEX format there is a code for Characteristics of Possessor (O) which is the general marker of the specific Person, Gender, or Number information.
Attribute Value Bg. example Bg. tag
Possessor Person first мой O1
    second твой O2
    third негов O3
  Gender masculine негов Om
    feminine неин Of
    neuter негов On
  Number singular негов Os
    plural техен Op
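The Possessor codes in the table combine the general marker O with one value code. A minimal Python sketch of reading such a code is given below; the function name `decode_possessor` and the error handling are assumptions made for illustration, not part of the INTEX format.

```python
# Second character of an O-code -> (Possessor attribute, value), from the table.
POSSESSOR_VALUES = {
    "1": ("Person", "first"),
    "2": ("Person", "second"),
    "3": ("Person", "third"),
    "m": ("Gender", "masculine"),
    "f": ("Gender", "feminine"),
    "n": ("Gender", "neuter"),
    "s": ("Number", "singular"),
    "p": ("Number", "plural"),
}


def decode_possessor(code):
    """Decode a two-character Possessor code such as 'O3' or 'Op'."""
    if len(code) != 2 or code[0] != "O" or code[1] not in POSSESSOR_VALUES:
        raise ValueError(f"not a Possessor code: {code!r}")
    attribute, value = POSSESSOR_VALUES[code[1]]
    return f"Possessor-{attribute}", value
```

Under this reading, негов would carry the codes O3, Om (or On) and Os, each decoded independently.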

4.2. Level 1a Features

4.2.1. Politeness

In Bulgarian there are no special pronouns for politeness. Politeness is expressed by using the second person plural pronoun to address either a single person or several persons. It is not encoded in the Bulgarian lexicon or in the corpus annotation, since it is derived at the syntactic and discourse levels.

4.3. Language-specific Features (Level 2b)

4.3.1. Pronoun form

Attribute Value Bg. example Bg. tag
Pronoun Form full form мене F
  short form ме S
In Bulgarian the personal, possessive and reflexive pronouns have a full form and a short (clitic) form. The feature is encoded in the set of morphosyntactic specifications since it is an important indicator of the behaviour of pronouns and a constraint on the presence of some features.

4.3.2. Definiteness

Attribute Value Bg. example Bg. tag
Definiteness indefinite негов  
  definite неговата d
  short def. form неговия h
  full def. form неговият l
Definiteness is a feature that applies to the full forms of the possessive non-reflexive and reflexive pronouns and to some other types of pronouns which, denoting attributes, have a paradigm resembling that of adjectives.

4.3.3. Referent Type

Attribute Value Bg. example Bg. tag
Referent type people & things кой PET
  possession чий PSS
  attributes някакъв ATT
  quantity колко QN
This feature belongs to the fine-grained set of morphosyntactic specifications of the lexicon and yields a more detailed subclassification of the relative, collective, interrogative, indefinite, negative, and reflexive pronouns. It can be used for determining the syntactic function of a given pronoun and for anaphoric binding.

4.3.4. Referent Features

Attribute Value Bg. example Bg. tag
Referent features size толкав SZ
  quality такъв QLT
  nearness тази NER
  distance онази DIS
The nearness and distance values distinguish between demonstratives denoting near and distant objects. The size and quality values introduce a fine-grained distinction of the referents of type attributes.

5. Determiner

This category is not present in the Bulgarian lexicon. Words that are usually classified as Determiners in other traditions are distributed among other part-of-speech categories, namely Numerals, Pronouns, and Adjectives.

6. Article

This category is not applicable to Bulgarian where definite wordforms are generated by the attachment of an ending of inflectional type.

7. Adverb (ADV)

7.1. Level 1 Features

The subclassification of adverbs differs greatly across languages. EAGLES proposes two features at Level 1: Type and Degree. The EAGLES values of the Type feature are general and particle. This subdivision is not suitable for Bulgarian, where particle is not applicable as an adverbial subtype. The mapping between the EAGLES and the Bulgarian subdivision (see 7.1.1.) of the Type feature is given in the following tree:

7.1.1. Type

Attribute Value Bg. example Bg. tag
Type normal добре  
  pronominal тук  

7.1.2. Degree

The comparatives and superlatives of adverbs are formed in the same way as those of Adjectives (see 3.1.2.), and the considerations about the Degree of Adjectives are relevant for adverbs as well.

There is a feature Gradability (semantic in nature) which encodes in the lexicon the ability of normal adverbs to form the comparative and superlative.
Attribute Value Bg. example Bg. tag
Gradability gradable добре  
  ungradable именно  

7.2. Language-specific Features (Level 2b)

7.2.1. Pronominal Adverb Type

Pronominal adverbs are further subclassified:
Attribute Value Bg. example Bg. tag
Pronominal adverb type demonstrative там DEM
  relative където REL
  collective навсякъде COL
  interrogative къде INT
  indefinite някъде IDF
  negative никъде NEG

7.2.2. Adverb Sense

This feature is a semantic and partly functional distinction for both normal and pronominal adverbs.
Attribute Value Bg. example Bg. tag
Adverb sense time сега TM
  place далече PLC
  manner бързо MNN
  quantity & degree много QNT
  reason & goal затова RGO
  modality несъмнено LOG
  predicativity присърце PRD

8. Adposition

In Bulgarian there are only Prepositions (tagged PREP). They are considered a separate category having no specifications at the morphosyntactic level.

9. Conjunction (CONJ)

The only feature for Conjunctions in the Bulgarian tagset is Type, whose values coincide with those proposed at EAGLES Level 1.
Attribute Value Bg. example Bg. tag
Type coordinating и CONJC
  subordinating че CONJS

10. Numeral (NU)

10.1. Level 1 Features

10.1.1. Type

Attribute Value Bg. example Bg. tag
Type cardinal десет CAR
  ordinal десети ORD

10.1.2. Gender

Attribute Value Bg. example Bg. tag
Gender masculine трети m
  feminine трета f
  neuter трето n
Gender is pertinent to ordinal numerals whose paradigm is a typical adjectival one, and also to a small number of cardinals.

10.1.3. Number

Attribute Value Bg. example Bg. tag
Number singular хиляда s
  plural хиляди p
Number is pertinent to ordinals and some cardinals.

10.2. Language-specific Features (Level 2b)

10.2.1. Definiteness

Attribute Value Bg. example Bg. tag
Definiteness indefinite трети  
  definite третата d
  short def. form третия h
  full def. form третият l
The values of Definiteness apply both to cardinals and ordinals.

10.2.2. Numeral Form

Attribute Value Bg. example Bg. tag
Numeral Form absolute cardinal numeral десет S
  male person form двама M
  approximate десетина A
The male person form is used before nouns denoting male humans, or a group of humans where there is at least one male. The approximate form, as revealed by its name, means “approximate number of objects”. The absolute cardinal numeral value is necessary for marking the respective wordforms as opposed to the other numeral forms.
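The condition for the male person form can be sketched as follows. The function name and the representation of the counted referents as dictionaries with `human` and `male` flags are assumptions made for illustration. The sketch ignores the approximate form and does not check whether a given numeral actually has a male person form.

```python
def numeral_form_tag(referents):
    """Return 'M' (male person form) or 'S' (absolute cardinal numeral).

    referents: list of dicts with boolean keys 'human' and 'male',
    one per counted individual (a hypothetical representation).
    """
    all_human = bool(referents) and all(r["human"] for r in referents)
    has_male = any(r["human"] and r["male"] for r in referents)
    return "M" if all_human and has_male else "S"
```

For a mixed group of one man and one woman the function returns `M`, matching the "at least one male" condition in the text; for a group of women only, or for non-human referents, it returns `S`.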

11. Interjection (INTJ)

There are no subcategories for Interjection.

12. Unique membership class

Particles (tagged PC), which are a distinct part-of-speech category in Bulgarian, should belong to the Unique class of the EAGLES categorial model. There is no subdivision of Particles in the present set of Bulgarian morphosyntactic specifications.

13. Residual

This category is not considered in the present Bulgarian tagset.