EAGLES Synopsis and Comparison of Morphosyntactic
Phenomena Encoded in Lexicons and Corpora
A Common Proposal and Applications to European Languages
Application to Bulgarian
Milena Slavcheva and Elena Paskaleva
0. Introduction
The set of morphosyntactic specifications used in the Application is that
encoded in the Bulgarian Morphological Lexicon produced in the Linguistic
Modelling Laboratory (LML). The Bg. tags in the tables are the notation
used in the Bulgarian module of INTEX, a corpus processing system, and are
composed according to the format that system requires. The absence of a tag
for a given value means that the value is a default in the finite-state
automata representation of that system. The tagset was also used in the
high-speed (30,000 words/sec) LML morphological analyser.
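Since an absent tag stands for a default value, any program reading the annotation has to restore those defaults itself. The following Python sketch assumes, hypothetically, that a noun analysis arrives as a list of separate tag codes (the actual INTEX string layout is not reproduced here); the codes and defaults are taken from the noun tables in section 1, and the names `NOUN_TAGS`, `NOUN_DEFAULTS` and `decode_noun` are illustrative only.

```python
# Hypothetical decoder for noun tag codes (codes from the tables in section 1).
# Assumption: an analysis is a list of separate codes, not an INTEX string.
NOUN_TAGS = {
    "P": ("Type", "proper"),
    "m": ("Gender", "masculine"),
    "f": ("Gender", "feminine"),
    "n": ("Gender", "neuter"),
    "s": ("Number", "singular"),
    "p": ("Number", "plural"),
    "c": ("Number", "count"),
    "V": ("Case", "vocative"),
    "d": ("Definiteness", "definite"),
    "h": ("Definiteness", "short def. form"),
    "l": ("Definiteness", "full def. form"),
    "Hum": ("Animateness", "animate human"),
}

# Values that are never written as a tag and must be filled in by default.
# Animateness is left out: two values (inanimate, animate non-human) share
# the empty tag, so a missing "Hum" only tells us "not human".
NOUN_DEFAULTS = {
    "Type": "common",
    "Definiteness": "indefinite",
}


def decode_noun(codes):
    """Turn a list of noun tag codes into an attribute -> value dict."""
    features = dict(NOUN_DEFAULTS)
    for code in codes:
        attribute, value = NOUN_TAGS[code]
        features[attribute] = value
    return features
```

For instance, `decode_noun(["f", "s"])` yields Type common and Definiteness indefinite even though neither value appears among the input codes.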
The set of morphosyntactic specifications for Bulgarian includes feature-value
pairs characterising synthetic forms, i.e., forms consisting of a single
graphical word. The specifications are lexical, providing the invariable
lexeme information for words, and grammatical, encoding the variable
information that wordforms receive through inflection.
1. Noun (N)
1.1. Level 1 Features
1.1.1. Type
Attribute | Value | Bg. example | Bg. tag |
Type | common | книга | |
| proper | Милена | P |
1.1.2. Gender
Attribute | Value | Bg. example | Bg. tag |
Gender | masculine | стол | m |
| feminine | маса | f |
| neuter | море | n |
1.1.3. Number
Attribute | Value | Bg. example | Bg. tag |
Number | singular | стол | s |
| plural | столове | p |
l-spec | count | стола | c |
The language-specific count form is used after cardinal numerals
and is part of the wordform paradigm of masculine nouns that do not denote
humans. The count form is homonymous with the short definite form of
masculine nouns, so the two cannot be disambiguated in a text without
reference to the context. The count form is the reason why the Animateness
feature (see 1.3.2) is included in the set of morphosyntactic
specifications for the Noun in the Bulgarian lexicon.
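The homonymy between the count form and the short definite form can be made concrete with a toy paradigm. The sketch below is a simplification under stated assumptions: it covers only consonant-final masculine nouns taking the hard-stem endings -а and -ът (soft stems with -я/-ят are ignored), the plural is supplied by hand because its formation varies, and the helper names are hypothetical. Indexing the generated forms shows one string receiving two analyses.

```python
from collections import defaultdict


def masculine_forms(lemma, plural):
    """Toy paradigm of a consonant-final, non-human masculine noun.

    Assumes the hard-stem endings -а / -ът; the plural is passed in
    because its ending varies (стол -> столове).
    """
    return [
        (lemma, ("m", "s")),              # base form, indefinite
        (lemma + "а", ("m", "s", "h")),   # short definite form
        (lemma + "ът", ("m", "s", "l")),  # full definite form
        (lemma + "а", ("m", "c")),        # count form
        (plural, ("m", "p")),             # plural, indefinite
    ]


def index_forms(entries):
    """Map each wordform to every tag tuple that produces it."""
    index = defaultdict(list)
    for form, tags in entries:
        index[form].append(tags)
    return index


index = index_forms(masculine_forms("стол", "столове"))
print(index["стола"])  # [('m', 's', 'h'), ('m', 'c')]
```

A tagger working from such an index has to leave стола ambiguous until the context, for example a preceding cardinal numeral, is consulted.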
1.2. Level 1a Features
1.2.1. Case
Attribute | Value | Bg. example | Bg. tag |
Case | | | |
l-spec | vocative | родино | V |
Modern Bulgarian has no case system for nouns. Only the vocative form is
still alive, and it is regarded as a relic of case rather than as a true
case form standing in opposition to a nominative or any other case. The
vocative is now just a wordform in the paradigm of some nouns.
1.2.2. Countability
This feature is not applied to Bulgarian. It is present in the lexicon only
implicitly, as a constraint on nouns whose paradigm lacks a plural.
1.3. Language-specific Features (Level 2b)
1.3.1. Definiteness
Attribute | Value | Bg. example | Bg. tag |
Definiteness | indefinite | маса | |
| definite | масата | d |
| short def. form | стола | h |
| full def. form | столът | l |
Definiteness in Bulgarian is expressed by a “definite article” which
behaves as an inflection. Definite masculine singular nouns have either a
short or a full definite form. The full “article” ending is attached to
masculine singular nouns functioning as sentence subjects. The value
definite, without a distinction between the full and short definite forms,
is needed for feminine and neuter singular nouns, for masculine singular
nouns whose base form ends in a vowel, and for all plural nouns.
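The choice among the three definite values described above can be sketched as a small decision function. It is illustrative only: whether the noun is a sentence subject is a syntactic fact the caller must supply (the parameter `is_subject` is an assumption, not part of the lexicon), and the vowel set is the standard Bulgarian one, with й treated as a consonant.

```python
# Bulgarian vowel letters; й counts as a consonant here (край -> краят).
BG_VOWELS = set("аеиоуъюя")


def definite_tag(gender, number, base_form, is_subject):
    """Return the Definiteness tag (d, h or l) for a definite noun.

    gender and number use the tags from section 1 (m/f/n, s/p);
    is_subject is a hypothetical flag supplied by syntactic analysis.
    """
    if gender == "m" and number == "s" and base_form[-1] not in BG_VOWELS:
        # Full form for subjects, short form elsewhere.
        return "l" if is_subject else "h"
    # Feminine/neuter singular, vowel-final masculine, and all plurals.
    return "d"
```

So стол as a subject receives l (столът), as an object h (стола), while the vowel-final masculine баща always receives d (бащата).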
The same principles of Definiteness apply to the other part-of-speech
categories for which the Definiteness feature is present.
1.3.2. Animateness
This feature is relevant to the count form (see 1.1.3.) of masculine
nouns: inanimate and animate non-human nouns are in the count
form when modified by a cardinal numeral, while animate human nouns
are used in the plural in the same word combination. So the opposition
is inanimate and animate non-human vs. animate human,
and the attribute-value set in the Animateness table is defined
according to this distribution of Animateness and Humanness.
For reasons of uniformity, all nouns in the Bulgarian lexicon are classified
for the values of the Animateness feature.
Attribute | Value | Bg. example | Bg. tag |
Animateness | inanimate | стол | |
| animate non-human | кон | |
| animate human | учител | Hum |
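The distribution described in this section amounts to a simple rule for choosing the Number tag of a noun modified by a cardinal numeral. The function below is a hypothetical illustration using the tags from 1.1.3 and the Animateness table; an empty animateness tag stands for inanimate or animate non-human.

```python
def number_after_cardinal(gender, animateness_tag=""):
    """Number tag of a noun modified by a cardinal numeral.

    Masculine nouns without the Hum tag (inanimate or animate
    non-human) take the count form; masculine human nouns and all
    feminine and neuter nouns take the plural.
    """
    if gender == "m" and animateness_tag != "Hum":
        return "c"  # два стола, два коня
    return "p"      # двама учители, две маси
```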
2. Verb (V)
2.1. Level 1 Features
2.1.1. Type
Attribute | Value | Bg. example | Bg. tag |
Type | personal | питам | |
| impersonal | съмва | NPR |
| semi-impersonal | вали | SIP |
| auxiliary | съм | AUX |
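The mapping between the EAGLES Level 1 subdivision of Type into
main and auxiliary and the verbal subcategories in the Bulgarian
lexicon is as follows: EAGLES main covers the Bulgarian personal,
impersonal and semi-impersonal verbs, while EAGLES auxiliary corresponds
to the Bulgarian auxiliary verbs.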
Verbs are subclassified with respect to their role in the formation of
analytic tense and mood forms and their syntactic behaviour; the
subclassification is also conditioned by the specific shape of the paradigm
and by some peculiarities in the attachment of pronominal clitics. Personal
verbs have a full paradigm, while impersonal verbs are used only in the
third person singular forms and in the neuter singular participle form.
The paradigm of semi-impersonal verbs is larger than that of impersonal
verbs but smaller than that of personal ones. Many impersonal and
semi-impersonal verbs obligatorily attach pronominal clitics which carry
information about the verbal person. Some personal verbs attach the
reflexive personal pronoun clitic as an obligatory element.
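The restricted paradigm of impersonal verbs can be enforced as a check on candidate forms. In this sketch a form is a dictionary keyed by hypothetical field names (`form`, `person`, `number`, `gender`) whose values reuse the tags from 2.1.5 to 2.1.7; only the rule for impersonal (NPR) verbs is modelled, since the text gives no exact boundary for the semi-impersonal paradigm.

```python
def allowed_for_impersonal(features):
    """Return True if a form belongs to the paradigm of an impersonal verb.

    Per the description: third person singular forms plus the neuter
    singular participle. The keys of `features` are hypothetical.
    """
    if features.get("form") == "participle":
        return features.get("number") == "s" and features.get("gender") == "n"
    # Any other form must be marked third person singular; forms with
    # no person marking at all are rejected.
    return features.get("person") == "3" and features.get("number") == "s"
```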
2.1.2. Finiteness
Attribute | Value | Bg. example | Bg. tag |
Finiteness | finite | чета | |
| non-finite | чел | |
The Finiteness feature, with its values finite and non-finite, fits well
into the set of distinctions for verbal forms, since finite and non-finite
mark two sections of the verbal paradigm. The verb forms (see 2.1.3.) are
distributed between the two classes as follows: indicative and imperative
forms are finite, while participles and gerunds are non-finite.
2.1.3. Verb form / Mood
Attribute | Value | Bg. example | Bg. tag |
Verb form/Mood | indicative | четеш | |
| imperative | чети | Z |
| participle | чел | |
| gerund | четейки | D |
In Bulgarian the only mood expressed by an inflection within a single
wordform is the positive imperative. Indicative is the default value in the
morphosyntactic distinction.
The possible participle forms and their tags are given in the following
table:
PARTICIPLE
Tense | Voice | Example | Tag |
present | active | четящ | N |
aorist | active | чел | O |
imperfect | active | четял | M |
- | passive | четен | S |
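Together with the Finiteness distribution of 2.1.2, the codes in the Verb form / Mood table and the participle table determine finiteness directly. A hypothetical lookup is sketched below; because the indicative is the untagged default, a missing form code (`None`) is read as the finite indicative.

```python
# Non-finite verb form codes: the four participle codes plus the gerund.
NON_FINITE = {"N", "O", "M", "S", "D"}
# Finite codes: the imperative; the indicative has no code of its own.
FINITE = {"Z"}


def finiteness(form_code):
    """Map a verb form code (or None for the indicative) to its Finiteness value."""
    if form_code is None or form_code in FINITE:
        return "finite"
    if form_code in NON_FINITE:
        return "non-finite"
    raise ValueError(f"unknown verb form code: {form_code!r}")
```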
2.1.4. Tense
Attribute | Value | Bg. example | Bg. tag |
Tense | present | чета | P |
| aorist | четох | A |
| imperfect | четях | I |
The tense system of Bulgarian is very rich. The Tense values in the table
above are those encoded in synthetic forms; most tenses and moods are
expressed by analytic (multi-word) forms. Mapped to a general scheme, the
language-specific Tense values aorist and imperfect both belong to the more
general category of past tense, while present maps to present.
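The mapping of the synthetic Tense codes to a general scheme is small enough to state as a table in code. The dictionary `TENSE_CODES` and the function `general_tense` below are a hypothetical sketch using the codes from the Tense table and the general values present and past.

```python
# Synthetic Tense codes (2.1.4) -> (language-specific value, general tense).
TENSE_CODES = {
    "P": ("present", "present"),
    "A": ("aorist", "past"),
    "I": ("imperfect", "past"),
}


def general_tense(code):
    """Collapse a Bulgarian Tense code to the general present/past scheme."""
    return TENSE_CODES[code][1]
```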
2.1.5. Person
Attribute | Value | Bg. example | Bg. tag |
Person | first | чета | 1 |
| second | четеш | 2 |
| third | чете | 3 |
2.1.6. Number
Attribute | Value | Bg. example | Bg. tag |
Number | singular | четох | s |
| plural | четохме | p |
2.1.7. Gender
Attribute | Value | Bg. example | Bg. tag |
Gender | masculine | ходил | m |
| feminine | ходила | f |
| neuter | ходило | n |
Gender applies to the participles.
2.2. Level 2a Features
2.2.1. Aspect
Attribute | Value | Bg. example | Bg. tag |
Aspect | perfective | прочета | PF |
| imperfective | чета | IPF |
Aspect is a lexical feature present in the lexicon specifications and
licenses some of the verbal paradigm members.
2.2.2. Voice
Attribute | Value | Bg. example | Bg. tag |
Voice | active | чел | |
| passive | четен | |
In the present set of morphosyntactic specifications Voice applies
only to the participles where it is indicated by inflections.
2.2.3. Clitic Attachment
The attachment of reflexive and non-reflexive pronominal clitics to
different types of verbs has to be encoded in the lexical entry when the
verb + clitic combination forms a lexeme. Cases in which clitics have the
status of complements and are interchangeable with nominal elements should
be treated at the syntactic level.
2.2.4. Main-Verb Function
In the EAGLES proposal the values of this feature are transitive,
intransitive and impersonal, and they are subordinate to the main value
of Type. The subdivision of the main verbal type in the Bulgarian lexicon
was discussed in section 2.1.1. There is a separate feature, Transitivity,
applied to personal verbs.
Attribute | Value | Bg. example | Bg. tag |
Transitivity | transitive | чета | t |
| intransitive | ходя | i |
2.2.5. Auxiliary Function
This feature is not present in the set of morphosyntactic specifications
for Bulgarian.
2.3. Language-specific Features (Level 2b)
2.3.1. Definiteness
Attribute | Value | Bg. example | Bg. tag |
Definiteness | indefinite | чел | |
| definite | челата | d |
| short def. form | челия | h |
| full def. form | челият | l |
Definiteness is pertinent to the participles.
3. Adjective (A)
3.1. Level 1 Features
3.1.1. Type
At the EAGLES common level, the values of the Type feature, i.e.,
qualificative, possessive, ordinal, cardinal and indefinite, are defined
in order to distinguish two main groups: Qualificative and Indicative
Adjectives, the latter covering the last four values. Indicative
Adjectives are in fact Determiners in their adjectival function, or
Adjectives which also have a pronominal function. This subdivision does
not apply to the Bulgarian lexicon, which classifies all subtypes of the
so-called Indicative Adjectives under other categories such as Pronouns
and Numerals (see 4. Pronouns and 10. Numerals). The Bulgarian lexicon
instead distinguishes between gradable and ungradable adjectives, a
distinction which largely overlaps with the semantic one between
qualitative adjectives (denoting a quality that can be graded, e.g. big,
clever) and relative adjectives (identifying an object as belonging to a
given class, e.g. educational). The feature Gradability licenses the
generation of the comparative and superlative forms of adjectives in the
lexicon, but it is not necessary in corpus annotation.
Attribute | Value | Bg. example | Bg. tag |
Gradability | gradable | весел | |
| ungradable | аграрен | |
3.1.2. Degree
In Bulgarian the comparative and superlative forms are combinations of
adjectives with special particles which behave as proclitics and are
graphically attached to the adjective with a hyphen. The degree forms are
therefore considered analytic, and their generation or analysis is
rule-based. Consequently, there is no Degree feature in the tagset for
adjectives.
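The rule-based treatment of degree can be sketched as follows. The particles по- (comparative) and най- (superlative) are the standard Bulgarian ones, although the text above does not name them; the function names are hypothetical, and generation is gated by the lexical Gradability feature from 3.1.1.

```python
DEGREE_PARTICLES = {"по": "comparative", "най": "superlative"}


def degree_forms(adjective, gradable):
    """Generate analytic comparative and superlative forms of a gradable adjective."""
    if not gradable:
        return []
    return [particle + "-" + adjective for particle in DEGREE_PARTICLES]


def analyse_degree(token):
    """Split a hyphenated degree form into (degree, adjective wordform).

    Returns (None, token) when no degree particle is present.
    """
    particle, hyphen, rest = token.partition("-")
    if hyphen and particle in DEGREE_PARTICLES:
        return DEGREE_PARTICLES[particle], rest
    return None, token
```

The adjective wordform returned by `analyse_degree` is then analysed as usual, so in по-умната the synthetic form умната still supplies the Gender, Number and Definiteness tags.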
3.1.3. Gender
Attribute | Value | Bg. example | Bg. tag |
Gender | masculine | умен | m |
| feminine | умна | f |
| neuter | умно | n |
3.1.4. Number
Attribute | Value | Bg. example | Bg. tag |
Number | singular | умен | s |
| plural | умни | p |
3.2. Language-specific Features (Level 2b)
3.2.1. Definiteness
Attribute | Value | Bg. example | Bg. tag |
Definiteness | indefinite | умен | |
| definite | умната | d |
| short def. form | умния | h |
| full def. form | умният | l |
4. Pronouns (PRO)
4.1. Level 1 Features
4.1.1. Type
Attribute | Value | Bg. example | Bg. tag |
Type | personal | аз | PER |
| demonstrative | този | DEM |
| relative | който | REL |
| collective | всички | COL |
| interrogative | кой | INT |
| indefinite | някой | IDF |
| negative | никой | NEG |
| possessive | мой | POS |
| reflexive | свой | RFL |
4.1.2. Person
Attribute | Value | Bg. example | Bg. tag |
Person | first | аз | 1 |
| second | ти | 2 |
| third | той | 3 |
This feature is relevant to the personal pronouns.
4.1.3. Gender
Attribute | Value | Bg. example | Bg. tag |
Gender | masculine | никой | m |
| feminine | никоя | f |
| neuter | никое | n |
Among the personal pronouns, Gender is distinguished only in the third
person singular forms.
Gender applies to all other types of pronouns.
4.1.4. Number
Attribute | Value | Bg. example | Bg. tag |
Number | singular | този | s |
| plural | тези | p |
The feature Number applies to all types of pronouns.
4.1.5. Case
Attribute | Value | Bg. example | Bg. tag |
Case | nominative | той | |
| accusative | него | A |
| dative | нему | D |
Case is a feature pertinent especially to the personal pronouns. There
are also some accusative and dative masculine singular forms of the
demonstrative, relative, collective, interrogative, indefinite and
negative pronouns, which are gradually falling out of use.
4.1.6. Possessor
The EAGLES proposal introduces at Level 1 the feature Possessor, which
denotes the Number of the possessor in the different forms of the
possessive pronouns. In the Bulgarian lexicon there are three features
characterising the possessor: Possessor-Person, Possessor-Gender and
Possessor-Number. The information about the possessor belongs to the stem
of the pronoun. In corpus annotation in INTEX format there is a code for
Characteristics of Possessor (O), which serves as the general marker for
the specific Person, Gender or Number information.
Attribute | Value | Bg. example | Bg. tag |
Possessor-Person | first | мой | O1 |
| second | твой | O2 |
| third | негов | O3 |
Possessor-Gender | masculine | негов | Om |
| feminine | неин | Of |
| neuter | негов | On |
Possessor-Number | singular | негов | Os |
| plural | техен | Op |
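The possessor codes in the table are built by prefixing the general marker O to the ordinary Person, Gender or Number code. A hypothetical helper that composes them is sketched below; its keyword arguments mirror the three Bulgarian possessor features, and the function name is illustrative.

```python
# Ordinary codes reused after the possessor marker "O".
POSSESSOR_CODES = {
    "Person": {"first": "1", "second": "2", "third": "3"},
    "Gender": {"masculine": "m", "feminine": "f", "neuter": "n"},
    "Number": {"singular": "s", "plural": "p"},
}


def possessor_tags(**possessor):
    """Compose possessor tags, e.g. Person="third" -> "O3"."""
    return ["O" + POSSESSOR_CODES[feature][value]
            for feature, value in possessor.items()]


# неин 'her': third-person, feminine, singular possessor
print(possessor_tags(Person="third", Gender="feminine", Number="singular"))
# ['O3', 'Of', 'Os']
```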
4.2. Level 1a Features
4.2.1. Politeness
In Bulgarian there are no special pronouns for politeness. Politeness is
expressed by using the second person plural pronoun to address either a
single person or several persons. It is not encoded in the Bulgarian
lexicon or in the corpus annotation, since it is derived at the syntactic
and discourse levels.
4.3. Language-specific Features (Level 2b)
4.3.1. Pronoun form
Attribute | Value | Bg. example | Bg. tag |
Pronoun Form | full form | мене | F |
| short form | ме | S |
In Bulgarian the personal, possessive and reflexive pronouns have both a
full form and a short (clitic) form. The feature is encoded in the set of
morphosyntactic specifications since it is an important indicator of the
behaviour of pronouns and a constraint on the presence of some features.
4.3.2. Definiteness
Attribute | Value | Bg. example | Bg. tag |
Definiteness | indefinite | негов | |
| definite | неговата | d |
| short def. form | неговия | h |
| full def. form | неговият | l |
Definiteness is a feature that applies to the full forms of
the possessive non-reflexive and reflexive pronouns and to some other types
of pronouns which, denoting attributes, have a paradigm resembling that
of adjectives.
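Because Definiteness is available only to full forms, the Pronoun form feature can act as a constraint during analysis, as 4.3.1 notes. The check below is a minimal sketch over a list of tag codes (an assumed representation), using F/S from 4.3.1 and d/h/l from 4.3.2; the function name is hypothetical.

```python
DEFINITENESS_CODES = {"d", "h", "l"}


def definiteness_allowed(tags):
    """Reject analyses in which a short (clitic) form carries definiteness."""
    return not ("S" in tags and DEFINITENESS_CODES & set(tags))
```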
4.3.3. Referent Type
Attribute | Value | Bg. example | Bg. tag |
Referent type | people & things | кой | PET |
| possession | чий | PSS |
| attributes | някакъв | ATT |
| quantity | колко | QN |
This feature belongs to the fine-grained set of morphosyntactic
specifications of the lexicon and yields a more detailed subclassification
of the relative, collective, interrogative, indefinite, negative and
reflexive pronouns. The feature can be used for determining the syntactic
function of a given pronoun and for anaphoric binding.
4.3.4. Referent Features
Attribute | Value | Bg. example | Bg. tag |
Referent features | size | толкав | SZ |
| quality | такъв | QLT |
| nearness | тази | NER |
| distance | онази | DIS |
The values nearness and distance distinguish between demonstratives
denoting near and distant objects. The values size and quality introduce a
fine-grained distinction among referents of the type attributes.
5. Determiner
This category is not present in the Bulgarian lexicon. Words usually
classified as Determiners in other traditions are distributed among other
part-of-speech categories such as Numerals, Pronouns and Adjectives.
6. Article
This category is not applicable to Bulgarian where definite wordforms are
generated by the attachment of an ending of inflectional type.
7. Adverb (ADV)
7.1. Level 1 Features
The subclassification of adverbs differs greatly across languages. At
Level 1, EAGLES proposes two features: Type and Degree. The EAGLES values
of the Type feature are general and particle. This subdivision is not
suitable for Bulgarian, where particle is not applicable as an adverbial
subtype. In the mapping between the EAGLES and the Bulgarian subdivisions
(see 7.1.1.) of the Type feature, both Bulgarian values, normal and
pronominal, fall under EAGLES general.
7.1.1. Type
Attribute | Value | Bg. example | Bg. tag |
Type | normal | добре | |
| pronominal | тук | |
7.1.2. Degree
The comparatives and superlatives of adverbs are formed in the same way as
those of Adjectives (see 3.1.2.), and the considerations about the Degree
of Adjectives are relevant for adverbs as well.
A Gradability feature (semantic in nature) encodes in the lexicon the
ability of normal adverbs to form the comparative and superlative.
Attribute | Value | Bg. example | Bg. tag |
Gradability | gradable | добре | |
| ungradable | именно | |
7.2. Language-specific Features (Level 2b)
7.2.1. Pronominal Adverb Type
Pronominal adverbs are further subclassified:
Attribute | Value | Bg. example | Bg. tag |
Pronominal adverb type | demonstrative | там | DEM |
| relative | където | REL |
| collective | навсякъде | COL |
| interrogative | къде | INT |
| indefinite | някъде | IDF |
| negative | никъде | NEG |
7.2.2. Adverb Sense
This feature is a semantic and partly functional distinction for both normal
and pronominal adverbs.
Attribute | Value | Bg. example | Bg. tag |
Adverb sense | time | сега | TM |
| place | далече | PLC |
| manner | бързо | MNN |
| quantity & degree | много | QNT |
| reason & goal | затова | RGO |
| modality | несъмнено | LOG |
| predicativity | присърце | PRD |
8. Adposition
In Bulgarian there are only Prepositions (tagged PREP). They
are considered a separate category having no specifications at the morphosyntactic
level.
9. Conjunction (CONJ)
The only feature for Conjunctions in the Bulgarian tagset is Type,
whose values coincide with those proposed at EAGLES Level 1.
Attribute | Value | Bg. example | Bg. tag |
Type | coordinating | и | CONJC |
| subordinating | че | CONJS |
10. Numeral (NU)
10.1. Level 1 Features
10.1.1. Type
Attribute | Value | Bg. example | Bg. tag |
Type | cardinal | десет | CAR |
| ordinal | десети | ORD |
10.1.2. Gender
Attribute | Value | Bg. example | Bg. tag |
Gender | masculine | трети | m |
| feminine | трета | f |
| neuter | трето | n |
Gender is pertinent to ordinal numerals whose paradigm is a
typical adjectival one, and also to a small number of cardinals.
10.1.3. Number
Attribute | Value | Bg. example | Bg. tag |
Number | singular | хиляда | s |
| plural | хиляди | p |
Number is pertinent to ordinals and some cardinals.
10.2. Language-specific Features (Level 2b)
10.2.1. Definiteness
Attribute | Value | Bg. example | Bg. tag |
Definiteness | indefinite | трети | |
| definite | третата | d |
| short def. form | третия | h |
| full def. form | третият | l |
The values of Definiteness apply both to cardinals and ordinals.
10.2.2. Numeral Form
Attribute | Value | Bg. example | Bg. tag |
Numeral Form | absolute cardinal numeral | десет | S |
| male person form | двама | M |
| approximate | десетина | A |
The male person form is used before nouns denoting male humans, or a
group of humans that includes at least one male. The approximate form, as
its name suggests, means “an approximate number of objects”. The value
absolute cardinal numeral is needed to mark the respective wordforms as
opposed to the other numeral forms.
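The choice among the three numeral forms can be sketched as a function of what is being counted. The referent labels `male`, `female` and `thing` and the `approximate` flag are hypothetical inputs; the sketch also does not check whether a given cardinal actually has a male person form, which holds only for some numerals.

```python
HUMAN = {"male", "female"}


def numeral_form(referents, approximate=False):
    """Return the Numeral Form tag: A (approximate), M (male person) or S.

    referents: hypothetical labels for the counted objects,
    each one of "male", "female" or "thing".
    """
    if approximate:
        return "A"  # десетина
    all_human = bool(referents) and all(r in HUMAN for r in referents)
    if all_human and "male" in referents:
        return "M"  # двама
    return "S"      # десет
```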
11. Interjection (INTJ)
There are no subcategories for Interjection.
12. Unique membership class
Particles (tagged PC), which are a distinct part-of-speech
category in Bulgarian, should belong to the Unique class of the EAGLES
categorial model. There is no subdivision of Particles in the present set
of Bulgarian morphosyntactic specifications.
13. Residual
This category is not considered in the present Bulgarian tagset.