A Comparative Representation of Two Bulgarian Morphosyntactic Tagsets and the EAGLES Encoding Standard

  Milena Slavcheva


The tables below compare a cross section of the morphosyntactic attributes and values included in two tagsets for Bulgarian with the EAGLES minimal common core set of morphosyntactic distinctions.


A full application to Bulgarian of the EAGLES proposal for morphosyntactic encoding is here



The first and second columns contain the attributes and their values that are relevant for Bulgarian. The column headed LML lists the morphosyntactic specifications in the Bulgarian Morphological Lexicon of 60,000 lexemes, created at the Linguistic Modelling Laboratory (LML). The MULTEXT-East column reproduces the Bulgarian tagset for corpus annotation given in the Final Report of the MULTEXT-East project:

Specifications and Notation for Lexicon Encoding. COP Project 106 Multext-East, Work Package WP1 - Task 1.1, Deliverable D 1.1 F, Final Report, 29 August, 1997.

The LML and the MULTEXT-East tagsets are mapped onto the set of morphosyntactic specifications constituting Level 0 and Level 1 of the EAGLES proposal for morphosyntactic encoding as presented in:

Synopsis and Comparison of Morphosyntactic Phenomena Encoded in Lexicon and Corpora. A Common Proposal and Applications to European Languages. EAG-CLWG-MORPHSYN/R, Version of 31 Aug., 1996.

Level 0 of the EAGLES proposal contains only the part-of-speech categories. Level 1 is the set of recommended features constituting the minimal common core, that is, the features usually encoded in lexicons and corpora. An "x" in the EAGLES column of the tables below indicates that a feature present in the LML and MULTEXT-East sets of morphosyntactic specifications (where it is also marked with "x") has an equivalent in the EAGLES minimal common core set of features, i.e., Level 0 and Level 1.
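The marking convention described above can be sketched as a small data structure. The following Python fragment is only an illustration: the class `Row`, the helpers `mark`, `render` and `in_common_core`, and the three rows with their attribute and value names are hypothetical placeholders and are not transcribed from the actual tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One line of a comparative table (hypothetical layout)."""
    attribute: str
    value: str
    lml: bool           # feature is present in the LML specifications
    multext_east: bool  # feature is present in the MULTEXT-East tagset
    eagles: bool        # feature has an equivalent in EAGLES Level 0 or Level 1


# Placeholder rows for illustration only; NOT taken from the real tables.
ROWS = [
    Row("attr_a", "value_1", lml=True, multext_east=True, eagles=True),
    Row("attr_a", "value_2", lml=True, multext_east=False, eagles=False),
    Row("attr_b", "value_3", lml=False, multext_east=True, eagles=True),
]


def mark(flag: bool) -> str:
    """Return the table mark: 'x' if the feature is present, else empty."""
    return "x" if flag else ""


def render(rows) -> str:
    """Lay the rows out as a plain-text table with one column per tagset."""
    lines = [
        f"{'Attribute':<10} {'Value':<10} {'LML':<4} "
        f"{'MULTEXT-East':<13} {'EAGLES':<6}"
    ]
    for r in rows:
        lines.append(
            f"{r.attribute:<10} {r.value:<10} {mark(r.lml):<4} "
            f"{mark(r.multext_east):<13} {mark(r.eagles):<6}"
        )
    return "\n".join(lines)


def in_common_core(rows, tagset: str):
    """Rows present in the given tagset ('lml' or 'multext_east')
    that also have an equivalent in the EAGLES common core."""
    return [r for r in rows if getattr(r, tagset) and r.eagles]


print(render(ROWS))
```

Under these assumptions, `in_common_core(ROWS, "lml")` selects the LML features that have an EAGLES equivalent, which is the information the "x" in the EAGLES column conveys for each row.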

2. Tables

The comparative tables are given for each part-of-speech category.