LML .    CFRL
 
 

                    RUSSIAN-BULGARIAN JOINT VENTURE

 

            Goals

The goal of the joint Russian-Bulgarian venture is    
    * INTEX - a system for lexicon and corpora processing developed at LADL (University Paris VII) to which, 
    within the framework of COPERNICUS'94 JRP #790 BILEDITA, a new component (Bulgarian) was added 
    alongside the 5 existing components (French, English, German, Italian, Spanish). 
 

 
 

               Sharing Linguistic  Knowledge

   

                               From CFRL tagset to INTEX tagset format

The CFRL Russian tagset, which directly mirrors the notation of A. Zaliznjak's Grammatical Dictionary of Russian, underwent some changes. They consist of:

                    a) removing tags from the tagset;

                    b) changing the tag hierarchy.

Removing tags. Tags denoting syntactic operations were removed, since they refer to generation rather than analysis (Phrasal element, Analytic form, Postposition).

Changing the hierarchy. Following the INTEX format, three types of features (2 lexical and 1 grammatical) are distinguished: the lexical identifier (the Part of Speech), the lexical attribute (a characteristic of the lexeme, i.e. of the whole paradigm), and the grammatical feature (a characteristic of a particular paradigm member). In the Table these are labelled LEXID, LEXATT and GRAMM.
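The three feature types can be pictured as fields of a single record. The sketch below is only an illustration: the class name, the helper function and the tag codes ("N", "m", "inan", "nom", "gen", "sg") are hypothetical and are not taken from the actual INTEX-LML tagset. It shows that LEXID and LEXATT stay the same for every member of a paradigm, while GRAMM varies from form to form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedForm:
    """One paradigm member described with the three feature types."""
    form: str
    lemma: str
    lexid: str          # lexical identifier: the Part of Speech
    lexatt: tuple = ()  # lexical attributes: hold for the whole paradigm
    gramm: tuple = ()   # grammatical features: hold for this form only


def paradigm_is_consistent(forms):
    """Check that LEXID and LEXATT are identical for all given forms."""
    return len({(f.lemma, f.lexid, f.lexatt) for f in forms}) <= 1


# Two members of the paradigm of the Russian noun "стол" ('table').
# All tag codes here are illustrative placeholders.
forms = [
    TaggedForm("стол", "стол", "N", ("m", "inan"), ("nom", "sg")),
    TaggedForm("стола", "стол", "N", ("m", "inan"), ("gen", "sg")),
]

print(paradigm_is_consistent(forms))
```

Keeping LEXATT at the level of the lexeme means that such attributes need to be stated only once per dictionary entry, while GRAMM has to be listed for each inflected form.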

To follow the INTEX hierarchy, some structural changes were made in cases such as 8-10, 4-7, and 12, 18:

    Main node ----> Subordinated to a new main node (4-7)

    Main node ----> Subordinated to an existing main node (12, 18)
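The two kinds of structural change can be sketched on a small child-to-parent table. This is a minimal illustration under stated assumptions: the node names "A", "B", "C", "D" and "X" and both function names are hypothetical and do not correspond to actual CFRL or INTEX tags.

```python
def subordinate(parents, node, new_parent):
    """Return a copy of the hierarchy with `node` placed under `new_parent`.

    `parents` maps each node to its parent; main nodes map to None.
    If `new_parent` is not yet in the hierarchy, it is created as a new
    main node.
    """
    updated = dict(parents)
    updated.setdefault(new_parent, None)
    updated[node] = new_parent
    return updated


def main_nodes(parents):
    """List the nodes that have no parent, in sorted order."""
    return sorted(n for n, p in parents.items() if p is None)


# Four hypothetical main nodes before the change.
hierarchy = {"A": None, "B": None, "C": None, "D": None}

# Main node ----> subordinated to a new main node ("X" does not exist yet).
hierarchy = subordinate(hierarchy, "A", "X")
hierarchy = subordinate(hierarchy, "B", "X")

# Main node ----> subordinated to an existing main node ("D").
hierarchy = subordinate(hierarchy, "C", "D")

print(main_nodes(hierarchy))
```

Returning a copy instead of modifying the table in place keeps the original hierarchy available for comparison with the converted one.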
 
 
 
The full mapping of CFRL lexicon features to the INTEX-LML tagset is given in Tables.

A sample of analyzed Russian text is given in Results.