BULSTEM: INFLECTIONAL STEMMER FOR BULGARIAN
Abstract: The paper starts with an overview of some
important approaches to stemming for English and other
languages. Then, the design, implementation and evaluation
of the BulStem inflectional stemmer for Bulgarian
are presented. The problem is addressed from a machine-learning
perspective using a large morphological dictionary.
A detailed automatic evaluation in terms of under-stemming,
over-stemming and coverage is provided. In
addition, the effect of stemming and of BulStem's parameter
settings is demonstrated on a particular task: text categorisation
using kNN+LSA.
Keywords: Stemming, lemmatisation, text categorisation,
k-nearest-neighbour, vector-space model, latent semantic
analysis, information retrieval.