Project Objectives

The main objective of this project is to develop and extend methods, resources and software systems for improving the OCR correction of Bulgarian and multilingual (Bulgarian, Russian, English and German) documents.

The high-level achievements of the project are listed below:

Word-context-based OCR correction

  • Further development of the word-context correction method based on Levenshtein automata. This method can be refined in several directions. First, probabilities of symbol-dependent recognition errors can be used to rank the correction candidates more precisely. To implement this option, the concept of Levenshtein automata can be extended to weighted automata, preserving the efficiency of the automaton-based candidate search. Second, the correction candidates can be ordered by word frequency.
  • Extension of the Bulgarian, Russian, German and English electronic dictionaries with the OCR-aiding data that these correction methods require, including word frequencies and recognition-error risk values. The lexical resources will be stored in a format designed for efficient lookup. For the correction of multilingual documents, a very large consolidated Bulgarian-Russian-German-English dictionary will be constructed.
  • Test series for estimating the probabilities of symbol-dependent recognition errors for Cyrillic and Latin fonts. These series will provide the font-dependent data needed to build the weighted Levenshtein automata, so that the list of correction candidates can be sorted by recognition-error probability.
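The candidate ranking described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the project's implementation: instead of a weighted Levenshtein automaton, it computes a weighted edit distance by brute-force dynamic programming over every dictionary entry, which gives the same kind of cost but none of the automaton's search efficiency. The confusion table, the default substitution and insertion/deletion probabilities, the `max_cost` threshold and all function names are hypothetical placeholders for the font-dependent data the test series would provide. Probabilities are turned into additive costs as negative log-probabilities, and ties in cost are broken by word frequency.

```python
import math
from typing import Dict, List

# Hypothetical confusion probabilities P(OCR reads `seen` | true symbol `true`),
# standing in for font-dependent test-series data. Values are illustrative only.
CONFUSION: Dict[tuple, float] = {
    ("c", "e"): 0.05,  # true 'e' misread as 'c'
    ("l", "i"): 0.04,  # true 'i' misread as 'l'
    ("0", "o"): 0.03,  # true 'o' misread as '0'
}
DEFAULT_SUB_PROB = 0.001  # assumed probability of any other substitution
INDEL_PROB = 0.002        # assumed probability of an inserted or dropped symbol


def cost(prob: float) -> float:
    """Turn a probability into an additive cost (negative log-probability)."""
    return -math.log(prob)


def sub_cost(seen: str, true: str) -> float:
    """Cost of the OCR engine reading `true` as `seen`."""
    if seen == true:
        return 0.0
    return cost(CONFUSION.get((seen, true), DEFAULT_SUB_PROB))


def weighted_distance(ocr_word: str, candidate: str) -> float:
    """Weighted Levenshtein distance between the OCR output and a dictionary word."""
    n, m = len(ocr_word), len(candidate)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(INDEL_PROB)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(INDEL_PROB)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(INDEL_PROB),  # extra symbol in the OCR output
                d[i][j - 1] + cost(INDEL_PROB),  # symbol dropped by the OCR engine
                d[i - 1][j - 1] + sub_cost(ocr_word[i - 1], candidate[j - 1]),
            )
    return d[n][m]


def rank_candidates(ocr_word: str, lexicon: Dict[str, int],
                    max_cost: float = 15.0) -> List[str]:
    """Candidates within `max_cost`, sorted by cost, then by descending frequency."""
    scored = []
    for word, freq in lexicon.items():
        c = weighted_distance(ocr_word, word)
        if c <= max_cost:
            scored.append((c, -freq, word))
    scored.sort()
    return [word for _, _, word in scored]
```

For example, with the toy lexicon `{"test": 120, "text": 300, "tent": 40}`, the OCR output `tcst` ranks `test` first (a single likely c/e confusion), while `text` and `tent` have equal cost and are ordered by frequency.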

Sentence-context-based OCR correction

  • Analysis of large corpora to extract a word collocation table for Bulgarian, to be used for OCR correction based on word collocation techniques.
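A minimal sketch of how a collocation table could be built from a tokenized corpus and used to choose between correction candidates. Here the "table" is simply a set of add-one-smoothed bigram probabilities over adjacent words; a real extraction from large corpora would more likely rely on association measures (for example pointwise mutual information) and frequency cutoffs. The function names and the smoothing scheme are illustrative choices, not taken from the project.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def build_collocation_table(sentences: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and adjacent word pairs (bigrams) in a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_logprob(left: str, right: str, unigrams: Counter, bigrams: Counter) -> float:
    """Add-one-smoothed log P(right | left), using the unigram count of `left`
    as the denominator and the observed vocabulary size as the smoothing mass."""
    vocab = len(unigrams) or 1
    return math.log((bigrams[(left, right)] + 1) / (unigrams[left] + vocab))


def pick_by_context(candidates: List[str], left: str, right: str,
                    unigrams: Counter, bigrams: Counter) -> str:
    """Choose the candidate that best fits between its left and right neighbours."""
    def score(word: str) -> float:
        return (bigram_logprob(left, word, unigrams, bigrams)
                + bigram_logprob(word, right, unigrams, bigrams))
    return max(candidates, key=score)
```

On a toy corpus containing "they came from the north", "fill in the form" and "she came from the city", `pick_by_context(["form", "from"], "came", "the", ...)` prefers `from`, because "came from" and "from the" occur in the corpus while "came form" and "form the" do not.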

Implementation

  • Implementation of a robust and highly efficient correction system based on the Levenshtein automata framework and sentence-context correction. We plan to implement our approach in order to test it and compare it against traditional methods. The implementation will also demonstrate the results of the project to potential industrial users.
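One possible way to combine the word-level and sentence-level steps, sketched under our own assumptions: the word-level step yields, for each token, candidates with a cost (such as a weighted edit distance), and a Viterbi search picks the candidate sequence that minimizes the total word cost minus a weighted sum of bigram log-probabilities. The lattice format, the `context_logprob` callback and the `weight` parameter are hypothetical; the project description does not specify how the two scores are combined.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (word, word-level cost, e.g. weighted edit distance)


def correct_sentence(
    lattice: List[List[Candidate]],
    context_logprob: Callable[[str, str], float],
    weight: float = 1.0,
) -> List[str]:
    """Viterbi search over a non-empty candidate lattice, minimising
    sum(word_cost) - weight * sum(log P(word_i | word_{i-1}))."""
    # best[w] = (cost of the cheapest path ending in w, that path)
    best: Dict[str, Tuple[float, List[str]]] = {w: (c, [w]) for w, c in lattice[0]}
    for position in lattice[1:]:
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for word, word_cost in position:
            # Extend every surviving path by `word` and keep the cheapest extension.
            new_best[word] = min(
                (
                    (prev_cost + word_cost - weight * context_logprob(prev, word),
                     path + [word])
                    for prev, (prev_cost, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = new_best
    return min(best.values(), key=lambda item: item[0])[1]
```

With `weight=0.0` the word-level costs alone decide; raising it lets the sentence context override a cheaper but contextually implausible candidate, for instance preferring "came from the" over "came form the" even when `form` has the lower word-level cost.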