Bulgarian Phonetic Corpus
BulPhonC Version 2

The Bulgarian Phonetic Corpus BulPhonC contains speech signals annotated automatically at the phoneme level.
The creation of the BulPhonC corpus was supported by the project AComIn: Advanced Computing for Innovation,
funded by the FP7 Capacity Programme, Research Potential of Convergence Regions, Grant Agreement: 316087.

The corpus was compiled at the Linguistic Modelling Department of the Institute of Information and Communication Technologies at the Bulgarian Academy of Sciences by Dimitar Hristov, Ivan Zamanov, Ivana Yovcheva, Marina Kraeva, Nelly Hateva, Petar Mitankin and Stoyan Mihov.

Corpus Description

Language: Bulgarian
Year: 2015
Speakers: 99 Bulgarian speakers (44 male, 55 female); average speaker age: 34 years
Recording environment: studio
Microphone: Sennheiser MK 4
Sampling rate: 16 kHz
Number of bits per sample: 16
Sample type: single-channel (mono) PCM
Number of utterances: 16215
Number of sentences: the corpus contains 319 phonetically rich sentences divided into two parts.
Part 1 contains 148 sentences and Part 2 contains the remaining 171 sentences.
Most of the speakers read only Part 1.
Phonetic annotation: each utterance has a corresponding phoneme-level annotation in a format supported by Praat.
The recorded signals were automatically segmented into utterances.
All automatically segmented utterances were verified manually, and the incorrectly segmented ones were removed from the corpus.
The remaining utterances were automatically annotated at the phoneme level.
Phonetic system: the phonetic system consists of 30 phonemes.
Size of the corpus: 2.4 GB
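
The audio parameters listed above (16 kHz, 16 bits per sample, one channel) can be checked programmatically before a recording is used. The following is a minimal sketch only: it assumes the utterances are stored as standard WAV files, which this description does not state, and the function names and the file name in the usage note are hypothetical.

```python
import wave

# Expected format, taken from the corpus description.
EXPECTED_RATE_HZ = 16000          # Sampling rate: 16 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # Number of bits per sample: 16
EXPECTED_CHANNELS = 1             # Sample type: single-channel PCM


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the documented format.

    Assumption: the utterance is a standard WAV file readable by the
    standard-library `wave` module.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sampling rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

For example, check_wav_format("utterance.wav") returns an empty list when the (hypothetical) file matches the documented format, and one message per mismatching parameter otherwise.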

Samples

Sentence
Utterance
Phonetic annotation

Contents of the BulPhonC Corpus Version 2

For more information, please contact BulPhonC at lml dot bas dot bg.
