Bulgarian Phonetic Corpus
BulPhonC Version 2
The Bulgarian Phonetic Corpus (BulPhonC) contains speech signals that were automatically annotated at the phoneme level.
The creation of the BulPhonC corpus has been supported by
the project AComIn: Advanced Computing for Innovation
funded by the FP7 Capacity Programme, Research Potential of Convergence Regions, Grant Agreement: 316087.
The corpus has been compiled at the Linguistic Modelling Department of
the Institute of Information and Communication Technologies at the
Bulgarian Academy of Sciences by
Dimitar Hristov,
Ivan Zamanov,
Ivana Yovcheva,
Marina Kraeva,
Nelly Hateva,
Petar Mitankin and
Stoyan Mihov.
Corpus Description
- Language: Bulgarian
- Year: 2015
- Speakers: 99 Bulgarian speakers (44 male, 55 female); average speaker age: 34 years
- Recording environment: studio
- Microphone: Sennheiser MK 4
- Sampling rate: 16 kHz
- Number of bits per sample: 16
- Sample type: single-channel (mono) PCM
- Number of utterances: 16215
- Number of sentences: the corpus contains 319 phonetically rich sentences divided into two parts.
Part 1 contains 148 sentences and Part 2 contains the remaining 171 sentences.
Most of the speakers have read only Part 1.
- Phonetic annotation: each utterance has a corresponding phoneme-level annotation in a format
supported by Praat.
The recorded signals were automatically segmented into utterances.
All automatically segmented utterances were manually verified, and the incorrectly segmented ones were removed from the corpus.
The remaining utterances were automatically annotated at the phoneme level.
- Phonetic system: the phonetic system consists of 30 phonemes.
- Size of the corpus: 2.4 GB
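The audio format listed above (16 kHz, 16 bits per sample, one channel) can be checked per file. The following is a minimal sketch in Python, assuming the utterances are stored as WAV files; the description only states the sample type as PCM, so the container format is an assumption, and the function name `check_utterance` is hypothetical.

```python
import wave

# Audio format as stated in the corpus description.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1            # single channel

def check_utterance(path):
    """Return (matches_expected_format, duration_in_seconds) for one WAV file.

    Assumption: the utterance is stored in a WAV container.
    """
    with wave.open(path, "rb") as wav:
        ok = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        # Duration is the number of frames divided by frames per second.
        duration = wav.getnframes() / wav.getframerate()
    return ok, duration
```

Summing the returned durations over all utterances would give the total amount of recorded speech, which the description does not state.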
Samples
Sentence
Utterance
Phonetic annotation
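The corpus description says only that the annotations are in a format supported by Praat. Assuming they are Praat TextGrid files in the long text format (an assumption, not confirmed here), the phoneme intervals could be read with a sketch like the one below. The function name `read_intervals` is hypothetical, and the caller is expected to have decoded the file to text already, since the file encoding is also not stated.

```python
import re

# Matches one interval entry in Praat's long TextGrid text format:
#     intervals [1]:
#         xmin = 0
#         xmax = 0.12
#         text = "a"
# Praat doubles a quote character inside a label, hence the (?:[^"]|"")* group.
_INTERVAL = re.compile(
    r'intervals\s*\[\d+\]:?\s*'
    r'xmin\s*=\s*([-+0-9.eE]+)\s*'
    r'xmax\s*=\s*([-+0-9.eE]+)\s*'
    r'text\s*=\s*"((?:[^"]|"")*)"'
)

def read_intervals(textgrid_text):
    """Return (start_s, end_s, label) tuples for every interval found.

    Assumption: the input is the text of a long-format TextGrid file.
    Intervals from all tiers are returned in file order.
    """
    return [
        (float(start), float(end), label.replace('""', '"'))
        for start, end, label in _INTERVAL.findall(textgrid_text)
    ]
```

Each tuple gives the start time, end time, and label of one segment, so phoneme durations follow as `end_s - start_s`.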
For more information, please contact BulPhonC at lml dot bas dot bg.