The Bulgarian Phonetic Corpus (BulPhonC) contains speech signals that were automatically annotated at the phoneme level. The creation of the BulPhonC corpus was supported by the project AComIn: Advanced Computing for Innovation, funded by the FP7 Capacity Programme, Research Potential of Convergence Regions, Grant Agreement: 316087.
|Authors:||Dimitar Hristov, Ivan Zamanov, Ivana Yovcheva, Marina Kraeva, Nelly Hateva, Petar Mitankin and Stoyan Mihov|
|Speakers:||140 Bulgarian speakers (59 male, 81 female); average speaker age: 37 years|
|Microphone:||Sennheiser MK 4|
|Sampling rate:||16 kHz|
|Number of bits per sample:||16|
|Sample type:||Single-channel (mono) PCM|
|Number of utterances:||21891|
|Number of sentences:||The corpus contains 319 phonetically rich sentences divided into two parts. Part 1 contains 148 sentences and Part 2 contains the remaining 171 sentences. Most of the speakers have read only Part 1.|
|Phonetic annotation:||Each utterance has a corresponding phoneme-level annotation in a format supported by Praat. The recorded signals were automatically segmented into utterances. All automatically segmented utterances were manually verified, and incorrectly segmented utterances were removed from the corpus. The remaining utterances were then automatically annotated at the phoneme level.|
|Phonetic system:||The phonetic system consists of 30 phonemes.|
|Size of the corpus:||2.7 GB (tar.gz archive)|
|Duration:||~ 40 hours|
|Citation:||Hateva, N., Mitankin, P., Mihov, S., BulPhonC: Bulgarian Speech Corpus for the Development of ASR Technology. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016, ISBN:978-2-9517408-9-1, pp. 771-774|
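The recording parameters in the table above are enough to derive a few rough figures and to sanity-check a downloaded file. The following is a minimal sketch, not part of the corpus distribution. It assumes the recordings are stored as WAV files, which the description does not state, and the function names are hypothetical. The duration is taken as exactly 40 hours although the table gives only an approximate value.

```python
import wave

# Values stated in the corpus description.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
CHANNELS = 1
NUM_UTTERANCES = 21_891
APPROX_DURATION_HOURS = 40  # the description says "~ 40 hours"


def bytes_per_second() -> int:
    """Uncompressed PCM data rate implied by the stated format."""
    return SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS


def approx_raw_size_gb() -> float:
    """Rough size of the raw samples in GB (10**9 bytes), without headers."""
    return bytes_per_second() * APPROX_DURATION_HOURS * 3600 / 1e9


def approx_mean_utterance_seconds() -> float:
    """Rough average utterance length, from total duration and utterance count."""
    return APPROX_DURATION_HOURS * 3600 / NUM_UTTERANCES


def matches_corpus_format(path: str) -> bool:
    """Check a WAV header against the stated format.

    Assumption: the file at `path` is a WAV file; the corpus description
    does not name the container format.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == SAMPLE_RATE_HZ
            and wav.getsampwidth() * 8 == BITS_PER_SAMPLE
            and wav.getnchannels() == CHANNELS
        )
```

Under these assumptions, the stated format corresponds to 32,000 bytes per second, roughly 4.6 GB of raw samples for 40 hours, and an average utterance length of about 6.6 seconds.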
For more information, please contact BulPhonC at lml dot bas dot bg.