The MC Speech Dataset

This is a public domain speech dataset consisting of 24,018 short audio clips of a single speaker reading sentences in Polish. A transcription is provided for each clip. The clips total more than 22 hours of audio.
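As a rough sanity check, the two figures above imply an average clip length of a little over three seconds. The sketch below only does that arithmetic; the function name is illustrative, and the 22-hour value is a lower bound ("more than 22 hours"), so the result is a lower bound as well.

```python
# Figures taken from the dataset description above.
NUM_CLIPS = 24018
MIN_TOTAL_HOURS = 22  # stated as "more than 22 hours", so a lower bound


def min_average_clip_seconds(num_clips: int, total_hours: float) -> float:
    """Average clip length in seconds implied by the clip count and total duration."""
    return total_hours * 3600 / num_clips


avg = min_average_clip_seconds(NUM_CLIPS, MIN_TOTAL_HOURS)
print(f"Average clip length is at least {avg:.2f} s")
```

Short clips of this size are typical of corpora built from single read sentences.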

The texts are in the public domain. The audio was recorded in 2021-22 as part of my master's thesis and is also in the public domain.

The dataset is available at:

If you use this dataset, please cite:

@mastersthesis{mcspeech,
  title={Analiza porównawcza korpusów nagrań mowy dla celów syntezy mowy w języku polskim},
  author={Czyżnikiewicz, Mateusz},
  year={2022},
  month={December},
  school={Warsaw University of Technology},
  type={Master's thesis},
  doi={10.13140/RG.2.2.26293.24800},
  note={Available at \url{http://dx.doi.org/10.13140/RG.2.2.26293.24800}},
}

Also, if you find this resource helpful, kindly consider leaving a ⭐.