Echo-of-Moscow scraper

The aim of this project is to prepare a data corpus for neural network training. It collects (audio, transcript) pairs from https://echo.msk.ru/

Description

This repository contains two files:

  • urls.txt contains the list of URLs from which texts are collected
  • extract_data_1.py contains functions for downloading and parsing texts and audio from these URLs using BeautifulSoup
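The workflow described above (read URLs from urls.txt, download each page, parse it with BeautifulSoup into an (audio, transcript) pair) can be sketched as follows. This is a minimal illustration, not the code in extract_data_1.py: the names Episode, read_urls, fetch_html and parse_episode are hypothetical, and the page structure is assumed (the first link ending in ".mp3" is taken as the audio, and the text of all <p> tags as the transcript). The real echo.msk.ru markup likely needs different selectors.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


@dataclass
class Episode:
    """One (audio, transcript) pair scraped from a single page (hypothetical type)."""
    page_url: str
    audio_url: Optional[str]
    transcript: str


def read_urls(path: str = "urls.txt") -> List[str]:
    """Read one URL per line, stripping whitespace and skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page, decoding with the server's charset (UTF-8 if none is given)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_episode(html: str, page_url: str) -> Episode:
    """Extract an audio link and transcript text from one page.

    Assumed (hypothetical) selectors: the first <a> whose href ends in
    ".mp3" is the audio; the transcript is the text of all <p> tags.
    """
    soup = BeautifulSoup(html, "html.parser")
    audio_url = None
    for link in soup.find_all("a", href=True):
        if link["href"].lower().endswith(".mp3"):
            # Resolve relative links against the page URL.
            audio_url = urljoin(page_url, link["href"])
            break
    # Collect non-empty paragraph texts, one paragraph per line.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    transcript = "\n".join(text for text in paragraphs if text)
    return Episode(page_url, audio_url, transcript)
```

A typical loop would call parse_episode(fetch_html(url), url) for each url in read_urls() and keep only episodes whose audio_url is not None. Keeping parsing separate from downloading means the parser can be tested on saved HTML without network access.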