tharavukkanam

A collection of datasets.

TODO

Collect direct sources for corpora

  • Set up downloads for books and other public works, using Wikisource
  • Set up a periodical scraper for blogs and news (can use the existing crawler)

Dictionaries

  • Scrape every dictionary
    • Winslow
    • Fabricius
  • Develop a quasi-schema to merge the different dictionaries into one
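
One possible shape for the merged quasi-schema is sketched below. It is only an illustration, not a settled design: the `Entry` and `Sense` classes, the `merge_dictionaries` function, and the assumed input layout (source name → headword → list of definitions) are all hypothetical, and the sample data is placeholder text rather than real dictionary content. Each headword keeps every definition together with the dictionary it came from, so no source is lost in the merge.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    source: str      # hypothetical source tag, e.g. "winslow" or "fabricius"
    definition: str  # definition text as scraped from that source


@dataclass
class Entry:
    headword: str
    senses: list[Sense] = field(default_factory=list)


def merge_dictionaries(sources: dict[str, dict[str, list[str]]]) -> dict[str, Entry]:
    """Merge per-source dictionaries into one mapping keyed by headword.

    `sources` maps a source name to {headword: [definitions]}.
    Headwords are matched after stripping surrounding whitespace only;
    real data would likely need stronger normalisation.
    """
    merged: dict[str, Entry] = {}
    for source_name, entries in sources.items():
        for headword, definitions in entries.items():
            key = headword.strip()
            entry = merged.setdefault(key, Entry(headword=key))
            for definition in definitions:
                entry.senses.append(Sense(source=source_name, definition=definition))
    return merged


if __name__ == "__main__":
    # Placeholder data only; not taken from either dictionary.
    sample = {
        "winslow": {"word1": ["definition A"]},
        "fabricius": {"word1": ["definition B"], "word2": ["definition C"]},
    }
    for headword, entry in merge_dictionaries(sample).items():
        print(headword, [(s.source, s.definition) for s in entry.senses])
```

Keeping the source tag on each sense, rather than collapsing definitions into one string, leaves room to add per-source fields later (part of speech, page reference) without changing the merge step.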