This repository contains research into podcasts using natural language processing (NLP).
Podcast data is vast and growing daily. Several data sources are available for podcast research, chiefly the audio files themselves, transcripts of the audio, podcast descriptions, and other metadata obtained from a podcast's RSS feed.
The first phase of this research deals with textual data: the descriptions of each podcast and its episodes, obtained from RSS feeds. Named entities are extracted from these descriptions and attached to the resulting podcast file.
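As a rough illustration of the first step, the sketch below pulls the podcast-level and episode-level descriptions out of an RSS document using only the XML parser bundled with the JDK. The class name `FeedDescriptions` and the methods `extractDescriptions` and `childText` are hypothetical names chosen for this example, not code from this repository, and the feed is assumed to follow the usual RSS 2.0 layout (`channel` with nested `item` elements). The named entity extraction itself is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; assumes an RSS 2.0 style feed.
class FeedDescriptions {

    // Returns the trimmed text of the first direct child element named `tag`,
    // or an empty string if the parent has no such child.
    static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    // Returns the channel description first, followed by one entry per <item>
    // in document order. Returns an empty list if there is no <channel>.
    static List<String> extractDescriptions(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted feeds cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

            List<String> descriptions = new ArrayList<>();
            NodeList channels = doc.getElementsByTagName("channel");
            if (channels.getLength() == 0) {
                return descriptions;
            }
            Element channel = (Element) channels.item(0);
            descriptions.add(childText(channel, "description"));

            NodeList items = channel.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                descriptions.add(childText((Element) items.item(i), "description"));
            }
            return descriptions;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }
}
```

Looking only at direct children in `childText` keeps an episode's `description` from being mistaken for the podcast's own. The returned strings could then be passed to whichever named entity recognizer the project uses.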
- Downloading the latest version from git
- JSON-Java, for parsing JSON.