languagepod101-scraper is a resource for dozens of language learning courses and study materials for FREE.
languagepod101-scraper helps you download full language courses and save them to a local directory. The courses are produced and distributed by Innovative Language, which offers courses in dozens of languages. Each lesson is usually 10-20 minutes long.
To get started, choose one of the language courses offered by Innovative Language and create a free account.
To use the script, meet the requirements and follow the example below.
- Download and install Python 3.9+.
- Install the required packages from the requirements.txt file using pip:
  pip install -r requirements.txt
As an example, the steps below download a course from Japanese Pod 101.
Japanese Pod 101 and all the other sites share a similar structure, which looks like this:
Japanesepod101
├─ Level 1 - Absolute Beginner
│ ├─ Newbie Season 1
│ │ ├─ lesson 01
│ │ ├─ lesson 02
│ │ ├─ lesson 03
│ │ ├─ ...
│ ├─ Newbie Season 2
│ ├─ ...
├─ Level 2 - Beginner
│ ├─ Lower Beginner Season 1
│ │ ├─ lesson 01
│ │ ├─ lesson 02
│ │ ├─ lesson 03
│ │ ├─ ...
│ ├─ ...
├─ Level 3 - Intermediate
│ ├─ ...
│ │ ├─ ...
│ │ ├─ ...
│ ├─ ...
│ ├─ ...
├─ Level 4 - Upper Intermediate
│ ├─ ...
├─ Level 5 - Advanced
│ ├─ ...
- To download Lower Beginner Season 1, use your web browser to navigate to lesson 01 of this course (any other lesson URL from the same course works too). The navigation looks like this: Japanesepod101 → Level 2 - Beginner → Lower Beginner Season 1 → lesson 01. Save the URL for lesson 01 from the address bar, as you will have to provide it to the script later on.
- Create a directory on your PC for this course and change into it.
- Run the language101_scraper.py script and follow the instructions. You will have to provide:
  - the email you used to sign up for the course
  - your password for the course
  - the lesson URL you saved earlier (in our example, lesson 01 of the Lower Beginner Season 1 course).
- Alternatively, you can pass the data as parameters when invoking the script:
  ./language101_scraper.py -u $USERNAME -p $PASSWORD --url YOUR_LESSON_URL
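The two ways of supplying the data (command-line parameters, or interactive prompts for anything missing) could be combined roughly as follows. This is only a sketch and not the script's actual code: the `-u`, `-p` and `--url` options come from the command above, while the function names `parse_args` and `resolve_credentials` and the prompt texts are hypothetical.

```python
import argparse
import getpass


def parse_args(argv=None):
    """Parse the -u, -p and --url options; every one of them is optional."""
    parser = argparse.ArgumentParser(description="Download the media of a course.")
    parser.add_argument("-u", dest="username", help="email used to sign up")
    parser.add_argument("-p", dest="password", help="account password")
    parser.add_argument("--url", help="URL of a lesson in the course")
    return parser.parse_args(argv)


def resolve_credentials(args, ask=input, ask_secret=getpass.getpass):
    """Prompt for any value that was not given on the command line."""
    # The prompt texts are assumptions, not the script's real wording.
    username = args.username or ask("Email: ")
    password = args.password or ask_secret("Password: ")
    url = args.url or ask("Lesson URL: ")
    return username, password, url
```

Calling `resolve_credentials(parse_args())` would then ask only for the values that were left out. The password prompt uses `getpass` so that the password is not echoed to the terminal.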
- The script will start downloading the MP3/MP4/M4V files into the current directory. Any errors are printed out.
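As an illustration of how media links can be told apart from other links by their extension, here is a small sketch. It assumes that media files can be recognised by the `.mp3`, `.mp4` or `.m4v` suffix of the URL path; the name `is_media_url` is hypothetical and the script itself may detect files differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The three file types mentioned above.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".m4v"}


def is_media_url(url):
    """Return True if the URL path ends with one of the media extensions."""
    # urlparse separates the path from the query string, so "?x=1" is ignored.
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower() in MEDIA_EXTENSIONS
```

For example, `is_media_url("https://example.com/audio/lesson01.mp3")` is true, while a link ending in `.pdf` is rejected.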
- The output inside the folder should look like this:
├─01 - A Formal Japanese Introduction - JapanesePod101 - Dialogue.mp3
├─01 - A Formal Japanese Introduction - JapanesePod101 - Review.mp3
├─01 - A Formal Japanese Introduction - JapanesePod101 - Main Lesson.mp3
├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Dialogue.mp3
├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Main Lesson.mp3
├─02 - Which Famous Tokyo Tower is That - JapanesePod101 - Review.mp3
├─03 - Networking in Japan - JapanesePod101 - Dialogue.mp3
├─03 - Networking in Japan - JapanesePod101 - Main Lesson.mp3
├─03 - Networking in Japan - JapanesePod101 - Review.mp3
├─...
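The naming pattern visible in this listing (lesson number, title, site name, track type) can be reproduced with a few lines of Python. This is a sketch inferred from the output above rather than the script's own code. The function name `lesson_filename` and the rule of dropping characters that are invalid in file names are assumptions.

```python
import re


def lesson_filename(number, title, site, kind, extension="mp3"):
    """Build a name such as '01 - <title> - <site> - <kind>.mp3'."""
    # Assumption: characters that are not allowed in file names on common
    # file systems are simply removed from the title.
    safe_title = re.sub(r'[\\/:*?"<>|]', "", title).strip()
    return f"{number:02d} - {safe_title} - {site} - {kind}.{extension}"


print(lesson_filename(1, "A Formal Japanese Introduction", "JapanesePod101", "Dialogue"))
# prints: 01 - A Formal Japanese Introduction - JapanesePod101 - Dialogue.mp3
```

Zero-padding the lesson number keeps the files sorted in lesson order in most file managers.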
- Any use of the script is the user's responsibility only. Users of the script must act according to the site's terms.
- As of today, Innovative Language's terms of use do not forbid the use of crawlers or scrapers on any of their sites. This may change in the future, so be aware.
- If you like the services Innovative Language provides, you should consider a monthly subscription. Basic programs start at around $5 per month and include support from native speaker teachers.
- As with all websites, the site's structure may change in the future and thus, as often happens with scraping scripts, break it. It is not really a question of if the site's source code will change but rather when (so enjoy it while it's still working 😁).
All of the content presented on the websites belongs to the original creators (Innovative Language) and I have nothing to do with it.
The license below refers only to the script and not to the downloaded content.
- 23.03.2022: Added support for basic video downloading (nothing fancy, just m4v and mp4 files). Added error handling for when a lesson library/lesson contents URL is used instead of the first lesson (user is now warned).
- 11.05.2021: Headers and waiting time added, script is alive again.