edx-dl not able to download videos from edx platform #559

MATRIX30 · 2019-10-20T03:24:12Z

Subject of the issue

edx-dl fails to extract and download videos for "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/" on www.edx.org.
It seems the videos for this course are sourced from "https://media.ed.ac.uk/" and not from YouTube.
I need help resolving this issue.

Your environment

Operating System (name/version): Windows 10 Professional
Python version: 3.7.0
youtube-dl version: 2019.09.28
edx-dl version: 0.1.10

Steps to reproduce

--- Create an account on edX

--- Enroll in the course "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/"

--- Type the following into CMD:
edx-dl -u username -p password -o path --ignore-errors --cache https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/

Expected behaviour

download to start normally

Actual behaviour

edx_dl version 0.1.10
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Predictive Analytics [course-v1:EdinburghX+PA1.1x+3T2019/co]
Downloading 0 section(s)
loading 2329 urls from cache [edx-dl.cache]
Extracting all units information in parallel.
No downloadable video found.

YukunXia · 2019-10-21T05:24:39Z

Having the same issue :(

mor3dr3ad · 2019-10-21T09:58:34Z

Confirmed with different url:
https://courses.edx.org/courses/course-v1:MITx+14.750x+3T2019/course/

Output of --debug:

root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
root[_get_initial_token] Found CSRF token.
root[edx_get_headers] Headers built: {'User-Agent': 'edX-downloader/0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8', 'Referer': 'https://courses.edx.org/login_ajax', 'X-Requested-With': 'XMLHttpRequest', 'X-CSRFToken': 'PUsSLjqYvxBtMFO07I7RfYRpxPPZdHE0zWBVoJk4aqqo8AOSciOeEoSTr49FvNeH'}
root[edx_login] Logging into Open edX site: https://courses.edx.org/login_ajax
root[get_courses_info] Extracting course information from dashboard.
root[get_courses_info] Data extracted: ["lotsofcourseswhichidontwanttoshare"]
root[get_available_sections] Extracting sections for :https://courses.edx.org/courses/course-v1:MITx+14.750x+3T2019/course/
root[get_available_sections] Extracted sections: []
root[_display_selections] Downloading Political Economy and Economic Development [course-v1:MITx+14.750x+3T2019/co]
root[_display_sections] Downloading 0 section(s)
root[extract_all_units_in_sequence] Extracting all units information in sequentially.
root[extract_all_units_in_sequence] urls: []
root[parse_units] No downloadable video found.

adizukerman · 2019-10-22T22:44:11Z

Same issue with https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/

ozhaggis · 2019-10-24T23:29:20Z

Same issue with multiple courses.

edx_dl version 0.1.10
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Data Science: Machine Learning [course-v1:HarvardX+PH125.8x+2T2019/co]
Downloading 0 section(s)
Extracting all units information in parallel.
No downloadable video found.

EugeneLoy · 2019-10-26T17:26:29Z

Same issue. Course: https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/

It looks like edx-dl is missing most of the sections of the course. In my example, it sees only 1 section, while edx site displays more than 5 (at the moment):

> edx-dl.py -u <username> --list-sections https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/
edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Fundamentals of Statistics [course-v1:MITx+18.6501x+3T2019/co] has 1 sections so far
 1 - Download Entrance Survey videos

not-lucky · 2019-10-27T07:34:53Z

Here's mine...

abeckman · 2019-10-29T15:52:13Z

Same issue with multiple courses.

lubaroli · 2019-10-30T21:42:47Z

Same issue here, edx-dl only sees the first section.

Here's the log:
root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
Password:
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
root[_get_initial_token] Found CSRF token.
root[edx_get_headers] Headers built: {'User-Agent': 'edX-downloader/0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01', 'Content-Type': 'application/x-www-form-urlencoded;charset=utf-8', 'Referer': 'https://courses.edx.org/login_ajax', 'X-Requested-With': 'XMLHttpRequest', 'X-CSRFToken': 'wWr0eKCgnA1uusK8rQvzPJHFK8bXmxn4i1pxyGtnuxsy0MRE8LXYh87mk8DN1eST'}
root[edx_login] Logging into Open edX site: https://courses.edx.org/login_ajax
root[get_courses_info] Extracting course information from dashboard.
root[get_courses_info] Data extracted: [Fundamentals of Statistics: https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/, TOEFL Test Preparation: The Insider’s Guide: https://courses.edx.org/courses/course-v1:ETSx+TOEFLx+3T2017/course/, Minds and Machines: https://courses.edx.org/courses/course-v1:MITx+24.09x+3T2015/course/, Practical Learning Analytics: https://courses.edx.org/courses/course-v1:MichiganX+PLAx+2T2016/course/, Embedded Systems - Shape the World: https://courses.edx.org/courses/course-v1:UTAustinX+UT.6.03x+1T2016/course/, The Science of Everyday Thinking: https://courses.edx.org/courses/course-v1:UQx+Think101x+2T2015/course/, Electronic Interfaces: https://courses.edx.org/courses/course-v1:BerkeleyX+EE40LX+2T2015/course/, Autonomous Navigation for Flying Robots: https://courses.edx.org/courses/TUMx/AUTONAVx/2T2014/course/, Next Generation Infrastructures - Part 2: https://courses.edx.org/courses/DelftX/NGI102x/3T2014/course/, Solar Energy: https://courses.edx.org/courses/DelftX/ET.3034TU/3T2014/course/, Circuits and Electronics: https://courses.edx.org/courses/MITx/6.002_4x/3T2014/course/]
root[get_available_sections] Extracting sections for :https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/
root[get_available_sections] Extracted sections: [<edx_dl.common.Section object at 0x1042f6110>]
root[_display_selections] Downloading Fundamentals of Statistics [course-v1:MITx+18.6501x+3T2019/co]
root[_display_sections] Downloading 1 section(s)
root[_display_sections] Section 1: Entrance Survey
root[_display_sections] 1. Entrance Survey
root[extract_all_units_in_parallel] Extracting all units information in parallel.
root[extract_all_units_in_parallel] urls: ['https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/jump_to/block-v1:MITx+18.6501x+3T2019+type@vertical+block@entrancesurvey-tab1']
root[extract_units] Processing 'https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/jump_to/block-v1:MITx+18.6501x+3T2019+type@vertical+block@entrancesurvey-tab1'
root[main] Removed 0 duplicated urls from 0 in total
root[download] Output directory: Downloaded

wzhuwz · 2019-11-01T11:07:05Z

Looks like edx-dl is missing most of the sections of the course. My case https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/course/.

Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading FA18: Deterministic Optimization [course-v1:GTx+ISYE6669+2T2018/co]
Downloading 5 section(s)
Section 1: Getting Started
Welcome Message
Syllabus
Getting Help
Getting to Know Each Other
Section 2: Discussions and Q&A
Discussions and Q&A Forums
Section 3: Proctoring Information - Verified Learners
Section 4: Midterm Exam - Verified Learners
Section 5: Final Exam - Verified Learners
Extracting all units information in parallel.
Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@b4e0e428596e4a438b61d9c44a66ff45'
Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@6e0eef9f7a9b4eed99ea9c1ad8e37b16'
Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@d827bed0374e46b5a0abe62978b7cca8'
Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@3247cb48d14b4f1e97bb9dd74d1ec8a2'
Processing 'https://courses.edx.org/courses/course-v1:GTx+ISYE6669+2T2018/jump_to/block-v1:GTx+ISYE6669+2T2018+type@vertical+block@c49832c367cc47be96ba15a3ce5e9d8c'
Removed 0 duplicated urls from 0 in total
Output directory: Downloaded

dorianherle · 2019-11-02T12:02:34Z

I have the same issue:

edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Discrete Choice Models [course-v1:EPFLx+DiscreteChoiceX+3T2017/co]
Downloading 0 section(s)
Extracting all units information in parallel.
No downloadable video found.

mor3dr3ad · 2019-11-04T11:52:00Z

So, I've dug into the code a bit and I think I found the issue: for some courses, edX has again updated the structure of their website. The issue is with line 397 in /edx_dl/parsing.py:

    sections_soup = soup.find_all('li', class_='outline-item section')

In the new format, the sections have a different class, namely "outline-item section scored".

This should be easy to fix. I will try to hack something together, but it had better be checked by someone experienced.

mor3dr3ad · 2019-11-04T14:57:18Z

Alright, quick fix:

Replace as follows in /edx_dl/parsing.py:

Line 385:

    subsections_soup = section_soup.find_all('li', class_='vertical outline-item focusable')

with

    subsections_soup = section_soup.find_all('li', class_=['vertical outline-item focusable', 'vertical outline-item focusable scored'])

and line 397:

    sections_soup = soup.find_all('li', class_='outline-item section')

with

    sections_soup = soup.find_all('li', class_=['outline-item section', 'outline-item section scored'])

This should work for both the 'old' and new format. Will try to run some tests and create a merge request sometime this week.
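
Why an extra class such as "scored" breaks an exact-string match can be illustrated without BeautifulSoup. The following is a minimal, dependency-free sketch, not the edx-dl code: it treats the class attribute as a set of names and accepts any li element that carries at least the required ones. The ClassSubsetFinder helper, the find_items function and the sample markup are assumptions made for illustration; only the class names come from this thread.

```python
from html.parser import HTMLParser


class ClassSubsetFinder(HTMLParser):
    """Collect start tags whose class attribute contains all required names.

    Hypothetical helper for illustration; not part of edx-dl.
    """

    def __init__(self, tag, required_classes):
        super().__init__()
        self.tag = tag
        self.required = set(required_classes)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        class_value = dict(attrs).get("class") or ""
        # Subset test: extra classes (e.g. "scored") do not prevent a match.
        if self.required <= set(class_value.split()):
            self.matches.append(class_value)


def find_items(html, required_classes):
    """Return the class strings of all matching <li> elements."""
    finder = ClassSubsetFinder("li", required_classes)
    finder.feed(html)
    return finder.matches


# Made-up markup that reuses the class names quoted in this thread.
SAMPLE = """
<ol>
  <li class="outline-item section">Old-style section</li>
  <li class="outline-item section scored">New-style section</li>
  <li class="vertical outline-item focusable scored">A subsection</li>
</ol>
"""

sections = find_items(SAMPLE, ["outline-item", "section"])
subsections = find_items(SAMPLE, ["vertical", "outline-item", "focusable"])
print(sections)
print(subsections)
```

With BeautifulSoup itself, a CSS selector such as soup.select('li.outline-item.section') expresses the same subset match; whether that is preferable to the list-of-strings fix is a design choice for the maintainers.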

not-lucky · 2019-11-04T15:37:14Z

Thanks a lot.
It's working now.

malawadd · 2019-11-04T19:11:08Z

Thank you, it works now.

malawadd · 2019-11-04T19:40:02Z

This partially works, but it still misses some weeks and modules. I tried it on this course:

https://courses.edx.org/courses/course-v1:CurtinX+MKT1x+1T2019/course/

and the entire module 3 didn't download.

mor3dr3ad · 2019-11-04T20:20:39Z

@malawadd
can you please share error messages/debug info? Do the sections just not download or does it exit with a message?

malawadd · 2019-11-04T20:37:33Z

@mor3dr3ad

It creates an empty folder but skips all the content, then proceeds to download the following module and all its content. There are no error messages or anything.

mor3dr3ad · 2019-11-04T21:13:37Z

I just ran the course you mentioned and it seems to be working for me. I will do some more testing this week. In the meantime, maybe download the missing videos manually.

malawadd · 2019-11-04T21:18:53Z

@mor3dr3ad
Do you mind telling me more about the testing you plan to run? I would like to try to fix this, but I am not sure where to start or what exactly I should look for.

mor3dr3ad · 2019-11-05T09:55:18Z

@malawadd
Well, for starters you could help by running edx-dl with the --debug flag on the course you mentioned and posting the output.

For me, my fix is working, even with your course. So without being able to reproduce your error I can only assume there is a different issue (maybe using a different version of edx-dl?)

rbrito · 2019-11-05T14:14:50Z

If something fixes a program, why don't you submit your changes as a pull request to fix things (or get things slightly improved) for other users?

mor3dr3ad · 2019-11-05T15:34:50Z

Planning on doing exactly that sometime this week. Just a bit busy right now

rbrito · 2019-11-05T16:56:08Z

Thanks, please do and I can do a round of code review and merge everything. That will be awesome!

maxshatskiy · 2019-11-06T20:47:47Z

Hello,

This solution works for many courses, but now old courses are not supported:
https://courses.edx.org/courses/course-v1:KTHx+DTS02.1x+1T2018/course/

adizukerman · 2019-11-06T21:01:41Z

For class https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ it worked partially. Not all videos and attachments were downloaded.

By the way, thank you to everyone who is working on this. This tool is so helpful as a time saver to allow working on classes offline.

WajdiBenSaad · 2019-11-12T10:17:24Z

The quick fix above should be integrated into a new release. edX has changed its website structure, and the change breaks all download operations with edx-dl.

antoniosereno · 2019-12-05T19:52:13Z

Thanks everyone! I'm facing the same issue and unfortunately the solution provided does not work with this course:
https://courses.edx.org/courses/course-v1:EdinburghX+CCSx+3T2019/course/
any hint?

EugeneLoy · 2019-12-06T15:26:50Z

@malawadd I've checked the course you are having problem with and it looks like some of the videos are no longer available:

[download] https://www.youtube.com/watch?v=N9SFeRNAfEA => Downloaded\Digital_Branding_and_Engagement\02-Module_1-_The_Digital_Consumer\02-%(title)s-%(id)s.%(ext)s
Downloading video with URL https://www.youtube.com/watch?v=N9SFeRNAfEA from YouTube.
[youtube] N9SFeRNAfEA: Downloading webpage
[youtube] N9SFeRNAfEA: Downloading video info webpage
WARNING: Unable to extract video title
WARNING: unable to extract description; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
ERROR: This video is no longer available because the YouTube account associated with this video has been terminated.
Sorry about that.

It is likely that your specific problem was caused by the video being deleted from YouTube itself, not by a bug in edx-dl.

antoniosereno · 2019-12-06T15:48:20Z

Hi @EugeneLoy , thank you for your help!
May I ask if you were able to download this course?
https://courses.edx.org/courses/course-v1:EdinburghX+CCSx+3T2019/course/
I'm having trouble with it but not with others

EugeneLoy · 2019-12-06T16:41:33Z

@antoniosereno yes, I've been able to download that course.

antoniosereno · 2019-12-06T17:35:51Z

OK, I've downloaded edx-dl-cummulative and did everything you suggested, and now it gives me an HTTP Error 400: Bad Request.

Yesterday I was able to access the course list; now I can't anymore.

Is there anything I'm missing?

EugeneLoy · 2019-12-06T17:47:27Z

@antoniosereno are you sure you are running the code from the cummulative branch of the repo and not the version installed globally on your system?

The error you are getting looks like the one that should be fixed by #569 .

One way to run code from repo is to cd into repo root and point python to .py file directly, like this:

python edx-dl.py -u <user> <course_url>

If this doesn't help, please post the full debug output so I can figure out what went wrong.

naefl · 2019-12-08T03:47:36Z

Hi @EugeneLoy,

Doesn't work on my end as well.

From your fork root dir:

In:

python edx-dl.py -u <name>@gmail.com https://courses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/

Out:

rses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/ --debug
root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
Password:
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 1000, in main
    headers = edx_get_headers()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 425, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 167, in _get_initial_token
    opener.open(url)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

EugeneLoy · 2019-12-08T09:53:00Z

@naefl @antoniosereno I think I know what the problem is. However, I'll need a bit more cooperation from you to make sure, since I cannot reproduce this in my environment.

I've added a commit with a test fix and some debug output to the cummulative branch. Please grab it and let me know if it works for you now.

If this doesn't fix the issue, please post the full debug output as before, as well as the output of the following:

curl -v https://courses.edx.org/user_api/v1/account/login_session/

antoniosereno · 2019-12-08T12:12:52Z

Thank you Eugene..
This is my output when I try to list courses:

(base) C:\edx-dl-cummulative\edx-dl-cummulative>edx-dl -u [email protected] --list-courses
edx_dl version 0.1.10
Password:
Building initial headers for future requests.
Getting initial CSRF token.
Traceback (most recent call last):
  File "c:\users\anton\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\anton\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\anton\Anaconda3\Scripts\edx-dl.exe\__main__.py", line 9, in <module>
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 1000, in main
    headers = edx_get_headers()
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 425, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 167, in _get_initial_token
    opener.open(url)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "c:\users\anton\anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

And this is the output of the command you asked us to run:

(base) C:\edx-dl-cummulative\edx-dl-cummulative>curl -v https://courses.edx.org/user_api/v1/account/login_session/

*   Trying 54.85.51.136:443...
* TCP_NODELAY set
* Connected to courses.edx.org (54.85.51.136) port 443 (#0)
> GET /user_api/v1/account/login_session/ HTTP/1.1
> Host: courses.edx.org
> User-Agent: curl/7.65.3
> Accept: */*
>
* schannel: failed to decrypt data, need more data
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Allow: GET, POST, HEAD, OPTIONS
< Cache-control: no-cache="set-cookie"
< Content-Language: en
< Content-Type: application/json
< Date: Sun, 08 Dec 2019 12:09:58 GMT
< P3P: CP="edX does not have a P3P policy. Review our privacy policy at https://edx.org/privacy"
< Server: nginx
< Set-Cookie: csrftoken=5657a4q6CepadqTkeWzFuSVnvpVqaJlrFmdBbyGDtSQZsdL7uRjpUGCCMPSWJVw1; expires=Sun, 06-Dec-2020 12:09:58 GMT; Max-Age=31449600; Path=/; secure
< Set-Cookie: prod-edx-sessionid="1|jnhlbvh7w39f44dwj782otpg3042k98f|3QLYtGo6h2Dw|IjVjZWI5MjkwZjkxZjA4OTg5Y2MwMmFiZTI2Y2JlY2E1NDZiNTNiYjFmMjIyZTEyM2I4NDJhYTE0OGExNDI1MDki:1idvNu:2YNry_Y95HcfEbPNwxVfnjrbwtE"; Domain=.edx.org; expires=Sun, 22-Dec-2019 12:09:58 GMT; httponly; Max-Age=1209600; Path=/; secure
< Set-Cookie: AWSELB=D1EF6B6510E347E5B895826CD53CF4FD55E0CFA9A951F8E39A00AC86C5195B42EB656E552F728A68C9A3299E8F6AFF2A1A23123006583EAE591F65FD084E6693F1009EDC31;PATH=/;MAX-AGE=120
< Strict-Transport-Security: max-age=3600; includeSubDomains
< Vary: Accept-Encoding
< Vary: Cookie, Accept-Language
< X-Content-Type-Options: nosniff
< X-Frame-Options: DENY
< Content-Length: 650
< Connection: keep-alive
<
{"submit_url": "/user_api/v1/account/login_session/", "fields": [{"errorMessages": {}, "supplementalLink": "", "placeholder": "[email protected]", "instructions": "The email address you used to register with edX", "restrictions": {"min_length": 3, "max_length": 254}, "name": "email", "defaultValue": "", "required": true, "label": "Email", "supplementalText": "", "type": "email"}, {"errorMessages": {}, "supplementalLink": "", "placeholder": "", "instructions": "", "restrictions": {"max_length": 5000}, "name": "password", "defaultValue": "", "required": true, "label": "Password", "supplementalText": "", "type": "password"}], "method": "post"}
* Connection #0 to host courses.edx.org left intact

EugeneLoy · 2019-12-08T13:05:53Z

@antoniosereno Thanks, but from your debug output I can say for sure that the edx-dl installed in your environment is being used, as indicated by this part of the stack trace:

  File "c:\users\anton\anaconda3\lib\site-packages\edx_dl\edx_dl.py", line 167, in _get_initial_token
    opener.open(url)

Please point your Python directly at the edx-dl.py from the repo to avoid using the version that is installed on your system.

Looking at your post, command should look something like this:

C:\edx-dl-cummulative\edx-dl-cummulative>python edx-dl.py -u [email protected] --list-courses
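
A quick way to confirm which copy of a package Python would import is to ask the import system for its location. This is a minimal sketch, assuming it is run with the same interpreter and from the same directory as the failing command; module_origin is a hypothetical helper name, not part of edx-dl.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    origin = module_origin("edx_dl")
    if origin is None:
        print("edx_dl is not importable from here")
    elif "site-packages" in origin:
        print("Using the installed copy:", origin)
    else:
        print("Using a local copy:", origin)
```

If the printed path contains site-packages, the globally installed version is the one being picked up, which matches the stack trace quoted above.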

adizukerman · 2019-12-08T13:58:58Z

@EugeneLoy , works great with https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ , thank you so much for the time and effort! I hope it gets integrated into the master build soon.

antoniosereno · 2019-12-08T20:12:37Z

It worked! I was able to download all the videos in the course! Thank you !
May I ask if there's a command to download not only media (video and PDF) but also the written content?

EugeneLoy · 2019-12-08T21:49:45Z

As far as I know, if a file is "attached" to a course page, it will be treated as a resource by edx-dl and will be downloaded. At least this has been my experience so far.

Sometimes, however, you have extra content that is present on the page inline (like errata, tables, extra recitations and text explanations, etc). As far as I understand, this is what you are interested in.

Now, it just so happens that lately I've been working on a tool that saves this kind of content :)

It is also helpful if you want to save exercises and homework (with explanations), or, any other type of content that is displayed on the course pages.

This tool is meant to complement edx-dl and is called edx-archive and can be found here: https://github.com/EugeneLoy/edx-archive

I only released it recently, so if you guys check it out that would be great!

antoniosereno · 2019-12-08T21:53:40Z

wow, I'll take a look at it! I was initially thinking of doing it manually, but it would be a long work! Thank you Eugene!

naefl · 2019-12-08T21:54:00Z

@EugeneLoy that worked, thanks for troubleshooting!

balta2ar · 2019-12-08T22:07:28Z

@EugeneLoy from your tool's page

-c, --concurrency number of pages to save in parallel (default: 4)

I don't know what's the current state of their implementation on the backend now, but my impression was that hammering edx servers is generally not a good idea. FWIW, couple of years ago they blocked me by IP for several months after me flooding their servers with requests (debugging this edx-dl, by the way). It's not that the ban could not be surmounted, but the message was clear. So if you ask me, it's more of a courtesy to not put extra pressure on them by default. If you're still not convinced, please take your time to read this thread: #377

EugeneLoy · 2019-12-08T22:49:17Z

@balta2ar Thanks, I will take the time to read through #377. However, the motivation behind adding concurrency to the tool is not to speed things up at the expense of edX servers, but to cut the time wasted waiting for pages to render.

The tool takes a snapshot of the page once it has fully rendered (including math processing). Since edX pages can be pretty bloated (I saw pages taking more than a minute to render), a lot of time is wasted waiting for rendering, with no network activity.

The actual workload in terms of average request rate is not high and should not cause any issues with default settings. In fact, I have used a much higher concurrency factor, and memory is a much more likely bottleneck than request rate.
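
The point about a small worker pool hiding render latency while capping parallel work can be sketched with a bounded thread pool. This is an illustrative Python model, not edx-archive's implementation: render_page is a hypothetical stand-in that only sleeps, and the limit of 4 mirrors the default quoted from the tool's page.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # mirrors the quoted default; an assumption for this sketch

_lock = threading.Lock()
_active = 0
peak_active = 0


def render_page(page_id):
    """Hypothetical stand-in for a slow page render; performs no network I/O."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    time.sleep(0.05)  # simulated render time
    with _lock:
        _active -= 1
    return page_id


def save_all(page_ids, max_workers=MAX_WORKERS):
    # The pool never runs more than max_workers renders at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_page, page_ids))


results = save_all(range(10))
print(results, "peak concurrency:", peak_active)
```

Lowering max_workers to 1 makes the sketch strictly sequential, which is the more conservative setting if server load is the main concern.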

antoniosereno · 2019-12-18T16:38:51Z

Sorry for the late answer.
Can you please describe the entire procedure to run edx-archive-master? I'm not able to install it: the Anaconda prompt says that npm is not recognised as an internal or external command.

EugeneLoy · 2019-12-18T17:16:26Z

@antoniosereno Hi.

npm is "node package manager". It is distributed along with node.

If I am not mistaken, you can get node through conda by installing nodejs package. Otherwise, you can get it from here.

Once you get npm on your system, install edx-archive:

npm install edx-archive -g

I'll update the README to clarify the npm part shortly.

antoniosereno · 2019-12-22T17:44:21Z

it works perfectly @EugeneLoy ! Thanks a lot, you saved me a big amount of time!

gaber86 · 2020-02-05T10:04:19Z

Still getting empty folders; not working with https://courses.edx.org/courses/course-v1:UCSanDiegoX+DSE230x+3T2019a/course/

Navid-Alipour-96 · 2020-02-06T21:20:52Z

I get empty folders. I tried the code changes above, but they don't work.
https://courses.edx.org/courses/course-v1:CurtinX+IOT4x+3T2019/course/

ghost · 2020-04-29T05:09:32Z

Is there a way to download a particular video and not the whole course?

sasidhar22 · 2020-09-01T17:24:39Z

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
  File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python38\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
  File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
    all_selections = {selected_course:
  File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
    get_available_sections(selected_course.url.replace('info', 'course'),
  File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
    page = get_page_contents(url, headers)
  File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
    result = urlopen(Request(url, None, headers))
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

MuradShafiyev · 2020-09-02T05:16:38Z

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python38\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
all_selections = {selected_course:
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
get_available_sections(selected_course.url.replace('info', 'course'),
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Same issue :(

AshMp · 2020-09-11T14:29:43Z

Greetings,
Please kindly assist with the problem shown below. I am unable to download courses from edX. I have followed all the instructions on the edx-dl GitHub page, but I am stuck at the point shown below. The courses on edX are a great help, and I don't want the knowledge they offer to pass me by. Thank you.

edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Traceback (most recent call last):
File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\asus\appdata\local\programs\python\python38\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Asus\AppData\Local\Programs\Python\Python38\Scripts\edx-dl.exe\__main__.py", line 7, in <module>
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1020, in main
all_selections = {selected_course:
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 1021, in <dictcomp>
get_available_sections(selected_course.url.replace('info', 'course'),
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\edx_dl.py", line 184, in get_available_sections
page = get_page_contents(url, headers)
File "c:\users\asus\appdata\local\programs\python\python38\lib\site-packages\edx_dl\utils.py", line 58, in get_page_contents
result = urlopen(Request(url, None, headers))
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "c:\users\asus\appdata\local\programs\python\python38\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
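
For anyone reading these tracebacks: the 403 is raised by the `urlopen(Request(url, None, headers))` call in `get_page_contents`, and because nothing catches the `HTTPError` it surfaces as a raw stack trace. Below is a minimal sketch of how such a call could be wrapped so the failing URL and status are reported in one line. This is not edx-dl's actual code; `fetch_page` and its `opener` parameter are hypothetical names introduced only for illustration.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_page(url, headers, opener=urlopen):
    """Return the decoded body of `url`, or raise RuntimeError with context.

    `opener` is a hypothetical injection point (defaults to urllib's urlopen)
    so the function can be exercised without network access.
    """
    try:
        response = opener(Request(url, None, headers))
    except HTTPError as err:
        # err.code and err.reason are standard attributes of urllib's HTTPError.
        raise RuntimeError(
            f"{url} returned HTTP {err.code} ({err.reason})"
        ) from err
    return response.read().decode("utf-8")
```

Chaining with `from err` keeps the original `HTTPError` attached to the new exception, so the response headers remain available for further diagnosis.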

shivammehta25 mentioned this issue Dec 2, 2019

Course only partially downloaded #567

Closed

M-Oliv3 mentioned this issue Dec 6, 2019

Error 400: Bad Request #568

Closed

EugeneLoy mentioned this issue Dec 6, 2019

Fixed missing course sections after edx updated course page structure #570

Merged

balta2ar closed this as completed in #570 Dec 8, 2019

EugeneLoy mentioned this issue Dec 8, 2019

used more precise login endpoints #574

Open

arsenyspb mentioned this issue Feb 8, 2020

No longer downloading anything, only empty folder structure #587

Closed

sonu-666 mentioned this issue Mar 14, 2021

edx-dl : No downloadable video found #670

Closed

koushikr063 mentioned this issue Sep 24, 2021

Edx-dl is not working: "You can access 0 courses" #679

Open

