edx-dl not able to download videos from edx platform #559
Comments
Having the same issue :( |
Confirmed with different url: Output of --debug: root[main] edx_dl version 0.1.10 |
Same issue with multiple courses. edx_dl version 0.1.10 |
Same issue. Course: https://courses.edx.org/courses/course-v1:MITx+18.6501x+3T2019/course/ It looks like
|
Same issue with multiple courses. |
Same issue here: edx-dl only sees the first section. Here's the log: |
I have the same issue:
|
So, I've dug into the code a bit and I think I found the issue: for some courses, edX has again updated the structure of their website. The issue is with line 397 in /edx_dl/parsing.py
In the new format, the sections have a different class, namely "outline-item section scored". This should be easy to fix. I'll try to hack something together, but it had better be checked by someone experienced. |
Alright, quick fix: replace as follows in /edx_dl/parsing.py: Line 385: and line 397:
This should work for both the 'old' and new format. Will try to run some tests and create a merge request sometime this week. |
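The idea described above, matching sections in both the old and the new page layout, can be sketched as follows. This is a minimal, standalone illustration and not the actual change to parsing.py (the replaced lines are not shown in this thread). It assumes the old layout used the class "outline-item section", which the thread does not state; `SectionFinder` and `find_sections` are hypothetical names. It compares class tokens rather than the full class string, so an extra token such as "scored" does not break the match.

```python
from html.parser import HTMLParser

# Class tokens every section element is ASSUMED to carry in both layouts.
REQUIRED_TOKENS = {"outline-item", "section"}


class SectionFinder(HTMLParser):
    """Collect the class attribute of every element that looks like a section."""

    def __init__(self):
        super().__init__()
        self.section_classes = []

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("class") or ""
        tokens = set(raw.split())
        # Subset test on whole tokens: "outline-item section scored" matches,
        # while "outline-item subsection" does not.
        if REQUIRED_TOKENS <= tokens:
            self.section_classes.append(raw)


def find_sections(html):
    """Return the class strings of all section-like elements, in page order."""
    finder = SectionFinder()
    finder.feed(html)
    return finder.section_classes
```

Matching on tokens instead of the exact string is one way to tolerate further additions to the class list; whether that fits the real parser depends on how parsing.py selects elements.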
Thanks a lot. |
thank you it works now |
This partially works, but it still misses some weeks and modules. I tried it on this course: https://courses.edx.org/courses/course-v1:CurtinX+MKT1x+1T2019/course/ and the entire module 3 didn't download. |
@malawadd |
It creates an empty folder but skips all the content, then proceeds to download the following module and all its content. There are no error messages or anything. |
Just ran the course you mentioned and it seems to be working for me. Will do some more testing this week. In the meantime, maybe download the missing videos manually. |
@mor3dr3ad |
@malawadd For me, my fix is working, even with your course. So without being able to reproduce your error I can only assume there is a different issue (maybe using a different version of edx-dl?) |
If something fixes a program, why don't you submit your changes as a pull request to fix things (or get things slightly improved) for other users? |
Planning on doing exactly that sometime this week. Just a bit busy right now
|
Thanks, please do and I can do a round of code review and merge everything. That will be awesome! |
Hello,
This solution works for many courses, but now old courses are not supported: |
For class https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ it worked partially. Not all videos and attachments were downloaded. By the way, thank you to everyone who is working on this. This tool is so helpful as a time saver to allow working on classes offline. |
This should be integrated into a new release. Edx has changed their website structure and this new change breaks all download operations with edx-dl. |
Thanks everyone! I'm facing the same issue and unfortunately the solution provided does not work with this course: |
@malawadd I've checked the course you are having problem with and it looks like some of the videos are no longer available:
It is likely that your specific problem was caused by deletion of the video from youtube itself, not bug in |
Hi @EugeneLoy , thank you for your help! |
@antoniosereno yes, I've been able to download that course. |
OK, I've downloaded edx-dl-cummulative and done everything you suggested, and now it gives me an HTTP Error 400: Bad Request. Yesterday I was able to access the course list; now I can't anymore. Is there anything I'm missing? |
@antoniosereno are you sure you running code from The error you are getting looks like the one that should be fixed by #569 . One way to run code from repo is to
If this doesn't help, please post the full debug output so I can figure out what went wrong. |
Hi @EugeneLoy, Doesn't work on my end as well. From your fork root dir: In: python edx-dl.py -u <name>@gmail.com https://courses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/ Out: rses.edx.org/courses/course-v1:DavidsonX+D001x+3T2018/course/ --debug
root[main] edx_dl version 0.1.10
root[parse_file_formats] file_formats: ['e?ps', 'pdf', 'txt', 'doc', 'xls', 'ppt', 'docx', 'xlsx', 'pptx', 'odt', 'ods', 'odp', 'odg', 'zip', 'rar', 'gz', 'mp3', 'R', 'Rmd', 'ipynb', 'py']
Password:
root[edx_get_headers] Building initial headers for future requests.
root[_get_initial_token] Getting initial CSRF token.
Traceback (most recent call last):
  File "edx-dl.py", line 6, in <module>
    edx_dl.main()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 1000, in main
    headers = edx_get_headers()
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 425, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "/root/workspace/edx-dl/edx_dl/edx_dl.py", line 167, in _get_initial_token
    opener.open(url)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/opt/conda/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request |
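When a traceback like the one above ends in a bare "HTTP Error 400", the response body sometimes says more than the status line does. The following is a minimal sketch for gathering that extra detail when reporting such a failure. It is not edx-dl code; `describe_http_error` is a hypothetical helper, and it relies only on the standard `urllib.error.HTTPError` attributes.

```python
import urllib.error


def describe_http_error(err, max_body=500):
    """Summarize an HTTPError: status, reason, URL and the start of the body."""
    try:
        # The error object is also a file-like response; read a bounded chunk.
        body = err.read(max_body).decode("utf-8", errors="replace")
    except Exception:
        body = "<body not readable>"
    return "HTTP {} {} for {}\n{}".format(err.code, err.reason, err.url, body)
```

A caller would wrap the failing request in `try: ... except urllib.error.HTTPError as err:` and print `describe_http_error(err)` before re-raising.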
@naefl @antoniosereno I think I know what the problem is. However, I'll need a bit more cooperation from you to make sure, since I cannot reproduce this in my environment. I've added commit with test fix and some debug output to If this won't fix this issue, please post full debug output as before as well as output of the following:
|
Thank you Eugene..
and this one is of the previous line you asked us to launch `(base) C:\edx-dl-cummulative\edx-dl-cummulative>curl -v https://courses.edx.org/user_api/v1/account/login_session/
|
@antoniosereno Thanks, but from your debug output I can say for sure that
Please point your Looking at your post, command should look something like this:
|
@EugeneLoy , works great with https://courses.edx.org/courses/course-v1:MITx+2.830.2x+3T2019/course/ , thank you so much for the time and effort! I hope it gets integrated into the master build soon. |
It worked! I was able to download all the videos in the course! Thank you ! |
As far as I know, if a file is "attached" to a course page it will be treated as a resource by Sometimes, however, you have extra content that is present on the page inline (like errata, tables, extra recitations and text explanations, etc). As far as I understand, this is what you are interested in. Now, it just so happens that lately I've been working on a tool that saves this kind of content :) It is also helpful if you want to save exercises and homework (with explanations), or any other type of content that is displayed on the course pages. This tool is meant to complement I only released it recently, so if you guys check it out that would be great! |
wow, I'll take a look at it! I was initially thinking of doing it manually, but it would be a long work! Thank you Eugene! |
@EugeneLoy that worked, thanks for troubleshooting! |
@EugeneLoy from your tool's page
I don't know what's the current state of their implementation on the backend now, but my impression was that hammering edx servers is generally not a good idea. FWIW, couple of years ago they blocked me by IP for several months after me flooding their servers with requests (debugging this edx-dl, by the way). It's not that the ban could not be surmounted, but the message was clear. So if you ask me, it's more of a courtesy to not put extra pressure on them by default. If you're still not convinced, please take your time to read this thread: #377 |
@balta2ar Thanks, I will take my time to read through #377. However, the motivation behind adding concurrency to the tool is not to speed things up at the expense of edX servers but to shave off time wasted on page rendering. The tool takes a snapshot of the page once it has fully rendered (including math processing), and since edX pages can be pretty bloated (I saw pages taking more than a minute to render), a lot of time is wasted waiting for the render (with no network activity). The actual workload in terms of average request rate is not high and should not cause any issues with default settings. In fact, I used a much higher concurrency factor, and I can say that memory is much more likely to be the bottleneck than request rate overload. |
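The distinction drawn in this exchange, between how many pages are processed at once and how often the server is actually hit, can be illustrated with a limiter that caps only the request rate. This is a hypothetical sketch, not code from edx-dl or from the tool discussed above; `Throttle` and its parameters are made-up names, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between the starts of consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `wait()` before each request keeps the average rate bounded no matter how many pages are rendering in the background. The class as written is not thread-safe; concurrent callers would need a lock around `wait()`.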
Sorry for the late answer. |
@antoniosereno Hi.
If I am not mistaken, you can get node through conda by installing Once you get
I'll update the readme to clear this up. |
it works perfectly @EugeneLoy ! Thanks a lot, you saved me a big amount of time! |
Still getting empty folders; it's not working with https://courses.edx.org/courses/course-v1:UCSanDiegoX+DSE230x+3T2019a/course/ |
I have empty folders. I tried the code above but it doesn't work. |
Is there a way to download a particular video and not the whole course? |
edx_dl version 0.1.13 |
Same issue :( |
Greetings edx_dl version 0.1.13 |
Subject of the issue
edx-dl fails to extract and download videos for "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/" on www.edx.org.
It seems the videos for this course are sourced from "https://media.ed.ac.uk/" and not YouTube.
I need help resolving this issue.
Your environment
Steps to reproduce
--- create an account on edX
--- enroll in the course "https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/"
--- type the following into CMD
edx-dl -u username -p password -o path --ignore-errors --cache https://courses.edx.org/courses/course-v1:EdinburghX+PA1.1x+3T2019/course/
Expected behaviour
download to start normally
Actual behaviour
edx_dl version 0.1.10
Building initial headers for future requests.
Getting initial CSRF token.
Found CSRF token.
Logging into Open edX site: https://courses.edx.org/login_ajax
Extracting course information from dashboard.
Downloading Introduction to Predictive Analytics [course-v1:EdinburghX+PA1.1x+3T2019/co]
Downloading 0 section(s)
loading 2329 urls from cache [edx-dl.cache]
Extracting all units information in parallel.
No downloadable video found.