[Feature Request]: Crawl all video subtitles (transcripts) from Khan Academy to create a word or sentence list #200

Open
3 of 5 tasks
atlantis451 opened this issue Jul 19, 2024 · 1 comment

Comments

atlantis451 commented Jul 19, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Feature Description

Crawl all video transcripts from Khan Academy to create a list of 'learn' words or sentences.

Web scraping category

EN

  1. MATH: HIGH SCHOOL & COLLEGE https://www.khanacademy.org/math
  2. TEST PREP https://www.khanacademy.org/test-prep
  3. SCIENCE https://www.khanacademy.org/science
  4. COMPUTING https://www.khanacademy.org/computing
  5. ARTS & HUMANITIES https://www.khanacademy.org/humanities
  6. ECONOMICS https://www.khanacademy.org/economics-finance-domain
  7. READING & LANGUAGE ARTS https://www.khanacademy.org/ela
  8. LIFE SKILLS https://www.khanacademy.org/college-careers-more
  9. PARTNER COURSES https://www.khanacademy.org/partner-content

RU

  1. МАТЕМАТИКА (Math) https://ru.khanacademy.org/math
  2. ЕСТЕСТВЕННЫЕ НАУКИ (Science) https://ru.khanacademy.org/science
  3. ЭКОНОМИКА И ФИНАНСЫ (Economics & Finance) https://ru.khanacademy.org/economics-finance-domain
  4. ИНФОРМАТИКА (Computing) https://ru.khanacademy.org/computing
  5. ИСКУССТВО И ГУМАНИТАРНЫЕ НАУКИ (Arts & Humanities) https://ru.khanacademy.org/humanities
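
As a starting point for a crawler, the category lists above could be kept as a small seed table. This is only a sketch: the URLs and category names are copied from this issue, while the `SEED_CATEGORIES` layout and the `iter_seeds` helper are hypothetical names chosen for illustration, not part of any existing code in this repository.

```python
# Seed URLs copied from the EN and RU lists in this issue.
# The dict layout and all names below are illustrative assumptions.
SEED_CATEGORIES = {
    "en": {
        "MATH: HIGH SCHOOL & COLLEGE": "https://www.khanacademy.org/math",
        "TEST PREP": "https://www.khanacademy.org/test-prep",
        "SCIENCE": "https://www.khanacademy.org/science",
        "COMPUTING": "https://www.khanacademy.org/computing",
        "ARTS & HUMANITIES": "https://www.khanacademy.org/humanities",
        "ECONOMICS": "https://www.khanacademy.org/economics-finance-domain",
        "READING & LANGUAGE ARTS": "https://www.khanacademy.org/ela",
        "LIFE SKILLS": "https://www.khanacademy.org/college-careers-more",
        "PARTNER COURSES": "https://www.khanacademy.org/partner-content",
    },
    "ru": {
        "МАТЕМАТИКА": "https://ru.khanacademy.org/math",
        "ЕСТЕСТВЕННЫЕ НАУКИ": "https://ru.khanacademy.org/science",
        "ЭКОНОМИКА И ФИНАНСЫ": "https://ru.khanacademy.org/economics-finance-domain",
        "ИНФОРМАТИКА": "https://ru.khanacademy.org/computing",
        "ИСКУССТВО И ГУМАНИТАРНЫЕ НАУКИ": "https://ru.khanacademy.org/humanities",
    },
}


def iter_seeds(categories=SEED_CATEGORIES):
    """Yield (language, category name, URL) for every seed category."""
    for lang, entries in categories.items():
        for name, url in entries.items():
            yield lang, name, url


if __name__ == "__main__":
    for lang, name, url in iter_seeds():
        print(f"{lang}\t{name}\t{url}")
```

Keeping the language code as the top-level key would let a crawler write EN and RU output to separate files without re-parsing the URLs.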

Use Case

Learners can fully grasp the material in Khan Academy videos only if they also study the 'learn' word list drawn from the subtitles/transcripts, because learning requires review.

Even someone who doesn't know English at all can study the 'learn' word list and then go straight to Khan Academy to watch the videos and gain knowledge and skills.

Benefits

This would contribute to global education, especially in regions such as Africa where the Khan Academy website does not support the local languages. Learners there can study the 'learn' word list and then go to Khan Academy to acquire knowledge.

Add ScreenShots

Web scraping steps: pick a web scraping category from the lists above (EN or RU), for example 1. MATH: HIGH SCHOOL & COLLEGE, and navigate to its second-level directory.

[Screenshot 1]

Open Early math review, enter the directory Unit 1, and click the play icon on a video. The "Video transcript" shown at the bottom of the page is the subtitle text.

[Screenshot 2]

Combine all the subtitles from the chapters under Early math review into one file, such as Early math review.txt.
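
The combining step, and the word list requested in the feature description, could be sketched as follows once the transcripts have been collected. This sketch deliberately leaves out fetching: how the transcript text is obtained from each video page is not specified in this issue. The function names, the `## title` separator line, and the sample text are all hypothetical placeholders.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Runs of Unicode letters, optionally with one inner apostrophe ("let's").
# Works for English and Russian text; digits and punctuation are skipped.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def combine_transcripts(transcripts, out_path):
    """Write (title, text) pairs into one UTF-8 file, one section per video.

    The "## title" separator is an assumption made for this sketch.
    """
    parts = [f"## {title}\n{text.strip()}\n" for title, text in transcripts]
    out = Path(out_path)
    out.write_text("\n".join(parts), encoding="utf-8")
    return out


def build_word_list(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter(WORD_RE.findall(text.lower()))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Placeholder transcripts; real text would come from the video pages.
    sample = [
        ("Video A", "Let's add two plus two."),
        ("Video B", "Now let's subtract two from four."),
    ]
    target = Path(tempfile.mkdtemp()) / "Early math review.txt"
    combined = combine_transcripts(sample, target)
    for word, count in build_word_list(combined.read_text(encoding="utf-8")):
        print(count, word)
```

Sorting by frequency puts the words a learner will hear most often at the top of the list. Note that the section titles are counted as words too because they are part of the combined file; a real implementation might want to exclude them.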

Priority

High

Record

  • I have read the Contributing Guidelines
  • I'm a GSSOC'24 contributor
  • I'm a VSoC'24 contributor
  • I have starred the repository

Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.

atlantis451 changed the title on Sep 3, 2024, from "[Feature Request]: Crawler for all Khan Academy video subtitles (transcripts)" (originally in Chinese: "爬虫 Khan Academy 所有视频 字幕 transcripts") to "[Feature Request]: Crawl all video subtitles (transcripts) from Khan Academy to create a word or sentence list".