[Feature Request]: Crawl all video subtitles (transcripts) from Khan Academy to create a word or sentence list #200

@ghost

Description

Is there an existing issue for this?

  • I have searched the existing issues

Feature Description

Crawl all video transcripts from Khan Academy to create a list of 'learn' words or sentences.
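As a rough illustration of the "word list" half of this request, here is a minimal sketch that turns already-downloaded transcript text into a frequency-sorted word list. The names `build_word_list` and `WORD_RE` are hypothetical, the sample sentences are made up, and how the transcripts are fetched is deliberately left out.

```python
import re
from collections import Counter

# Runs of letters in any script (Latin, Cyrillic, ...), allowing inner apostrophes.
# Digits and underscores are excluded on purpose.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def build_word_list(transcripts):
    """Return (word, count) pairs, most frequent first, ties sorted alphabetically."""
    counts = Counter()
    for text in transcripts:
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample text standing in for real transcripts.
sample = ["Let's add two and two.", "Two plus two is four."]
word_list = build_word_list(sample)
for word, count in word_list:
    print(f"{word}\t{count}")
```

Sorting by frequency puts the most useful vocabulary first; a sentence list could be built the same way by splitting on sentence boundaries instead of words.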

Web scraping category

EN

  1. MATH: HIGH SCHOOL & COLLEGE https://www.khanacademy.org/math
  2. TEST PREP https://www.khanacademy.org/test-prep
  3. SCIENCE https://www.khanacademy.org/science
  4. COMPUTING https://www.khanacademy.org/computing
  5. ARTS & HUMANITIES https://www.khanacademy.org/humanities
  6. ECONOMICS https://www.khanacademy.org/economics-finance-domain
  7. READING & LANGUAGE ARTS https://www.khanacademy.org/ela
  8. LIFE SKILLS https://www.khanacademy.org/college-careers-more
  9. PARTNER COURSES https://www.khanacademy.org/partner-content

RU

  1. МАТЕМАТИКА https://ru.khanacademy.org/math
  2. ЕСТЕСТВЕННЫЕ НАУКИ https://ru.khanacademy.org/science
  3. ЭКОНОМИКА И ФИНАНСЫ https://ru.khanacademy.org/economics-finance-domain
  4. ИНФОРМАТИКА https://ru.khanacademy.org/computing
  5. ИСКУССТВО И ГУМАНИТАРНЫЕ НАУКИ https://ru.khanacademy.org/humanities
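The two category lists above could serve as the crawler's starting points. A minimal sketch of that seed table follows; the hosts and paths are copied from the URLs listed above, while the names `HOSTS`, `SEED_PATHS`, and `seed_urls` are hypothetical.

```python
# Hosts and paths are taken from the category URLs listed in this issue.
HOSTS = {"en": "www.khanacademy.org", "ru": "ru.khanacademy.org"}

SEED_PATHS = {
    "en": [
        "math", "test-prep", "science", "computing", "humanities",
        "economics-finance-domain", "ela", "college-careers-more",
        "partner-content",
    ],
    "ru": [
        "math", "science", "economics-finance-domain", "computing",
        "humanities",
    ],
}


def seed_urls(lang):
    """Return the top-level category URLs for one language ('en' or 'ru')."""
    return [f"https://{HOSTS[lang]}/{path}" for path in SEED_PATHS[lang]]


for url in seed_urls("ru"):
    print(url)
```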

Use Case

Studying the 'learn' word list built from Khan Academy subtitles/transcripts is what lets a learner fully grasp the material when watching Khan Academy videos, because learning requires review.

Even someone who doesn't know English at all can study the 'learn' word list and then go straight to Khan Academy to watch the videos and gain knowledge and skills.

Benefits

Contribute to global education, especially in regions whose native languages the Khan Academy website does not support, such as Africa. Learners there can study the 'learn' word list and then go to Khan Academy to acquire knowledge.

Add ScreenShots

Web scraping steps: pick a language from the 'Web scraping category' section (EN or RU), open 1. MATH: HIGH SCHOOL & COLLEGE, and navigate to the second-level directory.

Early math review > open the Unit 1 directory > click the play icon. The 'Video transcript' at the bottom of the page contains the subtitles.

Combine all the subtitles from the chapters under Early math review into one file, such as Early math review.txt.
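The combining step could look roughly like the sketch below. It assumes the transcripts for one course have already been collected as plain strings; `combine_transcripts` is a hypothetical helper, and the sample strings are placeholders, not real Khan Academy text.

```python
import tempfile
from pathlib import Path


def combine_transcripts(course_title, transcripts, out_dir):
    """Write all transcripts of one course into '<course_title>.txt'; return the path.

    Assumes course_title contains no path separators.
    """
    out_path = Path(out_dir) / f"{course_title}.txt"
    # Drop empty transcripts and separate videos with a blank line.
    cleaned = [text.strip() for text in transcripts if text.strip()]
    out_path.write_text("\n\n".join(cleaned) + "\n", encoding="utf-8")
    return out_path


# Placeholder transcripts; a temporary directory keeps the example side-effect free.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = combine_transcripts(
        "Early math review",
        ["Transcript of video one.", "", "Transcript of video two."],
        tmp_dir,
    )
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```

Writing one UTF-8 file per second-level directory keeps the EN and RU output in the same format, so the same word-list step can run on either.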

Priority

High

Record

  • I have read the Contributing Guidelines
  • I'm a GSSOC'24 contributor
  • I'm a VSoC'24 contributor
  • I have starred the repository

Metadata

    Labels

    On hold (busy somewhere else)
