✨[FEATURE] GithubScraper:GitHub Repository Extraction Using BeautifulSoup #393

Closed
Niraj1608 opened this issue Oct 22, 2024 · 3 comments · Fixed by #421

Comments

@Niraj1608
Contributor

Project Overview
The GitHub Topic Extractor is a Python-based tool that scrapes GitHub repositories and extracts relevant topics using Natural Language Processing (NLP). It uses BeautifulSoup for HTML parsing and NLP techniques such as tokenization and keyword extraction to automate the identification of key topics or themes associated with each repository.
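
To make the tokenization and keyword extraction step concrete, here is a minimal sketch using only the standard library. It is a plain frequency count, not necessarily the approach the final PR takes; the stop-word list and the function names `tokenize` and `extract_keywords` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from the project).
STOP_WORDS = {
    "a", "an", "and", "the", "for", "of", "to", "in",
    "with", "is", "on", "that", "this", "using",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text, top_n=5):
    """Return up to top_n of the most frequent tokens, skipping stop words and very short tokens."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


description = (
    "A Python scraper for GitHub. "
    "The scraper parses GitHub pages with BeautifulSoup."
)
keywords = extract_keywords(description, top_n=2)
```

A real implementation would likely swap the frequency count for something stronger (TF-IDF or a dedicated NLP library), but the input and output shape would stay roughly the same: a description string in, a short list of candidate topics out.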

Features

  • Scrapes GitHub repositories: Extracts key information such as repository name, description, and associated topics.
  • Proxy support: Handles IP rotation and proxies to avoid getting blocked during scraping.
  • Summarization Model: Utilizes a summarization model (for example, a BERT-based one) to condense repository descriptions for further analysis.
  • NLP Integration: Processes the extracted content using NLP techniques, extracting relevant keywords and insights.
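
As a rough sketch of the scraping and proxy features listed above: the snippet below fetches a page through an optional proxy with the standard library and parses it with BeautifulSoup. The class names `repo-name`, `repo-description`, and `topic-tag`, the sample HTML, and the helper names `fetch_html` and `parse_repository` are placeholders for illustration; they are not GitHub's actual markup, so real selectors would have to be checked against the live page. The summarization step is left out here.

```python
from urllib.request import ProxyHandler, Request, build_opener

from bs4 import BeautifulSoup


def fetch_html(url, proxy=None):
    """Download a page, optionally routing the request through a proxy URL."""
    handlers = [ProxyHandler({"http": proxy, "https": proxy})] if proxy else []
    opener = build_opener(*handlers)
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_repository(html):
    """Extract name, description, and topics from repository page HTML.

    The class names used here are hypothetical placeholders.
    """
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find(class_="repo-name")
    description = soup.find(class_="repo-description")
    topics = [a.get_text(strip=True) for a in soup.find_all("a", class_="topic-tag")]
    return {
        "name": name.get_text(strip=True) if name else None,
        "description": description.get_text(strip=True) if description else None,
        "topics": topics,
    }


# Inline sample so the parsing logic can be tried without network access.
SAMPLE_HTML = """
<html><body>
  <h1 class="repo-name"> example-repo </h1>
  <p class="repo-description">A small scraping demo.</p>
  <a class="topic-tag" href="#">python</a>
  <a class="topic-tag" href="#">scraping</a>
</body></html>
"""

repo = parse_repository(SAMPLE_HTML)
```

For IP rotation, one option would be to call `fetch_html` with a different `proxy` value per request, picked from a pool of proxy URLs; the pool itself and any retry policy are outside this sketch.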
@Niraj1608
Contributor Author

@suryanshsk assign me this issue :)

@Niraj1608
Contributor Author

@suryanshsk can you check my pr pls

suryanshsk added a commit that referenced this issue Oct 24, 2024
[FEATURE] GithubScraper:GitHub Repository Extraction Using BeautifulSoup #393
@suryanshsk
Owner

i have already checked
