
[FEATURE] GithubScraper: GitHub Repository Extraction Using BeautifulSoup #393 #421

Merged
1 commit merged on Oct 24, 2024


Niraj1608
Contributor

fix: #393

The GitHub Topic Extractor is a Python-based tool that scrapes repositories from GitHub and extracts relevant topics using Natural Language Processing (NLP). It uses BeautifulSoup to parse the HTML and NLP techniques such as tokenization and keyword extraction to automatically identify the key topics or themes of each repository.
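The tokenization and keyword-extraction step can be sketched with the standard library alone. This is a minimal, frequency-based stand-in: the PR does not say which keyword algorithm it uses, and the names `tokenize`, `extract_keywords`, and `STOP_WORDS` are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list (assumption); a real project would
# likely use a fuller list, e.g. one shipped with an NLP library.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "for", "to", "in", "on", "with",
    "is", "are", "this", "that", "it", "using", "from", "by", "as",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens, keeping hyphenated/underscored words whole."""
    return re.findall(r"[a-z0-9]+(?:[-_][a-z0-9]+)*", text.lower())


def extract_keywords(text: str, top_n: int = 5) -> list[str]:
    """Return the `top_n` most frequent non-stop-word tokens.

    Tokens of two characters or fewer are dropped as noise. Ties keep the
    order in which words were first seen (Counter.most_common behavior).
    """
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(top_n)]
```

Applied to a repository description or README, `extract_keywords(text, top_n=3)` returns the three most repeated content words, which can then be compared against the repository's declared topics.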

Features
Scrapes GitHub repositories: Extracts key information such as repository name, description, and associated topics.
Proxy support: Handles IP rotation and proxies to avoid getting blocked during scraping.
Summarization Model: Utilizes a summarization model (like BERT) to condense repository descriptions for further analysis.
NLP Integration: Processes the extracted content using NLP techniques, extracting relevant keywords and insights.
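Extracting the name, description, and topics with BeautifulSoup might look like the sketch below. It parses a hard-coded HTML snippet rather than a live page: the markup and CSS selectors are assumptions loosely modeled on a repository page (GitHub's real class names differ and change over time), and `parse_repo` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup standing in for a GitHub repository page.
# The selectors used below must be checked against the live page source.
SAMPLE_HTML = """
<html><body>
  <strong itemprop="name"><a href="/octo/hello-world">hello-world</a></strong>
  <p class="f4 my-3">A tiny demo repository for scraping.</p>
  <div class="topics">
    <a class="topic-tag">python</a>
    <a class="topic-tag">web-scraping</a>
  </div>
</body></html>
"""


def parse_repo(html: str) -> dict:
    """Extract name, description and topic tags from one repository page."""
    soup = BeautifulSoup(html, "html.parser")
    name_tag = soup.select_one('[itemprop="name"] a')  # repository name link
    desc_tag = soup.select_one("p.f4")                  # "About" description paragraph
    return {
        # Missing elements become None instead of raising AttributeError.
        "name": name_tag.get_text(strip=True) if name_tag else None,
        "description": desc_tag.get_text(strip=True) if desc_tag else None,
        "topics": [a.get_text(strip=True) for a in soup.select("a.topic-tag")],
    }
```

In a real run the HTML would come from an HTTP request, for example `requests.get(url, timeout=10).text`, and returning `None` for missing fields keeps one malformed page from stopping a batch scrape.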
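Proxy rotation can be approximated by cycling through a pool of proxies and moving to the next one whenever a request fails. The sketch below uses only `urllib` from the standard library; the proxy addresses and the helper names (`proxy_config`, `default_fetch`, `fetch_with_rotation`) are hypothetical, and the `fetch` parameter exists so the rotation logic can be exercised without network access.

```python
import itertools
import urllib.request
from collections.abc import Callable, Iterator

# Hypothetical proxy addresses; replace with real ones.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]


def proxy_config(proxy: str) -> dict[str, str]:
    """Route both HTTP and HTTPS traffic through the same proxy."""
    return {"http": proxy, "https": proxy}


def default_fetch(url: str, proxies: dict[str, str]) -> str:
    """Fetch a URL through the given proxies using urllib."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (github-scraper sketch)")]
    with opener.open(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_rotation(
    url: str,
    proxy_pool: Iterator[str],
    attempts: int = 3,
    fetch: Callable[[str, dict[str, str]], str] = default_fetch,
) -> str:
    """Try up to `attempts` proxies in round-robin order; return the first successful body."""
    last_error: Exception | None = None
    for _ in range(attempts):
        proxy = next(proxy_pool)
        try:
            return fetch(url, proxy_config(proxy))
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            last_error = exc    # this proxy failed or was blocked: rotate to the next one
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


# Usage: pool = itertools.cycle(PROXIES); html = fetch_with_rotation(url, pool)
```

Round-robin via `itertools.cycle` is the simplest policy; a scraper could also add delays between requests or drop proxies that fail repeatedly.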

github_scraper.ipynb - Colab.pdf

Contributor Author

Niraj1608 commented Oct 23, 2024

@suryanshsk Kindly review my PR. I also scraped the README file and added a Hugging Face summarization model that summarizes it; please check the PDF I attached, and increase the level if you like the project :))
Thank you :)
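Summarizing a scraped README with a Hugging Face model could follow the sketch below. It assumes the `transformers` package is installed and relies on `pipeline("summarization")`, which loads the library's default summarization checkpoint; the PR does not name the model it used. Because such models accept only a limited input length, the README is first split into word-count chunks; the 400-word chunk size is a rough assumption rather than a token-exact limit, and `chunk_words` and `summarize_readme` are hypothetical names.

```python
def chunk_words(text: str, max_words: int = 400) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_readme(readme: str, max_words: int = 400) -> str:
    """Summarize each chunk of the README and join the partial summaries."""
    chunks = chunk_words(readme, max_words)
    if not chunks:
        return ""  # nothing to summarize; avoid loading the model at all
    # Imported lazily so chunk_words stays usable without the heavy dependency.
    from transformers import pipeline

    summarizer = pipeline("summarization")  # library default model (assumption)
    # For a list of strings the pipeline returns one {"summary_text": ...} dict per chunk.
    results = summarizer(chunks, max_length=120, min_length=20, do_sample=False)
    return " ".join(r["summary_text"] for r in results)
```

The first call to `summarize_readme` downloads the model weights; joining per-chunk summaries is the simplest strategy, and a second summarization pass over the joined text is one option if the result is still too long.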

suryanshsk merged commit ef73900 into suryanshsk:main on Oct 24, 2024
1 check passed
Niraj1608 deleted the github branch on November 7, 2024 13:13
Development

Successfully merging this pull request may close these issues.

✨[FEATURE] GithubScraper:GitHub Repository Extraction Using BeautifulSoup
2 participants