[FEATURE] GithubScraper: GitHub Repository Extraction Using BeautifulSoup #393 #421

Niraj1608 · 2024-10-23T07:17:12Z

The GitHub Topic Extractor is a Python-based tool that scrapes repositories from GitHub and extracts relevant topics using Natural Language Processing (NLP). Leveraging BeautifulSoup for HTML parsing and NLP techniques such as tokenization and keyword extraction, this tool automates the identification of key topics or themes associated with GitHub repositories.
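The tokenization and keyword-extraction step described above can be illustrated with a minimal, standard-library-only sketch. It is not the notebook's actual code: the names `tokenize`, `extract_keywords`, and `STOP_WORDS` are hypothetical, and a plain frequency count stands in for whatever NLP technique the project uses.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {
    "a", "an", "and", "for", "from", "in", "is", "of",
    "that", "the", "this", "to", "using", "with",
}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z][a-z0-9+#-]*", text.lower())


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent tokens that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


description = (
    "A scraper for GitHub topics. The scraper parses GitHub pages "
    "and extracts topics from GitHub."
)
print(extract_keywords(description))
```

A frequency count is the simplest possible baseline; a real pipeline would more likely use a dedicated NLP library for tokenization and keyword scoring.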

Features
Scrapes GitHub repositories: Extracts key information such as repository name, description, and associated topics.
Proxy support: Handles IP rotation and proxies to avoid getting blocked during scraping.
Summarization Model: Utilizes a summarization model (like BERT) to condense repository descriptions for further analysis.
NLP Integration: Processes the extracted content using NLP techniques, extracting relevant keywords and insights.
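As a rough illustration of the scraping feature listed above, the sketch below parses a repository name, description, and topics out of an HTML string with BeautifulSoup. It is an assumption-laden example, not the PR's implementation: the sample markup, the CSS selectors (`strong[itemprop="name"] a`, `p.repo-description`, `a.topic-tag`), and the function name `parse_repository` are all hypothetical, and GitHub's real page structure may differ and change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded repository page.
SAMPLE_HTML = """
<html><body>
  <strong itemprop="name"><a href="/octocat/hello-world">hello-world</a></strong>
  <p class="repo-description">My first repository on GitHub.</p>
  <a class="topic-tag" href="/topics/python">python</a>
  <a class="topic-tag" href="/topics/scraping">scraping</a>
</body></html>
"""


def parse_repository(html):
    """Extract name, description, and topics from repository HTML.

    The selectors below are assumptions made for this sketch only.
    """
    soup = BeautifulSoup(html, "html.parser")
    name = soup.select_one('strong[itemprop="name"] a')
    description = soup.select_one("p.repo-description")
    topics = [tag.get_text(strip=True) for tag in soup.select("a.topic-tag")]
    return {
        # Fall back to None when an element is missing from the page.
        "name": name.get_text(strip=True) if name else None,
        "description": description.get_text(strip=True) if description else None,
        "topics": topics,
    }


repo = parse_repository(SAMPLE_HTML)
print(repo)
```

Parsing an in-memory string keeps the example independent of network access; in practice the HTML would come from an HTTP response, optionally routed through the rotating proxies mentioned in the feature list.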

github_scraper.ipynb - Colab.pdf

Niraj1608 · 2024-10-23T07:19:14Z

@suryanshsk Kindly review my PR. I also scraped the README file and added a Hugging Face summarization model to summarize it; please check the PDF I attached, and increase the level if you like the project :))
thank you :)

github

28c4b46

suryanshsk merged commit ef73900 into suryanshsk:main Oct 24, 2024
1 check passed

suryanshsk added gssoc-ext hacktoberfest-accepted level2 hacktoberfest labels Oct 24, 2024

Niraj1608 deleted the github branch November 7, 2024 13:13
