Web Scraping Documentation and Llama-Index Integration #43

AsH1605 · 2024-11-06T18:44:58Z

Issue Reference: Resolves #42

Changes:

Added functionality to scrape documentation from the given website, including sublinks.
Stored scraped content in Llama-Index Document objects with metadata (URLs).
Built a Llama-Index index from the collected Document objects.
Created a Colab notebook with sample usage to demonstrate the scraper in action.
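
The changes above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: `PageDoc` is a hypothetical stand-in for a Llama-Index Document (text plus a metadata dict holding the URL), and `scrape`, `fetch`, and `max_pages` are names assumed for the example. The `fetch` callable is injected so the crawl can be exercised without network access.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


@dataclass
class PageDoc:
    """Hypothetical stand-in for a Llama-Index Document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and the non-empty text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def scrape(url, fetch, seen=None, max_pages=50):
    """Recursively scrape url and same-host pages it links to.

    fetch is any callable mapping a URL to an HTML string (assumed here).
    """
    seen = set() if seen is None else seen
    url = urldefrag(url).url  # treat page.html#a and page.html as one page
    if url in seen or len(seen) >= max_pages:
        return []
    seen.add(url)

    parser = LinkAndTextParser()
    parser.feed(fetch(url))
    parser.close()  # flush any buffered trailing text
    docs = [PageDoc(text=" ".join(parser.chunks), metadata={"url": url})]

    host = urlparse(url).netloc
    for href in parser.links:
        target = urljoin(url, href)  # resolve relative links against this page
        if urlparse(target).netloc == host:  # stay on the documentation site
            docs.extend(scrape(target, fetch, seen, max_pages))
    return docs
```

In a real run, `fetch` could wrap an HTTP client, and each `PageDoc` would be replaced by a Llama-Index Document before building the index.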

Challenges:

Worked around deprecated Llama-Index APIs such as GPTSimpleVectorIndex.
Resolved all relative URLs against the page they appear on using urljoin.
Implemented a recursive scraper to follow the pages linked from the main documentation page.
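
For reference, this is how urljoin from the standard library resolves typical relative links. The `resolve` helper and the example.com URLs are assumptions made for illustration, not code from this PR; the fragment is dropped with urldefrag so that in-page anchors do not count as separate pages.

```python
from urllib.parse import urldefrag, urljoin


def resolve(base_url, href):
    """Resolve href against base_url and drop any #fragment."""
    return urldefrag(urljoin(base_url, href)).url


base = "https://example.com/docs/api/"

print(resolve(base, "layers.html"))        # https://example.com/docs/api/layers.html
print(resolve(base, "../guides/"))         # https://example.com/docs/guides/
print(resolve(base, "/about/"))            # https://example.com/about/
print(resolve(base, "layers.html#dense"))  # https://example.com/docs/api/layers.html

# Without a trailing slash on the base, the last path segment is replaced:
print(resolve("https://example.com/docs/api", "layers.html"))
# https://example.com/docs/layers.html
```

The last case is a common source of wrong links, so the base passed to urljoin should be the URL of the page the link was found on, exactly as fetched.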

Please review the changes and let me know if further adjustments are needed.

debrupf2946 · 2024-11-25T01:31:11Z

Hi @AsH1605, thanks for the contribution. I was a little busy; I am reviewing your code now.
Can you please test your implementation on the Keras documentation?
Show the results in the notebook that you added in this PR.

debrupf2946 · 2024-11-25T01:34:28Z

@AsH1605 please sign all your commits before creating a PR (you can watch a YouTube tutorial). Also, you should have made your commits on a separate branch, not directly on main.

AsH1605 · 2024-11-26T10:41:43Z

@debrupf2946 I have made the changes. Please review them in the next PR.

AsH1605 added 4 commits November 6, 2024 22:46

Issue Resolved- Modules created

32300e2

Notebook Execution done

a91c585

Notebook completed

1652c77

Output

1bfa341

Changed iteration to all documents

07a4aa7

AsH1605 closed this Nov 26, 2024
