Replies: 3 comments 6 replies
-
Try the HtmlToMarkdown Text Splitter. With the recursive text splitter you will still end up embedding a lot of HTML gibberish. Hopefully this helps: https://docs.flowiseai.com/use-cases/web-scrape-qna
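
To illustrate why cleaning the HTML before splitting matters, here is a minimal sketch using only the Python standard library. It is not how Flowise implements its splitter; the `VisibleTextExtractor` class, the `html_to_text` helper, and the sample markup are all made up for illustration. The idea is simply that tags and script contents are dropped so that only readable text reaches the splitter and the embeddings.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text of an HTML fragment, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical page fragment, invented for this example.
sample_html = """
<div class="promo"><script>track('x')</script>
  <h2>Summer offer</h2>
  <p>Buy one iced drink, get one free.</p>
</div>
"""

clean_text = html_to_text(sample_html)
print(clean_text)
```

The cleaned text contains only the heading and the offer sentence, so a chunk built from it carries the offer and its detail together instead of markup.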
-
Improving the accuracy of scraped data comes down to refining your scraping technique and understanding the structure of the website you are targeting. Here are a few tips:

1. Inspect the website structure: Before you scrape, inspect the HTML structure of the website to understand how the data is organized. This helps you identify exactly which elements to target.
2. Use targeted selectors: Use precise CSS selectors or XPath expressions to target the specific elements that contain the desired data. Avoid broad selectors that may capture irrelevant information.
3. Handle dynamic content: Some websites load content dynamically with JavaScript. Make sure your scraper can handle this, either by using tools like Puppeteer or Selenium or by analyzing the network traffic and replicating the AJAX requests.
4. Implement error handling: Build error handling into your scraper so it deals gracefully with unexpected situations, such as missing data or changes in the website layout.
5. Regularly update the scraping logic: Websites change frequently, and those changes can affect scraping accuracy. Review and update your scraping logic regularly.
6. Test and iterate: Test your scraper against different scenarios and adjust the scraping logic based on the results.

Applying these strategies and experimenting with different approaches should improve the accuracy of your extracted data.
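
The targeted-selector and error-handling tips can be sketched roughly as follows. This is a standard-library Python illustration only, not Cheerio or Flowise code; the `offer-title` class name, the `ClassTextExtractor` and `extract_offers` names, and the sample page are assumptions made up for the example. It targets one class instead of grabbing the whole page, and it fails loudly when the expected elements are missing, which usually means the layout has changed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collects the text of every element carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.matches = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1  # nested tag inside a match
        elif self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self.matches.append(" ".join(self._buffer))

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self._buffer.append(data.strip())


def extract_offers(html, target_class="offer-title"):
    """Return matching texts, or raise if the expected elements are gone."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    if not parser.matches:
        raise LookupError(
            f"no elements with class {target_class!r}; layout may have changed"
        )
    return parser.matches


# Hypothetical page, invented for this example.
page = """
<nav><a class="link">Menu</a></nav>
<section>
  <h3 class="offer-title">Summer <em>iced</em> drinks</h3>
  <p class="offer-detail">Half price after 2 pm.</p>
</section>
"""

offers = extract_offers(page)
print(offers)

try:
    extract_offers("<p>redesigned page</p>")
except LookupError as err:
    print("scrape failed:", err)
```

Raising on an empty result, rather than returning an empty list, makes a silent layout change show up as an error instead of as missing or wrong answers later on.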
-
Apify is the solution for pro scraping. There is even a built-in integration to upsert the data to Pinecone. You can also schedule your runs so that, for example, you scrape any dynamic website once or twice per week. If I have time, I would like to make a short tutorial about it. It is a perfect companion for Flowise.
-
Hi
I've just tried using Flowise to interrogate a website (using the Cheerio Web Scraper). I am asking it to find summer offers on Starbucks.com as an example.
The scraper works and pulls in data from Starbucks, but it doesn't get the context of the offer right, i.e. the offer detail is wrong.
I am assuming there might be a way to improve the accuracy by changing up how the recursive character text splitter works? But I have no idea how.
Any help is appreciated!