Verify Crawler navigates provided sources #47

Open
aep7128 opened this issue May 24, 2023 · 0 comments
Labels
bug Something isn't working

aep7128 commented May 24, 2023

A good number of the sources in our seeds aren't being navigated properly. A source can fail for any of the following reasons:

1.) The domain isn't in the whitelist
2.) The crawler isn't navigating far enough (insufficient depth)
3.) An exception is thrown when trying to parse the source (an HTTP error or some other exception)

We should verify that the seed URLs are 'crawl-able'.
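
A rough sketch of what such a check could look like, in Python. Everything here is an assumption for illustration and not the project's actual crawler code: the names `check_seed`, `verify_seeds`, and `SeedResult`, the exact-hostname whitelist match, and the `required_depth` hint (how deep we believe the useful pages sit for a given seed) are all hypothetical. The fetch function is injectable so the check can be run against a stub without hitting the network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set
from urllib.parse import urlparse
from urllib.request import urlopen


@dataclass
class SeedResult:
    url: str
    ok: bool
    reason: Optional[str] = None  # None when the seed looks crawl-able


def default_fetch(url: str) -> bytes:
    # Plain stdlib fetch; raises HTTPError/URLError on failure.
    with urlopen(url, timeout=10) as response:
        return response.read()


def check_seed(
    url: str,
    whitelist: Set[str],
    max_depth: int,
    required_depth: int = 0,  # hypothetical per-seed hint, not from the issue
    fetch: Callable[[str], bytes] = default_fetch,
) -> SeedResult:
    # Reason 1: domain isn't in the whitelist (assumes exact hostname match).
    domain = urlparse(url).hostname or ""
    if domain not in whitelist:
        return SeedResult(url, False, "domain not in whitelist")

    # Reason 2: the crawler's depth limit is lower than this seed needs.
    if required_depth > max_depth:
        return SeedResult(url, False, "not enough depth")

    # Reason 3: any exception while fetching (HTTP error or otherwise).
    try:
        fetch(url)
    except Exception as exc:
        return SeedResult(url, False, f"exception: {type(exc).__name__}: {exc}")

    return SeedResult(url, True)


def verify_seeds(
    urls: Iterable[str],
    whitelist: Set[str],
    max_depth: int,
    fetch: Callable[[str], bytes] = default_fetch,
) -> List[SeedResult]:
    # Run the check over every seed and collect one result per URL.
    return [check_seed(u, whitelist, max_depth, fetch=fetch) for u in urls]
```

Filtering the output of `verify_seeds` on `ok == False` would give a list of seeds to fix, grouped by `reason`. Real parsing errors would only show up if the project's own parse step were plugged in as `fetch`.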

aep7128 added the bug label May 24, 2023
atsawtelle self-assigned this Jun 2, 2023

4 participants