Filter out Cloudflare error pages + performance improvement #41
Happy New Year! 🎉
This PR adds a new `-filter-cf-errors` flag to httprobe, which causes it to read the first 512 bytes from listening servers in order to determine whether the response is a "wildcard" error page from Cloudflare, like the ones shown here:

If a Cloudflare signature string is found, the function returns false so that the host is treated as not listening. The functionality is implemented in a generic fashion so that it can easily be extended to filter out other common false-positive responses.
The PR also implements a small performance improvement: it performs a DNS lookup on incoming domains and ignores the unresolvable ones, to avoid filling up the job channels with dead domains. I timed the execution with a list containing 1483 resolvable and unresolvable domains using the `large` port list and was able to shave off 7 minutes of execution time, which is not a lot, but also not insignificant: