Introduce deduplication key #80

Merged
merged 6 commits into from
Oct 23, 2024

dennis-tra
Owner

This PR introduces the concept of deduplication keys.

A deduplication key is a string that identifies a crawl task for deduplication purposes: two tasks with the same key are treated as the same task. For example, in discv4 and discv5 we might want to crawl the same peer (as identified by its public key) multiple times when we find new ENRs for it. If the deduplication key were just the public key, we would crawl the peer only once: if we later found newer ENRs for the same peer with different network addresses, we would skip them. If, on the other hand, the deduplication key were the entire ENR, we would crawl the same peer again with the different (potentially newer) connectivity information.
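
To make the difference concrete, here is a minimal sketch of the idea in Go. It is not the code from this PR: `PeerInfo`, `KeyFunc`, `Scheduler`, and the placeholder key and ENR strings are all hypothetical names chosen for illustration, and the real crawl queue may be structured differently.

```go
package main

import "fmt"

// PeerInfo is a hypothetical, simplified crawl task.
type PeerInfo struct {
	PublicKey string // identifies the peer
	ENR       string // full record, including network addresses
}

// KeyFunc derives the deduplication key for a crawl task (assumed name).
type KeyFunc func(PeerInfo) string

// ByPublicKey deduplicates per peer: one crawl per public key.
func ByPublicKey(p PeerInfo) string { return p.PublicKey }

// ByENR deduplicates per record: a new ENR yields a new crawl.
func ByENR(p PeerInfo) string { return p.ENR }

// Scheduler remembers which deduplication keys it has already accepted.
type Scheduler struct {
	key  KeyFunc
	seen map[string]struct{}
}

func NewScheduler(key KeyFunc) *Scheduler {
	return &Scheduler{key: key, seen: make(map[string]struct{})}
}

// Enqueue reports whether the task is new and should be crawled.
func (s *Scheduler) Enqueue(p PeerInfo) bool {
	k := s.key(p)
	if _, ok := s.seen[k]; ok {
		return false // same key seen before: skip
	}
	s.seen[k] = struct{}{}
	return true
}

func main() {
	// Same peer, two records with different connectivity information.
	older := PeerInfo{PublicKey: "pubkey-a", ENR: "record-1"}
	newer := PeerInfo{PublicKey: "pubkey-a", ENR: "record-2"}

	byKey := NewScheduler(ByPublicKey)
	fmt.Println(byKey.Enqueue(older), byKey.Enqueue(newer)) // true false

	byENR := NewScheduler(ByENR)
	fmt.Println(byENR.Enqueue(older), byENR.Enqueue(newer)) // true true
}
```

With the public key as the deduplication key, the second record for the same peer is skipped; with the entire ENR as the key, it is scheduled for another crawl.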

@dennis-tra dennis-tra force-pushed the deduplication-key branch 8 times, most recently from d7d0d1a to ef3206d Compare October 23, 2024 14:00
@dennis-tra dennis-tra merged commit 15a220b into main Oct 23, 2024
1 check failed
@dennis-tra dennis-tra deleted the deduplication-key branch October 23, 2024 14:27