From 28f970e1d004a22b62f27ddb4b2a62d844f8a913 Mon Sep 17 00:00:00 2001 From: Dennis Trautwein Date: Fri, 18 Oct 2024 16:55:34 +0200 Subject: [PATCH] Update README --- README.md | 62 +++++++++++++++++++++++++++++++++-------- cmd/nebula/cmd_crawl.go | 2 +- 2 files changed, 52 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 8126aee..219eac6 100644 --- a/README.md +++ b/README.md @@ -12,28 +12,34 @@ A network agnostic DHT crawler and monitor. The crawler connects to [DHT](https: - [IPFS](https://ipfs.network) - [_Amino DHT_](https://blog.ipfs.tech/2023-09-amino-refactoring/) - [Ethereum](https://ethereum.org/en/) - [_Consensus Layer_](https://ethereum.org/uz/developers/docs/networking-layer/#consensus-discovery) -- [Ethereum](https://ethereum.org/en/) - [_Testnet Holesky_](https://github.com/eth-clients/holesky) (alpha) +- [Ethereum](https://ethereum.org/en/) - [_Execution Layer_](https://ethereum.org/uz/developers/docs/networking-layer/#discovery) - [Filecoin](https://filecoin.io) - [Polkadot](https://polkadot.network/) - [Kusama](https://kusama.network/) - [Rococo](https://substrate.io/developers/rococo-network/) - [Westend](https://wiki.polkadot.network/docs/maintain-networks#westend-test-network) +- [Avail](https://www.availproject.org/) - [Celestia](https://celestia.org/) - [_Mainnet_](https://blog.celestia.org/celestia-mainnet-is-live/) - [Celestia](https://celestia.org/) - [_Arabica_](https://github.com/celestiaorg/celestia-node/blob/9c0a5fb0626ada6e6cdb8bcd816d01a3aa5043ad/nodebuilder/p2p/bootstrap.go#L40) - [Celestia](https://celestia.org/) - [_Mocha_](https://docs.celestia.org/nodes/mocha-testnet) - [Pactus](https://pactus.org) -_The crawler was:_ +The crawler was: - 🏆 _awarded a prize in the [DI2F Workshop hackathon](https://research.protocol.ai/blog/2021/decentralising-the-internet-with-ipfs-and-filecoin-di2f-a-report-from-the-trenches/)._ 🏆 - 🎓 _used for the ACM SigCOMM'22 paper [Design and Evaluation of IPFS: A Storage Layer for 
the Decentralized Web](https://research.protocol.ai/publications/design-and-evaluation-of-ipfs-a-storage-layer-for-the-decentralized-web/trautwein2022.pdf)_ 🎓

-📊 [ProbeLab](https://probelab.io) is publishing weekly reports for the IPFS Amino DHT based on the crawl results [here](https://github.com/protocol/network-measurements/tree/master/reports)! 📊
+Nebula powers:
+- 📊 _the weekly reports for the IPFS Amino DHT [here](https://github.com/protocol/network-measurements/tree/master/reports)!_ 📊
+- 🌐 _many graphs on [probelab.io](https://probelab.io) for most of the supported networks above_ 🌐

-📺 You can find a demo on YouTube: [Nebula: A Network Agnostic DHT Crawler](https://www.youtube.com/watch?v=QDgvCBDqNMc) 📺
+
+You can find a demo on YouTube: [Nebula: A Network Agnostic DHT Crawler](https://www.youtube.com/watch?v=QDgvCBDqNMc) 📺

![Screenshot from a Grafana dashboard](./docs/grafana-screenshot.png)

+_The Grafana dashboard is not part of this repository._
+

## Table of Contents

- [Table of Contents](#table-of-contents)
@@ -156,6 +162,8 @@ nebula --db-user nebula_test --db-name nebula_test monitor

When Nebula is configured to store its results in a postgres database, it also tracks session information of remote peers. A session is one continuous streak of uptime (see below).

+However, this is not implemented for all supported networks. The [ProbeLab](https://probelab.network) team uses the monitoring feature for the IPFS, Celestia, Filecoin, and Avail networks. Most notably, the Ethereum discv4/discv5 monitoring implementation still needs some work.
+

---

There are a few more command line flags that are documented when you run `nebula --help` and `nebula crawl --help`:

@@ -170,8 +178,6 @@
random `PeerIDs` with common prefix lengths (CPL) that fall into each of the peer's buckets and asks for all peers that are closer (XOR distance) to the ones `nebula` just constructed. This will effectively yield a list of all `PeerIDs` that a peer has in its routing table.
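The bucket-targeting logic above rests on two ideas: the common prefix length (CPL) between two keys, and XOR distance. A minimal, self-contained Go sketch of the CPL computation (illustrative only — `commonPrefixLen` is not part of Nebula's API):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// commonPrefixLen returns the number of leading bits in which two
// 256-bit Kademlia keys agree — the CPL used to target individual
// routing-table buckets.
func commonPrefixLen(a, b [32]byte) int {
	for i := 0; i < 32; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return 256
}

func main() {
	// Kademlia-style DHTs compare peers by the XOR distance between
	// the hashes of their IDs, not the raw IDs themselves.
	self := sha256.Sum256([]byte("peer-A"))
	other := sha256.Sum256([]byte("peer-B"))
	fmt.Printf("these two keys share a %d-bit prefix\n", commonPrefixLen(self, other))
}
```

A key that shares a CPL of `n` bits with a peer's own key falls into that peer's `n`-th bucket, which is why generating random keys per CPL enumerates the whole routing table.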
The process repeats for all found peers until `nebula` does not find any new `PeerIDs`.

-This process is heavily inspired by the `basic-crawler` in [libp2p/go-libp2p-kad-dht](https://github.com/libp2p/go-libp2p-kad-dht/tree/master/crawler) from [@aschmahmann](https://github.com/aschmahmann).
-

If Nebula is configured to store its results in a database, every peer that was visited is written to it. The visit information includes latency measurements (dial/connect/crawl durations), the current set of multi addresses, the current agent version, and the current set of supported protocols. If the peer was dialable, `nebula` will also create a `session` instance that contains the following information:

@@ -223,7 +229,8 @@ CREATE TABLE sessions (

At the end of each crawl `nebula` persists general statistics about the crawl, such as the total duration, dialable peers, encountered errors, agent versions, etc.

-> **Info:** You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations.
+> [!TIP]
+> You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations.

Command line help page:

@@ -296,10 +303,10 @@ OPTIONS:

## Development

-To develop this project, you need Go `1.19` and the following tools:
+To develop this project, you need Go `1.23` and the following tools:

- [`golang-migrate/migrate`](https://github.com/golang-migrate/migrate) to manage the SQL migrations `v4.15.2`
-- [`volatiletech/sqlboiler`](https://github.com/volatiletech/sqlboiler) to generate Go ORM `v4.14.2`
+- [`volatiletech/sqlboiler`](https://github.com/volatiletech/sqlboiler) to generate Go ORM `v4.14.1`
- `docker` to run a local postgres instance

To install the necessary tools you can run `make tools`. This will use the `go install` command to download and install the tools into your `$GOPATH/bin` directory, so make sure you have it in your `$PATH` environment variable.
@@ -312,7 +319,8 @@

You need a running postgres instance to persist and/or read the crawl results. Run:

docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=password -e POSTGRES_USER=nebula_test -e POSTGRES_DB=nebula_test --name nebula_test_db postgres:14
```

-> **Info:** You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations or store the results as JSON files with the `--json-out` flag.
+> [!TIP]
+> You can use the `crawl` sub-command with the global `--dry-run` option that skips any database operations, or store the results as JSON files with the `--json-out` flag.

The default database settings for local development are:

@@ -350,7 +358,7 @@ migrate create -ext sql -dir pkg/db/migrations -seq some_migration_name

To run the tests you need a running test database instance:

```shell
-make database
+make database # or make databased (note the d suffix for "daemon") to start the DB in the background
make test
```

@@ -376,6 +384,38 @@ The following presentation shows a few ways to use Nebula by showcasing crawls of the

[![Nebula: A Network Agnostic DHT Crawler - Dennis Trautwein](https://img.youtube.com/vi/QDgvCBDqNMc/0.jpg)](https://www.youtube.com/watch?v=QDgvCBDqNMc)

+## Networks
+
+> [!NOTE]
+> This section is work-in-progress and doesn't include information about all networks yet.
+
+The following sections document our experience with crawling the different networks.
+
+### Ethereum Execution (discv4)
+
+Under the hood, Nebula uses packages from [`go-ethereum`](https://github.com/ethereum/go-ethereum) to facilitate peer
+communication. Mostly, Nebula relies on the [discover package](https://github.com/ethereum/go-ethereum/tree/master/p2p/discover).
+However, we made quite a few changes to the implementation that can be found in
+our fork of `go-ethereum` [here](https://github.com/probe-lab/go-ethereum/tree/nebula) in the `nebula` branch.
+ +Most notably, the custom changes include: + +- export of internal constants, functions, methods and types to customize their behaviour or call them directly +- changes to the response matcher logic. UDP packets won't be forwarded to all matchers. This was required so that + concurrent requests to the same peer don't lead to unhandled packets + +Deployment recommendations: + +- CPUs: 4 (better 8) +- Memory > 4 GB +- UDP Read Buffer size >1 MiB (better 4 MiB) via the `--udp-buffer-size=4194304` command line flag or corresponding environment variable `NEBULA_UDP_BUFFER_SIZE`. + You might need to adjust the maximum buffer size on Linux, so that the flag takes effect: + ```shell + sysctl -w net.core.rmem_max=8388608 # 8MiB + ``` +- UDP Response timeout of `3s` (default) +- Workers: 3000 + ## Maintainers [@dennis-tra](https://github.com/dennis-tra). diff --git a/cmd/nebula/cmd_crawl.go b/cmd/nebula/cmd_crawl.go index f0f4301..4d42e88 100644 --- a/cmd/nebula/cmd_crawl.go +++ b/cmd/nebula/cmd_crawl.go @@ -42,7 +42,7 @@ var crawlConfig = &config.Crawl{ AddrDialTypeStr: "public", KeepENR: false, CheckExposed: false, - Discv4RespTimeout: 2 * time.Second, + Discv4RespTimeout: 3 * time.Second, } // CrawlCommand contains the crawl sub-command configuration.
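As a sanity check for the deployment recommendations above, the buffer figures convert to the byte values used on the command line as follows (plain POSIX shell arithmetic; the variable names are illustrative):

```shell
# The recommended buffer sizes expressed in bytes:
udp_buffer_size=$((4 * 1024 * 1024))   # 4 MiB → --udp-buffer-size=4194304
rmem_max=$((8 * 1024 * 1024))          # 8 MiB → sysctl -w net.core.rmem_max=8388608
echo "$udp_buffer_size $rmem_max"      # 4194304 8388608
```

Note that `--udp-buffer-size` only takes effect up to the kernel's `net.core.rmem_max` ceiling, which is why the sysctl is raised to twice the requested buffer.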