Add network test in CI and run demo command (#1552)

This PR enhances the [demo tutorial](https://hydra.family/head-protocol/docs/getting-started/) by enabling `hydra-cluster` benchmarks to run on an active Hydra cluster.

**usage**

See the newly introduced `network-test.yaml` for the related invocations of pumba and the hydra clients. Supposing they are running, you simply run:

```sh
nix run .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e -- \
  demo \
  --output-directory=$(pwd)/benchmarks \
  --scaling-factor=100 \
  --timeout=1000s \
  --testnet-magic 42 \
  --node-socket=${NETWORK_DIR}/node.socket \
  --hydra-client=localhost:4001 \
  --hydra-client=localhost:4002 \
  --hydra-client=localhost:4003
```

and you will get some statistics on txns confirmed, time taken, etc.

**prerequisites**

- A Cardano node must be running on the specified `node-socket`.
- Hydra nodes must be operational on the provided `hydra-client` hosts.
- There’s no need to pre-seed the keys, as the bench-demo script will automatically fund them using the faucet.
- Note that the reference scripts should already be published, and the Hydra nodes must be running with those scripts.

**Todo**

- [x] Fix the `FIXME` about `> 33`
- [x] Remove duplicate seeding
- [x] Make sure the entire CI process doesn't fail when pumba causes the network to fail
- [x] Make it so that if it _fails_ the head is closed.
- [x] Quick little matrix to run a few different scenarios
- [x] Make the bench-e2e fail if it didn't submit all the txns (ideally we would also be able to see this visually in the job list, but GitHub is missing a feature; see also actions/runner#2347)
- [x] Get docker info via `docker inspect` instead of parsing yaml (!)
- [x] Make sure `results.csv` is written to the `outputDirectory` not the tmp directory
- [x] Upload the results as part of the artifacts
- [x] Write the summary out even when it failed

---

* [x] CHANGELOG updated or not needed
* [x] Documentation updated or not needed
* [x] Haddocks updated or not needed
* [x] No new TODOs introduced or explained hereafter

---------

Co-authored-by: Noon van der Silk <[email protected]>
Co-authored-by: Sebastian Nagel <[email protected]>
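
The prerequisites listed in the message can be checked before starting a run. The following is a minimal sketch and not part of this PR: the function names (`client_host`, `client_port`, `socket_ready`, `client_reachable`) are hypothetical, and the reachability probe assumes that `bash` and coreutils `timeout` are installed.

```sh
# Hypothetical pre-flight helpers; none of these names exist in the repository.

# Print the host part of a "host:port" pair such as localhost:4001.
client_host() {
  printf '%s\n' "${1%:*}"
}

# Print the port part of a "host:port" pair.
client_port() {
  printf '%s\n' "${1##*:}"
}

# Succeed only if the given path exists and is a Unix socket.
socket_ready() {
  [ -S "$1" ]
}

# Succeed if a TCP connection to "host:port" opens within two seconds.
# Assumption: bash (for /dev/tcp) and coreutils timeout are available.
client_reachable() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$(client_host "$1")/$(client_port "$1")" 2>/dev/null
}
```

For example, `socket_ready "${NETWORK_DIR}/node.socket" && client_reachable localhost:4001` would succeed only when both the node socket and the first Hydra client are available.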
cardano-scaling · Aug 29, 2024 · b03fa32
1 parent a51e04c
commit b03fa32
Showing 21 changed files with 771 additions and 269 deletions.
diff --git a/.github/workflows/network-test.yaml b/.github/workflows/network-test.yaml
@@ -0,0 +1,132 @@
+name: "Network fault tolerance"
+
+on:
+  pull_request:
+  workflow_dispatch:
+    inputs:
+      debug_enabled:
+        type: boolean
+        description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
+        required: false
+        default: false
+
+jobs:
+  network-test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        # Note: At present we can only run for 3 peers; to configure this for
+        # more we need to make the docker-compose spin-up dynamic across
+        # however many we would like to configure here.
+        # Currently this is just a label and does not have any functional impact.
+        peers:          [3]
+        scaling_factor: [10, 50]
+        netem_loss:     [0, 1, 2, 3, 4, 5, 10, 20]
+    name: "Peers: ${{ matrix.peers }}, scaling: ${{ matrix.scaling_factor }}, loss: ${{ matrix.netem_loss }}"
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+
+    - name: ❄ Prepare nix
+      uses: cachix/install-nix-action@V27
+      with:
+        extra_nix_config: |
+          accept-flake-config = true
+          log-lines = 1000
+
+    - name: ❄ Cachix cache of nix derivations
+      uses: cachix/cachix-action@v15
+      with:
+        name: cardano-scaling
+        authToken: '${{ secrets.CACHIX_CARDANO_SCALING_AUTH_TOKEN }}'
+
+    - name: Build docker images for netem specifically
+      run: |
+        nix build .#docker-hydra-node-for-netem
+        ./result | docker load
+
+    - name: Setup containers for network testing
+      run: |
+        cd demo
+        ./prepare-devnet.sh
+        docker compose up -d cardano-node
+        sleep 5
+        # :tear: socket permissions.
+        sudo chown runner:docker devnet/node.socket
+        ./export-tx-id-and-pparams.sh
+        # Specify two docker compose yamls; the second one overrides the
+        # images to use the netem ones specifically
+        docker compose -f docker-compose.yaml -f docker-compose-netem.yaml up -d hydra-node-{1,2,3}
+        sleep 3
+        docker ps
+
+    - name: Build required nix and docker derivations
+      run: |
+        nix build .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e
+        nix build github:noonio/pumba/noon/add-flake
+
+    # Use tmate to get a shell onto the runner to do some temporary hacking
+    #
+    # <https://github.com/mxschmitt/action-tmate>
+    #
+    - name: Setup tmate session
+      uses: mxschmitt/action-tmate@v3
+      if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled }}
+      with:
+        limit-access-to-actor: true
+
+    - name: Run pumba and the benchmarks
+      # Note: We're going to allow everything to fail. In the job on GitHub,
+      # we will be able to see which ones _did_, in fact, fail. Originally,
+      # we were keeping track of our expectations with 'include' and
+      # 'exclude' directives here, but I think it's best to leave those out,
+      # as some of the tests (say 5%) fail, and overall the conditions of
+      # failure depend on the scaling factor, the peers, etc, and it becomes
+      # too complicated to track here.
+      continue-on-error: true
+      run: |
+        # Take the loss percentage and scaling factor from the job matrix
+        percent="${{ matrix.netem_loss }}"
+        scaling_factor="${{ matrix.scaling_factor }}"
+        target_peer="hydra-node-1"
+        other_peers="172.16.238.20 172.16.238.30"
+
+        .github/workflows/network/run_pumba.sh $target_peer $percent $other_peers
+
+        # Run benchmark on demo
+        mkdir benchmarks
+        touch benchmarks/test.log
+
+        nix run .#legacyPackages.x86_64-linux.hydra-cluster.components.benchmarks.bench-e2e -- \
+          demo \
+          --output-directory=benchmarks \
+          --scaling-factor="$scaling_factor" \
+          --timeout=1000s \
+          --testnet-magic 42 \
+          --node-socket=demo/devnet/node.socket \
+          --hydra-client=localhost:4001 \
+          --hydra-client=localhost:4002 \
+          --hydra-client=localhost:4003
+
+    - name: Acquire logs
+      if: always()
+      run: |
+        cd demo
+        docker compose logs > docker-logs
+
+    - name: 💾 Upload logs
+      if: always()
+      uses: actions/upload-artifact@v4
+      with:
+        name: "docker-logs-netem-loss=${{ matrix.netem_loss }},scaling_factor=${{ matrix.scaling_factor }},peers=${{ matrix.peers }}"
+        path: demo/docker-logs
+        if-no-files-found: ignore
+
+    - name: 💾 Upload build & test artifacts
+      if: always()
+      uses: actions/upload-artifact@v4
+      with:
+        name: "benchmarks-netem-loss=${{ matrix.netem_loss }},scaling_factor=${{ matrix.scaling_factor }},peers=${{ matrix.peers }}"
+        path: benchmarks
+        if-no-files-found: ignore
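
The matrix in this workflow expands to 1 × 2 × 8 = 16 jobs. As a rough sketch for reproducing the same sweep outside of CI, the loop below enumerates the combinations and prints the label each job carries. The function name `matrix_labels` is an assumption made for illustration, and the body only prints labels; it does not start pumba or the benchmark.

```sh
# Hypothetical helper: enumerate the same combinations as the workflow matrix.
matrix_labels() {
  # The workflow currently supports 3 peers only.
  peers=3
  for scaling_factor in 10 50; do
    for netem_loss in 0 1 2 3 4 5 10 20; do
      echo "Peers: ${peers}, scaling: ${scaling_factor}, loss: ${netem_loss}"
    done
  done
}
```

Replacing the `echo` with the pumba and `bench-e2e` invocations from the workflow step would give a local version of the sweep, assuming the demo containers are already running.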
diff --git a/.github/workflows/network/run_pumba.sh b/.github/workflows/network/run_pumba.sh
@@ -0,0 +1,23 @@
+#!/usr/bin/env bash
+
+target_node_name=$1
+
+percent=$2
+
+rest_node_names="${*:3}"  # every argument from the third onward is a peer address
+
+# Build Pumba netem command
+# Note: We leave it for 20 minutes; but really it's effectively unlimited. We don't
+# expect any of our tests to run longer than that.
+nix_command="nix run github:noonio/pumba/noon/add-flake -- -l debug netem --duration 20m"
+
+for network in $rest_node_names; do
+    nix_command+=" --target $network"
+done
+
+nix_command+=" loss --percent \"$percent\" \"re2:$target_node_name\" &"
+
+echo "$nix_command"
+
+# Run Pumba netem command
+eval "$nix_command"
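
The command assembly in this script can be illustrated without `eval`. The sketch below is an assumption-laden illustration, not the script's actual implementation: `pumba_args` is a hypothetical name, it uses only the pumba flags that already appear in the script, and it expects each peer address as its own argument.

```sh
# Hypothetical helper: print the pumba arguments one per line.
# Usage: pumba_args <target-node-name> <loss-percent> <peer-ip>...
pumba_args() {
  target_node_name=$1
  percent=$2
  shift 2
  printf '%s\n' -l debug netem --duration 20m
  # Every remaining argument becomes its own --target flag.
  for peer in "$@"; do
    printf '%s\n' --target "$peer"
  done
  printf '%s\n' loss --percent "$percent" "re2:$target_node_name"
}
```

Printing one argument per line makes it easy to confirm that every peer receives its own `--target` flag before the command is actually run.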
diff --git a/demo/.gitignore b/demo/.gitignore
@@ -0,0 +1,2 @@
+/benchmarks
+/datasets
diff --git a/demo/docker-compose-netem.yaml b/demo/docker-compose-netem.yaml
@@ -0,0 +1,9 @@
+services:
+  hydra-node-1:
+    image: hydra-node-for-netem
+
+  hydra-node-2:
+    image: hydra-node-for-netem
+
+  hydra-node-3:
+    image: hydra-node-for-netem
diff --git a/demo/docker-compose.yaml b/demo/docker-compose.yaml
@@ -48,6 +48,8 @@ services:
       , "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
       , "--testnet-magic", "42"
       , "--node-socket", "/devnet/node.socket"
+      , "--persistence-dir", "/devnet/persistence/alice"
+      , "--contestation-period", "3"
       ]
     networks:
       hydra_net:
@@ -83,6 +85,8 @@ services:
       , "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
       , "--testnet-magic", "42"
       , "--node-socket", "/devnet/node.socket"
+      , "--persistence-dir", "/devnet/persistence/bob"
+      , "--contestation-period", "3"
       ]
     networks:
       hydra_net:
@@ -118,6 +122,8 @@ services:
       , "--ledger-protocol-parameters", "/devnet/protocol-parameters.json"
       , "--testnet-magic", "42"
       , "--node-socket", "/devnet/node.socket"
+      , "--persistence-dir", "/devnet/persistence/carol"
+      , "--contestation-period", "3"
       ]
     networks:
       hydra_net:
@@ -188,7 +194,6 @@ services:
       hydra_net:
         ipv4_address: 172.16.238.5
 
-
 networks:
   hydra_net:
     driver: bridge

diff --git a/demo/export-tx-id-and-pparams.sh b/demo/export-tx-id-and-pparams.sh
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+
+set -eo pipefail
+
+SCRIPT_DIR=${SCRIPT_DIR:-$(realpath $(dirname $(realpath $0)))}
+NETWORK_ID=42
+
+CCLI_CMD=
+DEVNET_DIR=/devnet
+if [[ -n ${1} ]]; then
+    echo >&2 "Using provided cardano-cli command: ${1}"
+    $(${1} version > /dev/null)
+    CCLI_CMD=${1}
+    DEVNET_DIR=${SCRIPT_DIR}/devnet
+fi
+
+HYDRA_NODE_CMD=
+if [[ -n ${2} ]]; then
+    echo >&2 "Using provided hydra-node command: ${2}"
+    ${2} --version > /dev/null
+    HYDRA_NODE_CMD=${2}
+fi
+
+# Invoke hydra-node in a container or via provided executable
+function hnode() {
+  if [[ -n ${HYDRA_NODE_CMD} ]]; then
+      ${HYDRA_NODE_CMD} ${@}
+  else
+      docker run --rm \
+        --pull always \
+        -v ${SCRIPT_DIR}/devnet:/devnet \
+        ghcr.io/cardano-scaling/hydra-node:0.18.1 -- ${@}
+  fi
+}
+
+function publishReferenceScripts() {
+  echo >&2 "Publishing reference scripts..."
+  hnode publish-scripts \
+    --testnet-magic ${NETWORK_ID} \
+    --node-socket ${DEVNET_DIR}/node.socket \
+    --cardano-signing-key devnet/credentials/faucet.sk
+}
+
+# Invoke cardano-cli in running cardano-node container or via provided cardano-cli
+function ccli() {
+  ccli_ ${@} --testnet-magic ${NETWORK_ID}
+}
+function ccli_() {
+  if [[ -x ${CCLI_CMD} ]]; then
+      ${CCLI_CMD} ${@}
+  else
+      ${DOCKER_COMPOSE_CMD} exec cardano-node cardano-cli ${@}
+  fi
+}
+
+function queryPParams() {
+  echo >&2 "Query Protocol parameters"
+  if [[ -x ${CCLI_CMD} ]]; then
+     ccli query protocol-parameters --socket-path ${DEVNET_DIR}/node.socket  --out-file /dev/stdout \
+      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
+   else
+     docker exec demo-cardano-node-1 cardano-cli query protocol-parameters --testnet-magic ${NETWORK_ID} --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
+      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
+  fi
+  echo >&2 "Saved in protocol-parameters.json"
+}
+
+queryPParams
+echo "HYDRA_SCRIPTS_TX_ID=$(publishReferenceScripts)" > .env
+echo >&2 "Environment variable stored in '.env'"
+echo >&2 -e "\n\t$(cat .env)\n"
diff --git a/demo/seed-devnet.sh b/demo/seed-devnet.sh
@@ -43,18 +43,6 @@ function ccli_() {
   fi
 }
 
-# Invoke hydra-node in a container or via provided executable
-function hnode() {
-  if [[ -n ${HYDRA_NODE_CMD} ]]; then
-      ${HYDRA_NODE_CMD} ${@}
-  else
-      docker run --rm -it \
-        --pull always \
-        -v ${SCRIPT_DIR}/devnet:/devnet \
-        ghcr.io/cardano-scaling/hydra-node:0.18.1 -- ${@}
-  fi
-}
-
 # Retrieve some lovelace from faucet
 function seedFaucet() {
     ACTOR=${1}
@@ -89,26 +77,6 @@ function seedFaucet() {
     echo >&2 "Done"
 }
 
-function publishReferenceScripts() {
-  echo >&2 "Publishing reference scripts..."
-  hnode publish-scripts \
-    --testnet-magic ${NETWORK_ID} \
-    --node-socket ${DEVNET_DIR}/node.socket \
-    --cardano-signing-key devnet/credentials/faucet.sk
-}
-
-function queryPParams() {
-  echo >&2 "Query Protocol parameters"
-  if [[ -x ${CCLI_CMD} ]]; then
-     ccli query protocol-parameters --socket-path ${DEVNET_DIR}/node.socket  --out-file /dev/stdout \
-      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
-   else
-     docker exec demo-cardano-node-1 cardano-cli query protocol-parameters --testnet-magic ${NETWORK_ID} --socket-path ${DEVNET_DIR}/node.socket --out-file /dev/stdout \
-      | jq ".txFeeFixed = 0 | .txFeePerByte = 0 | .executionUnitPrices.priceMemory = 0 | .executionUnitPrices.priceSteps = 0" > devnet/protocol-parameters.json
-  fi
-  echo >&2 "Saved in protocol-parameters.json"
-}
-
 echo >&2 "Fueling up hydra nodes of alice, bob and carol..."
 seedFaucet "alice" 30000000 # 30 Ada to the node
 seedFaucet "bob" 30000000 # 30 Ada to the node
@@ -117,7 +85,5 @@ echo >&2 "Distributing funds to alice, bob and carol..."
 seedFaucet "alice-funds" 100000000 # 100 Ada to commit
 seedFaucet "bob-funds" 50000000 # 50 Ada to commit
 seedFaucet "carol-funds" 25000000 # 25 Ada to commit
-queryPParams
-echo "HYDRA_SCRIPTS_TX_ID=$(publishReferenceScripts)" > .env
-echo >&2 "Environment variable stored in '.env'"
-echo >&2 -e "\n\t$(cat .env)\n"
+
+./export-tx-id-and-pparams.sh
diff --git a/hydra-cluster/README.md b/hydra-cluster/README.md
@@ -140,3 +140,14 @@ The benchmark can be run in two modes corresponding to two different commands:
 * `datasets`: Runs one or more preexisting _datasets_ in sequence and collect their results in a single markdown formatted file. This is useful to track the evolution of hydra-node's performance over some well-known datasets over time and produce a human-readable summary.
 
 Check out `cabal bench --benchmark-options --help` for more details.
+
+# Network Testing
+
+The benchmark can also be run against the running `demo` hydra-cluster, using `cabal bench`; it produces a
+`results.csv` file in a work directory. As with the other benchmark results, you can use the `bench/plot.sh` script to plot the transaction confirmation times.
+
+To run the benchmark in this mode, the command is:
+* `demo`: Runs a single, freshly generated _dataset_ and collects its results in a markdown formatted file. The purpose of this setup is to facilitate a variety of network-resilience scenarios, such as packet loss or node failures. This is useful to prove the robustness and performance of the hydra-node's network over time and produce a human-readable summary.
+
+For instance, we make use of this in our [CI](https://github.com/cardano-scaling/hydra/blob/master/.github/workflows/network-test.yaml) to keep track of scenarios that we care about.
+