Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RelayMiner]: add proxy.Ping(...) capability to test connectivity between relay servers and backend URLs #744

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open
825 changes: 748 additions & 77 deletions Makefile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were any of the makefile changes intentional? Otherwise, let's revert. I split the makefile up in #816.

Large diffs are not rendered by default.

23 changes: 18 additions & 5 deletions Tiltfile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as above.

Original file line number Diff line number Diff line change
Expand Up @@ -224,16 +224,28 @@
actor_number = 0
for x in range(localnet_config["relayminers"]["count"]):
actor_number = actor_number + 1
helm_resource(
"relayminer" + str(actor_number),
chart_prefix + "relayminer",
flags=[

flags = [
"--values=./localnet/kubernetes/values-common.yaml",
"--values=./localnet/kubernetes/values-relayminer-common.yaml",
"--values=./localnet/kubernetes/values-relayminer-" + str(actor_number) + ".yaml",
"--set=metrics.serviceMonitor.enabled=" + str(localnet_config["observability"]["enabled"]),
"--set=development.delve.enabled=" + str(localnet_config["relayminers"]["delve"]["enabled"]),
],
]

if localnet_config["rest"]["enabled"]:
flags.append("--values=./localnet/kubernetes/values-relayminer-" + str(actor_number) + "-rest" + ".yaml")

if localnet_config["ollama"]["enabled"]:
flags.append("--values=./localnet/kubernetes/values-relayminer-" + str(actor_number) + "-ollama" + ".yaml")

if localnet_config["rest"]["enabled"] and localnet_config["ollama"]["enabled"]:
flags.append("--values=./localnet/kubernetes/values-relayminer-" + str(actor_number) + "-all" + ".yaml")

helm_resource(
"relayminer" + str(actor_number),
chart_prefix + "relayminer",
flags=flags,
image_deps=["poktrolld"],
image_keys=[("image.repository", "image.tag")],
)
Expand All @@ -257,6 +269,7 @@
# Use with pprof like this: `go tool pprof -http=:3333 http://localhost:6070/debug/pprof/goroutine`
str(6069 + actor_number)
+ ":6060", # Relayminer pprof port. relayminer1 - exposes 6070, relayminer2 exposes 6071, etc.
str(7000 + actor_number) + ":8081", # Relayminer ping port. relayminer1 - exposes 7001, relayminer2 exposes 7002, ect.

Check warning on line 272 in Tiltfile

View workflow job for this annotation

GitHub Actions / Check misspelling

[misspell] reported by reviewdog 🐶 "ect" is a misspelling of "etc" Raw Output: ./Tiltfile:272:126: "ect" is a misspelling of "etc"
],
)

Expand Down
16 changes: 16 additions & 0 deletions docusaurus/docs/operate/configs/relayminer_config.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ You can find a fully featured example configuration at [relayminer_config_full_e
- [`smt_store_path`](#smt_store_path)
- [`metrics`](#metrics)
- [`pprof`](#pprof)
- [`ping`](#ping)
- [Pocket node connectivity](#pocket-node-connectivity)
- [`query_node_rpc_url`](#query_node_rpc_url)
- [`query_node_grpc_url`](#query_node_grpc_url)
Expand Down Expand Up @@ -173,6 +174,21 @@ pprof:

You can learn how to use that endpoint on the [Performance Troubleshooting](../../develop/developer_guide/performance_troubleshooting.md) page.

### `ping`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. 😎


Configures a `ping` server to test the connectivity of every backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. Otherwise, if one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.
Comment on lines +179 to +182
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using 204 No Content for the success and 503 Service Unavailable for the error status codes? I think they could be more appropriate than 200 and 500.

Suggested change
Configures a `ping` server to test the connectivity of every backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. Otherwise, if one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.
Configures a `ping` server to test the connectivity of all backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. If one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.


Example configuration:

```yaml
ping:
enabled: true
addr: localhost:8081
```

## Pocket node connectivity

```yaml
Expand Down
20 changes: 20 additions & 0 deletions localnet/kubernetes/values-relayminer-1-all.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional diff?

Same goes for all files in the loalnet/kubernetes directory.

One simple way to revert these would be to do something like:

git fetch origin
git checkout origin/main -- localnet/kubernetes
git add localnet/kubernetes
git commit

Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
config:
suppliers:
- service_id: anvil
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer1
- service_id: rest
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://rest:10000/
publicly_exposed_endpoints:
- relayminer1
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer1
14 changes: 14 additions & 0 deletions localnet/kubernetes/values-relayminer-1-ollama.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
config:
suppliers:
- service_id: anvil
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer1
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer1
14 changes: 14 additions & 0 deletions localnet/kubernetes/values-relayminer-1-rest.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
config:
suppliers:
- service_id: anvil
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer1
- service_id: rest
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://rest:10000/
publicly_exposed_endpoints:
- relayminer1
12 changes: 0 additions & 12 deletions localnet/kubernetes/values-relayminer-1.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,15 +11,3 @@ config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer1
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer1
- service_id: rest
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://rest:10000/
publicly_exposed_endpoints:
- relayminer1
14 changes: 14 additions & 0 deletions localnet/kubernetes/values-relayminer-2-ollama.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
config:
suppliers:
- service_id: anvil
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer2
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer2
6 changes: 0 additions & 6 deletions localnet/kubernetes/values-relayminer-2.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,3 @@ config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer2
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer2
14 changes: 14 additions & 0 deletions localnet/kubernetes/values-relayminer-3-ollama.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
config:
suppliers:
- service_id: anvil
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer3
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer3
6 changes: 0 additions & 6 deletions localnet/kubernetes/values-relayminer-3.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,3 @@ config:
backend_url: http://anvil:8547/
publicly_exposed_endpoints:
- relayminer3
- service_id: ollama
listen_url: http://0.0.0.0:8545
service_config:
backend_url: http://ollama:11434/
publicly_exposed_endpoints:
- relayminer3
3 changes: 3 additions & 0 deletions localnet/kubernetes/values-relayminer-common.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,6 @@ config:
pprof:
enabled: true
addr: localhost:6060
ping:
enabled: true
addr: localhost:8081
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add something to the deployment and/or service, and port forwards in tilt to expose this locally?

cc @okdas

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this file used?

3 changes: 3 additions & 0 deletions localnet/poktrolld/config/relayminer_config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,3 +17,6 @@ suppliers:
pprof:
enabled: false
addr: localhost:6060
ping:
enabled: false
addr: localhost:8082
6 changes: 6 additions & 0 deletions localnet/poktrolld/config/relayminer_config_full_example.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,12 @@ pprof:
enabled: false
addr: localhost:6060

# Ping server configuration to test the connectivity of every
# suppliers.[].service_config.backend_url
ping:
enabled: false
addr: localhost:8081

pocket_node:
# Pocket node URL exposing the CometBFT JSON-RPC API.
# Used by the Cosmos client SDK, event subscriptions, etc.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,3 +41,6 @@ suppliers:
pprof:
enabled: false
addr: localhost:6070
ping:
enabled: false
addr: localhost:8081
12 changes: 12 additions & 0 deletions pkg/relayer/cmd/cmd.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import (
"context"
"errors"
"fmt"
"net"
"net/http"
"net/url"
"os"
Expand Down Expand Up @@ -134,6 +135,17 @@ func runRelayer(cmd *cobra.Command, _ []string) error {
}
}

if relayMinerConfig.Ping.Enabled {
eddyzags marked this conversation as resolved.
Show resolved Hide resolved
ln, err := net.Listen("tcp", relayMinerConfig.Ping.Addr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any kind of validation on the address? What's the UX like if it's invalid?

if err != nil {
return fmt.Errorf("failed to listen ping server: %w", err)
}

if err := relayMiner.ServePing(ctx, ln); err != nil {
return fmt.Errorf("failed to start ping server: %w", err)
}
Comment on lines +139 to +146
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we extract this to a function please?

}

// Start the relay miner
logger.Info().Msg("Starting relay miner...")
if err := relayMiner.Start(ctx); err != nil && !errors.Is(err, http.ErrServerClosed) {
Expand Down
5 changes: 5 additions & 0 deletions pkg/relayer/config/relayminer_configs_reader.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,11 @@ func ParseRelayMinerConfigs(configContent []byte) (*RelayMinerConfig, error) {
Addr: yamlRelayMinerConfig.Pprof.Addr,
}

relayMinerConfig.Ping = &RelayMinerPingConfig{
Enabled: yamlRelayMinerConfig.Ping.Enabled,
Addr: yamlRelayMinerConfig.Ping.Addr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt about doing something like net.LookupAddr() to differentiate a ping failure from DNS failure?

}

// Hydrate the pocket node urls
if err := relayMinerConfig.HydratePocketNodeUrls(&yamlRelayMinerConfig.PocketNode); err != nil {
return nil, err
Expand Down
15 changes: 15 additions & 0 deletions pkg/relayer/config/types.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,13 @@ type YAMLRelayMinerConfig struct {
Pprof YAMLRelayMinerPprofConfig `yaml:"pprof"`
SmtStorePath string `yaml:"smt_store_path"`
Suppliers []YAMLRelayMinerSupplierConfig `yaml:"suppliers"`
Ping YAMLRelayMinerPingConfig `yaml:"ping"`
}

// YAMLRelayMinerPingConfig represents the configuration to expose a ping server.
type YAMLRelayMinerPingConfig struct {
Enabled bool `yaml:"enabled"`
Addr string `yaml:"addr"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a godoc-style comment that includes the expected format for the address (e.g. proto://hostname:port vs hostname:port)?

}

// YAMLRelayMinerPocketNodeConfig is the structure used to unmarshal the pocket
Expand Down Expand Up @@ -83,6 +90,14 @@ type RelayMinerConfig struct {
Pprof *RelayMinerPprofConfig
Servers map[string]*RelayMinerServerConfig
SmtStorePath string
Ping *RelayMinerPingConfig
}

// RelayMinerPingConfig is the structure resulting from parsing the ping
// server configuration.
type RelayMinerPingConfig struct {
eddyzags marked this conversation as resolved.
Show resolved Hide resolved
Enabled bool
Addr string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, copypasta plz.

}

// RelayMinerPocketNodeConfig is the structure resulting from parsing the pocket
Expand Down
9 changes: 9 additions & 0 deletions pkg/relayer/interface.go
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,9 @@ type RelayerProxy interface {
// TODO_TECHDEBT(@red-0ne): This method should be moved out of the RelayerProxy interface
// that should not be responsible for signing relay responses.
SignRelayResponse(relayResponse *servicetypes.RelayResponse, supplierOperatorAddr string) error

// PingAll tests the connectivity between all the managed relay servers and their respective backend URLs.
PingAll(ctx context.Context) []error
}

type RelayerProxyOption func(RelayerProxy)
Expand All @@ -83,8 +86,14 @@ type RelayServer interface {

// Stop terminates the service server and returns an error if it fails.
Stop(ctx context.Context) error

// Ping tests the connection between the relay server and its backend URL.
Ping(ctx context.Context) error
}

// RelayServers aggregates a slice of RelayServer interface.
type RelayServers []RelayServer

// RelayerSessionsManager is responsible for managing the relayer's session lifecycles.
// It handles the creation and retrieval of SMSTs (trees) for a given session, as
// well as the respective and subsequent claim creation and proof submission.
Expand Down
27 changes: 26 additions & 1 deletion pkg/relayer/proxy/proxy.go
Original file line number Diff line number Diff line change
Expand Up @@ -139,14 +139,20 @@
if err := rp.BuildProvidedServices(ctx); err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there a reason to remove this line; otherwise, please revert.

// Start the ring cache.
rp.ringCache.Start(ctx)

startGroup, ctx := errgroup.WithContext(ctx)

for _, relayServer := range rp.servers {
server := relayServer // create a new variable scoped to the anonymous function

// Ensure that each backing data node responds to a ping request
// (at least) before continuing operation.
if err := server.Ping(ctx); err != nil {
eddyzags marked this conversation as resolved.
Show resolved Hide resolved
return err
}

startGroup.Go(func() error { return server.Start(ctx) })
}

Expand Down Expand Up @@ -187,3 +193,22 @@

return nil
}

// PingAll tests the connectivity between all the managed relay servers and their respective backend URLs.
func (rp *relayerProxy) PingAll(ctx context.Context) []error {
var errs []error

for _, srv := range rp.servers {
if err := srv.Ping(ctx); err != nil {
rp.logger.Error().Err(err).
Msg("an unexpected error occured while pinging backend URL")

Check warning on line 204 in pkg/relayer/proxy/proxy.go

View workflow job for this annotation

GitHub Actions / Check misspelling

[misspell] reported by reviewdog 🐶 "occured" is a misspelling of "occurred" Raw Output: ./pkg/relayer/proxy/proxy.go:204:29: "occured" is a misspelling of "occurred"
errs = append(errs, err)
}
}

if len(errs) > 0 {
return errs
}

return nil
Comment on lines +199 to +213
Copy link
Contributor

@bryanchriswhite bryanchriswhite Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about something like this using errors.Join and return error instead of []error:

Suggested change
var errs []error
for _, srv := range rp.servers {
if err := srv.Ping(ctx); err != nil {
rp.logger.Error().Err(err).
Msg("an unexpected error occured while pinging backend URL")
errs = append(errs, err)
}
}
if len(errs) > 0 {
return errs
}
return nil
var err error
for _, srv := range rp.servers {
err = errors.Join(srv.Ping(ctx))
}
if err != nil {
rp.logger.Error().Err(err).
Msg("an unexpected error occured while pinging backend URL")
}
return err

Copy link
Contributor

@bryanchriswhite bryanchriswhite Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if we really need to log the error in addition to returning it. Could you please double-check whether and where the returned error is printed?

If, for example, the supplier sees the ping error when starting the relayminer binary, then I don't think we need to log the error.

}
5 changes: 5 additions & 0 deletions pkg/relayer/proxy/proxy_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,11 @@ func TestRelayerProxy_StartAndStop(t *testing.T) {
// Block so relayerProxy has sufficient time to start
time.Sleep(100 * time.Millisecond)

errs := rp.PingAll(ctx)
for _, err := range errs {
require.NoError(t, err)
}
Comment on lines +155 to +158
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this would reduce to:

Suggested change
errs := rp.PingAll(ctx)
for _, err := range errs {
require.NoError(t, err)
}
err := rp.PingAll(ctx)
require.NoError(t, err)


// Test that RelayerProxy is handling requests (ignoring the actual response content)
res, err := http.DefaultClient.Get(fmt.Sprintf("http://%s/", servicesConfigMap[defaultRelayMinerServer].ListenAddress))
require.NoError(t, err)
Expand Down
Loading
Loading