Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RelayMiner]: add proxy.Ping(...) capability to test connectivity between relay servers and backend URLs #744

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

eddyzags
Copy link

@eddyzags eddyzags commented Aug 17, 2024

Summary

This PR adds the capability to test the connectivity between the Relay Servers and the Backend URLs in two ways.

  1. Safeguard at Startup:
    For every suppliers.[].service_config.backend_url referenced as input inside the Relay Miner Configuration file, the Relay Proxy will verify wether the network connection between the targeted backend_url and the relayerminer process is functioning properly. If one or more connections aren't possible, the relay miner won't be able to start.

  2. Configurable Ping HTTP server:
    The Relay Miner process will listen for incoming request to synchronously test the connectivity of every referenced suppliers.[].service_config.backend_url. If one or more backend URLs aren't reachable, the incoming request will fail.

Based on the serverConfig.ServerType (Example: HTTP), each Server Type will implement their own logic to implement to test the connectivity.

Issue

Type of change

Select one or more:

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Documentation
  • Other (specify)

Testing

Documentation changes (only if making doc changes)

  • make docusaurus_start; only needed if you make doc changes

Local Testing (only if making code changes)

  • Unit Tests: make go_develop_and_test
  • LocalNet E2E Tests: make test_e2e
  • See quickstart guide for instructions

PR Testing (only if making code changes)

  • DevNet E2E Tests: Add the devnet-test-e2e label to the PR.
    • THIS IS VERY EXPENSIVE, so only do it after all the reviews are complete.
    • Optionally run make trigger_ci if you want to re-trigger tests without any code changes
    • If tests fail, try re-running failed tests only using the GitHub UI as shown here

Sanity Checklist

  • I have tested my changes using the available tooling
  • I have commented my code
  • I have performed a self-review of my own code; both comments & source code
  • I create and reference any new tickets, if applicable
  • I have left TODOs throughout the codebase, if applicable

Summary by CodeRabbit

Summary by CodeRabbit

  • New Features

    • Introduced a new configuration section for the ping functionality, allowing users to test backend connectivity within the relay miner's setup.
    • Added methods to handle ping requests, enhancing health check capabilities for relay servers.
  • Bug Fixes

    • Improved error handling during the server startup process if any relay server is unreachable.
  • Tests

    • Added tests for the new ping functionality to ensure operational integrity and reliability of the relay miner.

Copy link

coderabbitai bot commented Aug 17, 2024

Walkthrough

This update introduces a new ping functionality across various configuration files and related code components for the relay miner. By allowing connectivity testing of backend URLs, the changes enhance monitoring and health-check capabilities. The implementation includes new configuration options, modifications to handle ping requests, and improved testing mechanisms to ensure robust operation. Overall, these improvements aim to bolster the reliability and operational robustness of the relay miner service.

Changes

Files Change Summary
docusaurus/docs/.../relayminer_config.md
localnet/poktrolld/config/...
localnet/poktrolld/config/relayminer_config_full_example.yaml
localnet/poktrolld/config/relayminer_config_localnet_vscode.yaml
Added ping configuration sections with enabled and addr settings, enhancing monitoring capabilities.
pkg/relayer/cmd/cmd.go Introduced Ping functionality to establish a TCP listener and serve ping requests, improving monitoring.
pkg/relayer/config/... Added Ping fields to configuration structures, expanding configurability for health checks.
pkg/relayer/interface.go Introduced Ping methods in RelayerProxy and RelayServer interfaces for connectivity testing.
pkg/relayer/proxy/... Added Ping methods to relayerProxy and synchronousRPCServer for backend connectivity checks.
pkg/relayer/relayminer.go Added ServePing method to handle ping requests and respond based on backend connectivity status.
pkg/relayer/relayminer_test.go Implemented TestRelayMiner_Ping to validate the ping feature in a simulated environment.
testutil/testrelayer/proxy.go Introduced NewMockOneTimeRelayerProxyWithPing for enhanced testing of RelayerProxy functionalities.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (2)
pkg/relayer/relayminer.go (1)

138-154: Log error details for better debugging.

In the ServePing function, when an error occurs during the Ping call, consider logging the error details before returning a 500 status. This will help in diagnosing issues more effectively.

			if errs := rel.relayerProxy.Ping(ctx); errs != nil {
+				rel.logger.Error().Err(errs).Msg("failed to ping relay servers")
				w.WriteHeader(http.StatusInternalServerError)
				return
			}
pkg/relayer/proxy/proxy_test.go (1)

155-158: Consider extending test coverage for Ping method.

While the current test ensures no errors are returned, consider adding scenarios to test edge cases, such as:

  • Handling of unreachable URLs.
  • Different HTTP status codes.
  • Timeout scenarios.

This will ensure comprehensive coverage of the Ping method's behavior.

pkg/relayer/proxy/synchronous.go Outdated Show resolved Hide resolved
pkg/relayer/proxy/proxy.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)
pkg/relayer/proxy/proxy.go (1)

203-203: Fix typographical error in log message.

The log message contains a typographical error: "occured" should be "occurred".

- Msg("an unexpected error occured while pinging backend URL")
+ Msg("an unexpected error occurred while pinging backend URL")

@Olshansk Olshansk added the community A ticket intended to potentially be picked up by a community member label Aug 19, 2024
@Olshansk Olshansk added this to the Shannon Quality of Life milestone Aug 19, 2024
@Olshansk Olshansk requested a review from okdas August 21, 2024 22:08
Copy link
Member

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyzags For a first contribution, this is super impressive!! 💪

Left some comments but great job otherwise. 🙌

Let's finish it off and lmk if you're interested in contributing more as we get closer to BETA.

pkg/relayer/cmd/cmd.go Show resolved Hide resolved
pkg/relayer/config/types.go Show resolved Hide resolved
pkg/relayer/proxy/proxy.go Show resolved Hide resolved
pkg/relayer/proxy/proxy.go Outdated Show resolved Hide resolved
pkg/relayer/proxy/proxy.go Outdated Show resolved Hide resolved
pkg/relayer/relayminer.go Show resolved Hide resolved
pkg/relayer/relayminer.go Show resolved Hide resolved
testutil/testrelayer/proxy.go Show resolved Hide resolved
pkg/relayer/proxy/synchronous.go Outdated Show resolved Hide resolved
pkg/relayer/relayminer_test.go Show resolved Hide resolved
…ty between relay servers and backend URLs (#1)

* relayer: add RelayServers() method to RelayProxy interface; Add Ping(), ServiceIDs(), Forward() method to RelayServer interface; add RelayServers slice with helper method byServiceID

* relayer: add forward config entry

* relayer: implement ServiceIDs, Forward, and Ping method for synchrounous RPC server

* relayer: add RelayServers implementation for RelayProxy

* relayer: add Ping and Forward options

* relayer: integrate ping option

* relayer: add ServePing and ServeForward method to RelayMiner

* test proxy.Ping() in test + remove forward feature

* add serve ping test

* add doc
@eddyzags
Copy link
Author

Hey @Olshansk 👋🏾 ! Thanks for taking the time to review the code and the feedback.

Let's finish it off

Yes let's tackle that. I will make changes to the latest comments by Monday at the latest.

I am also working on the forward feature (see issue comment). I'll open a PR as soon as I can.

and lmk if you're interested in contributing more as we get closer to BETA.

And yes, I am definitely interested in contributing more 🚀

@Olshansk
Copy link
Member

Olshansk commented Sep 3, 2024

@eddyzags Sounds great! Please re-request a review when you're ready. I will take another look then.

@Olshansk
Copy link
Member

Olshansk commented Sep 5, 2024

@eddyzags Just FYI: #744 (comment)

@Olshansk
Copy link
Member

@eddyzags Another bump

@eddyzags
Copy link
Author

eddyzags commented Sep 25, 2024

Hey @Olshansk

I've just added the last requested modifications:

  • Document Localnet configuration file
  • Add Localnet helpers to Makefile to ping Relayer Miners

As the Ping safeguard implementation won't start the Relay Miner pod if the suppliers aren't reachable, I had to clean the Relay Miner's Helm value files when starting the Localnet environment based on the constraints below:

  1. If ollama.enabled=true & rest.enabled=false in poktroll/localnet_config.yaml
    Use a Relayer Miner configuration that defines an Anvil & Ollama supplier as a value file for the relayerminer Helm Chart.
  2. If ollama.enabled=false & rest.enabled=true in poktroll/localnet_config.yaml
    Use a Relay Miner configuration that defines an Anvil & Rest supplier as a value file for the relayeminer Helm Chart.
  3. if ollama.enabled=true & rest.enabled=true in poktroll/localnet_config.yaml
    Use a Relay Miner configuration that defines an Anvil, Rest, & Ollama supplier as a value file for the Helm Chart

In other words, this new logic forces us to only reference the available suppliers in the Relay Miner's configuration.
After some tests, the Relay Miner pod starts correctly only if I activate the deployment of an Ollama and Rest supplier (ollama.enabled=true & rest.enabled=true in localnet_config.yaml). But for the other scenarios, the relayminer pod remains in CrashLoopBackoff state.
The Relayer Miner pod isn't failing because of the Ping safeguard, but because the Relay Miner isn't managing the associated service endpoint (see error instruction here).

Error message:

failed to start relay miner: service endpoint http://relayminer1:8545 not handled by the relay miner: service endpoint not handled by relayer proxy

The Relay miner configuration seems to expects all the suppliers list (Ollama, Anvi & Rest) even if the supplier's container isn't deployed. If this is true, the Relay Miner won't be able to start with the Ping safeguard if one supplier isn't deployed.
I would appreciate some guidance here 🙏🏾 . Is there a link with the fact that we define the supplier list in the genesis.json?

@Olshansk Olshansk removed the request for review from okdas September 26, 2024 15:05
@Olshansk
Copy link
Member

@eddyzags Appreciate the detailed feedback.

Re review - I will take a look next week but hopefully @bryanchriswhite can help out as well.

failed to start relay miner: service endpoint http://relayminer1:8545 not handled by the relay miner: service endpoint not handled by relayer proxy

Re this error. I'll take a deeper look next week but in the meantime, can you try:

  1. Removing localnet_config.yaml
  2. Merging with main
  3. Re-running localnet

This might be something historical we've already solved

Copy link
Member

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyzags Please merge with main and will re-review then.

go install golang.org/x/tools/cmd/goimports@latest
go install github.com/mikefarah/yq/v4@latest

.PHONY: install_cosmovisor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor

@bryanchriswhite bryanchriswhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool feature @eddyzags 😎 - thanks for the contribution! 🚀 🙌

(And apologies for the review delay 😅)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were any of the makefile changes intentional? Otherwise, let's revert. I split the makefile up in #816.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as above.

@@ -173,6 +174,21 @@ pprof:

You can learn how to use that endpoint on the [Performance Troubleshooting](../../develop/developer_guide/performance_troubleshooting.md) page.

### `ping`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool. 😎

Comment on lines +179 to +182
Configures a `ping` server to test the connectivity of every backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. Otherwise, if one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using 204 No Content for the success and 503 Service Unavailable for the error status codes? I think they could be more appropriate than 200 and 500.

Suggested change
Configures a `ping` server to test the connectivity of every backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. Otherwise, if one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.
Configures a `ping` server to test the connectivity of all backend URLs. If
all the backend URLs are reachable, the endpoint returns a 200 HTTP
Code. If one or more backend URLs aren't reachable, the service
returns an 500 HTTP Internal server error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional diff?

Same goes for all files in the loalnet/kubernetes directory.

One simple way to revert these would be to do something like:

git fetch origin
git checkout origin/main -- localnet/kubernetes
git add localnet/kubernetes
git commit

Comment on lines +103 to +107
filename := "/tmp/relayerminer.ping.sock"

ln, err := net.Listen("unix", filename)
require.NoError(t, err)
defer os.Remove(filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat - love it! 🤩


time.Sleep(time.Millisecond)

c := http.Client{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:

Suggested change
c := http.Client{
httpClient := http.Client{

Comment on lines +115 to +119
Transport: &http.Transport{
DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
return net.Dial(ln.Addr().Network(), ln.Addr().String())
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we please extract the *http.Transport value (i.e. keep the & here) to a var?
  2. Are we sure that it's necessary to provide a DialContext func? If so, please add a comment explaining.

Comment on lines +123 to +124
_, err = c.Get("http://unix")
require.NoError(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#PUC (please update comment). It's not clear to me what's going on here and what this assertion implies.

I was expecting to see an assertion regarding the contents of the socket file or something to ensure that the http request made it through.

Comment on lines +52 to +62
relayerProxyMock := mockrelayer.NewMockRelayerProxy(ctrl)
relayerProxyMock.EXPECT().
Start(gomock.Eq(ctx)).
Times(1)
relayerProxyMock.EXPECT().
Stop(gomock.Eq(ctx)).
Times(1)
relayerProxyMock.EXPECT().
ServedRelays().
Return(returnedRelaysObs).
Times(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can this be simplified to:

Suggested change
relayerProxyMock := mockrelayer.NewMockRelayerProxy(ctrl)
relayerProxyMock.EXPECT().
Start(gomock.Eq(ctx)).
Times(1)
relayerProxyMock.EXPECT().
Stop(gomock.Eq(ctx)).
Times(1)
relayerProxyMock.EXPECT().
ServedRelays().
Return(returnedRelaysObs).
Times(1)
relayerProxyMock := NewMockOneTimeRelayerProxy(ctx, t, returnedRelaysObs relayer.RelaysObservable)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community A ticket intended to potentially be picked up by a community member
Projects
Status: 👀 In review
Development

Successfully merging this pull request may close these issues.

3 participants