Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CAPPL-222] feat(workflows): adds a secrets syncer for workflow registry #15114

Open
wants to merge 22 commits into
base: workflow-registry-contract-draft
Choose a base branch
from

Conversation

MStreet3
Copy link
Contributor

@MStreet3 MStreet3 commented Nov 5, 2024

CAPPL-222

See the design doc here.

Requires

Supports

@MStreet3 MStreet3 requested review from a team as code owners November 5, 2024 05:09
@MStreet3 MStreet3 requested a review from Atrax1 November 5, 2024 05:09
Copy link
Contributor

github-actions bot commented Nov 5, 2024

I see you updated files related to contracts. Please run pnpm changeset in the contracts directory to add a changeset.

Copy link
Contributor

github-actions bot commented Nov 5, 2024

Static analysis results are available

Hey @MStreet3, you can view Slither reports in the job summary here or download them as artifact here.
Please check them before merging and make sure you have addressed all issues.

@MStreet3 MStreet3 changed the base branch from develop to workflow-registry-contract-draft November 5, 2024 05:11
@MStreet3 MStreet3 requested a review from a team as a code owner November 5, 2024 05:11
Copy link
Contributor

github-actions bot commented Nov 5, 2024

Flaky Test Detector for ./go.mod project has failed ❌

Ran new or updated tests between develop and 0bbc463 (cappl-2222/secrets-syncer).

View Flaky Detector Details | Compare Changes

Failed Tests

Ran 15 tests in total for all affected test packages. Below are the tests identified as flaky, with a pass ratio lower than the 100% threshold:

TestPackage                                                                                     TestName                         PassRatio  RunCount   Skipped
---------                                                                                       ---------                        ---------  ---------  ---------
github.com/smartcontractkit/chainlink/v2/core/services/relay/evm/capabilities/workflows/syncer  Test_SecretsWorker               0%         1          false
github.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer                         Test_StartSyncStop               0%         1          false
github.com/smartcontractkit/chainlink/v2/core/services/workflows                                TestEngineWithHardcodedWorkflow  0%         1          false

Copy link
Contributor

github-actions bot commented Nov 5, 2024

AER Report: CI Core

aer_workflow , commit , Detect Changes , Clean Go Tidy & Generate , Scheduled Run Frequency , Find New Flaky Tests In Root Project / Find Tests To Run , lint , Core Tests (go_core_tests) , Find New Flaky Tests In Deployment Project , Core Tests (go_core_tests_integration) , Core Tests (go_core_ccip_deployment_tests) , Core Tests (go_core_race_tests) , Core Tests (go_core_fuzz) , Find New Flaky Tests In Root Project / Run Tests (github.com/smartcontractkit/chainlink/v2/core/services/relay/evm/capabilities/workflow... , Find New Flaky Tests In Root Project / Run Tests (github.com/smartcontractkit/chainlink/v2/core/services/workflows) , Find New Flaky Tests In Root Project / Run Tests (github.com/smartcontractkit/chainlink/v2/core/services/workflows/secrets) , Find New Flaky Tests In Root Project / Run Tests (github.com/smartcontractkit/chainlink/v2/core/services/workflows/syncer) , Find New Flaky Tests In Root Project / Report , Flakey Test Detection , SonarQube Scan

1. Race(s) detected:[Run tests]

Source of Error:
Run tests	2024-11-05T05:20:19.7996868Z Race(s) detected
Run tests	2024-11-05T05:20:19.8002230Z + exit 1
Run tests	2024-11-05T05:20:19.8018050Z ##[error]Process completed with exit code 1.
**Why**: The Go race detector found data races in the code, indicating that multiple goroutines are accessing shared variables concurrently without proper synchronization.

Suggested fix: Identify the shared variables causing the race condition and protect them using synchronization mechanisms like mutexes or channels.

2. Linting errors:[Golang Lint]

Source of Error:
Golang Lint	2024-11-05T05:19:02.4265595Z ##[error]core/services/relay/evm/capabilities/workflows/syncer/workflow_registry_syncer_test.go:58:11: Error return value of `rand.Read` is not checked (errcheck)
Golang Lint	2024-11-05T05:19:02.4286422Z ##[error]core/services/workflows/engine_test.go:199:6: var-naming: func getExecutionId should be getExecutionID (revive)
Golang Lint	2024-11-05T05:19:02.4287455Z ##[error]core/services/workflows/syncer/workflow_registry.go:37:2: field `mu` is unused (unused)
Golang Lint	2024-11-05T05:19:02.4290030Z ##[error]core/services/workflows/secrets/worker.go:39:2: field `addr` is unused (unused)
Golang Lint	2024-11-05T05:19:02.4291916Z ##[error]core/services/relay/evm/capabilities/workflows/syncer/workflow_registry_syncer_test.go:213:6: func `miner` is unused (unused)
Golang Lint	2024-11-05T05:19:02.4299209Z ##[error]core/services/workflows/secrets/orm_test.go:10:2: import 'github.com/test-go/testify/require' is not allowed from list 'main': Use github.com/stretchr/testify/require instead (depguard)
Golang Lint	2024-11-05T05:19:02.4301197Z ##[error]core/services/workflows/syncer/workflow_registry.go:173:2: S1023: redundant `return` statement (gosimple)
Golang Lint	2024-11-05T05:19:02.4305273Z ##[error]core/services/relay/evm/capabilities/workflows/syncer/workflow_registry_syncer_test.go:122:5: testinggoroutine: call to (*testing.T).Fatal from a non-test goroutine (govet)
Golang Lint	2024-11-05T05:19:02.4309172Z ##[error]core/services/workflows/secrets/handler.go:228:20: fmt.Sprintf can be replaced with faster strconv.FormatUint (perfsprint)
Golang Lint	2024-11-05T05:19:02.4314163Z ##[error]core/services/workflows/secrets/handler_test.go:174:18: fmt.Sprintf can be replaced with faster strconv.FormatUint (perfsprint)
Golang Lint	2024-11-05T05:19:02.4321500Z ##[error]core/services/relay/evm/capabilities/workflows/syncer/workflow_registry_syncer_test.go:117:2: SA2002: the goroutine calls T.Fatal, which must be called in the same goroutine as the test (staticcheck)
Golang Lint	2024-11-05T05:19:02.4325505Z ##[error]core/services/relay/evm/capabilities/workflows/syncer/workflow_registry_syncer_test.go:135:4: go-require: requestForceUpdateSecrets contains assertions that must only be used in the goroutine running the test function (testifylint)
**Why**: The code has multiple linting issues, including unchecked error returns, unused variables, improper naming conventions, and redundant statements.

Suggested fix: Address each linting issue by following the recommendations provided by the linter. For example, check error returns, remove unused variables, and follow proper naming conventions.

3. Test failure due to data race:[Run tests with flakeguard]

Source of Error:
Run tests with flakeguard	2024-11-05T05:16:04.8129098Z WARNING: DATA RACE
Run tests with flakeguard	2024-11-05T05:16:04.8149986Z testing.go:1398: race detected during execution of test
Run tests with flakeguard	2024-11-05T05:16:04.8150530Z --- FAIL: Test_StartSyncStop (60.03s)
Run tests with flakeguard	2024-11-05T05:16:04.8154748Z ##[error]Process completed with exit code 1.
**Why**: The test `Test_StartSyncStop` failed due to a data race condition, indicating concurrent access to shared variables without proper synchronization.

Suggested fix: Use synchronization mechanisms like mutexes to protect shared variables accessed by multiple goroutines in the test.

4. Go module changes detected:[Ensure clean after tidy]

Source of Error:
Ensure clean after tidy	2024-11-05T05:18:52.1328703Z diff --git a/go.mod b/go.mod
Ensure clean after tidy	2024-11-05T05:18:52.1335948Z @@ -281,7 +282,6 @@ require (
Ensure clean after tidy	2024-11-05T05:18:52.1349910Z ##[error]Process completed with exit code 1.
**Why**: The `go.mod` file was modified after running `go mod tidy`, indicating that there are changes that need to be committed.

Suggested fix: Review the changes in go.mod and commit them to the repository to ensure the module dependencies are up to date.

AER Report: Operator UI CI

aer_workflow , commit , Breaking Changes GQL Check

1. Workflow failed to complete successfully:[Breaking Changes GQL Check]

Source of Error:
Run convictional/trigger-workflow-and-wwait@f69fa9eedd3c62a599220f4d5745230e237904be	2024-11-06T15:52:40.7640027Z Checking conclusion [failure]
Run convictional/trigger-workflow-and-wait@f69fa9eedd3c62a599220f4d5745230e237904be	2024-11-06T15:52:40.7641082Z Checking status [completed]
Run convictional/trigger-workflow-and-wait@f69fa9eedd3c62a599220f4d5745230e237904be	2024-11-06T15:52:40.7642268Z Conclusion is not success, it's [failure].
Run convictional/trigger-workflow-and-wait@f69fa9eedd3c62a599220f4d5745230e237904be	2024-11-06T15:52:40.7643283Z Propagating failure to upstream job

Why: The triggered workflow did not complete successfully, resulting in a failure status. This failure was then propagated back to the upstream job, causing the entire run to fail.

Suggested fix: Investigate the logs of the downstream workflow (https://github.com/smartcontractkit/operator-ui/actions/runs/11707133662) to identify the specific cause of the failure and address the underlying issue.

})
}

func (w *worker) getTicker(ctx context.Context) <-chan struct{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MStreet3 Why is this needed as opposed to something like:

timer := s.newTimer()
defer timer.Close()

in the main loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for letting tests override using a ticker for a channel instead. Removes timing from unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, understood -- I was commenting on why we need to lazy load (as opposed to dependency inject the dependency) and use a signaler


// Sync for a first time outside the loop; this means we'll check the cached state
// or start a remote sync immediately once spinning up syncLoop, as by default a ticker will
// fire for the first time at T+N, where N is the interval.
Copy link
Contributor

@cedric-cordenier cedric-cordenier Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: you could use chainlink-common/pkg/timeutil/ticker here with nextDur returning 0 the first time it's called


// syncForceUpdateSecretsEvents synchronizes the force update secrets events from the contract
// to the local database.
func (w *worker) syncForceUpdateSecretsEvents(
Copy link
Contributor

@cedric-cordenier cedric-cordenier Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MStreet3 YAGNI the concurrency here: it's abstracting things away thus making things a bit harder to understand and I'm not sure what we're gaining here.

This operation should correspond to:

  • updating the relevant secrets entry in the DB
  • updating the local in-mem cache for the secrets

I'd also recommend combining this package with the syncer package for more cohesion. There's no need for a separate worker here: just parse the event and call the handler, potentially mediating by a worker pool if you need that parallelism that gives you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, there's no extra concurrency here. This method just starts the secrets event query worker and passes the events to the event handler.

}

var event T
err = mapstructure.Decode(dm, &event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MStreet3 Hmmmm... As I'm reading this: you're dumping something to a values.Map (WrapMap), then unwrapping it to a map[string]any, then unwrapping it to a struct. Can we somehow just dump it straight to the struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this is mostly copied from log event trigger, will take a look again

event Event,
) error {
// Fetch the secrets url from ORM
url, err := h.orm.GetSecretsURL(ctx, event.GetSecretsURL())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct ? You're fetching the secrets URL from the event, and then are getting it from the DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event only has the hash, so get the plain url from the table, then fetch the contents, then write

@cedric-cordenier
Copy link
Contributor

@MStreet3 Great first stab! Thanks for turning this around quickly.

In terms of high-level feedback, I'd recommend you simplify the implementation: there's a lot here that is increasing implementation cost and abstraction overhead without a corresponding benefit. You shouldn't need much more than:

  • a polling loop -> which returns the events we care about
  • a worker loop -> which receives the events and processes them

This should all be located in a single package; we can say at this stage that they won't need to be reused. (In contrast to the registrysyncer + launcher construct; this is separate because we have two different launchers today -- something which we won't have for the workflow syncer.)

If you want to do the work faster you can have a worker pool of some kind, but note you'll need to be careful that we preserve the order of the events when processing (you could send all messages for a workflow to the same worker as a way of solving this). I'm not sure we need that at this stage though.

@cl-sonarqube-production
Copy link

Quality Gate failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)
7 New Major Issues (required ≤ 5)

See analysis details on SonarQube

Catch issues before they fail your Quality Gate with our IDE extension SonarLint SonarLint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants