
259: Write tests to validate reliability features (e.g., simulated network failures, retries) E3 #284

Open · wants to merge 57 commits into base: main

Conversation

Blindspot22 (Collaborator)

  1. In CI.yml:

    • Added a separate reliability-tests job to isolate the tests for retry logic and acknowledgment.
    • Used the --filter reliability flag in the cargo nextest run command to run only the reliability-related tests.
    • Cached the target/cargo-nextest directory to speed up future runs.

  2. In CD.yml:

    • Added a trigger (on.workflow_run) to ensure deployment only happens after the CI workflow completes successfully.
    • Added an if condition to proceed with deployment only if the CI workflow's conclusion is success.
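A rough sketch of what those two workflow changes could look like. The job name, filter argument, and cache path come from the description above; the workflow name "CI", action versions, and all other details are assumptions, not the actual files in this PR:

```yaml
# CI.yml (sketch)
jobs:
  reliability-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: target/cargo-nextest
          key: ${{ runner.os }}-nextest-${{ hashFiles('**/Cargo.lock') }}
      - run: cargo nextest run --filter reliability

# CD.yml (sketch)
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  deploy:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "deploying..."
```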

@Blindspot22 Blindspot22 self-assigned this Dec 6, 2024
@Blindspot22 Blindspot22 marked this pull request as draft December 6, 2024 11:17
test.rs — outdated review thread, resolved
@Blindspot22 Blindspot22 marked this pull request as draft January 28, 2025 10:22
@Blindspot22 Blindspot22 marked this pull request as ready for review January 28, 2025 11:15
src/tests/test.rs — outdated review thread, resolved
Comment on lines 1 to 3
use tokio::time::{sleep, Duration};
use tokio::sync::{futures, Mutex};
use uuid::Uuid;
Collaborator
Regarding the imports to your test module, I don't quite understand what you will be testing. Can you help me answer that? I'm not being rhetorical; it's a genuine question.

Collaborator Author — @Blindspot22, Jan 30, 2025

I'm testing quite a lot of things, some now and others later. Please just check my documentation 🙏 Reliability Features doc

Collaborator

Great that you're testing a lot of things. Please enlighten me on one of them. Let us consider the test with the description "Validate retry mechanism under network interruptions." I have two questions:

  • Which entity in the test is subject to network interruptions that would prevent access to it?
  • Which entity should implement a retry mechanism under these conditions?

Collaborator Author

The entity subject to network interruptions is the DIDComm mediator server itself or any intermediary network-dependent service it relies on (e.g., a database or an external service). In the context of the test, this would simulate scenarios where:

  • The client agent (the test environment) attempts to send a message to the mediator server but experiences timeouts or dropped connections.
  • The mediator server itself might lose connectivity to an upstream service (e.g., a cloud-based DID registry).

Collaborator Author — @Blindspot22, Jan 31, 2025

The entity that should initiate the retry mechanism in this case is the client agent. Why? Because:

The client is responsible for ensuring the message is delivered successfully to the mediator server.
If the server is temporarily unavailable, the client should attempt to resend the message after a delay or a backoff period.

For example, if the client sends a message and receives no acknowledgment from the mediator, the client should retry according to a predefined policy (e.g., exponential backoff).
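As a sketch of that policy, a client-side retry loop with exponential backoff might look like the following. All names here (`retry_with_backoff`, the simulated send closure) are illustrative; the real client and transport are not shown:

```rust
use std::time::Duration;

// Hypothetical sketch: retry a fallible operation with exponential backoff.
// `send` stands in for any network call (e.g. delivering a message to the mediator).
fn retry_with_backoff<T, E>(
    mut send: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    base_delay: Duration,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match send() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    // Out of attempts: surface the last error.
                    return Err(e);
                }
                // Exponential backoff: base * 2^(attempt - 1).
                let delay = base_delay * 2u32.pow(attempt - 1);
                std::thread::sleep(delay);
            }
        }
    }
}

fn main() {
    // Simulate a mediator that drops the first two requests, then acknowledges.
    let mut failures = 2;
    let result = retry_with_backoff(
        || {
            if failures > 0 {
                failures -= 1;
                Err("connection dropped")
            } else {
                Ok("acknowledged")
            }
        },
        5,
        Duration::from_millis(1),
    );
    assert_eq!(result, Ok("acknowledged"));
}
```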

Collaborator

You are listing two options:

  • The client agent (the test environment) attempts to send a message to the mediator server but experiences timeouts or dropped connections.
  • The mediator server itself might lose connectivity to an upstream service (e.g., a cloud-based DID registry).

This ticket is about writing tests to validate reliability features, meaning the features implemented by the mediator to remain reliable. In the first case, as you said yourself, it is the client implementing reliability features for the situation, not the mediator. And you're not expected to test features you implement yourself to simulate a client's actions.

With that I believe you can discard Option 1 and focus on Option 2, so the actual reliability features assumed implemented by the mediator are tested. What do you think?
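Following that framing, a test for Option 2 could mock the mediator's upstream dependency so the mediator's own recovery logic is what gets exercised. This is only a sketch: `Registry`, `FlakyRegistry`, `Mediator`, and `resolve_with_retry` are all hypothetical names, not this project's actual API, and the retry feature is assumed to exist on the mediator side:

```rust
// Hypothetical upstream dependency of the mediator (e.g. a DID registry).
trait Registry {
    fn resolve(&mut self, did: &str) -> Result<String, String>;
}

// A mock registry that fails a fixed number of times before succeeding,
// simulating a flaky upstream connection.
struct FlakyRegistry {
    failures_left: u32,
}

impl Registry for FlakyRegistry {
    fn resolve(&mut self, did: &str) -> Result<String, String> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            Err("upstream unreachable".to_string())
        } else {
            Ok(format!("doc-for-{did}"))
        }
    }
}

// The (assumed) reliability feature under test: the mediator retries upstream calls.
struct Mediator<R: Registry> {
    registry: R,
    max_attempts: u32,
}

impl<R: Registry> Mediator<R> {
    fn resolve_with_retry(&mut self, did: &str) -> Result<String, String> {
        let mut last_err = String::new();
        for _ in 0..self.max_attempts {
            match self.registry.resolve(did) {
                Ok(doc) => return Ok(doc),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}

fn main() {
    // Upstream fails twice; the mediator's 3-attempt policy should still succeed.
    let mut mediator = Mediator {
        registry: FlakyRegistry { failures_left: 2 },
        max_attempts: 3,
    };
    assert_eq!(
        mediator.resolve_with_retry("did:example:123"),
        Ok("doc-for-did:example:123".to_string())
    );
}
```

The point of the trait boundary is that the test controls the failure pattern deterministically, so no real network interruption is needed to observe the mediator's retry behavior.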

Collaborator Author

Sure. Thanks for this enlightenment 👍
I think I'm good to go now.

Collaborator Author — @Blindspot22, Feb 6, 2025

Please hold on with the review; I'm still trying to fix my tests. I should be done by the end of the day.

Successfully merging this pull request may close these issues.

Write tests to validate reliability features (e.g., simulated network failures, retries) E3.