Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing the Failure to TRY Again in case it fails to update the metadata. #158

Open
wants to merge 27 commits into
base: 25.0.0-confluent
Choose a base branch
from

Conversation

Pankaj260100
Copy link
Member

@Pankaj260100 Pankaj260100 commented Sep 28, 2023

Fixes #XXXX.

Description

  • Currently, If 2 tasks are consuming from the same partitions, try to publish the segment and update the metadata, the second task can fail because the end offset stored in the metadata store doesn't match with the start offset of the second task. We can fix this by retrying instead of failing.

  • AFAIK apart from the above issue, the metadata mismatch can happen in 2 scenarios:

    1. when we update the input topic name for the data source.
    2. when we run 2 replicas of ingestion tasks(1 replica will publish and 1 will fail as the first replica has already updated the metadata).
  • In our case, we don't update the topic name very frequently, and 2 replicas of the ingestion task are not deployed in internal metrics prod yet. So, I am planning to test this change in internal metrics prod.

  • If it works, I will work on a long-term fix. I was able to replicate this issue in internal metrics prod.

  • Slack link for discussion of the same issue in druid slack: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1695710662075909

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Release note

For tips about how to write a good release note, see Release notes.


Key changed/added classes in this PR
  • MyFoo
  • OurBar
  • TheirBaz

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@Pankaj260100 Pankaj260100 requested review from a team as code owners September 28, 2023 08:15
@cla-assistant
Copy link

cla-assistant bot commented Sep 28, 2023

CLA assistant check
All committers have signed the CLA.

@xvrl
Copy link
Member

xvrl commented Sep 29, 2023

can we add a test case that reproduces this issue and validates that we would handle a situation like that correctly?

@Pankaj260100 Pankaj260100 changed the title [OBSDATA-2313][druid-internal-metrics-prod]: Changing the Failure to TRY Again in case it fails to update the metadata. Changing the Failure to TRY Again in case it fails to update the metadata. Oct 3, 2023
@Pankaj260100 Pankaj260100 requested a review from xvrl October 11, 2023 16:06
/**
* Returns true if the metadata in this instance is greater than the metadata in "other"
*
* Behavior is undefined if you pass in an instance of a different class from this one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be documented in the interface rather than the implementing class itself, since the caller wouldn't necessarily be aware of it otherwise.

@Pankaj260100 Pankaj260100 changed the base branch from 25.0.0-confluent to 28.0.1-cflt December 20, 2023 09:54
@Pankaj260100 Pankaj260100 changed the base branch from 28.0.1-cflt to 25.0.0-confluent December 20, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants