Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-2168] Compare with the current time in UTC #4070

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

linweihs
Copy link
Contributor

@linweihs linweihs commented Oct 30, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

In existing trash cleaner, the comparison between snapshotTime and the current time is NOT done in the same time zone, this leads to longer hard deletion time due to time zone difference.
Screenshot 2024-10-29 at 9 05 43 PM-1

the fix ensure that the comparison between snapshotTime and the current time is done in the same time zone

  • To allow configuration of whether to use UTC or the system's default time zone for comparison, introduce a new configuration property gobblin.trash.snapshot.retention.comparison.useUTC.

Tests

  • My PR adds the following unit tests :

Implement a unit test class [TimeBasedSnapshotCleanupPolicyTest.java](https://github.com/apache/gobblin/compare/master...linweihs:incubator-gobblin:welin/trash?expand=1#diff-59e30e386f99115a2511854af4f01812ea1cf832430c696bb48182af91713f77) ensuring the comparison are asserted expectedly.

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

@linweihs
Copy link
Contributor Author

linweihs commented Nov 9, 2024

updated the implementation according to comment suggested. thanks for the review

Comment on lines 74 to 89
public class MockTimeBasedSnapshotCleanupPolicy implements SnapshotCleanupPolicy {

public static final String SNAPSHOT_RETENTION_POLICY_MINUTES_KEY = "gobblin.trash.snapshot.retention.minutes";
public static final int SNAPSHOT_RETENTION_POLICY_MINUTES_DEFAULT = 1440; // one day
public static final String RETENTION_SNAPSHOT_TIMEZONE = "gobblin.trash.snapshot.retention.timezone";

private final int retentionMinutes;
private final DateTime MOCK_CURRENT_TIME;
private final DateTimeZone retentionSnapshotTimezone;

public MockTimeBasedSnapshotCleanupPolicy(Properties props, DateTime mockCurrentTime) {
this.retentionMinutes = Integer.parseInt(props.getProperty(SNAPSHOT_RETENTION_POLICY_MINUTES_KEY,
Integer.toString(SNAPSHOT_RETENTION_POLICY_MINUTES_DEFAULT)));
this.MOCK_CURRENT_TIME = mockCurrentTime;
this.retentionSnapshotTimezone = props.containsKey(RETENTION_SNAPSHOT_TIMEZONE) ? DateTimeZone.forID(props.getProperty(RETENTION_SNAPSHOT_TIMEZONE)) : DateTimeZone.getDefault();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last nit, when it comes to creating a mock class its best to try to reuse as much as possible from the original class you want to test, otherwise the code can easily diverge from the TimeBasedSnapshotCleanupPolicy if that class is changed, this test may not capture those changes.

I would recommend this mock class extend TimeBasedSnapshotCleanupPolicy instead of implementing the same interface again, and then you can just add private final DateTime MOCK_CURRENT_TIME; as an additional field and use the super constructor for the other properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is helpful ! i've addressed according to the suggestion, thanks for the reviewing :)

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.30%. Comparing base (45ad13e) to head (3ead802).
Report is 22 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4070      +/-   ##
============================================
- Coverage     45.12%   41.30%   -3.82%     
+ Complexity     3199     2233     -966     
============================================
  Files           705      485     -220     
  Lines         26949    20599    -6350     
  Branches       2680     2382     -298     
============================================
- Hits          12160     8508    -3652     
+ Misses        13781    11190    -2591     
+ Partials       1008      901     -107     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good impl fix here... but the test needs reworking

Comment on lines +83 to +87
@Override
public boolean shouldDeleteSnapshot(FileStatus snapshot, Trash trash) {
DateTime snapshotTime = Trash.TRASH_SNAPSHOT_NAME_FORMATTER.parseDateTime(snapshot.getPath().getName());
return snapshotTime.plusMinutes(this.retentionMinutes).isBefore(this.MOCK_CURRENT_TIME.withZoneRetainFields(this.retentionSnapshotTimezone));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as THE ONLY method of the class-under-test, the test itself can in no way meaningfully verify the impl, when this mock actually reimplements that method!

I see your challenge: you'd like to hard-code a known date-time in the test, but that's not possible because the TimeBasedSnapshotCleanupPolicy isn't parameterized by a source/provider of the current date-time.

typically that is the canonical testing pattern I would recommend, but in this particular case, where the crux of the impl comes down to the actual use of DateTime.now(DateTimeZone), I'd suggest instead NOT to hard-code a known time but rather to calculate one based on the true current time when the test runs.

e.g. set retention at 600 mins (10 hours) and render a path that is between 12 and 13 hours ago in UTC. then instantiate an instance using TZ.UTC to verify Assert.assertTrue(x.shouldDeleteSnapshot(...)) and instantiate another policy using TZ.PST to verify assertFalse. (there's nothing special about PST, but just choose a TZ that is at least >= 4 hours later than UTC.)

then perhaps round out the test by then generating a path that is > 10 hours, even in PST, and also another that's < 9 hours ago in UTC

Comment on lines +53 to +54
FileStatus fs1 = new FileStatus(0, true, 0, 0, 0,
new Path(trash.getTrashLocation(), new DateTime(2024, 10, 29, 1, 0, DateTimeZone.UTC).toString(Trash.TRASH_SNAPSHOT_NAME_FORMATTER)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaking of mocks, it seems reasonable to mock the FileStatus to only return .getPath(). that demonstrates the check is based solely on the path itself and not e.g. on the modtime

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants