Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Organize and cleanup azure blob outputs for Automated Tests #1578

Merged
merged 21 commits into from
Nov 15, 2024

Conversation

basiliskus
Copy link
Contributor

@basiliskus basiliskus commented Nov 11, 2024

We're currently uploading all the output files for the Automated RS Integration Tests in the root of the Azure container. This PR reorganizes the files to follow a /year/month/day folder structure. Here are the details:

  • Added AzureBlobOrganizer to reorganize blobs by dated folders with the format year/month/day
  • Added cleanup logic to remove blobs older than a set number of days
  • Updated azure blob fetch logic to use new folder system
  • Some refactoring

Issue

#1523
#1526

Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1523 - Partially compliant

Fully compliant requirements:

  • Organize files in Azure blob storage container by date (day, month, year).

Not compliant requirements:

  • Consider the possibility of removing older files but retain the ability to run tests on the same files multiple times.
  • Older files should be accessible for troubleshooting.
  • Implement auto-organization and cleanup possibly using an Azure Function triggered by an Azure Event.
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Exception Handling
The exception handling might be too aggressive, potentially stopping the process for recoverable errors.

Error Logging
The removal of logging could make it difficult to debug issues in production environments.

@basiliskus basiliskus changed the title Story/1523/org azure container [WIP] Organize azure blob outputs with year/month/day folders Nov 11, 2024
@CDCgov CDCgov deleted a comment from github-actions bot Nov 11, 2024
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Score
Best practice
Implement a rollback mechanism for blob operations to ensure data integrity

Implement a rollback mechanism in organizeBlobsByDate to revert changes if the copy
or delete operation fails, ensuring data integrity.

rs-e2e/src/main/java/gov/hhs/cdc/trustedintermediary/rse2e/AzureBlobOrganizer.java [34-35]

-destinationBlob.beginCopy(sourceBlob.getBlobUrl(), null);
-sourceBlob.delete();
+try {
+    destinationBlob.beginCopy(sourceBlob.getBlobUrl(), null);
+    sourceBlob.delete();
+} catch (Exception e) {
+    // Rollback changes
+}
Suggestion importance[1-10]: 8

Why: Implementing a rollback mechanism is crucial for maintaining data integrity, especially in operations that involve modifications like copying and deleting blobs. This suggestion significantly enhances the reliability of the system.

8
Enhancement
Add error handling for blob operations to improve fault tolerance

Add error handling for potential exceptions during the blob copy and delete
operations in the organizeBlobsByDate method to ensure robustness and fault
tolerance.

rs-e2e/src/main/java/gov/hhs/cdc/trustedintermediary/rse2e/AzureBlobOrganizer.java [34-35]

-destinationBlob.beginCopy(sourceBlob.getBlobUrl(), null);
-sourceBlob.delete();
+try {
+    destinationBlob.beginCopy(sourceBlob.getBlobUrl(), null);
+    sourceBlob.delete();
+} catch (Exception e) {
+    // Handle exception
+}
Suggestion importance[1-10]: 7

Why: Adding error handling around blob operations is a valid suggestion to improve the robustness of the system, especially given the potential impact of failed operations on system stability.

7

@basiliskus
Copy link
Contributor Author

/review

Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1523 - Partially compliant

Fully compliant requirements:

  • Organize files in Azure blob storage by year, month, and day.
  • Consider the possibility of removing older files but retain the ability to rerun tests on the same files.

Not compliant requirements:

  • Utilize an Azure Function triggered by an Azure Event for auto-organization and cleanup.
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling
Review the exception handling in the AzureBlobOrganizer class to ensure that it properly handles and logs all potential errors during blob organization and cleanup.

@basiliskus basiliskus changed the title [WIP] Organize azure blob outputs with year/month/day folders Organize azure blob outputs with year/month/day folders Nov 13, 2024
@basiliskus basiliskus marked this pull request as ready for review November 13, 2024 23:56
@basiliskus basiliskus changed the title Organize azure blob outputs with year/month/day folders Organize and cleanup azure blob outputs for Automated Tests Nov 14, 2024
*/
public class AzureBlobFileFetcher implements FileFetcher {

private static final FileFetcher INSTANCE = new AzureBlobFileFetcher();
// We're using UTC for now, but we plan to change the timezone to be more realistic to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep all times in UTC. The working timezones can change and be unknown to future engineers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual problem is that the current scheduled time (12am UTC, which is 4pm PT) is still within working ours for people living in the west coast. We have a fair amount of certainty most people working in TI and RS will be within PT and ET timezones given these are US government contracts. The idea is to make sure the tasks run after anything is pushed to the repositories. I think you are right about the timezone though, we can keep UTC. What we need to change is the 12am scheduled time


public static String buildDatePathPrefix(LocalDate date) {
return String.format(
"%d/%02d/%02d/", date.getYear(), date.getMonthValue(), date.getDayOfMonth());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put a comment here to explain this formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that

.buildClient();

AzureBlobOrganizer blobOrganizer = new AzureBlobOrganizer(blobContainerClient);
blobOrganizer.organizeAndCleanupBlobsByDate(RETENTION_DAYS, TIME_ZONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it looks like we're now always running cleanup, it might be worth adding a note to the README saying that if folks run this test too close to/right before midnight UTC, they might get some unexpected date issues, and should probably just re-run it a few minutes later if that happens

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm that's a good point. That reminds me why we want to change from UTC to another timezone. I have another story where I'll change the timezone to avoid this kind of issues. That also reminds me I should I add something in the readme about the retention time for the files

Copy link

@basiliskus basiliskus merged commit 31438cd into main Nov 15, 2024
17 checks passed
@basiliskus basiliskus deleted the story/1523/org-azure-container branch November 15, 2024 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants