Zarr streaming e2e test #8137

Open. Wants to merge 17 commits into master.

Conversation

frcroth (Member)

@frcroth commented Oct 21, 2024

TODOs:

  • Reset changes to tools/test.sh
  • Zarr 3 tests (zarr json and cube)

Issues:

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced error messages for user authentication, organization management, and dataset handling, providing clearer user feedback.
    • Introduced new end-to-end tests for Zarr data formats, expanding dataset functionality coverage.
  • Bug Fixes

    • Clarified error messages related to dataset access and user management.
  • Chores

    • Updated .gitignore to track the test-dataset.zip file for important test data.
    • Modified test setup to ensure necessary datasets are prepared before running tests, enhancing robustness.
    • Adjusted volume mapping for the e2e-tests service in the configuration.
    • Enhanced CircleCI configuration to ensure the directory for test data is created with appropriate permissions before tests run.

@frcroth changed the title from "Add zip test dataset and extract it in e2e test setup" to "Zarr streaming e2e test" on Oct 21, 2024
@frcroth requested a review from fm3 on October 21, 2024 15:18
@frcroth marked this pull request as ready for review on October 21, 2024 15:24

@fm3 (Member) left a comment

Nice stuff! I added a couple of questions

frontend/javascripts/test/e2e-setup.ts (2 outdated review comments, resolved)
@@ -118,6 +118,8 @@ jobs:
- run:
name: Run end-to-end tests
command: |
mkdir -p "binaryData/Connectomics department"
chmod 777 "binaryData/Connectomics department"
Member

below it looks like the new test actually uses Organization_X. Why is it necessary to create the Connectomics department directory here?

Member Author

I don't know why, but the volumes for the docker compose are arranged in such a way:

- ./binaryData/Connectomics department:/home/${USER_NAME:-sbt-user}/webknossos/binaryData/Organization_X

Member

Hm, I think that is a legacy artifact, from when things were named differently. Could you name it Organization_X there too and see if that breaks anything?
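
For illustration, the renamed mapping in docker-compose.yml would then presumably read as follows (the host-side directory name is an assumption; only the rename to Organization_X is confirmed by the change summary further down):

- ./binaryData/Organization_X:/home/${USER_NAME:-sbt-user}/webknossos/binaryData/Organization_X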

test/e2e/End2EndSpec.scala (2 outdated review comments, resolved)
@frcroth requested a review from fm3 on October 23, 2024 08:29

coderabbitai bot commented Nov 4, 2024

Walkthrough

The changes encompass modifications to user-facing error messages and status notifications in the conf/messages file, updates to the docker-compose.yml file for service volume mapping, enhancements to end-to-end tests in datasets.e2e.ts and End2EndSpec.scala, and adjustments in the e2e-setup.ts file for handling volatile values. A new entry was also added to the .gitignore file to track a specific zip file. These updates collectively improve error clarity, testing robustness, and configuration management.

Changes

File: Change Summary
conf/messages: Updated and added error messages related to user authentication, organization management, dataset handling, and annotations; removed messages for the braintracing service.
docker-compose.yml: Changed volume mapping for the e2e-tests service from Connectomics department to Organization_X.
frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts: Added replaceVolatileValues import; modified test cases to use this function; introduced new test cases for Zarr streaming.
frontend/javascripts/test/e2e-setup.ts: Added sortingKey to volatileKeys; updated writeTypeCheckingFile function; modified global fetch function for URL handling.
test/dataset/.gitignore: Added entry to track test-dataset.zip.
test/e2e/End2EndSpec.scala: Added ensureTestDataset method to prepare test datasets; modified test case to invoke this method.

Assessment against linked issues

Objective Addressed Explanation
Add end to end tests for zarr streaming (#7997)
Ensure robust testing for dataset functionalities

🐇 In the meadow, we hop and play,
With messages clear, come what may!
Zarr streaming tests now take their flight,
Ensuring our code is robust and bright.
So let’s celebrate with a joyful cheer,
For changes that bring us all good cheer! 🌼✨



@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
test/e2e/End2EndSpec.scala (2)

52-53: Consider making paths configurable

The hardcoded paths for test dataset and data directory could make the tests brittle. Consider:

  1. Moving these to configuration
  2. Making them relative to the project root
-    val testDatasetPath = "test/dataset/test-dataset.zip"
-    val dataDirectory = new File("binaryData/Organization_X")
+    val testDatasetPath = config.getString("test.dataset.path")
+    val dataDirectory = new File(config.getString("test.data.directory"))

51-71: Consider adding test data cleanup

The method creates test data but doesn't handle cleanup. This could lead to accumulated test data over multiple test runs.

Consider adding a cleanup method and using it in an after block:

private def cleanupTestDataset(): Unit = {
  val dataDirectory = new File("binaryData/Organization_X/test-dataset")
  if (dataDirectory.exists()) {
    // File#delete only removes empty directories, so remove the contents recursively first
    def deleteRecursively(f: File): Unit = {
      Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
      f.delete()
    }
    deleteRecursively(dataDirectory)
  }
}

// Use in an after block
override def after = {
  cleanupTestDataset()
  super.after
}
frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts (1)

73-110: Consider adding data content validation

While snapshot testing is good for regression testing, consider adding explicit assertions about the data content to catch potential data corruption issues early.

For example:

 test("Zarr streaming", async (t) => {
   const zattrsResp = await fetch("/data/zarr/Organization_X/test-dataset/segmentation/.zattrs", {
     headers: new Headers(),
   });
   const zattrs = await zattrsResp.text();
+  const zattrsJson = JSON.parse(zattrs);
+  t.truthy(zattrsJson.dimension_separator, "Missing dimension_separator in .zattrs");
   t.snapshot(zattrs);

   const rawDataResponse = await fetch(
     "/data/zarr/Organization_X/test-dataset/segmentation/1/0.1.1.0",
     {
       headers: new Headers(),
     },
   );
   const bytes = await rawDataResponse.arrayBuffer();
+  t.true(bytes.byteLength > 0, "Raw data should not be empty");
   const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
   t.snapshot(base64);
 });
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f937be0 and 459ff77.

⛔ Files ignored due to path filters (2)
  • frontend/javascripts/test/snapshots/public-test/test-bundle/test/backend-snapshot-tests/datasets.e2e.js.snap is excluded by !**/*.snap
  • test/dataset/test-dataset.zip is excluded by !**/*.zip
📒 Files selected for processing (6)
  • conf/messages (0 hunks)
  • docker-compose.yml (1 hunks)
  • frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts (4 hunks)
  • frontend/javascripts/test/e2e-setup.ts (2 hunks)
  • test/dataset/.gitignore (1 hunks)
  • test/e2e/End2EndSpec.scala (4 hunks)
💤 Files with no reviewable changes (1)
  • conf/messages
✅ Files skipped from review due to trivial changes (2)
  • docker-compose.yml
  • test/dataset/.gitignore
🔇 Additional comments (7)
test/e2e/End2EndSpec.scala (3)

3-3: LGTM: Import added for ZipIO utility

The import is correctly added to support the use of ZipIO.unzipToFolder as suggested in previous reviews.


12-13: LGTM: File handling imports added

The imports for File and Paths are appropriate for the file operations being performed.


33-33: LGTM: Test data preparation added

The ensureTestDataset() call is correctly placed to prepare test data before running the tests.

frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts (3)

2-8: LGTM: Import and setup changes look good

The addition of replaceVolatileValues and the dataset check trigger in the setup enhance test stability by handling volatile values and ensuring proper dataset initialization.

Also applies to: 24-24


39-39: LGTM: Consistent handling of volatile values

The systematic application of replaceVolatileValues across all snapshot assertions ensures consistent and reliable test outcomes by normalizing volatile data.

Also applies to: 44-44, 51-51, 59-59
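
As a rough usage sketch of this pattern only (import path, endpoint, and test title are assumptions for illustration; the actual helpers and test bodies live in e2e-setup.ts and datasets.e2e.ts):

import test from "ava";
// Hypothetical import path; the real location of the helper may differ.
import { replaceVolatileValues } from "../e2e-setup";

test("dataset listing snapshot (sketch)", async (t) => {
  // Hypothetical endpoint; the real tests use the project's own API utilities.
  const response = await fetch("/api/datasets", { headers: new Headers() });
  const datasets = await response.json();
  // Volatile fields (ids, timestamps, sortingKey, ...) are normalized before the snapshot comparison.
  t.snapshot(replaceVolatileValues(datasets));
});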


91-110: 🛠️ Refactor suggestion

Add error handling and verify experimental endpoint usage

The test implementation needs error handling, and we should verify the experimental endpoint usage.

  1. Add error handling:
 test("Zarr 3 streaming", async (t) => {
   const zarrJsonResp = await fetch(
     "/data/zarr3_experimental/Organization_X/test-dataset/segmentation/zarr.json",
     {
       headers: new Headers(),
     },
   );
+  t.true(zarrJsonResp.ok, `Failed to fetch zarr.json: ${zarrJsonResp.statusText}`);
   const zarrJson = await zarrJsonResp.text();
   t.snapshot(zarrJson);

   const rawDataResponse = await fetch(
     "/data/zarr3_experimental/Organization_X/test-dataset/segmentation/1/0.1.1.0",
     {
       headers: new Headers(),
     },
   );
+  t.true(rawDataResponse.ok, `Failed to fetch raw data: ${rawDataResponse.statusText}`);
   const bytes = await rawDataResponse.arrayBuffer();
   const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
   t.snapshot(base64);
 });

Let's verify the experimental endpoint usage:

✅ Verification successful

Let me check the implementation details of these files to understand how the experimental endpoint is used.


Experimental endpoint usage is consistent with backend implementation

The experimental endpoint usage in the test is aligned with the backend implementation in DSRemoteTracingstoreClient.scala. The backend has proper handling for both Zarr v2 and experimental Zarr v3 endpoints through the getZarrVersionDependantSubPath helper. The test is correctly exercising this experimental endpoint.

However, the error handling suggestion in the review comment remains valid:

  • Add response status checks using t.true(response.ok) for both fetch calls
  • Include status text in error messages for better debugging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other occurrences of the experimental endpoint
# to ensure consistent usage across the codebase
rg -l "zarr3_experimental" --type ts --type scala

Length of output: 220


Script:

#!/bin/bash
# Check the implementation in both files
echo "=== Frontend Test Implementation ==="
rg -A 10 "zarr3_experimental" frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts

echo -e "\n=== Scala Implementation ==="
rg -A 10 "zarr3_experimental" webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteTracingstoreClient.scala

Length of output: 2631

frontend/javascripts/test/e2e-setup.ts (1)

134-136: LGTM! Import statement addition improves type safety.

The addition of the type import statement in the generated TypeScript file ensures proper type checking and follows TypeScript best practices.

Comment on lines +62 to +70
if (!dataDirectory.listFiles().exists(_.getName == "test-dataset"))
ZipIO.unzipToFolder(
testDatasetZip,
Paths.get(dataDirectory.toPath.toString, "test-dataset"),
includeHiddenFiles = true,
hiddenFilesWhitelist = List(),
truncateCommonPrefix = true,
excludeFromPrefix = None
)

🛠️ Refactor suggestion

Improve error handling in unzip operation

The unzip operation could fail silently. Consider:

  1. Adding error handling for the unzip operation
  2. Validating the unzipped content
-    if (!dataDirectory.listFiles().exists(_.getName == "test-dataset"))
-      ZipIO.unzipToFolder(
+    if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
+      try {
+        ZipIO.unzipToFolder(
           testDatasetZip,
           Paths.get(dataDirectory.toPath.toString, "test-dataset"),
           includeHiddenFiles = true,
           hiddenFilesWhitelist = List(),
           truncateCommonPrefix = true,
           excludeFromPrefix = None
-      )
+        )
+        // Validate unzipped content
+        if (!new File(dataDirectory, "test-dataset").exists()) {
+          throw new Exception("Failed to unzip test dataset properly")
+        }
+      } catch {
+        case e: Exception => 
+          logger.error("Failed to prepare test dataset", e)
+          throw e
+      }
+    }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +73 to +89
test("Zarr streaming", async (t) => {
const zattrsResp = await fetch("/data/zarr/Organization_X/test-dataset/segmentation/.zattrs", {
headers: new Headers(),
});
const zattrs = await zattrsResp.text();
t.snapshot(zattrs);

const rawDataResponse = await fetch(
"/data/zarr/Organization_X/test-dataset/segmentation/1/0.1.1.0",
{
headers: new Headers(),
},
);
const bytes = await rawDataResponse.arrayBuffer();
const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
t.snapshot(base64);
});

🛠️ Refactor suggestion

Add error handling for fetch requests

While the test implementation is good, it should handle potential network failures or invalid responses.

Consider adding error handling:

 test("Zarr streaming", async (t) => {
   const zattrsResp = await fetch("/data/zarr/Organization_X/test-dataset/segmentation/.zattrs", {
     headers: new Headers(),
   });
+  t.true(zattrsResp.ok, `Failed to fetch .zattrs: ${zattrsResp.statusText}`);
   const zattrs = await zattrsResp.text();
   t.snapshot(zattrs);

   const rawDataResponse = await fetch(
     "/data/zarr/Organization_X/test-dataset/segmentation/1/0.1.1.0",
     {
       headers: new Headers(),
     },
   );
+  t.true(rawDataResponse.ok, `Failed to fetch raw data: ${rawDataResponse.statusText}`);
   const bytes = await rawDataResponse.arrayBuffer();
   const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
   t.snapshot(base64);
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test("Zarr streaming", async (t) => {
const zattrsResp = await fetch("/data/zarr/Organization_X/test-dataset/segmentation/.zattrs", {
headers: new Headers(),
});
const zattrs = await zattrsResp.text();
t.snapshot(zattrs);
const rawDataResponse = await fetch(
"/data/zarr/Organization_X/test-dataset/segmentation/1/0.1.1.0",
{
headers: new Headers(),
},
);
const bytes = await rawDataResponse.arrayBuffer();
const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
t.snapshot(base64);
});
test("Zarr streaming", async (t) => {
const zattrsResp = await fetch("/data/zarr/Organization_X/test-dataset/segmentation/.zattrs", {
headers: new Headers(),
});
t.true(zattrsResp.ok, `Failed to fetch .zattrs: ${zattrsResp.statusText}`);
const zattrs = await zattrsResp.text();
t.snapshot(zattrs);
const rawDataResponse = await fetch(
"/data/zarr/Organization_X/test-dataset/segmentation/1/0.1.1.0",
{
headers: new Headers(),
},
);
t.true(rawDataResponse.ok, `Failed to fetch raw data: ${rawDataResponse.statusText}`);
const bytes = await rawDataResponse.arrayBuffer();
const base64 = btoa(String.fromCharCode(...new Uint8Array(bytes)));
t.snapshot(base64);
});

@@ -39,6 +39,7 @@ const volatileKeys: Array<string | number | symbol> = [
"lastActivity",
"tracingTime",
"tracingId",
"sortingKey",

💡 Codebase verification

Snapshots need to be updated due to "sortingKey" changes

The codebase contains snapshot files that reference "sortingKey", specifically in datasets.e2e.js.md. Adding "sortingKey" to volatileKeys means these snapshots need to be recomputed as they will now handle this property differently during testing.

  • frontend/javascripts/test/snapshots/public-test/test-bundle/test/backend-snapshot-tests/datasets.e2e.js.md contains "sortingKey" references that will be affected
  • Other snapshot files may also need updates when tests are re-run with the new volatileKeys configuration
🔗 Analysis chain

Verify snapshot recomputation after adding "sortingKey".

The addition of "sortingKey" to volatileKeys is appropriate. As per the comment above the array, snapshots need to be recomputed when this array is modified.

Run the following script to check if there are any snapshot files that need updating:


Let me check the content of these snapshot files to see if they contain any instances of "sortingKey" that might be affected.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for snapshot files that might need updating
# Look for snapshot files in the test directory
fd -e snap -e snapshot . frontend/javascripts/test

Length of output: 1460


Script:

#!/bin/bash
# Search for "sortingKey" in snapshot files to understand the impact
rg "sortingKey" frontend/javascripts/test/snapshots/

Length of output: 193

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
test/e2e/End2EndSpec.scala (4)

51-56: Consider making paths configurable

The hardcoded paths "test/dataset/test-dataset.zip" and "binaryData/Organization_X" should ideally be configuration values for better flexibility and maintainability.

+ private val TestDatasetPath = "test/dataset/test-dataset.zip"
+ private val DataDirectoryPath = "binaryData/Organization_X"

  private def ensureTestDataset(): Unit = {
-   val testDatasetPath = "test/dataset/test-dataset.zip"
-   val dataDirectory = new File("binaryData/Organization_X")
+   val testDatasetPath = TestDatasetPath
+   val dataDirectory = new File(DataDirectoryPath)

51-86: Consider breaking down the method into smaller, focused functions

The ensureTestDataset method has multiple responsibilities. Consider breaking it down into smaller, focused functions for better maintainability and testing:

  1. validatePaths
  2. createDirectories
  3. unzipDataset
  4. validateDataset
private def validatePaths(testDatasetZip: File): Unit = {
  if (!testDatasetZip.exists()) {
    throw new Exception("Test dataset zip file does not exist.")
  }
}

private def createDirectories(dataDirectory: File): Unit = {
  if (!dataDirectory.exists()) {
    dataDirectory.mkdirs()
  }
}

private def unzipDataset(testDatasetZip: File, dataDirectory: File): Unit = {
  if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
    ZipIO.unzipToFolder(
      testDatasetZip,
      Paths.get(dataDirectory.toPath.toString, "test-dataset"),
      includeHiddenFiles = true,
      hiddenFilesWhitelist = List(),
      truncateCommonPrefix = true,
      excludeFromPrefix = None
    )
  }
}

private def validateDataset(dataDirectory: File): Unit = {
  // ... validation logic ...
}

private def ensureTestDataset(): Unit = {
  val testDatasetZip = new File(TestDatasetPath)
  val dataDirectory = new File(DataDirectoryPath)
  
  validatePaths(testDatasetZip)
  createDirectories(dataDirectory)
  unzipDataset(testDatasetZip, dataDirectory)
  validateDataset(dataDirectory)
}

73-79: Improve error messages with more context

The error messages could be more descriptive to help with debugging.

-     throw new Exception("Test dataset was not unzipped successfully.")
+     throw new Exception(s"Failed to unzip test dataset to ${dataDirectory.getAbsolutePath}/test-dataset")
-     throw new Exception("Required file does not exist.")
+     throw new Exception(s"Required file ${testFile.getAbsolutePath} does not exist")

62-70: Consider adding logging for better debugging

Add logging statements to track the progress of dataset preparation, especially during CI/CD runs.

     if (!dataDirectory.listFiles().exists(_.getName == "test-dataset"))
+      logger.info(s"Unzipping test dataset to ${dataDirectory.getAbsolutePath}/test-dataset")
       ZipIO.unzipToFolder(
         testDatasetZip,
         Paths.get(dataDirectory.toPath.toString, "test-dataset"),
         includeHiddenFiles = true,
         hiddenFilesWhitelist = List(),
         truncateCommonPrefix = true,
         excludeFromPrefix = None
       )
+      logger.info("Test dataset unzipped successfully")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 459ff77 and 152acca.

📒 Files selected for processing (1)
  • test/e2e/End2EndSpec.scala (4 hunks)
🔇 Additional comments (2)
test/e2e/End2EndSpec.scala (2)

3-3: LGTM: Imports are appropriate

The new imports are well-scoped and necessary for the file operations being performed.

Also applies to: 12-13


33-34: LGTM: Proper test data initialization

The dataset preparation is correctly placed before the test execution, ensuring prerequisites are met.

Comment on lines +80 to +85
val testFileSource = scala.io.Source.fromFile(testFile)
val testFileContent = try testFileSource.mkString
finally testFileSource.close()
if (testFileContent.isEmpty) {
throw new Exception("Required file is empty.")
}

🛠️ Refactor suggestion

Use try-with-resources for file handling

The file resource handling could be improved using Scala's Using to ensure proper resource cleanup.

-   val testFileSource = scala.io.Source.fromFile(testFile)
-   val testFileContent = try testFileSource.mkString
-   finally testFileSource.close()
-   if (testFileContent.isEmpty) {
-     throw new Exception("Required file is empty.")
-   }
+   scala.util.Using(scala.io.Source.fromFile(testFile)) { source =>
+     val content = source.mkString
+     if (content.isEmpty) {
+       throw new Exception("Required file is empty.")
+     }
+     content
+   }.getOrElse(throw new Exception("Failed to read test file"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val testFileSource = scala.io.Source.fromFile(testFile)
val testFileContent = try testFileSource.mkString
finally testFileSource.close()
if (testFileContent.isEmpty) {
throw new Exception("Required file is empty.")
}
scala.util.Using(scala.io.Source.fromFile(testFile)) { source =>
val content = source.mkString
if (content.isEmpty) {
throw new Exception("Required file is empty.")
}
content
}.getOrElse(throw new Exception("Failed to read test file"))

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
test/e2e/End2EndSpec.scala (1)

51-87: Enhance error messages with more context.

The error messages could be more descriptive to help with debugging test failures.

-      throw new Exception("Test dataset zip file does not exist.")
+      throw new Exception(s"Test dataset zip file does not exist at path: $testDatasetPath")

-      throw new Exception("Test dataset was not unzipped successfully.")
+      throw new Exception(s"Failed to unzip test dataset to: ${dataDirectory.getAbsolutePath}/test-dataset")

-      throw new Exception("Required file does not exist.")
+      throw new Exception(s"Required file does not exist: ${testFile.getAbsolutePath}")

-      throw new Exception("Required file is empty.")
+      throw new Exception(s"Required file is empty: ${testFile.getAbsolutePath}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 152acca and 1d72230.

📒 Files selected for processing (1)
  • test/e2e/End2EndSpec.scala (4 hunks)
🔇 Additional comments (5)
test/e2e/End2EndSpec.scala (5)

3-3: LGTM: Imports are appropriate for the new functionality.

The added imports for ZipIO, File, and Paths are necessary for the dataset management functionality being implemented.

Also applies to: 12-13


33-34: LGTM: Proper test setup sequence.

The ensureTestDataset() call is correctly placed before the HTTP request and webdriver tests, ensuring proper test data initialization.


63-71: Improve error handling in unzip operation.

The unzip operation needs better error handling and validation.

The previous review comment about improving error handling in the unzip operation is still valid. Please implement the suggested changes for proper error handling and validation.


81-86: Use Scala's Using for resource management.

The file reading operation should use Scala's Using for proper resource management.

The previous review comment about using try-with-resources is still valid. Please implement the suggested changes for better resource management.


51-87: Verify test dataset structure in the codebase.

Let's verify the test dataset structure to ensure it contains the necessary Zarr files for streaming tests.

✅ Verification successful

Let me gather more information about the test dataset structure by examining the test files and related configurations.

This will help us understand the expected structure and validate if Zarr-related test configurations are properly set up.


Based on the gathered information, I can now provide the final response:

Test dataset structure verification is appropriate for its purpose

The test dataset setup in End2EndSpec.scala is correctly structured for its intended use. While it doesn't explicitly contain Zarr files, this is expected as:

  • The file datasource-properties.json is the primary requirement for the test setup
  • Zarr-specific tests are separately handled in Zarr3TestSuite.scala with mock data
  • The codebase shows extensive Zarr support with proper implementation files, but streaming tests don't necessarily require actual Zarr files in the test dataset
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify test dataset structure and content

# Check if the test dataset zip exists and examine its structure
if [ -f "test/dataset/test-dataset.zip" ]; then
    echo "Test dataset zip contents:"
    unzip -l test/dataset/test-dataset.zip
    
    # Look for Zarr-related files
    echo -e "\nSearching for Zarr files in the repository:"
    rg -l "\.zarr" || echo "No .zarr files found"
    rg -l "zarr.json" || echo "No zarr.json files found"
fi

Length of output: 4562


Script:

#!/bin/bash
# Check test files and configurations related to the test dataset

# Look for test dataset references and configurations
echo "Checking test dataset references:"
rg -l "test-dataset" test/

# Check content of test configuration files
echo -e "\nChecking Zarr3TestSuite.scala:"
cat test/backend/Zarr3TestSuite.scala

# Look for dataset properties file references
echo -e "\nChecking for dataset properties references:"
rg "datasource-properties.json" test/

Length of output: 2508

if (!dataDirectory.exists()) {
dataDirectory.mkdirs()
}
"chmod 777 binaryData/Organization_X".run()

⚠️ Potential issue

Replace shell command with Java API for permission management.

Using shell commands for permission management is platform-dependent and potentially unsafe. Consider using Java's NIO API instead.

-    "chmod 777 binaryData/Organization_X".run()
+    dataDirectory.setReadable(true, false)
+    dataDirectory.setWritable(true, false)
+    dataDirectory.setExecutable(true, false)
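
The suggestion above uses the java.io.File permission setters; if the NIO API mentioned in the comment text is preferred instead, a minimal sketch could look like this (assuming a POSIX filesystem, as in the CI environment):

import java.nio.file.{Files, Paths}
import java.nio.file.attribute.PosixFilePermissions

// Equivalent of chmod 777 via java.nio; throws UnsupportedOperationException on non-POSIX filesystems.
Files.setPosixFilePermissions(
  Paths.get("binaryData/Organization_X"),
  PosixFilePermissions.fromString("rwxrwxrwx")
)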

Committable suggestion skipped: line range outside the PR's diff.

@frcroth requested review from fm3 and removed request for fm3 on November 4, 2024 10:38

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
test/e2e/End2EndSpec.scala (1)

51-70: Consider enhancing error handling and logging

While the implementation is generally good, consider adding more detailed error messages and logging:

  1. Add logging for directory creation and zip extraction steps
  2. Include file paths in error messages for better debugging
 private def ensureTestDataset(): Unit = {
   val testDatasetPath = "test/dataset/test-dataset.zip"
   val dataDirectory = new File("binaryData/Organization_X")
   if (!dataDirectory.exists()) {
+    logger.info(s"Creating directory: ${dataDirectory.getAbsolutePath}")
     dataDirectory.mkdirs()
   }
   val testDatasetZip = new File(testDatasetPath)
   if (!testDatasetZip.exists()) {
-    throw new Exception("Test dataset zip file does not exist.")
+    throw new Exception(s"Test dataset zip file does not exist at: ${testDatasetZip.getAbsolutePath}")
   }
   // Skip unzipping if the test dataset is already present
-  if (!dataDirectory.listFiles().exists(_.getName == "test-dataset"))
+  if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
+    logger.info(s"Unzipping test dataset from ${testDatasetZip.getAbsolutePath}")
     ZipIO.unzipToFolder(
       testDatasetZip,
       Paths.get(dataDirectory.toPath.toString, "test-dataset"),
       includeHiddenFiles = true,
       hiddenFilesWhitelist = List(),
       truncateCommonPrefix = true,
       excludeFromPrefix = None
     )
+  }
.circleci/config.yml (1)

121-121: Consider using more restrictive permissions.

While 777 permissions work in CI, they are overly permissive. Consider using more restrictive permissions (e.g., 755) if possible, as this follows security best practices.
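
A hedged sketch of the more restrictive variant for the CircleCI run step (directory name taken from the diff under review; whether 755 is sufficient for the container user has not been verified here):

mkdir -p "binaryData/Organization_X"
chmod 755 "binaryData/Organization_X"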

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1d72230 and 595701e.

📒 Files selected for processing (2)
  • .circleci/config.yml (1 hunks)
  • test/e2e/End2EndSpec.scala (4 hunks)
🔇 Additional comments (3)
test/e2e/End2EndSpec.scala (2)

3-3: LGTM: Import statements are appropriate

The added imports are necessary for file operations and ZIP handling in the new dataset preparation functionality.

Also applies to: 12-13


33-34: LGTM: Proper test data initialization

The dataset preparation is correctly placed before the HTTP request and webdriver tests, ensuring test data availability.

.circleci/config.yml (1)

121-121: Verify the directory naming consistency.

Based on the previous review discussion, confirm if using Organization_X here aligns with the rest of the codebase, as there were mentions of legacy naming patterns (Connectomics).

✅ Verification successful

Directory naming Organization_X is consistent with the codebase

The naming pattern Organization_X is consistently used across the configuration files, specifically in:

  • .circleci/config.yml for directory creation
  • docker-compose.yml for volume mounting

There are no references to legacy naming patterns like "Connectomics", and the same directory structure is maintained throughout the configuration files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining references to Connectomics in the codebase
# to ensure naming consistency

echo "Searching for 'Connectomics' references in docker-compose.yml and e2e test files..."
rg -i "connectomics" docker-compose.yml "test/**/*.{ts,scala}"

Length of output: 306


Script:

#!/bin/bash
# Let's do a broader search for both Organization_X and any potential legacy naming patterns
echo "Searching for Organization_X references..."
rg -i "organization_x" --type-add 'config:*.{yml,yaml,json}' --type config

echo -e "\nSearching for directory structure..."
fd -t d -d 3 "Organization_X\|organization_x\|connectomics" -x ls -ld {}

echo -e "\nChecking CircleCI config for similar directory creation patterns..."
rg "mkdir.*binary" .circleci/config.yml

Length of output: 735

Comment on lines +72 to +85
// Test if the dataset was unzipped successfully
if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
throw new Exception("Test dataset was not unzipped successfully.")
}
val testFile = new File(dataDirectory, "test-dataset/datasource-properties.json")
if (!testFile.exists()) {
throw new Exception("Required file does not exist.")
}
val testFileSource = scala.io.Source.fromFile(testFile)
val testFileContent = try testFileSource.mkString
finally testFileSource.close()
if (testFileContent.isEmpty) {
throw new Exception("Required file is empty.")
}

🛠️ Refactor suggestion

Improve resource management and validation robustness

The current implementation has two areas for improvement:

  1. Use Scala's Using for better resource management (as suggested in previous review)
  2. Add more comprehensive validation of the unzipped content
-    val testFileSource = scala.io.Source.fromFile(testFile)
-    val testFileContent = try testFileSource.mkString
-    finally testFileSource.close()
-    if (testFileContent.isEmpty) {
-      throw new Exception("Required file is empty.")
-    }
+    val testFileContent = scala.util.Using(scala.io.Source.fromFile(testFile)) { source =>
+      val content = source.mkString
+      if (content.isEmpty) {
+        throw new Exception(s"Required file is empty: ${testFile.getAbsolutePath}")
+      }
+      // Additional validation could be added here
+      if (!content.contains("\"dataFormat\"")) {
+        throw new Exception(s"Invalid datasource properties file format: ${testFile.getAbsolutePath}")
+      }
+      content
+    }.getOrElse(throw new Exception(s"Failed to read test file: ${testFile.getAbsolutePath}"))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Test if the dataset was unzipped successfully
if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
throw new Exception("Test dataset was not unzipped successfully.")
}
val testFile = new File(dataDirectory, "test-dataset/datasource-properties.json")
if (!testFile.exists()) {
throw new Exception("Required file does not exist.")
}
val testFileSource = scala.io.Source.fromFile(testFile)
val testFileContent = try testFileSource.mkString
finally testFileSource.close()
if (testFileContent.isEmpty) {
throw new Exception("Required file is empty.")
}
// Test if the dataset was unzipped successfully
if (!dataDirectory.listFiles().exists(_.getName == "test-dataset")) {
throw new Exception("Test dataset was not unzipped successfully.")
}
val testFile = new File(dataDirectory, "test-dataset/datasource-properties.json")
if (!testFile.exists()) {
throw new Exception("Required file does not exist.")
}
val testFileContent = scala.util.Using(scala.io.Source.fromFile(testFile)) { source =>
val content = source.mkString
if (content.isEmpty) {
throw new Exception(s"Required file is empty: ${testFile.getAbsolutePath}")
}
// Additional validation could be added here
if (!content.contains("\"dataFormat\"")) {
throw new Exception(s"Invalid datasource properties file format: ${testFile.getAbsolutePath}")
}
content
}.getOrElse(throw new Exception(s"Failed to read test file: ${testFile.getAbsolutePath}"))


Successfully merging this pull request may close these issues.

Add end to end tests for zarr streaming
2 participants