-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update datumaro #8923
base: develop
Are you sure you want to change the base?
update datumaro #8923
Conversation
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe pull request involves significant updates to the CVAT dataset management system, primarily focusing on restructuring import statements and class inheritance. The changes center around modifications in the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
cvat/apps/dataset_manager/formats/cvat.py (1)
Line range hint
1642-1642
: Implement__iter__
and__len__
methods forCvatTaskOrJobDataExtractor
.Since
CvatTaskOrJobDataExtractor
now inherits fromdm.SubsetBase
, it should implement the__iter__
and__len__
methods to function properly as a dataset extractor. Currently, these methods are missing, which may lead to errors or unexpected behavior.Apply this diff to add the missing methods:
+ def __iter__(self): + yield from self._items + + def __len__(self): + return len(self._items)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
cvat/apps/dataset_manager/bindings.py
(4 hunks)cvat/apps/dataset_manager/formats/cvat.py
(2 hunks)cvat/apps/dataset_manager/formats/icdar.py
(1 hunks)cvat/apps/dataset_manager/formats/market1501.py
(1 hunks)cvat/apps/dataset_manager/formats/mots.py
(1 hunks)cvat/apps/dataset_manager/formats/velodynepoint.py
(1 hunks)cvat/apps/dataset_manager/formats/yolo.py
(1 hunks)cvat/requirements/base.in
(1 hunks)cvat/requirements/base.txt
(13 hunks)
✅ Files skipped from review due to trivial changes (5)
- cvat/apps/dataset_manager/formats/mots.py
- cvat/apps/dataset_manager/formats/velodynepoint.py
- cvat/apps/dataset_manager/formats/market1501.py
- cvat/apps/dataset_manager/formats/yolo.py
- cvat/apps/dataset_manager/formats/icdar.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: build
- GitHub Check: generate_github_pages
- GitHub Check: testing
- GitHub Check: Analyze (python)
- GitHub Check: Linter
- GitHub Check: Analyze (javascript)
🔇 Additional comments (10)
cvat/apps/dataset_manager/formats/cvat.py (5)
25-26
: Verify the correctness of the new import statements.The added import of
DEFAULT_SUBSET_NAME
andDatasetBase
fromdatumaro.components.dataset_base
should be checked to ensure they are necessary and correctly used in the code below.
54-54
: Ensure compatibility with the new base classDatasetBase
.Changing the base class of
CvatExtractor
toDatasetBase
may require implementing or overriding specific methods expected byDatasetBase
. Please verify that all necessary methods are correctly implemented to prevent potential errors.
Line range hint
1653-1655
: Confirm correct initialization ofdm.SubsetBase
.Ensure that the parameters
media_type
andsubset
are correctly passed when initializingdm.SubsetBase
. They should align with the expected values for different dimensions to avoid potential issues.
Line range hint
1751-1751
: EnsureCVATProjectDataExtractor
is compatible withdm.DatasetBase
.The class
CVATProjectDataExtractor
now inherits fromdm.DatasetBase
. Verify that all required methods and properties are implemented appropriately, and any differences from the previous base classdm.Extractor
are properly handled.
Line range hint
1761-1762
: Confirm correct initialization ofdm.DatasetBase
.Ensure that the
media_type
parameter is correctly set when initializingdm.DatasetBase
, matching the dimension specified (dm.Image
for 2D data anddm.PointCloud
for 3D data).cvat/requirements/base.in (1)
15-15
: Verify the updateddatumaro
dependency commit hash.Ensure that the new commit hash
dad90703c8db9e7d7e692250ea8caf8af233c9a0
for thedatumaro
package corresponds to the intended version and does not introduce any breaking changes or compatibility issues.You can run the following script to check the commit message and changes associated with the new commit hash:
✅ Verification successful
The updated
datumaro
dependency commit is safe to useThe commit
dad90703
only removes thetritonclient[all]
dependency due to its unsafe transitive dependency onsetuptools
. This is a clean, well-documented change that doesn't introduce any breaking modifications to the codebase.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Retrieve the commit message and diff for the specified Datumaro commit. # Fetch the commit message git clone https://github.com/cvat-ai/datumaro.git temp_datumaro_repo cd temp_datumaro_repo git show --name-status dad90703c8db9e7d7e692250ea8caf8af233c9a0 # Clean up cd .. rm -rf temp_datumaro_repoLength of output: 592
Script:
#!/bin/bash # Get the detailed changes and check if this is the latest commit git clone https://github.com/cvat-ai/datumaro.git temp_datumaro_repo cd temp_datumaro_repo # Show the actual changes in the file git show dad90703c8db9e7d7e692250ea8caf8af233c9a0 -- requirements-core.txt # Check if this is the latest commit git rev-parse HEAD cd .. rm -rf temp_datumaro_repoLength of output: 935
cvat/apps/dataset_manager/bindings.py (1)
1751-1751
:CVATProjectDataExtractor
changes look good.The class
CVATProjectDataExtractor
correctly inherits fromdm.DatasetBase
, and necessary methods like__iter__
,__len__
, andcategories
are properly implemented.cvat/requirements/base.txt (3)
155-156
: Verify compatibility of new ML dependencies.Multiple ML-related packages have been added (huggingface-hub, nltk, openvino, scikit-learn, tokenizers). These additions suggest significant new ML capabilities in datumaro.
Could you clarify:
- Are these new dependencies required for all CVAT installations or only for specific features?
- What is the impact on system requirements (memory, CPU, GPU)?
Consider:
- Making ML-related dependencies optional through extras_require in setup.py
- Documenting system requirements for ML features
Also applies to: 217-218, 221-224, 363-368, 387-388
155-156
: Review security implications of new dependencies.New packages with network capabilities (huggingface-hub, openvino, pyarrow) have been added. These could potentially impact the security posture of the application.
Consider:
- Documenting network access requirements for new dependencies
- Implementing appropriate network security controls
- Adding these packages to security scanning workflows
Run security check:
Also applies to: 221-224, 263-264
✅ Verification successful
Security check passed for new dependencies
The security scan shows no known vulnerabilities in the specified versions of huggingface-hub (0.27.1), openvino, and pyarrow (18.1.0). However, since these packages have network capabilities, the architectural recommendations about documenting network access requirements and implementing appropriate security controls remain valid.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for known vulnerabilities in new dependencies # Check for security advisories for new packages for pkg in "huggingface-hub" "openvino" "pyarrow"; do echo "Checking $pkg..." gh api graphql -f query=" { securityVulnerabilities(first: 5, ecosystem: PIP, package: \"$pkg\") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }" doneLength of output: 2139
Script:
#!/bin/bash # Check pyarrow version in requirements rg "^pyarrow==" cvat/requirements/Length of output: 79
227-228
: Verify intentional version downgrades.The orjson package has been downgraded from 3.10.13 to 3.10.12. While protobuf has been upgraded, we should verify if the orjson downgrade is intentional.
Also applies to: 251-258
cvat/requirements/base.txt
Outdated
# pyjwt | ||
cycler==0.12.1 | ||
# via matplotlib | ||
datumaro @ git+https://github.com/cvat-ai/datumaro.git@08e77b216080555a57e12c01625be8c8201e3131 | ||
datumaro @ git+https://github.com/cvat-ai/datumaro.git@dad90703c8db9e7d7e692250ea8caf8af233c9a0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Consider using a stable release version instead of a development branch commit
The specified commit dad90703
is from a development branch dl/sync-with-upstream-5
and not associated with any release tag. The repository has stable releases available up to v1.9.0. Using a commit from a development branch may lead to stability issues.
🔗 Analysis chain
Verify the referenced datumaro commit.
The datumaro package is being updated to a new version. Let's verify this commit exists and is stable.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the referenced datumaro commit exists and is the latest stable version
# Check if the commit exists in the repository
gh repo clone cvat-ai/datumaro /tmp/datumaro
cd /tmp/datumaro
if git cat-file -e dad90703c8db9e7d7e692250ea8caf8af233c9a0 2>/dev/null; then
echo "Commit exists"
# Show commit message and date to verify it's recent and stable
git log -1 dad90703c8db9e7d7e692250ea8caf8af233c9a0 --pretty=format:"%h %ad %s"
else
echo "Commit not found"
fi
Length of output: 2314
Script:
#!/bin/bash
# Get detailed information about the commit
cd /tmp/datumaro
echo "=== Commit Details ==="
git log -1 dad90703c8db9e7d7e692250ea8caf8af233c9a0 --pretty=format:'%h %ai %s'
echo -e "\n\n=== Latest Tag ==="
git describe --tags --abbrev=0
echo -e "\n\n=== Relationship to Latest Tag ==="
git describe --tags dad90703c8db9e7d7e692250ea8caf8af233c9a0
Length of output: 337
Script:
#!/bin/bash
cd /tmp/datumaro
echo "=== All Tags (sorted) ==="
git tag -l | sort -V
echo -e "\n=== Branch Information ==="
git branch -r --contains dad90703c8db9e7d7e692250ea8caf8af233c9a0
echo -e "\n=== Commit Context ==="
git log --oneline -n 5 dad90703c8db9e7d7e692250ea8caf8af233c9a0
Length of output: 775
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #8923 +/- ##
========================================
Coverage 73.84% 73.85%
========================================
Files 412 412
Lines 44261 44262 +1
Branches 3993 3993
========================================
+ Hits 32686 32688 +2
+ Misses 11575 11574 -1
|
Quality Gate passedIssues Measures |
Motivation and context
depends on cvat-ai/datumaro#77
How has this been tested?
Checklist
develop
branch(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
Dependency Updates
datumaro
package to a new commitfilelock
,huggingface-hub
,openvino
, andscikit-learn
orjson
,protobuf
, and other existing dependenciesCode Structure Changes
ItemTransform
andDatasetItem
imports to different modules