-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speed up GitHub workflows #591
speed up GitHub workflows #591
Conversation
WalkthroughThe changes in this pull request enhance the linting and testing workflows within the GitHub repository, particularly for Python and Go applications. Key updates include renaming the linting workflow, upgrading the checkout action, adding a setup step for Go, and modifying the test command to include new options. Additionally, the Dockerfile for Python has been updated to improve dependency management and build processes, while adjustments have been made to the instantiation of the Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
.github/workflows/perl-build-docker.yml (3)
3-8
: Workflow trigger optimization looks good, consider expanding file paths.The change to make the workflow trigger more specific is a good optimization. It will reduce unnecessary builds, saving time and resources.
Consider if there are any other files or directories that might affect the Perl Dockerfile build. For example, if there are any shared configuration files or dependencies, you might want to include them in the trigger paths. This ensures the workflow runs when any relevant changes are made.
21-28
: Docker layer caching is a great addition, consider cache management.The addition of Docker layer caching is an excellent optimization that should significantly speed up your builds.
To manage cache growth over time, consider adding a size limit or an expiration policy to your cache action. This can prevent your cache from growing too large and potentially slowing down your workflow. For example, you could add a
max-age
parameter to your cache key:key: ${{ runner.os }}-buildx-${{ github.sha }}-${{ github.run_id }} restore-keys: | ${{ runner.os }}-buildx-And then use GitHub Actions' built-in cache eviction policy, which removes least recently used caches when the total size limit is reached.
31-33
: Excellent use of Docker layer caching, consider improving readability.The addition of caching to the Docker build command is a great optimization that aligns well with the new caching step. This should significantly improve build times.
To improve readability and maintainability, consider using YAML's folded style for the multi-line command. This can make the command easier to read and modify in the future. Here's an example:
run: > docker build --cache-from=type=local,src=/tmp/.buildx-cache --cache-to=type=local,dest=/tmp/.buildx-cache,mode=max -t bystro-annotator -f Dockerfile.perl .This format is equivalent to the current command but is more readable and easier to modify.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- .github/workflows/perl-build-docker.yml (2 hunks)
- .github/workflows/python-build-docker.yml (2 hunks)
🧰 Additional context used
🔇 Additional comments (4)
.github/workflows/python-build-docker.yml (4)
3-8
: Excellent optimization of workflow triggers!The changes to the workflow trigger are well-thought-out and align perfectly with the PR objective of speeding up Docker builds. By specifying exact paths that should trigger the workflow, you've effectively reduced unnecessary CI/CD runs. This optimization will save resources and speed up the overall development process.
The specified paths (
Dockerfile.python
,python/**
, andgo/**
) seem appropriate for this workflow, covering all relevant files that could affect the Python Docker build.
21-28
: Great addition of Docker layer caching!The introduction of Docker layer caching is an excellent enhancement that directly addresses the PR's objective of speeding up Docker builds. Here's why this change is beneficial:
- It uses the
actions/cache@v3
action, which is the current best practice for caching in GitHub Actions.- The cache key strategy (
${{ runner.os }}-buildx-${{ github.sha }}
) allows for per-commit caching while also providing a fallback to previous caches, optimizing both specificity and hit rate.- The cache path (
/tmp/.buildx-cache
) is a standard location for buildx caches, ensuring compatibility with other buildx operations.This change will significantly reduce build times for subsequent runs, especially when changes are minor.
31-33
: Effective implementation of Docker build caching!The updates to the Docker build command are well-implemented and crucial for leveraging the newly added caching mechanism. Here's a breakdown of the improvements:
- The
--cache-from
option correctly specifies the source of the cache, allowing Docker to use previously cached layers.- The
--cache-to
option withmode=max
ensures that all layers are cached, maximizing the effectiveness of the cache for future builds.- The multi-line formatting improves readability, making the command easier to understand and maintain.
These changes, in conjunction with the caching step, will significantly reduce build times, especially for builds with minimal changes. This directly contributes to the PR's goal of speeding up Docker builds.
Line range hint
1-33
: Overall, excellent improvements to speed up Docker builds!The changes made to this workflow file are well-implemented and directly address the PR's objective of speeding up Docker builds. The combination of optimized workflow triggers and Docker layer caching will lead to significant performance improvements in the CI/CD pipeline.
To fully verify the impact of these changes, it would be beneficial to compare build times before and after this implementation.
Consider running the following script to analyze the workflow run times:
This script will help quantify the improvement in build times, providing concrete evidence of the optimization's effectiveness.
✅ Verification successful
Please update the verification script to correctly calculate workflow durations:
Verification Successful: Docker build times have improved.
The average build duration decreased from approximately 10 minutes before the changes to 8.53 minutes after implementing Docker layer caching. This confirms that the workflow optimizations effectively speed up Docker builds in the CI/CD pipeline.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Compare workflow run times before and after the changes # Fetch the last 10 workflow runs before this PR gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=10 --json durationMs --jq '.[].durationMs' | awk '{ sum += $1; n++ } END { if (n > 0) print "Average duration before changes: " sum / n / 60000 " minutes" }' # Fetch the workflow runs after this PR (assuming this is the latest commit) gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=5 --json durationMs --jq '.[].durationMs' | awk '{ sum += $1; n++ } END { if (n > 0) print "Average duration after changes: " sum / n / 60000 " minutes" }'Length of output: 935
Script:
#!/bin/bash # Description: Compare workflow run times before and after the changes # Function to calculate average duration in minutes calculate_average_duration() { jq -r ' map( { startedAt: (.startedAt | fromdateiso8601), updatedAt: (.updatedAt | fromdateiso8601) } | (.updatedAt - .startedAt) ) | add / length / 60 ' } # Fetch the last 10 workflow runs before this PR gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=10 --json startedAt,updatedAt | calculate_average_duration | awk '{ print "Average duration before changes: " $1 " minutes" }' # Fetch the workflow runs after this PR (assuming this is the latest commit) gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=5 --json startedAt,updatedAt | calculate_average_duration | awk '{ print "Average duration after changes: " $1 " minutes" }'Length of output: 917
…ar/bystro into feature/improve-docker-build-speed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (7)
.github/workflows/perl-test-install-apt.yml (1)
1-1
: Workflow name change looks good, but consider including 'Test'.The new name "Bystro Annotator Install Script (APT)" is more specific about the workflow's purpose. However, since the workflow still includes testing steps, consider adding 'Test' to the name for completeness.
Suggestion:
-name: Bystro Annotator Install Script (APT) +name: Bystro Annotator Install and Test Script (APT).github/workflows/perl-test-install-rpm.yml (3)
1-1
: Consider refining the workflow name for accuracy.The new name "Bystro Annotator Install Script (RPM)" is more specific, but it doesn't fully capture the testing aspect of the workflow. Consider adjusting it to "Bystro Annotator Install and Test (RPM)" to accurately reflect both the installation and testing steps in the workflow.
3-6
: Approve trigger event changes with a minor suggestion.The updated trigger events are well-aligned with the workflow's purpose and the PR objective. Running on release events instead of pull requests can significantly reduce unnecessary builds, potentially speeding up overall CI/CD processes.
Consider adding a
workflow_call
event to allow this workflow to be reused in other workflows if needed:on: release: types: [created, published] workflow_dispatch: workflow_call:This addition would enhance the workflow's flexibility without affecting its current behavior.
Line range hint
8-124
: Consider further optimizations for faster builds.While the changes to the trigger events will help reduce unnecessary runs, consider the following optimizations to potentially speed up the workflow execution:
Use caching for dependencies:
Add a step to cache the dnf packages to speed up future runs.Parallelize tests:
Theprove
command is already using parallelization with-j$(nproc)
, which is good. Consider if any other steps can be parallelized.Conditional steps:
Some verification steps might be redundant if the installation script is reliable. Consider making some checks conditional or combining them to reduce overall execution time.Here's an example of how you might implement caching:
- name: Cache dnf packages uses: actions/cache@v2 with: path: /var/cache/dnf key: ${{ runner.os }}-dnf-${{ hashFiles('**/install-rpm.sh') }} restore-keys: | ${{ runner.os }}-dnf- - name: Install dependencies run: | sudo dnf makecache sudo dnf install -y tarThese suggestions aim to further align the workflow with the PR objective of speeding up builds.
.github/workflows/python-release.yml (2)
Line range hint
30-81
: Approve linux job changes with a suggestion for bystro-api step.The additions of Golang installation, pip updates, and Python dependency installation are good practices that ensure consistent and up-to-date build environments. The --durations=0 option in pytest is helpful for identifying slow tests.
Consider expanding the bystro-api smoke test to include a basic functionality check, not just --help. This could provide more confidence in the build. For example:
- bystro-api --help + bystro-api --version + bystro-api --test-connection # Replace with an actual test command if available🧰 Tools
🪛 actionlint
58-58: shellcheck reported issue in this script: SC2086:info:17:13: Double quote to prevent globbing and word splitting
(shellcheck)
Line range hint
166-182
: Approve the new release job with a suggestion.The addition of the release job is an excellent automation that aligns well with the new trigger events. It ensures that PyPI releases happen automatically when a new tag is created, which can significantly streamline the release process.
Consider adding a step to create release notes or update a changelog as part of this job. This could be done using a GitHub Action like
release-drafter/release-drafter@v5
. For example:- name: Draft Release Notes uses: release-drafter/release-drafter@v5 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}This would help maintain clear documentation of changes for each release.
.github/workflows/perl-package-build.yml (1)
3-8
: Optimization approved, consider expanding paths and adding documentation.The addition of the
paths
filter to the workflow trigger is a good optimization. It will help reduce unnecessary builds and save computational resources by only running the workflow when relevant files are modified.However, consider the following suggestions:
Evaluate if there are any other directories that might indirectly affect the Perl package build. For example, if there are shared scripts or configuration files in other locations, you might want to include them in the
paths
list.Add a comment explaining the rationale behind the chosen paths. This will help future maintainers understand why these specific directories were selected.
Consider updating the configuration as follows:
on: pull_request: paths: - "perl/**" - "go/**" - "config/**" # Add any other relevant paths here # - "shared/**" # - "scripts/**" # Add a comment explaining the rationale, for example: # This workflow runs only when files that directly affect the Perl package are modified. # The included paths cover the main Perl code, Go dependencies, and configuration files.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- .github/workflows/lint.yml (1 hunks)
- .github/workflows/perl-build-docker.yml (2 hunks)
- .github/workflows/perl-package-build.yml (1 hunks)
- .github/workflows/perl-test-install-apt.yml (1 hunks)
- .github/workflows/perl-test-install-rpm.yml (1 hunks)
- .github/workflows/python-build-docker.yml (2 hunks)
- .github/workflows/python-release.yml (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/perl-build-docker.yml
- .github/workflows/python-build-docker.yml
🧰 Additional context used
🔇 Additional comments (4)
.github/workflows/lint.yml (1)
45-45
: Consider the relevance of--durations=0
flag to PR objectiveThe addition of the
--durations=0
flag to pytest is a good practice for obtaining detailed information about test execution times. This can be valuable for identifying performance bottlenecks in the test suite.However, it's worth noting that this change doesn't directly contribute to the stated PR objective of speeding up Docker builds. While it may provide insights that could lead to optimizations in the CI/CD pipeline, the connection to Docker build speed is indirect at best.
Consider the following:
- Does this change align with the immediate goal of speeding up Docker builds?
- If not, should it be part of a separate PR focused on test optimization?
- Is there a specific reason for adding this flag now, such as recent performance issues in the test suite?
To better understand the impact of this change, you could run the following command to check if there are any particularly slow tests:
This will help determine if there are any tests that might be significantly impacting the overall build time.
.github/workflows/perl-test-install-apt.yml (1)
3-6
: Trigger events change improves workflow efficiency.The modification of trigger events from pull requests to release events and manual dispatch is a good improvement. This change ensures that the installation and testing process is executed at more appropriate times (during releases) while still allowing for manual triggering when needed.
This adjustment aligns well with the PR objective of speeding up Docker builds, as it reduces unnecessary executions during the development process.
.github/workflows/python-release.yml (2)
Line range hint
99-149
: Approve macos job changes.The changes in the macos job mirror those in the linux job, which is good for consistency across platforms. The inclusion of both x86_64 and aarch64 architectures ensures compatibility with both Intel and Apple Silicon Macs, which is excellent for broad user support.
🧰 Tools
🪛 actionlint
126-126: shellcheck reported issue in this script: SC2086:info:17:13: Double quote to prevent globbing and word splitting
(shellcheck)
9-11
: Approve trigger event changes with a consideration.The change from push to release events aligns well with the PR objective to speed up Docker builds by reducing unnecessary workflow runs. This is a good optimization.
Consider if you still need to run this workflow on pull requests for testing purposes. If so, you might want to add a conditional job that runs on pull requests but skips the release steps. Here's a script to check if there are other workflows that run on pull requests:
✅ Verification successful
To proceed with the verification, please run the following corrected script to check for workflows that run on pull requests:
Pull request triggers still present in other workflows.
The removal of the
pull_request
trigger from.github/workflows/python-release.yml
does not impact the ability to test changes before merging, as other workflows continue to run on pull requests.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for workflows that run on pull requests rg --type yaml 'on:.*\n.*pull_request:' .github/workflowsLength of output: 251
Script:
#!/bin/bash # Description: Check for workflows that run on pull requests with multiline support rg --multiline --type yaml 'on:.*\n.*pull_request:' .github/workflowsLength of output: 377
…ar/bystro into feature/improve-docker-build-speed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- .github/workflows/lint.yml (1 hunks)
- .github/workflows/perl-package-build.yml (3 hunks)
- python/requirements-dev.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- python/requirements-dev.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/perl-package-build.yml
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml
38-38: shellcheck reported issue in this script: SC2046:warning:1:19: Quote this to prevent word splitting
(shellcheck)
42-42: shellcheck reported issue in this script: SC2046:warning:2:36: Quote this to prevent word splitting
(shellcheck)
🔇 Additional comments (6)
.github/workflows/lint.yml (6)
1-1
: Improved workflow nameThe updated workflow name "Python Lint and Test" is more descriptive and accurately reflects the purpose of this workflow. This change enhances clarity for developers and maintainers.
7-7
: Consistent job namingThe job name has been updated to "Python Lint and Test", matching the workflow name. This change maintains consistency and improves clarity within the workflow file.
20-30
: Enhanced caching mechanismThe addition of built-in pip caching and Rust build artifact caching is a good optimization that can significantly speed up the workflow execution.
However, the inclusion of Rust caching in a workflow named "Python Lint and Test" is unexpected. Could you please clarify:
- Are there Rust components in this Python project?
- If so, should the workflow name be updated to reflect both Python and Rust?
- Is the
Cargo.lock
file present in the repository root?You can verify the presence of Rust-related files with:
#!/bin/bash # Check for Rust-related files echo "Cargo.toml files:" fd Cargo.toml echo "Cargo.lock file:" fd Cargo.lock echo "*.rs files:" fd -e rs
31-36
: Streamlined installation processThe installation step has been consolidated and simplified, which is a good improvement. The use of
make develop
suggests a standardized setup process, which is a best practice.To ensure the setup process works as expected, please verify:
- Does the
make develop
command handle all necessary setup steps, including any previously managed by virtual environment activation?- Are all required dependencies correctly installed and configured?
You can check the contents of the Makefile to verify the
develop
target:#!/bin/bash # Display the content of the Makefile, focusing on the develop target awk '/^develop:/, /^[a-zA-Z]/' Makefile
47-48
: Enhanced test executionThe updated test command with
--durations=0
and--n auto
options is a good improvement. This will provide timing information for all tests and enable parallel test execution, potentially speeding up the test suite significantly.To ensure optimal test execution, please verify:
- Is pytest-xdist installed and configured correctly in the project's dependencies?
- Are there any tests that are not suitable for parallel execution (e.g., due to shared resources or side effects)?
You can check the project's dependencies with:
If pytest-xdist is not found, consider adding it to the project's development dependencies.
✅ Verification successful
Test Execution Configuration Verified
The presence of
pytest-xdist
in the project's dependencies confirms that parallel test execution is set up correctly. Additionally, no tests were found that would hinder parallel execution, ensuring that the test suite should run efficiently with the updated command.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for pytest-xdist in the project's dependencies grep -E "pytest-xdist|pytest\[xdist\]" python/requirements*.txt python/pyproject.tomlLength of output: 137
Script:
#!/bin/bash # Search for pytest markers that disable parallel execution rg --files-with-matches --ignore-case "pytest.mark.serial" tests/ # Search for file write operations in test files rg --files-with-matches --ignore-case "\.write\(" tests/Length of output: 218
Script:
#!/bin/bash # Find all Python test files based on common naming conventions fd --extension py --regex 'test_.*\.py$|.*_test\.py$' > test_files.txt # Search within the found test files for pytest markers that disable parallel execution rg --files-with-matches --ignore-case "pytest.mark.serial" $(cat test_files.txt) > serial_tests.txt # Search within the found test files for file write operations rg --files-with-matches --ignore-case "\.write\(" $(cat test_files.txt) > write_operations.txtLength of output: 9832
13-15
: Updated checkout action to v4Updating to the latest version of actions is a good practice. However, it's important to ensure that this update doesn't introduce any breaking changes.
Please verify that the
actions/checkout@v4
is compatible with your workflow and doesn't introduce any unexpected behavior. You can check the changelog for any significant changes:✅ Verification successful
Action Required: Correct and Rerun Release Notes Fetch Script
Please use the following corrected shell script to fetch the latest release notes for
actions/checkout@v4
:
Action Required: Correct the Release Notes Fetch Script Syntax
Please use the following corrected shell script to fetch the latest release notes for
actions/checkout@v4
:
Update to actions/checkout@v4 Verified
The update to
actions/checkout@v4
has been verified. Release notes indicate that the update does not introduce any breaking changes and should work seamlessly with your workflow.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Fetch the latest release notes for actions/checkout gh release view --repo actions/checkout v4.0.0Length of output: 264
Script:
#!/bin/bash # Fetch the latest release notes for actions/checkout@v4 gh release view --repo actions/checkout --tag v4.0.0 --json bodyLength of output: 418
Script:
#!/bin/bash # Fetch the latest release notes for actions/checkout@v4 gh release view v4.0.0 --repo actions/checkout --json bodyLength of output: 680
.github/workflows/lint.yml
Outdated
- name: Linting with ruff | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
cd python && ruff $(git ls-files '*.py') | ||
|
||
- name: Static analysis with mypy | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
mypy --version | ||
mypy --config-file python/mypy.ini $(git ls-files '*.py') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved linting and static analysis steps
The adjustments to the linting and static analysis steps are good improvements, particularly the use of git ls-files '*.py'
to ensure only tracked Python files are processed.
However, the static analysis tool has flagged potential issues with word splitting in the commands. To address this, please update the commands as follows:
- cd python && ruff $(git ls-files '*.py')
+ cd python && ruff "$(git ls-files '*.py')"
- mypy --config-file python/mypy.ini $(git ls-files '*.py')
+ mypy --config-file python/mypy.ini "$(git ls-files '*.py')"
These changes will prevent potential issues with filenames containing spaces or special characters.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Linting with ruff | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
cd python && ruff $(git ls-files '*.py') | |
- name: Static analysis with mypy | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
mypy --version | |
mypy --config-file python/mypy.ini $(git ls-files '*.py') | |
- name: Linting with ruff | |
run: | | |
cd python && ruff "$(git ls-files '*.py')" | |
- name: Static analysis with mypy | |
run: | | |
mypy --version | |
mypy --config-file python/mypy.ini "$(git ls-files '*.py')" |
🧰 Tools
🪛 actionlint
38-38: shellcheck reported issue in this script: SC2046:warning:1:19: Quote this to prevent word splitting
(shellcheck)
42-42: shellcheck reported issue in this script: SC2046:warning:2:36: Quote this to prevent word splitting
(shellcheck)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (1)
6-8
: Improved flexibility in data generation. Consider adding type hints.The changes to the
generate_data
function enhance its flexibility by allowing customization of key parameters. This is a good improvement for testing different scenarios.Consider adding type hints to improve code readability and maintainability:
def generate_data(N: int = 10000, p: int = 100, L: int = 3, sigma: float = 1.0) -> tuple:Also, you might want to add a brief docstring explaining the purpose of each parameter and the return values.
.github/workflows/lint.yml (2)
13-15
: Approved: Updated checkout action and added Go setupGreat improvements:
- Updating the checkout action to v4 ensures you're using the latest features and bug fixes.
- Adding Go setup suggests the project now has Go dependencies, which is properly configured with a specific version and caching enabled.
Consider using a version variable for Go to make it easier to update in the future:
env: GO_VERSION: '1.21.3' steps: - name: Set up Go uses: actions/setup-go@v4 with: go-version: ${{ env.GO_VERSION }} cache: trueThis approach allows for easier version management across multiple workflows if needed.
Also applies to: 21-26
44-54
: Approved: Improved setup with virtual environmentGreat improvements in the setup process:
- Adding a virtual environment setup step isolates project dependencies, which is a best practice.
- The installation step is comprehensive, including virtual environment activation, pip upgrade, dependency installation, and development setup.
For consistency and to avoid potential issues with different shell environments, consider using the full path to the virtual environment's activate script:
- name: Install dependencies and build run: | . ~/.venv/ci-venv/bin/activate python -m pip install --upgrade pip pip install -r python/requirements-dev.txt make developThis change uses the POSIX-compliant
.
command instead ofsource
, which may not be available in all shell environments.🧰 Tools
🪛 actionlint
49-49: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
python/python/bystro/prs/prscs_hmcmc.py (2)
138-138
: Consider removing or modifying the debug print statement.The added print statement appears to be for debugging purposes. While it can be helpful during development, it's generally not recommended to leave such statements in production code.
Consider one of the following options:
- Remove the print statement entirely if it's no longer needed.
- Replace it with a proper logging statement, such as:
import logging logging.debug(f"MCMC options: {self.mcmc_options}")- If the information is crucial, consider adding it to the method's docstring or as a comment instead of a print statement.
139-139
: Approve the use of**self.mcmc_options
in MCMC instantiation.The change to use
**self.mcmc_options
when instantiating the MCMC object is a good improvement. It allows for more flexible configuration of MCMC parameters and maintains consistency with the options set in the_fill_mcmc_options
method.Consider updating the docstring of the
fit
method to reflect this change. You could add a note about the MCMC options being configurable through the class initialization. For example:def fit(self, Z, y): """ The fit method. Parameters ---------- Z : np.array-like,(n_samples,p) The covariates of interest y : np.array-like,(n_samples,) The trait, disease, or outcome Returns ------- self Notes ----- MCMC options are configurable through the class initialization and are used in the MCMC run. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (7)
- .github/workflows/lint.yml (1 hunks)
- .github/workflows/perl-build-docker.yml (2 hunks)
- .github/workflows/python-build-docker.yml (2 hunks)
- python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (2 hunks)
- python/python/bystro/prs/prscs_hmcmc.py (2 hunks)
- python/python/bystro/prs/tests/test_prscs_hmcmc.py (1 hunks)
- python/requirements-dev.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/perl-build-docker.yml
- .github/workflows/python-build-docker.yml
- python/requirements-dev.txt
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml
49-49: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
56-56: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
56-56: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
61-61: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
61-61: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
67-67: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
🔇 Additional comments (7)
python/python/bystro/prs/tests/test_prscs_hmcmc.py (1)
22-23
: Consider the trade-off between test speed and effectivenessThe changes to
test_prscs()
significantly reduce the computational complexity:
Data generation (line 22):
- Samples reduced from 100,000 to 100
- Features reduced from 25 to 5
Model instantiation (line 23):
- MCMC options added with very low values (
num_samples: 10, warmup_steps: 1
)While these changes align with the PR objective to speed up Docker builds, they might compromise the test's effectiveness. The drastic reduction in data size and model complexity could make the test less robust in catching potential issues.
To ensure the test remains meaningful, please run the following script to compare the execution time and any differences in test outcomes between the old and new versions:
Based on the results, we can better assess if the speed improvement justifies the potential loss in test robustness. If the time saved is significant and the test outcomes remain consistent, the changes may be acceptable. Otherwise, consider finding a middle ground that balances speed and effectiveness.
python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (2)
Line range hint
1-34
: Overall improvements in test flexibility and configurationThe changes in this file enhance the flexibility of the test setup, which is beneficial for thorough testing and potentially faster builds. The parameterization of
generate_data
and the updated model initialization intest_adversarial_autoencoder
allow for easier configuration and testing of different scenarios.These changes align well with the PR objective of speeding up Docker builds, as they can contribute to more efficient testing processes. Good job on improving the test suite!
23-23
: Updated model initialization. Verify AdversarialAutoencoder interface.The changes to the
AdversarialAutoencoder
initialization look good. The new parameters allow for more flexible configuration of the model.Please ensure that these changes are consistent with the updated
AdversarialAutoencoder
class interface. Run the following script to verify the class definition:Also, consider adding a comment explaining the significance of the
2
parameter and then_iterations
option..github/workflows/lint.yml (4)
1-1
: Improved workflow and job namingThe updated names for both the workflow and the job provide a clearer description of their purpose. This change enhances the overall clarity of the CI/CD process.
Also applies to: 7-7
36-43
: Approved: pip caching addedExcellent addition of pip caching. This addresses the concern raised in the previous comment about Python-specific caching. The cache key and restore keys are properly set up for the pip cache, which will help optimize Python dependency installation times.
This change resolves the concern raised in the previous comment about the removal of Python-specific caching.
65-69
: Approved: Enhanced test executionExcellent updates to the test command:
- Adding
--durations=0
will show the time taken for all tests, helping identify slow tests for potential optimization.- The
-n auto
option enables automatic parallelization of tests, which can significantly speed up test execution.These changes align well with the PR objective of speeding up builds. Great job on improving the efficiency of the testing process!
🧰 Tools
🪛 actionlint
67-67: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
27-35
: Rust caching added, but Python caching removed?The addition of Rust build artifact caching is a good improvement, suggesting the project now includes Rust components. The cache key and restore keys are properly set up for Rust artifacts.
However, it appears that the previous Python-specific caching (for pip) has been removed. This could potentially impact Python dependency installation times.
Could you confirm if the removal of Python-specific caching was intentional? If not, consider keeping both Rust and Python caching to optimize build times for both languages.
To verify the current caching setup, you can run:
This will help ensure that both Python and Rust dependencies are being cached appropriately.
@@ -52,7 +52,7 @@ def _fill_mcmc_options(self, mcmc_options): | |||
""" | |||
default_options = { | |||
"num_chains": 1, | |||
"num_warmup": 1000, | |||
"warmup_steps": 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Incomplete renaming of num_warmup
to warmup_steps
.
The search revealed that num_warmup
is still used in the following files:
python/python/bystro/_template_npr.py
:"num_warmup": 500,
python/python/bystro/prs/prscs_hmcmc.py
:num_warmup : int
Please update these instances to warmup_steps
to maintain consistency across the codebase.
🔗 Analysis chain
Approve renaming of num_warmup
to warmup_steps
.
The change from "num_warmup"
to "warmup_steps"
improves clarity in describing the warmup phase of the MCMC process. This aligns well with common terminology in MCMC literature.
To ensure this change doesn't break existing code, please run the following script to check for any remaining uses of num_warmup
:
If the script returns any results, those occurrences should be updated to use warmup_steps
for consistency.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining uses of 'num_warmup' in Python files
# Test: Search for 'num_warmup' in Python files
rg --type python 'num_warmup'
Length of output: 60
Script:
#!/bin/bash
# Description: Check for any remaining uses of 'num_warmup' in Python files
# Test: Search for 'num_warmup' in Python files
rg --type py 'num_warmup'
Length of output: 159
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
Dockerfile.python (1)
Line range hint
1-62
: Suggestions for Dockerfile optimizationWhile the Dockerfile is well-structured, consider the following improvements:
- Optimize apt-get commands:
Combine apt-get update and install, and clean up in the same layer to reduce image size.-RUN apt-get update && apt-get install -y --no-install-recommends \ +RUN apt-get update && apt-get install -y --no-install-recommends \ build-essential \ git \ wget curl\ openssl libcurl4-openssl-dev libssl-dev \ tar lz4 pigz tabix unzip \ patch \ awscli \ unzip \ libssl-dev \ - pkg-config \ - && curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y \ - && . $HOME/.cargo/env + pkg-config \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install Rust +RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y \ + && . $HOME/.cargo/env
- Consolidate COPY commands:
Group COPY commands to reduce the number of layers.-COPY ./startup.yml ./startup.yml -COPY ./python ./python -COPY ./config ./config +COPY ./startup.yml ./python ./config ./
- Consider adding a .dockerignore file:
This can prevent unnecessary files from being copied into the image, potentially speeding up the build process and reducing image size.These optimizations can help reduce the image size and potentially speed up the build process further.
.github/workflows/lint.yml (4)
27-53
: LGTM: Improved caching strategyExcellent improvements to the caching strategy:
- Adding caches for Rust Cargo, Go modules, and pip suggests a polyglot project structure.
- Using specific file hashes (e.g., Cargo.lock, go.sum) for cache keys is a good practice for proper cache invalidation.
One minor suggestion:
Consider adding a
~/.cache/pip
to therestore-keys
for the pip cache to potentially reuse a cache from a previous build with a different hash:restore-keys: | ${{ runner.os }}-pip- ${{ runner.os }}-This could speed up builds when dependencies haven't changed.
55-68
: LGTM: Improved dependency managementGood improvements in dependency management:
- Using a virtual environment ensures isolation of project dependencies.
- The
make develop
command suggests a standardized build process, which is excellent for consistency.- Setting Go-related environment variables confirms Go usage in the project.
Consider combining the virtual environment setup and dependency installation steps to reduce the number of steps and potential points of failure:
- name: Set up virtual environment and install dependencies env: GOROOT: /opt/hostedtoolcache/go/1.21.3/x64 GOPATH: ${{ env.GOPATH }} run: | python -m venv ~/.venv/ci-venv source ~/.venv/ci-venv/bin/activate python -m pip install --upgrade pip pip install -r python/requirements-dev.txt make developThis change would streamline the workflow and ensure that the virtual environment is used consistently.
🧰 Tools
🪛 actionlint
63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
81-83
: LGTM: Improved test executionThe updates to the pytest command are excellent improvements:
--durations=0
shows the time taken for all tests, helping identify slow tests.-n auto
enables automatic parallelization, potentially speeding up test execution.Consider adding the
--color=yes
flag to ensure colored output in CI logs, which can improve readability:cd python && pytest --durations=0 -n auto --color=yes
Also, you might want to add the
--capture=no
flag to show print statements and logging output for failed tests, which can be helpful for debugging:cd python && pytest --durations=0 -n auto --color=yes --capture=no
🧰 Tools
🪛 actionlint
81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
1-83
: Overall excellent improvements to the CI/CD workflowThe changes in this file significantly enhance the build process:
- Better naming and organization of workflow steps.
- Introduction of Go support, suggesting a more polyglot project structure.
- Improved caching strategy for multiple languages.
- Enhanced linting and testing with parallelization and duration reporting.
These improvements should lead to faster, more reliable builds and better code quality checks. Great work on optimizing the CI/CD pipeline!
As the project seems to be evolving into a polyglot structure with Python, Go, and possibly Rust, consider the following architectural advice:
- Ensure clear documentation of the project's language requirements and setup process.
- Consider using a tool like
asdf
orrtx
for managing multiple language versions consistently across development and CI environments.- If not already in place, implement integration tests that verify the interaction between components written in different languages.
🧰 Tools
🪛 actionlint
63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- .github/workflows/lint.yml (1 hunks)
- Dockerfile.python (1 hunks)
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml
63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
🔇 Additional comments (5)
Dockerfile.python (1)
60-60
: Improved pytest command for faster and more informative test runsThe changes to the pytest command are beneficial:
--durations=0
: This flag shows the time taken for all tests, helping to identify slow tests that might be bottlenecks in the build process.-n auto
: This flag enables parallel test execution, automatically determining the number of CPUs to use. This can significantly speed up the testing process, especially for larger test suites.These improvements align well with the PR objective of speeding up Docker builds by potentially reducing the time taken for tests to run.
.github/workflows/lint.yml (4)
1-3
: LGTM: Improved workflow nameThe updated workflow name "Python Lint and Test" is more descriptive and accurately reflects the purpose of this workflow.
7-12
: LGTM: Consistent job namingThe job name has been updated to "Python Lint and Test", maintaining consistency with the workflow name.
69-79
:⚠️ Potential issueFix potential word splitting issues in linting and static analysis commands
The linting and static analysis steps are good for ensuring code quality. However, there's a potential issue with word splitting in the commands that use
git ls-files '*.py'
.To address this, please update the commands as follows:
- cd python && ruff $(git ls-files '*.py') + cd python && git ls-files '*.py' | xargs ruff - mypy --config-file python/mypy.ini $(git ls-files '*.py') + git ls-files '*.py' | xargs mypy --config-file python/mypy.iniThese changes use
xargs
to properly handle filenames that might contain spaces or special characters.Also, consider adding the
--color=always
flag to the MyPy command to ensure colored output in CI logs, which can improve readability:git ls-files '*.py' | xargs mypy --config-file python/mypy.ini --color=always
Let's verify if there are any Python files with spaces in their names:
#!/bin/bash # Description: Check for Python files with spaces in their names echo "Python files with spaces in their names:" find . -name "* *.py" -type f🧰 Tools
🪛 actionlint
70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
13-26
: LGTM: Improved setup with Go integrationGreat improvements in the setup:
- Upgrading the checkout action to v4 ensures you're using the latest features and bug fixes.
- Adding Go setup suggests the project now has Go dependencies.
Could you clarify the introduction of Go in the project? This change implies new Go dependencies, but it's not immediately clear from the PR description. Let's verify the Go usage:
✅ Verification successful
LGTM: Go integration verified
The addition of Go setup has been confirmed:
- Multiple Go files and configuration files (
go.mod
,go.sum
) are present, indicating intentional use of Go.- The Python import of
go
refers to an internal function, showing proper integration between Python and Go components.- The README provides clear instructions for installing Go, reinforcing its intended usage.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify Go usage in the project # Test: Check for Go files echo "Checking for Go files:" fd -e go # Test: Check for Go-related configuration files echo "Checking for Go-related configuration files:" fd -e mod -e sum # Test: Check for Go imports in Python files echo "Checking for Go imports in Python files:" rg -t python 'import\s+go'Length of output: 512
Script:
#!/bin/bash # Description: Corrected script to verify Go usage in the project # Test: Check for Go files echo "Checking for Go files:" fd -e go # Test: Check for Go-related configuration files echo "Checking for Go-related configuration files:" fd -e mod -e sum # Test: Check for Go imports in Python files echo "Checking for Go imports in Python files:" rg -t py 'import\s+go'Length of output: 712
Script:
#!/bin/bash # Description: Investigate the usage of 'go' in Python files # Test: List Python files that import 'go' echo "Listing Python files that import 'go':" rg -t py 'import\s+go|from\s+.*\s+import\s+go' # Test: Search for 'go' module or class definitions in Python files echo "Searching for 'go' module or class definitions in Python files:" rg -t py 'def\s+go\b|class\s+Go\b' # Test: Check if 'go' is listed in Python dependency files echo "Checking for 'go' in Python dependency files:" rg 'go' README.md requirements.txt Pipfile setup.pyLength of output: 1283
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
.github/workflows/lint.yml (2)
21-26
: Go setup added with cachingThe addition of Go setup with a specific version and caching is a good improvement. This ensures consistency across environments and can significantly speed up builds, aligning with the PR's objective.
Consider using a more flexible version specifier for Go to automatically get minor updates:
go-version: "1.21.x"This will use the latest patch version of Go 1.21, allowing for automatic security updates without breaking changes.
49-52
: Improved test execution with parallelization and duration reportingThe updates to the test command are excellent improvements:
- Activating the virtual environment ensures the correct Python environment is used.
- The
--durations=0
option helps identify slow tests by displaying the time taken for all tests.- The
-n auto
option enables automatic parallelization, which can significantly speed up test execution.These changes align well with the PR's objective of speeding up builds. To further improve the output, consider adding the
-v
flag for more verbose test output:- cd python && pytest --durations=0 -n auto + cd python && pytest -v --durations=0 -n autoThis will provide more detailed information about each test, making it easier to identify and debug any issues that may arise.
🧰 Tools
🪛 actionlint
50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
.github/workflows/perl-package-build.yml (3)
24-31
: LGTM: Effective caching strategy for Perl modulesThe addition of the caching step for Perl modules is an excellent improvement:
- It will speed up subsequent workflow runs by reusing previously installed modules.
- The cache key is well-designed, incorporating both the runner OS and the
cpanfile
hash.- The restore-keys fallback is correctly implemented.
Consider adding a comment explaining the caching strategy for better maintainability:
- name: Cache Perl modules uses: actions/cache@v3 with: path: ~/.perl-cpm # Cache key based on OS and cpanfile hash, with fallback to any previous cache for the OS key: ${{ runner.os }}-perl-${{ hashFiles('perl/cpanfile') }} restore-keys: | ${{ runner.os }}-perl-
Line range hint
33-111
: Consider simplifying Go installationWhile this step wasn't changed in this PR, it's worth noting that the Go installation process is quite complex. Consider simplifying it by using a dedicated GitHub Action for Go installation, such as
actions/setup-go@v4
. This would make the workflow more maintainable and less error-prone.Here's an example of how you could simplify the Go installation:
- name: Install Go uses: actions/setup-go@v4 with: go-version: '1.21.3' # Specify the Go version you needThis approach would handle architecture differences automatically and keep Go up-to-date more easily.
Line range hint
123-137
: LGTM: Unchanged final steps with a suggestionThe final steps of the workflow (checkout, installing Bystro dependencies, testing, and building) remain unchanged and follow good practices:
- Using
actions/checkout@v4
ensures you're using a recent version of the action.- The use of
dzil
(Dist::Zilla) for managing the Perl distribution is a good practice.- The testing and building steps are appropriately placed at the end of the workflow.
While these steps are solid, consider adding a step to upload the built tarball as an artifact. This can be useful for debugging or further processing. Here's an example:
- name: Upload Bystro tarball uses: actions/upload-artifact@v3 with: name: bystro-perl-package path: perl/*.tar.gzThis would make the built package easily accessible from the GitHub Actions interface.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- .github/workflows/lint.yml (1 hunks)
- .github/workflows/perl-build-docker.yml (1 hunks)
- .github/workflows/perl-package-build.yml (3 hunks)
- .github/workflows/perl-test-install-apt.yml (1 hunks)
- .github/workflows/perl-test-install-rpm.yml (1 hunks)
- .github/workflows/python-build-docker.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- .github/workflows/perl-build-docker.yml
- .github/workflows/perl-test-install-apt.yml
- .github/workflows/perl-test-install-rpm.yml
- .github/workflows/python-build-docker.yml
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml
32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
🔇 Additional comments (5)
.github/workflows/lint.yml (4)
1-1
: Improved workflow and job namingThe updated names for both the workflow and the job are more descriptive and accurately reflect the purpose of this CI process. This change enhances clarity and maintainability.
Also applies to: 7-7
13-15
: Checkout action updated to v4Updating the checkout action to v4 is a good practice. It ensures you're using the latest features and security improvements. This change may also contribute to faster builds, aligning with the PR's objective.
27-30
: Virtual environment setup addedAdding a virtual environment setup is a good practice for Python projects. The comment suggests this is needed for maturin, which implies the project might be using Rust bindings for Python.
Could you provide more context on the introduction of maturin? This change might impact build times, so it would be helpful to understand its necessity and potential performance implications.
31-37
: Streamlined installation and build processThe updated installation step, which combines dependency installation and build processes, is a good optimization. Activating the virtual environment before running commands ensures the correct Python environment is used. These changes should contribute to faster build times, aligning with the PR's objective.
🧰 Tools
🪛 actionlint
32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
.github/workflows/perl-package-build.yml (1)
1-1
: LGTM: Improved workflow name and trigger conditionsThe changes to the workflow name and trigger conditions are beneficial:
- The new name "Build Bystro Annotator Perl Package" is more specific and descriptive.
- The updated trigger conditions will help reduce unnecessary workflow runs by only triggering on relevant file changes.
These improvements enhance the clarity and efficiency of the CI/CD process.
Also applies to: 3-8
- name: Static analysis with MyPy | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
mypy --version | ||
mypy --config-file python/mypy.ini $(git ls-files '*.py') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential word splitting issue in MyPy command and version output improvement
The static analysis step with MyPy has been improved by activating the virtual environment and using git ls-files '*.py'
to process only tracked Python files. Printing the MyPy version is a good addition for debugging and reproducibility. However, there's a potential issue with word splitting in the command.
To address this and improve the version output, please update the step as follows:
- name: Static analysis with MyPy
run: |
source ~/.venv/ci-venv/bin/activate
echo "MyPy version: $(mypy --version)"
git ls-files '*.py' | xargs mypy --config-file python/mypy.ini
This change uses xargs
to properly handle filenames that might contain spaces or special characters and improves the version output format.
🧰 Tools
🪛 actionlint
44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
- name: Linting with Ruff | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
cd python && ruff $(git ls-files '*.py') | ||
- name: Static analysis with mypy | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential word splitting issue in Ruff command
The linting step with Ruff has been improved by activating the virtual environment and using git ls-files '*.py'
to process only tracked Python files. However, there's a potential issue with word splitting in the command.
To address this, please update the command as follows:
- cd python && ruff $(git ls-files '*.py')
+ cd python && git ls-files '*.py' | xargs ruff
This change uses xargs
to properly handle filenames that might contain spaces or special characters, addressing the concern raised in the static analysis hint.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Linting with Ruff | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
cd python && ruff $(git ls-files '*.py') | |
- name: Static analysis with mypy | |
- name: Linting with Ruff | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
cd python && git ls-files '*.py' | xargs ruff | |
🧰 Tools
🪛 actionlint
39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
go install github.com/bystrogenomics/[email protected].3 | ||
cpm install -g https://github.com/bystrogenomics/msgpack-perl.git | ||
cpm install -g --no-test MouseX::Getopt | ||
git clone --depth 1 --recurse-submodules https://github.com/salortiz/LMDB_File.git \ | ||
&& cd LMDB_File \ | ||
&& cpanm --quiet . | ||
ALIEN_INSTALL_TYPE=share cpm install -g --test Alien::LMDB | ||
cpm install -g --test LMDB_File | ||
cpm install -g MouseX::Getopt # fails due to differences from expected output; unimportant | ||
cpm install -g --test IO::FDPass | ||
cpm install -g --test Beanstalk::Client | ||
cpm install -g --test Sys::CpuAffinity | ||
cpm install -g --test DBD::MariaDB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve dependency updates with a note on MouseX::Getopt
The updates to the dependency installation process are generally good:
- Updating
bystro-vcf
to version 2.2.3 is likely a necessary version bump. - Adding
--test
flags to most module installations is a good practice to ensure they work correctly in the CI environment.
However, there's a concern with the MouseX::Getopt
installation:
- The comment indicates that it "fails due to differences from expected output".
- This failure is marked as "unimportant", but it's generally not a good practice to ignore test failures.
Consider investigating why MouseX::Getopt
is failing its tests and either:
- Fix the underlying issue causing the test failure.
- Pin to a specific version of
MouseX::Getopt
that passes its tests. - If the test failure is truly inconsequential, document why it's safe to ignore.
Example of pinning to a specific version:
cpm install -g MouseX::[email protected] # Replace 0.34 with a version known to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
python/python/bystro/supervised_ppca/tests/test_generative_missing_pt.py (3)
13-13
: LGTM! Consider parameterizing the number of iterations.The update to include
training_options
in the PPCAM instantiation is correct and aligns with the changes in the class's__init__
method.To improve test flexibility and potentially reduce test execution time, consider parameterizing the number of iterations:
N_ITERATIONS = 20 # Define at the top of the file # Then use it in the test ppcam = PPCAM(n_components=n_components, training_options={"n_iterations": N_ITERATIONS})This way, you can easily adjust the number of iterations for all tests if needed.
22-23
: LGTM! Consider refactoring for clarity and completeness.The update to include
training_options
with bothn_iterations
anduse_gpu
parameters is correct. However, the presence of two PPCAM instantiations with different options in the same test function might be confusing.Consider refactoring the test to clearly separate and test both GPU and non-GPU versions:
def test_ppcam_fit(): # ... (previous code) for use_gpu in [True, False]: ppcam = PPCAM( n_components=n_components, training_options={"use_gpu": use_gpu, "n_iterations": N_ITERATIONS} ) ppcam.fit(X) # Test if the model is fitted assert ppcam.W_ is not None assert ppcam.sigma2_ is not None assert ppcam.p is not NoneThis approach tests both GPU and non-GPU versions, ensuring complete coverage of the PPCAM functionality.
32-32
: LGTM! Consider using a fixture to reduce code duplication.The updates to include
training_options
in the PPCAM instantiation for bothtest_ppcam_get_covariance
andtest_ppcam_get_noise
functions are correct and consistent with the changes intest_ppcam_fit
.To reduce code duplication and improve maintainability, consider using a pytest fixture to create the PPCAM instance:
import pytest @pytest.fixture def ppcam_instance(n_components): return PPCAM(n_components=n_components, training_options={"n_iterations": N_ITERATIONS}) def test_ppcam_get_covariance(ppcam_instance): # Use ppcam_instance instead of creating a new one def test_ppcam_get_noise(ppcam_instance): # Use ppcam_instance instead of creating a new oneThis approach centralizes the PPCAM instantiation, making it easier to update all tests if the initialization parameters change in the future.
Also applies to: 45-45
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- .github/workflows/lint.yml (1 hunks)
- Dockerfile.python (1 hunks)
- python/python/bystro/supervised_ppca/tests/test_generative_missing_pt.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Dockerfile.python
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml
32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
🔇 Additional comments (7)
.github/workflows/lint.yml (7)
1-1
: Improved workflow nameThe change from "lint" to "Python Lint and Test" provides a more accurate and descriptive name for the workflow. This improvement enhances clarity and aligns with best practices for CI/CD pipeline naming.
7-7
: Improved job naming and action versioningThe job name update to "Python Lint and Test" maintains consistency with the workflow name, enhancing overall clarity. Additionally, upgrading the checkout action to v4 is a good practice, ensuring you're using the latest features and bug fixes.
Also applies to: 13-14
50-52
: Enhanced test command with duration reportingThe addition of
--durations=0
to the pytest command is a good improvement. This option shows the time taken for all tests, which can be very useful for identifying slow tests and optimizing overall test performance.This change aligns well with the PR objective of speeding up builds. Consider adding a comment explaining the purpose of this flag, such as:
# Report duration of all tests to help identify and optimize slow tests cd python && pytest --durations=0This will help other contributors understand the reasoning behind this addition.
🧰 Tools
🪛 actionlint
50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
1-52
: Overall workflow improvementsThe changes made to this workflow file significantly enhance its functionality and align well with the PR objective of speeding up Docker builds:
- The workflow and job names are now more descriptive.
- The addition of Go setup suggests expanded language support.
- The use of a virtual environment improves dependency isolation.
- Updates to linting, static analysis, and testing steps improve efficiency and provide more useful output.
These changes collectively contribute to a more robust and informative CI process. The suggestions provided in the review comments, particularly regarding word splitting in commands and additional documentation, will further refine the workflow.
Great job on these improvements! They should contribute to faster and more reliable builds.
🧰 Tools
🪛 actionlint
32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
27-30
: Addition of virtual environment setupThe introduction of a virtual environment setup is an excellent practice for Python projects. It ensures isolation and reproducibility of the project's dependencies.
The comment suggests that this is needed for maturin, which typically implies Rust integration with Python. Could you provide more context on the use of maturin in this project? This information would be valuable for understanding the project's architecture and dependencies.
Let's check for Rust-related files and maturin configuration:
✅ Verification successful
Approval of Virtual Environment Setup
The virtual environment setup is correctly implemented to support maturin for Rust integration. The presence of Rust files and maturin configuration in the project verifies this necessity.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for Rust files and maturin configuration echo "Rust files in the project:" fd -e rs echo "Maturin configuration:" fd pyproject.toml -x grep -i maturin {} || echo "No maturin configuration found"Length of output: 346
31-37
: Improved dependency installation processThe updated dependency installation process is well-structured:
- It correctly activates the virtual environment before installing dependencies.
- Using a separate
requirements-dev.txt
file for development dependencies is a good practice.- The addition of the
make develop
command suggests a streamlined setup process.These changes will help ensure consistent environments across development and CI.
Consider adding a comment or documentation about what the
make develop
command does, as it might not be immediately clear to new contributors.Let's check the contents of the Makefile:
✅ Verification successful
Document the
make develop
commandThe
make develop
command performs the following actions:
- install-go: Installs the necessary Go dependencies.
- build-python-dev: Builds the Python components in development mode.
- ray-start-local: Starts Ray locally with specific configurations.
Adding this documentation will help new contributors understand the setup process and the purpose of each step involved in
make develop
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check the contents of the Makefile echo "Contents of Makefile:" fd Makefile -x cat {} || echo "No Makefile found"Length of output: 1298
🧰 Tools
🪛 actionlint
32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
21-25
: Addition of Go setupThe inclusion of Go setup in the workflow is a good addition, ensuring that the necessary Go environment is available for the project. The specific version (1.21.3) and enabled caching will contribute to consistent and faster builds.
However, could you please clarify the role of Go in this project? This addition suggests that the project now includes Go components or dependencies, which might be worth mentioning in the PR description or documentation.
To understand the Go usage in the project, let's check for Go files:
✅ Verification successful
Go setup verified
The presence of Go files in the project confirms that adding the Go setup to the workflow is necessary and appropriate. No further actions are required.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for Go files in the project echo "Go files in the project:" fd -e goLength of output: 346
Script:
#!/bin/bash # Correctly check for Go files in the project echo "Go files in the project:" fd -e goLength of output: 375
- name: Linting with Ruff | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
cd python && ruff $(git ls-files '*.py') | ||
- name: Static analysis with mypy | ||
|
||
- name: Static analysis with MyPy | ||
run: | | ||
source ~/.venv/ci-venv/bin/activate | ||
mypy --version | ||
mypy --config-file python/mypy.ini $(git ls-files '*.py') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved linting and static analysis steps
The updates to the linting and static analysis steps are good improvements:
- Activating the virtual environment ensures the correct versions of tools are used.
- Using
git ls-files '*.py'
ensures only tracked Python files are processed.
However, there's a potential issue with word splitting in these commands, as flagged by the static analysis tool.
To address this, please update the commands as follows:
- cd python && ruff $(git ls-files '*.py')
+ cd python && git ls-files '*.py' | xargs ruff
- mypy --config-file python/mypy.ini $(git ls-files '*.py')
+ git ls-files '*.py' | xargs mypy --config-file python/mypy.ini
These changes use xargs
to properly handle filenames that might contain spaces or special characters.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Linting with Ruff | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
cd python && ruff $(git ls-files '*.py') | |
- name: Static analysis with mypy | |
- name: Static analysis with MyPy | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
mypy --version | |
mypy --config-file python/mypy.ini $(git ls-files '*.py') | |
- name: Linting with Ruff | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
cd python && git ls-files '*.py' | xargs ruff | |
- name: Static analysis with MyPy | |
run: | | |
source ~/.venv/ci-venv/bin/activate | |
mypy --version | |
git ls-files '*.py' | xargs mypy --config-file python/mypy.ini |
🧰 Tools
🪛 actionlint
39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting
(shellcheck)
44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location
(shellcheck)
44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting
(shellcheck)
Summary by CodeRabbit
New Features
Improvements