
speed up GitHub workflows #591

Merged

Conversation

@akotlar (Collaborator) commented on Oct 13, 2024

  • speed up docker builds

Summary by CodeRabbit

  • New Features

    • Enhanced workflow configurations for linting and testing processes, now named "Python Lint and Test."
    • Introduced caching for Go modules to improve efficiency.
    • Added a setup step for Go version 1.21.3.
  • Improvements

    • Updated trigger conditions for workflows to respond to release events and specific file changes.
    • Modified pytest commands to provide detailed output on test durations.
    • Renamed workflows for clarity regarding their purpose.
    • Expanded Dockerfile to include additional dependencies for a comprehensive environment setup.


coderabbitai bot commented Oct 13, 2024

Caution

Review failed

The head commit changed during the review from 8b34c7d to 1700691.

Walkthrough

The changes in this pull request enhance the linting and testing workflows within the GitHub repository, particularly for Python and Go applications. Key updates include renaming the linting workflow, upgrading the checkout action, adding a setup step for Go, and modifying the test command to include new options. Additionally, the Dockerfile for Python has been updated to improve dependency management and build processes, while adjustments have been made to the instantiation of the PPCAM class in specific test functions.

Changes

File Path — Change Summary

.github/workflows/lint.yml
  - Updated workflow and job names.
  - Upgraded checkout action to version 4.
  - Added setup step for Go.
  - Removed Python caching; added Go caching.
  - Modified test command to include --durations=0.

Dockerfile.python
  - Updated base image to python:3.11.8-bookworm.
  - Expanded command for installing dependencies.
  - Modified pytest command to include --durations=0.

python/python/bystro/supervised_ppca/tests/test_generative_missing_pt.py
  - Updated PPCAM class instantiation in tests to include n_iterations in training_options.

Possibly related PRs

  • Fixed various issues #559: This PR involves changes to the PPCAM class instantiation in tests, which may relate to the testing framework updates in the main PR, particularly the modifications to test commands and configurations.
  • update ray, and reduce test runtime #589: This PR updates dependencies like scikit-learn, which could be relevant to the testing and linting processes mentioned in the main PR, especially since both involve enhancements to the testing framework and runtime efficiency.

Suggested reviewers

  • cristinaetrv
  • austinTalbot7241993

Poem

🐇 In the meadow where our code takes flight,
Workflows now shine, oh what a sight!
With each new change, we leap with cheer,
Building and testing, our goals are clear!
Through Python and Go, we craft with grace,
In the garden of code, we find our place! 🌼


@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
.github/workflows/perl-build-docker.yml (3)

3-8: Workflow trigger optimization looks good, consider expanding file paths.

The change to make the workflow trigger more specific is a good optimization. It will reduce unnecessary builds, saving time and resources.

Consider if there are any other files or directories that might affect the Perl Dockerfile build. For example, if there are any shared configuration files or dependencies, you might want to include them in the trigger paths. This ensures the workflow runs when any relevant changes are made.
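
For reference, a scoped trigger along these lines is what the comment has in mind; the exact path list below is an assumption for illustration (Dockerfile.perl plus the Perl and Go sources), not a copy of the workflow file:

on:
  pull_request:
    paths:
      - "Dockerfile.perl"
      - "perl/**"
      - "go/**"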


21-28: Docker layer caching is a great addition, consider cache management.

The addition of Docker layer caching is an excellent optimization that should significantly speed up your builds.

To manage cache growth over time, consider how old caches will be evicted. This prevents the cache from growing stale and potentially slowing down your workflow. For example, you could make each run's cache key unique by appending the run ID, while keeping broader restore-keys as fallbacks:

key: ${{ runner.os }}-buildx-${{ github.sha }}-${{ github.run_id }}
restore-keys: |
  ${{ runner.os }}-buildx-

And then use GitHub Actions' built-in cache eviction policy, which removes least recently used caches when the total size limit is reached.
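
Putting the two ideas together, the caching step might look roughly like the sketch below, assuming the standard actions/cache action and the /tmp/.buildx-cache path referenced elsewhere in this review:

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    # Unique key per run, with broader fallbacks so older caches can still be restored
    key: ${{ runner.os }}-buildx-${{ github.sha }}-${{ github.run_id }}
    restore-keys: |
      ${{ runner.os }}-buildx-${{ github.sha }}-
      ${{ runner.os }}-buildx-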


31-33: Excellent use of Docker layer caching, consider improving readability.

The addition of caching to the Docker build command is a great optimization that aligns well with the new caching step. This should significantly improve build times.

To improve readability and maintainability, consider using YAML's folded style for the multi-line command. This can make the command easier to read and modify in the future. Here's an example:

run: >
  docker build
  --cache-from=type=local,src=/tmp/.buildx-cache
  --cache-to=type=local,dest=/tmp/.buildx-cache,mode=max
  -t bystro-annotator
  -f Dockerfile.perl
  .

This format is equivalent to the current command but is more readable and easier to modify.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 52779e0 and 15e7bcb.

📒 Files selected for processing (2)
  • .github/workflows/perl-build-docker.yml (2 hunks)
  • .github/workflows/python-build-docker.yml (2 hunks)
🧰 Additional context used
🔇 Additional comments (4)
.github/workflows/python-build-docker.yml (4)

3-8: Excellent optimization of workflow triggers!

The changes to the workflow trigger are well-thought-out and align perfectly with the PR objective of speeding up Docker builds. By specifying exact paths that should trigger the workflow, you've effectively reduced unnecessary CI/CD runs. This optimization will save resources and speed up the overall development process.

The specified paths (Dockerfile.python, python/**, and go/**) seem appropriate for this workflow, covering all relevant files that could affect the Python Docker build.
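
A trigger block matching that description would look roughly like the following; this is a sketch based on the paths named above, and the pull_request event is an assumption rather than a quote from the file:

on:
  pull_request:
    paths:
      - "Dockerfile.python"
      - "python/**"
      - "go/**"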


21-28: Great addition of Docker layer caching!

The introduction of Docker layer caching is an excellent enhancement that directly addresses the PR's objective of speeding up Docker builds. Here's why this change is beneficial:

  1. It uses the actions/cache@v3 action, which is the current best practice for caching in GitHub Actions.
  2. The cache key strategy (${{ runner.os }}-buildx-${{ github.sha }}) allows for per-commit caching while also providing a fallback to previous caches, optimizing both specificity and hit rate.
  3. The cache path (/tmp/.buildx-cache) is a standard location for buildx caches, ensuring compatibility with other buildx operations.

This change will significantly reduce build times for subsequent runs, especially when changes are minor.
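
For context, a caching step with the values quoted above would look roughly like this sketch (the restore-keys fallback is inferred from the described fallback behavior):

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-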


31-33: Effective implementation of Docker build caching!

The updates to the Docker build command are well-implemented and crucial for leveraging the newly added caching mechanism. Here's a breakdown of the improvements:

  1. The --cache-from option correctly specifies the source of the cache, allowing Docker to use previously cached layers.
  2. The --cache-to option with mode=max ensures that all layers are cached, maximizing the effectiveness of the cache for future builds.
  3. The multi-line formatting improves readability, making the command easier to understand and maintain.

These changes, in conjunction with the caching step, will significantly reduce build times, especially for builds with minimal changes. This directly contributes to the PR's goal of speeding up Docker builds.
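
A build step using those flags might read as follows; this is a sketch assuming BuildKit/buildx is available on the runner, and the bystro-python image tag is a placeholder rather than the workflow's actual tag:

- name: Build Docker image with layer caching
  run: >
    docker build
    --cache-from=type=local,src=/tmp/.buildx-cache
    --cache-to=type=local,dest=/tmp/.buildx-cache,mode=max
    -t bystro-python
    -f Dockerfile.python
    .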


Line range hint 1-33: Overall, excellent improvements to speed up Docker builds!

The changes made to this workflow file are well-implemented and directly address the PR's objective of speeding up Docker builds. The combination of optimized workflow triggers and Docker layer caching will lead to significant performance improvements in the CI/CD pipeline.

To fully verify the impact of these changes, it would be beneficial to compare build times before and after this implementation.

Consider running the following script to analyze the workflow run times:

This script will help quantify the improvement in build times, providing concrete evidence of the optimization's effectiveness.

✅ Verification successful

Please update the verification script to correctly calculate workflow durations:


Verification Successful: Docker build times have improved.

The average build duration decreased from approximately 10 minutes before the changes to 8.53 minutes after implementing Docker layer caching. This confirms that the workflow optimizations effectively speed up Docker builds in the CI/CD pipeline.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Compare workflow run times before and after the changes

# Fetch the last 10 workflow runs before this PR
gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=10 --json durationMs --jq '.[].durationMs' | awk '{ sum += $1; n++ } END { if (n > 0) print "Average duration before changes: " sum / n / 60000 " minutes" }'

# Fetch the workflow runs after this PR (assuming this is the latest commit)
gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=5 --json durationMs --jq '.[].durationMs' | awk '{ sum += $1; n++ } END { if (n > 0) print "Average duration after changes: " sum / n / 60000 " minutes" }'

Length of output: 935


Script:

#!/bin/bash
# Description: Compare workflow run times before and after the changes

# Function to calculate average duration in minutes
calculate_average_duration() {
  jq -r '
    map(
      {
        startedAt: (.startedAt | fromdateiso8601),
        updatedAt: (.updatedAt | fromdateiso8601)
      } | 
      (.updatedAt - .startedAt)
    ) 
    | add / length / 60
  ' 
}

# Fetch the last 10 workflow runs before this PR
gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=10 --json startedAt,updatedAt | calculate_average_duration | awk '{ print "Average duration before changes: " $1 " minutes" }'

# Fetch the workflow runs after this PR (assuming this is the latest commit)
gh run list --workflow=".github/workflows/python-build-docker.yml" --limit=5 --json startedAt,updatedAt | calculate_average_duration | awk '{ print "Average duration after changes: " $1 " minutes" }'

Length of output: 917

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (7)
.github/workflows/perl-test-install-apt.yml (1)

1-1: Workflow name change looks good, but consider including 'Test'.

The new name "Bystro Annotator Install Script (APT)" is more specific about the workflow's purpose. However, since the workflow still includes testing steps, consider adding 'Test' to the name for completeness.

Suggestion:

-name: Bystro Annotator Install Script (APT)
+name: Bystro Annotator Install and Test Script (APT)
.github/workflows/perl-test-install-rpm.yml (3)

1-1: Consider refining the workflow name for accuracy.

The new name "Bystro Annotator Install Script (RPM)" is more specific, but it doesn't fully capture the testing aspect of the workflow. Consider adjusting it to "Bystro Annotator Install and Test (RPM)" to accurately reflect both the installation and testing steps in the workflow.


3-6: Approve trigger event changes with a minor suggestion.

The updated trigger events are well-aligned with the workflow's purpose and the PR objective. Running on release events instead of pull requests can significantly reduce unnecessary builds, potentially speeding up overall CI/CD processes.

Consider adding a workflow_call event to allow this workflow to be reused in other workflows if needed:

on:
  release:
    types: [created, published]
  workflow_dispatch:
  workflow_call:

This addition would enhance the workflow's flexibility without affecting its current behavior.


Line range hint 8-124: Consider further optimizations for faster builds.

While the changes to the trigger events will help reduce unnecessary runs, consider the following optimizations to potentially speed up the workflow execution:

  1. Use caching for dependencies:
    Add a step to cache the dnf packages to speed up future runs.

  2. Parallelize tests:
    The prove command is already using parallelization with -j$(nproc), which is good. Consider if any other steps can be parallelized.

  3. Conditional steps:
    Some verification steps might be redundant if the installation script is reliable. Consider making some checks conditional or combining them to reduce overall execution time.

Here's an example of how you might implement caching:

- name: Cache dnf packages
  uses: actions/cache@v2
  with:
    path: /var/cache/dnf
    key: ${{ runner.os }}-dnf-${{ hashFiles('**/install-rpm.sh') }}
    restore-keys: |
      ${{ runner.os }}-dnf-

- name: Install dependencies
  run: |
    sudo dnf makecache
    sudo dnf install -y tar

These suggestions aim to further align the workflow with the PR objective of speeding up builds.

.github/workflows/python-release.yml (2)

Line range hint 30-81: Approve linux job changes with a suggestion for bystro-api step.

The additions of Golang installation, pip updates, and Python dependency installation are good practices that ensure consistent and up-to-date build environments. The --durations=0 option in pytest is helpful for identifying slow tests.

Consider expanding the bystro-api smoke test to include a basic functionality check, not just --help. This could provide more confidence in the build. For example:

- bystro-api --help
+ bystro-api --version
+ bystro-api --test-connection  # Replace with an actual test command if available
🧰 Tools
🪛 actionlint

58-58: shellcheck reported issue in this script: SC2086:info:17:13: Double quote to prevent globbing and word splitting

(shellcheck)


Line range hint 166-182: Approve the new release job with a suggestion.

The addition of the release job is an excellent automation that aligns well with the new trigger events. It ensures that PyPI releases happen automatically when a new tag is created, which can significantly streamline the release process.

Consider adding a step to create release notes or update a changelog as part of this job. This could be done using a GitHub Action like release-drafter/release-drafter@v5. For example:

- name: Draft Release Notes
  uses: release-drafter/release-drafter@v5
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

This would help maintain clear documentation of changes for each release.

.github/workflows/perl-package-build.yml (1)

3-8: Optimization approved, consider expanding paths and adding documentation.

The addition of the paths filter to the workflow trigger is a good optimization. It will help reduce unnecessary builds and save computational resources by only running the workflow when relevant files are modified.

However, consider the following suggestions:

  1. Evaluate if there are any other directories that might indirectly affect the Perl package build. For example, if there are shared scripts or configuration files in other locations, you might want to include them in the paths list.

  2. Add a comment explaining the rationale behind the chosen paths. This will help future maintainers understand why these specific directories were selected.

Consider updating the configuration as follows:

on:
  pull_request:
    paths:
      - "perl/**"
      - "go/**"
      - "config/**"
      # Add any other relevant paths here
      # - "shared/**"
      # - "scripts/**"

# Add a comment explaining the rationale, for example:
# This workflow runs only when files that directly affect the Perl package are modified.
# The included paths cover the main Perl code, Go dependencies, and configuration files.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 15e7bcb and 9af818f.

📒 Files selected for processing (7)
  • .github/workflows/lint.yml (1 hunks)
  • .github/workflows/perl-build-docker.yml (2 hunks)
  • .github/workflows/perl-package-build.yml (1 hunks)
  • .github/workflows/perl-test-install-apt.yml (1 hunks)
  • .github/workflows/perl-test-install-rpm.yml (1 hunks)
  • .github/workflows/python-build-docker.yml (2 hunks)
  • .github/workflows/python-release.yml (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/perl-build-docker.yml
  • .github/workflows/python-build-docker.yml
🧰 Additional context used
🔇 Additional comments (4)
.github/workflows/lint.yml (1)

45-45: Consider the relevance of the --durations=0 flag to the PR objective

The addition of the --durations=0 flag to pytest is a good practice for obtaining detailed information about test execution times. This can be valuable for identifying performance bottlenecks in the test suite.

However, it's worth noting that this change doesn't directly contribute to the stated PR objective of speeding up Docker builds. While it may provide insights that could lead to optimizations in the CI/CD pipeline, the connection to Docker build speed is indirect at best.

Consider the following:

  1. Does this change align with the immediate goal of speeding up Docker builds?
  2. If not, should it be part of a separate PR focused on test optimization?
  3. Is there a specific reason for adding this flag now, such as recent performance issues in the test suite?

To better understand the impact of this change, you could run the following command to check if there are any particularly slow tests:

This will help determine if there are any tests that might be significantly impacting the overall build time.

.github/workflows/perl-test-install-apt.yml (1)

3-6: Trigger events change improves workflow efficiency.

The modification of trigger events from pull requests to release events and manual dispatch is a good improvement. This change ensures that the installation and testing process is executed at more appropriate times (during releases) while still allowing for manual triggering when needed.

This adjustment aligns well with the PR objective of speeding up Docker builds, as it reduces unnecessary executions during the development process.
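
The resulting trigger block would look roughly like the sketch below, mirroring the suggestion made for the RPM workflow; the exact release types are an assumption:

on:
  release:
    types: [created, published]
  workflow_dispatch: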

.github/workflows/python-release.yml (2)

Line range hint 99-149: Approve macos job changes.

The changes in the macos job mirror those in the linux job, which is good for consistency across platforms. The inclusion of both x86_64 and aarch64 architectures ensures compatibility with both Intel and Apple Silicon Macs, which is excellent for broad user support.

🧰 Tools
🪛 actionlint

126-126: shellcheck reported issue in this script: SC2086:info:17:13: Double quote to prevent globbing and word splitting

(shellcheck)


9-11: Approve trigger event changes with a consideration.

The change from push to release events aligns well with the PR objective to speed up Docker builds by reducing unnecessary workflow runs. This is a good optimization.

Consider if you still need to run this workflow on pull requests for testing purposes. If so, you might want to add a conditional job that runs on pull requests but skips the release steps. Here's a script to check if there are other workflows that run on pull requests:

✅ Verification successful

To proceed with the verification, please run the following corrected script to check for workflows that run on pull requests:


Pull request triggers still present in other workflows.

The removal of the pull_request trigger from .github/workflows/python-release.yml does not impact the ability to test changes before merging, as other workflows continue to run on pull requests.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for workflows that run on pull requests
rg --type yaml 'on:.*\n.*pull_request:' .github/workflows

Length of output: 251


Script:

#!/bin/bash
# Description: Check for workflows that run on pull requests with multiline support
rg --multiline --type yaml 'on:.*\n.*pull_request:' .github/workflows

Length of output: 377

@coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 9af818f and 81bd58e.

📒 Files selected for processing (3)
  • .github/workflows/lint.yml (1 hunks)
  • .github/workflows/perl-package-build.yml (3 hunks)
  • python/requirements-dev.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • python/requirements-dev.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/perl-package-build.yml
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml

38-38: shellcheck reported issue in this script: SC2046:warning:1:19: Quote this to prevent word splitting

(shellcheck)


42-42: shellcheck reported issue in this script: SC2046:warning:2:36: Quote this to prevent word splitting

(shellcheck)

🔇 Additional comments (6)
.github/workflows/lint.yml (6)

1-1: Improved workflow name

The updated workflow name "Python Lint and Test" is more descriptive and accurately reflects the purpose of this workflow. This change enhances clarity for developers and maintainers.


7-7: Consistent job naming

The job name has been updated to "Python Lint and Test", matching the workflow name. This change maintains consistency and improves clarity within the workflow file.


20-30: Enhanced caching mechanism

The addition of built-in pip caching and Rust build artifact caching is a good optimization that can significantly speed up the workflow execution.

However, the inclusion of Rust caching in a workflow named "Python Lint and Test" is unexpected. Could you please clarify:

  1. Are there Rust components in this Python project?
  2. If so, should the workflow name be updated to reflect both Python and Rust?
  3. Is the Cargo.lock file present in the repository root?

You can verify the presence of Rust-related files with:

#!/bin/bash
# Check for Rust-related files
echo "Cargo.toml files:"
fd Cargo.toml
echo "Cargo.lock file:"
fd Cargo.lock
echo "*.rs files:"
fd -e rs
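
For context, Rust build-artifact caching of the kind described above typically looks something like the following sketch; the cached paths and the Cargo.lock location are assumptions, since the workflow diff is not reproduced in this comment:

- name: Cache Rust build artifacts
  uses: actions/cache@v3
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
    restore-keys: |
      ${{ runner.os }}-cargo-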

31-36: Streamlined installation process

The installation step has been consolidated and simplified, which is a good improvement. The use of make develop suggests a standardized setup process, which is a best practice.

To ensure the setup process works as expected, please verify:

  1. Does the make develop command handle all necessary setup steps, including any previously managed by virtual environment activation?
  2. Are all required dependencies correctly installed and configured?

You can check the contents of the Makefile to verify the develop target:

#!/bin/bash
# Display the content of the Makefile, focusing on the develop target
awk '/^develop:/, /^[a-zA-Z]/' Makefile

47-48: Enhanced test execution

The updated test command with --durations=0 and -n auto options is a good improvement. This will provide timing information for all tests and enable parallel test execution, potentially speeding up the test suite significantly.

To ensure optimal test execution, please verify:

  1. Is pytest-xdist installed and configured correctly in the project's dependencies?
  2. Are there any tests that are not suitable for parallel execution (e.g., due to shared resources or side effects)?

You can check the project's dependencies with:

If pytest-xdist is not found, consider adding it to the project's development dependencies.

✅ Verification successful

Test Execution Configuration Verified

The presence of pytest-xdist in the project's dependencies confirms that parallel test execution is set up correctly. Additionally, no tests were found that would hinder parallel execution, ensuring that the test suite should run efficiently with the updated command.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for pytest-xdist in the project's dependencies
grep -E "pytest-xdist|pytest\[xdist\]" python/requirements*.txt python/pyproject.toml

Length of output: 137


Script:

#!/bin/bash
# Search for pytest markers that disable parallel execution
rg --files-with-matches --ignore-case "pytest.mark.serial" tests/

# Search for file write operations in test files
rg --files-with-matches --ignore-case "\.write\(" tests/

Length of output: 218


Script:

#!/bin/bash
# Find all Python test files based on common naming conventions
fd --extension py --regex 'test_.*\.py$|.*_test\.py$' > test_files.txt

# Search within the found test files for pytest markers that disable parallel execution
rg --files-with-matches --ignore-case "pytest.mark.serial" $(cat test_files.txt) > serial_tests.txt

# Search within the found test files for file write operations
rg --files-with-matches --ignore-case "\.write\(" $(cat test_files.txt) > write_operations.txt

Length of output: 9832


13-15: Updated checkout action to v4

Updating to the latest version of actions is a good practice. However, it's important to ensure that this update doesn't introduce any breaking changes.

Please verify that the actions/checkout@v4 is compatible with your workflow and doesn't introduce any unexpected behavior. You can check the changelog for any significant changes:

✅ Verification successful

Action Required: Correct and Rerun Release Notes Fetch Script

Please use the following corrected shell script to fetch the latest release notes for actions/checkout@v4:


Action Required: Correct the Release Notes Fetch Script Syntax

Please use the following corrected shell script to fetch the latest release notes for actions/checkout@v4:


Update to actions/checkout@v4 Verified

The update to actions/checkout@v4 has been verified. Release notes indicate that the update does not introduce any breaking changes and should work seamlessly with your workflow.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Fetch the latest release notes for actions/checkout
gh release view --repo actions/checkout v4.0.0

Length of output: 264


Script:

#!/bin/bash
# Fetch the latest release notes for actions/checkout@v4
gh release view --repo actions/checkout --tag v4.0.0 --json body

Length of output: 418


Script:

#!/bin/bash
# Fetch the latest release notes for actions/checkout@v4
gh release view v4.0.0 --repo actions/checkout --json body

Length of output: 680

Comment on lines 37 to 44
      - name: Linting with ruff
        run: |
          source ~/.venv/ci-venv/bin/activate
          cd python && ruff $(git ls-files '*.py')

      - name: Static analysis with mypy
        run: |
          source ~/.venv/ci-venv/bin/activate
          mypy --version
          mypy --config-file python/mypy.ini $(git ls-files '*.py')

⚠️ Potential issue

Improved linting and static analysis steps

The adjustments to the linting and static analysis steps are good improvements, particularly the use of git ls-files '*.py' to ensure only tracked Python files are processed.

However, the static analysis tool has flagged potential issues with word splitting in the commands. To address this, please update the commands as follows:

- cd python && ruff $(git ls-files '*.py')
+ cd python && ruff "$(git ls-files '*.py')"

- mypy --config-file python/mypy.ini $(git ls-files '*.py')
+ mypy --config-file python/mypy.ini "$(git ls-files '*.py')"

These changes will prevent potential issues with filenames containing spaces or special characters.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      - name: Linting with ruff
-        run: |
-          source ~/.venv/ci-venv/bin/activate
-          cd python && ruff $(git ls-files '*.py')
-      - name: Static analysis with mypy
-        run: |
-          source ~/.venv/ci-venv/bin/activate
-          mypy --version
-          mypy --config-file python/mypy.ini $(git ls-files '*.py')
+      - name: Linting with ruff
+        run: |
+          cd python && ruff "$(git ls-files '*.py')"
+      - name: Static analysis with mypy
+        run: |
+          mypy --version
+          mypy --config-file python/mypy.ini "$(git ls-files '*.py')"
🧰 Tools
🪛 actionlint

38-38: shellcheck reported issue in this script: SC2046:warning:1:19: Quote this to prevent word splitting

(shellcheck)


42-42: shellcheck reported issue in this script: SC2046:warning:2:36: Quote this to prevent word splitting

(shellcheck)

@akotlar changed the title from "speed up docker builds" to "speed up GitHub workflows" on Oct 13, 2024
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (1)

6-8: Improved flexibility in data generation. Consider adding type hints.

The changes to the generate_data function enhance its flexibility by allowing customization of key parameters. This is a good improvement for testing different scenarios.

Consider adding type hints to improve code readability and maintainability:

def generate_data(N: int = 10000, p: int = 100, L: int = 3, sigma: float = 1.0) -> tuple:

Also, you might want to add a brief docstring explaining the purpose of each parameter and the return values.

.github/workflows/lint.yml (2)

13-15: Approved: Updated checkout action and added Go setup

Great improvements:

  1. Updating the checkout action to v4 ensures you're using the latest features and bug fixes.
  2. Adding Go setup suggests the project now has Go dependencies, which is properly configured with a specific version and caching enabled.

Consider using a version variable for Go to make it easier to update in the future:

env:
  GO_VERSION: '1.21.3'

steps:
  - name: Set up Go
    uses: actions/setup-go@v4
    with:
      go-version: ${{ env.GO_VERSION }}
      cache: true

This approach allows for easier version management across multiple workflows if needed.

Also applies to: 21-26


44-54: Approved: Improved setup with virtual environment

Great improvements in the setup process:

  1. Adding a virtual environment setup step isolates project dependencies, which is a best practice.
  2. The installation step is comprehensive, including virtual environment activation, pip upgrade, dependency installation, and development setup.

For consistency and to avoid potential issues with different shell environments, consider using the full path to the virtual environment's activate script:

- name: Install dependencies and build
  run: |
    . ~/.venv/ci-venv/bin/activate
    python -m pip install --upgrade pip
    pip install -r python/requirements-dev.txt
    make develop

This change uses the POSIX-compliant . command instead of source, which may not be available in all shell environments.

🧰 Tools
🪛 actionlint

49-49: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

python/python/bystro/prs/prscs_hmcmc.py (2)

138-138: Consider removing or modifying the debug print statement.

The added print statement appears to be for debugging purposes. While it can be helpful during development, it's generally not recommended to leave such statements in production code.

Consider one of the following options:

  1. Remove the print statement entirely if it's no longer needed.
  2. Replace it with a proper logging statement, such as:
    import logging
    logging.debug(f"MCMC options: {self.mcmc_options}")
  3. If the information is crucial, consider adding it to the method's docstring or as a comment instead of a print statement.

139-139: Approve the use of **self.mcmc_options in MCMC instantiation.

The change to use **self.mcmc_options when instantiating the MCMC object is a good improvement. It allows for more flexible configuration of MCMC parameters and maintains consistency with the options set in the _fill_mcmc_options method.

Consider updating the docstring of the fit method to reflect this change. You could add a note about the MCMC options being configurable through the class initialization. For example:

def fit(self, Z, y):
    """
    The fit method.

    Parameters
    ----------
    Z : np.array-like,(n_samples,p)
        The covariates of interest
    y : np.array-like,(n_samples,)
        The trait, disease, or outcome

    Returns
    -------
    self

    Notes
    -----
    MCMC options are configurable through the class initialization
    and are used in the MCMC run.
    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 81bd58e and bd36483.

📒 Files selected for processing (7)
  • .github/workflows/lint.yml (1 hunks)
  • .github/workflows/perl-build-docker.yml (2 hunks)
  • .github/workflows/python-build-docker.yml (2 hunks)
  • python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (2 hunks)
  • python/python/bystro/prs/prscs_hmcmc.py (2 hunks)
  • python/python/bystro/prs/tests/test_prscs_hmcmc.py (1 hunks)
  • python/requirements-dev.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/perl-build-docker.yml
  • .github/workflows/python-build-docker.yml
  • python/requirements-dev.txt
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml

49-49: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


56-56: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


56-56: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


61-61: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


61-61: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


67-67: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

🔇 Additional comments (7)
python/python/bystro/prs/tests/test_prscs_hmcmc.py (1)

22-23: Consider the trade-off between test speed and effectiveness

The changes to test_prscs() significantly reduce the computational complexity:

  1. Data generation (line 22):

    • Samples reduced from 100,000 to 100
    • Features reduced from 25 to 5
  2. Model instantiation (line 23):

    • MCMC options added with very low values (num_samples: 10, warmup_steps: 1)

While these changes align with the PR objective to speed up Docker builds, they might compromise the test's effectiveness. The drastic reduction in data size and model complexity could make the test less robust in catching potential issues.

To ensure the test remains meaningful, please run the following script to compare the execution time and any differences in test outcomes between the old and new versions:

Based on the results, we can better assess if the speed improvement justifies the potential loss in test robustness. If the time saved is significant and the test outcomes remain consistent, the changes may be acceptable. Otherwise, consider finding a middle ground that balances speed and effectiveness.

python/python/bystro/ancestry/tests/test_adversarial_autoencoder.py (2)

Line range hint 1-34: Overall improvements in test flexibility and configuration

The changes in this file enhance the flexibility of the test setup, which is beneficial for thorough testing and potentially faster builds. The parameterization of generate_data and the updated model initialization in test_adversarial_autoencoder allow for easier configuration and testing of different scenarios.

These changes align well with the PR objective of speeding up Docker builds, as they can contribute to more efficient testing processes. Good job on improving the test suite!


23-23: Updated model initialization. Verify AdversarialAutoencoder interface.

The changes to the AdversarialAutoencoder initialization look good. The new parameters allow for more flexible configuration of the model.

Please ensure that these changes are consistent with the updated AdversarialAutoencoder class interface. Run the following script to verify the class definition:

Also, consider adding a comment explaining the significance of the 2 parameter and the n_iterations option.

.github/workflows/lint.yml (4)

1-1: Improved workflow and job naming

The updated names for both the workflow and the job provide a clearer description of their purpose. This change enhances the overall clarity of the CI/CD process.

Also applies to: 7-7


36-43: Approved: pip caching added

Excellent addition of pip caching. This addresses the concern raised in the previous comment about Python-specific caching. The cache key and restore keys are properly set up for the pip cache, which will help optimize Python dependency installation times.

This change resolves the concern raised in the previous comment about the removal of Python-specific caching.
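
A pip cache step matching that description might look like this sketch; the requirements file path is assumed from the python/requirements-dev.txt file touched in this PR:

- name: Cache pip packages
  uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('python/requirements-dev.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-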


65-69: Approved: Enhanced test execution

Excellent updates to the test command:

  1. Adding --durations=0 will show the time taken for all tests, helping identify slow tests for potential optimization.
  2. The -n auto option enables automatic parallelization of tests, which can significantly speed up test execution.

These changes align well with the PR objective of speeding up builds. Great job on improving the efficiency of the testing process!
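
For reference, the test step being described boils down to something like the following sketch, assuming the ci-venv virtual environment set up earlier in this workflow:

- name: Run tests
  run: |
    source ~/.venv/ci-venv/bin/activate
    # --durations=0 prints timing for every test; -n auto spreads tests across CPUs (requires pytest-xdist)
    cd python && pytest --durations=0 -n auto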

🧰 Tools
🪛 actionlint

67-67: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


27-35: Rust caching added, but Python caching removed?

The addition of Rust build artifact caching is a good improvement, suggesting the project now includes Rust components. The cache key and restore keys are properly set up for Rust artifacts.

However, it appears that the previous Python-specific caching (for pip) has been removed. This could potentially impact Python dependency installation times.

Could you confirm if the removal of Python-specific caching was intentional? If not, consider keeping both Rust and Python caching to optimize build times for both languages.

To verify the current caching setup, you can run:

This will help ensure that both Python and Rust dependencies are being cached appropriately.

@@ -52,7 +52,7 @@ def _fill_mcmc_options(self, mcmc_options):
         """
         default_options = {
             "num_chains": 1,
-            "num_warmup": 1000,
+            "warmup_steps": 1000,

💡 Codebase verification

Incomplete renaming of num_warmup to warmup_steps.

The search revealed that num_warmup is still used in the following files:

  • python/python/bystro/_template_npr.py:
                "num_warmup": 500,
    
  • python/python/bystro/prs/prscs_hmcmc.py:
            num_warmup : int
    

Please update these instances to warmup_steps to maintain consistency across the codebase.

🔗 Analysis chain

Approve renaming of num_warmup to warmup_steps.

The change from "num_warmup" to "warmup_steps" improves clarity in describing the warmup phase of the MCMC process. This aligns well with common terminology in MCMC literature.

To ensure this change doesn't break existing code, please run the following script to check for any remaining uses of num_warmup:

If the script returns any results, those occurrences should be updated to use warmup_steps for consistency.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining uses of 'num_warmup' in Python files

# Test: Search for 'num_warmup' in Python files
rg --type python 'num_warmup'

Length of output: 60


Script:

#!/bin/bash
# Description: Check for any remaining uses of 'num_warmup' in Python files

# Test: Search for 'num_warmup' in Python files
rg --type py 'num_warmup'

Length of output: 159

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
Dockerfile.python (1)

Line range hint 1-62: Suggestions for Dockerfile optimization

While the Dockerfile is well-structured, consider the following improvements:

  1. Optimize apt-get commands:
    Combine apt-get update and install, and clean up in the same layer to reduce image size.
-RUN apt-get update && apt-get install -y --no-install-recommends \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     git \
     wget curl\
     openssl libcurl4-openssl-dev libssl-dev \
     tar lz4 pigz tabix unzip \
     patch \
     awscli \
     unzip \
     libssl-dev \
-    pkg-config \ 
-    && curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y \
-    && . $HOME/.cargo/env
+    pkg-config \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Rust
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y \
+    && . $HOME/.cargo/env
  2. Consolidate COPY commands:
    Group COPY commands to reduce the number of layers.
-COPY ./startup.yml ./startup.yml
-COPY ./python ./python
-COPY ./config ./config
+COPY ./startup.yml ./python ./config ./
  3. Consider adding a .dockerignore file:
    This can prevent unnecessary files from being copied into the image, potentially speeding up the build process and reducing image size.

These optimizations can help reduce the image size and potentially speed up the build process further.

.github/workflows/lint.yml (4)

27-53: LGTM: Improved caching strategy

Excellent improvements to the caching strategy:

  1. Adding caches for Rust Cargo, Go modules, and pip suggests a polyglot project structure.
  2. Using specific file hashes (e.g., Cargo.lock, go.sum) for cache keys is a good practice for proper cache invalidation.

One minor suggestion:

Consider adding a broader ${{ runner.os }}- fallback to the restore-keys for the pip cache, so a cache from a previous build with a different requirements hash can still be reused:

restore-keys: |
  ${{ runner.os }}-pip-
  ${{ runner.os }}-

This could speed up builds when dependencies haven't changed.


55-68: LGTM: Improved dependency management

Good improvements in dependency management:

  1. Using a virtual environment ensures isolation of project dependencies.
  2. The make develop command suggests a standardized build process, which is excellent for consistency.
  3. Setting Go-related environment variables confirms Go usage in the project.

Consider combining the virtual environment setup and dependency installation steps to reduce the number of steps and potential points of failure:

- name: Set up virtual environment and install dependencies
  env:
    GOROOT: /opt/hostedtoolcache/go/1.21.3/x64
    GOPATH: ${{ env.GOPATH }}
  run: |
    python -m venv ~/.venv/ci-venv
    source ~/.venv/ci-venv/bin/activate
    python -m pip install --upgrade pip
    pip install -r python/requirements-dev.txt
    make develop

This change would streamline the workflow and ensure that the virtual environment is used consistently.

🧰 Tools
🪛 actionlint

63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


81-83: LGTM: Improved test execution

The updates to the pytest command are excellent improvements:

  1. --durations=0 shows the time taken for all tests, helping identify slow tests.
  2. -n auto enables automatic parallelization, potentially speeding up test execution.

Consider adding the --color=yes flag to ensure colored output in CI logs, which can improve readability:

cd python && pytest --durations=0 -n auto --color=yes

Also, you might want to add the --capture=no flag to show print statements and logging output for failed tests, which can be helpful for debugging:

cd python && pytest --durations=0 -n auto --color=yes --capture=no
🧰 Tools
🪛 actionlint

81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


1-83: Overall excellent improvements to the CI/CD workflow

The changes in this file significantly enhance the build process:

  1. Better naming and organization of workflow steps.
  2. Introduction of Go support, suggesting a more polyglot project structure.
  3. Improved caching strategy for multiple languages.
  4. Enhanced linting and testing with parallelization and duration reporting.

These improvements should lead to faster, more reliable builds and better code quality checks. Great work on optimizing the CI/CD pipeline!

As the project seems to be evolving into a polyglot structure with Python, Go, and possibly Rust, consider the following architectural advice:

  1. Ensure clear documentation of the project's language requirements and setup process.
  2. Consider using a tool like asdf or rtx for managing multiple language versions consistently across development and CI environments.
  3. If not already in place, implement integration tests that verify the interaction between components written in different languages.
🧰 Tools
🪛 actionlint

63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 5883122 and 89f8374.

📒 Files selected for processing (2)
  • .github/workflows/lint.yml (1 hunks)
  • Dockerfile.python (1 hunks)
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml

63-63: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


81-81: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

🔇 Additional comments (5)
Dockerfile.python (1)

60-60: Improved pytest command for faster and more informative test runs

The changes to the pytest command are beneficial:

  1. --durations=0: This flag shows the time taken for all tests, helping to identify slow tests that might be bottlenecks in the build process.
  2. -n auto: This flag enables parallel test execution, automatically determining the number of CPUs to use. This can significantly speed up the testing process, especially for larger test suites.

These improvements align well with the PR objective of speeding up Docker builds by potentially reducing the time taken for tests to run.

.github/workflows/lint.yml (4)

1-3: LGTM: Improved workflow name

The updated workflow name "Python Lint and Test" is more descriptive and accurately reflects the purpose of this workflow.


7-12: LGTM: Consistent job naming

The job name has been updated to "Python Lint and Test", maintaining consistency with the workflow name.


69-79: ⚠️ Potential issue

Fix potential word splitting issues in linting and static analysis commands

The linting and static analysis steps are good for ensuring code quality. However, there's a potential issue with word splitting in the commands that use git ls-files '*.py'.

To address this, please update the commands as follows:

- cd python && ruff $(git ls-files '*.py')
+ cd python && git ls-files '*.py' | xargs ruff

- mypy --config-file python/mypy.ini $(git ls-files '*.py')
+ git ls-files '*.py' | xargs mypy --config-file python/mypy.ini

These changes use xargs to properly handle filenames that might contain spaces or special characters.

Also, consider adding MyPy's --color-output flag to force colored output in CI logs, which can improve readability:

git ls-files '*.py' | xargs mypy --config-file python/mypy.ini --color-output

Let's verify if there are any Python files with spaces in their names:

#!/bin/bash
# Description: Check for Python files with spaces in their names

echo "Python files with spaces in their names:"
find . -name "* *.py" -type f
🧰 Tools
🪛 actionlint

70-70: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


70-70: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


75-75: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


13-26: LGTM: Improved setup with Go integration

Great improvements in the setup:

  1. Upgrading the checkout action to v4 ensures you're using the latest features and bug fixes.
  2. Adding Go setup suggests the project now has Go dependencies.

Could you clarify the introduction of Go in the project? This change implies new Go dependencies, but it's not immediately clear from the PR description. Let's verify the Go usage:

✅ Verification successful

LGTM: Go integration verified

The addition of Go setup has been confirmed:

  1. Multiple Go files and configuration files (go.mod, go.sum) are present, indicating intentional use of Go.
  2. The Python import of go refers to an internal function, showing proper integration between Python and Go components.
  3. The README provides clear instructions for installing Go, reinforcing its intended usage.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Go usage in the project

# Test: Check for Go files
echo "Checking for Go files:"
fd -e go

# Test: Check for Go-related configuration files
echo "Checking for Go-related configuration files:"
fd -e mod -e sum

# Test: Check for Go imports in Python files
echo "Checking for Go imports in Python files:"
rg -t python 'import\s+go'

Length of output: 512


Script:

#!/bin/bash
# Description: Corrected script to verify Go usage in the project

# Test: Check for Go files
echo "Checking for Go files:"
fd -e go

# Test: Check for Go-related configuration files
echo "Checking for Go-related configuration files:"
fd -e mod -e sum

# Test: Check for Go imports in Python files
echo "Checking for Go imports in Python files:"
rg -t py 'import\s+go'

Length of output: 712


Script:

#!/bin/bash
# Description: Investigate the usage of 'go' in Python files

# Test: List Python files that import 'go'
echo "Listing Python files that import 'go':"
rg -t py 'import\s+go|from\s+.*\s+import\s+go'

# Test: Search for 'go' module or class definitions in Python files
echo "Searching for 'go' module or class definitions in Python files:"
rg -t py 'def\s+go\b|class\s+Go\b'

# Test: Check if 'go' is listed in Python dependency files
echo "Checking for 'go' in Python dependency files:"
rg 'go' README.md requirements.txt Pipfile setup.py

Length of output: 1283

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
.github/workflows/lint.yml (2)

21-26: Go setup added with caching

The addition of Go setup with a specific version and caching is a good improvement. This ensures consistency across environments and can significantly speed up builds, aligning with the PR's objective.

Consider using a more flexible version specifier for Go to automatically get minor updates:

go-version: "1.21.x"

This will use the latest patch version of Go 1.21, allowing for automatic security updates without breaking changes.


49-52: Improved test execution with parallelization and duration reporting

The updates to the test command are excellent improvements:

  1. Activating the virtual environment ensures the correct Python environment is used.
  2. The --durations=0 option helps identify slow tests by displaying the time taken for all tests.
  3. The -n auto option enables automatic parallelization, which can significantly speed up test execution.

These changes align well with the PR's objective of speeding up builds. To further improve the output, consider adding the -v flag for more verbose test output:

- cd python && pytest --durations=0 -n auto
+ cd python && pytest -v --durations=0 -n auto

This will provide more detailed information about each test, making it easier to identify and debug any issues that may arise.
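Note as well that the -n auto option is provided by the pytest-xdist plugin, so it only works if that plugin is available in the CI virtual environment; whether it is already pinned in requirements-dev.txt is an assumption worth verifying. A minimal sketch of the step with that dependency made explicit:

#!/bin/bash
# -n/--numprocesses comes from pytest-xdist; install it if it is not already a dev dependency
source ~/.venv/ci-venv/bin/activate
pip install pytest-xdist
cd python && pytest -v --durations=0 -n auto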

🧰 Tools
🪛 actionlint

50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

.github/workflows/perl-package-build.yml (3)

24-31: LGTM: Effective caching strategy for Perl modules

The addition of the caching step for Perl modules is an excellent improvement:

  1. It will speed up subsequent workflow runs by reusing previously installed modules.
  2. The cache key is well-designed, incorporating both the runner OS and the cpanfile hash.
  3. The restore-keys fallback is correctly implemented.

Consider adding a comment explaining the caching strategy for better maintainability:

      - name: Cache Perl modules
        uses: actions/cache@v3
        with:
          path: ~/.perl-cpm
          # Cache key based on OS and cpanfile hash, with fallback to any previous cache for the OS
          key: ${{ runner.os }}-perl-${{ hashFiles('perl/cpanfile') }}
          restore-keys: |
            ${{ runner.os }}-perl-

Line range hint 33-111: Consider simplifying Go installation

While this step wasn't changed in this PR, it's worth noting that the Go installation process is quite complex. Consider simplifying it by using a dedicated GitHub Action for Go installation, such as actions/setup-go@v4. This would make the workflow more maintainable and less error-prone.

Here's an example of how you could simplify the Go installation:

      - name: Install Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.21.3' # Specify the Go version you need

This approach would handle architecture differences automatically and keep Go up-to-date more easily.


Line range hint 123-137: LGTM: Unchanged final steps with a suggestion

The final steps of the workflow (checkout, installing Bystro dependencies, testing, and building) remain unchanged and follow good practices:

  1. Using actions/checkout@v4 ensures you're using a recent version of the action.
  2. The use of dzil (Dist::Zilla) for managing the Perl distribution is a good practice.
  3. The testing and building steps are appropriately placed at the end of the workflow.

While these steps are solid, consider adding a step to upload the built tarball as an artifact. This can be useful for debugging or further processing. Here's an example:

      - name: Upload Bystro tarball
        uses: actions/upload-artifact@v3
        with:
          name: bystro-perl-package
          path: perl/*.tar.gz

This would make the built package easily accessible from the GitHub Actions interface.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 89f8374 and 6749043.

📒 Files selected for processing (6)
  • .github/workflows/lint.yml (1 hunks)
  • .github/workflows/perl-build-docker.yml (1 hunks)
  • .github/workflows/perl-package-build.yml (3 hunks)
  • .github/workflows/perl-test-install-apt.yml (1 hunks)
  • .github/workflows/perl-test-install-rpm.yml (1 hunks)
  • .github/workflows/python-build-docker.yml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/perl-build-docker.yml
  • .github/workflows/perl-test-install-apt.yml
  • .github/workflows/perl-test-install-rpm.yml
  • .github/workflows/python-build-docker.yml
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml

32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

🔇 Additional comments (5)
.github/workflows/lint.yml (4)

1-1: Improved workflow and job naming

The updated names for both the workflow and the job are more descriptive and accurately reflect the purpose of this CI process. This change enhances clarity and maintainability.

Also applies to: 7-7


13-15: Checkout action updated to v4

Updating the checkout action to v4 is a good practice. It ensures you're using the latest features and security improvements. This change may also contribute to faster builds, aligning with the PR's objective.


27-30: Virtual environment setup added

Adding a virtual environment setup is a good practice for Python projects. The comment suggests this is needed for maturin, which implies the project might be using Rust bindings for Python.

Could you provide more context on the introduction of maturin? This change might impact build times, so it would be helpful to understand its necessity and potential performance implications.


31-37: Streamlined installation and build process

The updated installation step, which combines dependency installation and build processes, is a good optimization. Activating the virtual environment before running commands ensures the correct Python environment is used. These changes should contribute to faster build times, aligning with the PR's objective.

🧰 Tools
🪛 actionlint

32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

.github/workflows/perl-package-build.yml (1)

1-1: LGTM: Improved workflow name and trigger conditions

The changes to the workflow name and trigger conditions are beneficial:

  1. The new name "Build Bystro Annotator Perl Package" is more specific and descriptive.
  2. The updated trigger conditions will help reduce unnecessary workflow runs by only triggering on relevant file changes.

These improvements enhance the clarity and efficiency of the CI/CD process.

Also applies to: 3-8

Comment on lines +43 to +48
- name: Static analysis with MyPy
run: |
source ~/.venv/ci-venv/bin/activate
mypy --version
mypy --config-file python/mypy.ini $(git ls-files '*.py')


⚠️ Potential issue

Potential word splitting issue in MyPy command and version output improvement

The static analysis step with MyPy has been improved by activating the virtual environment and using git ls-files '*.py' to process only tracked Python files. Printing the MyPy version is a good addition for debugging and reproducibility. However, there's a potential issue with word splitting in the command.

To address this and improve the version output, please update the step as follows:

- name: Static analysis with MyPy
  run: |
    source ~/.venv/ci-venv/bin/activate
    echo "MyPy version: $(mypy --version)"
    git ls-files '*.py' | xargs mypy --config-file python/mypy.ini

This change uses xargs to properly handle filenames that might contain spaces or special characters and improves the version output format.

🧰 Tools
🪛 actionlint

44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)

Comment on lines +38 to +42
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && ruff $(git ls-files '*.py')
- name: Static analysis with mypy


⚠️ Potential issue

Potential word splitting issue in Ruff command

The linting step with Ruff has been improved by activating the virtual environment and using git ls-files '*.py' to process only tracked Python files. However, there's a potential issue with word splitting in the command.

To address this, please update the command as follows:

- cd python && ruff $(git ls-files '*.py')
+ cd python && git ls-files '*.py' | xargs ruff

This change uses xargs to properly handle filenames that might contain spaces or special characters, addressing the concern raised in the static analysis hint.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && ruff $(git ls-files '*.py')
- name: Static analysis with mypy
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && git ls-files '*.py' | xargs ruff
🧰 Tools
🪛 actionlint

39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)

Comment on lines +113 to +121
go install github.com/bystrogenomics/bystro-vcf@2.2.3
cpm install -g https://github.com/bystrogenomics/msgpack-perl.git
cpm install -g --no-test MouseX::Getopt
git clone --depth 1 --recurse-submodules https://github.com/salortiz/LMDB_File.git \
&& cd LMDB_File \
&& cpanm --quiet .
ALIEN_INSTALL_TYPE=share cpm install -g --test Alien::LMDB
cpm install -g --test LMDB_File
cpm install -g MouseX::Getopt # fails due to differences from expected output; unimportant
cpm install -g --test IO::FDPass
cpm install -g --test Beanstalk::Client
cpm install -g --test Sys::CpuAffinity
cpm install -g --test DBD::MariaDB

⚠️ Potential issue

Approve dependency updates with a note on MouseX::Getopt

The updates to the dependency installation process are generally good:

  1. Updating bystro-vcf to version 2.2.3 is likely a necessary version bump.
  2. Adding --test flags to most module installations is a good practice to ensure they work correctly in the CI environment.

However, there's a concern with the MouseX::Getopt installation:

  • The comment indicates that it "fails due to differences from expected output".
  • This failure is marked as "unimportant", but it's generally not a good practice to ignore test failures.

Consider investigating why MouseX::Getopt is failing its tests and either:

  1. Fix the underlying issue causing the test failure.
  2. Pin to a specific version of MouseX::Getopt that passes its tests.
  3. If the test failure is truly inconsequential, document why it's safe to ignore.

Example of pinning to a specific version:

cpm install -g MouseX::Getopt@0.34  # Replace 0.34 with a version known to work
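Before choosing a pin, the release that would currently be resolved can be inspected with cpanm, which this workflow already uses; a small sketch (the exact version to pin is left open):

#!/bin/bash
# Show which MouseX::Getopt distribution cpanm would install before pinning it in the workflow
cpanm --info MouseX::Getopt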

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
python/python/bystro/supervised_ppca/tests/test_generative_missing_pt.py (3)

13-13: LGTM! Consider parameterizing the number of iterations.

The update to include training_options in the PPCAM instantiation is correct and aligns with the changes in the class's __init__ method.

To improve test flexibility and potentially reduce test execution time, consider parameterizing the number of iterations:

N_ITERATIONS = 20  # Define at the top of the file

# Then use it in the test
ppcam = PPCAM(n_components=n_components, training_options={"n_iterations": N_ITERATIONS})

This way, you can easily adjust the number of iterations for all tests if needed.


22-23: LGTM! Consider refactoring for clarity and completeness.

The update to include training_options with both n_iterations and use_gpu parameters is correct. However, the presence of two PPCAM instantiations with different options in the same test function might be confusing.

Consider refactoring the test to clearly separate and test both GPU and non-GPU versions:

def test_ppcam_fit():
    # ... (previous code)

    for use_gpu in [True, False]:
        ppcam = PPCAM(
            n_components=n_components,
            training_options={"use_gpu": use_gpu, "n_iterations": N_ITERATIONS}
        )
        ppcam.fit(X)

        # Test if the model is fitted
        assert ppcam.W_ is not None
        assert ppcam.sigma2_ is not None
        assert ppcam.p is not None

This approach tests both GPU and non-GPU versions, ensuring complete coverage of the PPCAM functionality.


32-32: LGTM! Consider using a fixture to reduce code duplication.

The updates to include training_options in the PPCAM instantiation for both test_ppcam_get_covariance and test_ppcam_get_noise functions are correct and consistent with the changes in test_ppcam_fit.

To reduce code duplication and improve maintainability, consider using a pytest fixture to create the PPCAM instance:

import pytest

@pytest.fixture
def ppcam_instance(n_components):
    # Note: n_components must itself be a fixture for pytest to inject it here,
    # or the argument can be replaced with a module-level constant.
    return PPCAM(n_components=n_components, training_options={"n_iterations": N_ITERATIONS})

def test_ppcam_get_covariance(ppcam_instance):
    # Use ppcam_instance instead of creating a new one

def test_ppcam_get_noise(ppcam_instance):
    # Use ppcam_instance instead of creating a new one

This approach centralizes the PPCAM instantiation, making it easier to update all tests if the initialization parameters change in the future.

Also applies to: 45-45

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f3e0784 and 63c501d.

📒 Files selected for processing (3)
  • .github/workflows/lint.yml (1 hunks)
  • Dockerfile.python (1 hunks)
  • python/python/bystro/supervised_ppca/tests/test_generative_missing_pt.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile.python
🧰 Additional context used
🪛 actionlint
.github/workflows/lint.yml

32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)

🔇 Additional comments (7)
.github/workflows/lint.yml (7)

1-1: Improved workflow name

The change from "lint" to "Python Lint and Test" provides a more accurate and descriptive name for the workflow. This improvement enhances clarity and aligns with best practices for CI/CD pipeline naming.


7-7: Improved job naming and action versioning

The job name update to "Python Lint and Test" maintains consistency with the workflow name, enhancing overall clarity. Additionally, upgrading the checkout action to v4 is a good practice, ensuring you're using the latest features and bug fixes.

Also applies to: 13-14


50-52: Enhanced test command with duration reporting

The addition of --durations=0 to the pytest command is a good improvement. This option shows the time taken for all tests, which can be very useful for identifying slow tests and optimizing overall test performance.

This change aligns well with the PR objective of speeding up builds. Consider adding a comment explaining the purpose of this flag, such as:

# Report duration of all tests to help identify and optimize slow tests
cd python && pytest --durations=0

This will help other contributors understand the reasoning behind this addition.

🧰 Tools
🪛 actionlint

50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


1-52: Overall workflow improvements

The changes made to this workflow file significantly enhance its functionality and align well with the PR objective of speeding up Docker builds:

  1. The workflow and job names are now more descriptive.
  2. The addition of Go setup suggests expanded language support.
  3. The use of a virtual environment improves dependency isolation.
  4. Updates to linting, static analysis, and testing steps improve efficiency and provide more useful output.

These changes collectively contribute to a more robust and informative CI process. The suggestions provided in the review comments, particularly regarding word splitting in commands and additional documentation, will further refine the workflow.

Great job on these improvements! They should contribute to faster and more reliable builds.

🧰 Tools
🪛 actionlint

32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)


50-50: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


27-30: Addition of virtual environment setup

The introduction of a virtual environment setup is an excellent practice for Python projects. It ensures isolation and reproducibility of the project's dependencies.

The comment suggests that this is needed for maturin, which typically implies Rust integration with Python. Could you provide more context on the use of maturin in this project? This information would be valuable for understanding the project's architecture and dependencies.

Let's check for Rust-related files and maturin configuration:

✅ Verification successful

Approval of Virtual Environment Setup

The virtual environment setup is correctly implemented to support maturin for Rust integration. The presence of Rust files and maturin configuration in the project verifies this necessity.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for Rust files and maturin configuration
echo "Rust files in the project:"
fd -e rs
echo "Maturin configuration:"
fd pyproject.toml -x grep -i maturin {} || echo "No maturin configuration found"

Length of output: 346
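For context, a rough sketch of what this virtual-environment step enables; the venv path mirrors the activation commands above, while the explicit maturin install and the develop invocation inside python/ are assumptions about what the Makefile's build targets do:

#!/bin/bash
# Create and activate the CI virtual environment, then build the Rust extension in-place with maturin
python3 -m venv ~/.venv/ci-venv
source ~/.venv/ci-venv/bin/activate
pip install maturin
cd python && maturin develop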


31-37: Improved dependency installation process

The updated dependency installation process is well-structured:

  1. It correctly activates the virtual environment before installing dependencies.
  2. Using a separate requirements-dev.txt file for development dependencies is a good practice.
  3. The addition of the make develop command suggests a streamlined setup process.

These changes will help ensure consistent environments across development and CI.

Consider adding a comment or documentation about what the make develop command does, as it might not be immediately clear to new contributors.

Let's check the contents of the Makefile:

✅ Verification successful

Document the make develop command

The make develop command performs the following actions:

  1. install-go: Installs the necessary Go dependencies.
  2. build-python-dev: Builds the Python components in development mode.
  3. ray-start-local: Starts Ray locally with specific configurations.

Adding this documentation will help new contributors understand the setup process and the purpose of each step involved in make develop.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check the contents of the Makefile
echo "Contents of Makefile:"
fd Makefile -x cat {} || echo "No Makefile found"

Length of output: 1298

🧰 Tools
🪛 actionlint

32-32: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


21-25: Addition of Go setup

The inclusion of Go setup in the workflow is a good addition, ensuring that the necessary Go environment is available for the project. The specific version (1.21.3) and enabled caching will contribute to consistent and faster builds.

However, could you please clarify the role of Go in this project? This addition suggests that the project now includes Go components or dependencies, which might be worth mentioning in the PR description or documentation.

To understand the Go usage in the project, let's check for Go files:

✅ Verification successful

Go setup verified

The presence of Go files in the project confirms that adding the Go setup to the workflow is necessary and appropriate. No further actions are required.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for Go files in the project
echo "Go files in the project:"
fd -e go

Length of output: 346


Script:

#!/bin/bash
# Correctly check for Go files in the project
echo "Go files in the project:"
fd -e go

Length of output: 375

Comment on lines +38 to 47
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && ruff $(git ls-files '*.py')
- name: Static analysis with mypy

- name: Static analysis with MyPy
run: |
source ~/.venv/ci-venv/bin/activate
mypy --version
mypy --config-file python/mypy.ini $(git ls-files '*.py')

⚠️ Potential issue

Improved linting and static analysis steps

The updates to the linting and static analysis steps are good improvements:

  1. Activating the virtual environment ensures the correct versions of tools are used.
  2. Using git ls-files '*.py' ensures only tracked Python files are processed.

However, there's a potential issue with word splitting in these commands, as flagged by the static analysis tool.

To address this, please update the commands as follows:

- cd python && ruff $(git ls-files '*.py')
+ cd python && git ls-files '*.py' | xargs ruff

- mypy --config-file python/mypy.ini $(git ls-files '*.py')
+ git ls-files '*.py' | xargs mypy --config-file python/mypy.ini

These changes use xargs to properly handle filenames that might contain spaces or special characters.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && ruff $(git ls-files '*.py')
- name: Static analysis with mypy
- name: Static analysis with MyPy
run: |
source ~/.venv/ci-venv/bin/activate
mypy --version
mypy --config-file python/mypy.ini $(git ls-files '*.py')
- name: Linting with Ruff
run: |
source ~/.venv/ci-venv/bin/activate
cd python && git ls-files '*.py' | xargs ruff
- name: Static analysis with MyPy
run: |
source ~/.venv/ci-venv/bin/activate
mypy --version
git ls-files '*.py' | xargs mypy --config-file python/mypy.ini
🧰 Tools
🪛 actionlint

39-39: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


39-39: shellcheck reported issue in this script: SC2046:warning:2:19: Quote this to prevent word splitting

(shellcheck)


44-44: shellcheck reported issue in this script: SC1090:warning:1:8: ShellCheck can't follow non-constant source. Use a directive to specify location

(shellcheck)


44-44: shellcheck reported issue in this script: SC2046:warning:3:36: Quote this to prevent word splitting

(shellcheck)

@akotlar akotlar merged commit 43bd860 into bystrogenomics:master Oct 14, 2024
2 checks passed
@akotlar akotlar deleted the feature/improve-docker-build-speed branch November 19, 2024 20:36