Discussion: Don't run integration tests in PRs #4768

Draft
wants to merge 1 commit into base: develop
Conversation

@kratman (Contributor) commented Jan 16, 2025

Description

This change would remove the integration tests from PRs, but leave them running in the scheduled nightly test runs.
While not done here, I think this is part of #3662: we should get alerts when integration tests fail in the scheduled runs, but not run them on each PR.

Reasoning:

  • Integration tests run for ~15 minutes per test job (240 total CI minutes per push)
  • I have rarely seen integration tests fail
  • Integration tests don't count towards coverage
  • Some of the integration tests do not even assert anything, so they are only looking for crashes
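
For concreteness, a minimal sketch of what the trigger change could look like (the workflow layout, cron time, and nox session name are illustrative, not this PR's actual diff):

```yaml
# Hypothetical standalone workflow: integration tests run nightly and on
# manual dispatch, but no longer on pull_request events.
name: Scheduled integration tests

on:
  schedule:
    - cron: "0 3 * * *" # nightly at 03:00 UTC
  workflow_dispatch:    # allow manual runs from the Actions tab

jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install nox
      # Substitute the project's actual integration-test session here.
      - run: nox -s tests
```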

Type of change

  • Optimization (back-end change that speeds up the code)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python -m pytest (or $ nox -s tests)
  • The documentation builds: $ python -m pytest --doctest-plus src (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests all at once using $ nox -s quick.

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@kratman (Contributor, Author) commented Jan 16, 2025

I am interested in hearing the thoughts of the other @pybamm-team/maintainers on this topic. This is really a question of how much value the much longer tests are adding. I know that major interface changes will cause them to fail.

kratman marked this pull request as draft on January 16, 2025 at 17:00.
@kratman (Contributor, Author) commented Jan 16, 2025

Original runtime for some tests:
[screenshot: CI job timings before this change]

With this change:
[screenshot: CI job timings after this change]

@valentinsulzer (Member) commented

I've experienced a couple of cases in the past where the integration tests caught edge cases that weren't caught by the unit tests and then required fairly significant effort to fix. So the risk would be that such changes get through and then require major effort from maintainers to fix.

@kratman (Contributor, Author) commented Jan 16, 2025

> I've experienced a couple of cases in the past where the integration tests caught edge cases that weren't caught by the unit tests and then required fairly significant effort to fix. So the risk would be that such changes get through and then require major effort from maintainers to fix.

Yeah, that is why I think it is also important to have the scheduled run create issues on failures, so anything that does get through is caught quickly.

The other potential improvement could just be disabling them on draft PRs, but that still requires some discipline from devs about marking PRs as drafts.
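
For reference, skipping a job while a PR is still a draft is a one-line condition in GHA (the job layout here is a sketch, not this PR's diff):

```yaml
jobs:
  integration:
    # Run integration tests only once the PR is marked "ready for review".
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - run: echo "integration test steps go here"
```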

None of the options are perfect, but our CI does run for an extremely long time. I basically set up this PR for discussion, because I needed the timings.

kratman changed the title from "Don't run integration tests in PRs" to "Discussion: Don't run integration tests in PRs" on Jan 16, 2025.
@aabills (Contributor) commented Jan 16, 2025

I'd expect a whole lot of shape errors on develop if this were implemented because the unit tests don't test setting parameters etc. for various model combinations. Those are typically not easy to debug.

codecov bot commented Jan 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.66%. Comparing base (1ac422a) to head (dd7c0d9).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4768   +/-   ##
========================================
  Coverage    98.66%   98.66%           
========================================
  Files          303      303           
  Lines        23227    23227           
========================================
  Hits         22916    22916           
  Misses         311      311           


@aabills (Contributor) commented Jan 18, 2025

Just a thought: while I think we should run integration tests on PRs, we probably don't need to run the full matrix. I think most of the OS- and Python-version-specific issues will likely get caught in the unit tests.

@Saransh-cpp (Member) commented Jan 18, 2025

Adding onto @aabills' comment, we should only run the tests (ideally both unit and integration) on Ubuntu (with all the Python versions, since breaking deprecations are often introduced) because pybamm now has pure-Python wheels (platform-independent). The platform-dependent things should be (and are being) tested in pybammsolvers' CI.
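
A sketch of what the reduced matrix could look like (the Python versions and nox session are illustrative):

```yaml
jobs:
  integration:
    # Ubuntu only: the wheels are pure Python, so platform coverage
    # is delegated to the unit tests and to pybammsolvers' CI.
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Keep all supported Python versions to catch deprecations
        # introduced by new interpreter releases.
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install nox
      - run: nox -s tests
```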

@agriyakhetarpal (Member) commented

I would propose not reducing to Ubuntu-only testing for now, because platform-specific issues can still lurk outside of pybammsolvers: something as simple as differences in character encodings across platforms (which we have come across before), or issues coming from our dependencies, most of which are not pure Python (CasADi, SciPy, NumPy).

However, not running integration tests on PRs sounds good to me if we keep these points in mind:

  • that they should run as part of the nightly tests, with an issue opened if they fail (as suggested above)
  • that there should be a way for users to still run them on PRs if they want – this is quite easy to add with GHA in just ~3 lines of code

Most of our PRs are unlikely to break the integration tests, but the second point would ensure anyone can trigger them, by adding [integration] to the commit message, if they feel their changes could break things. Another option is to use labels, such as "run integration tests", which a workflow can respond to (the downside being that only collaborators can add labels).
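
As a sketch of the label-based variant (the label name and job layout are hypothetical):

```yaml
on:
  pull_request:
    # "labeled" is needed so that adding the label re-triggers the workflow.
    types: [opened, synchronize, reopened, labeled]

jobs:
  integration:
    # Run only when the "run integration tests" label is on the PR.
    if: contains(github.event.pull_request.labels.*.name, 'run integration tests')
    runs-on: ubuntu-latest
    steps:
      - run: echo "integration test steps go here"
```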

For point 1, I've used https://github.com/xarray-contrib/issue-from-pytest-log to create issues from failing CI jobs in other projects; we can use it here too.
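
For reference, wiring that action up looks roughly like this (an untested sketch based on its README; it consumes the report log written by pytest's pytest-reportlog plugin):

```yaml
steps:
  - run: python -m pip install pytest pytest-reportlog
  # Write a machine-readable log of test results for the action to parse.
  - run: python -m pytest --report-log pytest-log.jsonl
  # Open (or update) a GitHub issue whenever the job fails.
  - uses: xarray-contrib/issue-from-pytest-log@v1
    if: failure()
    with:
      log-path: pytest-log.jsonl
```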

@agriyakhetarpal (Member) commented

Or, we could request NumFOCUS to pay for larger runners 😛 They have some budgets allocated for this: https://github.com/numfocus/infrastructure/

@kratman (Contributor, Author) commented Jan 18, 2025

I think limiting the test matrix for integration tests sounds like a good compromise.

I know there are issues with our dependencies on Windows. I was running into some of them while trying to upgrade to Python 3.13 and NumPy 2.0 in the fall. So I think testing on all OSes is still a good idea, at least for the unit tests.

@valentinsulzer (Member) commented

Switching to IDAKLU as the solver should also speed up a lot of the integration tests.

@brosaplanella (Member) commented

I have also found some cases where integration tests caught bugs, so I find them useful. Also, in my experience, when an integration test fails it usually fails across platforms, so reducing the matrix sounds like a good option to me. Removing them from draft PRs also sounds good: even if only half of the PRs are marked as drafts, it significantly reduces the overall compute time.
