Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce GitHub Actions unit test durations by removing tox and using test parallelization #1345

Merged
merged 14 commits into from
Nov 3, 2024

Conversation

jayaddison
Copy link
Collaborator

I removed unittest-parallel from our test suite a couple of years ago in #660, under the justification that it introduced unnecessarily complexity.

However: to support a potential increase in the number of pull requests we run tests for, and based on noticing an increase in the number of continuous integration machine-minutes spent, I'd like to see if we can reduce the duration of our GitHub Actions unit test workflows, and believe that unittest-parallel may be an effective way to do that. So here it is again, although only as a dependency during the relevant workflow.

Depends-upon / includes:

@jayaddison
Copy link
Collaborator Author

Note: removal of tox also contributes to the reduction in test runtime here; I introduced it in #650 -- feeling that it provided useful standardisation and isolation of tests.

@jayaddison jayaddison changed the title Proof-of-concept: reduce GitHub Actions unit test durations using test parallelization Proof-of-concept: reduce GitHub Actions unit test durations by removing tox and using test parallelization Oct 25, 2024
@jknndy
Copy link
Collaborator

jknndy commented Oct 25, 2024

This seems like a great improvement, running the parallel tests on my local machine cut runtime by over 35 seconds. I wonder if it would be worth including information about this alternative testing method in the readme if it gets merged.

@jayaddison
Copy link
Collaborator Author

I'm not sure how to decide about that, to be honest. It's good that the library should (and does, afaik) be testable with a super-simple, standard workflow (create a venv, source that, pip install, run unittest) -- but also it's certainly nice to save people time and maybe some compute resource (although running stuff in parallel, arguably doesn't always save CPU/memory.. sometimes it only compresses it, and could even add a tiny amount of overhead).

@jknndy
Copy link
Collaborator

jknndy commented Nov 3, 2024

It's good that the library should (and does, afaik) be testable with a super-simple, standard workflow

We could include something in the existing docs/how-to-develop-scraper.md below the current documentation regarding testing.

**Check that everything is working by running the tests**
~```shell
$ python -m unittest
~```
This will run all the tests for all the scrapers. You should not see any errors or failures

**OPTIONAL: To run the full test suite in parallel**
~```shell
$ pip install unittest-parallel
$ unittest-parallel --level test
~```

Copy link
Collaborator

@jknndy jknndy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvements all around!

Once #1343 is merged this will be a great addition.

@jayaddison
Copy link
Collaborator Author

Thanks again - and agreed with the idea of adding the parallel-tests mode to the dev documentation. Merging this soon.

@jayaddison jayaddison changed the title Proof-of-concept: reduce GitHub Actions unit test durations by removing tox and using test parallelization Reduce GitHub Actions unit test durations by removing tox and using test parallelization Nov 3, 2024
@jayaddison
Copy link
Collaborator Author

(ah, ok; some refactoring of the documentation required first, by the looks of it. I'll spend some time on that during the week)

@jayaddison
Copy link
Collaborator Author

(ah, ok; some refactoring of the documentation required first, by the looks of it. I'll spend some time on that during the week)

In fact... on second thoughts; let's merge the test-parallelism for continuous integration first, because that affects open pull requests and merges. I'll include the documentation change during the week.

@jayaddison jayaddison merged commit 4663b33 into main Nov 3, 2024
18 checks passed
@jayaddison jayaddison deleted the ci/reduced-unittest-duration-poc branch November 3, 2024 21:13
jayaddison added a commit that referenced this pull request Nov 3, 2024
… using test parallelization (#1345)"

This reverts commit 4663b33.

This appears to be incompatible with `v14` legacy tests; more investigation required.
@jayaddison
Copy link
Collaborator Author

Note: this change appears to have been incompatible with the v14 branch; that isn't something I had considered could be possible before backporting it there, but it is.

It seems that this legacy test is the problem there:

cls.harvester_class = cls.scraper_class(url=start_url)

I'll investigate that soon. In some ways it appears that it could be similar to #1342 (per-class test setup).

jayaddison added a commit that referenced this pull request Nov 3, 2024
…d using test parallelization (#1345)"

This reverts commit f05e05e.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants