Add note about naming files in parallel jobs #8941
base: master
Hey @nokite! Thank you for the PR. It would be great to get a better understanding of what you want to get working here; any chance you can share more? Parallelism is generally used for splitting up work across execution environments rather than running the same work multiple times, and the parallel runs generally do not write to the same files. So it would be useful to understand more, either to get this addition to the docs right or to offer a different way to achieve what's needed.
Absolutely, I'll try to explain my goals. Thanks for replying! I use parallelism in order to build and deploy a product in multiple variants in reasonable time. By passing $CIRCLE_NODE_INDEX, I tell it which set of variants to build in each parallel run/instance. Each parallel run builds a couple of variants (just enough to stay below the 3h runtime limit). I would be happy to achieve this in any way. My assumption was that a good way would be to:
So the issue I ran into is that you can't save these unique logs in the workspace, as the filename has to be hardcoded - so it can't vary between parallel runs (by using the index, for instance).
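The approach described above, where each parallel run uses its index to pick a subset of variants, can be sketched roughly as follows. This is a minimal illustration and not the author's actual setup: the variant names and the variants_for_node helper are hypothetical. CIRCLE_NODE_INDEX and CIRCLE_NODE_TOTAL are the environment variables CircleCI sets in parallel runs.

```python
import os


def variants_for_node(variants, node_index, node_total):
    """Return the variants assigned to one parallel run (round-robin split)."""
    if not 0 <= node_index < node_total:
        raise ValueError("node_index must be in [0, node_total)")
    # Take every node_total-th variant, starting at this run's index.
    return variants[node_index::node_total]


if __name__ == "__main__":
    # Hypothetical variant names, for illustration only.
    all_variants = ["variant-a", "variant-b", "variant-c", "variant-d", "variant-e"]
    # Fall back to a single run when executed outside CircleCI.
    index = int(os.environ.get("CIRCLE_NODE_INDEX", "0"))
    total = int(os.environ.get("CIRCLE_NODE_TOTAL", "1"))
    print(variants_for_node(all_variants, index, total))
```

A round-robin split keeps the runs roughly balanced without hardcoding which variants belong to which index; whether that suits builds with very uneven durations is a separate question.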
Hey @nokite, thank you! Sorry if I'm wrong here but it sounds like you might be able to simplify things. In CircleCI, parallelism as a feature (configuring a number of parallel execution environments and telling CircleCI how to split work across them) is generally reserved for splitting a test suite. What you describe sounds to me like you want concurrent jobs in a workflow, so you can configure your sets of variants to build in separate jobs and then create a workflow where those jobs run concurrently: https://circleci.com/docs/concurrency/#concurrency-in-workflows. Then you would be able to see in the UI/wherever which failed/built/deployed for each job. Basically "parallel" and "concurrent" can mean largely the same thing but in CircleCI parallelism is a specific testing-focussed feature, whereas concurrency is about running jobs, doing work at the same time across multiple execution environments. Please let me know if I misunderstood and oversimplified things here!
@rosieyohannan thanks, I think we're on the same page. Technically, I believe my usage of parallelism fits its purpose. Regarding what you suggested - I agree it's totally valid, and that's how I started. It bugged me that the only difference between the separate calls was an index (which I passed as a parameter). The job itself was still defined only once - as there was no reason to duplicate it. Then I figured out that what I was doing was basically parallelism - same type of work called with an index. So I refactored the config and used parallelism, which felt right (and awesome 🙂). It reduced the number of repetitive lines dramatically, and I liked the different way it was shown in the CircleCI UI. I could see everything at a glance (fitting on a single screen), and could easily switch between runs. It ended up being a long comment unfortunately, but I hope I managed to illustrate why I see parallelism as the right tool for the job. Let me know!
Hi @nokite, thanks for reporting this. I think the confusion comes from what info is available at which points in the process. E.g. env-vars aren't available for interpolation when we're processing config. I think we could make the different phases and what's available in each phase clearer in the docs. To get on to your root issue, there are a few different options available that might get you unblocked:
(edited to fix config snippet)
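The point about env-vars not being available for interpolation when config is processed can be illustrated with a small analogy. The sketch below uses Python's fnmatch as a stand-in for a path matcher; it is not CircleCI's implementation, and the pattern_matches helper is hypothetical. It only shows that a pattern containing an unexpanded $CIRCLE_NODE_INDEX is compared literally and therefore does not match a file named with the actual index.

```python
import fnmatch
from string import Template


def pattern_matches(pattern, filename, env=None):
    """Match filename against a glob pattern.

    If env is given, $VARS in the pattern are substituted first,
    which is what a shell would do and what a plain matcher does not.
    """
    if env is not None:
        pattern = Template(pattern).safe_substitute(env)
    return fnmatch.fnmatchcase(filename, pattern)


if __name__ == "__main__":
    pattern = "deploy_logs_$CIRCLE_NODE_INDEX"
    # No expansion: the "$CIRCLE_NODE_INDEX" text is treated literally.
    print(pattern_matches(pattern, "deploy_logs_0"))
    # With expansion, as a shell step would see it.
    print(pattern_matches(pattern, "deploy_logs_0", {"CIRCLE_NODE_INDEX": "0"}))
```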
@gordonsyme Thanks for your suggestions, I appreciate your time. P.S. I can't comment on artifacts, I would have to try it out with my setup.
@nokite awesome, glad you're sorted :)
That'll teach me to write out config off the top of my head 😅, I'll edit my first reply so the correct form is out there for anyone else who comes across it.
As for the PR, I'm OK if you close it and handle the update on your side, as you have a better understanding. I can suggest the following:
Description
Added a note about naming files in persist_to_workspace during parallel jobs - names are fixed, you cannot use parallelism environment variables to make the names of files unique and identifiable. Proven by this error message:
Reasons
This behavior is not explained, so one has to figure it out by trial and error.
Furthermore, there does not seem to be a way to achieve the goal at all (save files from multiple parallel runs of a job, and give them unique names that don't conflict with each other).
Finally, the error message could be improved to help figure out the problem. Currently it says "The specified paths did not match any files in /tmp/deploy_logs" - it doesn't mention what paths were used: was it resolved to deploy_logs_0, or is it literally deploy_logs_$CIRCLE_NODE_INDEX?
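One possible way to get unique, identifiable file names despite fixed workspace paths is to build the name at run time, inside the step that writes the file, and keep the persisted path fixed at the directory level. The sketch below assumes the workspace step is pointed at a fixed directory rather than at per-run file names; the thread does not confirm this fits the setup described. The node_log_path helper and the deploy_log_ prefix are hypothetical.

```python
import os
from pathlib import Path


def node_log_path(log_dir, env=None):
    """Build a per-run log path such as <log_dir>/deploy_log_<index>.txt."""
    env = os.environ if env is None else env
    # CIRCLE_NODE_INDEX is set by CircleCI in parallel runs; default to "0" elsewhere.
    index = env.get("CIRCLE_NODE_INDEX", "0")
    return Path(log_dir) / f"deploy_log_{index}.txt"


if __name__ == "__main__":
    path = node_log_path("/tmp/deploy_logs")
    # Create the fixed directory, then write this run's uniquely named log into it.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("deploy finished\n")
    print(path)
```

Because the index is read when the script runs, each parallel run writes a differently named file, while the directory name that appears in config stays the same for all runs.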
Content Checklist
Please follow our style when contributing to CircleCI docs. Our style guide is here: https://circleci.com/docs/style/style-guide-overview.
Please take a moment to check through the following items when submitting your PR (this is just a guide so will not be relevant for all PRs) 😸: