Check archived jobs for last known job number before creating cluster
We ran into a bug after deleting a job cluster and then recreating it with the same name. The old job cluster had 4 stages and the new one had 2. When a job completed, it was written to the archived tables. However, an archived job with the same ID already existed there. The KV provider only overwrote the rows for stages 1 and 2; it did not delete the rows for stages 3 and 4. When Mantis tried to load the archived job, it saw job metadata indicating 2 stages but received 4 stages (two from the new job and two left over from the old job), so it failed to load the job.

We could probably consider this a bug in the Dynamo KV Provider, _but_ we don't want to overwrite archived jobs in any scenario, since we'd like to maintain a record of those jobs. The real problem is further upstream. When we create a job, we should be reasonably confident that the Job ID is globally unique. However, when creating a job cluster, the `lastJobCount` value is always set to 0. We should instead check whether any archived jobs have the same cluster name. If so, we should take the highest archived job number and use it as the last known job number.

We want the following scenario:

1. Create a job cluster "MyJob"
2. Create a job "MyJob-1"
3. Delete the job and job cluster
4. Create another job cluster "MyJob"
5. Create a job "MyJob-2" instead of "MyJob-1"

Previously, we would end up with an archived job "MyJob-1" and an active job "MyJob-1" that are distinct. Stopping the active one would overwrite the archived one.
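
For illustration only, here is a minimal Python sketch of the intended check. It is not the Mantis implementation: the function names `last_archived_job_number` and `next_job_id` are hypothetical, and the sketch assumes job IDs have the form `<clusterName>-<number>`, as in the "MyJob-1" example above.

```python
import re
from typing import Iterable


def last_archived_job_number(cluster_name: str, archived_job_ids: Iterable[str]) -> int:
    """Return the highest job number archived for this cluster, or 0 if none.

    Assumes (hypothetically) that job IDs look like "<clusterName>-<number>".
    """
    pattern = re.compile(re.escape(cluster_name) + r"-(\d+)")
    last = 0
    for job_id in archived_job_ids:
        # fullmatch avoids counting jobs of other clusters that merely
        # share a prefix, such as "MyJobOther-7" for cluster "MyJob".
        match = pattern.fullmatch(job_id)
        if match:
            last = max(last, int(match.group(1)))
    return last


def next_job_id(cluster_name: str, last_job_count: int) -> str:
    """Build the ID of the next job from the last known job number."""
    return f"{cluster_name}-{last_job_count + 1}"


# Recreating cluster "MyJob" while "MyJob-1" is still in the archive:
archived = ["MyJob-1", "MyJobOther-7"]
last_job_count = last_archived_job_number("MyJob", archived)
new_job_id = next_job_id("MyJob", last_job_count)
```

Seeding `lastJobCount` from the archive rather than from 0 means the first job of the recreated cluster gets a fresh ID, so the archived record is never overwritten.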