
fix: wrong terminal counts calculated during migration check #5400

Merged (1 commit) on Jan 20, 2025


@Sidddddarth (Member) commented Dec 27, 2024

Description

The migration check, which runs every 30 seconds, calculates a wrong count.
The check counts terminal statuses in the status table. However, at startup we "clean up" old jobs simply by appending another aborted status, regardless of the job's current state, so this query can count more than one terminal status per job.

As a result, when the jobs are actually migrated, more jobs are moved than expected, because the expected number was too low.
The issue surfaced now because archival tables have a default retention of 24 hours, so on successive restarts more and more statuses were appended for the same job. We expect the following number of jobs to be migrated:

numExpectedNumberOfMigratedJobs (e) = number of jobs (a) - number of terminal statuses in the status table (b)

Because of the startup cleanup, b grows with every restart within the retention window even when a stays the same, which effectively decreases e. The server then panics when it actually migrates more jobs than e.
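A worked example with hypothetical numbers makes the failure concrete. Assume a = 10 jobs, of which 4 already have exactly one terminal status each and are older than the cleanup deadline, while the other 6 are newer and have no terminal status. One restart appends an aborted status to each of the 4 old jobs:

```latex
\begin{aligned}
\text{before the restart:}\quad & b = 4, & e &= a - b = 10 - 4 = 6 \\
\text{after one restart:}\quad & b = 4 + 4 = 8, & e &= a - b = 10 - 8 = 2
\end{aligned}
```

Six jobs still lack a terminal status and are migrated, which exceeds e = 2 and triggers the panic. Counting each job ID only once would keep b = 4 and e = 6.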
With this fix, b becomes the number of distinct job IDs with a terminal status in the status table, which stays the same even if more statuses are appended for the same job.
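The difference between the old and the new count can be sketched in Go. This is an illustration under assumptions, not the rudder-server code: `statusRow`, `isTerminal`, `countTerminalRows` and `countTerminalJobs` are hypothetical names, and the set of terminal states used here is assumed for the example.

```go
package main

import "fmt"

// statusRow is a hypothetical, simplified row of the status table.
type statusRow struct {
	JobID int64
	State string
}

// isTerminal reports whether a state is terminal.
// The set of terminal states is an assumption for illustration.
func isTerminal(state string) bool {
	switch state {
	case "succeeded", "aborted":
		return true
	}
	return false
}

// countTerminalRows mirrors the old behaviour: every terminal status row counts.
func countTerminalRows(rows []statusRow) int {
	n := 0
	for _, r := range rows {
		if isTerminal(r.State) {
			n++
		}
	}
	return n
}

// countTerminalJobs mirrors the fix: each job ID with at least one terminal
// status counts once, roughly COUNT(DISTINCT job_id) in SQL.
func countTerminalJobs(rows []statusRow) int {
	seen := make(map[int64]struct{})
	for _, r := range rows {
		if isTerminal(r.State) {
			seen[r.JobID] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	numJobs := 3 // jobs 1, 2 and 3; job 3 has no status rows yet
	rows := []statusRow{
		{1, "succeeded"},
		{2, "executing"},
		{1, "aborted"}, // extra row appended by the startup cleanup
	}
	fmt.Println("old expected:", numJobs-countTerminalRows(rows))
	fmt.Println("new expected:", numJobs-countTerminalJobs(rows))
}
```

Jobs 2 and 3 have no terminal status, so 2 jobs are migrated: the old count expects only 1, while the distinct count expects 2.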

Linear Ticket

Resolves PIPE-1827

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@Sidddddarth changed the title from "fix: processing pickup race condition" to "wrong terminal counts calculated during migration check" Dec 27, 2024
@Sidddddarth marked this pull request as ready for review December 27, 2024 11:16
@Sidddddarth changed the title from "wrong terminal counts calculated during migration check" to "fix: wrong terminal counts calculated during migration check" Dec 27, 2024

codecov bot commented Dec 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.84%. Comparing base (3258e9d) to head (5f1f0db).
The report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5400      +/-   ##
==========================================
+ Coverage   74.74%   74.84%   +0.10%     
==========================================
  Files         440      440              
  Lines       61668    61668              
==========================================
+ Hits        46093    46155      +62     
+ Misses      13029    12968      -61     
+ Partials     2546     2545       -1     


@ktgowtham (Contributor) left a comment

What kind of cleanup do we do at startup? And if a terminal status already exists for a job, why do we append more?

@ktgowtham requested a review from cisse21 December 30, 2024 11:03
@Sidddddarth (Member, Author) replied

> What kind of cleanup do we do at startup? And if a terminal status already exists for a job, why do we append more?

It just appends an aborted status without checking which status is already present, like so:

```sql
INSERT INTO
	status_table (job_id, job_state, error_response)
SELECT
	job_id, 'aborted', '{"reason": "job max age exceeded"}'
FROM
	jobs_table
WHERE
	created_at <= $deadline;
```

@cisse21 (Member) commented Jan 3, 2025

Let's change the base to master and release it in 1.40.

@Sidddddarth changed the base branch from release/1.39.x to master January 20, 2025 02:47
@Sidddddarth force-pushed the fix.migrationNumTerminalCounts branch from d40ff5e to 5f1f0db January 20, 2025 02:52
@Sidddddarth requested a review from ktgowtham January 20, 2025 02:52
@Sidddddarth enabled auto-merge (squash) January 20, 2025 05:43
@Sidddddarth merged commit 2541b1c into master Jan 20, 2025
58 checks passed
@Sidddddarth deleted the fix.migrationNumTerminalCounts branch January 20, 2025 05:44