
chore(bug): fixed a bug where processing step was over counting #28

Merged

Conversation

@thoven87 (Contributor) commented Sep 15, 2024

This PR does the following:

1 - fixed an issue where the queued meter would become negative during retries
2 - added the number of workers to the job metrics (the processing count can give the same metric)
3 - added discarded jobs metrics

With the most recent commit:

One can get the Prometheus UI or Grafana to update the queue count in real time with the following query:

count(swift_jobs_meter{status="queued"} unless on(jobID) (swift_jobs_meter{status="processing"})) or vector(0)
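
As a usage sketch for the discarded jobs metrics from item 3 (assuming they are published under the same swift_jobs_meter family with a status="discarded" dimension, which is a naming assumption on my part), per-job discard totals could then be read with:

sum by (name) (swift_jobs_meter{status="discarded"})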

// Capture the job name before the pushed instance id shadows `id`
let jobName = id.name
let id = try await self.queue.push(buffer, options: options)
// Increment the queued meter only after the push has succeeded
Meter(label: "swift.jobs.meter", dimensions: [("status", "queued"), ("name", jobName)]).increment()

Member

I assume this was causing issues when a push fails?

Member

The meter label is hard-coded here, but not in JobQueueHandler.

Contributor Author

I am moving the labels into an internal structure named JobMetricsHelper so they are accessible throughout the swift-jobs library.
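
A minimal sketch of what such a helper might look like (only the JobMetricsHelper name and the label and status values come from this thread; the member names below are assumptions):

// Sketch only; the real JobMetricsHelper in swift-jobs may differ
internal enum JobMetricsHelper {
    // Meter label shared by the per-job status meters
    static let meterLabel = "swift.jobs.meter"
    // Meter label for the worker count
    static let workerCountLabel = "swift.jobs.worker.count"

    // Values used for the "status" dimension
    enum JobStatus: String {
        case queued
        case processing
        case discarded
    }
}

The push call site shown earlier could then read, for example:

Meter(
    label: JobMetricsHelper.meterLabel,
    dimensions: [("status", JobMetricsHelper.JobStatus.queued.rawValue), ("name", jobName)]
).increment()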

@@ -27,6 +27,7 @@ final class JobQueueHandler<Queue: JobQueueDriver>: Sendable {
        self.logger = logger
        self.jobRegistry = .init()
        self.options = options
        // Record the configured number of workers as a gauge-style meter
        Meter(label: "swift.jobs.worker.count").increment(by: Double(numWorkers))

Member

When does this number decrement?

Member

What happens when an app crashes?

Contributor Author

We should not decrement this gauge. We can use it for alerting purposes, because once an app crashes or shuts down, after a while Prometheus will return 0.

Member

> We should not decrement this gauge. We can use it for alerting purposes, because once an app crashes or shuts down, after a while Prometheus will return 0.

Are you saying that Prometheus will revert a Meter to zero if the connection from the app is lost due to a crash? What does that mean for all of the other Meters?

Contributor Author

I should have phrased the previous sentence better. Prometheus would not reset the meter to zero; rather, there would be no data if one were to query for the most recent value:

max_over_time(swift_jobs_worker_count{}[5m])

To be safe, I will try to decrement during graceful shutdown. I can see the worker count might be misleading in a continuous deployment environment.
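
A minimal sketch of that graceful-shutdown decrement (the WorkerPool type and shutdown() method are illustrative stand-ins rather than the actual JobQueueHandler API; Meter.decrement(by:) comes from swift-metrics):

import Metrics

// Sketch under the assumptions above, not the real swift-jobs implementation
final class WorkerPool {
    let numWorkers: Int

    init(numWorkers: Int) {
        self.numWorkers = numWorkers
        // Mirrors the startup increment from the diff above
        Meter(label: "swift.jobs.worker.count").increment(by: Double(numWorkers))
    }

    func shutdown() {
        // Return the gauge to zero on graceful shutdown; after a crash the last
        // value lingers until Prometheus marks the series stale.
        Meter(label: "swift.jobs.worker.count").decrement(by: Double(self.numWorkers))
    }
}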

@adam-fowler (Member) left a comment

One minor change, but otherwise we are good.

Sources/Jobs/JobQueueHandler.swift (outdated, resolved)
@adam-fowler merged commit 21a3e86 into hummingbird-project:main on Sep 17, 2024
5 checks passed