-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backup: Export tenant backup schedule metric #141156
base: master
Are you sure you want to change the base?
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Quick question: Is there a way I can test the exported metric before running it in staging? @msbutler |
pkg/backup/schedule_exec.go
Outdated
@@ -578,6 +584,12 @@ func init() { | |||
Measurement: "Jobs", | |||
Unit: metric.Unit_TIMESTAMP_SEC, | |||
}), | |||
RpoTenantMetric: NewExportedGaugeVec(metric.Metadata{ | |||
Name: "schedules.BACKUP.tenant-last-completed-time", | |||
Help: "The unix timestamp of the most recently completed backup by a tenant schedule specified as maintaining this metric", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a "tenant schedule" but rather a schedule for backing up a tenant. So the schedule still exists on the host cluster, but backs up the tenant.
pkg/backup/schedule_exec.go
Outdated
@@ -362,6 +363,11 @@ func (e *scheduledBackupExecutor) backupSucceeded( | |||
e.metrics.RpoMetric.Update(details.(jobspb.BackupDetails).EndTime.GoTime().Unix()) | |||
} | |||
|
|||
if details.(jobspb.BackupDetails).SpecificTenantIds != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the UpdatesLastBackupMetric
for determining the "special" schedule that will update the metric — should we do the same for these backup tenant schedules?
Thoughts @msbutler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it looks like the update for this metric should be in the UpdatesLastBackupmetric
branch above.
pkg/backup/schedule_exec.go
Outdated
@@ -362,6 +363,11 @@ func (e *scheduledBackupExecutor) backupSucceeded( | |||
e.metrics.RpoMetric.Update(details.(jobspb.BackupDetails).EndTime.GoTime().Unix()) | |||
} | |||
|
|||
if details.(jobspb.BackupDetails).SpecificTenantIds != nil { | |||
e.metrics.RpoTenantMetric.Update(map[string]string{"tenant_id": details.(jobspb.BackupDetails).SpecificTenantIds}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SpecificTenantIDs
is a []string
, so you'll want to loop over them and emit a metric for each one.
@edwardguo-crl I'm not aware of a way to use a custom Cockroach binary for the clusters in your local CC deployment, and my gut tells me that we don't have a mechanism built for that. @benbardin any insights on this? |
@edwardguo-crl The commit should ideally be prefixed with the package it affects and indicated in the PR title as well. We also generally don't link to the CC ticket in this repo. nit: I think the PR description can be a bit more specific — there's no mechanism in place that makes the metric "cloud only". I'd be a bit more descriptive in what the metric actually represents as well. |
@edwardguo-crl thanks for opening this! A few general process nits:
|
@edwardguo-crl i would augment a test in backup_test.go that creates a backup of several tenants and verify the metrics were updated using some of the infra used in https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/metric_test.go |
1bf628b
to
c7c940a
Compare
c7c940a
to
4c39fda
Compare
4c39fda
to
87eeef8
Compare
87eeef8
to
039b0d1
Compare
039b0d1
to
510fd82
Compare
pkg/backup/schedule_exec.go
Outdated
@@ -578,6 +588,12 @@ func init() { | |||
Measurement: "Jobs", | |||
Unit: metric.Unit_TIMESTAMP_SEC, | |||
}), | |||
RpoTenantMetric: metric.NewExportedGaugeVec(metric.Metadata{ | |||
Name: "schedules.BACKUP.tenant-last-completed-time", | |||
Help: "The Unix timestamp of the most recently completed backup by host schedule for the tenant", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like "The unix timestamp of the most recently completed tenant backup by a host schedule specified as maintaining this metric" would be more fitting and similar to the existing description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like: "The unix timestamp of the most recently completed host scheduled backup by virtual cluster specified as maintaining this metric"
pkg/backup/schedule_exec_test.go
Outdated
RpoTenantMetric: metric.NewExportedGaugeVec(metric.Metadata{}, []string{"tenant_id"}), | ||
}, | ||
} | ||
env := scheduledjobs.ProdJobSchedulerEnv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like our existing tests use a test environment instead of the production scheduler environment. I'm not familiar with this part of the codebase though, so @msbutler would have a better idea on how to best do the testing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the env matters here. @edwardguo-crl this is an awesome test!
eb3a930
to
081f236
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edwardguo-crl awesome job with the test! this patch is nearly there.
pkg/backup/schedule_exec.go
Outdated
@@ -574,10 +581,16 @@ func init() { | |||
ExecutorPTSMetrics: &pm, | |||
RpoMetric: metric.NewGauge(metric.Metadata{ | |||
Name: "schedules.BACKUP.last-completed-time", | |||
Help: "The unix timestamp of the most recently completed backup by a schedule specified as maintaining this metric", | |||
Help: "The unix timestamp of the most recently completed tenant backup by a host schedule specified as maintaining this metric", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would revert this change. The metric is bumped by a schedule within the cluster, so "host" is misleading. We also typically refer to backups taken within a tenant as just a "backup". A "tenant backup" typically refers to a backup of a tenant.
pkg/backup/schedule_exec.go
Outdated
Measurement: "Jobs", | ||
Unit: metric.Unit_TIMESTAMP_SEC, | ||
}), | ||
RpoTenantMetric: metric.NewExportedGaugeVec(metric.Metadata{ | ||
Name: "schedules.BACKUP.tenant-last-completed-time", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we're trying to scrub all mentions of "tenant" in our codebase and docs. Use virtual cluster instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about this metric name: "schedules.BACKUP.last-completed-time-by-virtual_cluster"
pkg/backup/schedule_exec.go
Outdated
@@ -578,6 +588,12 @@ func init() { | |||
Measurement: "Jobs", | |||
Unit: metric.Unit_TIMESTAMP_SEC, | |||
}), | |||
RpoTenantMetric: metric.NewExportedGaugeVec(metric.Metadata{ | |||
Name: "schedules.BACKUP.tenant-last-completed-time", | |||
Help: "The Unix timestamp of the most recently completed backup by host schedule for the tenant", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like: "The unix timestamp of the most recently completed host scheduled backup by virtual cluster specified as maintaining this metric"
return tenantIDs | ||
} | ||
|
||
func verifyRPOTenantMetricLabels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also pass in and verify the expected end time?
pkg/backup/schedule_exec_test.go
Outdated
RpoTenantMetric: metric.NewExportedGaugeVec(metric.Metadata{}, []string{"tenant_id"}), | ||
}, | ||
} | ||
env := scheduledjobs.ProdJobSchedulerEnv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the env matters here. @edwardguo-crl this is an awesome test!
err := executor.backupSucceeded(ctx, nil, schedule, details, env) | ||
require.NoError(t, err) | ||
|
||
expectedTenantIDs := []string{"system", "2"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: also verify that executor.metrics.RpoMetric.Value()
was also updated in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is a Value()
method in RpoMetric. I verified the Gauge metric value in a separate function.
081f236
to
7485eaf
Compare
bd86e18
to
e30ec71
Compare
Export Cloud tenant backup schedule completion metric with tenant id tag. Issue: cockroachdb#141167 Epic: None Release note: None
e30ec71
to
af4d934
Compare
Export Cloud tenant backup schedule completion metric with tenant id tag.
Issue: #141167
Epic: None
Release note: None