fix: Remove job deletion for jobs with TTLSecondsAfterFinished set #2375

Open: tom1299 wants to merge 4 commits into main from vulnerability-scanjob-is-immediately-deleted

Conversation

Contributor

@tom1299 tom1299 commented Jan 14, 2025

Description

Currently the trivy-operator deletes scan jobs as soon as they complete, without honouring the job's ttlSecondsAfterFinished and therefore also ignoring the Helm configuration parameter operator.scanJobTTL.
With the change introduced in this PR, the operator deletes a job only if ttlSecondsAfterFinished is not set on it.

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@tom1299 tom1299 changed the title from "Remove job deletion for jobs with TTLSecondsAfterFinished set" to "fix: Remove job deletion for jobs with TTLSecondsAfterFinished set" Jan 14, 2025
@github-actions github-actions bot added the bug label Jan 14, 2025
@@ -357,6 +357,10 @@ func (r *ScanJobController) completedContainers(ctx context.Context, scanJob *ba
}

func (r *ScanJobController) deleteJob(ctx context.Context, job *batchv1.Job) error {
	if job.Spec.TTLSecondsAfterFinished != nil && *job.Spec.TTLSecondsAfterFinished != 0 {
Member

I read through the docs but it wasn't clear to me why we should check for both !=nil and !=0. Could you explain that?

Contributor Author

Now that you mention it, the != 0 check seems superfluous to me. I will remove it from the if statement.
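
For illustration, here is a minimal sketch of the nil-only check. In Kubernetes, a ttlSecondsAfterFinished of 0 means the Job becomes eligible for automatic deletion immediately after it finishes, while an unset field means the TTL mechanism never cleans it up. So a nil check alone should be enough to decide whether the operator has to delete the job itself. The jobSpec type and the operatorShouldDelete and int32Ptr helpers are hypothetical stand-ins, not code from this PR.

```go
package main

import "fmt"

// jobSpec is a hypothetical stand-in for the one field of batchv1.JobSpec
// that matters here. It is not the real Kubernetes type.
type jobSpec struct {
	TTLSecondsAfterFinished *int32
}

// operatorShouldDelete reports whether the operator itself has to delete a
// finished job. Any explicit TTL, including 0, leaves cleanup to Kubernetes.
func operatorShouldDelete(spec jobSpec) bool {
	return spec.TTLSecondsAfterFinished == nil
}

// int32Ptr is a small helper for building pointer values in the example.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	fmt.Println(operatorShouldDelete(jobSpec{}))                                      // true: no TTL set
	fmt.Println(operatorShouldDelete(jobSpec{TTLSecondsAfterFinished: int32Ptr(0)}))  // false: TTL of 0
	fmt.Println(operatorShouldDelete(jobSpec{TTLSecondsAfterFinished: int32Ptr(60)})) // false: TTL of 60s
}
```

With the additional != 0 condition, a job with an explicit TTL of 0 would still be deleted by the operator, which has the same visible effect but duplicates what the TTL mechanism already does.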

Member

Could you update your PR?

Contributor Author

@simar7 I'm sorry but somehow I totally forgot to do the update. I made the change and squashed the commits so that the commit history looks better.

@tom1299 tom1299 force-pushed the vulnerability-scanjob-is-immediately-deleted branch from 4a7e50a to 92f70f0 Compare January 21, 2025 07:27
Member

simar7 commented Jan 24, 2025

@afdesk can you take another look?

Contributor

@afdesk afdesk left a comment

looks good!

@tom1299 thanks for your contribution

Contributor

afdesk commented Jan 28, 2025

@simar7 will merge it, right?

@simar7 simar7 self-requested a review January 29, 2025 01:40
Member

simar7 commented Jan 29, 2025

@tom1299 is it possible to write a test for this case? While it is a trivial change, it is an important one. Maybe we can use this test setup?

DescribeTable("On ttl reconcile loop",
	func(report client.Object, reportFile string, cm client.Object, cmFile string) {
		Expect(loadResource(report, path.Join(testdataResourceDir, reportFile))).Should(Succeed())
		if report.GetNamespace() != kubeSystemNamespace {
			report.SetNamespace(WorkloadNamespace)
		}
		if cm != nil {
			Expect(loadResource(cm, path.Join(testdataResourceDir, cmFile))).Should(Succeed())
			if cm.GetNamespace() != kubeSystemNamespace {
				cm.SetNamespace(WorkloadNamespace)
			}
			Expect(k8sClient.Create(ctx, cm)).Should(Succeed())
		}
		Expect(k8sClient.Create(ctx, report)).Should(Succeed())
		caLookupKey := client.ObjectKeyFromObject(report)
		createdVulnerabilityReport := &v1alpha1.VulnerabilityReport{}
		time.Sleep(2 * time.Second)
		// We'll need to retry getting this newly created Job, given that creation may not immediately happen.
		Eventually(func() error {
			return k8sClient.Get(ctx, caLookupKey, createdVulnerabilityReport)
		}, timeout, interval).ShouldNot(Succeed())
	},
	Entry("Should delete vulnerability report", &v1alpha1.VulnerabilityReport{}, "vulnerability-ttl.yaml", nil, ""),
	Entry("Should delete config audit report", &v1alpha1.ConfigAuditReport{}, "config-audit-ttl-historical.yaml", nil, ""),
	Entry("Should delete config audit report", &v1alpha1.ConfigAuditReport{}, "config-audit-ttl.yaml", &corev1.ConfigMap{}, "policy.yaml"),
)
WDYT?

Contributor Author

tom1299 commented Jan 29, 2025

@simar7 Adding a test case like above is a very good idea. I will have a deeper look into the env tests and try to add a test case for the ttl jobs.

Contributor Author

tom1299 commented Jan 31, 2025

@simar7 I had a look at the env tests. The main obstacle to adding a test case there is that the ScanJobController cannot easily be added to the env test setup here. Since the method under test lives in the ScanJobController, it would never be executed.
Even if it were possible to add the controller, I think the JobHasAnyCondition predicate would prevent it from processing the job, since the Jobs created in the env tests do not seem to have a status.
Here is the predicate code:

var JobHasAnyCondition = predicate.NewPredicateFuncs(func(obj client.Object) bool {
	if job, ok := obj.(*batchv1.Job); ok {
		return len(job.Status.Conditions) > 0
	}
	return false
})

And it is registered with ScanJobController here:

func (r *ScanJobController) SetupWithManager(mgr ctrl.Manager) error {
	var predicates []predicate.Predicate
	if !r.ConfigData.VulnerabilityScanJobsInSameNamespace() {
		predicates = append(predicates, InNamespace(r.Config.Namespace))
	}
	predicates = append(predicates, ManagedByTrivyOperator, IsVulnerabilityReportScan, JobHasAnyCondition)
	return ctrl.NewControllerManagedBy(mgr).
		For(&batchv1.Job{}, builder.WithPredicates(predicates...)).
		Complete(r.reconcileJobs())
}

By the way, there seems to be a redundant implementation in the ScanJobController itself here:

if len(job.Status.Conditions) == 0 {
	log.V(1).Info("Ignoring Job without conditions")
	return ctrl.Result{}, nil
}

This means that even if the predicate were removed, the ScanJobController would still not process the job. To me, this might also be an issue that needs to be fixed.
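
To make the redundancy concrete, here is a small self-contained model of the two layers. The job type and the functions predicateAllows, reconcile and handle are hypothetical simplifications that do not use controller-runtime; they only mirror the two length checks quoted above.

```go
package main

import "fmt"

// job is a hypothetical stand-in that keeps only the list of condition types.
type job struct {
	Conditions []string
}

// predicateAllows mirrors the JobHasAnyCondition predicate.
func predicateAllows(j job) bool {
	return len(j.Conditions) > 0
}

// reconcile mirrors the guard inside the controller and reports whether the
// job would actually be processed.
func reconcile(j job) bool {
	if len(j.Conditions) == 0 {
		return false // "Ignoring Job without conditions"
	}
	return true
}

// handle passes a job through the optional predicate and then the reconciler.
func handle(j job, usePredicate bool) bool {
	if usePredicate && !predicateAllows(j) {
		return false
	}
	return reconcile(j)
}

func main() {
	noStatus := job{}
	finished := job{Conditions: []string{"Complete"}}

	fmt.Println(handle(noStatus, true))  // false: filtered by the predicate
	fmt.Println(handle(noStatus, false)) // false: still ignored by the reconciler
	fmt.Println(handle(finished, true))  // true: processed
}
```

In this model, a job without conditions is never processed, with or without the predicate, so a test would have to give the Job a status condition before the controller could act on it.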

So I think it would be rather hard to test the changes in the env tests. Or maybe you could give me another hint on how to add a test? Otherwise I would suggest looking into the integration tests. Maybe we could add a test case here?

Contributor Author

tom1299 commented Feb 2, 2025

As an alternative, I added an integration test here: 932e1a8. The test validates that if a TTL is set for scan jobs, they are not deleted immediately. What do you think about that? Would that be a way to do the testing?

Successfully merging this pull request may close these issues.

Vulnerability ScanJob is immediately deleted
3 participants