Inconsistent busy-waiting when using kubernetes_manifest.wait block #2614

Open
alyssaruth opened this issue Nov 4, 2024 · 0 comments · May be fixed by #2615
alyssaruth commented Nov 4, 2024

Affected Resource(s)

  • kubernetes_manifest

Terraform Configuration Files

We are using kubernetes_manifest to provision an ImageCache custom resource. The details of the resource aren't particularly relevant, except that the wait condition is expected to take on the order of minutes to complete:

resource "kubernetes_manifest" "image_cache" {
  manifest = {
    "apiVersion" = "kubefledged.io/v1alpha2"
    "kind"       = "ImageCache"
    "metadata" = {
      "name"      = "${var.name}-cache"
      "namespace" = var.namespace
      "labels"    = {
        "app": "kubefledged"
        "kubefledged": "imagecache"
      }
    }
    "spec" = {
      "cacheSpec" = [
        {
          "images" : var.images
        }
      ]
    }
  }

  dynamic "wait" {
    for_each = var.wait_for_completion ? [1] : []
    
    content {
      fields = {
        "status.completionTime" = "^\\S.*$" # Any non-empty value
      }
    }
  }
}

Panic Output

We are intermittently seeing this fail with errors communicating with the Kubernetes API, for example:

      │ Get 
      │ "https://<IP>:16443/apis/kubefledged.io/v1alpha2/namespaces/pkb/imagecaches/pubsub-emulator-cache":
      │ dial tcp <IP>:16443: connect: connection refused

We don't see these errors for other resource types. My theory is that they are caused by excessive calls to the API.

Expected Behavior

Waiting should poll with a backoff between attempts, the same way that kubernetes_deployment_v1 does, for example.

Actual Behavior

The wait conditions are re-evaluated every second, with no backoff, until the wait times out or a hard failure condition is reached.

Important Factoids

References

See TODO in the code here: https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/manifest/provider/waiter.go#L227

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@alyssaruth alyssaruth added the bug label Nov 4, 2024
alyssaruth added a commit to alyssaruth/terraform-provider-kubernetes that referenced this issue Nov 4, 2024
also add a couple of tests to cover the other timeout cases. Fixes hashicorp#2614
@alyssaruth alyssaruth linked a pull request Nov 4, 2024 that will close this issue