Inconsistent busy-waiting when using `kubernetes_manifest.wait` block #2614

alyssaruth · 2024-11-04T17:09:37Z

Affected Resource(s)

kubernetes_manifest

Terraform Configuration Files

We are using `kubernetes_manifest` to provision an `ImageCache` custom resource (defined by the kube-fledged CRD). The details of what this is aren't particularly relevant, except that the wait condition is expected to take on the order of minutes to complete:

```hcl
resource "kubernetes_manifest" "image_cache" {
  manifest = {
    "apiVersion" = "kubefledged.io/v1alpha2"
    "kind"       = "ImageCache"
    "metadata" = {
      "name"      = "${var.name}-cache"
      "namespace" = var.namespace
      "labels" = {
        "app"         = "kubefledged"
        "kubefledged" = "imagecache"
      }
    }
    "spec" = {
      "cacheSpec" = [
        {
          "images" = var.images
        }
      ]
    }
  }

  dynamic "wait" {
    for_each = var.wait_for_completion ? [1] : []

    content {
      fields = {
        "status.completionTime" = "^\\S.*$" # Any non-empty value
      }
    }
  }
}
```

Panic Output

We are intermittently seeing this fail with errors communicating with the Kubernetes API, e.g.:

```
│ Get
│ "https://<IP>:16443/apis/kubefledged.io/v1alpha2/namespaces/pkb/imagecaches/pubsub-emulator-cache":
│ dial tcp <IP>:16443: connect: connection refused
```

We don't see these errors for other resource types. My theory is that they are caused by the provider making excessive calls to the API while it waits.

Expected Behavior

Waiting should retry with a backoff pattern, the same way that, e.g., `kubernetes_deployment_v1` does.

Actual Behavior

The condition(s) are evaluated repeatedly, once per second, until the resource times out or a hard failure condition is reached.

Important Factoids

References

See the TODO in the code here: https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/manifest/provider/waiter.go#L227

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment


alyssaruth added the bug label Nov 4, 2024

github-actions bot assigned jrhouston Nov 4, 2024

alyssaruth added a commit to alyssaruth/terraform-provider-kubernetes that referenced this issue Nov 4, 2024

use RetryContext to eliminate busy-waiting

2fd6436

also add a couple of tests to cover the other timeout cases. Fixes hashicorp#2614

alyssaruth linked a pull request Nov 4, 2024 that will close this issue: Use RetryContext to eliminate busy-waiting #2615 (open)
