Unable to create k8s service account for Workload Identity Federation on a GKE private cluster #2533

diguida · 2024-06-24T11:09:07Z

Terraform version, Kubernetes provider version and Kubernetes version

Terraform version: 1.8.5
Kubernetes Provider version: 2.31.0
Google Cloud Provider version: 5.34.0
Kubernetes version: 1.28.9-gke.1209000

Terraform configuration

resource "google_container_cluster" "my-cluster" {
  project            = var.GCP_PROJECT_ID
  name               = "my-cluster"
  location           = "europe-west8-a"
  # We can't create a cluster with no node pool defined, but we want to only use
  # separately managed node pools. So we create the smallest possible default
  # node pool and immediately delete it.
  remove_default_node_pool = true
  initial_node_count = 1
  network            = google_compute_network.network.name
  subnetwork         = google_compute_subnetwork.network_subnet.name
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "172.16.0.32/28"
  }
  ip_allocation_policy {
  }
  master_authorized_networks_config {
  }
  workload_identity_config {
    workload_pool = "${var.GCP_PROJECT_ID}.svc.id.goog"
  }
  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "APISERVER",
      "WORKLOADS"
    ]
  }
}

resource "google_container_node_pool" "my-nodes" {
  name       = "my-node-pool"
  location   = "europe-west8-a"
  cluster    = google_container_cluster.my-cluster.name
  node_count = 1

  node_config {
    preemptible  = true
    machine_type = "e2-standard-4"

    service_account = google_service_account.gke-service-account.email
    oauth_scopes = [
      "cloud-platform"
    ]

    shielded_instance_config {
      enable_secure_boot = true
    }

  }

}

module "my-workload-identity" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity"
  name       = "my-identity"
  namespace  = "default"
  project_id = var.GCP_PROJECT_ID
  roles      = [
    "roles/logging.logWriter",
    "roles/cloudsql.client",
    "roles/artifactregistry.reader"
  ]
}

data "google_client_config" "current" {}

provider "kubernetes" {
  host                   = "https://${google_container_cluster.my-cluster.endpoint}"
  token                  = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(google_container_cluster.my-cluster.master_auth.0.cluster_ca_certificate)
}

Question

Apologies if this is a duplicate post.
I am trying to configure Workload Identity Federation on a private GKE cluster using the code snippet above, which follows the documentation and the guidelines in https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/using_gke_with_terraform

The resources are deployed by a pipeline running in a GitLab k8s runner hosted in GCP, but in a different project.

image:
  name: hashicorp/terraform:1.8.5
  entrypoint:
    - "/usr/bin/env"
    - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

before_script:
  - pwd
  - mkdir .gcp
  - echo $GCP_SERVICE_ACCOUNT > .gcp/credentials.json
  - export GOOGLE_APPLICATION_CREDENTIALS=".gcp/credentials.json"
  - rm -rf .terraform
  - terraform --version
  - terraform init

# ...

apply:
  stage: apply
  script:
    - export TF_LOG=DEBUG
    - terraform apply -input=false -auto-approve "planfile"
  dependencies:
    - plan
  only:
    - main
  needs:  
    - plan
  when: manual

after_script:
- rm .gcp/credentials.json

The GKE cluster was created smoothly.
Unfortunately, if I add the workload identity definition, the apply fails with this error:

module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [10s elapsed]
module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [20s elapsed]
module.my-workload-identity.kubernetes_service_account.main[0]: Still creating... [30s elapsed]
2024-06-22T12:27:37.333Z [ERROR] provider.terraform-provider-kubernetes_v2.31.0_x5: Response contains error diagnostic: @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:58 tf_proto_version=5.6 tf_provider_addr=registry.terraform.io/hashicorp/kubernetes tf_rpc=ApplyResourceChange @module=sdk.proto diagnostic_detail="" diagnostic_severity=ERROR diagnostic_summary="Post \"https://172.16.0.34/api/v1/namespaces/default/serviceaccounts\": context deadline exceeded" tf_req_id=9312e024-3cff-3a97-8799-9a54659b9c57 tf_resource_type=kubernetes_service_account timestamp=2024-06-22T12:27:37.333Z
2024-06-22T12:27:37.335Z [DEBUG] states/remote: state read serial is: 94; serial is: 94
2024-06-22T12:27:37.335Z [DEBUG] states/remote: state read lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7; lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7
2024-06-22T12:27:37.583Z [ERROR] vertex "module.my-workload-identity.kubernetes_service_account.main[0]" error: Post "https://172.16.0.34/api/v1/namespaces/default/serviceaccounts": context deadline exceeded
2024-06-22T12:27:37.584Z [DEBUG] states/remote: state read serial is: 95; serial is: 95
2024-06-22T12:27:37.584Z [DEBUG] states/remote: state read lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7; lineage is: 1ee3af85-9da7-164a-413f-1b485a9fbda7
╷
│ Error: Post "https://172.16.0.34/api/v1/namespaces/default/serviceaccounts": context deadline exceeded
│ 
│   with module.my-workload-identity.kubernetes_service_account.main[0],
│   on .terraform/modules/my-workload-identity/modules/workload-identity/main.tf line 51, in resource "kubernetes_service_account" "main":
│   51: resource "kubernetes_service_account" "main" {
│ 
╵
2024-06-22T12:27:37.787Z [DEBUG] provider.terraform-provider-google_v5.34.0_x5: 2024/06/22 12:27:37 [DEBUG] [transport] [server-transport 0xc0003fdc80] Closing: Server.Stop called 
2024-06-22T12:27:37.788Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-06-22T12:27:37.794Z [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"

The cluster endpoint looks correct.

In the k8s API server logs, I cannot see any request coming from the terraform process.

Can you please help me understand the issue, or redirect me to some other info channel? I have been stuck on it for a few days.

Thanks in advance.


sheneska · 2024-06-26T15:26:49Z

Hi @diguida, thanks for opening this issue. Could you try to apply this separately please?

diguida · 2024-06-26T18:25:46Z

Hi @sheneska, thanks for looking into this.
It is not clear to me what you are asking me with "try to apply this separately".

Should I run the apply command in a Compute Engine instance or on my laptop instead of the runner?

Thanks.
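
One possible reading of "apply this separately" is to move the Kubernetes-side resources into their own root module that is applied only after the cluster exists, with the provider configured from a data source rather than from the cluster resource. The following is a minimal sketch under that assumption; the data source name `my_cluster` and the split into a second configuration are illustrative and not something confirmed in this thread. It would not by itself change whether the runner can reach the private endpoint at 172.16.0.34.

```hcl
# Second, separate root module: applied only after the cluster has been created.
variable "GCP_PROJECT_ID" {
  type = string
}

data "google_client_config" "current" {}

# Look up the already existing cluster instead of referencing the resource.
# "my_cluster" is a hypothetical name chosen for this sketch.
data "google_container_cluster" "my_cluster" {
  project  = var.GCP_PROJECT_ID
  name     = "my-cluster"
  location = "europe-west8-a"
}

provider "kubernetes" {
  host                   = "https://${data.google_container_cluster.my_cluster.endpoint}"
  token                  = data.google_client_config.current.access_token
  cluster_ca_certificate = base64decode(data.google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
}

# Same module call as in the question, now isolated from cluster creation.
module "my-workload-identity" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity"
  name       = "my-identity"
  namespace  = "default"
  project_id = var.GCP_PROJECT_ID
  roles = [
    "roles/logging.logWriter",
    "roles/cloudsql.client",
    "roles/artifactregistry.reader"
  ]
}
```

Splitting the configuration this way mainly ensures that the provider is configured with a known endpoint at plan time. If the timeout persists, the network path from the runner to the control plane is the more likely cause.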

bwburch1023 · 2024-08-16T20:35:37Z

@diguida I just ran into the exact same issue. I was able to get it to work by adding 0.0.0.0/0 to the master authorized networks as a test, though I wouldn't recommend doing this. You can check the k8s API server log to see which IP is being used in the request. I'm trying to get the CIDR block from HashiCorp, since we are using Terraform Cloud.
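
A narrower variant of that test is to authorize only the range the Terraform runner connects from, instead of 0.0.0.0/0. The sketch below assumes that range is known and can route to the control plane. The variable `RUNNER_CIDR` and the display name are hypothetical and do not appear in the original configuration. Because the cluster in the question sets `enable_private_endpoint = true`, the runner also needs a network path to the control plane's internal address, which an authorized-networks entry alone may not provide when the runner lives in a different project.

```hcl
# Hypothetical input: the internal range the Terraform runner connects from.
variable "RUNNER_CIDR" {
  type        = string
  description = "CIDR block allowed to reach the GKE control plane"
}

resource "google_container_cluster" "my-cluster" {
  # ... all other arguments unchanged from the configuration in the question ...

  # Replaces the empty master_authorized_networks_config block.
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = var.RUNNER_CIDR
      display_name = "terraform-runner" # hypothetical label
    }
  }
}
```

The value could then be supplied per environment, for example through a `TF_VAR_RUNNER_CIDR` variable in the pipeline.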

diguida added the question label Jun 24, 2024

github-actions bot assigned sheneska Jun 24, 2024

sheneska added the waiting-response label Jun 26, 2024

github-actions bot removed the waiting-response label Jun 26, 2024
