
Issue with HAProxy as Kubernetes Ingress Controller consul annotations transparent-proxy-exclude-inbound-ports not working 1042/healthz #21993

Open
Roxyrob opened this issue Dec 10, 2024 · 3 comments


Roxyrob commented Dec 10, 2024

Overview of the Issue

This is my flow:
Browser → [AWS NLB] → [haproxy-ingress service] → [Pod (with connect-inject)]

I set haproxy deployment pods annotations as below:

consul.hashicorp.com/connect-inject: "true"
consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "1024,1042,8080,8443"
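
To confirm the injector actually picked these up, the annotations on the running pod can be listed (namespace and pod name are placeholders):

kubectl -n <ns> get pod <haproxy-pod> -o jsonpath='{.metadata.annotations}' | tr ',' '\n' | grep consul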

The haproxy pods (container "kubernetes-ingress-controller") never become started and ready, and I see these events:

Readiness probe failed: dial tcp x.x.x.x:20000: connect: connection refused
Started container kubernetes-ingress-controller
Startup probe failed: Get "http://x.x.x.x:20500/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Startup probe failed: HTTP probe failed with statuscode: 503
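
As far as I can tell, the TCP readiness check on port 20000 targets the Envoy sidecar's public listener, and the startup probe on 20500 is the original HTTP probe rewritten onto an Envoy exposed-path listener. The rewritten startup probe ports can be printed per container to confirm this (namespace and pod name are placeholders):

kubectl -n <ns> get pod <haproxy-pod> \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.startupProbe.httpGet.port}{"\n"}{end}'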

If I set the annotation consul.hashicorp.com/transparent-proxy-overwrite-probes: "false", I can see the issue on the real haproxy probes:

Readiness probe failed: dial tcp x.x.x.x:20000: connect: connection refused
Startup probe failed: Get "http://x.x.x.x:1042/healthz": read tcp y.y.y.y:55412->x.x.x.x:1042: read: connection reset by peer
Startup probe failed: Get "http://x.x.x.x:1042/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Startup probe failed: Get "http://x.x.x.x:1042/healthz": dial tcp x.x.x.x:1042: connect: connection refused
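
To check whether the exclusion ever reached the pod's iptables rules, the NAT table inside the pod's network namespace can be dumped with an ephemeral debug container (a sketch; the image choice is arbitrary and --profile=netadmin needs a recent kubectl):

kubectl -n <ns> debug -it <haproxy-pod> --image=nicolaka/netshoot \
  --target=kubernetes-ingress-controller --profile=netadmin \
  -- iptables -t nat -L -n

An inbound port excluded from redirection should show up as a RETURN rule ahead of the redirect to the proxy's 20000 listener.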

Pods can start only if I set transparent-proxy to false, but then the haproxy ingress service cannot authenticate (ACLs + Consul intentions) through the transparent proxy and returns "502 Bad Gateway":

consul.hashicorp.com/connect-inject: "true"
consul.hashicorp.com/transparent-proxy: "false"
consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "1024,1042,6060,8080,8443"
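
For completeness, the intention authorizing the ingress towards the backend is in place; a minimal sketch of it (service names are placeholders for my real ones):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: backend-from-haproxy
spec:
  destination:
    name: backend
  sources:
    - name: haproxy-ingress
      action: allow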

Reproduction Steps

  1. Install haproxy ingress controller 1.42.0 with these annotations on the LoadBalancer service:
annotations:
  'service.beta.kubernetes.io/aws-load-balancer-type': 'external'
  'service.beta.kubernetes.io/aws-load-balancer-scheme': 'internet-facing'
  'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': 'ip'
  'service.beta.kubernetes.io/aws-load-balancer-target-group-attributes': 'preserve_client_ip.enabled=true'
  'service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled': 'true'
  2. Enable Consul Connect using the annotations shown above in the haproxy ingress controller chart values:
     controller:
       podAnnotations:
         consul.hashicorp.com/connect-inject: "true"
         consul.hashicorp.com/transparent-proxy-exclude-inbound-ports: "1024,1042,8080,8443"
  3. See the issues on the pods (kubectl -n ... describe pod/..., kubectl -n ... logs pod/... -c ...); a verification sketch follows this list.
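
Since CNI mode is enabled, my understanding is that the webhook records the computed traffic-redirection settings on the pod in the consul.hashicorp.com/redirect-traffic-config annotation, so the exclude list can also be verified there (placeholders for namespace/pod):

kubectl -n <ns> get pod <haproxy-pod> \
  -o jsonpath='{.metadata.annotations.consul\.hashicorp\.com/redirect-traffic-config}'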

Consul info for both Client and Server

Server info
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease =
        revision = 920cc7c6
        version = 1.20.1
        version_metadata =
consul:
        acl = enabled
        bootstrap = false
        known_datacenters = 5
        leader = false
        leader_addr = x.x.x.x:8300
        server = true
raft:
        applied_index = 435869
        commit_index = 435869
        fsm_pending = 0
        last_contact = 23.476537ms
        last_log_index = 435869
        last_log_term = 51
        last_snapshot_index = 426059
        last_snapshot_term = 51
        latest_configuration = [{Suffrage:Voter ID:0f0e40cd-33ab-ea1e-f3f8-6b5f50f1ddfe Address:x.x.x.x:8300} {Suffrage:Voter ID:d44171e8-e0e2-6abb-95c3-01f2fc99a918 Address:y.y.y.y:8300} {Suffrage:Voter ID:d7d3f7d8-a3b5-12ac-8f09-ba8413757bcb Address:z.z.z.z:8300}]
        latest_configuration_index = 0
        num_peers = 2
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 51
runtime:
        arch = amd64
        cpu_count = 2
        goroutines = 449
        max_procs = 2
        os = linux
        version = go1.22.7
serf_lan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 20
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1027
        members = 3
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = true
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 8448
        members = 15
        query_queue = 0
        query_time = 1

Consul version

helm chart "consul" from repoUrl "https://helm.releases.hashicorp.com"
targetRevision: 1.6.1 (5 Nov, 2024) => consul v1.20.1 (https://github.com/hashicorp/consul/releases/tag/v1.20.1)

Chart configuration

enabled: true
global:
  enabled: true
  datacenter: sbox0milavpc1dc1
  federation:
    enabled: true
    createFederationSecret: true
  gossipEncryption:
    autoGenerate: true
  acls:
    enabled: true
    manageSystemACLs: true
    createReplicationToken: true
  enablePodSecurityPolicies: false
  tls:
    enabled: true
    verify: true
    httpsOnly: true
server:
  enabled: true
  replicas: 3
  storageClass: ebs-csi-gp3-encrypt-retain
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Delete
  resources: |
    requests:
      memory: "200Mi"
      cpu: "100m"
    limits:
      memory: "500Mi"
      cpu: "500m"
  storage: 10Gi
  disruptionBudget:
    enabled: false
ui:
  enabled: true
  service:
    type: ClusterIP
connectInject:
  enabled: true
  default: false
  transparentProxy:
    defaultEnabled: true
    defaultOverwriteProbes: true
  cni:
    enabled: true
    logLevel: info
    cniBinDir: "/opt/cni/bin"
    cniNetDir: "/etc/cni/net.d"
  disruptionBudget:
    enabled: false
meshGateway:
  enabled: true
  replicas: 2
  service:
    type: LoadBalancer
    annotations: |
      'service.beta.kubernetes.io/aws-load-balancer-name': "consul-mgw-sbox0milavpc1dc1-pri"
      'service.beta.kubernetes.io/aws-load-balancer-type': "external"
      'service.beta.kubernetes.io/aws-load-balancer-scheme': "internal"
      'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': "ip"
      'service.beta.kubernetes.io/aws-load-balancer-backend-protocol': "tcp"
      'service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled': "true"

Operating system and Environment details

Kubernetes on AWS EKS

Log Fragments


Roxyrob commented Jan 17, 2025

Any news on this issue?

apenadiazApk commented

Hi, we have the exact same issue.


eMedves commented Jan 17, 2025

I experienced the same issue; after a week of analysis and attempts I was unable to find a solution.
Can anyone give any insight?
