[Thanos Store] Guidance on best way to scale sharded and autoscaled Thanos Store with persistence enabled #7797

Open
kaiohenricunha opened this issue Oct 5, 2024 · 2 comments

@kaiohenricunha

Thanos, Prometheus and Golang version used:

Thanos version: 0.34.1
Prometheus version: 2.45.0 (I guess)
Golang version: 1.21.1

Object Storage Provider:

AWS S3

What happened:

I am working on a Thanos setup that includes sharded and autoscaled Thanos Store StatefulSets with persistent volumes for filesystem storage. My goal is to achieve the best possible performance and availability.

The current architecture aims to let Thanos Store pods be scheduled on any node across the cluster without worrying about whether the required PersistentVolumeClaim (PVC) is available on a specific node or AZ. This is crucial, as I want to maintain flexibility and avoid any node or AZ affinity constraints for the PVCs.

I have already enabled Thanos Query Frontend (in-memory) and Store (ElastiCache Redis) caching, but my setup still struggles with sudden query spikes.
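The Query Frontend response cache is the in-memory type, passed via --query-range.response-cache-config; a minimal sketch with illustrative sizes rather than my exact values:

type: IN-MEMORY
config:
  max_size: 1GB        # upper bound on cached response data (illustrative)
  max_size_items: 0    # 0 means no item-count limit
  validity: 6h         # how long cached responses are considered fresh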

What you expected to happen:

I am looking for guidance on the recommended Thanos Store configuration for such a setup that provides:

  1. High Performance: Optimized for fast responses.
  2. High Availability: Allowing the Thanos Store StatefulSet pods to be scheduled on any node or AZ without PVC availability issues.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy a Thanos Store StatefulSet that utilizes PVCs for filesystem storage: https://github.com/bitnami/charts/blob/thanos/13.4.1/bitnami/thanos/values.yaml#L2939
  2. Set up autoscaling: https://github.com/bitnami/charts/blob/thanos/13.4.1/bitnami/thanos/values.yaml#L2987
  3. Set up sharding: https://github.com/bitnami/charts/blob/thanos/13.4.1/bitnami/thanos/values.yaml#L3217
  4. Generate some load on the Thanos Store by querying large long-term metrics.
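For reference, the relevant Bitnami values for steps 1-3 look roughly like this (a minimal sketch; my full configuration is in a follow-up comment below):

    storegateway:
      persistence:
        enabled: true          # step 1: PVC-backed filesystem storage
        size: 30Gi
      autoscaling:
        enabled: true          # step 2: autoscaling (HPA) for the Store StatefulSets
        minReplicas: 1
        maxReplicas: 3
        targetCPU: 200
      sharded:
        enabled: true          # step 3: time-based sharding, one StatefulSet per partition
        timePartitioning:
          - min: ""
            max: -4w
          - min: -4w
            max: ""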

Anything else we need to know:

Environment:

  • Object Storage: AWS S3
  • Kubernetes Cluster: EKS 1.29
  • Storage Solution: tried EBS (scheduling issues) and EFS (performance issues)
  • Deployment: Bitnami Helm chart version 13.4.1

Recently, I tried Amazon EFS for its high availability, since it allows access from any node/AZ, but Thanos pods with persistence enabled took far too long to start up. I went back to EBS gp3 after noticing Thanos Store had been "fetching metadata" at startup for more than 30 minutes, all while consuming little CPU and memory. In the EFS console, throughput utilization was close to 100%.

I would greatly appreciate guidance on:

  1. Best practices for scaling Thanos Store besides S3.
  2. Any recommended StorageClass configurations that can meet the requirements of both high availability and high performance.

Thank you very much for your help!

dosubot bot commented Oct 5, 2024

Hello @kaiohenricunha! How can I help you today?

I couldn't find specific guidance on configuring a custom StorageClass for high performance and high availability in a Thanos Store setup within the repository. However, here are some general recommendations that might help:

  1. StorageClass Configuration: Use a StorageClass with volumeBindingMode: WaitForFirstConsumer. This setting delays PVC provisioning until a pod is scheduled, which helps with dynamic scheduling across nodes and AZs (see the sketch below).

  2. High-Performance Storage Options: Consider using Amazon EBS with provisioned IOPS (io1 or io2) for better performance. These types of volumes offer high throughput and low latency, which can be beneficial for your use case.

  3. Alternative Storage Solutions: Since you experienced issues with EFS, you might want to explore other cloud-native storage solutions like Amazon FSx for Lustre, which is designed for high-performance workloads.

  4. Caching: Ensure that your caching layers (Thanos Query Frontend and Store) are optimized. Adjust cache sizes and settings to handle your expected query load effectively.

  5. Autoscaling and Sharding: Make sure your autoscaling and sharding configurations are optimized. This includes setting appropriate resource requests and limits to handle load spikes and distributing the load evenly across Thanos Store instances.

These adjustments should help you achieve the desired high performance and high availability for your Thanos Store setup.
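As a sketch of points 1 and 2 combined, an EBS CSI StorageClass with delayed binding and provisioned IOPS might look like the following (the name and numbers are placeholders to adapt to your workload):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: thanos-store-io2            # placeholder name
    provisioner: ebs.csi.aws.com        # AWS EBS CSI driver
    parameters:
      type: io2
      iops: "6000"                      # provisioned IOPS; tune to your query load
    volumeBindingMode: WaitForFirstConsumer   # bind/provision only after the pod is scheduled
    allowVolumeExpansion: true

Keep in mind that EBS volumes are zonal: WaitForFirstConsumer ensures the volume is created in the AZ where the pod lands, but an existing PVC still cannot follow a pod to another AZ.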

To continue talking to Dosu, mention @dosu.

@kaiohenricunha kaiohenricunha changed the title [Thanos Store] Guidance on Best StorageClass for Sharded and Autoscaled StatefulSets with Persistent Volumes [Thanos Store] Guidance on best way to scale sharded and autoscaled Thanos Store with persistence enabled Oct 7, 2024
@kaiohenricunha (Author)

Turns out I was getting really slow responses from Store because of this issue: a bug in Store caching. Changing the bucket cache from Redis to in-memory resolved it temporarily, but guidance on how to optimize Store is still welcome.

I'm now seeing latency of around 15 seconds for the SLI dashboards.

Used this Store configuration:

    storegateway:
      enabled: "${THANOS_STOREGATEWAY_ENABLED:=true}"
      logLevel: "${THANOS_STOREGATEWAY_LOG_LEVEL:=info}"
      extraFlags:
        - --block-meta-fetch-concurrency=${THANOS_STOREGATEWAY_BLOCK_META_FETCH_CONCURRENCY:=100}         # Higher concurrency for metadata fetching
        - --block-sync-concurrency=${THANOS_STOREGATEWAY_BLOCK_SYNC_CONCURRENCY:=100}                      # Increase sync concurrency for better parallelism
        - ${THANOS_STOREGATEWAY_CACHE_INDEX_HEADER:=--cache-index-header}                                 # Cache index headers
        - --store.index-header-lazy-download-strategy=${THANOS_STOREGATEWAY_INDEX_HEADER_LAZY_DOWNLOAD_STRATEGY:=eager}  # Eagerly download index headers
        - --store.grpc.series-max-concurrency=${THANOS_STOREGATEWAY_GRPC_SERIES_MAX_CONCURRENCY:=100}      # Increase gRPC concurrency for better parallel queries
        - --sync-block-duration=${THANOS_STOREGATEWAY_SYNC_BLOCK_DURATION:=5m}                            # Sync blocks every 5 minutes
      autoscaling:
        enabled: "${THANOS_STOREGATEWAY_AUTOSCALING_ENABLED:=true}"
        minReplicas: "${THANOS_STOREGATEWAY_MIN_REPLICAS:=1}"
        maxReplicas: "${THANOS_STOREGATEWAY_MAX_REPLICAS:=3}"
        targetCPU: "${THANOS_STOREGATEWAY_TARGET_CPU:=200}"
        targetMemory: "${THANOS_STOREGATEWAY_TARGET_MEMORY:=100}"
      resources:
        requests:
          cpu: "${THANOS_STOREGATEWAY_CPU_REQUESTS:=2}"
          memory: "${THANOS_STOREGATEWAY_MEMORY_REQUESTS:=8Gi}"
        limits:
          memory: "${THANOS_STOREGATEWAY_MEMORY_LIMITS:=10Gi}"
      persistence:
        enabled: "${THANOS_STOREGATEWAY_PERSISTENCE_ENABLED:=true}"
        size: "${THANOS_STOREGATEWAY_PERSISTENCE_SIZE:=30Gi}"
        accessModes:
          - "${THANOS_STOREGATEWAY_PERSISTENCE_ACCESS_MODE:=ReadWriteOnce}"
      sharded:
        enabled: "${THANOS_STOREGATEWAY_SHARDED_ENABLED:=true}"
        # Partitioning by time: https://thanos.io/tip/components/store.md/#time-based-partitioning
        # Explanation about sharding: https://github.com/bitnami/charts/issues/4417
        timePartitioning: # each partition becomes a unique statefulset
          # One store for data older than 8 weeks
          - min: ""
            max: -8w
          # One store for data between 4 and 8 weeks
          - min: -8w
            max: -4w
          # One store for data between 3 and 4 weeks
          - min: -4w
            max: -3w
          # One store for data between 2 and 3 weeks
          - min: -3w
            max: -2w
          # One store for data between 1 and 2 weeks
          - min: -2w
            max: -1w
          # One store for data newer than 1 week
          - min: -1w
            max: ""
      # Service account disabled, created by the IRSA operator
      serviceAccount:
        create: false
        name: thanos
      service:
        additionalHeadless: true # https://github.com/bitnami/charts/blob/main/bitnami/thanos/templates/query/deployment.yaml#L137-L141

Along with this caching configuration:

indexCacheConfig: |
  type: REDIS
  config:
    addr: XXX
    db: 0
    password: XXX
    dial_timeout: 5s
    read_timeout: 3s
    write_timeout: 3s
    max_get_multi_concurrency: 50
    get_multi_batch_size: 50
    max_set_multi_concurrency: 50
    set_multi_batch_size: 50
    cache_size: 2GB
    max_async_buffer_size: 10000
    max_async_concurrency: 20
    tls_enabled: true
    tls_config:
      insecure_skip_verify: true
      server_name: XXX
    set_async_circuit_breaker_config:
      enabled: true
      half_open_max_requests: 10
      open_duration: 5s
      min_requests: 50
      consecutive_failures: 5
      failure_percent: 0.05
    enabled_items: [""]
    ttl: 24h0m0s

bucketCacheConfig: |
  type: IN-MEMORY
  config:
    max_size: 2GB
    validity: 15m

bucketCacheConfig used to point to the same Redis instance, but on db 1.
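For reference, the previous Redis-backed bucket cache looked roughly like this (same placeholders as above; detailed tuning omitted):

bucketCacheConfig: |
  type: REDIS
  config:
    addr: XXX          # same ElastiCache endpoint as the index cache
    db: 1              # separate logical DB from the index cache (db 0)
    password: XXX
    tls_enabled: true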
