
GCS Blob Prefix and Blob Delimiter Configuration #6531

Open
Jayaram7 opened this issue Feb 7, 2025 · 11 comments

@Jayaram7

Jayaram7 commented Feb 7, 2025

Hi All,

I have created a ScaledJob spec with a GCS trigger. I have a GCS bucket in us-central1 that contains a "test" folder and a "test/processed" folder; the processed folder is a sub-folder of test. I want the ScaledJob to trigger when a new file arrives in the "test/" folder, but not when one arrives in "test/processed/". I tried using Blob Prefix and Blob Delimiter to restrict the count of objects, but it is not working as expected. Please recommend the right config YAML.
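For reference, here is a trimmed-down sketch of the kind of ScaledJob I am aiming for (the image, resource names, and the targetObjectCount value are placeholders, and I am assuming GKE Workload Identity on the keda-operator service account for authentication):

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: gcp-workload-identity
spec:
  podIdentity:
    provider: gcp            # GKE Workload Identity
---
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: gcs-file-processor
spec:
  pollingInterval: 30
  maxReplicaCount: 5
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: <processor-image>   # placeholder
        restartPolicy: Never
  triggers:
    - type: gcp-storage
      authenticationRef:
        name: gcp-workload-identity
      metadata:
        bucketName: keda-test-sample
        targetObjectCount: "1"
        blobPrefix: test/
        blobDelimiter: "/"

The intent is that only objects directly under test/ (for example test/file.csv) are counted, and anything under test/processed/ is ignored.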

@SpiritZhou
Contributor

Did you try setting the prefix and the delimiter at the same time? As the documentation describes, both the prefix and the delimiter should be set.

https://cloud.google.com/storage/docs/samples/storage-list-files-with-prefix#storage_list_files_with_prefix-go

// Prefixes and delimiters can be used to emulate directory listings.
// Prefixes can be used to filter objects starting with prefix.
// The delimiter argument can be used to restrict the results to only the
// objects in the given "directory". Without the delimiter, the entire tree
// under the prefix is returned.
//
// For example, given these blobs:
//   /a/1.txt
//   /a/b/2.txt
//
// If you just specify prefix="a/", you'll get back:
//   /a/1.txt
//   /a/b/2.txt
//
// However, if you specify prefix="a/" and delim="/", you'll get back:
//   /a/1.txt

@Jayaram7
Author

Jayaram7 commented Feb 7, 2025

@SpiritZhou, I tried to follow the above link. I set the prefix to "test/" and the delimiter to "/". Even after deleting the object, KEDA is still scaling the job.

@Jayaram7
Author

Is there any other approach we can follow?

@JorTurFer
Member

Even after deleting the object, KEDA is still scaling the job.

What do you mean by this? Is the job ending and KEDA creating a new one, or is the job just not being removed?

@JorTurFer JorTurFer moved this from To Triage to Proposed in Roadmap - KEDA Core Feb 13, 2025
@Jayaram7
Author

Jayaram7 commented Feb 13, 2025

@JorTurFer, when a file is uploaded to the test path in the GCS bucket, the KEDA trigger is activated and creates a new Job every polling interval, as expected. But when I remove the file from the test path in the GCS bucket, KEDA still keeps creating new jobs every polling interval, which is not the expected behavior.

@JorTurFer
Member

Could you share KEDA operator logs?

@Jayaram7
Author

@JorTurFer I have attached both the operator logs and the keda-config.yaml file.

keda-operator-logs.log

keda-config.txt

@JorTurFer
Member

Based on the logs, it looks like there is at least 1 item in the queue, and that's why KEDA is scaling to more jobs. Could you enable debug logging and send us the logs again with debug enabled? https://github.com/kedacore/charts/blob/main/keda/values.yaml#L380
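If you are installing via Helm, the setting lives under logging.operator.level in the chart values. A minimal override sketch (the exact line in values.yaml may differ between chart versions):

logging:
  operator:
    # allowed values include debug, info and error
    level: debug

Then upgrade the release, for example helm upgrade keda kedacore/keda --namespace keda -f values.yaml, and collect the operator logs again.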

@Jayaram7
Author

@JorTurFer, I have attached the operator logs in debug mode. Please note that while I was pulling the debug logs, there was no file in the "test/" path, only the "processed" subfolder under test/, in the GCS bucket keda-test-sample.

keda-operator-log-debug.log

@JorTurFer
Member

Based on the logs, there are 2 items in the queue:

gcp_storage_scaler	Counted 2 items with a limit of 1000

The behaviour of this scaler is covered by e2e tests, so I think this could be an edge case in some scenarios. I'm going to test this on my side. Do you have any way to reproduce it, maybe a small script using gcloud or similar?

@Jayaram7
Author

Hi, below are the steps I followed to reproduce the issue:

  1. Create a GKE cluster - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable_on_cluster
  2. Create GKE node pools - https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#option_1_node_pool_creation_with_recommended
  3. Create a GCS bucket - https://cloud.google.com/storage/docs/creating-buckets#command-line
  4. Authenticate to the GKE cluster - gcloud container clusters get-credentials CLUSTER_NAME --location=LOCATION
  5. Create test/ and test/processed folders inside the bucket (via the GCP Console)
  6. Install KEDA - https://keda.sh/docs/2.16/deploy/#helm
  7. Once KEDA is installed, execute the commands below:
  • gcloud auth login
  • gcloud iam service-accounts create helloworld --project=<project_id>
  • kubectl annotate serviceaccount keda-operator --namespace keda iam.gke.io/gcp-service-account=helloworld@<project_id>.iam.gserviceaccount.com
  • gcloud projects add-iam-policy-binding <project_id> --member serviceAccount:helloworld@<project_id>.iam.gserviceaccount.com --role=roles/viewer
  • gcloud projects add-iam-policy-binding <project_id> --member "serviceAccount:helloworld@<project_id>.iam.gserviceaccount.com" --role "roles/iam.serviceAccountTokenCreator" --project=<project_id>
  • gcloud projects add-iam-policy-binding <project_id> --member "serviceAccount:helloworld@<project_id>.iam.gserviceaccount.com" --role "roles/storage.legacyBucketReader" --project=<project_id>
  • gcloud iam service-accounts add-iam-policy-binding helloworld@<project_id>.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:<project_id>.svc.id.goog[keda/keda-operator]" --project=<project_id>
  8. Execute kubectl apply -f keda-config.yaml (i.e. https://github.com/user-attachments/files/18809099/keda-config.txt)
