
Redis deployment remains in "Provisioning" even after deployed #695

Open
kyschouv opened this issue Apr 19, 2019 · 11 comments


@kyschouv

My redis cache doesn't seem to want to deploy through osba 1.5.0 anymore. It seems to get stuck in the "Provisioning" state, even though the redis cache has been created and has transitioned to "running" in Azure. There aren't many logs, but I did see this in the osba logs:

time="2019-04-18T01:52:58Z" level=error msg="error executing job; not submitting any follow-up tasks" error="error executing provisioning step \"deployARMTemplate\" for instance \"66014c34-6176-11e9-a1ac-ea831bbae743\": error executing provisioning step: error deploying ARM template: error deploying \"d8dece67-d004-4883-a1d3-2194dcec8777\" in resource group \"my-test-rg\": error while waiting for deployment to complete: Future#WaitForCompletion: context has been cancelled: StatusCode=200 -- Original Error: context deadline exceeded" job=executeProvisioningStep taskID=22ebad5d-54e0-4a46-9522-7c813cd643a9

I've been trying again, but have waited over an hour for a couple of deployments to transition out of "Provisioning" (even though their Azure resources have transitioned to running); they aren't changing and there are no new logs. I tried deploying another resource type, and it works just fine.

@zhongyi-zhang
Contributor

The Azure Go SDK has a timeout when waiting on an ARM template deployment. The resource can become ready at the last moment, while the ARM deployment has already hit the timeout. It is a known issue: a new Redis Cache instance takes a long time to become ready, and sometimes even longer than the timeout.
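
In other words, the failure mode looks roughly like this (a minimal sketch, not OSBA's actual code; the helper name is made up, and the timeouts are shrunk from the real 45-minute window so the race is visible immediately):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForCompletion stands in for the SDK's Future#WaitForCompletion
// polling loop (hypothetical helper, for illustration only).
func waitForCompletion(ctx context.Context, done <-chan struct{}) error {
	select {
	case <-done:
		return nil // ARM reported "Succeeded" within the window
	case <-ctx.Done():
		return ctx.Err() // deadline hit first, even though Azure may still succeed
	}
}

func main() {
	// The real code polls under a ~45-minute deadline; 100ms here for demo.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	done := make(chan struct{})
	go func() {
		time.Sleep(time.Second) // the Redis cache becomes ready *after* the deadline
		close(done)
	}()

	if err := waitForCompletion(ctx, done); err != nil {
		// Prints "context deadline exceeded", matching the OSBA log above.
		fmt.Println("error while waiting for deployment to complete:", err)
	}
}
```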

@kyschouv
Author

Is there a workaround for this? It used to work (late last year).

@zhongyi-zhang
Contributor

The lifecycle test for this service passes most of the time, so I suspect this is transient, or perhaps related to the Azure location or tier. Here are the lifecycle test cases, FYI: https://github.com/Azure/open-service-broker-azure/blob/master/tests/lifecycle/rediscache_cases_test.go#L18-L87

@kyschouv
Author

I haven't had it work in westus2 for at least the last couple of days, though before that I hadn't tried it in a few weeks. Interestingly, the second time I tried today, the actual Redis cache resource was allocated in Azure pretty quickly, but osba still didn't pick it up.

@zhongyi-zhang
Contributor

Please also watch the provisioning state of the new Redis instance. You can also take a look at the "Deployments" of the resource group; that's what the Azure Go SDK waits on.
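
If you'd rather check programmatically than in the Portal, something like the following should work (a rough sketch assuming the pre-track-2 Azure SDK for Go; the API-version path, auth setup, and the placeholder subscription and deployment names are illustrative):

```go
package main

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2018-05-01/resources"
	"github.com/Azure/go-autorest/autorest/azure/auth"
)

func main() {
	// Reads AZURE_* environment variables for credentials.
	authorizer, err := auth.NewAuthorizerFromEnvironment()
	if err != nil {
		panic(err)
	}

	client := resources.NewDeploymentsClient("<subscription-id>")
	client.Authorizer = authorizer

	// The deployment name appears in the OSBA error log above
	// (e.g. "d8dece67-d004-4883-a1d3-2194dcec8777").
	dep, err := client.Get(context.Background(), "my-test-rg", "<deployment-name>")
	if err != nil {
		panic(err)
	}
	if dep.Properties != nil && dep.Properties.ProvisioningState != nil {
		fmt.Println("deployment provisioning state:", *dep.Properties.ProvisioningState)
	}
}
```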

@kyschouv
Author

Is there some workaround for this? Trying it again today, the same behavior is happening. The Redis cache eventually enters a Running state, but the instance just sits in Pending even afterwards, eventually transitioning to "OrphanMitigation". The deployment took almost 50 minutes but did eventually complete; OSBA just never picks that completion up.

@zhongyi-zhang
Contributor

That's expected for OSBA. The max polling duration for an ARM deployment is set to 45 minutes: https://github.com/Azure/open-service-broker-azure/blob/master/pkg/boot/modules.go#L69. Has the creation performance of Azure Redis Cache gotten worse?
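
Paraphrasing the pattern that line points at (the constant and function names here are hypothetical, not OSBA's actual identifiers):

```go
package arm

import (
	"context"
	"time"
)

// Hypothetical paraphrase of the bounded-polling setup the linked
// modules.go line suggests; names are illustrative, not OSBA's.
const maxARMDeploymentTimeout = 45 * time.Minute

func deployARMTemplate(wait func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), maxARMDeploymentTimeout)
	defer cancel()
	// wait stands in for Future#WaitForCompletion; a ~50-minute Redis
	// provision returns "context deadline exceeded" here and the step fails.
	return wait(ctx)
}
```

A provision that needs ~50 minutes will always trip this 45-minute cap, which matches the behavior reported above.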

@kyschouv
Author

It was over 50 minutes last time I tried it in westus2. It seems to vary quite a bit, but is consistently taking too long for OSBA to pick up the successful deployment once it does finish.

@zhongyi-zhang
Contributor

That's not possible, because a failure was already returned to Service Catalog. The Open Service Broker spec has no "pick up" for service instances that failed during polling, so Service Catalog doesn't "pick up" the late success, and OSBA, which is driven by Service Catalog, doesn't either.
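
For context, the polling contract in question is the OSB API's last_operation endpoint; a minimal sketch of the response shape (per the Open Service Broker API spec):

```go
package osb

// Per the OSB API spec, the platform polls
// GET /v2/service_instances/:instance_id/last_operation and the broker
// answers with this shape; "failed" is terminal, is never re-polled,
// and may trigger orphan mitigation (the "OrphanMitigation" state above).
type LastOperationResponse struct {
	State       string `json:"state"` // "in progress" | "succeeded" | "failed"
	Description string `json:"description,omitempty"`
}
```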

@kyschouv
Author

Sorry, I'm not completely familiar with the internals of how it works. Is there no way to fix this or work around it then? It seems like I can't rely on OSBA for this?

@zhongyi-zhang
Contributor

For now, we could just prolong the max polling duration, once we can confirm this is expected for Azure Redis Cache. What about submitting a support ticket in the Portal to ask about it? Some services do have worse creation performance in some regions.
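
Concretely, the workaround would amount to raising the cap from the earlier sketch (hypothetical identifier, illustrative only, not OSBA's real code):

```go
package arm

import "time"

// Hypothetical one-line change: raising the cap would let slow regions
// finish provisioning before the broker gives up and fails the step.
const maxARMDeploymentTimeout = 90 * time.Minute // previously 45 * time.Minute
```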
