pagerduty_service eventual consistency issue on CREATE #273

Closed
jwlusby opened this issue Oct 8, 2020 · 4 comments · Fixed by #677

jwlusby commented Oct 8, 2020

I am experiencing an intermittent issue with the PagerDuty Terraform provider: creating a pagerduty_service or pagerduty_escalation_policy resource often results in the following message:

Error: Provider produced inconsistent result after apply

When applying changes to module.cvi-alerting.pagerduty_service.p1, provider
"pagerduty" produced an unexpected new value for was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

This appears to be a result of the PagerDuty API being eventually consistent. When running terraform apply with the environment variable TF_LOG=DEBUG set, I can observe that the resource is created successfully (201 Created after a POST call). The PagerDuty provider then immediately issues a GET for the resource that was just created, and the API returns a 404 Not Found. Manually making the same GET request some time later returns the expected result with a 200 response.

I dug into the provider code a bit and found that there are retries built in to all resource GETs:


However, in the event that the API returns a 404, it doesn't bother to retry:
func handleNotFoundError(err error, d *schema.ResourceData) error {
.
I think that's where the issue lies. Since the PagerDuty API seems to be eventually consistent, the provider should retry with some delay even on a 404 Not Found, in case the resource was created but isn't yet available to be read.

I've created the following PR as an example of adding a number of retries in the event of a 404 Not Found.
#274
I'm not a Go developer, so this functionality could probably be done in a cleaner way. But I was able to build the provider and verify that this does solve the issue described above.

Terraform Version

Terraform v0.12.8

Affected Resource(s)

  • pagerduty_service
  • pagerduty_escalation_policy
  • Possibly others, but I've only experienced the issue with these two resources.

Terraform Configuration Files

resource "pagerduty_service" "p1" {
  name                    = "cvi-dvs-jeremiah-us-east-1-p1"
  escalation_policy       = data.pagerduty_escalation_policy.escalation-policy.id
  alert_creation          = "create_alerts_and_incidents"
  auto_resolve_timeout    = var.pagerduty_service_auto_resolve_timeout
  acknowledgement_timeout = "null"

  // Set the urgency for incidents
  incident_urgency_rule {
    type    = "constant"
    urgency = "high"
  }
}

Debug Output

https://gist.github.com/jwlusby/db83a25e0e40fe641c1fa2c7366bf817

Expected Behavior

Provider should have retried after a short delay to retrieve the resource.

Actual Behavior

Provider received a 404 and gave up immediately

Steps to Reproduce

  1. terraform apply

twang817 commented Oct 9, 2020

I ran into this issue too, except with pagerduty_service_integration. If anyone out there has run into this issue and does not want to wait for a fix or build their own provider, here is a short-term workaround:

First, I saw that the service integration (in my case, a CloudWatch integration) was indeed created. I looked at the tfstate of a different integration that happened to succeed:

$ terraform state pull > successful
$ cat successful | grep pagerduty_service_integration -A10
      "type": "pagerduty_service_integration",
      "name": "hi",
      "each": "list",
      "provider": "provider.pagerduty",
      "instances": [
        {
          "index_key": 0,
          "schema_version": 0,
          "attributes": {
            "html_url": "....",
            "id": "ABCD123",   # <-- this value here

Next, I tried to manually import that state. Note, I had to adjust my plan and comment out any resources that depended on this just to get the import to successfully run:

$ terraform import pagerduty_service_integration.my_integration ABCD123

This produced a failure stating that import IDs for pagerduty_service_integration should be in the format <service_id>.<integration_id>. You can again refer to an existing state file, but I found that these IDs are also available in the URL when you load the integration:

https://myorg.pagerduty.com/services/A12BC3D/integrations/ABCD123

So, try the import

$ terraform import pagerduty_service_integration.my_integration A12BC3D.ABCD123
pagerduty_service_integration.my_integration: Import prepared!
  Prepared pagerduty_service_integration for import
pagerduty_service_integration.my_integration: Refreshing state... [id=ABCD123]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

Now, you can uncomment any other resources that you had commented out and continue with terraform apply.
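
If several integrations need importing, the <service_id>.<integration_id> import ID described above can be derived from the integration URL mechanically. The following is a small sketch under the assumption that the URL path always has the shape /services/<service_id>/integrations/<integration_id>, as in the example URL; importIDFromURL is a hypothetical helper, not part of Terraform or the provider.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// importIDFromURL builds "<service_id>.<integration_id>" from a PagerDuty
// integration URL. The expected path layout is an assumption taken from the
// example URL in this thread.
func importIDFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 4 || parts[0] != "services" || parts[2] != "integrations" ||
		parts[1] == "" || parts[3] == "" {
		return "", fmt.Errorf("unexpected integration URL path: %q", u.Path)
	}
	return parts[1] + "." + parts[3], nil
}

func main() {
	id, err := importIDFromURL("https://myorg.pagerduty.com/services/A12BC3D/integrations/ABCD123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints the import command used above, with the composite ID filled in.
	fmt.Printf("terraform import pagerduty_service_integration.my_integration %s\n", id)
}
```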

@kevinsrose

I am facing this same issue while trying to dynamically create multiple pagerduty_service objects by using count and a map of values defined in variables.tf. The initial terraform plan output shows that all services are going to be created, but running terraform apply returns this error for a handful of the services:

Error: Provider produced inconsistent result after apply

When applying changes to pagerduty_service.ot_services_core[47], provider
"registry.terraform.io/-/pagerduty" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to pagerduty_service.ot_services_core[23], provider
"registry.terraform.io/-/pagerduty" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

...
...

I see that all of the services were actually created successfully in PagerDuty despite this error message. However, since the terraform state is now "out of sync", any subsequent tf apply commands executed give me a 400 error. For example:

Error: POST API call to https://api.pagerduty.com/services failed 400 Bad Request. Code: 2001, Errors: [Name has already been taken], Message: Invalid Input Provided

  on services.tf line 19, in resource "pagerduty_service" "ot_services_core":
  19: resource "pagerduty_service" "ot_services_core" {

I have also tried a workaround, similar to the one mentioned above: querying the API for all service IDs and manually bringing them into the Terraform state with terraform import. That has not succeeded either:

pagerduty_service.ot_services_core: Importing from ID "PEPRUAJ"...
pagerduty_service.ot_services_core: Import prepared!
  Prepared pagerduty_service for import

Error: Resource already managed by Terraform

Terraform is already managing a remote object for
pagerduty_service.ot_services_core. To import to this address you must first
remove the existing object from the state.

My assumption here is that the import is not accounting for the fact that pagerduty_service.ot_services_core uses the count iterator to create multiple instances of the service (i.e. pagerduty_service.ot_services_core[0]..pagerduty_service.ot_services_core[60]).

Any help in resolving this issue or assistance with a temporary workaround would be greatly appreciated.
Thanks!


twang817 commented Oct 14, 2020

Look through your state file and see what remote object is claiming the ID. I have a suspicion that you have the incorrect ID.

Failing that, you might have to go state file hacking. Create a count of something else and study its structure to see if you can manually create the right state. The attributes/values don't have to be perfect, as the next apply will generally bring them into sync. All YMMV, of course.

Also, this may be obvious -- but if you're creating something with count, you're remembering to put something to differentiate each instance of the object, right? The error says that the name was claimed -- make sure you're using the index or something so that your nth service isn't conflicting in name with your 1st.

eric-spence-code added a commit to eric-spence-code/terraform-provider-pagerduty that referenced this issue Aug 29, 2021
…p, user contact, and notifications rules to have similar safeguards against eventual consistent reads.
@imjaroiswebdev
Contributor

Hey @jwlusby, the retries for handling eventual consistency were addressed in #380 for pagerduty_service and in #677 for pagerduty_escalation_policy. The latter will be available shortly in the next release of the provider ✌🏽
