pagerduty_service eventual consistency issue on CREATE #273

jwlusby · 2020-10-08T20:58:44Z

I am experiencing an intermittent issue with the PagerDuty Terraform provider where creating a pagerduty_service or pagerduty_escalation_policy resource often results in the following error:

Error: Provider produced inconsistent result after apply

When applying changes to module.cvi-alerting.pagerduty_service.p1, provider
"pagerduty" produced an unexpected new value for was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

This appears to be caused by the PagerDuty API being eventually consistent. When running terraform apply with the environment variable TF_LOG=DEBUG set, I can see that the resource is created successfully (201 Created in response to the POST). However, the PagerDuty provider immediately issues a GET for the newly created resource, and the API returns 404 Not Found. Making the same GET request manually some time later returns the expected result with a 200 response.

I dug into the provider code a bit and found that there are retries built into all resource GETs (terraform-provider-pagerduty/pagerduty/resource_pagerduty_escalation_policy.go, line 128 in e8c18e2):

errResp := handleNotFoundError(err, d)

However, when the API returns a 404, it doesn't retry at all (terraform-provider-pagerduty/pagerduty/provider.go, line 85 in 8bdde9b):

func handleNotFoundError(err error, d *schema.ResourceData) error {

I think that's where the issue lies. Because the PagerDuty API appears to be eventually consistent, the provider should retry with some delay even on a 404 Not Found, since the resource may have been created but not yet be readable.

I've opened PR #274 as an example of retrying a number of times on a 404 Not Found. I'm not a Go developer, so this could probably be done more cleanly, but I was able to build it and verify that it does solve the issue described above.

Terraform Version

Terraform v0.12.8

Affected Resource(s)

pagerduty_service
pagerduty_escalation_policy
Possibly others, but I've only experienced the issue with these two resources.

Terraform Configuration Files

resource "pagerduty_service" "p1" {
  name                    = "cvi-dvs-jeremiah-us-east-1-p1"
  escalation_policy       = data.pagerduty_escalation_policy.escalation-policy.id
  alert_creation          = "create_alerts_and_incidents"
  auto_resolve_timeout    = var.pagerduty_service_auto_resolve_timeout
  acknowledgement_timeout = "null"

  // Set the urgency for incidents
  incident_urgency_rule {
    type    = "constant"
    urgency = "high"
  }
}

Debug Output

https://gist.github.com/jwlusby/db83a25e0e40fe641c1fa2c7366bf817

Expected Behavior

Provider should have retried after a short delay to retrieve the resource.

Actual Behavior

Provider received a 404 and gave up immediately.

Steps to Reproduce


terraform apply


twang817 · 2020-10-09T20:14:36Z

I ran into this issue too, but with pagerduty_service_integration. If you've hit this and don't want to wait for a fix or build your own provider, here is a short-term workaround:

First, I saw that the service integration (in my case, a CloudWatch integration) was indeed created. I looked at the tfstate of a different integration that happened to succeed:

$ terraform state pull > successful
$ cat successful | grep pagerduty_service_integration -A10
      "type": "pagerduty_service_integration",
      "name": "hi",
      "each": "list",
      "provider": "provider.pagerduty",
      "instances": [
        {
          "index_key": 0,
          "schema_version": 0,
          "attributes": {
            "html_url": "....",
            "id": "ABCD123",   # <-- this value here

Next, I tried to import the resource manually. Note that I had to adjust my configuration and comment out any resources that depended on it just to get the import to run successfully:

$ terraform import pagerduty_service_integration.my_integration ABCD123

This failed with an error stating that import IDs for pagerduty_service_integration must be in the format <service_id>.<integration_id>. You can again refer to an existing state file, but these IDs are also available in the URL when you load the integration:

https://myorg.pagerduty.com/services/A12BC3D/integrations/ABCD123

So, try the import again with the combined ID:

$ terraform import pagerduty_service_integration.my_integration A12BC3D.ABCD123
pagerduty_service_integration.my_integration: Import prepared!
  Prepared pagerduty_service_integration for import
pagerduty_service_integration.my_integration: Refreshing state... [id=ABCD123]

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

Now, you can uncomment any other resources that you had commented out and continue with terraform apply.
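The composite import ID can also be derived from the integration URL programmatically. A small sketch with a hypothetical helper `importIDFromURL`, assuming the URL path always has the shape `/services/<service_id>/integrations/<integration_id>` seen in the example URL above:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// importIDFromURL turns an integration URL of the form
// https://<subdomain>.pagerduty.com/services/<service_id>/integrations/<integration_id>
// into the "<service_id>.<integration_id>" import ID.
func importIDFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 4 || parts[0] != "services" || parts[2] != "integrations" ||
		parts[1] == "" || parts[3] == "" {
		return "", fmt.Errorf("unexpected integration URL path %q", u.Path)
	}
	return parts[1] + "." + parts[3], nil
}

func main() {
	id, err := importIDFromURL("https://myorg.pagerduty.com/services/A12BC3D/integrations/ABCD123")
	if err != nil {
		panic(err)
	}
	// Prints: terraform import pagerduty_service_integration.my_integration A12BC3D.ABCD123
	fmt.Printf("terraform import pagerduty_service_integration.my_integration %s\n", id)
}
```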

kevinsrose · 2020-10-14T16:30:33Z

I am facing the same issue while trying to dynamically create multiple pagerduty_service objects using count and a map of values defined in variables.tf. The initial tf plan output shows that all services are going to be created, but running tf apply returns this error for a handful of them:

Error: Provider produced inconsistent result after apply

When applying changes to pagerduty_service.ot_services_core[47], provider
"registry.terraform.io/-/pagerduty" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.


Error: Provider produced inconsistent result after apply

When applying changes to pagerduty_service.ot_services_core[23], provider
"registry.terraform.io/-/pagerduty" produced an unexpected new value for was
present, but now absent.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

...
...

I can see that all of the services were in fact created successfully in PagerDuty despite this error message. However, because the Terraform state is now out of sync, every subsequent tf apply fails with a 400 error. For example:

Error: POST API call to https://api.pagerduty.com/services failed 400 Bad Request. Code: 2001, Errors: [Name has already been taken], Message: Invalid Input Provided

  on services.tf line 19, in resource "pagerduty_service" "ot_services_core":
  19: resource "pagerduty_service" "ot_services_core" {

I have also tried a workaround similar to the one mentioned above, querying the API for all service IDs and bringing them into the Terraform state manually with terraform import, but I haven't had any success there either:

pagerduty_service.ot_services_core: Importing from ID "PEPRUAJ"...
pagerduty_service.ot_services_core: Import prepared!
  Prepared pagerduty_service for import

Error: Resource already managed by Terraform

Terraform is already managing a remote object for
pagerduty_service.ot_services_core. To import to this address you must first
remove the existing object from the state.

My assumption is that the import doesn't account for pagerduty_service.ot_services_core using the count iterator to create multiple instances of the service (i.e. pagerduty_service.ot_services_core[0] through pagerduty_service.ot_services_core[60]).

Any help in resolving this issue or assistance with a temporary workaround would be greatly appreciated.
Thanks!
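With count, each instance has its own address, such as pagerduty_service.ot_services_core[47], and terraform import accepts such an indexed address, so a likely fix is to import each orphaned service into its indexed address rather than the bare resource name (skipping any index that `terraform state list` already shows). The sketch below uses a hypothetical helper `importCommands`; the index-to-ID mapping is made up, apart from PEPRUAJ, whose real index isn't stated in this thread. It prints one command per instance, single-quoting the address so a POSIX shell doesn't treat the brackets as a glob:

```go
package main

import (
	"fmt"
	"sort"
)

// importCommands returns one `terraform import` command per count index, in
// ascending index order. The address is single-quoted so a POSIX shell does not
// try to expand [n] as a glob pattern.
func importCommands(resource string, idsByIndex map[int]string) []string {
	indexes := make([]int, 0, len(idsByIndex))
	for i := range idsByIndex {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)

	cmds := make([]string, 0, len(indexes))
	for _, i := range indexes {
		cmds = append(cmds, fmt.Sprintf("terraform import '%s[%d]' %s", resource, i, idsByIndex[i]))
	}
	return cmds
}

func main() {
	// Hypothetical mapping from count index to the PagerDuty service ID that
	// already exists for that instance (match each one by service name).
	ids := map[int]string{
		23: "PHYPO23", // placeholder ID
		47: "PEPRUAJ", // ID from the issue; its real index is not stated there
	}
	for _, c := range importCommands("pagerduty_service.ot_services_core", ids) {
		fmt.Println(c)
	}
}
```

The index matters: if a service is imported under an index whose configured name differs, the next apply will try to rename it.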

twang817 · 2020-10-14T16:51:29Z

Look through your state file and see what remote object is claiming the ID. I have a suspicion that you have the incorrect ID.

Failing that, you might have to go state-file hacking. Create a count of something else and study its structure to see if you can manually create the right state. The attributes/values don't have to be perfect, as the next apply will generally bring them into sync. YMMV, of course.

Also, this may be obvious, but if you're creating something with count, are you remembering to differentiate each instance of the object? The error says that the name has already been taken, so make sure you're using the index or something similar so that your nth service's name doesn't conflict with your first.
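A quick pre-apply check for that kind of collision: a sketch with a hypothetical `duplicateNames` helper that takes the names a count expression would generate (a made-up list here) and reports any that repeat, since the API rejects a service whose name is already taken, as in the 400 error quoted above:

```go
package main

import "fmt"

// duplicateNames maps every name that appears more than once to the
// indexes (count positions) where it occurs.
func duplicateNames(names []string) map[string][]int {
	seen := make(map[string][]int)
	for i, n := range names {
		seen[n] = append(seen[n], i)
	}
	dups := make(map[string][]int)
	for n, idx := range seen {
		if len(idx) > 1 {
			dups[n] = idx
		}
	}
	return dups
}

func main() {
	// Made-up names standing in for what the count expression would produce.
	names := []string{"core-api", "core-worker", "core-api"}
	for n, idx := range duplicateNames(names) {
		fmt.Printf("%q is used by count indexes %v\n", n, idx) // "core-api" is used by count indexes [0 2]
	}
}
```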


…p, user contact, and notifications rules to have similar safeguards against eventual consistent reads.

imjaroiswebdev · 2023-04-17T20:50:16Z

Hey @jwlusby, the retries for handling eventual consistency on pagerduty_service were addressed in #380, and for pagerduty_escalation_policy in #677. The latter will be available shortly in the next release of the provider ✌🏽

jwlusby mentioned this issue Oct 9, 2020

Handle eventual consistency for pagerduty_escalation_policy and pagerduty_service resources with retries. #274

Open

eric-spence-code added a commit to eric-spence-code/terraform-provider-pagerduty that referenced this issue Aug 29, 2021

[PagerDuty#273] Handle race condition where a newly created resource can return a 404 on create

e5fd91a

eric-spence-code added a commit to eric-spence-code/terraform-provider-pagerduty that referenced this issue Aug 29, 2021

[PagerDuty#273] updating pagerduty addons and extensions. Fixing pagerduty services retry.

7819cf8

eric-spence-code added a commit to eric-spence-code/terraform-provider-pagerduty that referenced this issue Aug 29, 2021

[PagerDuty#273] adding servince now extension

0c53f96

eric-spence-code mentioned this issue Aug 29, 2021

Fix create race condition #380

Merged

imjaroiswebdev mentioned this issue Apr 17, 2023

Handle retries and state drift clean up for Escalation Policy #677

Merged

imjaroiswebdev closed this as completed in #677 Apr 17, 2023
