Service Catalog share-portfolio-with-org randomly fails when more than one OU is targeted #601

Open
vanja-zecevic opened this issue Oct 10, 2024 · 0 comments
Labels
bug Something isn't working

@vanja-zecevic

Describe the bug
Service Catalog portfolios that are deployed via LZA and shared with multiple Organizational Units (OUs) are not always shared with every target OU: one or more of the OU shares can be missing after deployment.

To Reproduce
Add a portfolio with a product to serviceCatalogPortfolios in customizations-config.yaml and share it with multiple OUs:

  serviceCatalogPortfolios:
    - name: test-portfolio
      provider: LZA
      account: Management
      regions:
        - *HOME_REGION
      shareTargets:
        organizationalUnits:
          - test-ou-1
          - test-ou-2
          - test-ou-3
          - test-ou-4
      portfolioAssociations:
        - type: PermissionSet
          name: target-permission-set
          propagateAssociation: true
      products:
        - name: test-product
          description: this is a test product
          distributor: LZA
          versions:
            - name: v1
              template: servicecatalog-templates/product-template.yaml
              deprecated: false

The bug becomes more likely as the number of share targets increases; however, we have seen it occur with as few as two target OUs.

Expected behavior
We would expect the Service Catalog portfolio and product to appear in every account under the shared OUs (test-ou-1 to test-ou-4). When the problem occurs, the portfolio is not available in one or more of those OUs, and the following errors appear in the CloudWatch Logs of the Lambda function that performs the provisioning, which is named
AWSAccelerator-Customizat-CustomSharePortfolioWith-XXXXXXXXXXXX

ERROR	InvalidStateException: Cannot process more than one portfolio share action at the same time. Try again later.
ERROR	Failed to Create portfolio share for portfolio port-abcdefghijklm with organizational unit ou-abcd-abcdefgh
ERROR	Error: Error while trying to create portfolio share with organization resource id: ou-abcd-abcdefgh with portfolio id: port-abcdefghijklm
INFO	submit response to cloudformation {
  Status: 'SUCCESS',
  Reason: 'SUCCESS',
  ...
}

Because the Lambda reports SUCCESS to CloudFormation, CloudFormation treats the resource as created and the deployment continues without any error. Subsequent deployments of the customizations stack do not attempt to recreate the missing shares, because from CloudFormation's perspective they were created successfully.

Please complete the following information about the solution:

  • Version: 1.8.1. This has also occurred in previous versions, and I believe it will also occur in the latest version (1.9.2), since the code responsible for this behaviour is still the same.
  • Region: ap-southeast-2
  • Was the solution modified from the version published on this repository? No
  • Have you checked your service quotas for the services this solution uses? Yes, it is not related to service quotas.
  • Were there any errors in the CloudWatch Logs? Yes, see above.

Additional context
This problem is caused by code in the following file.
source/packages/@aws-accelerator/constructs/lib/aws-servicecatalog/share-portfolio-with-org/index.ts
In the function that creates the portfolio share (modifyPortfolioShare), there is a random delay that is intended to prevent this very problem from occurring.

  // Random delay to reduce the chance to process more than one portfolio share action at the same time which triggers InvalidStateException
  await delay(Math.floor(Math.random() * 5) * 5000);

Unfortunately, this does not reliably prevent the problem, and I do not believe a random delay is the right approach for this type of issue. In addition, the function does not return an error to CloudFormation, so the deployment continues without acknowledging the failure. It took quite a bit of digging to find the cause of this problem.
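To put a rough number on why the random delay is unreliable: `Math.floor(Math.random() * 5)` yields one of five integers (0 to 4), so each share waits 0, 5, 10, 15 or 20 seconds. The sketch below treats this as a birthday problem. It assumes, purely for illustration, that all share custom resources start at the same moment and that two shares collide exactly when they draw the same delay slot; real invocation timing and share processing time will shift the numbers, but the trend matches the observation that the bug gets more likely with more share targets. `collisionProbability` is a hypothetical helper, not part of LZA.

```typescript
// Rough model of the random delay in modifyPortfolioShare:
//   await delay(Math.floor(Math.random() * 5) * 5000);
// Each share picks one of 5 equally likely slots (0, 5, 10, 15 or 20 seconds).
const SLOTS = 5;

// Simplifying assumption (not from the LZA code): all shares start together,
// and two shares collide exactly when they draw the same slot.
function collisionProbability(shares: number, slots: number = SLOTS): number {
  let allDistinct = 1;
  for (let i = 0; i < shares; i++) {
    // Share number i + 1 must avoid the i slots already taken.
    allDistinct *= Math.max(slots - i, 0) / slots;
  }
  return 1 - allDistinct;
}

for (const n of [2, 3, 4, 6]) {
  console.log(`${n} OUs: ${(collisionProbability(n) * 100).toFixed(1)}% chance of a collision`);
}
```

Under those assumptions it prints about 20% for two OUs, 52% for three, 81% for four and 100% for six, since six shares cannot all land in five distinct slots.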

I have two suggestions for making this more robust:

  1. Since we cannot create more than one portfolio share in parallel, process the shares sequentially. When there is more than one share, make the CDK resource for each share after the first depend on the resource for the previous share. CloudFormation will then process the shares in sequence, and the random wait can be removed.
  2. Retry automatically when an InvalidStateException is encountered. The throttlingBackOff function is already used, but it does not retry on this exception, and I'm not sure whether there is a way to modify its behaviour.
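A minimal sketch of both suggestions, in TypeScript to match the construct. It does not use the AWS SDK: `FakePortfolioShareApi` is a hypothetical in-memory stand-in that only reproduces the relevant behaviour (rejecting a share action while another is in progress), and it completes each share within the call, which is a simplification. `withInvalidStateRetry` is likewise a hypothetical helper, not an existing LZA function or a change to throttlingBackOff. In the real construct, suggestion 1 would be expressed as CloudFormation dependencies between the share resources rather than as a loop.

```typescript
// Hypothetical stand-in for Service Catalog portfolio sharing. It only simulates
// one behaviour: a new share action is rejected while another is in progress.
class InvalidStateException extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidStateException';
  }
}

const sleep = (ms: number): Promise<void> => new Promise(resolve => setTimeout(resolve, ms));

class FakePortfolioShareApi {
  private busy = false;
  readonly sharedOus: string[] = [];

  async createPortfolioShare(ouId: string): Promise<void> {
    if (this.busy) {
      throw new InvalidStateException(
        'Cannot process more than one portfolio share action at the same time. Try again later.',
      );
    }
    this.busy = true;
    try {
      await sleep(20); // pretend the share takes a while to complete
      this.sharedOus.push(ouId);
    } finally {
      this.busy = false;
    }
  }
}

// Suggestion 1: create the shares one after another instead of in parallel.
async function shareSequentially(api: FakePortfolioShareApi, ouIds: string[]): Promise<void> {
  for (const ouId of ouIds) {
    await api.createPortfolioShare(ouId);
  }
}

// Suggestion 2 (hypothetical helper): retry only on InvalidStateException,
// with exponential backoff plus random jitter; any other error is rethrown.
async function withInvalidStateRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 10,
): Promise<T> {
  let attempt = 1;
  while (true) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof Error && err.name === 'InvalidStateException';
      if (!retryable || attempt >= maxAttempts) {
        throw err;
      }
      const backoffMs = baseDelayMs * 2 ** (attempt - 1);
      await sleep(backoffMs + Math.random() * backoffMs);
      attempt++;
    }
  }
}

async function main(): Promise<void> {
  const ous = ['test-ou-1', 'test-ou-2', 'test-ou-3', 'test-ou-4'];

  // Parallel, no retry: every call made while the first share is in progress fails.
  const parallelApi = new FakePortfolioShareApi();
  const results = await Promise.allSettled(ous.map(ou => parallelApi.createPortfolioShare(ou)));
  const failed = results.filter(r => r.status === 'rejected').length;
  console.log(`parallel, no retry: ${failed} of ${ous.length} shares failed`);

  // Suggestion 1: sequential processing.
  const sequentialApi = new FakePortfolioShareApi();
  await shareSequentially(sequentialApi, ous);
  console.log(`sequential: shared with ${sequentialApi.sharedOus.join(', ')}`);

  // Suggestion 2: parallel calls, each wrapped in the retry helper (up to 8 attempts).
  const retryApi = new FakePortfolioShareApi();
  await Promise.all(ous.map(ou => withInvalidStateRetry(() => retryApi.createPortfolioShare(ou), 8)));
  console.log(`parallel with retry: shared with ${retryApi.sharedOus.length} OUs`);
}

main().catch(err => console.error('demo failed:', err));
```

With the fake API, the parallel run without retries rejects three of the four shares, while the sequential run shares with all four OUs. Retrying with backoff and jitter only makes a failure less likely, so a share that exhausts its retries still has to be reported as a failure.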

It would also be good to include appropriate error handling in case a share still fails to be created for some reason.
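A minimal sketch of that error handling, assuming one custom resource per OU share (as suggestion 1 implies; the actual handler in index.ts may be structured differently). Instead of logging the error and still reporting SUCCESS, as in the logs above, it reports FAILED with a reason. `ShareFn`, `handleShareRequest` and `busyShare` are hypothetical names; `Status` and `Reason` mirror two of the fields in a CloudFormation custom resource response. How the result is actually sent back to CloudFormation is left out.

```typescript
// Two of the fields a CloudFormation custom resource response carries.
interface ShareResult {
  Status: 'SUCCESS' | 'FAILED';
  Reason: string;
}

// Hypothetical signature for the call that creates one OU share; in the
// real Lambda this would wrap the Service Catalog API call (with retries).
type ShareFn = (portfolioId: string, ouId: string) => Promise<void>;

async function handleShareRequest(portfolioId: string, ouId: string, share: ShareFn): Promise<ShareResult> {
  try {
    await share(portfolioId, ouId);
    return { Status: 'SUCCESS', Reason: 'SUCCESS' };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`Failed to create portfolio share for portfolio ${portfolioId} with organizational unit ${ouId}: ${message}`);
    // Report FAILED rather than SUCCESS, so the failure surfaces in the
    // deployment instead of being recorded as a successful share.
    return {
      Status: 'FAILED',
      Reason: `Could not share portfolio ${portfolioId} with ${ouId}: ${message}`,
    };
  }
}

// Example with the placeholder IDs from the logs and a share call that always
// hits the concurrency error.
const busyShare: ShareFn = async () => {
  throw new Error('Cannot process more than one portfolio share action at the same time. Try again later.');
};

handleShareRequest('port-abcdefghijklm', 'ou-abcd-abcdefgh', busyShare)
  .then(result => console.log(result.Status)); // logs the error, then prints FAILED
```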

Workaround
We have been able to work around this by adding OU shares one at a time; however, this requires deploying the customizations stage multiple times and is not ideal.

vanja-zecevic added the bug (Something isn't working) label on Oct 10, 2024