Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with the retry logic in the OktaApiClient and OktaRetryApi classes (#61) #62

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

TriggerAu
Copy link
Contributor

Fix issues with the retry logic in the OktaApiClient and OktaRetryApi classes

  • The OktaApiClient class was not correctly handling a non 200 response code
  • The OktaRetryApi class was searching for a header field using incorrect case
  • The OktaRetryApi class was trying to parse a Unix Epoch number as a string date
  • Also added a verbose message so the user knows how long they waiting for the retry

Root cause was missing a switch on API calls and the retyr logic not matching the Okta API names and data types

Issue #61

…and OktaRetryApi classes

* The OktaApiClient class was not correctly handling a non 200 response code
* The OktaRetryApi class was searching for a header field using incorrect classes
* The OktaRetryApi class was trying to parse a Unix Epoch number as a string date
* Also added a verbose message so teh user knows how long they waiting for the retry
@laura-rodriguez
Copy link
Collaborator

Thank you @TriggerAu!

The team will review and prioritize this issue accordingly.

Internal Ref: OKTA-825949

@laura-rodriguez
Copy link
Collaborator

laura-rodriguez commented Nov 4, 2024

Hey @TriggerAu, I was looking into the existing rate limit/retry logic tests we already had to understand why they didn't catch the issue before. Can you confirm if they were failing for you before your changes, and if so, what PowerShell version are you using?

I understand that if the rate limit headers casing were consistently wrong everywhere, tests wouldn't catch the error, but I'm curious about your comment in #61 about the retry code being unreachable due to ErrorAction settings. If I understand this correctly, those tests should fail in your local env because the retry logic was never reached due to fatal errors. Can you please provide more details on how to reproduce the issues you faced so I can revisit the tests and docs regarding $ErrorActionPreference accordingly? Thanks!

Mock -ModuleName Okta.PowerShell Get-OktaConfiguration { return $Config } -Verifiable

$Response = [PSCustomObject]@{
Content = "[]"
Headers = @{
"Date" = @($Now)
"Content-Type" = @('application/json; charset=utf-8')
"X-Rate-Limit-Reset" = @($ResetDate)
"x-rate-limit-reset" = @($ResetDateEpoch)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to fix the casing (or make the appropriate changes to handle headers in a case-insensitive way) of additional headers such as x-okta-request-id? Is x-rate-limit-reset the only problematic one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting one, I didn't look at that one because it wasn't erroring - now im questioning my case sensitive testing for sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great question and I didnt do that one because it didn't throw errors on me. Im now questioning myself about the case behavior - Ill retest it on the weekend

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay, I started in on this and it got weird, but I finally found it

Heres a screenshot of the debugging and some annotations,

So whats happening is that inside Invoke-OptaApiClient the $headers that come back from the Invoke are taken from the rawResponse.Headers. These are a Dictionary and the reference to the Dictionary keyvalue pairs is case insensitive as you can see in the debug checks using various casing in the box that points to 1

When this Dictionary is passed in to CalculateDelay the Dictionary is cast to a Hashtable which behaves in a case sensitive behavior - as you can see in the debug results in the box pointing to 2

image

So the short answer is the casing re x-okta-request-id works as is because the problem is occuring in the pass of the headers into the CalculateDelay method. I really wish I could figure out why this is the case, but the day job is chewing a lot of time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing your findings, and we appreciate the time you took to look into this!
I'm currently working on other priorities, but I'll get back to this as soon as possible.

@@ -163,7 +164,8 @@ function Invoke-OktaApiClient {
-SkipCertificateCheck `
-Proxy $Configuration["Proxy"].GetProxy($UriBuilder.Uri) `
-ProxyUseDefaultCredentials `
-UserAgent $OktaUserAgent
-UserAgent $OktaUserAgent `
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering your comment about IWR not reaching the Retry logic due to ErrorAction = Stop, I'm curious how tests still passed. I left a question for you to help me reproduce the issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave that a look and the tests pass because the calls to IWR are being Mock'ed out. For example in this block

Mock -ModuleName Okta.PowerShell Get-OktaConfiguration { return $Config } -Verifiable
$Response = [PSCustomObject]@{
Content = "[]"
Headers = @{
"Date" = @($Now)
"Content-Type" = @('application/json; charset=utf-8')
"X-Rate-Limit-Reset" = @($ResetDate)
"X-Okta-Request-Id" = @("foo")
}
StatusCode = 429
}
Mock -ModuleName Okta.PowerShell Invoke-WebRequest {
return $Response
} -Verifiable
$Result = Invoke-OktaListApplications
Assert-MockCalled -ModuleName Okta.PowerShell Invoke-WebRequest -Times 2

Calls to Get-OktaConfiguration and Invoke-WebRequest are replaced by the script blocks in the Mock definition. When Invoke-OktaListApplications is called the inner calls to the two mocked functions run the script blocks not the actual commands so Invoke-WebRequest never gets called or a non-OK status response for IWR to throw an error

I cant think of an "easy" way to unit test IWR getting an actual 429 response thats not overly contrived - eg mock an internal part of that method - so powershells IWR method gets the 429, or call an actual weburl tat forcibly returns 429 - both seem more like integration tests than unit tests though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for clarifying!

@@ -163,7 +163,8 @@ function Invoke-{{{apiNamePrefix}}}ApiClient {
-ErrorAction Stop `
-UseBasicParsing `
-SkipCertificateCheck `
-UserAgent $OktaUserAgent
-UserAgent $OktaUserAgent `
-SkipHttpErrorCheck
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the SkipHttpErrorCheck param is supported by PS 7+, and the Okta PS module supports PS 6.2+, do you have any thoughts on how this should be handled in PS 6+?

Copy link
Contributor Author

@TriggerAu TriggerAu Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting one, I did look at Powershell 5.1, but didn't look at PS 6 vs 7 consideration apologies.

Off top of my head can think of three ways to handle it

  1. rewrite the code so it does try catches non-200 responses and in the catch it needs logic to separate 429, etc from other errors
  2. adjust it with a case for PS6 vs 7 - probably using try catch in PS^ as theres no other way around that IWR behvior and the new param in 7
  3. Consider moving to PS7 - Im unsure of whether you can see the mix of PSVersions in use from PSGallery though - so that may not be an easy one either (MS did end of support PS6 a whiles ago, but where customers are in their version journey is certainly a consideration)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with these three options. I'm inclined towards 1, but I would also consider number 3. We'll have to revisit the status of previous PS versions starting from 6.2.

@TriggerAu
Copy link
Contributor Author

I am 100% questioning my sanity at this point

Hey @TriggerAu, I was looking into the existing rate limit/retry logic tests we already had to understand why they didn't catch the issue before. Can you confirm if they were failing for you before your changes, and if so, what PowerShell version are you using?

I understand that if the rate limit headers casing were consistently wrong everywhere, tests wouldn't catch the error, but I'm curious about your comment in #61 about the retry code being unreachable due to ErrorAction settings. If I understand this correctly, those tests should fail in your local env because the retry logic was never reached due to fatal errors. Can you please provide more details on how to reproduce the issues you faced so I can revisit the tests and docs regarding $ErrorActionPreference accordingly? Thanks!

Hey Laura, Think I covered these in the code comments the version I'm using is PS7 in the issue report

To create the failure I used the following script - I did try parallelising it to be faster, but moved away from that due to parallel requests causing the api to return resource not found. So I just ran this and wait for it to hit the call limit

This is using the fix in #53 for setting the ApiKey value set

zDTTest.ps1.txt

Write-output "Importing Module..."
import-module ".\src\Okta.PowerShell\" -force

Write-output "Setting Config..."
$configuration = $null
$configuration = Get-OktaConfiguration

Set-OktaConfiguration -BaseUrl "https://INSTANCE.oktapreview.com"
Set-OktaConfiguration -MaxRetries 2
Set-OktaConfiguration -RequestTimeout 30000
Set-OktaConfiguration -ApiKey "BLAHBLAHBLAH"

Write-output "Base URL: $($configuration.BaseUrl)"
Write-output "Max Retries: $($configuration.MaxRetries)"
Write-output "Request Timeout: $($configuration.RequestTimeout)"

write-output "Looping for doom..."
1..150 | foreach-object {
    "Attempt: $_"
    $apps = Invoke-OktaListApplications -Limit 1 -verbose  
}

return

@laura-rodriguez
Copy link
Collaborator

I am 100% questioning my sanity at this point

Hey @TriggerAu, I was looking into the existing rate limit/retry logic tests we already had to understand why they didn't catch the issue before. Can you confirm if they were failing for you before your changes, and if so, what PowerShell version are you using?
I understand that if the rate limit headers casing were consistently wrong everywhere, tests wouldn't catch the error, but I'm curious about your comment in #61 about the retry code being unreachable due to ErrorAction settings. If I understand this correctly, those tests should fail in your local env because the retry logic was never reached due to fatal errors. Can you please provide more details on how to reproduce the issues you faced so I can revisit the tests and docs regarding $ErrorActionPreference accordingly? Thanks!

Hey Laura, Think I covered these in the code comments the version I'm using is PS7 in the issue report

To create the failure I used the following script - I did try parallelising it to be faster, but moved away from that due to parallel requests causing the api to return resource not found. So I just ran this and wait for it to hit the call limit

This is using the fix in #53 for setting the ApiKey value set

zDTTest.ps1.txt

Write-output "Importing Module..."
import-module ".\src\Okta.PowerShell\" -force

Write-output "Setting Config..."
$configuration = $null
$configuration = Get-OktaConfiguration

Set-OktaConfiguration -BaseUrl "https://INSTANCE.oktapreview.com"
Set-OktaConfiguration -MaxRetries 2
Set-OktaConfiguration -RequestTimeout 30000
Set-OktaConfiguration -ApiKey "BLAHBLAHBLAH"

Write-output "Base URL: $($configuration.BaseUrl)"
Write-output "Max Retries: $($configuration.MaxRetries)"
Write-output "Request Timeout: $($configuration.RequestTimeout)"

write-output "Looping for doom..."
1..150 | foreach-object {
    "Attempt: $_"
    $apps = Invoke-OktaListApplications -Limit 1 -verbose  
}

return

Thank you for providing this sample. I'll also investigate if there's a way to mock an internal IWR and get the same behavior.

@TriggerAu
Copy link
Contributor Author

TriggerAu commented Jan 7, 2025

Added another commit that fixes an issue caused by AddRetryHeaders affecting the pipeline with return values

@laura-rodriguez
Copy link
Collaborator

Hey @TriggerAu, I just wanted to give you a quick update.

To facilitate testing, I've submitted a PR that added an HttpListener, which will allow us to test the IWR behavior better. As you mentioned in this thread, mocking IWR with Pester prevents us from detecting several issues. In that PR, you will see that I added a unit test that allowed us to verify that, as you described, without the flag -SkipHttpErrorCheck IWR throws so the retry logic is omitted. Once we have that PR approved and merged, we'll revisit your PR.

Again, thanks for all the feedback provided!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants