Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] AsyncOpenSearch Authorization Error When Using AWSV4SignerAsyncAuth #698

Open
vcastane opened this issue Mar 19, 2024 · 9 comments
Open
Labels
bug Something isn't working

Comments

@vcastane
Copy link
Contributor

What is the bug?

Generating an AsyncOpenSearch client with AWSV4SignerAsyncAuth/AsyncHttpConnection causes authorization errors despite valid credentials using an assumed session

How can one reproduce the bug?

Authorization differences can be seen when comparing a sync opensearch client to an async opensearch client:

Sync Opensearch:

# Sync

import boto3
from opensearchpy import OpenSearch, AWSV4SignerAuth, RequestsHttpConnection

arn_to_assume = "<your-arn-here>"
region = "<region>"
role_session_name = "<role-session-name>"
opensearch_endpoint = "<my-dev-endpoint.amazonaws.com>"
port = int(443)  # it's probably 443

base_session = boto3.Session()
sts = base_session.client('sts')

sts_response = sts.assume_role(RoleArn=arn_to_assume, RoleSessionName=role_session_name)

assumed_session = boto3.Session(
    aws_access_key_id=sts_response['Credentials']['AccessKeyId'],
    aws_secret_access_key=sts_response['Credentials']['SecretAccessKey'],
    aws_session_token=sts_response['Credentials']['SessionToken']
)

credentials = assumed_session.get_credentials()

auth = AWSV4SignerAuth(credentials, region)
client = OpenSearch(
    hosts = [{'host': opensearch_endpoint, 'port': port}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection
)

client.ping()

This gets the expected 200 response and a True

When attempting to recreate the same behavior with the AsyncOpenSearch client as below:

# Async

import boto3
from opensearchpy import AsyncOpenSearch, AWSV4SignerAsyncAuth, AsyncHttpConnection

arn_to_assume = "<your-arn-here>"
region = "<region>"
role_session_name = "<role-session-name>"
opensearch_endpoint = "<my-dev-endpoint.amazonaws.com>"
port = int(443)  # it's probably 443

base_session = boto3.Session()
sts = base_session.client('sts')

sts_response = sts.assume_role(RoleArn=arn_to_assume, RoleSessionName=role_session_name)

assumed_session = boto3.Session(
    aws_access_key_id=sts_response['Credentials']['AccessKeyId'],
    aws_secret_access_key=sts_response['Credentials']['SecretAccessKey'],
    aws_session_token=sts_response['Credentials']['SessionToken']
)

credentials = assumed_session.get_credentials()
auth = AWSV4SignerAsyncAuth(credentials, region)

client = AsyncOpenSearch(
    hosts = [{'host': opensearch_endpoint, 'port': port}],
    http_compress=True,
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = AsyncHttpConnection
)

await client.ping()

I get the following error:

{"message": "GET https://my-opensearch-url.region.es.amazonaws.com/ [status:403 request:0.035s]", "level": "WARNING", "timestamp": "2024-03-19T09:14:17.506400Z"}
False

I believe I am using the various tooling correctly, but there appears to be a problem with the AsyncAuthSigner.

What is the expected behavior?

Async generated auths are valid, same as sync auths.

What is your host/environment?

Linux
version 2.4.2

Do you have any screenshots?

N/A

Do you have any additional context?

N/A

@vcastane vcastane added bug Something isn't working untriaged Need triage labels Mar 19, 2024
@dblock
Copy link
Member

dblock commented Mar 19, 2024

I don't see anything immediately obvious here - I have a working sample both sync and async in https://github.com/dblock/opensearch-python-client-demo, try running that one, and we can then narrow down the differences? It doesn't use assume_role, so it would be good to know whether it's related. Also it has detailed logging, let's get that from your run as well and compare.

@dblock dblock removed the untriaged Need triage label Mar 19, 2024
@vcastane
Copy link
Contributor Author

I went through both the sync and async examples and simplified the calls to just make the ping() call.
Unfortunately due to our setup, I need to use the assume_role code to have access to our dev cluster.

Once again the sync session works, and the async session fails in the same way, with a 403.

The only changes I made were to skip using urlparse, opting to just pass the host url directly, and removing calls other than ping()

(async) (base) host:~/opensearch-python-client-demo/sync$ pipenv run python example.py
Using opensearch-py 2.4.1
INFO:HEAD https://<my-opensearch-host>.eu-west-1.es.amazonaws.com:443/ [status:200 request:0.038s]

vs.

(async) (base) host:~/opensearch-python-client-demo/async$ pipenv run python example.py
Using opensearch-py 2.4.1
WARNING:GET https://<my-opensearch-host>.eu-west-1.es.amazonaws.com:443/ [status:403 request:0.026s]

One things I do notice with this is that the ping() command generates a different request method using the async client vs the sync client: GET in the async client vs HEAD in the sync client.

While that may be related to the issue I am having (permissions to make get requests vs head requests), it seems like undesired behavior for the package.

also @dblock : in the async example Pipfile file, I believe you need to add the async extras for all the example to run:
opensearch-py = {extras = ["async"], version = "2.4.1"}

@vcastane
Copy link
Contributor Author

I will double check and confirm the GET vs HEAD permission with my ops team and update this thread. Hopefully this is something small/silly on my end.

@vcastane
Copy link
Contributor Author

Update: turns out our security team locked down GET permissions and we are not allowed to make naked requests without an index specified in the request. As such the async ping(), which makes a GET request instead of a HEAD request was the source of my confusion.

@dblock feel free to close this issue, unless you want to use it to track the different in behavior between the sync/async services.

@saimedhi
Copy link
Collaborator

Hello @vcastane , I observed that in opensearch-py 2.4.2, the ping API uses the HEAD method for both sync and async. Please correct me if I've misunderstood the issue. Thanks!

@vcastane
Copy link
Contributor Author

That is bizarre. I double checked my version and it shows 2.4.1. Although that is not the latest, the ping code in the async client has not been touched in 7 years (according to blame). Have you been able to run the async ping() and confirm that it calls HEAD for you? @saimedhi

@saimedhi
Copy link
Collaborator

saimedhi commented Mar 20, 2024

@vcastane, you're right that the GET method is being used instead of HEAD across all async API calls. This happens due to the AsyncHttpConnection and AIOHttpConnection classes automatically changing HEAD requests to GET. This adjustment was initially made to work around a bug in aiohttp that prevented connection re-use with HEAD requests. However, this bug has been resolved (see PR aio-libs/aiohttp#5012), so now there's no need to convert HEAD to GET anymore. Would you be interested in applying this fix?

For the code causing this issue, see the links above.

@vcastane
Copy link
Contributor Author

vcastane commented Mar 20, 2024

@saimedhi Sure! Here's my stab at a simple fix for the issue identified: #699

Let me know what else needs to be done and I can work on it.

@dblock
Copy link
Member

dblock commented Mar 20, 2024

I'm glad we narrowed it down! Let's get the PR finished.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants