Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get Environment Settings from OpenSearch #87

Merged
merged 14 commits into from
Nov 22, 2023

Conversation

dbwiddis
Copy link
Member

@dbwiddis dbwiddis commented Nov 20, 2023

Description

Implements request/response to get environment settings from OpenSearch, corresponding to this call in the Java SDK initialization

Implemented:

  • ExtensionTransportRequest, corresponding to the Java ExtensionRequest class.
    • This takes an enum RequestType
    • This was easy to implement with protobuf (other than I don't think python understands the enum so I had to copy the enum to its own class)
    • Had to change the name because I couldn't duplicate the protobuf class
  • ExtensionSettingsResponseHandler and ExtensionSettingsResponse class
  • Settings class and some of the read/write implementation
  • Updates to the DiscoveryExtensionsRequestHandler to sequence this call after the sending of REST handlers but before sending the final response.

The changes to DiscoveryExtensionsRequestHandler seem somewhat awkward, but I've managed to get them to work with a little bit of tweaking to existing classes. There's somewhat of a "stack" that I create and then unwind manually now, it can probably be improved and generalized. There are some inconsistencies between how requests or responses are sent (with or without args) that I can make more consistent, but wanted some directional feedback before going down that road. A preferred solution here would be a more async approach as originally envisioned in #7. I closed that issue because we'd addressed most of the "solution you'd like" bits, and I don't really know how to implement it the way I'd really like (CompletableFuture.allOf equivalent) but in reality what I'd like is to

  1. Get an init request
  2. Send multiple requests back to OpenSearch
  3. Wait until I get replies to all the requests
  4. Finally respond to the init request

Debug output after handshakes

Received init request (id=16)

INFO:root:< prefix=b'ES', version=2.12.0.99, type=['request'], message=482 byte(s), id=16, ctx=req={'_system_index_access_allowed': 'false'}, res={}, None, features=[], action=internal:discovery/extensions

Sent register rest actions request (id=7)

INFO:root:> prefix=b'ES', version=2.12.0.99, type=['request'], message=172 byte(s), id=7, ctx=req={'_system_index_access_allowed': 'false'}, res={}, node=, id=None, features=[], action=internal:discovery/registerrestactions, size=178 byte(s)

Got ack from OpenSearch (id=7)

INFO:root:b'ES\x00\x00\x007\x00\x00\x00\x00\x00\x00\x00\x07\x01\x08 Y\xa3\x00\x00\x00%\x01\x1c_system_index_access_allowed\x05false\x00\x01'
INFO:root:< response prefix=b'ES', version=2.12.0.99, type=['response'], message=55 byte(s), id=7, ctx=req={'_system_index_access_allowed': 'false'}, res={}, None, features=[]

Sent settings request (id=6)
(ID is lower because it was generated first!)

INFO:root:> prefix=b'ES', version=2.12.0.99, type=['request'], message=98 byte(s), id=6, ctx=req={'_system_index_access_allowed': 'false'}, res={}, node=, id=None, features=[], action=internal:discovery/enviornmentsettings, size=104 byte(s)

Got settings response from OpenSearch (id=6)

INFO:root:b'ES\x00\x00\x04\xef\x00\x00\x00\x00\x00\x00\x00\x06\x01\x08 Y\xa3\x00\x00\x00%\x01\x1c_system_index_access_allowed\x05false\x00\x1a\x0bclient.type\x00\x04node%cluster.initial_cluster_manager_nodes\x07\x01\x00\trunTask-0\x0ccluster.name\x00\x07runTask5cluster.routing.allocation.disk.watermark.flood_stage\x00\x021b.cluster.routing.allocation.disk.watermark.high\x00\x021b-cluster.routing.allocation.disk.watermark.low\x00\x021b\x1fdiscovery.initial_state_timeout\x00\x020s\x14discovery.seed_hosts\x00\x0e127.0.0.1:9300\x18discovery.seed_providers\x00\x04file\thttp.port\x00\x049200\x11http.type.default\x00\x06netty4%indices.breaker.total.use_real_memory\x00\x05false+logger.org.opensearch.action.support.master\x00\x05DEBUG*logger.org.opensearch.cluster.coordination\x00\x05DEBUG)node.attr.shard_indexing_pressure_enabled\x00\x04true\x12node.attr.testattr\x00\x04test\tnode.name\x00\trunTask-0\x0enode.portsfile\x00\x04true\tpath.data\x07\x01\x00D/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/data\tpath.home\x00k/Users/danielwiddis/git/OpenSearch/distribution/archives/darwin-tar/build/install/opensearch-3.0.0-SNAPSHOT\tpath.logs\x00D/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/logs\tpath.repo\x07\x01\x00D/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/repo\x10path.shared_data\x00J/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/sharedData$script.disable_max_compilations_rate\x00\x04true\x0etransport.port\x00\x049300\x16transport.type.default\x00\x06netty4'
INFO:root:< response prefix=b'ES', version=2.12.0.99, type=['response'], message=1263 byte(s), id=6, ctx=req={'_system_index_access_allowed': 'false'}, res={}, None, features=[]

Debug output

INFO:root:>>>>> Reading 26 settings
INFO:root:>>>>> Setting 0: Reading key client.type
INFO:root:>>>>> Setting 0: Value is node
INFO:root:>>>>> Setting 1: Reading key cluster.initial_cluster_manager_nodes
INFO:root:>>>>> Setting 1: Value is ['runTask-0']
INFO:root:>>>>> Setting 2: Reading key cluster.name
INFO:root:>>>>> Setting 2: Value is runTask
INFO:root:>>>>> Setting 3: Reading key cluster.routing.allocation.disk.watermark.flood_stage
INFO:root:>>>>> Setting 3: Value is 1b
INFO:root:>>>>> Setting 4: Reading key cluster.routing.allocation.disk.watermark.high
INFO:root:>>>>> Setting 4: Value is 1b
INFO:root:>>>>> Setting 5: Reading key cluster.routing.allocation.disk.watermark.low
INFO:root:>>>>> Setting 5: Value is 1b
INFO:root:>>>>> Setting 6: Reading key discovery.initial_state_timeout
INFO:root:>>>>> Setting 6: Value is 0s
INFO:root:>>>>> Setting 7: Reading key discovery.seed_hosts
INFO:root:>>>>> Setting 7: Value is 127.0.0.1:9300
INFO:root:>>>>> Setting 8: Reading key discovery.seed_providers
INFO:root:>>>>> Setting 8: Value is file
INFO:root:>>>>> Setting 9: Reading key http.port
INFO:root:>>>>> Setting 9: Value is 9200
INFO:root:>>>>> Setting 10: Reading key http.type.default
INFO:root:>>>>> Setting 10: Value is netty4
INFO:root:>>>>> Setting 11: Reading key indices.breaker.total.use_real_memory
INFO:root:>>>>> Setting 11: Value is false
INFO:root:>>>>> Setting 12: Reading key logger.org.opensearch.action.support.master
INFO:root:>>>>> Setting 12: Value is DEBUG
INFO:root:>>>>> Setting 13: Reading key logger.org.opensearch.cluster.coordination
INFO:root:>>>>> Setting 13: Value is DEBUG
INFO:root:>>>>> Setting 14: Reading key node.attr.shard_indexing_pressure_enabled
INFO:root:>>>>> Setting 14: Value is true
INFO:root:>>>>> Setting 15: Reading key node.attr.testattr
INFO:root:>>>>> Setting 15: Value is test
INFO:root:>>>>> Setting 16: Reading key node.name
INFO:root:>>>>> Setting 16: Value is runTask-0
INFO:root:>>>>> Setting 17: Reading key node.portsfile
INFO:root:>>>>> Setting 17: Value is true
INFO:root:>>>>> Setting 18: Reading key path.data
INFO:root:>>>>> Setting 18: Value is ['/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/data']
INFO:root:>>>>> Setting 19: Reading key path.home
INFO:root:>>>>> Setting 19: Value is /Users/danielwiddis/git/OpenSearch/distribution/archives/darwin-tar/build/install/opensearch-3.0.0-SNAPSHOT
INFO:root:>>>>> Setting 20: Reading key path.logs
INFO:root:>>>>> Setting 20: Value is /Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/logs
INFO:root:>>>>> Setting 21: Reading key path.repo
INFO:root:>>>>> Setting 21: Value is ['/Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/repo']
INFO:root:>>>>> Setting 22: Reading key path.shared_data
INFO:root:>>>>> Setting 22: Value is /Users/danielwiddis/git/OpenSearch/build/testclusters/runTask-0/sharedData
INFO:root:>>>>> Setting 23: Reading key script.disable_max_compilations_rate
INFO:root:>>>>> Setting 23: Value is true
INFO:root:>>>>> Setting 24: Reading key transport.port
INFO:root:>>>>> Setting 24: Value is 9300
INFO:root:>>>>> Setting 25: Reading key transport.type.default
INFO:root:>>>>> Setting 25: Value is netty4

Final init response (id=16)

INFO:root:> prefix=b'ES', version=2.12.0.99, type=['response'], message=96 byte(s), id=16, ctx=req={'_system_index_access_allowed': 'false'}, res={}, name=hello-world-py, interfaces=['Extension', 'ActionExtension'], features=[], size=102 byte(s)

Issues Resolved

Part of #84
Fixes #88

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
Copy link

codecov bot commented Nov 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4e2af35) 100.00% compared to head (d213f8b) 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #87    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           55        61     +6     
  Lines         1488      1627   +139     
==========================================
+ Hits          1488      1627   +139     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
Signed-off-by: Daniel Widdis <[email protected]>
Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Some smallish stuff.

src/opensearch_sdk_py/transport/stream_input.py Outdated Show resolved Hide resolved
src/opensearch_sdk_py/settings/settings.py Outdated Show resolved Hide resolved
src/opensearch_sdk_py/settings/settings.py Show resolved Hide resolved
src/opensearch_sdk_py/settings/settings.py Outdated Show resolved Hide resolved
src/opensearch_sdk_py/settings/settings.py Outdated Show resolved Hide resolved
@dbwiddis
Copy link
Member Author

Why are we returning str(None) anyway? What's that? Why isn't it None?

Well I saw a toString(get()) and assumed that would always be a String but in fact it does return null... the reality is I was fighting against mypy just to get something to pass for a precommit check to pass and quit when it worked.

Call this read_from?
Call this write_to?

As mentioned in item 2 in my opening comment. My first goal was to faithfully port the Java to get it working and optimize later.

camelCase snakedIn :)

I think you meant camelCase snaked_in ;)

Although the number of times I got an AttributeError for using snake case on the protobuf attributes may have contributed to that...

Are settings typed? If so this interface should only have get and return the right type of value, the caller shouldn't know about what to ask for.

Again the quick answer to "shouldn't know what to ask for" is "I'm porting the Java class first and I'll fine tune later." But this opens a bigger conversation.

For "Are Settings typed", this is something I'm still struggling with and haven't quite figured out how to deal with. Quick answer to the question:

  • There are two classes involved here, Settings which is an elaborate holder for a Map<String, Object> containing the current values, were Object has a type (more easily obtainable), and a Setting<T> representing the configuration with min/max, default values, validators, etc, but the type of <T> is not accessible at runtime due to type erasure.
  • Getting a value (return type T) from a Setting<T> involves passing the Settings object to its getter.

Longer answer:

The problem with extensions in general and Java in particular is type erasure. The only way I could figure out the type T at runtime was by looking at the type of the default, represented in an enum, which is my intention to do for the python Setting class when I get to it. A slight limitation but workable.

On the OpenSearch side for this Settings class, it's a lot more flexible. Just about any class will do as long as you build a Writer for it in a registry.

But in both cases we're looking at the type of an object to determine what to do with it. And in Python we have an int that represents all sorts of different integer types (unless we decide we want to use numpy). I haven't even looked at floating point types and float vs. double.

I'll keep hacking away at this, but this whole system isn't trivial.

@dblock
Copy link
Member

dblock commented Nov 21, 2023

Thanks for the detailed explanation. At a high level I would 1) keep the interfaces as simple as possible (e.g. get over get_int and get_string), 2) not implement the parts that aren't needed by consumers yet (e.g. hierarchy of types that woudn't be used).

@dbwiddis dbwiddis marked this pull request as ready for review November 22, 2023 02:30
@dbwiddis dbwiddis changed the title [WIP] Get Environment Settings from OpenSearch Get Environment Settings from OpenSearch Nov 22, 2023
@dbwiddis
Copy link
Member Author

image

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All suggestions below are should/nice to have refactoring that can be done in future PRs, merging.



class EnvironmentSettingsResponseHandler(ResponseHandler):
def __init__(self, next_handler: RequestResponseHandler) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably sink next_handler into either the response handler (or a new class), or manage the sequence of handlers outside of the handler itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably sink next_handler into either the response handler (or a new class), or manage the sequence of handlers outside of the handler itself.

100% agree, primarily with the "outside of the handler itself" part. I'm convinced there's got to be a better, more general way to "wait for this set of responses to complete".

Related, RequestHandler and ResponseHandler have differing arguments for send() but they ought to be the same.

@@ -18,14 +18,15 @@


class RegisterRestActionsResponseHandler(ResponseHandler):
def __init__(self, next_handler: RequestResponseHandler) -> None:
def __init__(self, next_handler: RequestResponseHandler, request: OutboundMessageRequest) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce OutboundResponseHandler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce OutboundResponseHandler?

Not sure what you mean here. Then again, this whole set of code is somewhat spaghetti so I'm not sure what I mean either. I cobbled enough together to get it working and will try to figure out a better way to do all this.

settings: dict[str, Union[str, Dict]] = {},
# TODO: Secure Settings
) -> None:
self.settings = settings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When initialized with a Dict I think you don't get a copy, does it behave like so?

initial = { 'x': 'y' }
settings = Settings(initial)
settings.read_from(...)
print(initial['Z']) # yields results

If it does, we should copy the dictionary or remove initializing with an external argument altogether.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial static method implementation didn't have that problem. :)

We could always just empty the map on read_from.

from opensearch_sdk_py.transport.stream_output import StreamOutput


class Settings:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would inherit from Dict.

I considered that but

  • there are actually two internal map-like structures. There's a secure_settings (not yet implemented) which acts similarly to a string-to-string map (including a get() method) and it's reasonable to keep them on the same level for consistency
  • we use a get() method which would conflict with the Dict get() method (or at least return unexpected results)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first one I wonder whether "secure setting" should be expressed as "a kind of setting".

In what way are we reimplementing get? I would expect that populating settings changes, but retrieval should be a direct lookup, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first one I wonder whether "secure setting" should be expressed as "a kind of setting".

It's actually an interface with multiple implementations. Here's one: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/settings/KeyStoreWrapper.java

I'd not want to stray too far from the existing implementation.

In what way are we reimplementing get?

Our version returns a String (str) not an object. So an integer or boolean returns its string representation; an array/list object returns a string with square brackets; a map returns a key-value map with curly braces (basically JSON), a time value will include time units, byte sizes will include byte units, etc.

self.assertIsNone(settings.get("bar"))
self.assertEqual(settings.get("bar", "baz"), "baz")
self.assertEqual(settings.get("foo"), "bar")
self.assertEqual(settings.get("foo", "baz"), "bar")
self.assertEqual(settings.get("baz"), "42")
self.assertEqual(settings.get("qux"), "True")
self.assertEqual(settings.get("bytes"), "b'test'")
self.assertEqual(settings.get("list"), "['a', 'b', 'c']")

These Strings will be parsed by our future implementation of the Setting<T> class, each type having its own parser for the str.

@dblock dblock merged commit ef9a6b0 into opensearch-project:main Nov 22, 2023
7 checks passed
@dbwiddis dbwiddis deleted the settings branch November 22, 2023 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] StreamInput integer types are unsigned
2 participants