
feat: Adding RestrictedFeatures Support to the Python Frontend Bindings #7775

Open: wants to merge 3,533 commits into main
Conversation

@KrishnanPrash (Contributor) commented Nov 8, 2024

What does the PR do?

This PR adds Restricted Features support to the Python frontend bindings, a.k.a. tritonfrontend.
It introduces three new classes to the tritonfrontend package:

  1. Feature: a 1-to-1 mapping of the RestrictedCategory enum.
  2. FeatureGroup: a Pydantic @dataclass that stores (key, value, features) information while performing type validation (see the sketch after this list).
  3. RestrictedFeatures: stores a collection of FeatureGroup instances, applies an additional layer of validation that checks for collisions (ensuring each feature belongs to only one group), and serializes the data into a JSON string.
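
For orientation, here is a minimal sketch of the shapes these classes take, inferred from the usage examples below; the actual definitions live in api/_restricted_features.py and may differ:

from enum import Enum
from typing import List

from pydantic.dataclasses import dataclass


class Feature(Enum):
    # 1-to-1 mapping of the server-side RestrictedCategory enum (subset shown)
    HEALTH = "health"
    METADATA = "metadata"
    INFERENCE = "inference"
    MODEL_REPOSITORY = "model-repository"
    STATISTICS = "statistics"


@dataclass
class FeatureGroup:
    # Pydantic validates the field types on construction, so e.g. key=42
    # raises a validation error instead of silently passing through.
    key: str
    value: str
    features: List[Feature]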

With these changes, a RestrictedFeatures instance can be passed to the KServeHttp and KServeGrpc frontends, along the lines of:

>>> import tritonserver
>>> from tritonfrontend import KServeHttp, KServeGrpc
>>> from tritonfrontend import RestrictedFeatures
>>> server = tritonserver.Server(...).start(wait_until_ready=True)

>>> rf = RestrictedFeatures(...)

>>> http_options = KServeHttp.Options(restricted_features=rf)
>>> http_service = KServeHttp(server, http_options)
>>> http_service.start()

>>> grpc_options = KServeGrpc.Options(restricted_features=rf)
>>> grpc_service = KServeGrpc(server, grpc_options)
>>> grpc_service.start()

A RestrictedFeatures instance can be created with:

>>> from tritonfrontend import Feature, FeatureGroup, RestrictedFeatures
>>> admin_group = FeatureGroup(key="admin-key", value="admin-value", features=[Feature.HEALTH, Feature.METADATA])
>>> infer_group = FeatureGroup("infer-key", "infer-value", [Feature.INFERENCE])

>>> rf = RestrictedFeatures([admin_group, infer_group])

>>> model_group = FeatureGroup("model-key", "model-value", [Feature.MODEL_REPOSITORY])
>>> rf.add_feature_group(model_group)

>>> rf.create_feature_group("stat-key", "stat-value", [Feature.STATISTICS])
>>> rf
[  
   {"key": "admin-key", "value": "admin-value", "features": ["health"]},
   {"key": "admin-key", "value": "admin-value", "features": ["metadata"]},  
   {"key": "infer-key", "value": "infer-value", "features": ["inference"]}, 
   {"key": "model-key", "value": "model-value", "features": ["model-repository"]}, 
   {"key": "stat-key", "value": "stat-value", "features": ["statistics"]}
]
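
Because RestrictedFeatures checks for collisions, adding an already-restricted feature to a second group should be rejected. An illustrative (not actual) interaction follows; the exact exception type and message are assumptions:

>>> dup_group = FeatureGroup("other-key", "other-value", [Feature.HEALTH])
>>> rf.add_feature_group(dup_group)
Traceback (most recent call last):
  ...
Exception: Feature "health" is already restricted by group "admin-key"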

Where should the reviewer start?

Take a look at api/_restricted_features.py and tritonfrontend.h

Test plan:

In L0_python_api/test_kserve.py, added:

  • test_correct_rf_parameters() and test_wrong_rf_parameters(): tests for valid/invalid arguments passed to Feature, FeatureGroup, and RestrictedFeatures.
  • test_restricted_features(): tests that restricted features work correctly with the frontends by restricting inference, then sending inference requests with a correct and an incorrect header to verify proper behavior (see the sketch after this list).
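
To make the header check concrete, here is a hedged sketch of what test_restricted_features() verifies, using tritonclient.http directly (model name, port, and key/value pairs are illustrative, not the actual test code):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a trivial request for an identity model (name is illustrative).
data = np.array([[1.0]], dtype=np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

# Correct restricted-feature header: the inference request should succeed.
client.infer("identity", [inp], headers={"infer-key": "infer-value"})

# Missing (or wrong) header: the frontend should reject the request.
try:
    client.infer("identity", [inp])
except Exception as err:
    print(err)  # expect an access-restriction error from the server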

CI Pipeline ID: 20189649

Background

For more information about restricted features support in Triton, please take a look at this: [Link]

Additional Changes Made in this PR

  • Cleaned up includes in tritonfrontend.h and tritonfrontend_pybind.cc.
  • Added support in @handle_triton_error for non-void functions (see the sketch after this list).
  • Added support for passing headers in send_and_test_inference_identity() so that the utility function can be re-used for restricted-features testing.
  • Skipping tests that use KServeGrpc and tritonclient.grpc because of the lack of fork() support in Python's cygrpc. More information in DLIS-7215.
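
A hedged sketch of the decorator change for non-void functions; the error type and exception mapping below are placeholders, not the bindings' actual code:

import functools

class TritonError(Exception):
    """Placeholder for the bindings' error type (illustrative only)."""

def handle_triton_error(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            # Previously the wrapper assumed void functions; returning the
            # wrapped call's result is what adds non-void support.
            return func(*args, **kwargs)
        except TritonError as err:
            raise RuntimeError(str(err)) from None  # illustrative mapping
    return wrapper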

kthui and others added 30 commits April 8, 2024 18:33
Comment on lines 54 to 57
# TODO: [DLIS-7215] Run tritonclient.grpc as separate process
# Currently, when we run tritonclient.grpc with tritonserver in the same process,
# When a fork() is called by tritonserver on model load, lack of support in cygrpc
# causes the python process to abort/crash without being able to be caught by pytest.
Contributor:

Can you elaborate or share references? I don't understand how the client is related to server/model fork, and what "lack of support in cygrpc" means

Contributor Author:

Added a comment providing more context on how cygrpc relates to directly imported Python packages. Added a link for further reading and reference.

qa/L0_python_api/test.sh (outdated review thread; resolved)
# it will non-deterministically abort/crash without being able to be caught by pytest.
# This is because fork() is called by tritonserver on model load,
# which attempts to fork the imported libraries and their internal states,
# and cygrpc (dependency of tritonclient.grpc) does not officially support fork().
@rmccorm4 (Contributor) commented Dec 20, 2024:

"However, if the application only instantiates gRPC Python objects after calling fork(), then fork() will work normally, since there is no C extension binding at this point."

If the claim is that we're calling fork() during model load, aren't we instantiating grpc (client) after the fork() if the server/service have already started up?

Can you ever reproduce this by only running a single test case repeatedly? If not, it's possible it's coming from the transition between test cases.

Contributor Author:

From my understanding, it is an internal state set upon import:

import tritonclient.grpc as grpcclient  # the import alone initializes cygrpc's internal state

from time import sleep

import utils  # test helper module providing startup_server()/teardown_server()

while True:
    server = utils.startup_server()  # tritonserver fork()s on model load
    sleep(1)
    utils.teardown_server(server)

# This is because fork() is called by tritonserver on model load,
# which attempts to fork the imported libraries and their internal states,
# and cygrpc (dependency of tritonclient.grpc) does not officially support fork().
# Reference: https://github.com/grpc/grpc/blob/master/doc/fork_support.md
Contributor:

Couldn't you just set the env var mentioned here so that the tests work with fork? https://github.com/grpc/grpc/blob/master/doc/fork_support.md#current-status
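
For reference, the opt-in described there looks roughly like this (a hedged sketch: the variables come from the linked doc and must be set before grpc/cygrpc is first imported; as noted in the reply below, this did not resolve the failures here):

import os

# Enable gRPC's fork support before cygrpc is initialized.
os.environ["GRPC_ENABLE_FORK_SUPPORT"] = "1"
os.environ["GRPC_POLL_STRATEGY"] = "poll"  # required on older gRPC releases

import tritonclient.grpc as grpcclient  # must come after the env vars are set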

Contributor Author:

I attempted to set it and ran the test suite locally, but the test cases were still failing.

with pytest.raises(Exception):
    rf.create_feature_group(
        key=42, value="Secret to the Universe", features=[Feature.HEALTH]
    )
Contributor:

Check whether the exception encloses an informative message.

@pvijayakrish force-pushed the kprashanth-tritonfrontend-rfeatures branch from d2f0f9b to 793d826 on January 15, 2025 17:13