Added capability to customize read and connection timeout for bedrock models #5529
base: 0.2
Conversation
@akshay20t please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contributor License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
To quickly fix the formatting error, run
Codecov Report. Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              0.2    #5529      +/-   ##
==========================================
+ Coverage   29.26%   29.50%   +0.23%
==========================================
  Files         117      117
  Lines       13038    13040       +2
  Branches     2473     2473
==========================================
+ Hits         3816     3847      +31
+ Misses       8876     8846      -30
- Partials      346      347       +1
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Thanks @ekzhu! Done with reformatting per the project's pre-commit configuration.
Please agree to the contributor license agreement.
Why are these changes needed?
With the existing AutoGen implementation of the Bedrock client class `BedrockClient`, Amazon Bedrock inference requests time out when the input token size is large (>~10,000 tokens). This happens because the current implementation uses the default read and connection timeout values (60 seconds) from `boto3` when initializing the `Config` class. Because of these defaults, the `create` method of the `BedrockClient` class raises a `RuntimeError` even though the Bedrock model is still generating a response (as monitored in AWS CloudWatch).

With the ever-growing context window sizes of today's LLMs, it is important to support user-defined read timeouts so that agents can handle large input token counts.
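For context, here is a minimal sketch of the underlying mechanism: botocore's `Config` accepts `read_timeout` and `connect_timeout` keyword arguments, and passing a `Config` to the `bedrock-runtime` client overrides the 60-second defaults. How this PR surfaces these options through AutoGen's configuration is not shown in the excerpt, so the snippet below only illustrates the botocore-level call; the timeout values are placeholders.

```python
import boto3
from botocore.config import Config

# Raise the defaults (60 s each) so that long prompts (>~10,000 tokens)
# can finish generating before the HTTP client gives up.
# The values here are illustrative, not taken from the PR.
bedrock_config = Config(
    read_timeout=600,     # seconds to wait for the model's response
    connect_timeout=60,   # seconds to wait when establishing the connection
    retries={"max_attempts": 3},
)

# The Config is passed to the Bedrock runtime client at construction time.
client = boto3.client(
    "bedrock-runtime",
    region_name="us-east-1",
    config=bedrock_config,
)
```

With such a client, a long-running `invoke_model` call is bounded by the user-supplied read timeout rather than boto3's default, which is the failure mode described above.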
Related issue number
Closes #5505
Checks