WIP: feat: Configurable timeout and retry for custom endpoints #5568

Draft · wants to merge 1 commit into main

Conversation

jameslamine (Contributor)

Summary

This PR adds two new configuration options for custom endpoints:

  • timeout: Sets request timeout in milliseconds (default: 10 minutes)
  • maxRetries: Sets maximum retry attempts for failed requests (default: 2)

These settings map directly to the OpenAI SDK's client options; they keep requests from hanging indefinitely and improve resilience to transient failures through automatic retries.

Fixes #5567

Example Usage:

```yaml
endpoints:
  custom:
    - name: "Example"
      timeout: 5000      # 5 second timeout
      maxRetries: 3      # Retry failed requests up to 3 times
```
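
For reference, the two options correspond one-to-one to the OpenAI Node SDK's client options. A minimal sketch of the pass-through, where `endpointConfig`, the env var, and the base URL are illustrative stand-ins rather than LibreChat's actual wiring:

```typescript
import OpenAI from 'openai';

// Illustrative stand-in for the parsed librechat.yaml entry above.
const endpointConfig = { timeout: 5000, maxRetries: 3 };

const client = new OpenAI({
  apiKey: process.env.CUSTOM_API_KEY, // hypothetical env var
  baseURL: 'https://example.com/v1',
  // Per-request timeout in milliseconds (SDK default: 10 minutes).
  timeout: endpointConfig.timeout,
  // Automatic retries on connection errors and retryable responses
  // (SDK default: 2).
  maxRetries: endpointConfig.maxRetries,
});
```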

Change Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Testing

  1. Configure a custom endpoint with timeout and maxRetries:

```yaml
endpoints:
  custom:
    - name: "Test Endpoint"
      timeout: 5000
      maxRetries: 3
```

  2. Test scenarios (a sketch of a hanging test endpoint follows this list):
  • Normal API calls work as expected
  • Set a low timeout (1 ms) and confirmed the request fails with a timeout error, as expected
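
To reproduce the timeout case deterministically, a throwaway server that accepts requests but never responds works well; a minimal sketch (the port is arbitrary, point the endpoint's baseURL at it):

```typescript
import { createServer } from 'node:http';

// Accepts the request and then stays silent, so any client-side
// timeout pointed at http://localhost:8080 will fire.
const server = createServer((_req, _res) => {
  // Intentionally never respond.
});

server.listen(8080, () => {
  console.log('Hanging test endpoint on http://localhost:8080');
});
```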

Checklist

  • I have performed a self-review of my own code
  • My changes do not introduce new warnings
  • Local unit tests pass with my changes

@jameslamine jameslamine changed the title feat: Configurable timeout and retry for custom endpoints WIP: feat: Configurable timeout and retry for custom endpoints Jan 31, 2025
jameslamine (Contributor Author)

After reading the OpenAI SDK, it looks like this timeout is applied to the entire streaming request. Ideally this would be a timeout between chunks instead.

I'm going to re-think this. It might be better to set the underlying socket timeout rather than the OpenAI client's higher-level timeout and retry settings.
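
Roughly what I mean by a socket-level timeout, sketched against Node's built-in client rather than LibreChat's actual code path:

```typescript
import { request } from 'node:https';

// The `timeout` option arms a socket inactivity timer: it fires when
// the connection goes idle (including a hanging connect/handshake),
// not after a fixed deadline for the whole streamed response.
const req = request('https://example.com/v1/chat/completions', {
  method: 'POST',
  timeout: 5000,
});

req.on('timeout', () => {
  // Node only emits the event; the request must be destroyed manually.
  req.destroy(new Error('socket idle for 5s, aborting'));
});

req.on('error', (err) => console.error(err.message));
req.end();
```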

@danny-avila danny-avila marked this pull request as draft January 31, 2025 16:22
danny-avila (Owner) commented Jan 31, 2025

> timeout between chunks instead

This can be simulated, at least on the receiving end, with streamRate.

jameslamine (Contributor Author) commented Jan 31, 2025

> timeout between chunks instead
>
> This can be simulated at least on the receiving end with streamRate

Thanks! Can you provide more details on what that setting does? The issue I'm trying to solve is that we want to abort TCP connections that are hanging or failing to establish. From a user-experience perspective, if the connection to the LLM endpoint hangs, or the initial TCP handshake stalls, the request should fail fast and show an error so the user can retry.

It seems like streamRate controls how quickly tokens are streamed from the LibreChat server to the frontend browser; I don't think it would help with hanging TCP connections. Is my understanding correct?

danny-avila (Owner)

Got it, I misunderstood what you meant. AFAIK there is no timeout-between-chunks configuration; something like this would have to be custom-built.
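
A custom-built version could reset a watchdog timer on every chunk and abort when the gap grows too long. A sketch, where `withChunkTimeout` is a hypothetical helper rather than anything in LibreChat or the OpenAI SDK:

```typescript
// Hypothetical helper: re-yields chunks from `stream`, throwing if the
// gap between consecutive chunks exceeds `idleMs`.
async function* withChunkTimeout<T>(
  stream: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const iterator = stream[Symbol.asyncIterator]();
  while (true) {
    let timer!: NodeJS.Timeout;
    const watchdog = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`no chunk received for ${idleMs}ms`)),
        idleMs,
      );
    });
    try {
      const result = await Promise.race([iterator.next(), watchdog]);
      if (result.done) return;
      yield result.value;
    } catch (err) {
      await iterator.return?.(); // close the underlying stream
      throw err;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

A caller would then wrap the SDK's stream, e.g. `for await (const chunk of withChunkTimeout(stream, 10_000)) { ... }`, and treat the thrown error like any other aborted request.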
