You need a supported version of Python (>= 3.10.12, < 3.12.0):
pyenv install 3.11.8
and make sure Poetry is using it, usually:
poetry env use ~/.pyenv/versions/3.11.8/bin/python
poetry install
poetry run python run.py
We maintain three kinds of tests: HTTP-level tests (with both sync and async clients), OpenAI SDK tests (for components that don't require DB authentication), and astra-assistants client library tests for end-to-end functionality.
poetry run pytest --disable-warnings
The client library itself also has its own test suite.
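For illustration, a minimal OpenAI-SDK-style test might look like the sketch below. It assumes the server is already running locally on port 8000 and that provider credentials come from environment variables; the model name and variable name are placeholders, not prescriptions.

```python
# Minimal sketch of an OpenAI-SDK test against a locally running server.
# Assumes `poetry run python run.py` is serving on http://127.0.0.1:8000.
import os

import pytest
from openai import OpenAI


@pytest.fixture
def client():
    return OpenAI(
        base_url="http://127.0.0.1:8000/v1",
        api_key=os.environ["OPENAI_API_KEY"],  # forwarded to the LLM provider
    )


def test_chat_completion(client):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use any model your provider supports
        messages=[{"role": "user", "content": "ping"}],
    )
    assert resp.choices[0].message.content
```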
The assistant-api-server repo is a Python server app that heavily depends on FastAPI, Pydantic, and the DataStax Python driver. It relies on LiteLLM for third-party LLM support, and we've been very happy with the LiteLLM team's responsiveness on GitHub as well as their ability to quickly add new models as the AI landscape evolves.
The app is mostly stateless (with the exception of a DB connection cache), and all authentication tokens and LLM provider configuration are passed as HTTP headers. The astra-assistants Python library makes it easy for users to store these configurations as environment variables and takes care of the rest. We serve the app in production using uvicorn and scale it in Kubernetes with HPA.
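In practice that pattern looks roughly like the sketch below, using the client's patch() helper; the environment variable names and model are illustrative and depend on your database and LLM provider.

```python
# Sketch of the intended usage: credentials live in environment variables
# and the patched client attaches them as HTTP headers on every request.
# Variable names and the model are illustrative for this example.
import os

from astra_assistants import patch
from openai import OpenAI

os.environ.setdefault("ASTRA_DB_APPLICATION_TOKEN", "<your-astra-token>")
os.environ.setdefault("OPENAI_API_KEY", "<your-llm-provider-key>")

# patch() points the OpenAI client at the assistants server and injects
# the configured credentials as headers.
client = patch(OpenAI())

assistant = client.beta.assistants.create(
    name="example-assistant",
    model="gpt-4o-mini",
    instructions="You are a helpful assistant.",
)
print(assistant.id)
```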
The app consists of both generated and hand-written code. The generated code is based on OpenAI's OpenAPI spec, generated with openapi-generator-cli from openapi-generator.tech, and mostly lives in the openapi_server directory. Leveraging the OpenAPI spec was one of the first design decisions we made, and it was a no-brainer: OpenAI's spec is of very high quality (they use it to generate their SDKs), and using it ensures that the types for all the endpoints are built correctly and enforced by Pydantic.
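To give a feel for what the generated code provides, here is a hypothetical, heavily simplified model; the real models are produced directly by openapi-generator-cli from OpenAI's spec and are considerably more complete.

```python
# Hypothetical, simplified illustration of a generated request model.
from typing import Optional

from pydantic import BaseModel, Field


class CreateAssistantRequest(BaseModel):
    model: str = Field(..., description="ID of the model to use.")
    name: Optional[str] = Field(None, max_length=256)
    instructions: Optional[str] = Field(None, max_length=256000)


# Pydantic rejects payloads that don't match the spec before handlers run.
CreateAssistantRequest(model="gpt-4o-mini", name="example")
```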
We keep track of which version of the OpenAI OpenAPI spec we're working with in OPEN_API_SPEC_HASH. The hand-written code takes the method stubs from open-api-server/apis and implements them, using the types from openapi-server/models and openapi-server_v1/models, inside of impl/routes and impl/routes_v2. Third-party LLM support is abstracted in impl/services/inference_utils.py, and the database interactions occur in impl/astra_vector.py.
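As an illustration of that abstraction (not the actual code in impl/services/inference_utils.py), a thin wrapper over LiteLLM can expose one call signature for many providers, with the provider selected by the model string:

```python
# Illustrative sketch of a LiteLLM-backed helper; names are hypothetical.
import litellm


async def get_chat_completion(model, messages, api_key=None, **kwargs):
    # LiteLLM normalizes provider-specific APIs behind a single completion
    # call; model strings like "gpt-4o-mini" or "claude-3-5-sonnet-20241022"
    # determine which provider is used.
    return await litellm.acompletion(
        model=model,
        messages=messages,
        api_key=api_key,
        **kwargs,
    )
```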
We collect throughput, duration, and payload size metrics and export them with a Prometheus exporter exposed at a /metrics endpoint. The exporter is configured to use Prometheus's multi-process collector to support our multi-process uvicorn production deployment.
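A minimal version of that wiring looks like the sketch below; it assumes PROMETHEUS_MULTIPROC_DIR is set so every uvicorn worker writes its metrics to a shared directory that the collector aggregates at scrape time.

```python
# Sketch of a multi-process-safe /metrics endpoint with prometheus_client.
# Requires the PROMETHEUS_MULTIPROC_DIR environment variable to be set.
from fastapi import FastAPI, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    CollectorRegistry,
    generate_latest,
    multiprocess,
)

app = FastAPI()


@app.get("/metrics")
def metrics() -> Response:
    # Aggregate samples written by every worker process, not just this one.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return Response(generate_latest(registry), media_type=CONTENT_TYPE_LATEST)
```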
Finally, in the tests directory we have implemented tests and CI using both an HTTP client directly (originally generated by openapi-generator.tech and tweaked manually) and custom tests that use both the OpenAI SDK and our astra-assistants library directly.
In impl/main.py we disambiguate between v1 and v2 OpenAI headers and route accordingly.
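The routing decision is based on the OpenAI-Beta header that the official SDKs send ("assistants=v1" or "assistants=v2"); a simplified sketch of that check, not the actual impl/main.py logic, looks like this:

```python
# Simplified sketch of version disambiguation via the OpenAI-Beta header.
from fastapi import Depends, FastAPI, Header

app = FastAPI()


def assistants_version(
    openai_beta: str | None = Header(None, alias="OpenAI-Beta"),
) -> str:
    # Defaulting to v2 when the header is missing is an assumption here.
    if openai_beta and "assistants=v1" in openai_beta:
        return "v1"
    return "v2"


@app.get("/v1/assistants")
async def list_assistants(version: str = Depends(assistants_version)):
    # Dispatch to the v1 or v2 implementation based on the parsed version.
    ...
```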
The client-side OpenAI SDK wrapper lives in the client directory and is implemented as a single-file Python script.