feat(langchain): add support for langchain==0.1 (#8563)
Fixes #8212.

### For Reviewers

**NOTE: Please disregard the LOC count in this PR; 90% of the changed lines are cassette test data, moved files, snapshot files, or requirements lockfiles.**

The important files to review are:
- `ddtrace/contrib/langchain/patch.py`: version gating of which methods to patch
- `releasenotes/notes/feat-support-langchain-0-1-0d8e0ddd6248c4ed.yaml`: the release note
- `tests/contrib/langchain/test_langchain_community.py`: tests for langchain >= 0.1
- `tests/contrib/langchain/test_langchain_patch.py`: tests for which methods are traced

## Change Summary

This PR adds support for `langchain>=0.1`, which has deprecated multiple traced methods from `langchain<=0.0.354` for removal in the upcoming `langchain==0.2.0` release.

Whereas the older version of langchain was contained entirely in one library, `langchain`, the new version has split its library into separate subpackages:
- `langchain_core`: Core base classes for LLMs and chat models.
- `langchain_community`: Community subpackage that contains integrations for different libraries (e.g. `cohere`, `anthropic`).
- `langchain_<partner_name>`: Individual subpackages for each partner integration, such as `langchain_openai` and `langchain_pinecone`. It appears that the ultimate goal is for all of the code in `langchain_community` to be extracted into these partner subpackages.
- `langchain`: The old library, which still contains many base methods such as Chains.

TL;DR: this PR adds version gating to the langchain integration so that it patches the correct methods/import paths with the correct traced functions. No functionality has been added.

### Risks

A current design flaw of the langchain integration is that it keeps a static list of embeddings/vectorstore class names and uses it to patch `langchain_community.embeddings/vectorstores.<CLASS_NAME>`.
However, with langchain's apparent push to move integrations into their individual partner subpackages, this will break our patching for embeddings/vectorstores, since we will then have to patch each subpackage individually. Currently only pinecone/openai (which our integration specifically patches) and elasticsearch/vertexAI (not yet patched) have their own subpackages, so the risk is low since there will be no breaking changes (and [Langchain promises no more breaking changes on minor releases](https://blog.langchain.dev/week-of-1-22-24-langchain-release-notes/#:~:text=No%20more%20breaking%20changes%20on%20a%20minor%20version%20release.)).

Our proposed approach to resolving this issue is to offer limited tracing support for embeddings/vectorstores with individual subpackages (openai/pinecone for now) at first, then gradually expand our support as the others become available. We will likely need to redesign/refactor the integration in that case.

## Testing

This PR adds a separate venv for `langchain-community/langchain-core/langchain-openai/langchain-pinecone`, and only tests with Python 3.10 and above. The reason is that Python 3.9 and below require a separate set of test cassette files, which would add an unnecessary number of files/tests.
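For readers unfamiliar with the version gating described in the Change Summary, here is a minimal sketch of the idea. It is not the code in `ddtrace/contrib/langchain/patch.py`: the helper names `parse_version` and `patch_targets`, the shortened class-name lists, and the assumption that older releases expose the same classes under `langchain.embeddings` / `langchain.vectorstores` are all illustrative.

```python
from typing import List, Tuple

# Hypothetical, shortened static lists; the real integration keeps its own.
EMBEDDING_CLASSES = ["OpenAIEmbeddings", "CohereEmbeddings"]
VECTORSTORE_CLASSES = ["Pinecone", "FAISS"]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '0.0.354' into a comparable tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def patch_targets(langchain_version: str) -> List[str]:
    """Return the dotted paths that would be patched for this version."""
    # Assumption: langchain >= 0.1 exposes these classes from
    # langchain_community, while older releases expose them from langchain.
    if parse_version(langchain_version) >= (0, 1):
        base = "langchain_community"
    else:
        base = "langchain"
    targets = [f"{base}.embeddings.{name}" for name in EMBEDDING_CLASSES]
    targets += [f"{base}.vectorstores.{name}" for name in VECTORSTORE_CLASSES]
    return targets


if __name__ == "__main__":
    for version in ("0.0.354", "0.1.0"):
        print(version, patch_targets(version))
```

The sketch also makes the risk above visible: because the module prefix is chosen once for the whole static list, a class that moves into its own partner subpackage (for example `langchain_openai`) would no longer be found under that prefix and would need its own entry.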
## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included in the PR
- [x] Risks are described (performance impact, potential for breakage, maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified `@DataDog/apm-tees`.
- [x] If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from `@DataDog/security-design-and-guidance`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: Yun Kim <[email protected]>