anthropic cache #423
base: dev
Conversation
❌ Changes requested. Reviewed everything up to cf87620 in 2 minutes and 42 seconds
More details:
- Looked at 395 lines of code in 6 files
- Skipped 0 files when reviewing
- Skipped posting 11 drafted comments based on config settings
1. `app-server/src/language_model/costs/mod.rs:50`
- Draft comment: In `estimate_output_cost`, the cache-hit branch returns `price.input_price_per_million` instead of `price.output_price_per_million`. This might be a logic error. (A sketch of the suspected fix follows this list.)
- Reason this comment was not posted: Comment was on unchanged code.
2. `app-server/src/traces/span_attributes.rs:129`
- Draft comment: When subtracting cache tokens from the regular input tokens for the anthropic provider, ensure the result does not go negative. Consider using saturating subtraction. (A clamping sketch follows this list.)
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. `app-server/src/traces/utils.rs:45`
- Draft comment: Using `input_tokens.total()` for the total number of input tokens assumes cache tokens are part of total usage. Verify this is the intended behavior and document it if so.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
4. `app-server/src/language_model/providers/utils.rs:22`
- Draft comment: The change of `calculate_cost` to use `i64` for token counts is consistent, but ensure that all callers correctly convert token counts to match this type.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
5. `app-server/src/language_model/costs/mod.rs:49`
- Draft comment: On cache hit, `estimate_output_cost` uses `price.input_price_per_million` instead of `price.output_price_per_million`. This likely mismatches the intended output cost calculation.
- Reason this comment was not posted: Comment was on unchanged code.
6. `app-server/src/language_model/costs/mod.rs:71`
- Draft comment: Caching is currently disabled in `estimate_input_cost` (`cache_res` set to `None`). If caching is intended here, consider re-enabling it or adding a proper TODO comment.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The code clearly shows caching being intentionally disabled: the cache lookup is commented out and `cache_res` is set to `None`. This seems like a deliberate choice rather than an oversight, since it required explicit code changes. The function still writes to the cache even though it doesn't read from it, suggesting this may be a temporary measure or part of a larger change; without more context about why caching was disabled, we can't assume it is a mistake. This could be wrong — commented-out code with no explanation is often a sign of work in progress, and the asymmetry between reading from and writing to the cache is suspicious. Still, the author clearly made this change deliberately and may have good reasons we can't see, so the comment was removed for questioning an apparently intentional change without strong evidence of an actual problem.
7. `app-server/src/traces/span_attributes.rs:141`
- Draft comment: When subtracting cache tokens from regular input tokens for 'anthropic', ensure the result does not become negative.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. `app-server/src/traces/span_attributes.rs:180`
- Draft comment: Comparing the provider name against a newly created String (`"anthropic".to_string()`) can be optimized; consider comparing case-insensitively without allocation. (An allocation-free sketch follows this list.)
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
9. `app-server/src/traces/span_attributes.rs:3`
- Draft comment: There appears to be a typo in the URL on line 3: 'openllmetry' might be intended to be 'opentelemetry'. Please verify and correct if needed.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. `app-server/src/traces/span_attributes.rs:34`
- Draft comment: The constant `GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS` is assigned the string literal `"gen_ai.usage.cache_creation_input_tokens"`. The use of 'cache_creation' in the string seems inconsistent with the 'WRITE' in the constant name. Please verify whether this mismatch is intentional or whether it should be aligned (e.g., 'cache_write_input_tokens').
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment is about a newly added constant, and the inconsistency between "WRITE" and "creation" is real, but it may not be worth fixing: the string value is likely what gets sent in telemetry, while the constant name is just an internal code reference, and changing either could break things. The naming inconsistency could also indicate a deeper semantic distinction between "write" and "creation", but without more context about the telemetry system and why these specific terms were chosen, we can't be confident the inconsistency is actually a problem. The comment also asks for verification ("Please verify if...") rather than making a strong case that something needs to change, which violates the review rules.
11. `app-server/src/traces/spans.rs:631`
- Draft comment: There's an extra trailing comma in the call to `format!` on line 631. While Rust allows trailing commas in macro invocations, consider removing it to keep the code clear and consistent.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
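For draft comments 1 and 5, a minimal sketch of the suspected fix. `Price` is a stand-in for the real `LLMPriceEntry` (whose full definition is not shown in this diff), and the function body is illustrative rather than the repository's actual code:

```rust
// Minimal stand-in for the price entry referenced in the comments;
// the real LLMPriceEntry has more fields than this.
struct Price {
    input_price_per_million: f64,
    output_price_per_million: f64,
}

// Sketch of the suspected fix: the output-cost path should use the
// output rate even when the result came from cache. The draft comments
// claim the cache-hit branch returned input_price_per_million instead.
fn estimate_output_cost(price: &Price, output_tokens: i64) -> f64 {
    output_tokens as f64 / 1_000_000.0 * price.output_price_per_million
}
```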
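For draft comments 2 and 7, a hedged sketch of the clamping idea; the values and variable names are placeholders, not the real code:

```rust
fn main() {
    // Illustrative values; the real code subtracts cache tokens from
    // the reported input-token total for the anthropic provider.
    let input_tokens: i64 = 100;
    let cache_tokens: i64 = 150;

    // Plain subtraction would go negative (-50); clamping at zero keeps
    // the regular-token count meaningful even if instrumentation reports
    // more cache tokens than total input tokens.
    let regular = (input_tokens - cache_tokens).max(0);
    assert_eq!(regular, 0);

    // With unsigned counts, u64::saturating_sub does the same job:
    assert_eq!(100u64.saturating_sub(150), 0);
}
```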
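For draft comment 8, a sketch of the allocation-free, case-insensitive comparison, assuming the provider name arrives as an optional string (as the diff later in this PR suggests):

```rust
// eq_ignore_ascii_case compares without allocating a lowercased copy,
// unlike comparing against "anthropic".to_string().
fn is_anthropic(provider_name: Option<&str>) -> bool {
    provider_name
        .map(|name| name.trim().eq_ignore_ascii_case("anthropic"))
        .unwrap_or(false)
}

fn main() {
    assert!(is_anthropic(Some("Anthropic")));
    assert!(!is_anthropic(None));
}
```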
Workflow ID: wflow_p7rVWhqWyWamoC94
```diff
-    _input_cached_price_per_million: value.input_cached_price_per_million,
-    _additional_prices: serde_json::from_value(value.additional_prices).unwrap(),
+    input_cached_price_per_million: value.input_cached_price_per_million,
+    additional_prices: serde_json::from_value(value.additional_prices).unwrap(),
```
Using `unwrap()` when deserializing `additional_prices` (line 34) could cause a panic on malformed data. Consider handling the error gracefully.
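One hedged way to address this, assuming the field deserializes into a map-like type (the actual target type is not shown in this diff) and that falling back to a default is acceptable here — propagating the error with `?` would be the stricter alternative:

```rust
use serde_json::Value;
use std::collections::HashMap;

// Stand-in for the real field type, which this diff does not show;
// HashMap<String, f64> is only an assumption for illustration.
type AdditionalPrices = HashMap<String, f64>;

// Sketch: deserialize defensively instead of calling unwrap().
fn parse_additional_prices(raw: Value) -> AdditionalPrices {
    serde_json::from_value(raw).unwrap_or_else(|e| {
        // Log and fall back to an empty map rather than panicking.
        eprintln!("malformed additional_prices, using default: {e}");
        AdditionalPrices::default()
    })
}
```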
Many nits, and one major comment: I'd need to check what we do for openai on the instrumentation side, and see whether we need to match against the anthropic provider name in this PR at all.

Nevermind, looks like anthropic is the only instrumentation where this is used. Then LGTM overall.
```rust
pub const GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS: &str =
    "gen_ai.usage.cache_creation_input_tokens";
pub const GEN_AI_ANTHROPIC_CACHE_READ_INPUT_TOKENS: &str = "gen_ai.usage.cache_read_input_tokens";
```
nit, non-blocking: I think we decided with nirga that this is not named after anthropic, because some other providers charge using a similar pricing model. So we could remove "anthropic" from the variable names.
```rust
    0
};

if self.provider_name() == Some("anthropic".to_string()) {
```
nit (again non-blocking): we may want to test this matching with opentelemetry-instrumentation-langchain and langchain-anthropic.
```rust
    cache_read_tokens: 0,
};

let regular_input_tokens =
```
I'd probably rename this variable to something like `total_input_tokens`, because regardless of whether there is a cache, `gen_ai.usage.input_tokens` (or `gen_ai.usage.prompt_tokens`) is meant to represent the total number of input tokens.
```rust
let mut input_tokens = InputTokens {
    regular_input_tokens: 0,
    cache_write_tokens: 0,
    cache_read_tokens: 0,
};
```
I'm probably being too picky, but I'd prefer this to be immutable, for readability and more idiomatic Rust. I.e., let's construct the struct at return (in each of the if/else branches).
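A sketch of the suggested shape, reusing the `InputTokens` fields from the snippet above; the branch condition and input values are placeholders for whatever the real function computes:

```rust
struct InputTokens {
    regular_input_tokens: i64,
    cache_write_tokens: i64,
    cache_read_tokens: i64,
}

// Sketch: construct the struct once per branch instead of mutating a
// zero-initialized binding. `is_anthropic`, `regular`, `writes`, and
// `reads` are hypothetical inputs, not names from the PR.
fn build_input_tokens(is_anthropic: bool, regular: i64, writes: i64, reads: i64) -> InputTokens {
    if is_anthropic {
        InputTokens {
            regular_input_tokens: regular,
            cache_write_tokens: writes,
            cache_read_tokens: reads,
        }
    } else {
        InputTokens {
            regular_input_tokens: regular,
            cache_write_tokens: 0,
            cache_read_tokens: 0,
        }
    }
}
```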
```rust
if let Some(name) = name {
    Some(name.to_lowercase().trim().to_string())
} else {
    None
}
```
super nit: can do this with `name.map(|name| name.to_lowercase()...)`
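Spelled out, and assuming `name` is an `Option<&str>`, the combinator version preserves the lowercase-then-trim behavior of the snippet above:

```rust
fn normalize(name: Option<&str>) -> Option<String> {
    // Equivalent to the if let Some(...) / else None pattern:
    // Option::map applies the closure only when a name is present.
    name.map(|name| name.to_lowercase().trim().to_string())
}
```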
Important

Refactor cost estimation and token handling to support Anthropic cache tokens, updating cost functions and span attributes accordingly.

- Refactor `estimate_input_cost` and `estimate_output_cost` in `mod.rs` to handle input and output token cost estimation separately.
- Add `input_cached_price_per_million` and `additional_prices` to `LLMPriceEntry`.
- Update `calculate_cost` in `utils.rs` to use `i64` for tokens.
- Replace `TokensKind` with the `InputTokens` struct in `mod.rs` and `runner.rs`. The `InputTokens` struct now includes `regular_input_tokens`, `cache_write_tokens`, and `cache_read_tokens`.
- Update `input_tokens()` in `spans.rs` to handle Anthropic cache tokens.
- Add cache token constants in `span_attributes.rs`.
- Update `get_llm_usage_for_span` in `utils.rs` to calculate costs using the new token structure.
- Remove the `TokensKind` enum and related logic from `mod.rs` and `runner.rs`.
- Update `spans.rs` to accommodate the new token handling logic.

This description was created by Ellipsis for cf87620. It will automatically update as commits are pushed.