anthropic cache #423
base: dev
Conversation
❌ Changes requested. Reviewed everything up to cf87620 in 2 minutes and 42 seconds
More details:
- Looked at 395 lines of code in 6 files
- Skipped 0 files when reviewing
- Skipped posting 11 drafted comments based on config settings
1. `app-server/src/language_model/costs/mod.rs:50`
- Draft comment: In `estimate_output_cost`, the cache-hit branch returns `price.input_price_per_million` instead of `price.output_price_per_million`. This might be a logic error. (A sketch of the suspected fix follows this list.)
- Reason this comment was not posted: Comment was on unchanged code.
2. `app-server/src/traces/span_attributes.rs:129`
- Draft comment: When subtracting cache tokens from the regular input tokens for the anthropic provider, ensure the result does not go negative. Consider using saturating subtraction. (A clamping sketch follows this list.)
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. `app-server/src/traces/utils.rs:45`
- Draft comment: Using `input_tokens.total()` for the total number of input tokens assumes cache tokens are part of total usage. Verify this is the intended behavior and document it if so.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
4. `app-server/src/language_model/providers/utils.rs:22`
- Draft comment: The change of `calculate_cost` to use `i64` for token counts is consistent, but ensure that all callers correctly convert token counts to match this type.
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
5. `app-server/src/language_model/costs/mod.rs:49`
- Draft comment: On cache hit, `estimate_output_cost` uses `price.input_price_per_million` instead of `price.output_price_per_million`. This likely mismatches the intended output cost calculation.
- Reason this comment was not posted: Comment was on unchanged code.
6. `app-server/src/language_model/costs/mod.rs:71`
- Draft comment: Caching is currently disabled in `estimate_input_cost` (`cache_res` set to `None`). If caching is intended here, consider re-enabling it or adding a proper TODO comment.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 20% vs. threshold = 50%). The code clearly shows caching being intentionally disabled: the cache lookup is commented out and `cache_res` is set to `None`. This seems like a deliberate choice rather than an oversight, since it required explicit code changes. The function still writes to the cache even though it doesn't read from it, suggesting this may be a temporary measure or part of a larger change; without more context about why caching was disabled, we can't assume it is a mistake. This could be wrong — commented-out code with no explanation is often a sign of work in progress, and the asymmetry between reading from and writing to the cache is suspicious. Still, the author clearly made this change deliberately and may have good reasons we can't see, so the comment was removed for questioning an apparently intentional change without strong evidence of an actual problem.
7. `app-server/src/traces/span_attributes.rs:141`
- Draft comment: When subtracting cache tokens from regular input tokens for 'anthropic', ensure the result does not become negative.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. `app-server/src/traces/span_attributes.rs:180`
- Draft comment: Comparing the provider name against a newly created String (`"anthropic".to_string()`) can be optimized; consider comparing case-insensitively without allocation. (An allocation-free sketch follows this list.)
- Reason this comment was not posted: Confidence changes required: 50% <= threshold 50%.
9. `app-server/src/traces/span_attributes.rs:3`
- Draft comment: There appears to be a typo in the URL on line 3: 'openllmetry' might be intended to be 'opentelemetry'. Please verify and correct if needed.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. `app-server/src/traces/span_attributes.rs:34`
- Draft comment: The constant `GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS` is assigned the string literal `"gen_ai.usage.cache_creation_input_tokens"`. The use of 'cache_creation' in the string seems inconsistent with the 'WRITE' in the constant name. Please verify whether this mismatch is intentional or whether it should be aligned (e.g., 'cache_write_input_tokens').
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable (usefulness confidence = 10% vs. threshold = 50%). The comment is about a newly added constant, and the inconsistency between "WRITE" and "creation" is real, but it may not be worth fixing: the string value is likely what gets sent in telemetry, while the constant name is just an internal code reference, and changing either could break things. The naming inconsistency could also indicate a deeper semantic distinction between "write" and "creation", but without more context about the telemetry system and why these specific terms were chosen, we can't be confident the inconsistency is actually a problem. The comment also asks for verification ("Please verify if...") rather than making a strong case that something needs to change, which violates the review rules.
11. `app-server/src/traces/spans.rs:631`
- Draft comment: There's an extra trailing comma in the call to `format!` on line 631. While Rust allows trailing commas in macro invocations, consider removing it to keep the code clear and consistent.
- Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
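For draft comments 1 and 5, a minimal sketch of the suspected fix. `Price` is a stand-in for the real `LLMPriceEntry` (whose full definition is not shown in this diff), and the function body is illustrative rather than the repository's actual code:

```rust
// Minimal stand-in for the price entry referenced in the comments;
// the real LLMPriceEntry has more fields than this.
struct Price {
    input_price_per_million: f64,
    output_price_per_million: f64,
}

// Sketch of the suspected fix: the output-cost path should use the
// output rate even when the result came from cache. The draft comments
// claim the cache-hit branch returned input_price_per_million instead.
fn estimate_output_cost(price: &Price, output_tokens: i64) -> f64 {
    output_tokens as f64 / 1_000_000.0 * price.output_price_per_million
}
```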
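For draft comments 2 and 7, a hedged sketch of the clamping idea; the values and variable names are placeholders, not the real code:

```rust
fn main() {
    // Illustrative values; the real code subtracts cache tokens from
    // the reported input-token total for the anthropic provider.
    let input_tokens: i64 = 100;
    let cache_tokens: i64 = 150;

    // Plain subtraction would go negative (-50); clamping at zero keeps
    // the regular-token count meaningful even if instrumentation reports
    // more cache tokens than total input tokens.
    let regular = (input_tokens - cache_tokens).max(0);
    assert_eq!(regular, 0);

    // With unsigned counts, u64::saturating_sub does the same job:
    assert_eq!(100u64.saturating_sub(150), 0);
}
```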
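For draft comment 8, a sketch of the allocation-free, case-insensitive comparison, assuming the provider name arrives as an optional string (as the diff later in this PR suggests):

```rust
// eq_ignore_ascii_case compares without allocating a lowercased copy,
// unlike comparing against "anthropic".to_string().
fn is_anthropic(provider_name: Option<&str>) -> bool {
    provider_name
        .map(|name| name.trim().eq_ignore_ascii_case("anthropic"))
        .unwrap_or(false)
}

fn main() {
    assert!(is_anthropic(Some("Anthropic")));
    assert!(!is_anthropic(None));
}
```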
Workflow ID: wflow_p7rVWhqWyWamoC94
```diff
-    _input_cached_price_per_million: value.input_cached_price_per_million,
-    _additional_prices: serde_json::from_value(value.additional_prices).unwrap(),
+    input_cached_price_per_million: value.input_cached_price_per_million,
+    additional_prices: serde_json::from_value(value.additional_prices).unwrap(),
```
Using `unwrap()` when deserializing `additional_prices` (line 34) could cause a panic on malformed data. Consider handling the error gracefully.
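One hedged way to address this, assuming the field deserializes into a map-like type (the actual target type is not shown in this diff) and that falling back to a default is acceptable here — propagating the error with `?` would be the stricter alternative:

```rust
use serde_json::Value;
use std::collections::HashMap;

// Stand-in for the real field type, which this diff does not show;
// HashMap<String, f64> is only an assumption for illustration.
type AdditionalPrices = HashMap<String, f64>;

// Sketch: deserialize defensively instead of calling unwrap().
fn parse_additional_prices(raw: Value) -> AdditionalPrices {
    serde_json::from_value(raw).unwrap_or_else(|e| {
        // Log and fall back to an empty map rather than panicking.
        eprintln!("malformed additional_prices, using default: {e}");
        AdditionalPrices::default()
    })
}
```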
Many nits, and one major comment: I'd need to check what we do for openai on the instrumentation side, and see whether we need to match against the anthropic provider name in this PR at all.

Nevermind, looks like anthropic is the only instrumentation where this is used. Then LGTM overall.
```rust
pub const GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS: &str =
    "gen_ai.usage.cache_creation_input_tokens";
pub const GEN_AI_ANTHROPIC_CACHE_READ_INPUT_TOKENS: &str = "gen_ai.usage.cache_read_input_tokens";
```
nit, non-blocking: I think we decided with nirga that this is not named after anthropic, because some other providers charge using a similar pricing model. So we could remove "anthropic" from the variable names.
```rust
    0
};

if self.provider_name() == Some("anthropic".to_string()) {
```
nit (again non-blocking): we may want to test this matching with opentelemetry-instrumentation-langchain and langchain-anthropic.
```rust
    cache_read_tokens: 0,
};

let regular_input_tokens =
```
I'd probably rename this variable to something like `total_input_tokens`, because regardless of whether there is a cache, `gen_ai.usage.input_tokens` (or `gen_ai.usage.prompt_tokens`) is meant to represent the total number of input tokens.
```rust
let mut input_tokens = InputTokens {
    regular_input_tokens: 0,
    cache_write_tokens: 0,
    cache_read_tokens: 0,
};
```
I'm probably being too picky, but I'd prefer this to be immutable, for readability and more idiomatic Rust. I.e., let's construct the struct at return (in each of the if/else branches).
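A sketch of the suggested shape, reusing the `InputTokens` fields from the snippet above; the branch condition and input values are placeholders for whatever the real function computes:

```rust
struct InputTokens {
    regular_input_tokens: i64,
    cache_write_tokens: i64,
    cache_read_tokens: i64,
}

// Sketch: construct the struct once per branch instead of mutating a
// zero-initialized binding. `is_anthropic`, `regular`, `writes`, and
// `reads` are hypothetical inputs, not names from the PR.
fn build_input_tokens(is_anthropic: bool, regular: i64, writes: i64, reads: i64) -> InputTokens {
    if is_anthropic {
        InputTokens {
            regular_input_tokens: regular,
            cache_write_tokens: writes,
            cache_read_tokens: reads,
        }
    } else {
        InputTokens {
            regular_input_tokens: regular,
            cache_write_tokens: 0,
            cache_read_tokens: 0,
        }
    }
}
```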
```rust
if let Some(name) = name {
    Some(name.to_lowercase().trim().to_string())
} else {
    None
}
```
super nit: can do this with `name.map(|name| name.to_lowercase()...)`
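Spelled out, and assuming `name` is an `Option<&str>`, the combinator version preserves the lowercase-then-trim behavior of the snippet above:

```rust
fn normalize(name: Option<&str>) -> Option<String> {
    // Equivalent to the if let Some(...) / else None pattern:
    // Option::map applies the closure only when a name is present.
    name.map(|name| name.to_lowercase().trim().to_string())
}
```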
Important

Refactor cost estimation and token handling to support Anthropic cache tokens, updating cost functions and span attributes accordingly.

- Refactor `estimate_input_cost` and `estimate_output_cost` in `mod.rs` to handle input and output token cost estimation separately.
- Add `input_cached_price_per_million` and `additional_prices` to `LLMPriceEntry`.
- Update `calculate_cost` in `utils.rs` to use `i64` for tokens.
- Replace `TokensKind` with the `InputTokens` struct in `mod.rs` and `runner.rs`. The `InputTokens` struct now includes `regular_input_tokens`, `cache_write_tokens`, and `cache_read_tokens`.
- Update `input_tokens()` in `spans.rs` to handle Anthropic cache tokens.
- Add cache token constants in `span_attributes.rs`.
- Update `get_llm_usage_for_span` in `utils.rs` to calculate costs using the new token structure.
- Remove the `TokensKind` enum and related logic from `mod.rs` and `runner.rs`.
- Update `spans.rs` to accommodate the new token handling logic.

This description was created by Ellipsis for cf87620. It will automatically update as commits are pushed.