anthropic cache #423

Open

skull8888888 wants to merge 1 commit into dev
Conversation

@skull8888888 (Collaborator) commented Mar 2, 2025

Important

Refactor cost estimation and token handling to support Anthropic cache tokens, updating cost functions and span attributes accordingly.

  • Cost Estimation:
    • Introduced estimate_input_cost and estimate_output_cost in mod.rs to handle input and output token cost estimation separately.
    • Added handling for input_cached_price_per_million and additional_prices in LLMPriceEntry.
    • Updated calculate_cost in utils.rs to use i64 for tokens.
  • Token Handling:
    • Replaced TokensKind with InputTokens struct in mod.rs and runner.rs.
    • InputTokens struct now includes regular_input_tokens, cache_write_tokens, and cache_read_tokens (sketched just below this list).
    • Updated input_tokens() in spans.rs to handle Anthropic cache tokens.
  • Span Attributes:
    • Added constants for Anthropic cache tokens in span_attributes.rs.
    • Updated get_llm_usage_for_span in utils.rs to calculate costs using new token structure.
  • Miscellaneous:
    • Removed TokensKind enum and related logic from mod.rs and runner.rs.
    • Minor refactoring in spans.rs to accommodate new token handling logic.
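
Based on this summary, a minimal sketch of the new token structure (field names come from the description above; the derives and the exact shape of total() are assumptions, not necessarily the PR's code):

```rust
/// Input-token breakdown for providers with prompt caching, such as Anthropic.
#[derive(Debug, Clone, Copy, Default)]
pub struct InputTokens {
    pub regular_input_tokens: i64,
    pub cache_write_tokens: i64,
    pub cache_read_tokens: i64,
}

impl InputTokens {
    /// Total input tokens; cache reads and writes count toward input usage
    /// (see draft comment 3 below on whether that is the intended semantics).
    pub fn total(&self) -> i64 {
        self.regular_input_tokens + self.cache_write_tokens + self.cache_read_tokens
    }
}
```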

This description was created by Ellipsis for cf87620. It will automatically update as commits are pushed.

@ellipsis-dev bot (Contributor) left a comment

❌ Changes requested. Reviewed everything up to cf87620 in 2 minutes and 42 seconds

More details
  • Looked at 395 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 11 drafted comments based on config settings.
1. app-server/src/language_model/costs/mod.rs:50
  • Draft comment:
    In estimate_output_cost, the cache hit branch returns price.input_price_per_million instead of price.output_price_per_million. This might be a logic error (a sketch of the fix follows this list).
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. app-server/src/traces/span_attributes.rs:129
  • Draft comment:
    When subtracting cache tokens from the regular input tokens for the anthropic provider, ensure the result does not go negative. Consider using saturating subtraction (sketched after this list).
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
3. app-server/src/traces/utils.rs:45
  • Draft comment:
    Using input_tokens.total() for the total number of input tokens assumes cache tokens are part of total usage. Verify this is the intended behavior and document it if so.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
4. app-server/src/language_model/providers/utils.rs:22
  • Draft comment:
    The change of calculate_cost to use i64 for token counts is consistent, but ensure that all callers correctly transform token counts to match this type.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
5. app-server/src/language_model/costs/mod.rs:49
  • Draft comment:
    On cache hit, 'estimate_output_cost' uses 'price.input_price_per_million' instead of 'price.output_price_per_million'. This likely mismatches the intended output cost calculation.
  • Reason this comment was not posted:
    Comment was on unchanged code.
6. app-server/src/language_model/costs/mod.rs:71
  • Draft comment:
    Caching is currently disabled in 'estimate_input_cost' (cache_res set to None). If caching is intended here, consider re-enabling it or adding a proper TODO comment.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
    The code clearly shows caching being intentionally disabled through commenting out the cache lookup and setting cache_res = None. This seems like a deliberate choice rather than an oversight, since it required explicit code changes. The function still writes to the cache even though it doesn't read from it, suggesting this may be a temporary measure or part of a larger change. Without more context about why caching was disabled, we can't assume this is a mistake.
    I could be wrong about this being intentional - commented out code with no explanation is often a sign of work in progress or forgotten changes. The asymmetry between reading and writing to cache is suspicious.
    While the code pattern is unusual, we don't have enough context to know if this is actually a problem. The author clearly made this change deliberately, and may have good reasons we can't see.
    The comment should be removed since it questions what appears to be an intentional change without strong evidence that it's actually a problem.
7. app-server/src/traces/span_attributes.rs:141
  • Draft comment:
    When subtracting cache tokens from regular input tokens for 'anthropic', ensure the result does not become negative.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. app-server/src/traces/span_attributes.rs:180
  • Draft comment:
    Comparing provider name with a newly created String ('anthropic'.to_string()) can be optimized; consider comparing in a case-insensitive manner without allocation.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None
9. app-server/src/traces/span_attributes.rs:3
  • Draft comment:
    There appears to be a typo in the URL on line 3. The word 'openllmetry' might be intended to be 'opentelemetry'. Please verify and correct if needed.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. app-server/src/traces/span_attributes.rs:34
  • Draft comment:
    The constant 'GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS' is assigned the string literal 'gen_ai.usage.cache_creation_input_tokens'. The use of 'cache_creation' in the string seems inconsistent with the 'WRITE' in the constant name. Please verify if this mismatch is intentional or if it should be aligned (e.g., 'cache_write_input_tokens').
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    The comment is about a newly added constant. The inconsistency between "WRITE" and "creation" is real, but is this worth fixing? The string value is likely what gets sent in telemetry, while the constant name is just for internal code reference. Changing either could potentially break things. The comment is asking for verification rather than making a strong case for why this needs to change.
    The naming inconsistency could indicate a deeper semantic issue that I'm missing. Maybe there's an important distinction between "write" and "creation" in this context.
    Without more context about the telemetry system and why these specific terms were chosen, we can't be confident that this inconsistency is actually a problem that needs fixing.
    The comment violates the rule about not asking for verification ("Please verify if..."). It's also not making a strong case that this is definitely an issue that needs fixing.
11. app-server/src/traces/spans.rs:631
  • Draft comment:
    There's an extra trailing comma in the call to format! on line 631. While Rust may allow trailing commas in macro invocations, consider removing it to keep the code clear and consistent.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
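
Two of the drafted comments above lend themselves to small illustrations. For the saturating-subtraction suggestion in draft comments 2 and 7, a minimal sketch (the variable names here are assumptions, not the PR's actual code):

```rust
// Clamp at zero so the regular token count never goes negative when the
// reported cache tokens exceed the total input tokens.
let regular_input_tokens: i64 =
    total_input_tokens.saturating_sub(cache_write_tokens + cache_read_tokens);
```

For the suspected slip in draft comments 1 and 5, the fix would presumably bill output tokens at the output rate on the cache-hit branch as well (hedged sketch; only output_price_per_million appears in the comments, the rest is assumed):

```rust
// Output tokens are priced at the output rate even when the input hit the cache.
let output_cost = output_tokens as f64 * price.output_price_per_million / 1_000_000.0;
```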

Workflow ID: wflow_p7rVWhqWyWamoC94


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

-_input_cached_price_per_million: value.input_cached_price_per_million,
-_additional_prices: serde_json::from_value(value.additional_prices).unwrap(),
+input_cached_price_per_million: value.input_cached_price_per_million,
+additional_prices: serde_json::from_value(value.additional_prices).unwrap(),

Using unwrap() when deserializing additional_prices (line 34) could cause a panic on malformed data. Consider handling the error gracefully.
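
One graceful alternative (a sketch; the concrete target type of additional_prices is an assumption here):

```rust
use std::collections::HashMap;

// Fall back to an empty price map instead of panicking on malformed JSON.
let additional_prices: HashMap<String, f64> =
    serde_json::from_value(value.additional_prices).unwrap_or_default();
```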

@dinmukhamedm (Member) left a comment

Many nits.

One major comment: I'd need to check what we do for openai on the instrumentation side, and see if we need to match against the anthropic provider name in this PR at all.

Nevermind, looks like anthropic is the only instrumentation where this is used. Then LGTM overall.

Comment on lines +33 to +36

pub const GEN_AI_ANTHROPIC_CACHE_WRITE_INPUT_TOKENS: &str =
"gen_ai.usage.cache_creation_input_tokens";
pub const GEN_AI_ANTHROPIC_CACHE_READ_INPUT_TOKENS: &str = "gen_ai.usage.cache_read_input_tokens";

nit, non-blocking: I think we decided with nirga that this is intentionally not named after anthropic, because some other providers charge using a similar pricing model. So we could remove "anthropic" from the variable names.

0
};

if self.provider_name() == Some("anthropic".to_string()) {

nit (again non-blocking): we may want to test this matching with opentelemetry-instrumentation-langchain and langchain-anthropic.
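
Draft comment 8 above also flags the allocation in this comparison. An allocation-free, case-insensitive form might look like this (a sketch, assuming provider_name() returns Option<String>, as the comparison above implies):

```rust
// Compare against the provider name without building a new String.
if matches!(self.provider_name().as_deref(), Some(p) if p.eq_ignore_ascii_case("anthropic")) {
    // anthropic-specific cache token handling
}
```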

cache_read_tokens: 0,
};

let regular_input_tokens =

I'd probably rename this variable to something like total_input_tokens, because regardless of whether there is a cache, gen_ai.usage.input_tokens (or gen_ai.usage.prompt_tokens) is meant to represent the total number of input tokens.

Comment on lines +110 to +114
let mut input_tokens = InputTokens {
regular_input_tokens: 0,
cache_write_tokens: 0,
cache_read_tokens: 0,
};

I'm probably being too picky, but I'd prefer this to be immutable, for readability and more idiomatic Rust, i.e. let's construct the struct at return (in each of the if/else branches).
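
A sketch of that suggestion, constructing the struct once per branch instead of mutating a zeroed binding (field names from this PR; the branch condition is assumed):

```rust
// Immutable binding: each branch yields a fully constructed InputTokens.
let input_tokens = if provider_is_anthropic {
    InputTokens {
        regular_input_tokens,
        cache_write_tokens,
        cache_read_tokens,
    }
} else {
    InputTokens {
        regular_input_tokens,
        cache_write_tokens: 0,
        cache_read_tokens: 0,
    }
};
```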

Comment on lines +202 to +205
if let Some(name) = name {
    Some(name.to_lowercase().trim().to_string())
} else {
    None
}

super nit: can do with name.map(|name| name.to_lowercase()...)
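
Spelled out, the reviewer's one-liner would replace the if/else above with something like:

```rust
// Equivalent to the if let / else above; None passes through unchanged.
name.map(|name| name.to_lowercase().trim().to_string())
```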
