558 Implement Search relevance decay based on date filed #4849
Conversation
- Refactored logic for applying `build_decay_relevance_score`
- Added tests for all search types
- Removed redundant test for RECAP Search
Alberto, this is all very impressive, thank you. Three quick thoughts that will help me understand things:
Sure, here is the Jupyter notebook containing the code to generate the charts for each search type. And these are the ES queries used to retrieve the data:
RECAP Documents:
Dockets:
Opinion Clusters:
Oral Arguments:
Regarding the proposed scale and decay values shown in the charts: my reasoning was simple. The most recent documents experience little to no decay (~1), while the oldest documents are heavily penalized with a decay close to 0. From there, I tried to adjust the curve to fit our current document distribution. However, this was just an initial proposal to serve as a starting point for discussion about the best approach.

The currently proposed parameters have both advantages and disadvantages, depending on the relevance logic we aim to achieve. For example, consider the RECAP Documents chart: documents from 1990 and earlier have a decay close to 0. This means that if they match a search query and rank high due to their BM25 score, they would appear first in the results if no decay relevance were applied. However, after applying date-based decay, these documents will always appear last, regardless of how well their terms match the query. This proposed approach represents an extreme case, where the older documents in the index are penalized with the highest decay possible.

If we want to be more flexible with older documents, we could apply a larger decay value and scale. For instance: decay: 0.5. In this scenario, we observe a much slower decay. Even the oldest documents will never reach a decay close to 0, with the lowest decay for these documents being approximately … Additionally, most documents in the index, particularly in the densest region (2000–2024), will have a decay ranging from …

Here is an example. No decay (original BM25 scores):
In this example, we see how two documents with similar BM25 scores behave:
Thus, it might be better to avoid extreme settings and start with a "medium" speed decay.
Yes, because this approach works by multiplying or replacing (in the case of filter-only queries) the original BM25 score generated by Elasticsearch.
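To make the multiply behavior concrete, here is a purely illustrative sketch (the document names, BM25 scores, and ages are made up; they are not taken from the notebook or the index):

```python
import math

# Two hypothetical documents with similar BM25 scores but very different ages.
docs = {
    "recent_doc": {"bm25": 10.0, "age_years": 2},
    "old_doc": {"bm25": 10.5, "age_years": 30},
}

decay, scale = 0.5, 10  # reach a decay of 0.5 for 10-year-old documents

for name, d in docs.items():
    weight = math.exp((math.log(decay) / scale) * d["age_years"])
    print(name, round(d["bm25"] * weight, 2))
# recent_doc keeps most of its BM25 score (~8.7), while old_doc drops to ~1.3,
# so the recent document now ranks first despite a slightly lower BM25 score.
```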
Thanks. Very helpful. Two more thoughts:
Good questions!
In parent-child queries, the … So here, when sorting by relevance (score desc), the … There is one exception: the …

Got it. Currently, the value shown in the … I did some tests and found a couple of alternatives that might allow us to break down the scores in the API:

I removed a lot of details from this response for simplicity, but the original response can be seen here: You can see the main score (composite) is … However, the …
Let me know what you think.
OK, sounds like adding the score to the API isn't great, so let's not do that. The composite is fine for now. Maybe in the future, we can add … For scoring, thanks for all the detailed information. Let's get a PR review done and I'll continue thinking about what the right values are. Feels like we've entered the artisanal stage of relevancy!
Back on the topic of the values, a couple of things come to mind:

I think putting these together, we probably want a case law decay of about 50 years that flattens out at a score of 0.1 or 0.2? I think we want RECAP to have something similar, but over about 20 years instead? This is all very seat-of-the-pants!
Great! Thank you for these insights; they're helpful for determining when it's more appropriate to set the scale and decay as you described. To prevent scores for content older than 50 years (Case law/OA) or 20 years (RECAP) from being wiped out entirely, I've introduced a tweak to the decay function. Instead of converging to 0, it now converges to …
Below are the updated charts that illustrate how the decay function behaves after these adjustments (Dockets, RECAP Documents, Case Law, Oral Arguments): decay_relevance_min_score.ipynb.txt

After these changes, it was necessary to update a few dates in the factories to align with the new scales and minimum score. So far, everything is working as expected.
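For reference, a minimal sketch of the tweak described above, assuming the floor is implemented by rescaling the exponential so it converges to a configurable minimum score instead of 0 (the `min_score` value of 0.1 here is only an illustrative assumption, not necessarily the value used in the PR):

```python
import math

def decay_weight(age_years: float, scale: float, decay: float, min_score: float = 0.1) -> float:
    """Exponential date decay that converges to min_score instead of 0."""
    raw = math.exp((math.log(decay) / scale) * age_years)
    # Rescale so new documents get ~1 and very old documents approach min_score.
    return min_score + (1 - min_score) * raw

print(decay_weight(0, 50, 0.5))    # ~1.0 for brand-new case law
print(decay_weight(300, 50, 0.5))  # ~0.11, close to the min_score floor
```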
Looks great. Once reviewed, let's ship and see how it feels!
Thanks for the details, @albertisfu! The code looks good. Let's merge after addressing my comment.
Thanks @ERosendo, I've applied the suggested change.
This PR introduces decay relevance based on the date filed, as described in #558, for all search types with a date filed.
These search types include `RECAP`, `DOCKETS`, `RECAP_DOCUMENT` (only V4 API), `OPINIONS`, and `ORAL_ARGUMENT`.

As pointed out in #558, we aim to combine this decay with the `bm25` scores returned by Elasticsearch when sorting by `score desc`.

The formula described in #558 is:
weight = e^(-t / H)
`H` is the half-life parameter, meaning the time (t) it takes for the weight to halve, which is equivalent to the time (t) to reach a decay of 0.5.
My first approach was to use the built-in function scores available in Elasticsearch.
Similar to the formula above is the `exp` function. That looks something like:

weight = exp(λ ⋅ max(0, |date_filed - origin| - offset))
Where `λ` is:

λ = ln(decay) / scale
Both formulas can achieve the same decay behavior; however, `e^(-t / H)` is directly related to a decay of 0.5. Solving for H, we have:
H = -t/ln(weight)
For a decay of 0.5 at time `t`:

H = -t / ln(0.5)
In contrast, the Elasticsearch approach is:
weight = exp(λ ⋅ max(0, |date_filed - origin| - offset))
Assuming we are not going to apply an offset, and simplifying max(0, |date_filed - origin|) = t:

weight = exp(λ ⋅ t)

λ = ln(decay) / scale

We have:

weight = exp((ln(decay) / scale) ⋅ t)
This is a more flexible approach, where decay and scale can be easily controlled to adjust the curve shape.
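A quick numeric check of that equivalence (the values below are arbitrary, chosen only to illustrate the relationship between the two parameterizations):

```python
import math

scale = 10   # years at which we want the weight to reach the decay value
decay = 0.5  # target weight at t = scale
H = -scale / math.log(decay)  # half-life form of the same curve: H = -t / ln(0.5)

for t in (0, 5, 10, 20, 40):
    issue_form = math.exp(-t / H)                      # weight = e^(-t / H)
    es_form = math.exp((math.log(decay) / scale) * t)  # weight = exp((ln(decay)/scale) * t)
    print(t, round(issue_form, 4), round(es_form, 4))  # identical at every t
```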
However, when using the built-in function, I encountered an issue similar to what we found when implementing other custom score functions: if `date_filed` is `None` in a document, it is shown first. This doesn't seem correct, as it prioritizes documents with no `date_filed` over recent documents.

To solve this issue, I opted to implement the same `exp` function using a custom script in the `build_decay_relevance_score` method, which accepts a value for missing `date_filed` (defaulting to `1600-01-01`). This way, documents with a `null` `date_filed` are considered to belong to this date or to a date we specify.

Additionally, the `decay` and `scale` parameters are configurable in this method. The scale is given in years, which makes more sense for our data. This will allow us to say, for instance: achieve a decay of 0.5 (decay) for documents that are 10 years old (scale).
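For illustration, here is a rough sketch of what such a script-based decay could look like. This is not the actual `build_decay_relevance_score` implementation; the field name `dateFiled`, the parameter names, and the reference date are assumptions:

```python
# Assumed/illustrative: a script_score function that applies the exp decay in years
# and falls back to a default date when date_filed is missing.
decay_function = {
    "script_score": {
        "script": {
            "source": """
                // Use the configured default when the document has no date filed.
                long filed_millis = doc['dateFiled'].size() == 0
                    ? params.default_missing_date
                    : doc['dateFiled'].value.toInstant().toEpochMilli();
                double t_years = (params.now - filed_millis) / params.millis_per_year;
                if (t_years < 0) { t_years = 0; }
                // weight = exp((ln(decay) / scale) * t)
                return Math.exp((Math.log(params.decay) / params.scale) * t_years);
            """,
            "params": {
                "decay": 0.5,
                "scale": 10,                              # in years
                "now": 1735689600000,                     # 2025-01-01T00:00:00Z, epoch millis
                "default_missing_date": -11676096000000,  # 1600-01-01T00:00:00Z, epoch millis
                "millis_per_year": 31557600000.0,         # 365.25 days
            },
        }
    }
}
```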
The weight computed by this custom function score is combined with the original BM25 score using `boost_mode: "multiply"`, which multiplies the original score by the computed `weight`, which can vary from 1 to ~0. For example, if the `weight` computed for a document is 0.5 and the original score for the document is 100, the new score will be 50.

However, there is a problem with queries that don't return scores, such as when the user doesn't provide a text query (only filters or a match-all query). In these cases, the `boost_mode` used is `"replace"`, where only the decay relevance weight is used as the score, which is similar to sorting documents by `date_filed`.
I also refactored many methods to centralize the application of custom scores for this and previous usages within `apply_custom_score_to_main_query`. This allows us to easily add new function score methods in the future, such as a different relevance score for courts.

Additionally, I applied further refactors to avoid sending the function score for percolator queries, where the function score leads to unexpected behavior.
I also tweaked count queries for main documents to avoid sending the function score, which is unnecessary for counts, and to improve performance.
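Roughly, the centralized application described above could be pictured like this. It is only a sketch: the builder is a stand-in, and the per-type parameters are taken from the proposal shown further below, not from the real search configuration:

```python
# Assumed per-search-type (scale in years, decay) pairs from the proposal below.
DECAY_PARAMS = {
    "OPINIONS": (30, 0.4),
    "RECAP": (15, 0.15),
    "DOCKETS": (20, 0.2),
    "ORAL_ARGUMENT": (15, 0.3),
}

def build_decay_relevance_score(main_query: dict, *, scale: int, decay: float) -> dict:
    """Stand-in for the real builder: wrap the query in a function_score with the decay script."""
    return {
        "function_score": {
            "query": main_query,
            "boost_mode": "multiply",
            "functions": [{"script_score": {"script": {"params": {"scale": scale, "decay": decay}}}}],
        }
    }

def apply_custom_score_to_main_query(search_type: str, main_query: dict, *,
                                     for_percolator: bool = False, for_count: bool = False) -> dict:
    """Central place to attach custom scores, skipping percolator and count queries."""
    if for_percolator or for_count:
        return main_query
    scale, decay = DECAY_PARAMS[search_type]
    return build_decay_relevance_score(main_query, scale=scale, decay=decay)
```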
Added test classes to confirm that the decay relevance combined with BM25 scores behaves properly for all supported search types in the frontend and API v3 and v4.
To fine-tune the `decay` and `scale` parameters, I gathered data from Elasticsearch so we can decide which type of decay to apply based on each document type's distribution over time.

In the following plots, you can see the document distribution over time and a proposal for the scale and decay parameters, with the curve shown in blue. In this approach, the decay curve is adapted proportionally to the document distribution.
| Search type | Scale (years) | Decay |
| --- | --- | --- |
| Dockets | 20 | 0.2 |
| RECAP Documents | 15 | 0.15 |
| Case Law | 30 | 0.4 |
| Oral Arguments | 15 | 0.3 |
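To visualize these proposals side by side, something along these lines reproduces the shape of the curves (a standalone sketch, not the attached notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

# Proposed (scale in years, decay) pairs from the table above.
proposals = {
    "Dockets": (20, 0.2),
    "RECAP Documents": (15, 0.15),
    "Case Law": (30, 0.4),
    "Oral Arguments": (15, 0.3),
}

age = np.linspace(0, 80, 200)  # document age in years
for name, (scale, decay) in proposals.items():
    weight = np.exp((np.log(decay) / scale) * age)
    plt.plot(age, weight, label=f"{name} (scale={scale}, decay={decay})")

plt.xlabel("Years since date filed")
plt.ylabel("Decay weight")
plt.title("Proposed decay curves by search type")
plt.legend()
plt.show()
```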
However, we can propose a different decay behavior if it makes more sense to have a faster or slower decay from a specific date for each type of document.
Let me know what you think.