diff --git a/en/phased-ranking.html b/en/phased-ranking.html index 8daf6c4bb3..12ed345cbd 100644 --- a/en/phased-ranking.html +++ b/en/phased-ranking.html @@ -18,17 +18,15 @@ query operators use simple scoring functions that are computationally cheap to evaluate over Vespa indexes. Using the expressiveness of the Vespa query language, developers can combine multiple retrievers in the query, and expose the union of retrieved documents into Vespa ranking phases. -
  • Per node ranking: The query specification retrieves documents and ranks them using declarative phases evaluated within the content nodes: Ranking in 3 phases + +

    First-phase ranking on content nodes

    Normally, you will always start by having one ranking expression @@ -67,6 +67,8 @@

    First-phase ranking on content nodes

    use retrieval operators that will expose only the top-k hits to the first-phase expression.

    + +

    Two-phase ranking on content nodes

    While some use cases only require one (simple) first-phase @@ -140,10 +142,10 @@

    Using a global-phase expression

    This phase is optimized for inference with ONNX models, taking some input data from the document and some from the query, and finding a score for how well they match. A typical use case is - re-ranking using cross-encoders. - - It's possibly to specify - gpu-device to get GPU-accelerated computation of the + re-ranking using cross-encoders. +

    +

    + It's possible to specify gpu-device to get GPU-accelerated computation of the model as well. You can compute or re-shape the inputs to the ONNX model in a function if necessary, and use the output in some further calculation to compute the final score. @@ -153,18 +155,17 @@

    Using a global-phase expression

    instead of an ONNX model, it's more efficient to use the highly optimized second-phase computation on content nodes. This is also true for sub-expressions that require lots of vector data, since moving - vector data across the network is expensive.

    - + vector data across the network is expensive.

    {% include note.html content='You can force a sub-expression to be computed on the content nodes by making it a function and adding it to match-features' %} -

    +

    By adding the feature to match-features in the ranking profile, the global-phase expression can re-use the function output without the complexity of transferring the data across the network and performing inference in the stateless container (which is less optimized).

    -
    +
     schema myapp {
         document myapp {
             field per_doc_vector type tensor<float>(x[784]) {
    @@ -207,8 +208,12 @@ 

    Using a global-phase expression

    } }
    -

    In the above example, the my_expensive_function will be evaluated on the content nodes -for the 50 top ranking documents from the first-phase so that the global-phase does not need to re-evaluate.

    +

    + In the above example, my_expensive_function is evaluated on the content nodes + for the 50 top-ranking documents from the first-phase, so the global-phase does not need to re-evaluate it. +

    + +

    Cross-hit normalization including reciprocal rank fusion

    @@ -217,12 +222,14 @@

    Cross-hit norm is designed to make it easy to combine unrelated scoring methods into one final relevance score. The syntax looks like a special pseudo-function call: +

    • normalize_linear(my_function_or_feature)
    • reciprocal_rank(my_function_or_feature)
    • reciprocal_rank(my_function_or_feature, k)
    • reciprocal_rank_fusion(score_1, score_2 ...)
    +

    The normalization will be performed across the hits that global-phase reranks (see configuration above). This means that first, the input (my_function_or_feature) @@ -269,12 +276,16 @@

    Cross-hit norm The reciprocal_rank_fusion pseudo-function takes at least two arguments and expands to the sum of their reciprocal_rank; it's just a convenient way to write -
    -  reciprocal_rank(a) + reciprocal_rank(b) + reciprocal_rank(c) 
    - as -
    -  reciprocal_rank_fusion(a,b,c) 
    for example.

    +
    +reciprocal_rank(a) + reciprocal_rank(b) + reciprocal_rank(c)
    +
    +

    as

    +
    +reciprocal_rank_fusion(a,b,c)
    +
    for example. + +
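
    The expansion above can be illustrated outside Vespa with a minimal Python sketch. It assumes the commonly used reciprocal rank definition 1 / (k + rank), with rank 1 for the highest score and k = 60 as the default; both the formula and the default are assumptions of this sketch, and the function names only mirror the pseudo-functions for readability. It also assumes every score map covers the same set of hits, as is the case for the hits that global-phase reranks.

```python
from typing import Dict


def reciprocal_rank(scores: Dict[str, float], k: float = 60.0) -> Dict[str, float]:
    """Map each hit to 1 / (k + rank), where rank 1 is the highest score.

    Assumption: ties are broken by insertion order (sorted() is stable).
    """
    ordered = sorted(scores, key=lambda hit: scores[hit], reverse=True)
    return {hit: 1.0 / (k + rank) for rank, hit in enumerate(ordered, start=1)}


def reciprocal_rank_fusion(*score_maps: Dict[str, float], k: float = 60.0) -> Dict[str, float]:
    """Sum the reciprocal ranks of each hit over all score maps."""
    if len(score_maps) < 2:
        raise ValueError("reciprocal_rank_fusion takes at least two arguments")
    fused: Dict[str, float] = {}
    for scores in score_maps:
        for hit, value in reciprocal_rank(scores, k).items():
            fused[hit] = fused.get(hit, 0.0) + value
    return fused


# Hypothetical scores for three hits from two unrelated scoring methods.
a = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
b = {"d1": 0.1, "d2": 0.9, "d3": 0.5}
fused = reciprocal_rank_fusion(a, b)
best_hit = max(fused, key=fused.get)
```

    Here d2 ranks second by a and first by b, so it receives the highest fused score even though the raw score scales of a and b differ widely. Only the rank of each hit enters the sum, which is why the method suits unrelated scoring methods.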

    Stateless re-ranking

    @@ -299,6 +310,7 @@

    Stateless re-ranking

    +

    Top-K Query Operators

    If the first-phase ranking function can be approximated as a simple linear function, @@ -307,22 +319,25 @@

    Top-K Query Operators

    allows avoiding fully evaluating all the documents matching the query with the first-phase function. Instead, only the top-K hits using the internal wand scoring are exposed to the first-phase ranking expression. -

    +

    The nearest neighbor search operator is also a top-k retrieval operator and the two operators can be combined in the same query.

    + +

    Choosing phased ranking functions

    A good quality ranking expression will for most applications consume too much CPU to be runnable on all retrieved or matched documents within the latency budget/SLA. - The application ranking function should hence in most cases be a second phase function. The task then becomes to find a first phase function, which correlates sufficiently well with the second phase function.
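
    One way to judge whether a candidate first phase function correlates well enough is an offline check on scores collected for a sample query. The Python sketch below is only an illustration under stated assumptions: the function name, its parameters, and the score maps are hypothetical, and it assumes both functions have been evaluated on the same set of matched documents. It measures how much of the ideal second phase top-k survives when only the best rerank_count documents by first phase score are reranked.

```python
from typing import Dict


def top_k_overlap(
    first_phase: Dict[str, float],
    second_phase: Dict[str, float],
    rerank_count: int,
    k: int,
) -> float:
    """Fraction of the second phase top-k that survives a first phase cut.

    first_phase and second_phase map document ids to scores for the same
    documents (hypothetical, offline-collected data).
    """
    # Documents that would be passed on to the second phase.
    survivors = set(
        sorted(first_phase, key=lambda d: first_phase[d], reverse=True)[:rerank_count]
    )
    # Ideal top-k if the second phase function were run on every document.
    ideal = sorted(second_phase, key=lambda d: second_phase[d], reverse=True)[:k]
    if not ideal:
        return 1.0
    return sum(1 for doc in ideal if doc in survivors) / len(ideal)


# Hypothetical scores for four matched documents.
first = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0}
second = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.8}
narrow = top_k_overlap(first, second, rerank_count=2, k=2)
wide = top_k_overlap(first, second, rerank_count=4, k=2)
```

    An overlap close to 1.0 at the intended rerank count suggests the first phase function rarely discards documents that the second phase function would have placed in the top-k; a low value suggests either a better first phase function or a larger rerank count is needed.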

    + +

    Rank phase statistics

    Use match-features