Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility classes to make it easier to use sandbox facet API for most common cases #14237

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

epotyom
Copy link
Contributor

@epotyom epotyom commented Feb 13, 2025

In the initial sandbox facet module PR @gsmiller suggested adding helpers to make common tasks easier.

This implementation uses existing type FacetResult to return results. The type has limitations, e.g. it supports only one value per label. At the same time, it is simple, users are already familiar with it, and it is probably good enough for the most common tasks. Users who need more complex results can use the lower level API FacetCutter, FacetRecorder, etc to get multiple aggregations per label, to implement custom sort order or any other non-trivial faceting task.

It's a draft, if people are ok with the approach overall I will add more unit tests and javadocs.

Simplifies requesting taxonomy and long range facets
Makes sure we don't count docs two times for the same index field when there are multiple requests
Returns FacetResult - to make it easier to migrate for classic facets module users

TODOs:
- better names - maybe use names suggested in TK
- implement for all existing facet cutters
- add comprehensive tests
- fix total value where possible in tests
Copy link
Contributor

@stefanvodita stefanvodita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a glance at everything and didn't see anything objectionable. It's great if we're making this module easier to use. I have some non-blocking concerns that this is a lot of new code, with more abstractions, and it doesn't have a user waiting to use it. There are arguments against those concerns as well and I personally like the new aggregations, but I thought it was worth raising the point.

@@ -130,6 +135,88 @@ void index() throws IOException {
IOUtils.close(indexWriter, taxoWriter);
}

/**
* Example for {@link FacetBuilder} usage - simple API that provides results in a format very
* similar to classic facets module. It doesn't give all flexibility available with {@link
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really does look more familiar! Can we highlight it somewhere more prominent as well? Maybe in the faceting guide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I'll do that.

Copy link

@Shradha26 Shradha26 Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea to have a bidirectional link for the shared examples in the classic and new aggregation modules?
Edit: nvm, I think the examples are a bit different.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating the FacetBuilder and FacetOrchestrator abstractions - they do really simplify the usage of the new API.

}

@Override
FacetCutter createFacetCutter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a problem that this is called create, giving the impression that we get a new cutter? Reusing the old one could be problematic, couldn't it, if the user expects a new cutter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you are right, this method doesn't have to create FacetCutter, e.g. FacetCutter might be created in advance and returned by the method. Renamed to getFacetCutter

* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.lucene.sandbox.facet.utils;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a package for the facet builders?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep *FacetOrchstrators in the same package as FacetBuilders, so that we can keep some FacetBuilder method package-private. So the only file that maybe doesn't belong in the same package is ComparableUtils.java. At the same time the idea of ComparableUtils is similar - make it easier for users to use the new module, so I thought we could reduce number of packages and keep meaningful package structure at the same time... But I don't feel strongly about it. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your argument. Let's keep it as is then.

@epotyom
Copy link
Contributor Author

epotyom commented Feb 17, 2025

Thanks for reviewing @stefanvodita !

I have some non-blocking concerns that this is a lot of new code, with more abstractions, and it doesn't have a user waiting to use it. There are arguments against those concerns as well and I personally like the new aggregations, but I thought it was worth raising the point.

I agree with the concern, at the same time it might be one of these chicken and egg cases when users avoid using the feature just because it doesn't have simple enough API for simple use cases?

One use case for the new utils I have in mind is luceneutil perf tests - it puts me off that we need 90 lines to test performance of the new API, while for existing facets module it is only two lines.

Copy link
Contributor

@stefanvodita stefanvodita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my comments! I'm up for merging this, while keeping in mind the concerns we discussed.

+ topDocs.scoreDocs.length);

//// Results
FacetResult autorResults = authorFacetBuilder.getResult();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] authorResults instead of autorResults?

.collect(query, ds);

//// Results
FacetResult autorResults = authorFacetBuilder.getResult();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] authorResults instead of autorResults?

"Price",
new LongRange("0-10", 0, true, 10, true),
new LongRange("10-20", 10, true, 20, true));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To maintain output consistency between different examples, we could add something equivalent to what we have in simpleFacetsWithSearch ? Or, we could remove it from there.

System.out.println(
        "Search results: totalHits: "
            + topDocs.totalHits
            + ", collected hits: "
            + topDocs.scoreDocs.length);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this intentionally, to demonstrate that there are methods to do facet only and facets with search (examples that haveWithSearch suffix also compute/print TopDocs). We can rename this example to simpleFacetsOnlyWithDrillSideways maybe?


//// Search and collect
TopDocs topDocs =
new FacetOrchestrator()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we handle cases where more than one FacetBuilder instances specify topN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each FacetBuilder can have its own topN value set by withTopN(), which is used to compute FacetBuilder#getResult

import org.apache.lucene.sandbox.facet.labels.RangeOrdToLabel;

/** {@link FacetBuilder} factory for long range facets. */
public final class LongRangeFacetBuilder {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we make LongRangeFacetBuilder extend CommonFacetBuilder? The create method returning something of a different type is a bit confusing to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that CommonFacetBuilder can be used to compute results for long range facets, so there is no need to create a new FacetBuilder subclass. But I thought it would be nice to have a utility class (I should have added a private constructor to make it static-only, will do) that helps to get CommonFacetBuilder instance.

I can remove the class and let users rely on examples, or rename it to RangeFacetBuilder and add similar methods for double ranges. WDYT?

* Common {@link FacetBuilder} that works with any {@link FacetCutter} and {@link OrdToLabel}
* provided by user.
*/
public final class CommonFacetBuilder extends BaseFacetBuilder<CommonFacetBuilder> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when I should be using CommonFacetBuilder vs BaseFacetBuilder? Some examples, even in the documentation would help? For e.g, why can't TaxonomyFacetBuilder extend CommonFacetBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseFacetBuilder is abstract package-private class that implements common functionality that different FacetBuilder implementations can reuse.

CommonFacetBuilder is a final class, kind of generic facet builder that can be used with any FacetCutter, but as a consequence it lacks some facet type specific functionality. E.g. using it for taxonomy facets makes sense only if we don't need to specify parent path and don't need to compute overall value. The idea is that creating FacetBuilder should not be required for every facet type, just for the types that require some custom logic, such as taxonomy facets.

WDYT?

@@ -130,6 +135,88 @@ void index() throws IOException {
IOUtils.close(indexWriter, taxoWriter);
}

/**
* Example for {@link FacetBuilder} usage - simple API that provides results in a format very
* similar to classic facets module. It doesn't give all flexibility available with {@link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating the FacetBuilder and FacetOrchestrator abstractions - they do really simplify the usage of the new API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants