Sub-optimal search result: "dev services" does not rank high enough the tutorial that talks most about dev services #40

Open
holly-cummins opened this issue Nov 9, 2023 · 11 comments

@holly-cummins

If I search on https://quarkus-website-pr-1825-preview.surge.sh/guides/ for 'dev services', in an ideal world the 'my second application' guide would be the first result (IMO). At the very least, I'd hope it was in the results. The title doesn't mention dev services, but the slug and body feature dev services a lot.

image

(This kind of content is an ideal use case for improved search because the title of the tutorial doesn't mention dev services because it's aimed at people who don't know they need to know about dev services .... but a direct search should also find it because it's our main introduction to dev services.)

@yrodiere
Member

yrodiere commented Nov 9, 2023

Thanks for the report.

At the very least, I'd hope it was in the results

FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).

in an ideal world the 'my second application' guide would be the first result (IMO)

I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?

In any case, I can offer the following immediate solutions to try to improve the relevance score of Your second Quarkus application:

* Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?

* And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)

In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important.
This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :)
If you think this makes sense, I'll create a separate issue.
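
For illustration only, the "featured guides" idea could be sketched roughly as below. The `featured` flag and `keywords` field are hypothetical stand-ins for whatever asciidoc metadata would actually be used, and the third guide title is made up; this is not how the search service is implemented.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    title: str
    keywords: tuple = ()    # hypothetical: terms taken from asciidoc metadata
    featured: bool = False  # hypothetical: flag set by hand by maintainers


def featured_matches(guides, query, limit=3):
    """Return up to `limit` hand-picked guides whose keywords contain the query."""
    q = query.strip().lower()
    hits = [g for g in guides
            if g.featured and q in (k.lower() for k in g.keywords)]
    return hits[:limit]


guides = [
    Guide("Your second Quarkus application", ("dev services",), featured=True),
    Guide("Dev Services Overview", ("dev services",)),
    Guide("Hypothetical unrelated guide", ("security",), featured=True),
]

# Only guides that are both featured and matching make the short list.
print([g.title for g in featured_matches(guides, "dev services")])
```

The short list would be displayed above the regular relevance-sorted results, which stay unchanged.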

@yrodiere
Member

yrodiere commented Nov 9, 2023

@gsmet
Member

gsmet commented Nov 9, 2023

Topics are not keywords, they are topics. They are designed to look nice in a tag list or something. We could make it dev-services if you prefer.

@gsmet
Member

gsmet commented Nov 9, 2023

you have to scroll and trigger loading of additional results

Given the number of results is finite, should we always display all results?

@holly-cummins
Author

FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).

Ah, sorry, yes, I was being lazy!

in an ideal world the 'my second application' guide would be the first result (IMO)

I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?

Good questions. I did add 'ideal world' because I think getting to the 'Holly's ideal state' behaviour for this case may be non-trivial and involve some icky tradeoffs. I should have said that more clearly! So here, I think part of the problem is that 'Dev Services Overview' is perhaps mis-titled. From the title, you'd think it's a page for people who want to know about dev services, and it's not. Obviously, that's not something we can blame the search engine for. :)

Well, and I'm being slightly unfair - the first few paragraphs are an overview, but then it becomes a reference.

In any case, I can offer the following immediate solutions to try to improve the relevance score of Your second Quarkus application:

* Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?

Yes, I think this seems like a very good thing to do - perhaps also devservice? That will help other pages in this area.

* And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)

This also seems useful, although I assume we don't want to have to replicate the topics in the keywords as a general pattern? (Me being lazy again :) )

In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important. This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :) If you think this makes sense, I'll create a separate issue.

I like this, but I also think it seems like something we should do if we have to, and not before. The ideal search engine would magically rank everything correctly without any manual intervention. I hasten to add I'm not sure I've ever seen such an engine. :)

But I wonder if there are some other heuristics that we might want to apply that would replicate the effect of featuring guides, but without the manual curation, like "tend to rank tutorials above reference guides," or ... [drawing a blank]

Thanks for looking into it!
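
As a rough sketch of the "tend to rank tutorials above reference guides" heuristic mentioned above: one option is to multiply each hit's relevance score by a per-type factor before sorting. The guide types, multipliers, and scores below are all made-up assumptions for illustration, not values from the actual search service.

```python
# Hypothetical multipliers; real values would need tuning against real queries.
TYPE_BOOST = {"tutorial": 1.5, "reference": 1.0}


def rerank(hits):
    """Sort hits by relevance score scaled by a per-guide-type multiplier.

    `hits` is a list of (title, guide_type, relevance_score) tuples.
    Unknown guide types get a neutral multiplier of 1.0.
    """
    return sorted(
        hits,
        key=lambda h: h[2] * TYPE_BOOST.get(h[1], 1.0),
        reverse=True,
    )


# Made-up scores: the reference guide wins on raw relevance alone.
hits = [
    ("Dev Services Overview", "reference", 4.0),
    ("Your second Quarkus application", "tutorial", 3.0),
]

# After the boost, the tutorial scores 3.0 * 1.5 = 4.5 and moves to the top.
print([title for title, _, _ in rerank(hits)])
```

This avoids manual curation per guide, but it still depends on each guide carrying a reliable type in its metadata.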

@yrodiere
Member

yrodiere commented Nov 9, 2023

you have to scroll and trigger loading of additional results

Given the number of results is finite, should we always display all results?

"Finite" is still up to ~220 (worst case for a search for a particular version) and counting... and you asked me to include titles and even more info in the JSON. And people are already asking to integrate Quarkiverse in the results.

There's a compromise to be found, sure, but I don't think returning all hits is future-proof.

@yrodiere
Member

yrodiere commented Nov 9, 2023

Topics are not keywords, they are topics. They are designed to look nice in a tag list or something.

My point was that we do match against a full-text "topics" field and we do apply a higher boost compared to the content of a guide. So we might want it to... actually match?

We could make it dev-services if you prefer.

That would work, but I'll probably need to work on analyzers anyway, if only to handle users typing devservices in the search box.
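
To make the boosting point concrete, here is a minimal sketch of an Elasticsearch-style `multi_match` query body where a topics field weighs more than the guide content. The field names `topics` and `content` and the boost value are assumptions for illustration; the actual index schema and boosts may differ.

```python
def build_query(user_input, topics_boost=3.0):
    """Build a multi_match query body that boosts the topics field.

    The "field^boost" syntax multiplies the score contribution of that field.
    A boost only helps if the analyzed query actually matches the field, which
    is why "devservices" in the topics does nothing for a "dev services" query.
    """
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                # Field names are hypothetical.
                "fields": [f"topics^{topics_boost}", "content"],
            }
        }
    }


print(build_query("dev services"))
```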

@yrodiere
Member

yrodiere commented Nov 9, 2023

Thanks @holly-cummins and @gsmet, then I'll look into improving relevance first, and we'll try to "feature" this guide when we work on page ranks (it won't be clear-cut because the relevance sort is necessarily fuzzy, but that should at least improve things a bit).

yrodiere added a commit to yrodiere/search.quarkus.io that referenced this issue Nov 9, 2023
To make it extra unlikely that the guide one's looking for is not in the
first page.

Compromise to address quarkusio#40 (comment)
@gsmet
Member

gsmet commented Nov 9, 2023

Topics can be used to improve ranking but they are not designed for this purpose. That's what I was saying.

@yrodiere
Member

Regarding this:

Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to use "devservices", without a space, in the :topic: metadata.

These filters may be relevant (from most likely to help to least likely):

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hyp-decomp-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-dict-decomp-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-common-grams-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html (would only work for e.g. `DevServices`, not `devservices`)

I created #59 to address this specifically.
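
For comparison with the decompounding filters linked above, the plain synonym approach from the earlier suggestion could look roughly like this as an Elasticsearch index body. This is a sketch, not the project's configuration: the names `devservices_synonyms` and `guide_search` and the `topics` field are assumptions. It uses the `synonym_graph` token filter in a search-time analyzer, so existing content would not need different index-time analysis.

```python
import json

# Hypothetical names throughout: "devservices_synonyms", "guide_search" and the
# "topics" field are assumptions, not the project's actual settings.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "devservices_synonyms": {
                    "type": "synonym_graph",
                    # Equivalent terms: a query using any one of them
                    # is expanded to all of them at search time.
                    "synonyms": [
                        "dev services, dev service, devservices, devservice"
                    ],
                }
            },
            "analyzer": {
                "guide_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "devservices_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "topics": {
                "type": "text",
                "analyzer": "standard",            # used when indexing
                "search_analyzer": "guide_search",  # used when querying
            }
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```

Because the standard tokenizer splits on hyphens, a query for `dev-services` would be tokenized as `dev services` and should be covered by the same rule. A hand-maintained synonym list only covers the terms someone thought to add, which is the trade-off against the dictionary-based decompounders.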

@gsmet
Member

gsmet commented Nov 23, 2023

@yrodiere note that IIRC, I changed things to dev-services in the topics now.

@yrodiere yrodiere changed the title on Nov 27, 2023, from: Sub-optimal search result: "dev services" does not find the tutorial that talks most about dev services; to: Sub-optimal search result: "dev services" does not rank high enough the tutorial that talks most about dev services
@yrodiere yrodiere added and removed the help wanted label Dec 6, 2024
@yrodiere yrodiere modified the milestone: MVP Dec 6, 2024