Search history completions #419
Comments
I would prefer to keep it entirely client-side. I do not want the server to store or have any access to search query history.
It's ultimately limited by the max query string length supported by Bandit. OpenSearch does not receive the query string directly--it is forwarded through a parser in the application to transform the query string into a query tree for sending to the search server, and for post-search evaluation of spoilers.
There are no metrics, since I have always been opposed to logging user search queries.
I see. Anyhow, the metrics I'm talking about, like "query length", don't require logging the query contents. It's just a numeric metric, a histogram or a heatmap, i.e. we can measure the number of queries whose length falls into each range. It could be shown on a pie chart, for example.
Otherwise, do you have any system metrics about request latencies or system load (CPU load, memory usage spikes)?
These metrics are just simple in-memory counters on the backend side (atomic integers/floats) that are flushed to a time-series DB such as Prometheus or VictoriaMetrics and retained for a specified period of time (e.g. only for the last month, half a year, a year or more, depending on your storage availability). So would you be interested in having that instrumentation (if not already)? I can make a separate issue for that.
Other than using client-side storage, I suppose there is nothing more to change?
Motivation
Today custom autocompletions in the search input are disabled by default, mainly because they replace the native browser auto-fill. But the native auto-fill itself probably isn't the actual requirement.
Solution
The requirement is to have a functional search history in completions, something similar to what Google does.
The history should be shown by default when the search input is empty and focused. It should also participate in completions search when the user has typed at least one character.
Ranking
We should rank the search history results by the number of times the search was invoked. The search history results should always take precedence over the default tag completions, since it's very likely the users have a set of "favourite" queries they repeat again and again.
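The ranking rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `HistoryEntry` and `Suggestion` shapes, the `rankSuggestions` name, the alphabetical tie-break, and the de-duplication of a tag that equals a history entry are all hypothetical and do not come from the existing codebase.

```typescript
// Hypothetical shapes; neither type exists in the current code.
interface HistoryEntry {
  query: string;
  useCount: number; // how many times this query was submitted
}

type Suggestion =
  | { kind: "history"; text: string }
  | { kind: "tag"; text: string };

// Plain code-unit comparison, used only to break ties deterministically.
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// History entries come first (most used first), followed by the tag
// completions in the order they were given.
function rankSuggestions(
  history: HistoryEntry[],
  tagCompletions: string[],
  limit: number,
): Suggestion[] {
  const rankedHistory = history
    .slice() // copy, so the caller's array is not reordered
    .sort((a, b) => b.useCount - a.useCount || compareText(a.query, b.query))
    .map((e): Suggestion => ({ kind: "history", text: e.query }));

  // Assumption: a tag identical to a history entry is shown only once.
  const historyTexts = rankedHistory.map((s) => s.text);
  const tags = tagCompletions
    .filter((t) => historyTexts.indexOf(t) === -1)
    .map((t): Suggestion => ({ kind: "tag", text: t }));

  return rankedHistory.concat(tags).slice(0, limit);
}
```

Keeping the `kind` on each suggestion lets the UI render history items differently from tag completions, for example with a remove button.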
Manual deletion
There should be a button somewhere on the search history completion item to remove it from history, as in Google's search suggestions.
This is useful when you want to remove some search history item because it contains typos, or maybe you just want to hide something from somebody 😳.
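A minimal sketch of what recording and manually removing a history item could look like on the client side. Everything named here is an assumption for illustration: the `KeyValueStore` interface (a subset of the Web Storage API that `window.localStorage` satisfies), the `"search-history"` key, the `HistoryEntry` shape, and the function names.

```typescript
// The subset of the Web Storage API this sketch needs.
// In a browser, `window.localStorage` satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical entry shape and storage key.
interface HistoryEntry {
  query: string;
  useCount: number;
}
const STORAGE_KEY = "search-history";

function loadHistory(store: KeyValueStore): HistoryEntry[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return [];
    // Drop anything that does not look like an entry.
    return parsed.filter(
      (e): e is HistoryEntry =>
        typeof e === "object" &&
        e !== null &&
        typeof e.query === "string" &&
        typeof e.useCount === "number",
    );
  } catch {
    return []; // corrupted JSON: fall back to an empty history
  }
}

function saveHistory(store: KeyValueStore, entries: HistoryEntry[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify(entries));
}

// Called when a search is submitted: bump the counter or add a new entry.
function recordSearch(store: KeyValueStore, query: string): void {
  const entries = loadHistory(store);
  const existing = entries.filter((e) => e.query === query)[0];
  if (existing) {
    existing.useCount += 1;
  } else {
    entries.push({ query, useCount: 1 });
  }
  saveHistory(store, entries);
}

// Called by the remove button on a history completion item.
function removeFromHistory(store: KeyValueStore, query: string): void {
  saveHistory(store, loadHistory(store).filter((e) => e.query !== query));
}
```

In the browser this would be called as `recordSearch(window.localStorage, query)`. Passing the store in explicitly keeps the logic testable without a DOM.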
Matching
Currently, tag completions are formed based on prefix match. I think that's rather inconvenient even for that purpose, but for the search history it will be even more of a problem, because search queries may contain multiple tags and even complex conditions. I think a fuzzy search like the one `fzf` implements would be much nicer overall. This may be rather difficult to implement, and thus may be discussed/done separately.
For an easy start we may just use a case-insensitive substring match, i.e. `ILIKE %_%` logic.
Server-side vs client-side
We may provide server-side search history storage for logged-in users only. The server-side storage will provide a shared history across all user devices. Mainly - desktop + phone or multiple desktops. This is how Google does it and it's very convenient.
Anonymous users will have to enjoy the client-side `localStorage` solution only, and it'll serve them as an additional incentive to create an account.
However, logged-in users will also rely on client-side `localStorage` as a cache for search history until the server-side history is loaded.
Limits
We don't want the search history to grow unboundedly for the reasons of performance and fair storage usage.
Maximum search query length
What is the maximum search query length today? Is there a limit at all? If not, we should probably set it to a reasonably big number of characters to prevent misuse and DDoS attack vectors. Even if there is no official limit today, there is definitely an implicit system limit imposed by OpenSearch (I suppose this is what's used to implement search queries); beyond it, the backend fails with an OpenSearch error about the maximum character count.
As a more flexible solution, we may allow very long queries but ignore search queries exceeding some UTF-8 length (e.g. 500 bytes) for the purposes of search history bookkeeping. I'm speculating on the best max length here, since I don't have any statistics about average query lengths. Do we have any metrics about that? Is there any time-series metrics instrumentation in the backend? I don't see any Prometheus/VictoriaMetrics nodes defined in the `docker-compose.yml`, so I suppose some of the existing statistics are collected via Postgres/OpenSearch?
Maximum search items count
We should keep at most the `N` most actively used search items. The number should be something reasonable to make sure that we both store enough history that all actively used queries always appear in completions, and that rare/one-off/mistake queries are eventually removed from the storage.
This number must correlate with the max search query length, as their product defines the maximum storage space we need to allocate per user to store their search history.
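The two limits and their product could be expressed roughly like this. It is a sketch under assumptions: `HistoryLimits`, `HistoryEntry` and the function names are hypothetical, and no value for `N` has been agreed, so the limits are passed in rather than hard-coded. The byte length is computed by hand so the snippet has no runtime dependencies; for well-formed strings it matches `new TextEncoder().encode(s).length`.

```typescript
interface HistoryEntry {
  query: string;
  useCount: number;
}

interface HistoryLimits {
  maxQueryBytes: number; // e.g. 500, the example figure from the text
  maxEntries: number; // the "N" from the text
}

// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    const next = s.charCodeAt(i + 1); // NaN past the end of the string
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Overly long queries still run as searches; they are just not recorded.
function isRecordable(query: string, limits: HistoryLimits): boolean {
  return utf8ByteLength(query) <= limits.maxQueryBytes;
}

// Keeps only the `maxEntries` most used entries that fit the length limit.
function enforceLimits(entries: HistoryEntry[], limits: HistoryLimits): HistoryEntry[] {
  return entries
    .filter((e) => isRecordable(e.query, limits))
    .sort((a, b) => b.useCount - a.useCount)
    .slice(0, limits.maxEntries);
}

// Upper bound on raw query text per user: the product of the two limits.
function maxQueryBytesPerUser(limits: HistoryLimits): number {
  return limits.maxQueryBytes * limits.maxEntries;
}
```

One caveat with cutting purely by use count: once the history is full, a brand-new query starts with a count of 1 and is the first candidate for eviction, so some recency tie-break would probably be needed in practice.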
Adoption. Migration to opt-out (again)
The eventual goal is to have custom completions enabled by default for everyone, including for people who objected to the feature for the reason of losing the browser auto-fill. To do that we may keep completions disabled by default for some time, but collect the search history regardless. For example, we can give it a month in this state, collect the search history from users, and then enable completions by default. On that day users will open Derpibooru and see the search history that is most likely already good enough, since it contains their most popular queries from the last month.
Unresolved Questions
Should we make it possible to disable the search history? Will there be anyone who'd want to disable it? If so, why? Privacy concerns when sharing a screen or a Derpi account, maybe? Will they want to disable only the search history and not the entire autocompletions feature?
What's the best server-side API for this? Should we just load the entire history in one request and do client-side filtering? I guess in the worst case the history may be on the order of hundreds of kilobytes, so I'd vote for that instead of debounced on-the-fly completion requests, which may be slower and incur more load.
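If the whole history is loaded once and filtered on the client, the case-insensitive substring match proposed under Matching could be as small as the sketch below. The function names are hypothetical, and unlike SQL `ILIKE`, the characters `%` and `_` in the input are matched literally here.

```typescript
// Case-insensitive substring test, the client-side analogue of
// wrapping the input in `%...%` for an ILIKE comparison.
function matchesSubstring(query: string, input: string): boolean {
  return query.toLowerCase().indexOf(input.toLowerCase()) !== -1;
}

// Filters an already loaded history list against what the user has typed.
// An empty input matches every entry, which fits the "show history when
// the input is empty and focused" behaviour.
function filterHistory(history: string[], input: string): string[] {
  return history.filter((query) => matchesSubstring(query, input));
}
```

Because the match is isolated in one function, it could later be swapped for a fuzzy matcher without touching the callers.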
What's the optimal DB and DB layout for this? If we speculate that the search history should not be bigger than a few hundred kilobytes, then we can store it in an embedded Postgres array field. We may even store it inside the users table, but we need to make sure the search history is not loaded unnecessarily (e.g. via `SELECT *` somewhere). However, could this decrease the performance of queries that don't load the search history from the users table? I'm not sure how Postgres stores columns (SoA or AoS?), so I need to check the perf. implications of that layout. Alternatively, search history can be stored in a separate table with a one-to-one relationship to users. WDYT? Or maybe a classic many-to-one will be fine here? I just think that storing the entire history in one column may yield better performance if we are going to return the entire history for client-side filtering anyway, but I don't have enough experience with Postgres yet to say for sure.