Filtering large topics is extremely slow #4107

Closed
2 tasks done
andreaippo opened this issue Aug 10, 2023 · 6 comments
Labels
scope/backend, status/wontfix (This will not be worked on), type/enhancement (An enhancement to an already existing feature)

Comments

@andreaippo

Issue submitter TODO list

  • I've searched for already existing issues here
  • I'm running a supported version of the application which is listed here and the feature is not present there

Is your proposal related to a problem?

Filtering large topics is extremely slow

Describe the feature you're interested in

I have a topic with lots of messages (tens of millions, don't ask... LOL).
I have noticed that filtering is extremely slow compared to Offset Explorer.
Maybe that's because OE lets you specify how many messages you want to fetch (e.g. the newest/oldest 1k).
For some kinds of analysis (e.g. searching for duplicates) you don't need an exhaustive search that yields ALL the messages; a search over the last X messages could be enough.

So I guess that what I'm asking is, could we have a way to limit the scope of a search to the oldest (or newest) n (user-defined) messages of a topic?
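One way to express this request: for each partition, compute the offset range that covers at most n messages from either end, and scan only that range. The sketch below is hypothetical (`scan_windows` and `OffsetWindow` are illustrative names, not kafka-ui or Offset Explorer APIs). It assumes offsets are contiguous; on compacted topics or topics with transaction markers, a window may hold fewer than n actual records. In practice the begin/end offsets would come from the consumer (e.g. the Java client's `beginningOffsets`/`endOffsets`), with end offsets being exclusive.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OffsetWindow:
    partition: int
    start: int  # first offset to read (inclusive)
    stop: int   # offset to stop before (exclusive)


def scan_windows(
    begin_offsets: Dict[int, int],
    end_offsets: Dict[int, int],
    n: int,
    newest: bool = True,
) -> List[OffsetWindow]:
    """Per partition, the offset range covering at most n messages.

    Assumes contiguous offsets; end offsets are exclusive (the offset the
    next produced message would receive).
    """
    windows = []
    for partition in sorted(end_offsets):
        begin, end = begin_offsets[partition], end_offsets[partition]
        if newest:
            # newest n: step back n offsets from the end, but not past the beginning
            start, stop = max(begin, end - n), end
        else:
            # oldest n: start at the beginning, but not past the end
            start, stop = begin, min(end, begin + n)
        windows.append(OffsetWindow(partition, start, stop))
    return windows


# Hypothetical offsets for a two-partition topic
begin = {0: 0, 1: 500}
end = {0: 25_000, 1: 3_000}
print(scan_windows(begin, end, 10_000))
# [OffsetWindow(partition=0, start=15000, stop=25000), OffsetWindow(partition=1, start=500, stop=3000)]
print(scan_windows(begin, end, 10_000, newest=False))
# [OffsetWindow(partition=0, start=0, stop=10000), OffsetWindow(partition=1, start=500, stop=3000)]
```

Partition 1 holds only 2,500 offsets, so both modes cover all of it; the cap only bites on partitions larger than n.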

Thanks

Describe alternatives you've considered

No response

Version you're running

56fa824 v0.7.1

Additional context

No response

@andreaippo andreaippo added the status/triage and type/feature labels Aug 10, 2023

@andreaippo
Author

Screenshot from OE:

[Screenshot: Offset Explorer prompting for Max Messages (per partition)]

@Haarolean
Contributor

@andreaippo can you take a look at the master-tagged image? Does it improve there?

@andreaippo
Author

andreaippo commented Sep 11, 2023

Hi! Yes, thanks, it's considerably better!

For reference, Offset Explorer took 48 seconds to run that search, limited to the newest 10k messages per partition (with 9 partitions on my topic, that means 90k messages read, I suppose).
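The 90k figure is an upper bound: a partition holding fewer than 10k messages contributes only what it has. A minimal sketch of that arithmetic; the partition sizes below are hypothetical, and `messages_read` is an illustrative helper, not part of either tool.

```python
from typing import Iterable


def messages_read(partition_sizes: Iterable[int], max_per_partition: int) -> int:
    """Records read when each partition is capped at max_per_partition."""
    return sum(min(size, max_per_partition) for size in partition_sizes)


print(messages_read([1_000_000] * 9, 10_000))            # 90000
print(messages_read([1_000_000] * 8 + [4_000], 10_000))  # 84000
```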

On kafka-ui the search also took 48-49 seconds, but it returned MORE results than OE.

I think OE forcing you to choose the number of messages per partition beforehand limits the scope of the search, which also means you can miss matching results. I like kafka-ui's approach better since, as far as I understand, it scans the full contents of all partitions of the selected topic.

Could you please confirm whether that is the case? I don't see any option to specify the number of messages to fetch (see the earlier screenshot of OE asking for Max Messages (per partition)).

That would clearly make kafka-ui the superior choice :)

Also, what is the effect of "Newest first" vs. "Oldest first" when filtering? Is it just a matter of ordering the results?

Meanwhile thanks for your reply :)

@Haarolean
Contributor

Yes, we don't have an obvious up-front limit like other tools do; instead we have pagination (which is currently broken and will be fixed in #3504). We've considered dropping pagination in favor of a fetch limit, which does make sense: it's not like you'd need to scroll through 500+ messages in one search, but that's another story.

Regarding the modes: no, it's not just ordering, it affects the search as well. "Oldest first" means we scan the topic from the oldest offsets to the newest, and vice versa for "newest first".
These modes on the right side are actually tied to the modes on the left; in v2 it's gonna look like this:
[Screenshot of the v2 UI]
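To see why the scan direction changes which messages are found, not just their order, once a limit is involved, here is a toy model of a single partition as a list indexed by offset. `filter_with_limit` is a hypothetical helper, not kafka-ui's implementation: it scans in the chosen direction and stops after `limit` matches, which is roughly what a fetch limit (or a single page of results) does.

```python
from typing import Callable, List, Tuple


def filter_with_limit(
    records: List[str],
    predicate: Callable[[str], bool],
    limit: int,
    newest_first: bool,
) -> Tuple[List[str], int]:
    """Scan one partition (list index == offset), stopping after `limit` matches.

    Returns the matches in scan order and how many records were read.
    """
    if limit <= 0:
        return [], 0
    ordered = reversed(records) if newest_first else iter(records)
    matches: List[str] = []
    scanned = 0
    for record in ordered:
        scanned += 1
        if predicate(record):
            matches.append(record)
            if len(matches) == limit:
                break  # limit reached: the rest of the partition is never read
    return matches, scanned


records = ["a1", "b2", "a3", "b4", "a5"]
is_a = lambda r: r.startswith("a")
print(filter_with_limit(records, is_a, 2, newest_first=False))  # (['a1', 'a3'], 3)
print(filter_with_limit(records, is_a, 2, newest_first=True))   # (['a5', 'a3'], 3)
```

Both runs read three records, but "oldest first" never sees `a5` and "newest first" never sees `a1`, so the direction decides which part of the topic the limited search covers.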

@Haarolean Haarolean closed this as not planned Sep 11, 2023
@Haarolean Haarolean added the type/enhancement, status/wontfix and scope/backend labels and removed the status/triage and type/feature labels Sep 11, 2023
@andreaippo
Author

Ok thanks a lot, it's clear!
