High number of memory allocations when parsing timeago dates #1037

FireMasterK · 2023-03-06T03:18:10Z

The cause is this line (line 76 of NewPipeExtractor/extractor/src/main/java/org/schabi/newpipe/extractor/localization/TimeAgoParser.java at commit 5a9b6ed):

.anyMatch(agoPhrase -> textualDateMatches(textualDate, agoPhrase)))

It looks like compiling a Pattern causes a lot of memory allocations, so we should avoid doing it where possible.
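To make the cost concrete, here is a minimal Java sketch of the expensive shape. It is a hypothetical reconstruction, not the actual TimeAgoParser code: the class NaiveTimeAgoMatcher, the wordSeparator parameter and the exact regex are assumptions. It only illustrates that when a Pattern is compiled inside the per-phrase check, anyMatch can trigger one compilation for every phrase it tries on a single date.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical reconstruction of the expensive shape, not the real TimeAgoParser code.
final class NaiveTimeAgoMatcher {

    // Builds and compiles a brand-new Pattern on every call, even though
    // agoPhrase and wordSeparator do not change between calls.
    static boolean textualDateMatches(String textualDate, String agoPhrase, String wordSeparator) {
        String sep = Pattern.quote(wordSeparator);
        Pattern pattern = Pattern.compile(
                "(^|" + sep + ")" + Pattern.quote(agoPhrase) + "($|" + sep + ")");
        return pattern.matcher(textualDate).find();
    }

    // anyMatch calls textualDateMatches once per phrase until one matches,
    // so a single textual date can cause many compilations.
    static boolean matchesAny(String textualDate, List<String> agoPhrases, String wordSeparator) {
        String lower = textualDate.toLowerCase(Locale.ROOT);
        return agoPhrases.stream()
                .anyMatch(agoPhrase -> textualDateMatches(lower, agoPhrase, wordSeparator));
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("3 days ago", List.of("day", "days"), " "));
    }
}
```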

Screenshot from profiler: (image not available)

lrusso96 · 2023-07-02T13:47:49Z

Also pinging @AudricV, since he recently discussed refactoring the parser in #1068.

@FireMasterK the code is highly inefficient: the pattern is recompiled every time we check for a match (sometimes more than 20 times for a single date!).
The easy fix would be to cache the Pattern and reuse it for every match.
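A minimal sketch of the "cache the Pattern" fix, assuming each phrase must be delimited by a word separator or by the start/end of the string. CachedTimeAgoMatcher and its constructor are illustrative names, not NewPipeExtractor API.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical matcher that compiles each phrase pattern once and reuses it.
final class CachedTimeAgoMatcher {
    private final List<Pattern> phrasePatterns;

    CachedTimeAgoMatcher(List<String> agoPhrases, String wordSeparator) {
        String sep = Pattern.quote(wordSeparator);
        // Compile every phrase pattern exactly once, when the matcher is built.
        this.phrasePatterns = agoPhrases.stream()
                .map(phrase -> Pattern.compile("(^|" + sep + ")"
                        + Pattern.quote(phrase.toLowerCase(Locale.ROOT))
                        + "($|" + sep + ")"))
                .collect(Collectors.toList());
    }

    boolean matchesAny(String textualDate) {
        String lower = textualDate.toLowerCase(Locale.ROOT);
        // Each check still allocates a Matcher (and the lower-cased string),
        // but no new Pattern is compiled.
        return phrasePatterns.stream().anyMatch(p -> p.matcher(lower).find());
    }

    public static void main(String[] args) {
        CachedTimeAgoMatcher days = new CachedTimeAgoMatcher(List.of("day", "days"), " ");
        System.out.println(days.matchesAny("3 days ago"));
        System.out.println(days.matchesAny("2 hours ago"));
    }
}
```

Keeping one such matcher per locale and time unit means the compile cost is paid once, instead of once per phrase for every textual date.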
However, since the pattern in this case is very simple, a regex could be overkill. Take a look at my fork, where I implemented regex-free (yet locale-aware) parsing. It seems to cut memory allocations by roughly 10x.
The source code is not very clean yet, but it is good enough for testing :)
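The regex-free idea can be sketched with plain indexOf/startsWith checks. This is not the code from the fork: RegexFreeTimeAgoMatcher is hypothetical, it treats the separator literally (a " " separator does not also match other whitespace such as non-breaking spaces), and an empty separator, assumed here for languages written without spaces, falls back to a plain contains().

```java
import java.util.List;
import java.util.Locale;

// Hypothetical regex-free matcher: a hit counts only if it is delimited by the
// string boundaries or by wordSeparator on both sides.
final class RegexFreeTimeAgoMatcher {

    static boolean textualDateMatches(String textualDate, String agoPhrase, String wordSeparator) {
        if (agoPhrase.isEmpty()) {
            return false; // guard: indexOf("") would otherwise loop at the end of the string
        }
        if (wordSeparator.isEmpty()) {
            return textualDate.contains(agoPhrase);
        }
        int from = 0;
        while (true) {
            int idx = textualDate.indexOf(agoPhrase, from);
            if (idx < 0) {
                return false;
            }
            int end = idx + agoPhrase.length();
            // startsWith returns false for a negative offset, which covers idx < separator length.
            boolean startOk = idx == 0
                    || textualDate.startsWith(wordSeparator, idx - wordSeparator.length());
            boolean endOk = end == textualDate.length()
                    || textualDate.startsWith(wordSeparator, end);
            if (startOk && endOk) {
                return true;
            }
            from = idx + 1; // a later occurrence may still be properly delimited
        }
    }

    static boolean matchesAny(String textualDate, List<String> agoPhrases, String wordSeparator) {
        String lower = textualDate.toLowerCase(Locale.ROOT); // lower-case the date once
        for (String phrase : agoPhrases) {
            if (textualDateMatches(lower, phrase.toLowerCase(Locale.ROOT), wordSeparator)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("3 days ago", List.of("day", "days"), " "));
    }
}
```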

Some minor notes

I deliberately avoided further optimizations so that the comparison stays fair: the code in the fork is currently semantically equivalent to the code in the dev branch.
Note that I haven't modified parseTimeAgoAmount, which still relies on a regex behind the scenes. Right now, textualDate is parsed multiple times, but it could be parsed once to extract all the necessary information.
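One way to parse textualDate only once is to compute everything the later steps need in a single pass and hand that object around. ParsedTimeAgo is hypothetical, and the amount rule (concatenate all ASCII digits, default to 1 when there are none or the number does not fit in an int) is an assumption about parseTimeAgoAmount's semantics, not its confirmed behavior.

```java
import java.util.Locale;

// Hypothetical holder for everything extracted from a textual date in one pass.
final class ParsedTimeAgo {
    final String lowerText; // lower-cased once, reused by every phrase check
    final int amount;       // numeric amount, extracted once without a regex

    private ParsedTimeAgo(String lowerText, int amount) {
        this.lowerText = lowerText;
        this.amount = amount;
    }

    static ParsedTimeAgo parse(String textualDate) {
        String lower = textualDate.toLowerCase(Locale.ROOT);
        long value = 0;
        boolean sawDigit = false;
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (c >= '0' && c <= '9') {
                sawDigit = true;
                value = value * 10 + (c - '0');
                if (value > Integer.MAX_VALUE) {
                    // Too large for an int: fall back to the assumed default of 1.
                    return new ParsedTimeAgo(lower, 1);
                }
            }
        }
        // No digits at all (e.g. "a day ago"): assume an amount of 1.
        return new ParsedTimeAgo(lower, sawDigit ? (int) value : 1);
    }

    public static void main(String[] args) {
        ParsedTimeAgo parsed = ParsedTimeAgo.parse("3 Days ago");
        System.out.println(parsed.lowerText + " / " + parsed.amount);
    }
}
```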

lrusso96 · 2023-10-14T06:24:28Z

@FireMasterK have you had time to take a look at it?
Also, the recent changes in #1082 likely have similar memory allocation issues.

FireMasterK added the enhancement and youtube service (https://www.youtube.com/) labels on Mar 6, 2023.
