[Jimaku provider] Search fails because of year in title #2704

Open
sdaqo opened this issue Oct 8, 2024 · 21 comments

Comments

@sdaqo

sdaqo commented Oct 8, 2024

Describe the bug
Hi there! I am currently trying to use the jimaku provider, but I am having some problems. If the anidb/anilist refiner does not work, i.e. it is not able to extract an anilist ID for whatever reason, the provider (jimaku) will try to search by name. When it searches by name, however, the year is always included (e.g. Alya Sometimes Hides Her Feelings in Russian 2024). This is problematic because the jimaku API then returns no results: it does not expect a year in the title.

I assume the reason why the ID can not be extracted sometimes is some kind of ANIDB ban, I hear they are notorious for their ancient rate limiting. Besides that I found an odd case where it did find an anidb ID but only for episodes 4,5, this happened for "Alya Sometimes Hides Her Feelings in Russian". I will attach some log files, but this issue is probably not related.

Another assumption would be that it does find an anidb ID but is not able to match it to an anilist one (I assume you are using this for mapping: https://github.com/Fribb/anime-lists), which could of course also result in no anilist ID. That said, I think analyzing this is out of scope here.

To Reproduce
Steps to reproduce the behavior:

  1. Add the jimaku provider
  2. Try to search a subtitle for an episode/series
  3. ANIDB (and therefore anilist) id not found
  4. Jimaku searches for "< anime name > < year >"
  5. No results because jimaku does not have years in the name

Expected behavior
I expect the jimaku provider to remove the year when searching.
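
To illustrate the expected behavior, here is a minimal sketch of stripping a trailing year from a search query before it is sent to Jimaku. The helper name `strip_trailing_year` is hypothetical and not part of Bazarr or the Jimaku provider; it only assumes that the year appears at the end of the query, as in the log line quoted further down.

```python
import re

# Hypothetical helper, not taken from the Bazarr code base.
# Matches a four-digit year (19xx or 20xx), optionally in parentheses,
# at the very end of the string, including the whitespace before it.
_TRAILING_YEAR = re.compile(r"\s*\(?\b(19|20)\d{2}\b\)?\s*$")


def strip_trailing_year(query: str) -> str:
    """Return the query without a trailing year such as '2024' or '(2024)'."""
    cleaned = _TRAILING_YEAR.sub("", query).strip()
    # If the whole query was just a year, keep it rather than search for nothing.
    return cleaned or query.strip()


print(strip_trailing_year("alya sometimes hides her feelings in russian 2024"))
```

A caveat with this approach: titles that legitimately end in a number that looks like a year would be shortened as well, so a real fix would probably need more context than the query string alone.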

Software (please complete the following information):

  • Bazarr: v1.4.5
  • Radarr version: I don't use it
  • Sonarr version: 4.0.9.2244
  • OS: In a docker container on linux.

Additional context

Here are some logs:

  1. Episode 1 of alya: bazarr (alya episode 1).log - this fails because there is no anilist ID, so it falls back to searching by name + year.
    I want to point out line 44 in this one:
    2024-10-08 16:01:30|INFO    |subliminal_patch.providers.jimaku|Will search for entry based on params: {'query': 'alya sometimes hides her feelings in russian 2024'}|
  2. Episode 4 of alya: bazarr (alta episode 4).log - this works because it can find an anidb ID and therefore an anilist ID.
    Here we can see it searches by anilist ID:
    2024-10-08 16:01:47|INFO    |subliminal_patch.providers.jimaku|Will search for entry based on params: {'anilist_id': 162804}|

@anderson-oki
Collaborator

anderson-oki commented Oct 9, 2024

@sdaqo

> I assume the reason why the ID can not be extracted sometimes is some kind of ANIDB ban, I hear they are notorious for their ancient rate limiting. Besides that I found an odd case where it did find an anidb ID but only for episodes 4,5, this happened for "Alya Sometimes Hides Her Feelings in Russian". I will attach some log files, but this issue is probably not related.

Thank you for reporting!

First of all, an AniDB ban would be explicitly printed in the logs with a "Banned" status, and requests would then be throttled for 24 hours. Also, if Bazarr is the only thing using your AniDB HTTP client, Bazarr is very strict about staying within 190 requests per day and caches aggressively to avoid hitting the rate limit.

Secondly, Jimaku does not use the AniDB API, only the GitHub XML mapping you pointed out. So the above wouldn't be an issue even if you were banned.

@anderson-oki
Collaborator

anderson-oki commented Oct 9, 2024

@ThisIsntTheWay

By any chance, do you have any time to help investigate this issue? I believe there are 2 things:

1 - Just quickly check why the mapping didn't work? The logs seem to contain the anime name and the IDs necessary to investigate.
2 - I guess the Jimaku search we use as a fallback could be wrong.

@sdaqo
Author

sdaqo commented Oct 11, 2024

Yeah I will look into it if there is some free time!
Edit: whoops, just noticed that you mentioned someone, I thought you meant me, haha!

@ThisIsntTheWay
Contributor

ThisIsntTheWay commented Oct 12, 2024

> @ThisIsntTheWay
>
> By any chance, do you have any time to help investigating this issue, i believe there are 2 things:
>
> 1 - Just quickly check why the mapping didn't work? The logs seems to contain the anime name and the ids necessary to investigate.
> 2 - I guess the search for Jimaku could be wrong, the one we use for fallback.

Sorry for the late response, I should figure out why GH doesn't notify me when I'm mentioned somewhere...

Regarding point 1: I see that no series_anidb_* fields have been populated, so the anidb refiner failed to determine IDs (and by extension, anilist).
The reason that it failed to determine an AniDB ID is explained below, as the same issue has affected the name fallback as well.

Regarding point 2: The fallback name is assembled as <show_title> <season> because multi-season series on Jimaku follow the same scheme.
However, in this specific situation, the GuessIt module incorrectly identified this episode's year as the season.
As my fallback uses this field, the fallback name was incorrectly assembled.

  • The difference between the file name of episode 1 and episode 4 is that ep1 has its episode number as a single number 01 whereas ep4 uses S01E04.
  • *The AniDB refiner also uses the season property to parse the mapper XML

That said, we could make this fallback a bit more robust though. No show has 2024 seasons (except maybe One Piece in a few years :P)
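
As a rough sketch of what a more robust fallback could look like: the query is still assembled as `<show_title> <season>`, but the season is dropped when it is implausible or simply equals the guessed year. The function name, the parameters, and the cutoff of 100 are all assumptions made for illustration; the actual provider code may be structured differently.

```python
from typing import Optional

# Assumed cutoff for illustration only: no show is expected to have
# anywhere near this many seasons, so larger values are treated as misparsed.
MAX_PLAUSIBLE_SEASON = 100


def build_fallback_query(title: str, season: Optional[int], year: Optional[int] = None) -> str:
    """Assemble '<show_title> <season>', ignoring a season that looks like a year."""
    season_is_suspect = (
        season is None
        or season > MAX_PLAUSIBLE_SEASON
        or season == year
    )
    if season_is_suspect:
        # Fall back to the bare title instead of appending a bogus season.
        return title
    return f"{title} {season}"


print(build_fallback_query("alya sometimes hides her feelings in russian", 2024, 2024))
```

With the values GuessIt reported for episode 1 (season 2024, year 2024), this yields the bare title rather than a query ending in 2024.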

@ThisIsntTheWay
Contributor

ThisIsntTheWay commented Oct 12, 2024

@sdaqo I believe if you use the same naming scheme of episode 4 for episode 1 then you'll be able to match subtitles again.
I've also seen improvements with similar faulty guesses if the parent folder simply contained the show name without the year.

Seeing as guessit is an external module, we can only raise an issue regarding this inaccuracy in their repository (whose last commit was 10 months ago 🤔)

@anderson-oki
Collaborator

anderson-oki commented Oct 12, 2024

Yeah, I see. It is indeed related to the filename; there are even some other similar cases, like guessit-io/guessit#774.

@sdaqo
Author

sdaqo commented Oct 12, 2024

> @sdaqo I believe if you use the same naming scheme of episode 4 for episode 1 then you'll be able to match subtitles again. I've also seen improvements with similar faulty guesses if the parent folder simply contained the show name without the year.
>
> Seeing as guessit is an external module, we can only raise an issue regarding this inaccuracy in their repository (whose last commit was 10 months ago 🤔)

All the media is managed by sonarr so naming should be the same?
Here is the listing of all the filenames of that particular show:

'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E01 - 001 - Alya Hiding Her Feelings in Russian [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E02 - 002 - What is a Childhood Friend Really [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E03 - 003 - And So They Met [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E04 - 004 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E05 - 005 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E06 - 006 - A Kiss of the Indirect Variety [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E07 - 007 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E08 - 008 - The Student Council Assembly [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E09 - 009 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E10 - 010 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E11 - 011 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'
'Alya Sometimes Hides Her Feelings in Russian (2024) - S01E12 - 012 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv'

The parent folder is named: Alya Sometimes Hides Her Feelings in Russian (2024), this is also managed by sonarr...

I also noticed something weird in the api response for the episodes, for ep 4 and 5 (the ones that work properly) the field "sceneName" is null, is this maybe a hint?
api-res.json

> Seeing as guessit is an external module, we can only raise an issue regarding this inaccuracy in their repository (whose last commit was 10 months ago 🤔)

I also tried running the filenames through guessit:
Episode 1:

For: /home/paul/drive-mount/homeserver-share/media/videos/shows/anime-sonarr-collection/Alya Sometimes Hides Her Feelings in Russian (2024)/Alya Sometimes Hides Her Feelings in Russian (2024) - S01E01 - 001 - Alya Hiding Her Feelings in Russian [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv
GuessIt found: {
    "title": "Alya Sometimes Hides Her Feelings in Russian",
    "year": 2024,
    "season": 1,
    "episode": 1,
    "episode_title": "001 - Alya Hiding Her Feelings in Russian",
    "source": "HDTV",
    "screen_size": "1080p",
    "color_depth": "8-bit",
    "video_codec": "H.264",
    "audio_codec": "AAC",
    "audio_channels": "2.0",
    "language": "Japanese",
    "release_group": "SubsPlease",
    "container": "mkv",
    "mimetype": "video/x-matroska",
    "type": "episode"
}

Episode 4:

For: /home/paul/drive-mount/homeserver-share/media/videos/shows/anime-sonarr-collection/Alya Sometimes Hides Her Feelings in Russian (2024)/Alya Sometimes Hides Her Feelings in Russian (2024) - S01E04 - 004 - TBA [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv
GuessIt found: {
    "title": "Alya Sometimes Hides Her Feelings in Russian",
    "year": 2024,
    "season": 1,
    "episode": 4,
    "episode_title": "004 - TBA",
    "source": "HDTV",
    "screen_size": "1080p",
    "color_depth": "8-bit",
    "video_codec": "H.264",
    "audio_codec": "AAC",
    "audio_channels": "2.0",
    "language": "Japanese",
    "release_group": "SubsPlease",
    "container": "mkv",
    "mimetype": "video/x-matroska",
    "type": "episode"
}

This seems correct to me?

@ThisIsntTheWay
Contributor

Hmmm, I feel like your ep1 log has shown a different file?
Inspecting the relevant parts of your logs again, I can see...

2024-10-08 16:01:30|INFO    |subliminal_patch.core           |Determining basic video properties for '[SubsPlease] Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san - 01 (1080p) [2ABB8079].mkv' in '/mnt/media/videos/shows/anime-sonarr-collection/Alya Sometimes Hides Her Feelings in Russian (2024)'|
2024-10-08 16:01:30|DEBUG   |subliminal_patch.core           |GuessIt found: {
    "title": "Alya Sometimes Hides Her Feelings in",
    "language": "Russian",
    "year": 2024,
    "season": 2024,
    ...
}

Note the filename: [SubsPlease] Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san - 01 (1080p) [2ABB8079].mkv.
However, you are testing with: Alya Sometimes Hides Her Feelings in Russian (2024) - S01E01 - 001 - Alya Hiding Her Feelings in Russian [HDTV-1080p][8bit][x264][AAC 2.0][JA]-SubsPlease.mkv.

That one is using a naming scheme that yields higher guessit accuracy.

@sdaqo
Author

sdaqo commented Oct 12, 2024

@ThisIsntTheWay That is why I think the API call I attached has something to do with it. What exactly is the "sceneName" property? OK, I know that this is the original filename before being modified by sonarr, but where exactly does bazarr pull this from, and why is there no "sceneName" for episodes 4/5?

Edit:
If the sceneName property is present it also displays in the ui:
Here episode 4:
image
And episode 1:
image

Edit 2: Just to be clear, the file shown in the logs (/mnt/media/videos/shows/anime-sonarr-collection/Alya Sometimes Hides Her Feelings in Russian (2024)/[SubsPlease] Tokidoki Bosotto Russia-go de Dereru Tonari no Alya-san - 01 (1080p) [2ABB8079].mkv) does not exist.

@morpheus65535
Owner

> why is there no "sceneName" for episode 4/5

It's a bug I've seen a couple of times in Sonarr. They simply do not keep it, so we cannot reuse it.

@ThisIsntTheWay
Contributor

Well, bazarr being fed a phantom path is new to me 🤔
Has a file with that very name ever existed to your knowledge?
Seems to me like it's a file that was returned by an indexer...

Either way I'm not really sure why you're being trolled and I'd need to investigate further.

@sdaqo
Author

sdaqo commented Oct 12, 2024

> Has a file with that very name ever existed to your knowledge?

Well, yes, but this file is in a completely different directory.

Ok so I also checked a little bit and first of all as @morpheus65535 said, the sceneName property in sonarr is not populated for episode 4/5 in this case:
Screenshot of api call in sonarr:
image

I scoured the bazarr code a little bit, and this line in the "get_video" function is probably the "cause": the file name in the path that is passed down to the guessit call gets replaced with the "sceneName" here. That said, there is probably a good reason why this happens, but maybe something could be adjusted there.
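
The substitution described above could look roughly like the following. This is only a sketch of the behavior as discussed in this thread, not the actual "get_video" code: the function name `path_for_guessing` is hypothetical, and it assumes that sceneName comes without a file extension, so the extension of the real file is reused.

```python
import posixpath
from typing import Optional


def path_for_guessing(path: str, scene_name: Optional[str]) -> str:
    """Return the path whose file name will be parsed for metadata.

    Hypothetical helper: if a scene name is known, it replaces the file name
    while the directory and extension of the real file are kept.
    """
    if not scene_name:
        # No scene name from Sonarr: parse the real file name.
        return path
    directory, filename = posixpath.split(path)
    extension = posixpath.splitext(filename)[1]
    # Assumption: scene_name has no extension of its own.
    return posixpath.join(directory, scene_name + extension)


print(path_for_guessing(
    "/mnt/media/Show (2024)/Show (2024) - S01E01 - 001 - Title.mkv",
    "[Group] Show - 01 (1080p)",
))
```

This would also explain why the logs show a path that does not exist on disk: the directory belongs to the renamed file, while the file name is the original release name.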

@morpheus65535
Owner

This is the expected behavior. We use the scene name because the original filename provides more information, before users strip it all out to have something nice and clean to look at...

@sdaqo
Author

sdaqo commented Oct 12, 2024

@morpheus65535 This makes sense, but in this case the modified filename has more info. Now, how can it figure out which of the names is better suited for parsing? I do not know, but since bazarr is already calling the sonarr/radarr api, why not get the info from there instead of the filename?

@morpheus65535
Owner

> I do not know, but since bazarr is already calling the sonarr/radarr api, why not get the info from there instead of the filename?

Because their API doesn't provide as much info as we get from the original filename.

@morpheus65535
Owner

Is there anything left to fix here? I read the whole conversation again and I'm still not sure if something should be done...

@sdaqo
Author

sdaqo commented Oct 21, 2024

Until guessit-io/guessit#774 is solved, how about a checkbox in the settings (e.g. in the sonarr settings) that forces the system to always use the current filename? I, for example, almost always have more info in the current filename than in the original, and considering that Sonarr is so inconsistent when it comes to populating the sceneName property, I think it makes sense to have such an option.

@morpheus65535
Owner

But anyway, if there's no scene name provided by Sonarr, Bazarr will use the filename instead, so I'm not sure of the benefit here...

@sdaqo
Author

sdaqo commented Oct 21, 2024

Yeah, that's what I mean. I think it would be beneficial because such an option would always use the filename instead of the scene name. That would be more consistent, as sonarr should always name the files in the same way and users can tweak the naming themselves. I am aware that the scene name is necessary for some lookups (e.g. someone uploaded a subtitle file with the exact same name as the scene name), but as this is only for the info extraction, I think it would make sense. Moreover, there are providers, like jimaku, that rely on the extracted information, and I think giving the user the option here would make sense.

morpheus65535 added a commit that referenced this issue Oct 24, 2024
@morpheus65535
Owner

I've modified Bazarr to use both when available. It will now guess using the file name first, then, if available, guess using the scene name. We should get the best of both worlds with that. Let me know if the upcoming beta improves your results.
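
For readers following along, one way to picture "use both" is to merge the two guesses. The sketch below is an assumption about how such a merge could work, not a description of the referenced commit: here the file-name guess wins on conflicts and the scene-name guess only fills in properties that are missing. The function name `combine_guesses` is hypothetical.

```python
from typing import Any, Dict, Optional


def combine_guesses(
    file_guess: Dict[str, Any],
    scene_guess: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge two metadata guesses (hypothetical helper).

    Assumption: values guessed from the current file name take priority;
    the scene-name guess only contributes keys the file name did not yield.
    """
    combined = dict(file_guess)
    for key, value in (scene_guess or {}).items():
        combined.setdefault(key, value)
    return combined


file_guess = {"title": "Alya Sometimes Hides Her Feelings in Russian", "year": 2024, "season": 1, "episode": 1}
scene_guess = {"title": "Alya Sometimes Hides Her Feelings in", "language": "Russian", "year": 2024, "season": 2024}
print(combine_guesses(file_guess, scene_guess))
```

Under this assumed priority, the season of 1 from the renamed file survives, while the bogus season of 2024 from the scene-name guess is ignored.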

@sdaqo
Author

sdaqo commented Oct 24, 2024

Awesome, that sounds like a perfect solution, will test it as soon as I have time.

4 participants