Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

save_last_sync try to convert HTTP date header with a localized python date-time parser which is incorrect #916

Open
prahal opened this issue Sep 16, 2024 · 12 comments · May be fixed by #918

Comments

@prahal
Copy link

prahal commented Sep 16, 2024

Describe the bug
When the locale has a month name format "%b" which is not English save_last sync raise an exception because it cannot parse the server-date (server-date is the HTTP header "Date" of the HTTP request, from .kodi/addons/plugin.video.jellyfin/jellyfin_kodi/jellyfin/http.py: self.config.data["server-time"] = r.headers["Date"]).

HTTP Date has three format https://datatracker.ietf.org/doc/html/rfc9110.html#section-5.6.7, the "preferred format is a fixed-length and single-zone subset of the date and time specification used by the Internet Message Format [RFC5322]". That is https://datatracker.ietf.org/doc/html/rfc5322#section-3.3.

strptime is wrong as HTTP Date is not localized, we should use a dedicated function.

Seems the closest to parse HTTP date is to parse it as an email date with email.utils.parsedate_to_datetime. https://stackoverflow.com/questions/1471987/how-do-i-parse-an-http-date-string-in-python (not that I don't have enough reputation to comment on the wrong answers that is rfc822 which is not more appropriate that email.utils as rfc822 is the obsolete RFC for email dates (the current one is https://datatracker.ietf.org/doc/html/rfc2822#section-3.3 which is what email.utils.parse is about), not an RFC for HTTP dates. And the strptime answer that as this bug re'port shows does not work on localized systems, i.e. with an LC_TIME not English).

Also see the discussion in #288 (comment)
I disagree that ISO8601 is the way to go because server-date is about the HTTP date. We will have to parse the HTTP date format either way, so why not keep it the IMF format from RFC5322?

The issue is that we parse a non localized datetime string with a localized function, not that the format is wrong.

But do we agree to keep RFC5322 IMF date format?

Do we agree to use email.utils.parsedate_to_datetime (it is python 3.3 and above) or should I use email.utils.parse and then convert to datetime?

And is it fine to use email.utils.parsedate rfc2822 even if I believe it does not cope with the two obsolete HTTP date formats that the RFC9110 says we MUST be able to read?

  Sunday, 06-Nov-94 08:49:37 GMT   ; obsolete RFC 850 format
  Sun Nov  6 08:49:37 1994         ; ANSI C's asctime() format

Maybe test cases would be good too.
Probably a test case with a locale with non-English month names, one that 2 digit year, a non rfc822 date string. Do you see others?

Edit: wed could also include djjango parse_http_date codfe which seems straightforward:
https://github.com/django/django/blob/stable/5.1.x/django/utils/http.py#L97

MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
__D = r"(?P<day>[0-9]{2})"
__D2 = r"(?P<day>[ 0-9][0-9])"
__M = r"(?P<mon>\w{3})"
__Y = r"(?P<year>[0-9]{4})"
__Y2 = r"(?P<year>[0-9]{2})"
__T = r"(?P<hour>[0-9]{2}):(?P<min>[0-9]{2}):(?P<sec>[0-9]{2})"
RFC1123_DATE = _lazy_re_compile(r"^\w{3}, %s %s %s %s GMT$" % (__D, __M, __Y, __T))
RFC850_DATE = _lazy_re_compile(r"^\w{6,9}, %s-%s-%s %s GMT$" % (__D, __M, __Y2, __T))
ASCTIME_DATE = _lazy_re_compile(r"^\w{3} %s %s %s %s$" % (__M, __D2, __T, __Y))

def parse_http_date(date):
    """
    Parse a date format as specified by HTTP RFC 9110 Section 5.6.7.

    The three formats allowed by the RFC are accepted, even if only the first
    one is still in widespread use.

    Return an integer expressed in seconds since the epoch, in UTC.
    """
    # email.utils.parsedate() does the job for RFC 1123 dates; unfortunately
    # RFC 9110 makes it mandatory to support RFC 850 dates too. So we roll
    # our own RFC-compliant parsing.
    for regex in RFC1123_DATE, RFC850_DATE, ASCTIME_DATE:
        m = regex.match(date)
        if m is not None:
            break
    else:
        raise ValueError("%r is not in a valid HTTP date format" % date)
    try:
        year = int(m["year"])
        if year < 100:
            current_year = datetime.now(tz=timezone.utc).year
            current_century = current_year - (current_year % 100)
            if year - (current_year % 100) > 50:
                # year that appears to be more than 50 years in the future are
                # interpreted as representing the past.
                year += current_century - 100
            else:
                year += current_century
        month = MONTHS.index(m["mon"].lower()) + 1
        day = int(m["day"])
        hour = int(m["hour"])
        min = int(m["min"])
        sec = int(m["sec"])
        result = datetime(year, month, day, hour, min, sec, tzinfo=timezone.utc)
        return int(result.timestamp())
    except Exception as exc:
        raise ValueError("%r is not a valid date" % date) from exc

To Reproduce

  1. Go to Jellyfin plugin in Kodi
  2. Go to your library
  3. Open the context menu on the library and click on Sync
  4. See error in the Kodi log .kodi/temp/kodi.log

Expected behavior

No error in Kodi log and probably a working last sync guard.

Logs

2024-09-16 12:40:45.749 T:740028    info <general>: JELLYFIN.jellyfin_kodi.library -> ERROR::jellyfin_kodi/library.py:536 time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT'
                                                   Traceback (most recent call last):
                                                     File "jellyfin_kodi/library.py", line 530, in save_last_sync
                                                       time_now = datetime.strptime(
                                                                  ^^^^^^^^^^^^^^^^^^
                                                     File "/usr/lib/python3.12/_strptime.py", line 554, in _strptime_datetime
                                                       tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                     File "/usr/lib/python3.12/_strptime.py", line 333, in _strptime
                                                       raise ValueError("time data %r does not match format %r" %
                                                   ValueError: time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT'
                                                   
2024-09-16 12:40:45.753 T:740028    info <general>: JELLYFIN.jellyfin_kodi.library -> INFO::jellyfin_kodi/library.py:541 --[ sync/2024-09-16T10:38:45z ]
2024-09-16 12:40:45.756 T:740028    info <general>: JELLYFIN.jellyfin_kodi.full_sync -> INFO::jellyfin_kodi/full_sync.py:227 Full sync completed in: 0:00:59
2024-09-16 12:40:45.759 T:740028    info <general>: JELLYFIN.jellyfin_kodi.full_sync -> INFO::jellyfin_kodi/full_sync.py:703 --<[ fullsync ]

Screenshots

System (please complete the following information):

  • OS: Debian
  • Jellyfin Version: 10.9.11
  • Kodi Version: 21.1.0
  • Addon Version: v1.0.5+py3
  • Playback Mode: Add-On

Additional context

I tested this that seems to work.

6 import threading
    7 from datetime import datetime, timedelta
    8 from email.utils import parsedate_to_datetime
    9 import queue
   10         


 527     def save_last_sync(self):
  528 
  529         try:
  530             time_now = parsedate_to_datetime(
  531                 self.server.config.data["server-time"].split(", ", 1)[1],
  532                 "%d %b %Y %H:%M:%S GMT",
  533             ) - timedelta(minutes=2)
  534         except Exception as error:
  535 
  536             LOG.exception(error)
  537             time_now = datetime.utcnow() - timedelta(minutes=2)
  538 
  539         last_sync = time_now.strftime("%Y-%m-%dT%H:%M:%Sz")
  540         settings("LastIncrementalSync", value=last_sync)
  541         LOG.info("--[ sync/%s ]", last_sync)
  542 
@oddstr13
Copy link
Member

I haven't dug quite deep enough yet, but using ISO time here would be best.
The requests the dates are coming from are presumably handled by the Jellyfin plugin, KodiSyncQueue.

If not already available somewhere in the API response, it should be added either as a custom header, or an additional field, in the plugin.

TODO: Investigate the relevant API responses and their structure, to see if ISO time is provided, or if the structure of the response allows for an additional field containing the server datetime.
Failing that, a response header with the ISO datetime should be added. X-ISO-8601 or some such if no (defacto) standard header for this exists.

Messing with localized time string shall be avoided if at all possible, only trouble lies down that path.

If I could decide, the whole world would switch to ISO 8601, UTC, with no DST. That is how big of a pain this is.

ISO 8601 was published on 06/05/88 and most recently amended on 12/01/04.
https://xkcd.com/1179/

@oddstr13
Copy link
Member

I should note that this failure to parse datetime shouldn't be an issue unless the server is more than 2min out of sync with the client, something that is highly unlikely (should be near impossible) if both the server and client are properly synced with a time server.

The consequence of the time being sufficiently out of sync (for this particular function anyways) is that content update might be missing during parial sync (events that happened in the window between server time and client time, minus the 2min tolerance that is put in the code here via a timedelta).

The error in the log is otherwise harmless, except from potentially causing slightly slower sync times and larger log files (the error is caught, then logged – should probably get changed to only log the message as a warning, not the whole stack-trace as an error).

@prahal
Copy link
Author

prahal commented Sep 17, 2024

I should note that this failure to parse datetime shouldn't be an issue unless the server is more than 2min out of sync with the client, something that is highly unlikely (should be near impossible) if both the server and client are properly synced with a time server.

OK. So I will lower its priority on my TODO.

Messing with localized time string shall be avoided if at all possible, only trouble lies down that path.

If I could decide, the whole world would switch to ISO 8601, UTC, with no DST. That is how big of a pain this is.

I don't understand. The date string from HTTP is not localized, and is already UTC/GMT (only it is not IS0 8601, but is that critical?).
strptime in save_last_sync expects the current month for a user session with a French locale to be sept. (%b) while the HTTP Date header returns a non localized "Sep"' for the month of September.
Thus, I get:
ValueError: time data '16 Sep 2024 10:40:33 GMT' does not match format '%d %b %Y %H:%M:%S GMT'
because strptime expect
16 sept. 2024 10:40:33 GMT for '%d %b %Y %H:%M:%S GMT
It bugs because the date is not localized, not the opposite.

I agree that with an ISO date we would be able to use a localized date parsing function like strptime without an error. Because there would be no string thus no localization (even though I am not confident numbers are always localized as indian/arabic numbers, then ISO would not help with strptime use).
But in my opinion, the issue is not the date format. It is about using strptime to parse a date that is not localized. I don't think these functions are intended to be used in this context (even for an ISO8601 date format).

But I confirm that with this bug, jellyfin-kodi still works. Might be slower.

I believe the best is to include a function to parse HTTP RFC9110 section 5.6.7 date format like django does. But I am uneasy copying and pasting it into jellyfin-kodi as django is BSD 3-Clause, not GPL v3.
We could ask one to rewrite a similar function based on general guidelines, i.e. parse the date with regex placeholders, support 2 digit years. Or even ask django developers for permission to reuse this code.
But now that I saw their code, I believe I cannot write this function myself (probably I should not have pasted the function here to avoid other having the same issue, do you want me to edit it out?).

In the meantime, we could the closest available function in python libraries, email.utils.parsedate_to_datetime if requiring python 3.3 or above is OK.
Or email.utils.parsedate then convert to datetime.
Are you fine with this option?
Or only an ISO 8601 patch would be merged?

I confirm there is no "8601" string in KodiSyncQueue, so probably no API that returns an X-ISO-8601 date.
But as far as I looked, KodiSyncQeue does add or modify HTTP headers.
I believe the HTTP Date header comes from Microsoft ASP dotnet.

@oddstr13
Copy link
Member

@prahal I realize that you probably wanted to work on this, so I apologize for jumping in and doing it myself 🫠

I've implemented a Server-Time header in KodiSyncQueue, in addition to (probably) fixing the parsing of the standard Date header.

Both PRs could do with some testing, especially the one for this repo, as I've yet to test the different code-paths of it, if you're interested in having a look at that.
Feel free to comment on my code too, if you have questions around any of it 🙂

@prahal
Copy link
Author

prahal commented Sep 19, 2024

@prahal I realize that you probably wanted to work on this, so I apologize for jumping in and doing it myself 🫠

I've implemented a Server-Time header in KodiSyncQueue, in addition to (probably) fixing the parsing of the standard Date header.

Both PRs could do with some testing, especially the one for this repo, as I've yet to test the different code-paths of it, if you're interested in having a look at that.
Feel free to comment on my code too, if you have questions around any of it 🙂

You use strptime in your PR which is wrong to me.
setlocale LC_TIME to mangle srtptime to be non localized is global in scope and is bad practice.
Also I see no added value to switching from IMF date to ISO 8601 date format. It only helps if using srtptime and as strptime use to parse non localized date is wrong, I only see that as making the code more complex for the sake of keeping the use of strptime.

You did not answer my comments about iso 8601 use not adding any value or to the use of strptime being wrong. So I cannot guess why you insist on using them. I am no expert especially on python. But this PR seems wrong to me.

@oddstr13
Copy link
Member

oddstr13 commented Sep 19, 2024

The HTTP date format does not make sense for software-to-software communication, it doesn't now, it didn't then.
ISO 8601 (and in this case a strict subset of) makes much more sense when transferring datetime information over the network.

We could use email.utils.parsedate_to_datetime, or the undocumented http.cookiejar.http2time, but I honestly don't trust them to be available.
We could further parse the string "manually", in a new local utility function like the above two, but I'd rather not.

We're not setting LC_TIME system global, only in the current python context (whatever that means in Kodi), and we re-set it to the previous value once done.
This is further intended as a transition mechanism, as people may be stuck with an old version of the KodiSyncQueue plugin, and I fully intend to remove that particular part some time in the future, relying only on the provided ISO time.

@mcarlton00 could I get your take on this?

@mcarlton00
Copy link
Member

Haven't dug super deeply, but from what I can gather from a quick search through the code this is only an issue for the save_last_sync() function, correct? In which case, it literally doesn't matter what format we save it as, or what format it starts as. It could be unix time for all the good it would do. It's just shoved into a config file and never visible by a user, so whatever is easiest and most consistent makes the most sense imo.

I agree HTTP date format doesn't make much sense here. Just more work to parse and has edge cases like this (though how nobody has ran into this issue in the last 5 years is somewhat shocking), plus the fact that no user will ever view this value. Not sure about the locale changing shenanigans in the current implementation. Feels hacky. But as far as I can tell it would also only take effect when the user is running a newer version of the addon with an older version of the server plugin. That bit of code could definitely use some more comments explaining the "why". To me, strptime() isn't the issue, the issue is what the source format is. So changing it to a different format where we only have numbers instead of month names solves that issue.

Using the email library's function should also work, if we for some reason want to keep the HTTP date format. It's in stdlib, so availability shouldn't ever be an issue. So it's basically the age old philosophical question: does it make more sense to import another library to work around something, or just change the something to not require a workaround?

Personally not a fan of copy/pasting django's function here. imo straight copying functions from other projects is a bad habit to get into, because there's always potential for licensing violations (though it seems fine in this case), or not giving proper credit to where we got it. Also using regex for parsing dates is kinda overkill when there's other options.

@oddstr13
Copy link
Member

Updated to use the email library (inline import).
It'll fall back to using client time anyways, if that fails for whatever reason.

@prahal
Copy link
Author

prahal commented Sep 21, 2024

@mcarlton00

Haven't dug super deeply, but from what I can gather from a quick search through the code this is only an issue for the save_last_sync() function, correct? In which case, it literally doesn't matter what format we save it as, or what format it starts as. It could be unix time for all the good it would do. It's just shoved into a config file and never visible by a user, so whatever is easiest and most consistent makes the most sense imo.

I agree HTTP date format doesn't make much sense here. Just more work to parse and has edge cases like this (though how nobody has ran into this issue in the last 5 years is somewhat shocking), plus the fact that no user will ever view this value. Not sure about the locale changing shenanigans in the current implementation. Feels hacky. But as far as I can tell it would also only take effect when the user is running a newer version of the addon with an older version of the server plugin. That bit of code could definitely use some more comments explaining the "why". To me, strptime() isn't the issue, the issue is what the source format is. So changing it to a different format where we only have numbers instead of month names solves that issue.

It requires to read the Kodi logs to see the error. I expect no user does so.
I disagree about HTTP Date being wrong ie my issue is not about last_sync date format saved to config file but about time_now parsing the date from the HTTP header call to the HTTP API. Ie time_now compute the HTTP call header date minus 2 minutes. It seems valid to me to not add extra code to the KodiSynQueue API not to use HTTP date for the sake of it.
Note that this HTTP date is never saved to config.

Only last_sync is saved and its format is orthogonal to the format of the KodiSyncQueue call HTTP header date used to compute time_now.
Ie time_now is a python datetime. Only last_sync is a date string (but its format is an internal choice of jellyfin-kodi that is unrelated to this issue. And even then its date format string seems to include no locale dependent element, even though using strftime locale dependent function for a non localized internal date format is not that good).

Note that for time_now we parse the HTTP date header coming from the ASPnet server KodiSyncQueue runs in. It is definitely wrong to use strptime on an HTTP date.
And I see no point to use another date format in the KodiSyncQueue API when HTTP does the job perfectly and other formats bring no advantages.

Using the email library's function should also work, if we for some reason want to keep the HTTP date format. It's in stdlib, so availability shouldn't ever be an issue. So it's basically the age old philosophical question: does it make more sense to import another library to work around something, or just change the something to not require a workaround?

Remains the question if we can use email.utils.parsedate_to_datetime (python >= 3.3) or if we should use email.utils.parsedate then convert it to datetime.
The first is more elegant but I don"t know if the plugin is expected to work with python 3.2 or below.

Personally not a fan of copy/pasting django's function here. imo straight copying functions from other projects is a bad habit to get into, because there's always potential for licensing violations (though it seems fine in this case), or not giving proper credit to where we got it. Also using regex for parsing dates is kinda overkill when there's other options.

One don't need to copy paste it. All in all it is a regex with placeholders. But to avoid contamination, one would need to have someone that did not see the django imploementation to reimplement it based on instruction. Cumbersome but still.
And there is no proper function to parse HTTP Date in python, even only IMF in python. I don't think django devs added this regex for compatibility reasons.
I find parsing a non localized string with a regex quite good.
In fact, what is great about the regex in django function is the use of regex placeholder to make the code heavily readable.

quiproquo:
I believe you are talking about the last_sync variable in save_last_sync format.
My bug is about the time_now format that is read in jellyfin/http.py from the HTTP header "Date" which is abviously in the standard HTTP date format ie IMF.
This HTTP date is converted to python datetime.
The bug is that we convert this date witrh strptime and "%d %b %Y %H:%M:%S GMT" which only works for user having an English locale (ie %b is the month in a local dependent form).
Ie we don't same this HTTP date string. We used it to compute the time_now as this date minus 2 minutes.

last_sync is also using a localized date function strftime which is in theory wrong but ... well. But as it is in the date format "%Y-%m-%dT%H:%M:%Sz" it should not have n adverse impact out of bad practice. Ie there is no string that can change between locales.

The format last_sync is saved in could have matter if it was localized. , and if the python code changed locale between calls to the API (which is one reason more, the mangling of LC_TIME via setlocale to get strptime to not work as localized would be bad. It could affect all other date localized calls in the Kodi space).
But it is not localized, a date format.

This is not the case for the strptime call to compute time_now which use %b.
All in all I thought it was more critical a bug, ie if the locale is not English an exception is raised and time_now falls back to be the current time minus 2 minutes instead of the HTTP call date time minus 2 minutes.

Still, it is confusing and hinder debugging works. Better get it fixed.
In fact, I stumbled upon while debugging an issue that is likely a jellyfin 10.9 one. But it looked like an easy hanging fruit and I believe it was more useful to fix then.

@prahal
Copy link
Author

prahal commented Sep 21, 2024

We're not setting LC_TIME system global, only in the current python context (whatever that means in Kodi), and we re-set it to the previous value once done.

Are you sure? I was confident setlocale effect was global and that was the whole point its use was heavily discouraged. Or do you mean the plugin runs in its own process.
Or do you mean it does not affect other software programs than Kodi on the system?

Note: email.utils.parsedate or regex parsing as django would work with previous KodiSyncQueue API. Switching to ISO 8601 is a goal but totally orthogonal to this bug report.
Better keep these as two separate PRs.

@prahal
Copy link
Author

prahal commented Sep 21, 2024

Could you make the two changes in two separate PRs?
That is the fix for time_now and the switch to IS8601?

The format we parse KodiSyncQueue date from and the format last_sync is stored as are different matters.
They could both be switched to ISO8601 but these are different functional changes.

@oddstr13
Copy link
Member

I see no benefit in splitting it up to separate PRs, and that would create extra work for @mcarlton00 and myself.
They are related changes, and there wouldn't be a separate addon release for one without the other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants