Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

korlibs.time parsing millis precision issue #2197

Closed
ArtRoman opened this issue Mar 11, 2024 · 17 comments · Fixed by korlibs/korlibs#20
Closed

korlibs.time parsing millis precision issue #2197

ArtRoman opened this issue Mar 11, 2024 · 17 comments · Fixed by korlibs/korlibs#20
Labels

Comments

@ArtRoman
Copy link
Contributor

ArtRoman commented Mar 11, 2024

I've found an issue with parsing milliseconds using custom format with optional ([.S]).

0.1 sec is 100 ms, but parsing as 1 millisecond;
0.10 sec is 100 ms, but parsing as 10 milliseconds;
0.100 sec is 100 ms, parsing as 100 milliseconds, that's OK.

    @Test
    fun testMillisKorlibs() {
        //val format = ISO8601.DATETIME_UTC_COMPLETE_FRACTION // Can't set variable size for fraction of second
        val format = DateFormat("yyyy-MM-dd'T'HH:mm[:ss[.S]]Z").withOptional()
        val date = korlibs.time.DateTime(2020, 1, 1, 13, 12, 30, 100)

        assertEquals(1, format.parseUtc("2020-01-01T13:12:30.001Z").milliseconds) //1
        assertEquals(10, format.parseUtc("2020-01-01T13:12:30.010Z").milliseconds) //10
        assertEquals(100, format.parseUtc("2020-01-01T13:12:30.100Z").milliseconds) //100
        assertEquals(100, format.parseUtc("2020-01-01T13:12:30.1Z").milliseconds) // ❌ parsed as 1
        assertEquals(100, format.parseUtc("2020-01-01T13:12:30.10Z").milliseconds) // ❌ parsed as 10
        assertEquals(100, format.parseUtc("2020-01-01T13:12:30.100Z").milliseconds) //100

        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.1Z")) // ❌ parsed as 2020-01-01T13:12:30.001Z
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.10Z")) // ❌ parsed as 2020-01-01T13:12:30.010Z
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.100Z"))
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.1000Z"))
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.10000Z"))
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.100000Z"))
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.1000000Z"))
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.10000000Z")) // ❌ parsed as 2020-01-01T13:12:30.099Z
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.100000000Z"))
        // Out of precision
        assertEquals(date, format.parseUtc("2020-01-01T13:12:30.1000000000Z"))

        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.01Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.0001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.00001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.000001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.0000001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.00000001Z"))
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.000000001Z"))
        // Out of precision
        assertNotEquals(date, format.parseUtc("2020-01-01T13:12:30.0000000001Z"))
    }

Tested on com.soywiz.korlibs.korio:korio:4.0.10, com.soywiz.korge:korlibs-time:5.3.2 and 5.4.0.

For example, Java Time and kotlinx.datetime same parsing is OK, but I used existing format for each test, because I can't create format from string with variable length for milliseconds:

    @Test
    fun testMillisJavaTime() {
        //val format = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm[:ss[.S]]X") // Can't set variable size for fraction of second
        val format = DateTimeFormatter.ISO_OFFSET_DATE_TIME
        val date = ZonedDateTime.of(2020, 1, 1, 13, 12, 30, 100_000_000, ZoneOffset.UTC) // 100 millis

        // ZonedDateTime:
        assertEquals(date, format.parse("2020-01-01T13:12:30.1Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.10Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.100Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.1000Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.10000Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.100000Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.1000000Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.10000000Z", ZonedDateTime::from))
        assertEquals(date, format.parse("2020-01-01T13:12:30.100000000Z", ZonedDateTime::from))

        assertFailsWith<DateTimeParseException> {
            // Out of precision: Text '2020-01-01T13:12:30.1000000000Z' could not be parsed at index 29
            format.parse("2020-01-01T13:12:30.1000000000Z", ZonedDateTime::from)
        }

        assertNotEquals(date, format.parse("2020-01-01T13:12:30.01Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.0001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.00001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.000001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.0000001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.00000001Z", ZonedDateTime::from))
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.000000001Z", ZonedDateTime::from))

        assertFailsWith<DateTimeParseException> {
            // Out of precision: Text '2020-01-01T13:12:30.0000000001Z' could not be parsed at index 29
            format.parse("2020-01-01T13:12:30.0000000001Z", ZonedDateTime::from)
        }
    }

    @Test
    fun testMillisKotlinxDateTime() {
        //val format = DateTimeComponents.Format { byUnicodePattern("yyyy-MM-dd'T'HH:mm[:ss[.SSS]]X") } // Can't set variable size for fraction of second
        val format = DateTimeComponents.Formats.ISO_DATE_TIME_OFFSET
        val date = LocalDateTime(2020, 1, 1, 13, 12, 30, 100_000_000) //100 millis

        // DateTimeFormat:
        assertEquals(date, format.parse("2020-01-01T13:12:30.1Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.10Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.100Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.1000Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.10000Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.100000Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.1000000Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.10000000Z").toLocalDateTime())
        assertEquals(date, format.parse("2020-01-01T13:12:30.100000000Z").toLocalDateTime())

        assertFailsWith<IllegalArgumentException> {
            // Out of precision: Failed to parse value from '2020-01-01T13:12:30.1000000000Z'
            assertEquals(date, format.parse("2020-01-01T13:12:30.1000000000Z").toLocalDateTime())
        }

        assertNotEquals(date, format.parse("2020-01-01T13:12:30.01Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.0001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.00001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.000001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.0000001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.00000001Z").toLocalDateTime())
        assertNotEquals(date, format.parse("2020-01-01T13:12:30.000000001Z").toLocalDateTime())

        assertFailsWith<IllegalArgumentException> {
            // Out of precision: Failed to parse value from '2020-01-01T13:12:30.0000000001Z'
            assertNotEquals(date, format.parse("2020-01-01T13:12:30.0000000001Z").toLocalDateTime())
        }
    }
@soywiz
Copy link
Member

soywiz commented Mar 13, 2024

Hi @ArtRoman thanks for reporting and for the highly detailed explanation with tests. It was really useful.

I have been investigating and it seems that klock was trying to incorrectly handle S to support multiple milliseconds. But the semantics couldn't work as expected.
As a single S is interpreted as the lowest part of the millis in Java.
After thinking a bit about it I have come with a solution that works keep the current semantics while still allows that usecase.
I have added a S* pattern to handle this usecase. Please, feel free to check the PR for details.
It could also be S+ to also be simila to how regexp work.
I have added this without extra flags as I believe * is not widely used in date formats, so I believe there won't be conflicts.
In the case conflicts could appear, we could add another flag as we did with the optional [ + ] support.

Can you check if this works for you before merging? We can still make adjustments if required.

Commit here: f3bf1c2

@ArtRoman
Copy link
Contributor Author

Hello Carlos. Sorry for the late reply. Yes, the issue was in incorrect handling of milliseconds while parsing, because you don't know the required precision for S. Your PR that adds S* and DateFormat.ISO_DATE_TIME_OFFSET fixes the issue 👍

But the issue with S still remains. I think .1 should be parsed as 100 msec and then encoded back to .1 or even .100. I'm talking about this situation:

        val formatMs = DateFormat(".S")
        assertEquals(1, formatMs.parseUtc(".001").milliseconds)
        assertEquals(10, formatMs.parseUtc(".01").milliseconds) // ❌ parsed as 1 ms
        assertEquals(100, formatMs.parseUtc(".1").milliseconds) // ❌ parsed as 1 ms

        assertEquals(".1", DateTime.fromUnixMillis(100).toString(formatMs))
        assertEquals(".1", formatMs.parseUtc(".1").toString(formatMs)) // ❌ outputs .0

The last example shows the broken idempotence, when the same pairs of operations like parse/stringify changes the content (formatting is OK, issue is only in parsing).

Or did you keep this intentionally for compatibility? I see your TODO mark. Both Java Time and kotlinx.datetime doesn't have that flexible ability to use single litter in pattern for various precision, so it will be a breaking change for the library. But it will also unify the behavior with others libraries 😀

I really don't know which solution would be better, especially in the long term.

@soywiz
Copy link
Member

soywiz commented Mar 25, 2024

Oh I see. In that case I'm okay with breaking compatibility if it was a bug. We could create a feature flag or something to enable the old behaviour, but maybe people is either not using it that way, or getting wrong results, so I think it makes sense to keep it simple and just break compatibility this time, specially with the idempotent point.

Also I believe the ISO8601 class that is providing the functionality similar to the Java and kotlinx-time you shown in the example, could be simplified to use the [:S*] thing from DateFormat.
Depending on the objective, this static could be used:

val DATETIME_UTC_COMPLETE_FRACTION = IsoDateTimeFormat("YYYYMMDDThhmmss.sssZ", "YYYY-MM-DDThh:mm:ss.sssZ")

Regarding compatibility with other libraries, I guess most people will switch to kotlinx-time at some point if didn't do already. But let's fix that at least here. I also added a DateFormat.Companion.ISO_DATE_TIME_OFFSET so it is more discoverable and similar to DateTimeComponents.Formats.ISO_DATE_TIME_OFFSET.

As an extra comment: Klock, the former korlibs-time, was initially designed using C# API as reference + some property extensions instead of the Java one as I believed the C# API was much better than the Java one. https://learn.microsoft.com/en-us/dotnet/api/system.datetime?view=net-8.0 . Then it was evolved using people's feedback. So it didn't pretend to be closer to the Java one but the C# one.

If the point on this is converging the S* might not be a good idea. I'm open to discuss it if you believe it makes sense.

@ArtRoman
Copy link
Contributor Author

In that case I'm okay with breaking compatibility if it was a bug

I see S* resolves the issue, but S continues to consume variable length, and handles it incorrectly. Why you handle S* and S...SSSSSSSSS in different ways? I see two variants:

  1. return S to handle only one digit, it will resolve parsing, but may break code for anyone who uses custom format;
  2. handle value the same way as for S* (like a fraction). It will unify the handling and will resolve the issue.

Also I believe the ISO8601 class that is providing the functionality similar to the Java and kotlinx-time you shown in the example, could be simplified to use the [:S*] thing from DateFormat.

I agree, it will be convenient to use for parsing and formatting, same as with DateFormat.ISO_DATE_TIME_OFFSET. I can't use ISO8601.DATETIME_UTC_COMPLETE_FRACTION now because it requires at least 3 digits for fraction of seconds. But as you remember, default DateTime converter in Newtonsoft.Json has 1-7 digits for millis without trailing zeroes.

If the point on this is converging the S* might not be a good idea.

S* fixes the issue above, and I think it's a good idea to satisfy all possible requests.

@soywiz
Copy link
Member

soywiz commented Apr 19, 2024

Sorry for the delay. I have backported the PR as it was to the new codebase.

I have been investigating how it works on the JVM:

image

@soywiz
Copy link
Member

soywiz commented Apr 19, 2024

So basically on the JVM it seems to completely ignore the number of S for parsing. And for parsing it is converted into an integer and that's the number of milliseconds, even if it overflows the 999.

So I guess this would fail on the JVM:

assertEquals(100, format.parseTime("13:12:30.1").millisecond) // ❌ parsed as 1

So maybe the S for milliseconds is not suitable and we might need to keep a S* with different semantics for the decimal part.

I'm going to be consistent with the JVM regarding S and will keep the S* for the fractional variant

@soywiz
Copy link
Member

soywiz commented Apr 19, 2024

Regarding formatting this is how JVM works:

image

@soywiz
Copy link
Member

soywiz commented Apr 19, 2024

@ArtRoman I have implemented here what I believe is the best option for what I have explored with the JVM:

korlibs/korlibs#20

  • S* interprets the value as a fraction as it is expected in ISO like a fraction of seconds, so ISO can be implemented with S*
  • S/SS/SSS... works as the JVM. It just parses as it as an integer and uses that as milliseconds so format S with the value 1 is not 100 milliseconds, but 1 millisecond. This is how the JVM behaves, and that's why it is not possible to use S for sub seconds like that.

I'll keep that PR open so you can validate if it makes sense or I missed something.

@soywiz
Copy link
Member

soywiz commented May 13, 2024

@ArtRoman how are you doing? I have plans to make a major release soon of the libraries. I would like to include this change on the release. Could you have a look?

@ArtRoman
Copy link
Contributor Author

@soywiz hello. Sorry for the delay, I've missed UI notifications here. I'm going to take a look now.

It's interesting that original issue of kotlinx-datetime has similar questions about consistency of serialization, with links to Java8 LocalDateTime, FasterXML and Python analogies. It also was a bug in Java8 with parsing millis in custom format. So it's a really complicated issue! 😀

@soywiz
Copy link
Member

soywiz commented May 14, 2024

So from my perspective from what I investigated, the original Java API is misleading. Milliseconds can be only from 0 to 999. Being able to provide just one or two digit for milliseconds is wrong, and more than 3 misleading.

For me the ideal solution in an ideal world would be to provide seconds as a "double" with fraction part, but the problem is that in some locales . or , might have different meanings regarding numeric separators (for 1000 or 10000) or the fractional separator, and likely other rules I'm not aware of.
So I decided to separate the fractional part in the format so different fraction separators can be used.

In the case of the korlibs-time, since we support optional parts, I believe ISO could be implemented with the formatter. Since format interoperability was not a goal here, I believe it should be fine.

@ArtRoman
Copy link
Contributor Author

  • S/SS/SSS... works as the JVM. It just parses as it as an integer and uses that as milliseconds so format S with the value 1 is not 100 milliseconds, but 1 millisecond. This is how the JVM behaves, and that's why it is not possible to use S for sub seconds like that.

Sorry, I didn't get that. Which classes do you mention here for parsing/formatting?

S in Java DateTimeFormatter is "fraction of second", acts as nanoseconds (not milliseconds), with truncating by default for output. It produses LocalDateTime as result of parsing using this formatter, and it has only nano getter, which can be from 0 to 999,999,999.

    @Test
    fun javaLocalDateTimeFormatterTest() {
        DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.1Z") //100000000 ns = 100 ms
        DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.01Z") //10000000 ns = 10 ms
        DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.001Z") //1000000 ns = 1 ms
        DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.0001Z") //100000 ns = 0,1 ms

        // Truncated zeroes while formatting:
        assertFailsWith<AssertionError> {
            DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.10Z") //100000000 ns = 100 ms
            //expected:<...020-01-01T13:12:30.1[0]Z> but was:<...020-01-01T13:12:30.1[]Z>
        }
        assertFailsWith<AssertionError> {
            DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.100Z") //100000000 ns = 100 ms
            //expected:<...020-01-01T13:12:30.1[00]Z> but was:<...020-01-01T13:12:30.1[]Z>
        }
        assertFailsWith<AssertionError> {
            DateTimeFormatter.ISO_OFFSET_DATE_TIME.testJavaLocalDateTimeMs("2020-01-01T13:12:30.1000Z") //100000000 ns = 100 ms
            //expected:<...020-01-01T13:12:30.1[000]Z> but was:<...020-01-01T13:12:30.1[]Z>
        }

        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.1Z")
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.10Z")
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.100Z")
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.1000Z")
    }

    private fun DateTimeFormatter.testJavaLocalDateTimeMs(dateStr: String) {
        val parsed = java.time.LocalDateTime.parse(dateStr, this)
        val formatted = this.format(parsed.atOffset(ZoneOffset.UTC))
        println("$dateStr parsed as $parsed (${parsed.nano} ns), formatted: $formatted")
        assertEquals(dateStr, formatted)
    }

@ArtRoman
Copy link
Contributor Author

ArtRoman commented May 14, 2024

  • S* interprets the value as a fraction as it is expected in ISO like a fraction of seconds, so ISO can be implemented with S*

It's all OK with S*, as I've checked. It's a small breaking change if anyone is using custom pattern with SSS values, but it's a good chance to switch to DateFormat.ISO_DATE_TIME_OFFSET or to use S* in patterns. I believe you'll mention that in docs.

  • S/SS/SSS... works as the JVM. It just parses as it as an integer and uses that as milliseconds so format S with the value 1 is not 100 milliseconds, but 1 millisecond. This is how the JVM behaves, and that's why it is not possible to use S for sub seconds like that.

Here I've found some inconsistency in compare to JVM:

    @Test
    fun testMillisKorlibsNew() {
        assertEquals(1, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.001Z").milliseconds)
        assertEquals(10, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.010Z").milliseconds)
        assertEquals(100, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.1Z").milliseconds)
        assertEquals(100, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.10Z").milliseconds)
        assertEquals(100, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat.ISO_DATE_TIME_OFFSET.parseUtc("2020-01-01T13:12:30.1000Z").milliseconds)

        assertEquals(1, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.001Z").milliseconds)
        assertEquals(10, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.010Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.1Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.10Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.S*Z").parseUtc("2020-01-01T13:12:30.1000Z").milliseconds)

        assertEquals(1, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.001Z").milliseconds)
        assertEquals(10, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.010Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.1Z").milliseconds) //❌ S, SS and SSS parses .1 as 1ms
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.10Z").milliseconds) //❌ S, SS and SSS parses .10 as 10ms
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.100Z").milliseconds)
        assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSSZ").parseUtc("2020-01-01T13:12:30.1000Z").milliseconds) //❌ expected:<100> but was:<0>
    }
  1. korlibs allows to parse a smaller count of digits than specified in format, but I think it's OK.
    JVM will throw an error in this case:
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.1Z")
//'2020-01-01T13:12:30.1Z' could not be parsed at index 20
  1. korlibs is still parsing .1 ms as 1 ms instead of 100 ms, and 0.10 as 10 ms instead of 100ms. That's not correct in terms of fraction of second.

  2. .1000 parsed as 0 ms instead of 100 ms (the format is SSSS). That's not correct, too.
    It outputs formatted as 2020-01-01T13:12:31.0Z, so korlibs parsed the last part as 1000 millis = 1 second. As I see, there is no difference in the number of S in the pattern now.

I think that your way parsing S* is more correct and you should use the same for S…, just taking number of first digits corresponding to the count of S in the pattern.

@soywiz
Copy link
Member

soywiz commented May 14, 2024

  • S/SS/SSS... works as the JVM. It just parses as it as an integer and uses that as milliseconds so format S with the value 1 is not 100 milliseconds, but 1 millisecond. This is how the JVM behaves, and that's why it is not possible to use S for sub seconds like that.

Sorry, I didn't get that. Which classes do you mention here for parsing/formatting?

S in Java DateTimeFormatter is "fraction of second", acts as nanoseconds (not milliseconds), with truncating by default for output. It produses LocalDateTime as result of parsing using this formatter, and it has only nano getter, which can be from 0 to 999,999,999.

This looks like the root of the confusions.
I was using and having into consideration this time (and in screenshots) SimpleDateFormat where S is milliseconds.
By checking the DateTimeFormatter that one looks much more reasonable.
Since I did this like a lot of years ago I don't remember, but probably me or someone else used SimpleDateFormat initially, then someone tried the later, and stuff messed up.

Then I guess we can change S* to just S, and drop the old behaviour. I have seen that also the DateTimeFormatter supports the optional chunks [.

@soywiz
Copy link
Member

soywiz commented May 14, 2024

    assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.1Z").milliseconds) //❌ S, SS and SSS parses .1 as 1ms
    assertEquals(100, DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseUtc("2020-01-01T13:12:30.10Z").milliseconds) //❌ S, SS and SSS parses .10 as 10ms

Fixed: korlibs/korlibs@582b37c

  1. korlibs allows to parse a smaller count of digits than specified in format, but I think it's OK.
    JVM will throw an error in this case:
DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSX").testJavaLocalDateTimeMs("2020-01-01T13:12:30.1Z")
//'2020-01-01T13:12:30.1Z' could not be parsed at index 20

Since I have removed S*, I believe it should be okay like that. Accepting variable number of digits was something supported by SimplerDateTime. Now the implementation is a mix of Simpler+DateTimeFormatter in that regard.

  1. korlibs is still parsing .1 ms as 1 ms instead of 100 ms, and 0.10 as 10 ms instead of 100ms. That's not correct in terms of fraction of second.

That was the SimplerDateTime misconception. It should be fixed now with plain S, SS...

  1. .1000 parsed as 0 ms instead of 100 ms (the format is SSSS). That's not correct, too.
    It outputs formatted as 2020-01-01T13:12:31.0Z, so korlibs parsed the last part as 1000 millis = 1 second. As I see, there is no difference in the number of S in the pattern now.

I think that your way parsing S* is more correct and you should use the same for S…, just taking number of first digits corresponding to the count of S in the pattern.

So I guess comparing to DateTimeFormatter the only discrepance regarding this should be allowing to parse a variadic number of digits. Also nanoseconds are not supported. Only up to milliseconds.

I believe this should be enough. Going to merge it. Feel free to reopen it if you believe it makes sense or if I missed something.

@github-project-automation github-project-automation bot moved this from Pending to Done in Korlibs May 14, 2024
@ArtRoman
Copy link
Contributor Author

For me the ideal solution in an ideal world would be to provide seconds as a "double" with fraction part, but the problem is that in some locales . or , might have different meanings regarding numeric separators (for 1000 or 10000) or the fractional separator, and likely other rules I'm not aware of.

I can't find any example of using , as separator in datetime representation. At least I've found that TimeSpan in C# for German locale has , separator, but I can't reproduce similar output for DateTime, so it shouldn't be an issue for you. I believe it should be dot in ISO 8601 standard.

var timespan = new TimeSpan(12345678);
Console.WriteLine(timespan.ToString("g", new CultureInfo("de"))); // 0:00:01,2345678

var datetime = DateTime.UtcNow;
Console.WriteLine(datetime.ToString("o", new CultureInfo("de"))); // 2024-05-14T21:38:57.5422080Z

And at last I want to notice multiple formats with the similar semantics:

DateFormat.FORMAT2 // DateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ")
DateFormat.ISO_DATE_TIME_OFFSET // DateFormat("yyyy-MM-dd'T'HH:mm[:ss[.S]]Z").withOptional()
ISO8601.DATETIME_UTC_COMPLETE_FRACTION // IsoDateTimeFormat("YYYYMMDDThhmmss.sssZ", "YYYY-MM-DDThh:mm:ss.sssZ")

The second one, for example, will lose precision due to outputting only one milliseconds digit when using as parameter in DateTime.toString(format: DateFormat), but it's best for parsing because of optional fields.
And the first with the last doesn't make any difference for parsing/stringifying.

@soywiz
Copy link
Member

soywiz commented May 16, 2024

@ArtRoman

The whole picture after all the changes I did yesterday:

  • Since SSSS... is for fraction of second right now and it seems to be standard somehow, the concern about punctuation and seconds as double it not relevant anymore, for me at least
  • I have created a new DateComponents and DateComponentsFormat that supports parsing Dates, Times and DateTimeSpans (including month spans) and that is used in the underlying final implementation. DateComponents also support nanosecond precision so can be used as fallback since korlibs DateTime is backed by a Double for JS performance sake
  • DateComponents supports converting back and forth from DateTime, Duration and DateTimeSpan, and DateComponentsFormat support converting it into DateFormat, TimeFormat and DateTimeSpanFormat
  • I have kept the typical pattern and the ISO patterns separately, so it is possible to provide formats in both. Now there is a single ISO parser for dates, times and spans. The rest of the
  • I have also provided a generic ISO parser that should be able to parse any typical ISO date the best it can. It is used in the underlying ISO8601 parsers.
  • Since I have other priorities right now and I have to choose and I'm not getting paidM though I have added a few tests, and klock have a good set of tests, the coverage might not be super high
  • The decision here is to be as strict as possible while formatting, and somehow lax while parsing. So it is not going to fail on invalid dates like a strict parser would do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
2 participants