Velox doesn't support legacy date format behavior in Spark SQL #10354
Comments
@NEUpanning, is Spark's legacy policy widely used in your production environment? |
@PHILO-HE Yes. In Spark upgrades, we need to use legacy policy for backward compatibility rather than pushing users to accept the new behavior. Here is part of legacy policies we are using :
@NEUpanning: we have recently added support for different date/timestamp parsing semantics; are these not enough to follow the Spark parsing semantic you described? If not, could you highlight some of the differences? https://github.com/facebookincubator/velox/blob/main/velox/type/TimestampConversion.h#L48 Cc: @mbasmanova |
@pedroerp Thanks for your reply. |
@NEUpanning https://github.com/facebookincubator/velox/blob/main/velox/functions/lib/DateTimeFormatter.h#L26 |
@pedroerp I'd like to add a new |
Sounds reasonable. Do you have any examples on how the parsing compares with Joda parsing? Cc: @mbasmanova who recently enhanced this part of the code. |
@pedroerp Here is an example I wrote from scratch. code :
output :
@pedroerp I'm implementing a new DateTimeFormatterType and I see DateTimeFormatSpecifier doesn't support |
@NEUpanning A separate PR would be nice. Smaller PRs are easier to review. Thanks. |
Here is the difference between joda and SimpleDateFormat:
I'm in favor of this approach too, because there are conflicting parts between the two formats. cc @rui-mo, @NEUpanning |
@ccat3z Thanks for providing the details. Sounds good to me generally. What are the functions that are impacted by this date format issue, if you could list them? |
We encountered more result mismatch issues because Velox's Joda DateTimeFormatter does not align with Spark's SimpleDateFormat. I did some research on Spark's SimpleDateFormat and updated the issue description to include the main differences between Spark SimpleDateFormat and Velox Joda, along with the tasks needed to resolve this issue.
Summary: Introduce new DateTimeFormatterType values 'LENIENT_SIMPLE' and 'STRICT_SIMPLE', used when the Spark legacy time parser policy is enabled, to match java.text.SimpleDateFormat in lenient and non-lenient mode. In this PR the implementation of 'LENIENT_SIMPLE' and 'STRICT_SIMPLE' is simply copied from Joda; further PRs will change the behavior to align with Spark. Spark functions using strict mode (lenient=false): 'from_unixtime', 'unix_timestamp', 'make_date', 'to_unix_timestamp', 'date_format'. Spark functions using lenient mode: cast timestamp to string. Casting timestamp to string will use LENIENT_SIMPLE only after the behavior of LENIENT_SIMPLE is aligned with Spark, since it does not use the Joda DateFormatter to do the cast.
Relates #10354 Pull Request resolved: #10966 Reviewed By: xiaoxmeng Differential Revision: D63261575 Pulled By: Yuhta fbshipit-source-id: 20ebdc1ad38a43d7064e5c232c9d52d361b7f474
Summary: `java.text.SimpleDateFormat` supports using 'week of month' to parse/format dates. The specifier for 'week of month' is 'W'. DateTimeFormatter currently supports three groups of fields that specify the day within the year, in the following combinations:
```
year + week + dayOfWeek
year + dayOfYear
year + month + day
```
This PR introduces a new combination, `year + month + weekOfMonth + dayOfWeek`, and adds support for "week of month" in SimpleDateTimeFormatter.
Relates issue: #10354 Pull Request resolved: #11103 Reviewed By: Yuhta, amitkdutta Differential Revision: D64920551 Pulled By: pedroerp fbshipit-source-id: 3db9b6d33783aac0ee41791aeac96142e63fb22a
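For reference, the `year + month + weekOfMonth + dayOfWeek` combination can be tried directly against `java.text.SimpleDateFormat`. The sketch below is only an illustration on the Java side, not Velox code: the class name `WeekOfMonthDemo`, the helper names, and the pattern `yyyy-MM W E` are assumptions made for this example. It pins `Locale.US` and UTC because week numbering depends on the locale's first day of week.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class WeekOfMonthDemo {
    // Builds a formatter pinned to Locale.US and UTC so results do not
    // depend on the host machine's defaults.
    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    // Formats an ISO date using 'W' (week of month) and 'E' (day of week),
    // parses that text back, and returns the recovered ISO date.
    static String roundTrip(String isoDate) {
        try {
            SimpleDateFormat iso = formatter("yyyy-MM-dd");
            SimpleDateFormat weekBased = formatter("yyyy-MM W E");
            Date original = iso.parse(isoDate);
            String text = weekBased.format(original);
            return iso.format(weekBased.parse(text));
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("2024-03-15"));
    }
}
```

The round trip recovers the original date only because year, month, week of month, and day of week together identify a single day, which is the combination this PR adds.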
… used (#11131) Summary: The Spark functions 'unix_timestamp', 'from_unixtime' and 'get_timestamp' return null for an invalid datetime format when the legacy date formatter is used. Relates to #10354. Pull Request resolved: #11131 Reviewed By: xiaoxmeng Differential Revision: D65517767 Pulled By: mbasmanova fbshipit-source-id: 863d3955caa64317c306458ff917e13e2593bc8f
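One way to picture the "null on invalid format" behavior on the Java side is sketched below. This is an assumption-laden illustration, not Spark's or Velox's actual code: `InvalidPatternDemo` and `unixTimestampOrNull` are hypothetical names. It relies only on the documented fact that `SimpleDateFormat` throws `IllegalArgumentException` for an unsupported pattern letter (here `b`) and `ParseException` for unparsable input.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class InvalidPatternDemo {
    // Hypothetical helper: returns epoch seconds, or null when either the
    // pattern or the input text is rejected by SimpleDateFormat.
    static Long unixTimestampOrNull(String text, String pattern) {
        try {
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            f.setLenient(false); // strict mode, as listed for unix_timestamp
            return f.parse(text).getTime() / 1000L;
        } catch (IllegalArgumentException | ParseException e) {
            // Invalid pattern or unparsable input: report null instead of failing.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(unixTimestampOrNull("1970-01-02", "yyyy-MM-dd")); // 86400
        System.out.println(unixTimestampOrNull("2024-03-15", "yyyy-MM-bb")); // null
    }
}
```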
…1386) Summary: The Spark legacy datetime formatter allows parsing a date from incomplete text; see [code link](https://github.com/openjdk/jdk8/blob/master/jdk/src/share/classes/java/text/DateFormat.java#L351). This PR enables partial date parsing when the `LENIENT_SIMPLE` or `STRICT_SIMPLE` datetime formatter is used. Related issues: #10354, [gluten#6227](apache/incubator-gluten#6227) Pull Request resolved: #11386 Reviewed By: pedroerp Differential Revision: D65948039 Pulled By: Yuhta fbshipit-source-id: 0d17084f723ebeaded7278178982b5a10d9f9fed
Bug description
To stay backward compatible in how timestamp/date strings are parsed and formatted while still allowing new behavior to be adopted, Spark adds the setting spark.sql.legacy.timeParserPolicy, which indicates whether the legacy behavior is used. Under the legacy behavior, Spark parses and formats dates with SimpleDateFormat, whose conventions and behavior differ from Velox Joda. For example, SimpleDateFormat doesn't perform strict checking of the expression's input, and Velox doesn't support this when using the unix_timestamp function; there is a related gluten issue: link. More result mismatch cases include gluten#7109 and gluten#7069.
The main difference between Spark SimpleDateFormat and Velox Joda
SimpleDateFormat Letter vs Joda (Velox) Letter
Range of field value
SimpleDateFormat in non-lenient mode:
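A minimal Java sketch of how field-range checking differs between the two `SimpleDateFormat` modes: in lenient mode an out-of-range day rolls over into the next month, while in non-lenient mode the input is rejected. The class and method names are hypothetical and exist only for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class LenientRangeDemo {
    // Parses text as yyyy-MM-dd and re-formats the result,
    // or returns null when SimpleDateFormat rejects the input.
    static String parse(String text, boolean lenient) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(lenient);
        try {
            return f.format(f.parse(text));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2024-02-30", true));  // 2024-03-01
        System.out.println(parse("2024-02-30", false)); // null
    }
}
```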
Behavior for invalid format
Calendar supporting
SimpleDateFormat supports both the Julian and Gregorian calendar systems, with a single discontinuity between them, whereas Velox Joda only supports the Gregorian calendar.
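The discontinuity can be observed with `java.util.GregorianCalendar`, the default calendar behind `SimpleDateFormat`: with the default cutover, the day after 1582-10-04 is 1582-10-15. In the sketch below, `java.time.LocalDate` stands in for a proleptic Gregorian calendar; treating that as comparable to Velox's behavior is an assumption of this example, and the class name is hypothetical.

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class CalendarCutoverDemo {
    // Day of month following 1582-10-04 in the hybrid Julian/Gregorian calendar.
    static int nextDayHybrid() {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(1582, Calendar.OCTOBER, 4);
        cal.add(Calendar.DAY_OF_MONTH, 1);
        return cal.get(Calendar.DAY_OF_MONTH);
    }

    // Day of month following 1582-10-04 in the proleptic Gregorian calendar.
    static int nextDayProleptic() {
        return LocalDate.of(1582, 10, 4).plusDays(1).getDayOfMonth();
    }

    public static void main(String[] args) {
        System.out.println(nextDayHybrid());    // 15
        System.out.println(nextDayProleptic()); // 5
    }
}
```

Dates near or before the cutover are therefore where the two calendar models can disagree.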
Behavior when not consuming entire string
spark result : result value
gluten result : Error
gluten issue
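The Java side of this difference can be reproduced with a `ParsePosition`: `SimpleDateFormat` stops once the pattern is satisfied and leaves the rest of the input unread. The class and method names below are hypothetical, chosen only for this sketch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class PartialParseDemo {
    // Returns how many characters the formatter consumed, or -1 if parsing failed.
    static int consumedChars(String text, String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        Date parsed = f.parse(text, pos);
        return parsed == null ? -1 : pos.getIndex();
    }

    public static void main(String[] args) {
        // Only "2024-03-15" is consumed; " 10:30:00" is ignored.
        System.out.println(consumedChars("2024-03-15 10:30:00", "yyyy-MM-dd")); // 10
    }
}
```

Note that this holds even with `setLenient(false)`: leniency controls field-range checking, not whether trailing text is tolerated.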
Obey the number of pattern letters for parsing
SimpleDateFormat obeys the number of pattern letters for parsing only when the next pattern field is a numeric field, but Velox always obeys the number of pattern letters.
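A short Java sketch of the `SimpleDateFormat` side of this rule, with hypothetical class and method names: when fields are separated by a literal, the letter count is ignored and single-digit values still parse; when numeric fields are adjacent, the letter count is what splits the digits.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Velox or Spark.
final class PatternLetterCountDemo {
    static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f;
    }

    // Parses text with the given pattern and returns it as yyyy-MM-dd,
    // or null when parsing fails.
    static String parseToIso(String text, String pattern) {
        try {
            return formatter("yyyy-MM-dd").format(formatter(pattern).parse(text));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Separated fields: "MM" and "dd" accept one digit each.
        System.out.println(parseToIso("2024-3-5", "yyyy-MM-dd")); // 2024-03-05
        // Adjacent numeric fields: letter counts split "20240305" into 2024, 03, 05.
        System.out.println(parseToIso("20240305", "yyyyMMdd"));   // 2024-03-05
    }
}
```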
Road map
spark.sql.legacy.timeParserPolicy is LEGACY
gluten#7375