
[Bug] When Spark reads Doris data as a DataFrame based on a time condition and data matching that condition keeps growing, writing updates back from the DataFrame produces inaccurate results #222

Open
Kris1314Love opened this issue Jul 30, 2024 · 0 comments


Search before asking

  • I had searched in the issues and found no similar issues.

Version

Spark Doris Connector 3.1_2.12
Spark 3.4.0
Doris 2.0.0

What's Wrong?

When Spark reads Doris data as a DataFrame based on a time condition while data matching that condition keeps growing, writing updates back from the DataFrame produces inaccurate results: the row counts of the written and updated data do not match.

What You Expected?

The bug should be fixed: the number of rows written to the other tables should match the number of rows updated back to the original Doris table.

How to Reproduce?

1. Use the Doris Spark Connector to read Doris data into a DataFrame filtered by a time condition.
2. While this runs, a routine load keeps writing rows matching the same time condition into the table the DataFrame is read from.
3. Write the DataFrame to other Doris tables, then update one column of the DataFrame and write it back to the original Doris table.
4. The number of rows written to the other tables does not match the number of rows updated in the original table. A sketch of these steps is shown below.
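
A minimal Scala sketch of the reproduction, assuming a hypothetical FE address, credentials, column names (`event_time`, `status`) and table names (`db.source_table`, `db.target_table`); the connector options (`doris.fenodes`, `doris.table.identifier`, `doris.filter.query`, `user`, `password`) follow the Spark Doris Connector read/write API:

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object ReproduceCountMismatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("doris-read-write-count-mismatch")
      .getOrCreate()

    // 1. Read rows matching a time condition while a routine load keeps
    //    inserting new rows that also match the same condition.
    val df = spark.read.format("doris")
      .option("doris.fenodes", "fe_host:8030")                 // assumption: FE HTTP address
      .option("doris.table.identifier", "db.source_table")     // assumption: source table
      .option("user", "root")
      .option("password", "")
      .option("doris.filter.query", "event_time >= '2024-07-30 00:00:00'") // hypothetical column
      .load()

    // 2. First action: write the rows to another Doris table.
    df.write.format("doris")
      .option("doris.fenodes", "fe_host:8030")
      .option("doris.table.identifier", "db.target_table")     // assumption: target table
      .option("user", "root")
      .option("password", "")
      .mode("append")
      .save()

    // 3. Second action: update one column and write back to the original table.
    //    Because the DataFrame is lazily evaluated, this action triggers a second
    //    scan of Doris; rows ingested by the routine load between the two actions
    //    can make the written and updated row counts diverge.
    df.withColumn("status", F.lit("processed"))                // hypothetical updated column
      .write.format("doris")
      .option("doris.fenodes", "fe_host:8030")
      .option("doris.table.identifier", "db.source_table")
      .option("user", "root")
      .option("password", "")
      .mode("append")
      .save()
  }
}
```

Under this assumption, each write action re-scans Doris because the DataFrame is lazily evaluated, so rows added by the routine load between the two actions would explain the count mismatch; persisting the DataFrame (for example with `df.persist()`) before the first write pins a single snapshot and can be used to check this.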

Anything Else?

The Doris Spark Connector reads Doris data as a DataFrame based on a time condition; at the same time, a routine load is writing data matching the same time condition into the table the DataFrame is read from. The DataFrame is then written to other Doris tables, and one column of the DataFrame is updated and written back to the original Doris table. It turns out that the number of rows written to the other tables does not match the number of rows updated.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
