
[Bug] DataSource data source caching problem #16845

Open
3 tasks done
jack-wqing opened this issue Nov 26, 2024 · 5 comments
Labels
bug, help wanted, ready-to-close

Comments

@jack-wqing

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Class: org.apache.dolphinscheduler.plugin.datasource.api.plugin.DataSourceClientProvider
DataSource local cache:

    private static final Cache<String, DataSourceClient> uniqueId2dataSourceClientCache = CacheBuilder.newBuilder()
            .expireAfterWrite(duration, TimeUnit.HOURS)
            .removalListener((RemovalListener<String, DataSourceClient>) notification -> {
                try (DataSourceClient closedClient = notification.getValue()) {
                    logger.info("Datasource: {} is removed from cache due to expire", notification.getKey());
                }
            })
            .maximumSize(100)
            .build();

What you expected to happen

expireAfterWrite: once the configured time elapses, the DataSource is forcibly closed, regardless of whether it is still in use.
A better approach would be expireAfterAccess, so that a client that is actively being used is kept open.
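The difference between the two expiration policies can be illustrated with a minimal, hypothetical sketch (this is not the DolphinScheduler code, and a real fix would simply change the Guava CacheBuilder call; the class, field, and method names below are invented for illustration). With expire-after-access semantics, every successful lookup pushes the entry's deadline forward, so an entry that is still being used is never evicted; with expire-after-write, the deadline is fixed at insertion time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of expire-after-ACCESS semantics. A clock is injected
// so the behavior can be demonstrated without real sleeps.
final class AccessExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long deadline; // refreshed on every successful access
        Entry(V value, long deadline) { this.value = value; this.deadline = deadline; }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttl;
    private final LongSupplier clock;

    AccessExpiringCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong() + ttl));
    }

    /** Returns the cached value and refreshes its deadline, or null if expired. */
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now >= e.deadline) {
            // Expired while idle; a real cache would close the client here.
            map.remove(key, e);
            return null;
        }
        e.deadline = now + ttl; // the access itself extends the entry's life
        return e.value;
    }
}
```

Under expire-after-write, the entry above would die at a fixed time after put() no matter how recently it was read; under this access-based policy, an entry is evicted only after sitting idle for a full TTL, which matches the behavior the report asks for.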

How to reproduce

In the Hive scenario, for example, HQL statements typically run for a long time, and the DataSource is closed while still in use. The socket to the Hive Thrift endpoint is closed, but the client's internal state still assumes the query is in progress.
Example:
sql task error and appId:[]
java.sql.SQLException: org.apache.thrift.transport.TTransportException: SASL authentication not complete
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:399)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.hive.jdbc.HiveStatement.executeUpdate(HiveStatement.java:490)
at org.apache.hive.jdbc.HivePreparedStatement.executeUpdate(HivePreparedStatement.java:122)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.executeUpdate(SqlTask.java:323)
at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.executeFuncAndSql(SqlTask.java:220)
at org.apache.dolphinscheduler.plugin.task.sql.SqlTask.handle(SqlTask.java:167)
at

Anything else

No response

Version

3.1.x

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jack-wqing jack-wqing added the "bug" and "Waiting for reply" labels Nov 26, 2024
@github-actions github-actions bot changed the title DataSource数据源缓存问题 DataSource data source caching problem Nov 26, 2024

@SbloodyS
Member

Please use the hive task type instead of the sql task type to execute hive sql.

@SbloodyS SbloodyS added the "help wanted" label and removed the "Waiting for reply" label Nov 26, 2024
@SbloodyS SbloodyS changed the title DataSource data source caching problem [Bug] DataSource data source caching problem Nov 27, 2024
@ruanwenjun
Member

Please use AdHocDataSourceClient, you can find the related code in dev.

@jack-wqing
Author

Please use the hive task type instead of the sql task type to execute hive sql.

The forced close in this provider is a problem for every sql task that uses it, not just hive sql.

@jack-wqing
Author

AdHocDataSourceClient

In fact, forcibly closing a sql task's data source affects any sql task that is already executing. I am on version 3.1.x, but I can see the code I pointed out still exists on master/dev. My suggestion is still to change the cache's expiration policy.
