Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #5509] Add Apache Impala JDBC engine dialect #6104

Conversation

tigrulya-exe
Copy link
Contributor

@tigrulya-exe tigrulya-exe commented Feb 27, 2024

🔍 Description

Issue References 🔗

This pull request fixes #5509

Describe Your Solution 🔧

Added Apache Impala support in the form of the JDBC engine dialect. Slightly modified Kyuubi Hive JDBC driver in order to use it as driver for Impala dialect instead of the original Hive driver.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Related Unit Tests

  • org.apache.kyuubi.engine.jdbc.impala.OperationWithImpalaEngineSuite
  • org.apache.kyuubi.engine.jdbc.impala.SessionSuite
  • org.apache.kyuubi.engine.jdbc.impala.StatementSuite

Related Integration Tests

  • org.apache.kyuubi.it.jdbc.impala.OperationWithServerSuite
  • org.apache.kyuubi.it.jdbc.impala.SessionWithServerSuite
  • org.apache.kyuubi.it.jdbc.impala.StatementWithServerSuite

Checklist 📝

Be nice. Be informative.

@codecov-commenter
Copy link

codecov-commenter commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 61.13%. Comparing base (abb782c) to head (32ae6d8).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6104      +/-   ##
============================================
+ Coverage     61.06%   61.13%   +0.07%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37202    37203       +1     
  Branches       5039     5039              
============================================
+ Hits          22716    22745      +29     
+ Misses        12023    12009      -14     
+ Partials       2463     2449      -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@yaooqinn yaooqinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

dev/dependencyList Outdated Show resolved Hide resolved
// getMoreResults() javadoc says, that it should implicitly close
// any current ResultSet object, so here in case of Statement.CLOSE_CURRENT_RESULT
// we can implement the same logic
// see also https://issues.apache.org/jira/browse/HIVE-7680
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a bug fix, and is worth making this change in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR: #6115

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, which errors will be happen without this change?

Copy link
Member

@pan3793 pan3793 Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, ResultSetWrapper calls statement.getMoreResults(Statement.CLOSE_CURRENT_RESULT)

# under the License.

# modified compose file from Impala repo, see
# https://github.com/apache/impala/blob/master/docker/quickstart.yml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we refers a latest release tag here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added link to the release 4.3.0

.getCodeSourceLocation(getClass).split("integration-tests").head

private val hiveJdbcConnectorPath: String = {
val keyword = "kyuubi-hive-jdbc-shaded"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CI, we explicitly set -Pjdbc-shaded to use the kyuubi-hive-jdbc-shaded for testing, let's make the test also works without -Pjdbc-shaded(kyuubi-hive-jdbc is used in such a case)

Suggested change
val keyword = "kyuubi-hive-jdbc-shaded"
val keyword = "kyuubi-hive-jdbc"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I used ${hive.jdbc.artifact} in integration tests, but in case of non-shaded driver artifact, test failed with ClassNotFoundException of driver dependencies (for instance, org.apache.http.client.HttpClient). So, in my opinion, here we should use self-contained driver fat jar with all needed dependencies, which is what a shaded artifact is.

Copy link
Contributor Author

@tigrulya-exe tigrulya-exe Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, impala integration tests will pass without -Pjdbc-shaded if the updated Kyuubi JDBC driver with fixed getMoreResults() method will be available in snapshot maven repo, so I'll create separate PR with that fix as you've suggested in #6104 (comment)

Comment on lines +55 to +56
- impala-data:/user/hive/warehouse
- ./impala_conf:/opt/impala/conf:ro
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please use the same path style? e.g.

Suggested change
- impala-data:/user/hive/warehouse
- ./impala_conf:/opt/impala/conf:ro
- impala-data:/user/hive/warehouse
- impala_conf:/opt/impala/conf:ro

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one is named volume, controlled by docker, and second is directory path located relatively to the compose file. If we replace it with impala_conf: docker will fail finding named volume called impala_conf.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for explanation

protected def hiveServerJdbcUrl: String = withContainers { container =>
val feHost: String = container.getServiceHost(IMPALAD_SERVICE_NAME, IMPALAD_PORT)
val fePort: Int = container.getServicePort(IMPALAD_SERVICE_NAME, IMPALAD_PORT)
s"jdbc:hive2://$feHost:$fePort/;auth=noSasl"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"kyuubi" is encouraged over "hive2"

Suggested change
s"jdbc:hive2://$feHost:$fePort/;auth=noSasl"
s"jdbc:kyuubi://$feHost:$fePort/;auth=noSasl"

tableName: String,
tableTypes: util.List[String]): String = {
if (isPattern(schema)) {
throw KyuubiSQLException.featureNotSupported("Pattern-like schema names not supported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to add a nagetive test case for it? someday we may want to implement it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests for such cases to org.apache.kyuubi.engine.jdbc.impala.DialectSuite

@@ -133,6 +133,13 @@
<overWrite>true</overWrite>
<outputDirectory>${project.build.directory}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.apache.kyuubi</groupId>
<artifactId>kyuubi-hive-jdbc-shaded</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make it respect -Pjdbc-shaded

Suggested change
<artifactId>kyuubi-hive-jdbc-shaded</artifactId>
<artifactId>${hive.jdbc.artifact}</artifactId>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as in #6104 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the existing approach does not handle transitive deps properly, leave it then

"-redirect_stdout_stderr=false", "-logtostderr",
"-mt_dop_auto_fallback=true",
"-default_query_options=mt_dop=4,default_file_format=parquet,default_transactional_type=insert_only",
"-mem_limit=4gb"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the single GitHub Actions runner has 7GiB memory limit, please reduce the memory consumption as much as possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decreased to 1500mb. Test cases fail at lower values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for experimenting

Comment on lines +103 to +107
case Types.FLOAT =>
floatToTTypeId

case Types.BOOLEAN =>
booleanToTTypeId
Copy link
Member

@pan3793 pan3793 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like affects(or not?) all dialects, if so, change it in a seperate PR first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, other dialects don't use this types or use another JDBC types for the same types from db (for instance,Types.BIT for db boolean), so I suggest to left these changes in current PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@pan3793
Copy link
Member

pan3793 commented Feb 28, 2024

@tigrulya-exe thanks for your excellent works!

@pan3793
Copy link
Member

pan3793 commented Feb 28, 2024

Also cc @wenzhenghu @adios666 @qian0817

@pan3793 pan3793 added this to the v1.9.0 milestone Feb 28, 2024
@tigrulya-exe tigrulya-exe force-pushed the feature/5509-support-impala-jdbc-dialect branch 2 times, most recently from 856338b to 43872e1 Compare February 28, 2024 16:15
@tigrulya-exe tigrulya-exe force-pushed the feature/5509-support-impala-jdbc-dialect branch from 43872e1 to 9852125 Compare February 29, 2024 10:21
Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nits, overall LGTM

version: "3.5"
services:
metastore:
image: apache/impala:4.0.0-impala_quickstart_hms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possible to use the latest 4.3? or are there good reasons to stick to 4.0?

Copy link
Contributor Author

@tigrulya-exe tigrulya-exe Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no official impala quickstart images for version 4.3 in Dockerhub, images from compose file are the latest ones. I also tried to use other user-built images, but they were either too old or too large, so I thought that 4.0 would be suitable for tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good.

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will merge once CI passed

@pan3793 pan3793 closed this in 9f53a09 Feb 29, 2024
zhaohehuhu pushed a commit to zhaohehuhu/incubator-kyuubi that referenced this pull request Mar 21, 2024
# 🔍 Description
## Issue References 🔗

This pull request fixes apache#5509

## Describe Your Solution 🔧

Added [Apache Impala](https://impala.apache.org) support in the form of the JDBC engine dialect. Slightly modified Kyuubi Hive JDBC driver in order to use it as driver for Impala dialect instead of the original Hive driver.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Related Unit Tests
- `org.apache.kyuubi.engine.jdbc.impala.OperationWithImpalaEngineSuite`
- `org.apache.kyuubi.engine.jdbc.impala.SessionSuite`
- `org.apache.kyuubi.engine.jdbc.impala.StatementSuite`

#### Related Integration Tests
- `org.apache.kyuubi.it.jdbc.impala.OperationWithServerSuite`
- `org.apache.kyuubi.it.jdbc.impala.SessionWithServerSuite`
- `org.apache.kyuubi.it.jdbc.impala.StatementWithServerSuite`
---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes apache#6104 from tigrulya-exe/feature/5509-support-impala-jdbc-dialect.

Closes apache#5509

32ae6d8 [Tigran Manasyan] Codestyle fixes
9852125 [Tigran Manasyan] fix review comments
ecb0d7d [Tigran Manasyan] copy impala compose file to integration tests resources
5ea3474 [Tigran Manasyan] fix order in services file
2c63a70 [Tigran Manasyan] Add Apache Impala JDBC engine dialect

Authored-by: Tigran Manasyan <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add SQL query service for Impala component
4 participants