
executeQuery - column type inference #20

Open
jbguerraz opened this issue Dec 4, 2020 · 8 comments

@jbguerraz
Contributor

Rework the column type detection method. Eventually, query Druid for column data types.

@jbguerraz jbguerraz changed the title Backend - Execute Query - Column type inference Backend - executeQuery - Column type inference Dec 4, 2020
@jbguerraz jbguerraz changed the title Backend - executeQuery - Column type inference executeQuery - Column type inference Dec 4, 2020
@jbguerraz jbguerraz changed the title executeQuery - Column type inference executeQuery - column type inference Dec 4, 2020
@hardikbajaj

hardikbajaj commented Mar 23, 2023

@jbguerraz Can you please assign this issue to me?

@jbguerraz
Contributor Author

With pleasure @hardikbajaj :) Here to help as much as needed.

@hardikbajaj

Hey @jbguerraz, I'm thinking of inferring the column data type from Druid itself.
Through the Druid SQL API, I can request typesHeader and sqlTypesHeader (For Ref).
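For context, in recent Druid versions these flags go into the SQL HTTP API request body alongside the query (the query text and datasource name below are placeholders, not from this thread):

```json
{
  "query": "SELECT * FROM \"my-datasource\" LIMIT 10",
  "header": true,
  "typesHeader": true,
  "sqlTypesHeader": true
}
```

With these enabled, the response carries extra header rows giving each column's native and SQL type.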

With the help of sqlTypesHeader, I'm thinking of inferring the data type with the following algorithm:

switch sqlDataType {
case "TIMESTAMP", "DATE":
	frame.DataType = []time.Time{}
case "CHAR", "VARCHAR":
	frame.DataType = []string{}
case "BOOLEAN":
	frame.DataType = []bool{}
default:
	frame.DataType = []float64{}
}

I'm not sure about complex data types, though.
Also, all kinds of numbers would be handled as float64; Grafana will visualise them correctly.
I'd like to have your input on this. Should I go ahead?

@jbguerraz
Contributor Author

jbguerraz commented Mar 28, 2023

Hello @hardikbajaj, and thank you!! That enhancement rocks :)

The Druid web UI itself uses this query to display column info in the datasource details pane:

SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'druid' AND TABLE_NAME = '{{ datasourceName }}'

Also, per https://druid.apache.org/docs/latest/querying/segmentmetadataquery.html:

{
  "queryType": "segmentMetadata",
  "dataSource": "{{ datasourceName }}",
  "merge": true
}

Interesting bits:

All columns contain a typeSignature that Druid uses to represent the column type information internally. The typeSignature is typically the same value used to identify the JSON type information at query or ingest time. One of: STRING, FLOAT, DOUBLE, LONG, or COMPLEX<typeName>, e.g. COMPLEX<hyperUnique>.

That would probably make it better to use it and map this way:

STRING: string
FLOAT: float32
DOUBLE: float64
LONG: int64
COMPLEX<typeName>: ignore for now

Additionally, JSON (native) queries are a bit faster than SQL ones, as far as I've seen in the past (probably less parsing time).

It would probably be great to cache those datasource definitions for at least some period of time (say 30 minutes by default? this could later be changed at the datasource settings level) to avoid querying Druid on each request. That cache would also be usable later on for providing autocomplete information.

WDYT?

@m-ghazanfar
Contributor

@hardikbajaj Thanks for working on this.
Minor comment: I think the default case should return the string type, because most data can be represented as a string. If we've missed a case here, or if there's a new type in the future, string will continue to work. Converting to float might not always work and would crash the plugin.

@hardikbajaj

hardikbajaj commented Mar 28, 2023

Hey @jbguerraz! Thanks for your feedback!
This looks great, but if I'm not wrong, segmentMetadata is also a native query type, and it just provides a sort of schema for existing data. I think it would be difficult to assert the data type for groupBy, scan, virtual columns, and other aggregation queries, where the columns are specified by the user.
I think this is a great solution that would work for both SQL and native queries, but should we first implement the frontend with this and provide autocomplete suggestions, and maybe find a way to assert all data types from the frontend? Just a thought.

For now I think the SQL API provides us with a complete solution, where we can get the data types in the headers of the Druid response itself.
I'm thinking of implementing column type assertion for SQL queries from Druid, as it would be a great enhancement for inferring columns, and would use the SQL types for the columns, since they give us more detailed knowledge of data types like TIMESTAMP and float or integer.

@Ghazanfar-CFLT I'll make sure to refine the number types and make the default type string.

@hardikbajaj

@jbguerraz, this PR change would be required so that SQL queries can assert column types completely from Druid.
Can you please release the latest go-druid client?

@hardikbajaj

hardikbajaj commented Apr 9, 2023

Hey @jbguerraz, in case you missed it, can we release the latest go-druid client to be used by the plugin?
