Skip to content
This repository has been archived by the owner on Dec 9, 2022. It is now read-only.

Tests for TPCDS queries join ordering #21

Closed
wants to merge 6 commits into from
Closed

Tests for TPCDS queries join ordering #21

wants to merge 6 commits into from

Conversation

kokosing
Copy link
Member

@kokosing kokosing commented Mar 1, 2018

Tests for TPCDS queries join ordering

@kokosing kokosing requested a review from findepi March 1, 2018 12:12
@@ -0,0 +1 @@
Move queries to TPCDS generator library.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't find this file. Better to just reg an issue somewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teradata/tpcds#28

Also I asked on presto slack to copy that project under prestodb community group, see https://prestodb.slack.com/archives/C07JH9WMQ/p1519824551000061

customer_total_return ctr1
, store
, customer
WHERE ("ctr1"."ctr_total_return" > (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we need yet another copy of TPCDS queries.

$ find -regex '.*ds.*q01.sql' -exec sha1sum {} +
550d3e6497028853792bc46c8500776e92d9971b  ./presto-tpcds/src/main/resources/tpcds/queries/q01.sql
550d3e6497028853792bc46c8500776e92d9971b  ./presto-product-tests/src/main/resources/sql-tests/testcases/tpcds/q01.sql
0169942554f00ce16abf51e98b1730fa8be8e005  ./presto-benchto-benchmarks/src/main/resources/sql/presto/tpcds/q01.sql

I would rather see some reuse (even hacky) then a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queries in benchmarks are modified (they have variables like ${catalog}.${schema}).
Product tests have their headers. Here we need vanilla queries.

I prefer to copy files which no-one ever should modify, instead of dragging and/or playing with all the product-tests deps (like cassandra or kafka).

Copy link

@findepi findepi Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queries in benchmarks are modified (they have variables like ${catalog}.${schema}).

you could simply find-replace ${catalog} with tpcds ...

I prefer to copy files which no-one ever should modify,

This is wishful thinking. The queries are being modified from time to time. Casts, grouping, .. And they will be modified in the future (casts and chars).

import static java.util.Objects.requireNonNull;
import static org.testng.Assert.assertEquals;

public class BaseJoinReorderingTest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure. Is

Extract BaseJoinReorderingTest

an extract only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it is. :)

cross join:
cross join:
cross join:
cross join:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep... this is how we handle this query today, see: prestodb#6944

join (LEFT, REPLICATED):
join (INNER, REPLICATED):
tpcds:customer:sf3000.0
join (INNER, REPLICATED):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these files needed? I thought the idea was that there is no persistent 'expected' value?

Also, include in the cmt msg how the files were generated (technique, what session settings).

Also, name the files .txt to avoid stupid problems like ".join_ordering is an unknown extension" or whatever.


private static File getExpectedJoinOrderingFile(String queryId)
{
return new File(String.format("src/test/resources/join_ordering/%s.join_ordering", queryId));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rnn queryId

static import format

use Path

end file name with e.g. .txt

}
}

@Test(dataProvider = "getTpcdsQueryIds", enabled = false)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, calling this requires temporarily removing enabled = false. What about making this a main method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it has to be copied to both classes ; /

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it has to be copied to both classes ; /

this i don't understand

<dependency>
<groupId>io.airlift</groupId>
<artifactId>bytecode</artifactId>
<scope>test</scope>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, what are these for?

public void test(String queryId)
throws IOException
{
assertEquals(joinOrderString(tpcdsQuery(queryId)), read(getExpectedJoinOrderingFile(queryId)));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice and functional.

However, if something goes wrong (or one wants to run this with a debugger), this is no longer so nice.

public void update(String queryId)
throws IOException
{
Files.write(joinOrderString(tpcdsQuery(queryId)), getExpectedJoinOrderingFile(queryId), UTF_8);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice and functional.

However, if something goes wrong (or one wants to run this with a debugger), this is no longer so nice.

import static java.lang.String.format;
import static java.util.Objects.requireNonNull;
import static org.testng.Assert.assertEquals;

/**
* This class tests cost-based optimization rules related to joins. It contains unmodified TPCH queries.
* This class is using TPCH connector configured in way to mock Hive connector with unpartitioned TPCH tables.
*/
public class TestJoinReordering
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestTpchJoinReordering?

@kokosing
Copy link
Member Author

kokosing commented Mar 8, 2018

@kokosing kokosing closed this Mar 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants