Verify example queries using Athena engine v3 #23

Open
sebastian-nagel opened this issue Feb 6, 2023 · 0 comments

The example queries under src/sql/examples/cc-index/ were developed using Athena engine v1 or v2. Some of them may behave differently on engine v3, which is based on Trino instead of PrestoDB. For example, (num_pages/total_pages_host) >= .05 in one of the queries needs to be updated to (1.0*num_pages/total_pages_host) >= .05, which forces a cast from integer to floating point before the division.
Ideally, we should run the queries against each of the engines and compare the results to verify that everything works as expected.
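
The comparison step could be sketched roughly as follows, assuming the result of each engine run has already been exported as CSV text with a header line. The function names `load_rows` and `compare_results` and the sample data are hypothetical and not part of this repository; the sketch only compares rows as strings and ignores row order.

```python
import csv
import io
from collections import Counter


def load_rows(csv_text):
    """Parse CSV text (with a header line) into a header and a multiset of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header, Counter(tuple(row) for row in reader)


def compare_results(csv_a, csv_b):
    """Return the rows found only in the first and only in the second result."""
    header_a, rows_a = load_rows(csv_a)
    header_b, rows_b = load_rows(csv_b)
    if header_a != header_b:
        raise ValueError(f"column mismatch: {header_a} vs {header_b}")
    # Counter subtraction keeps duplicates, so repeated rows are compared by count.
    only_a = sorted((rows_a - rows_b).elements())
    only_b = sorted((rows_b - rows_a).elements())
    return only_a, only_b


# Made-up sample data (assumption), not actual query output.
result_v2 = "host,num_pages\nexample.com,10\nexample.org,3\n"
result_v3 = "host,num_pages\nexample.org,3\n"

only_v2, only_v3 = compare_results(result_v2, result_v3)
print("only in v2:", only_v2)
print("only in v3:", only_v3)
```

Two empty lists would indicate that both engines returned the same rows. Columns holding floating point values may need a tolerance-based comparison instead of the exact string match used here.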

sebastian-nagel added a commit that referenced this issue Mar 21, 2023
- WARC storage metrics by MIME type:
  extract common/frequent file suffixes from URL
- site discovery by language: implicit cast to floating point
  number (address #23)
- select robots.txt records for a given list of domains