SEO 2024 queries #3791
base: main
Overall LGTM with a couple of small comments.
Let me know when good to merge.
page,
getLoadingPropertyMarkupInfo(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS loading_property_markup_info
FROM
`httparchive.all.pages` TABLESAMPLE SYSTEM (0.01 PERCENT)
This should be removed. Was the query run on the full dataset?
Suggested change
- `httparchive.all.pages` TABLESAMPLE SYSTEM (0.01 PERCENT)
+ `httparchive.all.pages`
page AS site,
getRobotsSize(payload) AS robots_size
FROM
`httparchive.all.pages` TABLESAMPLE SYSTEM (0.01 PERCENT)
This should be removed. Was the query run on the full dataset?
Suggested change
- `httparchive.all.pages` TABLESAMPLE SYSTEM (0.01 PERCENT)
+ `httparchive.all.pages`
header.value AS request_header_value,
COUNT(DISTINCT page) AS sites,
SUM(COUNT(DISTINCT page)) OVER (PARTITION BY client, is_root_page) AS total,
SAFE_DIVIDE(COUNT(0), SUM(COUNT(0)) OVER ()) AS pct
Is this OVER () correct?
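A minimal sketch of why this question matters: with an empty OVER (), the denominator of pct is the row count across every client and is_root_page group, so pct becomes a share of the whole result set rather than a share within each client/page-type group (which is how total is partitioned). The toy table, the vary_user_agent column, and the values are hypothetical; SQLite (3.25+ for window functions) stands in for BigQuery, and a plain CAST division replaces SAFE_DIVIDE since no denominator here is zero.

```python
import sqlite3

# Hypothetical toy data: (client, is_root_page, page, vary_user_agent)
TOY_ROWS = [
    ("desktop", 1, "a", 1), ("desktop", 1, "b", 0), ("desktop", 0, "c", 1),
    ("mobile", 1, "d", 0), ("mobile", 0, "e", 1), ("mobile", 0, "f", 0),
    ("mobile", 0, "g", 0), ("mobile", 0, "h", 0),
]

PCT_QUERY = """
SELECT
  client,
  is_root_page,
  vary_user_agent,
  COUNT(0) AS freq,
  -- Empty OVER (): denominator is the row count across ALL clients and page types
  CAST(COUNT(0) AS REAL) / SUM(COUNT(0)) OVER () AS pct_global,
  -- Partitioned: denominator is the row count within this client/is_root_page group
  CAST(COUNT(0) AS REAL) / SUM(COUNT(0)) OVER (PARTITION BY client, is_root_page) AS pct_group
FROM pages
GROUP BY client, is_root_page, vary_user_agent
ORDER BY client, is_root_page, vary_user_agent
"""


def share_of_pages(rows):
    """Load rows into an in-memory table and return the PCT_QUERY result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pages (client TEXT, is_root_page INTEGER, "
            "page TEXT, vary_user_agent INTEGER)"
        )
        conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(PCT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in share_of_pages(TOY_ROWS):
        print(row)
```

In this toy data, mobile secondary pages without the header are 0.375 of all rows (pct_global) but 0.75 of mobile secondary pages (pct_group); only the partitioned form lines up with a per-client, per-page-type breakdown.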
is_root_page,
REGEXP_CONTAINS(LOWER(IFNULL(request_headers[SAFE_OFFSET(0)].name, '')), r'user-agent') AS resp_vary_user_agent,
COUNT(0) AS freq,
SAFE_DIVIDE(COUNT(0), SUM(COUNT(0)) OVER ()) AS pct
Ditto
Makes progress on #3600
This PR adds the finalized SQL files, which now include an is_root_page column that distinguishes the homepage from secondary pages. All SQL files use the June dataset, since it was the dataset these queries were originally built against.
Context:
These changes finalize the SQL queries for the 2024 SEO analysis. The new is_root_page column separates homepages from other pages, so results can be reported for each page type instead of being mixed together. Minor updates were also applied to the 2022 SQL queries to align them with the new dataset structure, and Common Table Expressions (CTEs) were introduced to improve efficiency and readability.
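As a rough illustration of the CTE plus is_root_page pattern described above, the sketch below first isolates one metric per page in a CTE, then aggregates by client and is_root_page so homepages and secondary pages are reported separately. The simplified table, the plain robots_size column (standing in for the getRobotsSize(payload) UDF output), and the values are hypothetical; SQLite is used only so the query can run locally.

```python
import sqlite3

# Hypothetical simplified rows: (client, is_root_page, page, robots_size in bytes)
SAMPLE_PAGES = [
    ("desktop", 1, "a", 100), ("desktop", 1, "b", 300), ("desktop", 0, "c", 50),
    ("mobile", 1, "d", 200), ("mobile", 0, "e", None),
]

CTE_QUERY = """
WITH per_page AS (
  -- Step 1: isolate the per-page metric once, dropping pages without a value
  SELECT client, is_root_page, page, robots_size
  FROM pages
  WHERE robots_size IS NOT NULL
)
-- Step 2: aggregate homepages (is_root_page = 1) and secondary pages separately
SELECT
  client,
  is_root_page,
  COUNT(DISTINCT page) AS sites,
  AVG(robots_size) AS avg_robots_size
FROM per_page
GROUP BY client, is_root_page
ORDER BY client, is_root_page
"""


def summarize_by_page_type(rows):
    """Load rows into an in-memory table and return per client/page-type stats."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pages (client TEXT, is_root_page INTEGER, "
            "page TEXT, robots_size INTEGER)"
        )
        conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", rows)
        return conn.execute(CTE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarize_by_page_type(SAMPLE_PAGES):
        print(row)
```

Keeping extraction in the CTE and aggregation in the outer query means the grouping keys (client, is_root_page) appear in one place, which is one way the readability benefit mentioned above can show up.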
Changes Made: