CMS 2024 queries #3731

Open
wants to merge 77 commits into
base: main

Conversation

@nrllh
Collaborator

nrllh commented Aug 14, 2024

Makes progress on #3608

@tunetheweb added the analysis (Querying the dataset) label Aug 21, 2024
@tunetheweb added this to the 2024 Analysis milestone Aug 21, 2024
@mgifford
Contributor

This should be using httparchive.all.pages instead of httparchive.technologies.2024_06_01_*.

This will give access to secondary resources and also be more future-proof.
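
For illustration, a minimal sketch of what a CMS count against httparchive.all.pages could look like. It uses only columns that appear in the queries in this thread (date, client, page, technologies); the alias names t and category are my own, and this is not necessarily how the PR's queries are written. Note that it scans the real table, so running it incurs BigQuery cost.

```sql
#standardSQL
# Sketch: CMS page counts per client from httparchive.all.pages for the June 2024 crawl.
SELECT
  client,
  t.technology AS cms,
  COUNT(DISTINCT page) AS pages
FROM
  `httparchive.all.pages`,
  UNNEST(technologies) AS t,           -- one row per detected technology
  UNNEST(t.categories) AS category     -- one row per category of that technology
WHERE
  date = '2024-06-01' AND              -- a fixed date instead of a per-crawl table suffix
  category = 'CMS'
GROUP BY
  client,
  cms
ORDER BY
  pages DESC
```

Because the crawl is selected with a date filter rather than a table name, moving to a later crawl should only need the date literal changed.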

@nrllh
Collaborator Author

nrllh commented Aug 24, 2024

> This should be using httparchive.all.pages instead of httparchive.technologies.2024_06_01_*
>
> This will give access to secondary resources and also be more future-proof.

Thanks! I'm currently adjusting the queries; the results remain very stable, with almost no changes.

@mgifford
Contributor

Is it worth adding this?

WHERE is_root_page
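
To make the effect of that filter concrete, here is a small self-contained sketch. The sample_pages rows are hypothetical inline data standing in for httparchive.all.pages, so it runs without touching the real table; only the is_root_page column name comes from the actual schema used in this thread.

```sql
#standardSQL
# Hypothetical inline rows; URLs and CMS names are illustrative only.
WITH sample_pages AS (
  SELECT * FROM UNNEST([
    STRUCT('https://example.com/' AS page, TRUE AS is_root_page, 'WordPress' AS cms),
    STRUCT('https://example.com/blog/' AS page, FALSE AS is_root_page, 'WordPress' AS cms),
    STRUCT('https://example.org/' AS page, TRUE AS is_root_page, 'Drupal' AS cms)
  ])
)

SELECT
  cms,
  COUNT(0) AS all_pages,               -- home pages plus secondary pages
  COUNTIF(is_root_page) AS root_pages  -- what remains with WHERE is_root_page
FROM
  sample_pages
GROUP BY
  cms
ORDER BY
  cms
```

In this sample WordPress has two rows but only one root page, so without the filter a site with a crawled secondary page would be counted twice.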

@@ -0,0 +1,68 @@
#standardSQL
Contributor

In testing this I'm getting "There is no data to display." as an error.

I don't think it was updated correctly. Maybe the date format is wrong?
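
If the date format does turn out to be the cause, one thing to check is that the two sources key the month differently: in the queries in this thread the CrUX summary table is filtered with an integer (yyyymm = 202406) while httparchive.all.pages is filtered with a date (date = '2024-06-01'). A minimal sketch, assuming nothing beyond standard BigQuery date functions, of deriving one from the other so the two filters stay in sync (crawl_date is a name I made up for the example):

```sql
#standardSQL
# Sketch: derive the CrUX-style yyyymm integer from the crawl date.
SELECT
  crawl_date,
  CAST(FORMAT_DATE('%Y%m', crawl_date) AS INT64) AS yyyymm  -- 2024-06-01 becomes 202406
FROM
  UNNEST([DATE '2024-06-01']) AS crawl_date
```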

Contributor

I got better results with this:

#standardSQL
# CMS popularity per geo
WITH geo_summary AS (
  SELECT
    `chrome-ux-report`.experimental.GET_COUNTRY(country_code) AS geo,
    IF(device = 'desktop', 'desktop', 'mobile') AS client,
    origin,
    COUNT(DISTINCT origin) OVER (PARTITION BY country_code, IF(device = 'desktop', 'desktop', 'mobile')) AS total
  FROM
    `chrome-ux-report.materialized.country_summary`
  WHERE
    yyyymm = 202406
)

SELECT
  client,
  geo,
  cms,
  COUNT(0) AS pages,
  ANY_VALUE(total) AS total,
  COUNT(DISTINCT url) / ANY_VALUE(total) AS pct
FROM (
  -- Step 1: Extract distinct URLs from geo_summary (grouped by country and device).
  SELECT DISTINCT
    geo,
    client,
    CONCAT(origin, '/') AS url,
    total
  FROM
    geo_summary
) JOIN (
  -- Step 2: Join with the CMS data from httparchive.all.pages for the top CMS per country.
  SELECT DISTINCT
    client,
    technologies.technology AS cms,
    page AS url
  FROM
    `httparchive.all.pages`,
    UNNEST(technologies) AS technologies,
    UNNEST(technologies.categories) AS categories
  WHERE
    categories = 'CMS' AND
    technologies.technology != '' AND
    date = '2024-06-01' AND
    is_root_page
) USING (client, url)
GROUP BY
  client,
  geo,
  cms
HAVING
  pages > 1000 -- Include only CMSes with more than 1000 pages in a country.
ORDER BY
  pages DESC
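
The join above depends on CONCAT(origin, '/') matching the page URL of the root page exactly. A small self-contained sketch of that matching step, using hypothetical inline rows (the crux and pages CTEs and their values are invented for illustration, not taken from either dataset); it uses a LEFT JOIN rather than the inner JOIN above so that unmatched origins stay visible:

```sql
#standardSQL
# Hypothetical inline rows standing in for the CrUX and HTTP Archive sides of the join.
WITH crux AS (
  SELECT * FROM UNNEST([
    STRUCT('https://example.com' AS origin, 'mobile' AS client),
    STRUCT('https://example.org' AS origin, 'mobile' AS client)
  ])
),

pages AS (
  SELECT * FROM UNNEST([
    STRUCT('https://example.com/' AS url, 'mobile' AS client, 'WordPress' AS cms)
  ])
)

SELECT
  client,
  origin,
  cms  -- NULL when no root page matched the origin
FROM (
  SELECT
    client,
    origin,
    CONCAT(origin, '/') AS url  -- CrUX origins have no trailing slash; page URLs do
  FROM
    crux
) LEFT JOIN
  pages
USING (client, url)
ORDER BY
  origin
```

Here https://example.com picks up WordPress and https://example.org comes back with a NULL cms. With the inner JOIN used in the query above, the unmatched origin would drop out of the pages count but would still be included in total, since total is computed in geo_summary before the join.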

Contributor

I do not get this error. Can you confirm it is still an error please?

@tunetheweb changed the title from "Queries of CMS 2024 (replicated from 2022's version)" to "CMS 2024 queries" on Oct 11, 2024
Member

@tunetheweb left a comment

Overall LGTM but with one suggestion.

Let me know when it's good to merge.

sql/2024/cms/cms_adoption_by_rank.sql (outdated, resolved)
sql/2024/cms/top_cms_by_rank.sql (outdated, resolved)
@kevinfarrugia
Contributor

From my end this is ready to be merged. @sirjonathan can you confirm?

Member

@tunetheweb left a comment

LGTM. Let me know if it's good to merge.

Labels
analysis Querying the dataset
4 participants