Tech Report: Technologies total origins from crawl #47

Open
wants to merge 9 commits into
base: main
73 changes: 64 additions & 9 deletions definitions/output/reports/cwv_tech_technologies.js
@@ -6,19 +6,74 @@ publish('cwv_tech_technologies', {
   tags: ['crux_ready']
 }).query(ctx => `
 /* {"dataform_trigger": "report_cwv_tech_complete", "name": "technologies", "type": "dict"} */
+WITH pages AS (
+  SELECT DISTINCT
+    client,
+    root_page,
+    tech.technology
+  FROM ${ctx.ref('crawl', 'pages')},
+    UNNEST(technologies) AS tech
+  WHERE
+    date = '${pastMonth}'
+    ${constants.devRankFilter}
+),
+
+tech_origins AS (
+  SELECT
+    client,
+    technology,
+    COUNT(DISTINCT root_page) AS origins
+  FROM pages
+  GROUP BY
+    client,
+    technology
+),
+
+technologies AS (
+  SELECT
+    name AS technology,
+    description,
+    ARRAY_TO_STRING(categories, ', ') AS category,
+    categories AS category_obj,
+    NULL AS similar_technologies
+  FROM ${ctx.ref('wappalyzer', 'apps')}
+),
+
+total_pages AS (
+  SELECT
+    client,
+    COUNT(DISTINCT root_page) AS origins
+  FROM pages
+  GROUP BY client
+)
+
 SELECT
   client,
-  app AS technology,
+  technology,
   description,
   category,
-  SPLIT(category, ",") AS category_obj,
+  category_obj,
   similar_technologies,
   origins
+FROM tech_origins
+INNER JOIN technologies
+USING(technology)
+
+UNION ALL
+
+SELECT
+  client,
+  'ALL' AS technology,
+  NULL AS description,
+  ARRAY_TO_STRING(categories, ', ') AS category,
+  categories AS category_obj,
+  NULL AS similar_technologies,
+  origins
-FROM ${ctx.ref('core_web_vitals', 'technologies')}
-LEFT JOIN ${ctx.ref('wappalyzer', 'apps')}
-ON app = name
-WHERE date = '${pastMonth}' AND
-  geo = 'ALL' AND
-  rank = 'ALL'
-ORDER BY origins DESC
+FROM total_pages
+CROSS JOIN (
+  SELECT
+    ARRAY_AGG(DISTINCT category IGNORE NULLS ORDER BY category) AS categories
+  FROM technologies,
+    UNNEST(category_obj) AS category
+)
 `)