Add detailed site-breakage data #71

jgraham · 2022-07-22T15:30:05Z

This is intended to allow us to compute the overall priority of an
issue based on the factors we use in triage, rather than assigning the
impact/severity directly.

It adds a number of fields under 'breakage' that contain details about
the extent and impact of the breakage.

For now plain URLs are still allowed, to avoid having to rewrite all
existing entries at once.

ksy36 · 2022-07-22T16:07:05Z

schemas/entry.schema.json

+                },
+                "resolution": {
+                  "type": "string",
+                  "enum": ["site_changed", "site_fixed"],


Should site_broken be also an option? (noticed it is used in one of the sidebar breakage reports)

I thought site_changed could also cover "this site just doesn't exist anymore" because the use case I had in mind was tracking how often sites patch our issues directly vs how often they persist until the site undergoes larger changes; for the latter "broken" and "redesigned" are basically the same. But maybe there is some other case where we would benefit from tracking the difference?

oops sorry - I thought site_broken is used in resolution, but it's actually one of the impact fields.
I guess my question is more about whether we want to track that site is still "broken" (i.e. issue reproducible) in the resolution field, or such issues will have no resolution field?

I was assuming it would be easier to treat "still broken" as the missing value default rather than an explicit state.

ok, that makes sense :)

denschub · 2022-07-22T16:11:15Z

schemas/entry.schema.json

+                },
+                "resolution": {
+                  "type": "string",
+                  "enum": ["site_changed", "site_fixed"],


This raises the question again of "what do we do with KB-entries about platform issues that have been fixed".

If we, in some future, fix <marquee>. Do we delete marquee.yml? Do we remove all site breakage because we no longer see some? Do we move the entry into a "fixed" folder? imho, having historic data is good, because it allows us to go back in time and check back on things we've succeeded on.

Keeping that in mind, we probably need different "fixed" resolutions, depending on what happened:

fixed_by_intervention

fixed_by_platform_change

fixed_by_site

or something along those lines.

I think if the platform issue is fixed we should either a) have some top-level status field that records that or b) move the entry into a different directory. I think I favour b) because it seems easier to implement and also probably makes tooling easier (you just ignore all the "fixed" issues for most operations by not looking in that directory).

My thinking for not considering "fixed_by_intervention" as a "fixed" status is that the issue isn't really fixed and we already have a field to link to the intervention, if there is one.

that the issue isn't really fixed and we already have a field to link to the intervention, if there is one.

This is true, but that distinction is relevant for our ranking. If we have an issue with critical breakage on critical sites, but we can and do ship interventions for that, the issue should probably be ranked lower than an issue we can't ship interventions for.

Right, but I was imagining the presence of an intervention would cut the computed priority quite a lot. Are there cases where we have an intervention but the issue isn't really "fixed" by the intervention?

I don't think so. If we can't completely fix an issue with an intervention, we usually don't even bother shipping it. So if you want to take the presence of an intervention as an additional signal, not just the state, then yeah, your version works fine. :)

ksy36 · 2022-07-22T16:13:44Z

data/sidebar.yml

+      affects_users: all
+      resolution: site_changed
+    - url: https://github.com/webcompat/web-bugs/issues/55685
+      site:  http://www.susu09.com/forum.php


Wonder if we want to store just the domain name here? Though probably not a big deal as we could parse and extract it to determine site rank on the computing stage :)

Yes, I think for the tranco list the hard part is actually knowing what the eTLD+1 is, since I think that's what it's storing (e.g. we don't want to lookup drive.google.com but just google.com). So I think putting the whole URL is better because a) it's easier to copy and b) it preserves information in case we can do more accurate lookups in the future, without causing any additional difficulty because we still need to parse it and do a public suffix list lookup.

Yes, I think for the tranco list the hard part is actually knowing what the eTLD+1 is, since I think that's what it's storing (e.g. we don't want to lookup drive.google.com but just google.com).

That made me think that we have logic for this on webcompat.com (it extracts url from the issue body, checks first level domain and then less level domain (if no rank is found), gets the ranking from Tranco db or Alexa db) https://github.com/webcompat/webcompat.com/blob/main/webcompat/webhooks/helpers.py#L122. We use it for adding priority labels, but I was planning on changing it to add numerical rank in webcompat/webcompat.com#3707.

Maybe I could add an API as well, which we can call to get the rank when computing the overall impact? We have Tranco data and can fetch more per country data from Alexa (while it's still accessible).

Yeah, an API that takes a URL or a (full) domain and returns the site rank sounds amazing. I was looking at implementing it, but it makes sense that you already have it for webcompat.com and calling an API is easier than writing all the logic :)

Alright, I'll look into it then!

Although, thinking about it, for an API we would presumably want to have rate limits to avoid abuse, and that could make the client code more complex than just downloading the tranco list and matching each part of the domain from the end. So if you don't get to the API part don't worry.

jgraham · 2022-07-22T16:19:54Z

schemas/entry.schema.json

+                  "format": "date",
+                  "description": "Most recent date the issue was successfully reproduced"
+                },
+                "intervention": {


Do we ever have > 1 intervention per site?

For the "same" issue that is; I imagine we could have a site affected by multiple kinds of breakage that get seperate interventions.

While >1 intervention per site is possible in theory (hasn't happened yet), more than one intervention per site per issue is unlikely. (Edit: yeah, what you just wrote. Your reply wasn't there yet when I hit "comment")

I think there is a google maps intervention that has css and js files
https://github.com/mozilla-extensions/webcompat-addon/blob/main/src/data/injections.js#L170

yeah, but that's "the same" intervention, and we should probably link to the definition, instead of the individual files (in that case? in general?).

True, and I guess in this case the js intervention is the more important one.. could either link to it, or the definition.

In general, it seems individual files would be easier to locate, if the intervention is removed (
provided we use canonical urls 🙂), like you mentioned in #70 (comment)

schemas/entry.schema.json

jgraham · 2022-07-26T16:44:10Z

Hmm, I based this on #65 so I guess I should pull out the foundational commits from that PR, put up a review for those, rebase this onto that branch, and then we can land this without also getting review of the features added in the updates draft.

This is intended to allow us to compute the overall priority of an issue based on the factors we use in triage, rather than assigning the impact/severity directly. It adds a number of fields under 'breakage' that contain details about the extent and impact of the breakage. For now plain URLs are still allowed, to avoid having to rewrite all existing entries at once.

jgraham force-pushed the breakage_details branch from 793a7d8 to 77d67a1 Compare July 22, 2022 15:43

ksy36 reviewed Jul 22, 2022

View reviewed changes

denschub reviewed Jul 22, 2022

View reviewed changes

ksy36 reviewed Jul 22, 2022

View reviewed changes

jgraham commented Jul 22, 2022

View reviewed changes

ksy36 mentioned this pull request Jul 22, 2022

Add user_base_impact and detailed breakage for zoom css #69

Merged

janodvarko reviewed Jul 26, 2022

View reviewed changes

schemas/entry.schema.json Outdated Show resolved Hide resolved

jgraham marked this pull request as ready for review July 26, 2022 15:11

ksy36 approved these changes Jul 26, 2022

View reviewed changes

Use miette for fancy error reporting

968002f

jgraham force-pushed the breakage_details branch 2 times, most recently from ace0fb2 to 74dc67f Compare July 27, 2022 10:18

jgraham force-pushed the breakage_details branch from 74dc67f to 4c37d31 Compare July 27, 2022 10:21

jgraham changed the base branch from updates to main July 27, 2022 10:31

jgraham closed this Jul 27, 2022

jgraham reopened this Jul 27, 2022

jgraham merged commit 9236720 into main Jul 27, 2022

jgraham deleted the breakage_details branch July 27, 2022 12:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add detailed site-breakage data #71

Add detailed site-breakage data #71

jgraham commented Jul 22, 2022

ksy36 Jul 22, 2022

jgraham Jul 22, 2022

ksy36 Jul 22, 2022

jgraham Jul 22, 2022

ksy36 Jul 22, 2022

denschub Jul 22, 2022

jgraham Jul 22, 2022

denschub Jul 22, 2022

jgraham Jul 22, 2022

denschub Jul 22, 2022

ksy36 Jul 22, 2022

jgraham Jul 22, 2022

ksy36 Jul 22, 2022 •

edited

Loading

jgraham Jul 22, 2022

ksy36 Jul 22, 2022

jgraham Jul 22, 2022

jgraham Jul 22, 2022

jgraham Jul 22, 2022

denschub Jul 22, 2022 •

edited

Loading

ksy36 Jul 22, 2022

denschub Jul 22, 2022

ksy36 Jul 22, 2022

jgraham commented Jul 26, 2022

Add detailed site-breakage data #71

Add detailed site-breakage data #71

Conversation

jgraham commented Jul 22, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ksy36 Jul 22, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

denschub Jul 22, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jgraham commented Jul 26, 2022

ksy36 Jul 22, 2022 •

edited

Loading

denschub Jul 22, 2022 •

edited

Loading