Track specific validation errors #260
As for performance: we could use a bit-mask rather than an array of strings (since the standard only has a handful of unique validation errors). The cost is that we would lose some information, namely:
It would also be more limiting, as we would be unable to store associated information with each error (e.g. cursor position). But, in return, we would be able to store all errors that were encountered in a fixed amount of memory.

Another alternative would be for the parser to accept a validation-error callback closure. Callers could supply a closure which collects all errors in an array, or which ignores them entirely, and the overhead in the parser should be no worse than any other virtual function call; the parser would just call the user-supplied function, saying "hey, this error happened". Importantly, the caller would provide the memory to store errors, if they want to store them at all.
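To make the trade-off concrete, here is a minimal sketch of both options. The bit assignments, the helper names (`recordInMask`, `namesFromMask`, `reportAll`) and the sample error sequence are all hypothetical and are not taken from the branch; the error names are used only as plausible labels.

```js
// Illustrative error names and bit assignments; not taken from the branch.
const ERROR_BITS = {
  "invalid-URL-unit": 1 << 0,
  "special-scheme-missing-following-solidus": 1 << 1,
  "invalid-reverse-solidus": 1 << 2,
};

// Option 1: fold each error into a fixed-size integer mask.
function recordInMask(mask, errorName) {
  return mask | ERROR_BITS[errorName];
}

// Recover the set of distinct error names stored in a mask.
function namesFromMask(mask) {
  return Object.keys(ERROR_BITS).filter((name) => (mask & ERROR_BITS[name]) !== 0);
}

// Option 2: hand each error to a caller-supplied callback, if there is one.
function reportAll(errorNames, onValidationError) {
  for (const name of errorNames) {
    if (onValidationError) {
      onValidationError(name);
    }
  }
}

// A made-up sequence of errors, in the order a parser might encounter them.
const encountered = ["invalid-URL-unit", "invalid-reverse-solidus", "invalid-URL-unit"];

// The mask uses constant memory but keeps only which errors occurred.
const mask = encountered.reduce(recordInMask, 0);

// The callback lets the caller decide whether, and where, to store errors.
const collected = [];
reportAll(encountered, (name) => collected.push(name));

// A caller that does not care about errors passes nothing and stores nothing.
reportAll(encountered, undefined);

console.log(namesFromMask(mask));
console.log(collected);
```

In this sketch the mask reports two distinct errors, while the callback-fed array keeps all three occurrences in order, which is the kind of detail a mask cannot represent.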
I've implemented the closure approach on a branch (https://github.com/karwa/whatwg-url/commits/validation-error-callback) and hooked it up roughly to the live viewer. The constructor for the

```js
{
  let curArg = arguments[2];
  if (curArg !== undefined && typeof curArg === "function") {
    args.push(curArg);
  }
}
```

But yeah, it illustrates the approach. Since it's a callback function, we can also add supplementary information about the error, such as the pointer position, without having to worry about the memory overhead of storing all of it.
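As a rough illustration of passing supplementary information through the callback, the sketch below uses a toy scanner rather than the real parser. The function `scanForBackslashes`, the `{ pointer }` payload shape and the sample input are assumptions made for this example, not the branch's actual signature.

```js
// Toy scanner (hypothetical): reports every backslash along with its position.
// The second callback argument is an assumed shape for extra error details.
function scanForBackslashes(input, onValidationError) {
  for (let pointer = 0; pointer < input.length; pointer++) {
    if (input[pointer] === "\\") {
      // Optional call: does nothing when no callback was supplied.
      onValidationError?.("invalid-reverse-solidus", { pointer });
    }
  }
}

// The caller owns the storage and chooses what to keep from each report.
const seen = [];
scanForBackslashes("http:\\\\example.com", (name, info) => {
  seen.push(`${name}@${info.pointer}`);
});

// Without a callback, nothing is stored and nothing throws.
scanForBackslashes("http:\\\\example.com");

console.log(seen);
```

Because the details are handed over per call, the scanner itself never accumulates them; a caller that only wants the error name can simply ignore the second argument.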
This is exciting work! I think we only want to expose this capability through the low-level URL Standard API, not the Other things we'd need to do before merging this:
It may also be worth benchmarking the results of a no-op default
I also like the

```js
if (this.onValidationError && !isURLCodePoint(c) && c !== p("%")) {
  this.onValidationError("invalid-URL-unit");
}
```

instead of just

```js
if (!isURLCodePoint(c) && c !== p("%")) {
  this.onValidationError?.("invalid-URL-unit");
}
```

There are a few other instances of that pattern as well that might be worth optimizing out, e.g.

```js
if (c === p("%") &&
    (!infra.isASCIIHex(this.input[this.pointer + 1]) ||
     !infra.isASCIIHex(this.input[this.pointer + 2]))) {
  this.onValidationError("invalid-URL-unit");
}
```

could collapse the three checks down into one.
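The difference between the two forms can be seen by counting how often the character check runs. This is a simplified sketch: `isAllowed` is a stand-in for `isURLCodePoint`, and `scanGuarded` and `scanUnguarded` are hypothetical loops, not code from the parser.

```js
// Counts how many times the (stand-in) character check is evaluated.
let checksRun = 0;

// Simplified stand-in for isURLCodePoint: ASCII letters and digits only.
function isAllowed(c) {
  checksRun++;
  return /^[A-Za-z0-9]$/.test(c);
}

// Guard first: with no callback, && short-circuits before isAllowed runs.
function scanGuarded(input, onValidationError) {
  for (const c of input) {
    if (onValidationError && !isAllowed(c) && c !== "%") {
      onValidationError("invalid-URL-unit");
    }
  }
}

// Check first: isAllowed runs for every character, callback or not.
function scanUnguarded(input, onValidationError) {
  for (const c of input) {
    if (!isAllowed(c) && c !== "%") {
      onValidationError?.("invalid-URL-unit");
    }
  }
}

checksRun = 0;
scanGuarded("a b%", undefined);
const guardedChecks = checksRun;

checksRun = 0;
scanUnguarded("a b%", undefined);
const unguardedChecks = checksRun;

console.log({ guardedChecks, unguardedChecks });
```

Both forms report the same errors when a callback is present; the guarded form only avoids validation work that nobody will observe. Whether that saving is measurable in the real parser would need benchmarking.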
From the table, I've only done the "URL parsing" section, not the IDNA or host parsing sections.
Having this in the reference implementation has already helped me confirm a couple of issues with the validation errors reported by the standard (bugs and PRs to WHATWG/url incoming). It can also be useful for other implementations to check the errors they report. Hopefully we can add expected validation errors to the WPT test suite in due course, as well.

That said, these results are not exposed to the live viewer yet. I've hacked something together locally, but I'm not sure of the cleanest way to thread that through to the viewer.