For compressed responses, omit `Content-Length` and `Content-Encoding` headers #23

Skn0tt · 2023-12-11T16:13:34Z

The fetch API is becoming the standard HTTP client for server usage. When proxying back a fetched resource to the user-agent, here's what i'd intuitively do:

async function myEndpoint(request: Request): Promise<Response> {
   return await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
}

The URL in this example responds with a gzip-compressed SVG. As specced in 16.1.1.1 of 15.6 HTTP - network fetch, the body stream contains an uncompressed version of the SVG, making compression transparent to the user. This works well for consuming the SVG.

However, the spec does not require headers to mirror this decompression. Although the Response has uncompressed body, the Content-Length headers shows it compressed length and the Content-Encoding header pretends it's compressed. In nodejs/undici#2514 (the issue I originally opened), I've included a repro of what this means if the Response is proxied to the user-agent:

The body ends up being longer than the headers described. This is not allowed in the HTTP spec, and leads clients like cURL to warn.
The user-agent will try decompress a body that isn't compressed

The current workaround is to manually alter headers on responses from fetch:

async function myEndpoint(request: Request): Promise<Response> {
   let resp = await fetch("https://www.gatsbyjs.com/Gatsby-Logo.svg");
   if (resp.headers.get("content-encoding") {
     const headers = new Headers(resp.headers)
     headers.delete("content-encoding")
     headers.delete("content-length")
     resp = new Response(body, { ...resp, headers })
   }
   return resp
}

This is cumbersome and should live in library code. It can't live in frameworks, because it's impossible to differentiate a response produced by fetch (content-encoding header, but uncompressed body) from a compressed Response created in user code (same content-encoding header, but compressed body).

Instead of this workaround, fetch should require the content-length and content-encoding headers to be deleted when response body is decompressed.

Some implementors already do this, including Deno and workerd. Netlify might implement the same for Netlify Functions 2.0.

I've checked undici (Node.js), node-fetch, Chrome and Safari - all of them expose the behaviour explained above, where the headers don't match the body.

An alternative solution for this would be whatwg#1524 - that way, frameworks can use the decompressed body field to tell compressed responses from uncompressed ones.

Summary:

What's the problem? compressed responses having content-encoding header, but decompressed body leads to incoherent Response
What's the usecase? mostly reverse proxying
Is it relevant to upstream? Yes, potentially because of service workers.
What's my suggestion for a fix? Upon decompression, content-length and content-encoding headers should be deleted.
How do engines deal with this? Most have the bug, but Deno and workerd implemented the fix I propose.

The text was updated successfully, but these errors were encountered:

mnot · 2023-12-20T01:55:46Z

Removing the headers automatically loses information, and changing behaviour from fetch silently might be a footgun for some.

Exposing the raw content is absolutely necessary for non-browser use cases. Maybe in addition to that, have a flag that when true makes the body coherent (by removing these headers)?

Skn0tt · 2023-12-20T14:55:07Z

If we're looking to stay as in-line with upstream as possible, then adding a new field for the raw content is probably the best as it keeps all other fields the same as upstream.

A flag would still require users to think about compression, because they need to remember enabling it. That would complicate things further.

jimmywarting · 2023-12-21T07:24:52Z

I don't think removing the headers is such a good idea.
I would still like to be able to forward those headers as well.

But it would also require to being able to access the raw stream

lucacasonato · 2023-12-21T18:23:14Z

@mnot and @jimmywarting: I think there are two related but separate use cases that are being mingled here:

Exposing a decompressed stream to the user
Exposing a still-compressed stream to the user

What @Skn0tt accurately points out, is that what we are doing in Fetch upstream right now, is 1) but very poorly. We decompress the stream, but then do not sanitize the rest of the response to reflect what we've done. Nor do we provide any indication to the user that we have performed decompression. This is actually very bad in the scenario where a new Content-Encoding is added that a browser may not yet understand, so a user implements manual decompression logic, and then the browser starts implementing that encoding algorithm natively. In this case, a browser adding a new native compression algorithm could break the web. Very very bad.

What I think you two are saying, is that in scenario 2) it would be detrimental to not expose Content-Length and Content-Encoding because that breaks the users' ability to decompress themselves. I agree with this strongly - however that is not the behaviour in question here, because we already do not expose a way to tell an implementation to not decompress. This is something we can discuss in a separate issue maybe.

I think that this issue (properly sanitizing the headers after automatic decompression) is a bug worth adressing upstream in Fetch. I've opened an issue here: whatwg#1729

mnot · 2023-12-22T05:30:11Z

I think that this issue (properly sanitizing the headers after automatic decompression) is a bug worth adressing upstream in Fetch. I've opened an issue here: whatwg#1729

Understand that they're likely to be reluctant to make any behavioural change here if it breaks existing code (which is very likely, somewhere).

Fetch is a browser-focused API that made a reasonable assumption that their users had no need for compressed content. That assumption doesn't hold for intermediaries, which often need to handle the 'raw' stream.

Adding support for access to the 'raw' stream is pretty straightforward; it's "correcting" the current behaviour of Fetch in a backwards-compatible way that's tricker. That's why I suggested a flag -- acknowledging this isn't a great solution, but it is backwards-compatible.

Skn0tt · 2023-12-22T11:09:26Z

Does it need to be a flag? What if, instead of a flag, rawBody field would be added in addition to body? That would be backwards-compatible, and then frameworks could check for the existence of that and use it when proxying a response. Same for users decompressing themselves.

This would mean two streams to access a shared underlying stream - not sure what performance implications that has for implementors. From what i've seen in the undici implementation, it would be fine as long as reading from the streams is mutually exclusive. From my (cursory) understanding of Deno's implementation, this also sounds doable.

mnot · 2023-12-22T23:26:32Z

Adding rawBody (or similar) would allow access to the compressed / pre-processing content of the response (there are other cases where it might differ), but it doesn't address that when you access the body, the headers you get are a mismatch. Effectively, the headers exposed now are rawHeaders and you want something like processedHeaders -- i.e., headers suitable for what's exposed in body, right?

Skn0tt · 2024-01-02T08:58:08Z

In a world of perfectly-coherent naming, yes headers would need to be named rawHeaders instead. I don't think that's necessary, though - once there's rawBody, frameworks like Express can use that to correctly handle compressed Responses. I'm happy as long as we can hide this complexity from most developers. It's OK to have frameworks deal with it.

lucacasonato mentioned this issue Dec 21, 2023

Automatic decompression should sanitize Content-Encoding and Content-Length headers from the response whatwg/fetch#1729

Open

jacob-ebey mentioned this issue Feb 14, 2024

fix: Remove content-encoding header from already decompressed responses remix-run/web-std-io#54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

For compressed responses, omit `Content-Length` and `Content-Encoding` headers #23

For compressed responses, omit `Content-Length` and `Content-Encoding` headers #23

Skn0tt commented Dec 11, 2023 •

edited

Loading

mnot commented Dec 20, 2023

Skn0tt commented Dec 20, 2023

jimmywarting commented Dec 21, 2023

lucacasonato commented Dec 21, 2023 •

edited

Loading

mnot commented Dec 22, 2023

Skn0tt commented Dec 22, 2023 •

edited

Loading

mnot commented Dec 22, 2023

Skn0tt commented Jan 2, 2024

For compressed responses, omit Content-Length and Content-Encoding headers #23

For compressed responses, omit Content-Length and Content-Encoding headers #23

Comments

Skn0tt commented Dec 11, 2023 • edited Loading

mnot commented Dec 20, 2023

Skn0tt commented Dec 20, 2023

jimmywarting commented Dec 21, 2023

lucacasonato commented Dec 21, 2023 • edited Loading

mnot commented Dec 22, 2023

Skn0tt commented Dec 22, 2023 • edited Loading

mnot commented Dec 22, 2023

Skn0tt commented Jan 2, 2024

For compressed responses, omit `Content-Length` and `Content-Encoding` headers #23

For compressed responses, omit `Content-Length` and `Content-Encoding` headers #23

Skn0tt commented Dec 11, 2023 •

edited

Loading

lucacasonato commented Dec 21, 2023 •

edited

Loading

Skn0tt commented Dec 22, 2023 •

edited

Loading