curl (2022-10-30)

I was invited by Daniel Stenberg to work with him on curl improvements sponsored by the Sovereign Tech Fund, an initiative of the German government to strengthen digital infrastructure and open source in the public interest. Daniel blogged about it.

Via this blog I try to give some updates on my ongoing work in this project, not least for transparency. This is deeply technical gobbledygook.

Goals

The two subprojects I am working on are HTTP/3 and HTTP/2 proxying.

HTTP/3

This has been an experimental feature in curl for some time, and the goal is to give it all the bells and whistles needed to make it a standard feature of the library. The main things missing here are:

  • a good, extensive test suite
  • multiplexing support, i.e. parallel transfers over the same QUIC connection
  • 0-RTT support
  • discovery: when/how to use HTTP/3 automatically and fallback to HTTP/1 or 2 on failures

HTTP/2 proxying

Almost everything (counter example: HTTP/3) can be used over an HTTP(S) proxy via CONNECT. But the facilities all rely on the HTTP/1.x way of doing it. HTTP/2 should be made to work here as well, if a proxy offers it.
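
To make the difference concrete, here is roughly how a tunnel is requested in both versions (simplified wire views, not curl code):

# HTTP/1.1: the CONNECT request takes over the whole connection
CONNECT example.com:443 HTTP/1.1
Host: example.com:443

# HTTP/2: CONNECT is a HEADERS frame on a single stream (RFC 7540,
# section 8.3); the tunneled bytes travel in DATA frames on that
# stream, so several tunnels can share one proxy connection
HEADERS (stream 1)   :method = CONNECT
                     :authority = example.com:443
DATA (stream 1)      <tunneled TLS bytes>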

Assessment

I dove into the internals of libcurl to find out how to achieve these goals, what might stand in their way, and what would need to be changed.

Curl has over 1500 test cases and CI jobs that execute those in various combinations on a variety of platforms. That makes me confident that we can change internals with a high chance that any breakage shows up in CI, fixable before a release. Not a 100% chance, of course. Every change has risks.

Curl has an important internal abstraction: protocol handlers. There is a different one for each supported protocol, of which there are over 20. Some are quite similar, e.g. HTTP and HTTPS, and share code. These handlers are run by a common state machine that takes them through the various stages of performing a transfer: CONNECT, PROTOCONNECT, TUNNEL, DO and DONE, to name the major ones.

Each handler has callbacks to be invoked in the various states to do the corresponding work. This way, the FTP implementation does not have to know anything about the IMAP or HTTP implementation and vice versa. This is excellent for readability and maintainability. It also makes it easy to switch protocols on/off when building curl.
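
Conceptually, a handler is a struct of callbacks for those states, roughly like this (a simplified sketch modeled after struct Curl_handler in libcurl's lib/urldata.h; the real struct has many more callbacks and different signatures):

/* simplified sketch of a protocol handler */
struct protocol_handler {
  const char *scheme;                  /* "http", "ftp", "imap", ... */
  result (*connect_it)(transfer *);    /* state CONNECT */
  result (*proto_connect)(transfer *); /* state PROTOCONNECT */
  result (*do_it)(transfer *);         /* state DO: perform the transfer */
  result (*done)(transfer *);          /* state DONE: cleanup */
};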

The most complex handler implementation is HTTP, since it has to deal with three major and very different versions of the protocol: HTTP/1.x, HTTP/2 and HTTP/3. No abstraction exists inside this handler; the code if/elses around the many cases, or calls http2 functions which then do nothing unless h2 is really in play on the connection.

This is an efficient approach when adding HTTP/2 to an existing HTTP/1 implementation, but now, with HTTP/3 added on top, it becomes more branchy than I would like. Some thought needs to go into an HTTP-internal abstraction that keeps the implementation of each version more separate and standalone, while maintaining the common, core HTTP parts.

Looking at the implementations of proxying and SSL, there are dependencies which make them more complicated than they need to be. They make perfect sense when you regard the 20+ years of history of curl and HTTP. But now more combinations of use cases have to be supported, like proxying over HTTP/2. We would like the proxy and SSL implementations to be more like Lego bricks that libcurl can stick together as needed.

Connection Filters

Connection filters are the new design I am currently working on. The idea is that one sticks them together in a "chain" for a connection and then invokes connect() on the chain until all filters are done (or an error occurs). The caller no longer has to care about SSL or proxying.

Greatly simplified, this is currently happening:

c = new_connection();
while(!done)
  if (conn_to_host and name_resolved)
     connect_socket(c, hostname);
  if (proxy_tunneling)
     if (tunneling_ssl)
        connect_proxy_ssl(c);
     connect_proxy_tunnel(c);
  if (conn_uses_ssl)
     connect_main_ssl(c);

# do transfer
# close
if (conn_uses_ssl)
   close_main_ssl(c);
if (proxy_tunneling)
   if (tunneling_ssl)
      close_proxy_ssl(c);
   close_proxy_tunnel(c);
close_socket(c);

which will change to:

c = new_connection();
filter_add(c, socket_filter);
if (proxy_tunneling)
   if (tunneling_ssl)
      filter_add(c, SSL_filter);
   filter_add(c, HTTP_tunnel_filter);
if (conn_uses_ssl)
   filter_add(c, SSL_filter);

# now we have:
# c->filter-> SSL -> HTTP_Tunnel -> SSL -> SOCKET

while(!done)
  connect(c);    # call connect on filters, low to high

# do transfer
# close
close(c);        # call close on filters, high to low

As you see, connection creation takes care of all the combinations, but establishing and closing a connection no longer has to. The same is true for other properties of a connection, like is_connected(), recv()/send(), has_data_pending(), etc. This makes the code in protocol handlers simpler.
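
To make that concrete, a filter could be a small struct of operations plus a pointer to the filter below it (a hypothetical sketch in the spirit of the pseudocode above; the names are mine, not a final API):

/* hypothetical sketch: one filter in a connection's chain */
struct conn_filter {
  struct conn_filter *next;   /* filter below, e.g. the socket */
  void *ctx;                  /* per-filter state, e.g. an SSL instance */
  result (*connect)(struct conn_filter *);
  void (*close)(struct conn_filter *);
  ssize_t (*send)(struct conn_filter *, const void *buf, size_t len);
  ssize_t (*recv)(struct conn_filter *, void *buf, size_t len);
  bool (*is_connected)(struct conn_filter *);
  bool (*has_data_pending)(struct conn_filter *);
};

An SSL filter's send() would encrypt the bytes and pass them on to next->send(); the connection only ever talks to the topmost filter.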

Another thing: if you look closely, you see that there currently are two possible SSL instances, the main one and the proxy one. They either have separate functions, or the SSL functions take an is_proxy parameter to let them know which instance they should use.

With connection filters, the SSL instance is local to the filter. There can be as many as there are filters installed on a connection. If ever the need arose to tunnel through two proxies, filters could handle such a case.

As for the project goals, we would then add connection filter types for HTTP/3 and HTTP/2 proxying and plug them into connections wherever needed. And the protocol handlers would not have to care.

Filter chains can adapt during the lifetime of a connection. The fallback from HTTP/3 to HTTP/2 could be implemented in a new filter type that, on connect, starts two sub-filters, one for TCP and one for QUIC, and keeps the first one that succeeds. A similar pattern can be used for HTTP version negotiation via ALPN.
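
In the same pseudocode style as above, such a racing filter might look like this (purely illustrative; none of these names exist in curl):

# hypothetical racing filter: try QUIC and TCP, keep the winner
racing_connect(f):
  if (not f.started)
     f.quic = new_quic_sub_filter();
     f.tcp = new_tcp_sub_filter();
     f.started = true;
  connect(f.quic);        # non-blocking; drive both attempts
  connect(f.tcp);
  if (is_connected(f.quic))
     f.next = f.quic;     # HTTP/3 won, discard the fallback
     close(f.tcp);
  else if (is_connected(f.tcp))
     f.next = f.tcp;      # fall back to TCP (HTTP/2 or 1.1)
     close(f.quic);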

Status

I have a first implementation of connection filters locally that passes some 1400+ tests and fails on 2 (staring at NTLM authentication!). I would like to get that to 100% success in the coming days, clean up the now dead code and then get Daniel's opinion on the changes. (Note to self: when refactoring, you tend to be very aggressive. Prepare for push back!)

But I am quite optimistic about the overall approach. I believe it will make the implementation of HTTP/2 proxying and other features much easier.

But we'll see.