Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit SERVER_NAME's impact on routing and external URL generation #5553

Closed
rsyring opened this issue Aug 15, 2024 · 12 comments · Fixed by #5634
Closed

Revisit SERVER_NAME's impact on routing and external URL generation #5553

rsyring opened this issue Aug 15, 2024 · 12 comments · Fixed by #5634
Assignees
Milestone

Comments

@rsyring
Copy link
Contributor

rsyring commented Aug 15, 2024

#998 was a discussion regarding the impact the SERVER_NAME setting has on an application. IMO, there was consensus in that issue, including by @mitsuhiko, that SERVER_NAME being used both in routing and in URL generation was problematic.

That issue was closed by #2635, which made a change regarding subdomain matching. However, IMO, it did not address the fundamental problem of SERVER_NAME impacting two disparate mechanisms in Flask.

I found two additional issues since #998 that have attempted to point out the problem with SERVER_NAME: #2813, #3465

In #2813, a minimal, complete, and verifiable example was requested and I have therefore prepared an example gist that demonstrates the problem.

The gist's third "currently impossible" test demonstrates that it's currently impossible to:

  • Set SERVER_NAME so external URL generation works outside a request context with only an app context; and
  • Have the app respond properly to multiple host names.

My particular use case currently is that I need SERVER_NAME set so external URLs will be created accurately in distributed tasks where no web request is present. Additionally, I need the app to respond to multiple host names 1) the canonical URL the public uses; and 2) the IP address based URL our load balancer uses for health checks.

IMO, SERVER_NAME should be deprecated and two new settings should be created. One which affects external URL generation and a second that affects routing resolution.

The setting for the URL generation could be CANONICAL_URL and, because its a URL, it could eliminate the need for url_for()'s _scheme arg.

Thanks for your consideration.

  • Python version: 3.12
  • Flask version: 3.0.3
@davidism
Copy link
Member

We should expose Werkzeug's trusted host checking, with a new ALLOWED_HOSTS = [] config to match Django's. This is similar to what we recently did for the dev server debugger.

I don't think we need to deprecate SERVER_NAME. If it's set, it's added to ALLOWED_HOSTS, otherwise behavior stays the same.

it could eliminate the need for url_for()'s _scheme arg.

This is already covered by PREFERRED_URL_SCHEME. If we were to deprecate SERVER_NAME, I'd rename it to PREFERRED_URL_HOST or something, but I think it's fine to leave it.

@rsyring
Copy link
Contributor Author

rsyring commented Aug 15, 2024

David, thanks for chiming in.

I think SERVER_NAME still remains problematic:

  1. SERVER_NAME still affects two disparate parts of the system. If all I want to do is configure the external URL used by url_for(), I can't do that without affecting routing or host checking. I should be able to set them separately. This becomes a practical problem in the case of...
  2. What if I don't know all of the hosts that I want to allow? Maybe I'm serving up marketing content on multiple domains and non-technical people are creating names for SSO reasons. Maybe, as is my case, the load balancer is going to use different IPs every time I deploy the app (aws copilot/cloud formation). Or, if I have multiple processes/containers running, there will be multiple IPs that need to be valid but I don't know what they are.

Django doesn't seem to handle the unkown IP case natively: SO discussion and https://pypi.org/project/django-allow-cidr/

I don't know much about how Flask handles subodmains but noting it here in case it should impact the discussion.

Thanks.

@davidism
Copy link
Member

davidism commented Nov 2, 2024

I'm hesitant to deprecate/rename SERVER_NAME. Adjusting what I suggested above, if SERVER_NAME is set and ALLOWED_HOSTS is empty, then behavior remains the same by setting ALLOWED_HOSTS = [SERVER_NAME] automatically. But if ALLOWED_HOSTS is set, SERVER_NAME is not automatically added. So you can get the new behavior by setting both config keys, but nothing changes if you keep your config the same. Does that sound reasonable?

@rsyring
Copy link
Contributor Author

rsyring commented Nov 4, 2024

That would at least facilitate my objective of setting the external URL without affecting ALLOWED_HOSTS as, presumably, I can use ALLOWED_HOSTS = ['*'].

But I'd point out that someone who just wants to set SERVER_NAME for the external URL is also getting a completely separate functionality applied, by default. In a way that is implicit rather than explicit. They then have to go explicitly "undo" that implicit application of allowed hosts. At the very least, the documentation should be really clear about how SERVER_URL impacts ALLOWED_HOSTS and how to undo that if desired.

PREFERRED_URL_SCHEME also seems problematic TBH. Unless I'm misunderstanding, it's a workaround for SERVER_NAME doing double-duty. If we had two separate settings, then one should just be EXTERNAL_HOST_URL (or whatever you want to name it) and then you don't need PREFERRED_URL_SCHEME?

Creating two separate configs, one for each purpose, still makes the most sense to me. There is already consensus in #998 that the one setting is problematic. Why not fix the root cause?

Regardless, I'd be happy for any way to fix this in the Flask config as the workaround I use is ugly.

Thanks. :)

@davidism
Copy link
Member

davidism commented Nov 4, 2024

Regarding a combined single config key:

We have three configs that affect routing and URL generation.

  • PREFERRED_URL_SCHEME defaults to http.
  • SERVER_NAME is unset.
  • APPLICATION_ROOT defaults to /.

All three are needed to generate URLs outside a request. APPLICATION_ROOT is needed during requests as well, to know what part of the path is the prefix from the server and what part is handled by the app.

We could have a DEFAULT_BASE_URL like https://example.com:56374/root. The problem is APPLICATION_ROOT, for WSGI there's no way to know when app setup is finished and when requests handling will start, so there's no way to say "now we know all config has been loaded, set APPLICATION_ROOT from DEFAULT_BASE_URL if needed, then pass the app on to start serving". We would want to split it during config though, otherwise we'd need to call urlsplit(DEFAULT_BASE_URL).path for every request, which would be expensive. The other parts only need to be split when generating URLs outside a request, so that expense isn't important then.

@rsyring
Copy link
Contributor Author

rsyring commented Nov 5, 2024

To recap, with the ALLOWED_HOSTS proposal, here are the configuration items that are present in this discussion:

  • PREFERRED_URL_SCHEME
    • Only used for external URL generation
  • SERVER_NAME
    • For external URL generation
    • For sub-domain route matching
    • Sets a default for ALLOWED_HOSTS
  • APPLICATION_ROOT
  • ALLOWED_HOSTS
    • Presumably defaults to ['*']
    • Might be set implicitly by SERVER_NAME

As a thought experiment, if starting from scratch, it seems more intuitive to me if this was broken out as:

  • Note: proposed config names are only representative
  • HOST_URL: it's purpose is to impact request routing, although reasonable defaults can be inferred for non-routing functionality
    • Defaults to / which would keep the current Flask behavior for a non-configured application
    • The path portion (/ or /foo) is used for what is now APPLICATION_ROOT
    • If more than a path, then it must be a full URL, https://www.foo.com/bar and impacts:
      • Sub-domain route matching presuming that, in the example above, www.foo.name is the base if a subdomain is given by a route or blueprint. In most typical cases, the app would likely use https://foo.name/ if using subdomains.
      • Sets a default for HOST_URL_EXTERNAL
      • Maybe sets a default for ALLOWED_HOSTS? IMO, this is determined by whether or not we want to encourage it being used for security reasons.
  • HOST_URL_EXTERNAL (or maybe APPLICATION_URL): it's only purpose is external URL generation
    • Defaults to None, must be a full URL, https://www.baz.com/bar
    • Can be set when HOST_URL isn't to configure only how external URLs are generated and not impact request routing
    • Can also be set if, for some reason, the default from HOST_URL is not accurate. As in the case that started this issue for me where the internal host name is different from the external URL.
  • HOSTS_ALLOWED: same as ALLOWED_HOSTS but, by naming, keeps these three configs named similar since they are related(ish)
    • Django requires this value to be explicitly set, presumably for security reasons.
    • Would Flask do similar or keep the default of no limitations? IMO, that impacts whether or not HOST_URL impacts the default of this config item.

We obviously are not starting from scratch, but that is still, IMO, the way the functionality most obviously groups together when thinking through the configuration.

From an implementation perspective, I don't know if it's better to keep all the old settings as individual config items or deprecate them. I'd likely lean in favor of the latter since I think the current ones grew over time into a bit of an ugly knot. But, if not wanting to deprecate as you've seemed to indicate, then you could arguably split HOST_URL into the various components and set the existing config values from it. Although, in that case, I'd adjust external URL generation to solely pull from HOST_URL_EXTERNAL instead of being pieced together from the configuration items that should only affect request routing, IMO.

Could also not set any defaults from HOST_URL and make everything more explicit. I think that just depends on how Flask leans in terms of "user friendly" vs "explicit is better than implicit." Given that it's very easy to do:

HOST_URL = 'https://foo.bar/baz'
app = Flask()
app.config['HOST_URL'] = HOST_URL
app.config['HOST_URL_EXTERNAL'] = HOST_URL
app.config['HOSTS_ALLOWED'] = ['foo.bar']

And that doesn't require any parsing of the URL to get the bare host for HOSTS_ALLOWED, I'd lean towards everything being explicit.

We could have a DEFAULT_BASE_URL like https://example.com:56374/root. The problem is APPLICATION_ROOT, for WSGI there's no way to know when app setup is finished and when requests handling will start...

Two options I see:

  • Customize the config to parse HOST_URL any time it's set and distribute the values to other configuration items immediately. It would be a special handling of that one value which may or may not be that big of a deal.
  • Make it a keyword arg of Flask instead of a configuration value, just like some of the other "core" params that get set that way. Parse it out during init and apply to the configuration as defaults.

@davidism davidism added this to the 3.1.0 milestone Nov 6, 2024
@davidism
Copy link
Member

davidism commented Nov 6, 2024

Thanks for the analysis and writeup, I was trying to do something similar and ran out of steam. I'll think about the exact path forward, but I'm marking it for 3.1.

@davidism davidism self-assigned this Nov 7, 2024
@davidism
Copy link
Member

davidism commented Nov 10, 2024

I'm struggling to figure out why Werkzeug's Map.bind_to_environ handles server_name the way it does.

Note that because of limitations in the protocol there is no way to get the current subdomain and real server_name from the environment. If you don’t provide it, Werkzeug will use SERVER_NAME and SERVER_PORT (or HTTP_HOST if provided) as used server_name with disabled subdomain feature.

If subdomain is None but an environment and a server name is provided it will calculate the current subdomain automatically. Example: server_name is 'example.com' and the SERVER_NAME in the wsgi environ is 'staging.dev.example.com' the calculated subdomain will be 'staging.dev'.

The end of the first paragraph has a confusing grammatical typo ("will use ... as used"), so I can't be sure of the intention. Not sure what "as" was supposed to say.

https://github.com/pallets/werkzeug/blob/7868bef5d978093a8baa0784464ebe5d775ae92a/src/werkzeug/routing/map.py#L321-L339

Looking at the way Flask calls this, subdomain is always None, and host_matching is not used by default. So if SERVER_NAME is set, it will always be required to match, otherwise a warning is shown and a 404 is returned. The quick answer here is to never pass server_name, but why is that even a behavior? Why is that not handled by allowed_hosts later on?

It seems to be saying "if the subdomain wasn't given, try to get it from the host header as a prefix of the name the WSGI server is bound to", which does make sense. But also "if I can't get a prefix, then don't even try to route", but routing would still work fine if it assumed the empty prefix, as if the host exactly matched the server's name.

@davidism
Copy link
Member

davidism commented Nov 10, 2024

There's also the subdomain_matching parameter to Flask, which defaults to False. And the app.url_map.default_subdomain value, which is unset by default. For some reason, Werkzeug's Map doesn't account for default_subdomain in bind_to_environ, only bind. And Flask sets subdomain = self.url_map.default_subdomain or None, so only if it's truthy. But "" as the default subdomain totally works. If I remove the or None and set url_map.default_subdomain = '" and SERVER_NAME = "abc.localhost:5000, URL generation works and any domain or IP works. It's still a problem that there's this many parts, many undocumented, but it can work today.

from flask import Flask, request, url_for

app = Flask(__name__)
app.config["SERVER_NAME"] = "example.com:5000"
app.url_map.default_subdomain = ""  # and patch out the `or None`

@app.get("/")
def index() -> str:
    return url_for("index", _external=True)

@davidism
Copy link
Member

SERVER_NAME does have a routing purpose in this case, without interfering: it allows determining the subdomain by comparing Host to SERVER_NAME. For example, if Host = "a.b.example.com" and SERVER_NAME = "example.com", remove SERVER_NAME as a suffix and you get subdomain = "a.b". If you wanted to route beyond that, you'd either need to enable full host_matching instead of subdomain matching, and/or redirect alias domains to the configured default name.

@davidism
Copy link
Member

davidism commented Nov 10, 2024

Yeah, I'm starting to see how this makes sense now. I think the entire problem is the or None here:

flask/src/flask/app.py

Lines 442 to 451 in 07c7d57

if not self.subdomain_matching:
subdomain = self.url_map.default_subdomain or None
else:
subdomain = None
return self.url_map.bind_to_environ(
request.environ,
server_name=self.config["SERVER_NAME"],
subdomain=subdomain,
)

If that's fixed, then:

  • With subdomain_matching and host_matching disabled (the default), all hosts are accepted and don't affect routing.
  • SERVER_NAME doesn't affect routing unless it's a suffix of the host for determining the subdomain.
  • If subdomain_matching is enabled, it makes sense that hosts that didn't have SERVER_NAME as a suffix wouldn't route, there's no subdomain to determine.
    • Or Werkzeug could also be changed to use default_subdomain in that case, although the correctness of that is debatable.
  • If host_matching is enabled, the subdomain doesn't matter and none of this applies anyway.

So I think Flask.__init__ needs:

if not subdomain_matching:
    url_map.default_subdomain = ""

And Flask.create_url_adapter removes the or None. And we also expose a new ALLOWED_HOSTS config so that we can restrict the hosts still.

Longer term, Werkzeug 3.2 can add Map.subdomain_matching and adjust the behavior of Map.bind_to_environ a bit. But 3.1 just came out, so that will have to wait a bit.

@davidism
Copy link
Member

The change was slightly different than what I guessed above, but after some more testing, I'm convinced that's the right fix now. It seems to be what was intended in #2635. This whole thing, host_matching, subdomain_matching, and SERVER_NAME is really not easy to keep track of, and I clearly was misinterpreting it over the years.

It's not helped by the number of configurations in Werkzeug's routing:

  • host_matching and subdomain_matching can both be enabled (subdomain_matching is implemented in Flask, not Werkzeug), but host_matching takes priority in MapAdapter.
  • Passing subdomain to Map.bind when host_matching is enabled is an error. So it doesn't matter that host_matching takes priority in MapAdapter.
  • Rule (from @route or add_url_rule) can take both a subdomain and host pattern. Neither is used unless the corresponding x_matching is enabled, so they just invisibly don't work if you don't remember to set it in both places, or if you set the wrong one. And again, host_matching takes priority in case both are set, again invisibly.

I'm not clear what of this can be fixed or warned about, or even if it should be. Something to think about when moving subdomain_matching out of Flask and into Werkzeug 3.2 eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants