Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix include paths #263

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

BrunoMSantos
Copy link
Collaborator

So here's a possible (interim) solution which also solves a common problem for me at least: always having to specify -nostdinc -I... manually when playing around with cross compiled projects and so on.

This does not address the fact that the clang in the path might not be the one associated with the libclang we're using at any given moment, but does that matter? We're telling hawkmoth to parse these files with the headers used by compiler A, where A may not even be clang at all... Generally, libclang is meant to provide us with a parser, not code.

As for our tests... We don't really care I think: both the parser and our code snippets will work with any compliant headers, we just make a fair assumption that there's a clang binary in the path, which is quite fair except for mac perhaps (I didn't look into that, but should be a simple fix, no?).

What do you think @jnikula ?

@BrunoMSantos BrunoMSantos requested a review from jnikula November 29, 2024 01:13
@jnikula
Copy link
Owner

jnikula commented Dec 14, 2024

@BrunoMSantos Sorry for being quiet for such a long time. I've been researching and thinking about this on and off though, so rest assured you haven't been ignored even though I haven't replied.

I think the overall conclusion is that we'll have to infer the args from the compiler, and we can't trust libclang to do it. It's a bummer, but oh well.

As to the implementation, I have a few comments:

  • The hawkmoth_compiler config should be orthogonal to actually using it for figuring out the args. We may want to use the compiler for other things, specifying it should not lead to something magic happening on the args. This should probably default to clang instead of None. We should probably document that hawkmoth_compiler, if specified, should point at a compiler compatible with the libclang being used.
  • I think we should have a configuration hawkmoth_dtrt (with a better name obviously) that automatically figures out the header search paths using hawkmoth_compiler, perhaps using -isystem instead of -I, adds -nostdinc, updates args passed to libclang. Flip a switch, do everything right out of the box. Kind of what you do depending on hawkmoth_compiler, but separated. It might be desirable to be able to specify all, c, c++, and none, or similar, to limit this to a single language.
  • The above should be figured out once at setup, not for every invocation. Otherwise you'll do one compiler invocation per directive, even for when we have cached parse results available. It would be nice to debug log this too.
  • I think get_include_args should have minimal (or perhaps no) modifications. Don't change the keyword arg, to remain compatible with existing users. Don't add -nostdinc. We may want to improve the interface in a backward compatible way, so that it remains useful for current users, but mostly add the new stuff without impacting them.

@BrunoMSantos
Copy link
Collaborator Author

@BrunoMSantos Sorry for being quiet for such a long time. I've been researching and thinking about this on and off though, so rest assured you haven't been ignored even though I haven't replied.

No worries, we've all been there! Sorry in advance for the long reply :)

I think the overall conclusion is that we'll have to infer the args from the compiler, and we can't trust libclang to do it. It's a bummer, but oh well.

As to the implementation, I have a few comments:

* The `hawkmoth_compiler` config should be orthogonal to actually using it for figuring out the args. We may want to use the compiler for other things, specifying it should not lead to something magic happening on the args.

So actually I see it as the expected behaviour: you state what compiler you use to compile the code, you get the exact same headers you do when compiling. If the default headers used by this compiler are not ok, then you have to override them just as you would for compiling anything. This is by far the least surprising option I think.

Consider the alternative: you select the same compiler you use to compile the code, you give the exact same flags and still it won't parse the code ok despite the same options compiling fine, after which you go file a bug report / request for help.

This should probably default to clang instead of None. We should probably document that hawkmoth_compiler, if specified, should point at a compiler compatible with the libclang being used.

What do you mean by compatible with libclang? It only needs to support extraction of the standard includes in the gcc/clang manner. If not, it's good to support the None option and let the user deal with it. Maybe this could be better documented with a few use cases though.

As for the default, it's a toss for me... Presumably clang is available, so it's a good default, in fact this is what we thought was libclang's default already. And as long as we default to clang, maybe we could use just -nostdinc for the compiler = None option just to be extra proper? It would be too pedantic as the default behaviour, but fine if the user specifically states None for the compiler I think. Thoughts?

* I think we should have a configuration `hawkmoth_dtrt` (with a better name obviously) that automatically figures out the header search paths using `hawkmoth_compiler`, perhaps using `-isystem` instead of `-I`, adds `-nostdinc`, updates args passed to libclang. Flip a switch, do everything right out of the box. Kind of what you do depending on `hawkmoth_compiler`, but separated. It might be desirable to be able to specify `all`, `c`, `c++`, and `none`, or similar, to limit this to a single language.

I confess I didn't understand what 'dtrt' should stand for 😅

As for the -isystem, (lib)clang apparently has no such option. Just the -I and -F (some Mac OS specific thing that we may or may not have to fiddle with in Mac, don't know).

Finally, regarding the all/c/c++, currently it does assume a single compiler has support for both languages with -x c/c++, though I think that is a fair enough assumption. Things I deliberately decided to ignore: using a different compiler for each in mixed language projects and different targets with different compilers even for the same language. The solution there is to just be a lot more explicit on the per file flags for now, or... See final note. I don't think variables like that scale in any way and I'd rather stay clear.

* The above should be figured out _once_ at setup, not for every invocation. Otherwise you'll do one compiler invocation per directive, even for when we have cached parse results available. It would be nice to debug log this too.

Hmm, good point, thanks.

* I think `get_include_args` should have minimal (or perhaps no) modifications. Don't change the keyword arg, to remain compatible with existing users. Don't add `-nostdinc`. We may want to improve the interface in a backward compatible way, so that it remains useful for current users, but mostly add the new stuff without impacting them.

I get the backwards compatibility point, but we add back anything we remove with -nostdinc, no? Only way this would break something would be if a user explicitly specified the wrong compiler to derive the includes, but do we really need to protect against that?

I'm more sympathetic to the argument name change... Maybe this would avoid widening the API and still nudge it towards language agnosticism?

def get_include_args(cpath='clang', lang='c', cc_path=None):
    if cc_path is not None:
        # Emit deprecation warning, maybe inform about `lang`
        cpath = cc_path
    ...

-->8--

Untimely right now, but I'm kind of brewing this idea that maybe we could (optionally) support the extraction of more specific compilation flags from a compile_commands.json as so much tooling relies on it already these days and which is so easily generated even when no explicit support exists (e.g...). Such a thing could get the compiler for each file and flags, getting everything right no matter how complicated the setup is...

Not considering working on that for now though, however, a possible middle ground if not a stepping stone towards that might be a configuration dictionary where a user can specify options for any/all/none files (or directories?) individually.

@jnikula
Copy link
Owner

jnikula commented Dec 15, 2024

I think the overall conclusion is that we'll have to infer the args from the compiler, and we can't trust libclang to do it. It's a bummer, but oh well.
As to the implementation, I have a few comments:

* The `hawkmoth_compiler` config should be orthogonal to actually using it for figuring out the args. We may want to use the compiler for other things, specifying it should not lead to something magic happening on the args.

So actually I see it as the expected behaviour: you state what compiler you use to compile the code, you get the exact same headers you do when compiling. If the default headers used by this compiler are not ok, then you have to override them just as you would for compiling anything. This is by far the least surprising option I think.

My point was: Specifying the compiler should not lead to any magic happening. That's just the compiler that would be used for figuring out the header search path if that's desired.

Consider the alternative: you select the same compiler you use to compile the code, you give the exact same flags and still it won't parse the code ok despite the same options compiling fine, after which you go file a bug report / request for help.

But we aren't using the compiler (front end), we use libclang. We need to figure out the header search path using the front end because libclang is unable to. And we need to specify which front end to use.

This should probably default to clang instead of None. We should probably document that hawkmoth_compiler, if specified, should point at a compiler compatible with the libclang being used.

What do you mean by compatible with libclang? It only needs to support extraction of the standard includes in the gcc/clang manner. If not, it's good to support the None option and let the user deal with it. Maybe this could be better documented with a few use cases though.

The point is, there may be several clang toolchains available. For example, on github macos-latest, there are three. Two from homebrew that use GCC libstd++ and one from Xcode that uses libc++. Whichever libclang you end up using, you need to specify a compiler that gives you the appropriate headers. The C++ headers between libstd++ and libc++ are different. The language versions supported by the compilers/libclang may be different.

As for the default, it's a toss for me... Presumably clang is available, so it's a good default, in fact this is what we thought was libclang's default already. And as long as we default to clang, maybe we could use just -nostdinc for the compiler = None option just to be extra proper? It would be too pedantic as the default behaviour, but fine if the user specifically states None for the compiler I think. Thoughts?

* I think we should have a configuration `hawkmoth_dtrt` (with a better name obviously) that automatically figures out the header search paths using `hawkmoth_compiler`, perhaps using `-isystem` instead of `-I`, adds `-nostdinc`, updates args passed to libclang. Flip a switch, do everything right out of the box. Kind of what you do depending on `hawkmoth_compiler`, but separated. It might be desirable to be able to specify `all`, `c`, `c++`, and `none`, or similar, to limit this to a single language.

I confess I didn't understand what 'dtrt' should stand for 😅

Do The Right Thing. :)

As for the -isystem, (lib)clang apparently has no such option. Just the -I and -F (some Mac OS specific thing that we may or may not have to fiddle with in Mac, don't know).

This seems to claim it does:

https://clang.llvm.org/docs/ClangCommandLineReference.html#include-path-management

Finally, regarding the all/c/c++, currently it does assume a single compiler has support for both languages with -x c/c++, though I think that is a fair enough assumption. Things I deliberately decided to ignore: using a different compiler for each in mixed language projects and different targets with different compilers even for the same language. The solution there is to just be a lot more explicit on the per file flags for now, or... See final note. I don't think variables like that scale in any way and I'd rather stay clear.

The idea was mainly to try to do the right thing, but allow the user to not have that per language if one language works fine and the other not, and let user handle it manually.

* I think `get_include_args` should have minimal (or perhaps no) modifications. Don't change the keyword arg, to remain compatible with existing users. Don't add `-nostdinc`. We may want to improve the interface in a backward compatible way, so that it remains useful for current users, but mostly add the new stuff without impacting them.

I get the backwards compatibility point, but we add back anything we remove with -nostdinc, no? Only way this would break something would be if a user explicitly specified the wrong compiler to derive the includes, but do we really need to protect against that?

Again, the idea is not to use get_include_args as-is for args when you've specified "try to do the right thing". That would be separate. get_include_args is a building block that the user may use directly if the "try to do the right thing" fails.

@BrunoMSantos
Copy link
Collaborator Author

[...]
My point was: Specifying the compiler should not lead to any magic happening. That's just the compiler that would be used for figuring out the header search path if that's desired.
[...]
But we aren't using the compiler (front end), we use libclang. We need to figure out the header search path using the front end because libclang is unable to. And we need to specify which front end to use.

Hmm, I'm a bit confused by your mentioning of "front ends" (such an LLVM term) and libclang compatibility. Our only compatibility requirement is that the code is some C/C++ dialect that libclang understands. I.e. hawkmoth_compiler = /some/inbred/proprietary/cross/compiler would be a valid setting as long as the compiler supports the required switches to extract relevant information about how it compiles the code, most notably the header search paths.

But back on the magic claim, that's a fair point of disagreement I guess, though I'm a bit surprised we do disagree on that. It sounds so much more surprising that we should get missing headers despite specifying a compiler and the same compilation flags that work when passed to said compiler. Surely we're not doing enough with the information the user gave us, no?

[...]
The point is, there may be several clang toolchains available. For example, on github macos-latest, there are three. Two from homebrew that use GCC libstd++ and one from Xcode that uses libc++. Whichever libclang you end up using, you need to specify a compiler that gives you the appropriate headers. The C++ headers between libstd++ and libc++ are different. The language versions supported by the compilers/libclang may be different.

Sure, but why is it different to have 10 different clangs or 10 different gccs? As for selecting a different libclang, that's out of our purview. But using libclang from toolchain A with headers from clang in toolchain B is totally fine and as valid as using the headers from some (e.g.) gcc based cross compiler.

Now if you specify an exotic compiler that doesn't support exfiltration of include headers in this manner, then you're out of luck and you need to do it unaided.

[...]
This seems to claim it does:
https://clang.llvm.org/docs/ClangCommandLineReference.html#include-path-management

That's for a newer version it seems, maybe they've added it recently? The version I tested with wouldn't work (I did try it originally assuming it would work like gcc). Don't remember if it tripped on the option or if it ignored it, I'll test it again later. But it's not in the man page for sure.

The idea was mainly to try to do the right thing, but allow the user to not have that per language if one language works fine and the other not, and let user handle it manually.

Hmmm, I'm not thoroughly convinced, but yeah, we could allow that. I'd then do hawkmoth_compiler_c/cpp, but that's just going back to the main discussion up top :)

[...]
Again, the idea is not to use get_include_args as-is for args when you've specified "try to do the right thing". That would be separate. get_include_args is a building block that the user may use directly if the "try to do the right thing" fails.

You mean it should fail gracefully unlike the current get_include_args? Not sure I got it.

Maybe we could indeed warn that fetching the headers failed and invite the user to disable the hawkmoth_compiler option and specify all the necessary flags by hand.

@jnikula
Copy link
Owner

jnikula commented Dec 16, 2024

I'm starting to feel like I could express myself more precisely writing this in Python rather than English. 😁

I don't think we have any huge disagreements here though. I think we agree we need to be able to configure a compiler to use for automatically figuring out the args to pass to libclang. The rest is details.


Let's try to focus on one thing at a time maybe? I would like to separate specifying hawkmoth_compiler from actually using that configuration to automatically figure out args.

When the automatic part works, this is not a big deal. But what if it fails, and the user needs to manually specify the args? Wouldnt't it be nice if we had an interface to get the include search path using hawkmoth_compiler but without directly modifying the args?

For the same reason I wouldn't want compiler.get_include_args() to add -nostdinc. If the user needs manual control, give the user full control. They can add -nostdinc themselves if they need it.

And we can have an internal function for doing all the magic for the automatic configure, including adding -nostdinc.


I'd suggest:

hawkmoth_compiler string, default to clang. Doesn't make anything happen on its own.

hawkmoth_autoconfigure boolean, default to false. If set to true, use hawkmoth_compiler to automatically figure out args to pass to libclang, including -I and -nostdinc.

I'm simplifying the latter config option from my earlier suggestion.


I don't think the existing compiler.get_include_args() function is all that important here. We can use it as a building block, but we can also add separate things.

Maybe we should have an API that takes Sphinx app or env so that it can look up the config to decide on the compiler to use?

@BrunoMSantos
Copy link
Collaborator Author

For the same reason I wouldn't want compiler.get_include_args() to add -nostdinc. If the user needs manual control, give the user full control. They can add -nostdinc themselves if they need it.

I think I addressed that specifically without a counter. Notably, the whole purpose of the function is to figure out standard includes, prefixing it with -nostdinc is either benign (avoid duplicated / overridden paths) or the user is somehow expecting to mix headers from 2 different standard libraries (e.g. 1 set of paths with a partial stdlib implementation overriding only a subset of libclang's default headers), which would be unbelievably surprising if it were intentional or needed since the code would not rely on the additional headers to compile anyway.

To the extent that anyone already relies on it... Maybe we could be more conservative in how we handle the transition? Though I fail to see what case should warrant such caution.

(note: originally I thought you were referring to the API change where a user might be calling the function with named parameters as get_include_args(cc_path='path'), which is a clear API breakage, hence the confusion)

I'd suggest:

hawkmoth_compiler string, default to clang. Doesn't make anything happen on its own.

hawkmoth_autoconfigure boolean, default to false. If set to true, use hawkmoth_compiler to automatically figure out args to pass to libclang, including -I and -nostdinc.

I'm with you so far, though:

  • I'd call it hawkmoth_auto_stdinc as there might be other things we try to derive from the compiler later on I guess. It's a net improvement to be able to disable specific features though.

  • As for the default... That's probably the biggest / only contention. Defaulting to clang's or even libclang's headers is a much stronger assumption than using the explicitly specified compiler to configure anything I think. But I think I'm loosing that battle 😅

Maybe we should have an API that takes Sphinx app or env so that it can look up the config to decide on the compiler to use?

Yeah, it might still surprise a user that specifying a particular libclang should not yield the corresponding clang as default or fail to find it altogether. Not sure how to solve that though... PATH is no guarantee for that, but neither is anything else (the dependency vector goes clang->libclang, not the other way around). These are the (totally no-go!) options I can think of:

  • Probing package managers does not scale obviously nor catches manually installed toolchains.

  • Going over all clang's and finding out which one links against which libclang might yield multiple clangs to a single libclang. Also, how would one go about robustly finding out all the clangs?

...and that is a big part of why I left the default as None.

@BrunoMSantos
Copy link
Collaborator Author

I'll probably work on a revised patch over the holidays, whatever the outcome.

@jnikula
Copy link
Owner

jnikula commented Dec 16, 2024

For the same reason I wouldn't want compiler.get_include_args() to add -nostdinc. If the user needs manual control, give the user full control. They can add -nostdinc themselves if they need it.

I think I addressed that specifically without a counter. Notably, the whole purpose of the function is to figure out standard includes, prefixing it with -nostdinc is either benign (avoid duplicated / overridden paths) or the user is somehow expecting to mix headers from 2 different standard libraries (e.g. 1 set of paths with a partial stdlib implementation overriding only a subset of libclang's default headers), which would be unbelievably surprising if it were intentional or needed since the code would not rely on the additional headers to compile anyway.

I guess that's fair. I'm just looking for maximum flexibility when that is required. I'm not hung up on this point though. Maybe it should be a separate first step to add -nostdinc?

To the extent that anyone already relies on it... Maybe we could be more conservative in how we handle the transition? Though I fail to see what case should warrant such caution.

(note: originally I thought you were referring to the API change where a user might be calling the function with named parameters as get_include_args(cc_path='path'), which is a clear API breakage, hence the confusion)

Well, I also meant that. 😁

I'd suggest:
hawkmoth_compiler string, default to clang. Doesn't make anything happen on its own.
hawkmoth_autoconfigure boolean, default to false. If set to true, use hawkmoth_compiler to automatically figure out args to pass to libclang, including -I and -nostdinc.

I'm with you so far, though:

* I'd call it `hawkmoth_auto_stdinc` as there might be other things we try to derive from the compiler later on I guess. It's a net improvement to be able to disable specific features though.

I think you're looking to add one config option per thing to autoconfigure. I'm thinking true/false could be expanded to a list of things. Same goal, different approaches.

* As for the default... That's probably the biggest / only contention. Defaulting to clang's or even libclang's headers is a much stronger assumption than using the explicitly specified compiler to configure anything I think. But I think I'm loosing that battle 😅

It's just that if hawkmoth_autoconfigure is disabled by default, the hawkmoth_compiler default isn't quite as important. But you could set the former without the latter, and in many cases it would just work with less configuration.

If hawkmoth_compiler is None by default, you have to set both to get anything going. Autoconfigure can't do anything without the compiler.

Maybe we should have an API that takes Sphinx app or env so that it can look up the config to decide on the compiler to use?

Yeah, it might still surprise a user that specifying a particular libclang should not yield the corresponding clang as default or fail to find it altogether. Not sure how to solve that though... PATH is no guarantee for that, but neither is anything else (the dependency vector goes clang->libclang, not the other way around). These are the (totally no-go!) options I can think of:

* Probing package managers does not scale obviously nor catches manually installed toolchains.

* Going over all clang's and finding out which one links against which libclang might yield multiple clangs to a single libclang. Also, how would one go about robustly finding out all the clangs?

I don't think we can even reliably figure out which libclang we end up using.

...and that is a big part of why I left the default as None.

I'm just thinking of sane defaults. The default for libclang is just libclang.so in the library path. It's not too far off to use clang in the regular path. 🤷

else:
clang_args.extend(getattr(conf, 'hawkmoth_clang_cpp', []))
clang_args.extend(compiler.get_include_args('clang', 'c++'))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two should get the compiler config from conf.py. We already import it for other configs and use e.g. clang_args = getattr(conf, 'hawkmoth_clang', []).

For #254 I imagine we'll need to somehow pass the compiler path from the workflow yaml. It's a bit easier if we can handle that in conf.py alone, not everywhere.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order in which the final args are added should be the same in both the extension and here.

@BrunoMSantos
Copy link
Collaborator Author

[...]

I guess that's fair. I'm just looking for maximum flexibility when that is required. I'm not hung up on this point though. Maybe it should be a separate first step to add -nostdinc?

I'll split it up 👍

  • I'd call it hawkmoth_auto_stdinc as there might be other things we try to derive from the compiler later on I guess. It's a net improvement to be able to disable specific features though.

I think you're looking to add one config option per thing to autoconfigure. I'm thinking true/false could be expanded to a list of things. Same goal, different approaches.

Ah, I see. That makes sense, keep the number of configuration variables down.

[...]

It's just that if hawkmoth_autoconfigure is disabled by default, the hawkmoth_compiler default isn't quite as important. But you could set the former without the latter, and in many cases it would just work with less configuration.

If hawkmoth_compiler is None by default, you have to set both to get anything going. Autoconfigure can't do anything without the compiler.

Right, possibly splitting hairs, but maybe a matrix of default possibilities can clear it up? Options numbered in my order of preference, with compiler / autoconf pairs as:

  1. None, True
  2. clang, True
  3. None, False
  4. clang, False

Options 1, 3 and 4 have no change in behaviour, but are affected by differences across distros that patch libclang's search paths by default (should we care?).

One could argue that autoconf = True in option 1 is latent, but it would be defaulting to libclang's paths which is still a form of autoconf 🤷‍♂️ I might also add that option 3 might warrant -nostdinc by default, but I digress.

Between the 1st and 4th one, I'd wager it's more likely that clang is not the desired compiler than it is likely a user would not want auto path configuration having chosen a compiler, so the 4th option would more often require setting both settings explicitly to get it working as intended.

More importantly though, the 1st one is more explicit in what hurdle a user should be clearing: it's not hawkmoth that magically autoconfigures when prompted, it's the user that selects an appropriate compiler so that it can do so. The 4th option places the focus on the magical 'autoconf' against a default/implicit compiler, which is why I rank it lower than my 3rd preference. And this is by far my most important claim I guess.

Finally, n.2 changes the behaviour, but towards what we expected from libclang's defaults in the 1st place, so... Fine? There's a possible surprise with default mismatches between clang / libclang, but it should solve any difference across distros as long as a clang is in PATH. It would also have avoided most if not all tickets on this issue so far, which might suggest it is the better default overall?

I really can't make a better case than this though, so I'll defer to you if I have not yet convinced you.

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

Right, possibly splitting hairs, but maybe a matrix of default possibilities can clear it up? Options numbered in my order of preference, with compiler / autoconf pairs as:

  1. None, True

  2. clang, True

  3. None, False

  4. clang, False

Options 1, 3 and 4 have no change in behaviour, but are affected by differences across distros that patch libclang's search paths by default (should we care?).

One could argue that autoconf = True in option 1 is latent, but it would be defaulting to libclang's paths which is still a form of autoconf 🤷‍♂️ I might also add that option 3 might warrant -nostdinc by default, but I digress.

I must be missing something. To me option 1 is an impossible default, because you can't autoconfigure without the compiler, which defaults to None here. I would flag that combo as an error in configuration!

You refer to defaulting to liblang's paths (i.e. status quo), but then what does option 3 mean? Do you mean it would pass -nostdinc to actively disable the search paths?

[...]

I really can't make a better case than this though, so I'll defer to you if I have not yet convinced you.

I would like to understand your idea better before I disagree with it. 😁

@BrunoMSantos
Copy link
Collaborator Author

BrunoMSantos commented Dec 17, 2024

I must be missing something. To me option 1 is an impossible default, because you can't autoconfigure without the compiler, which defaults to None here. I would flag that combo as an error in configuration!

To me it makes sense in 1 of 2 ways, though the 1st one is better:

  • It's a latent predilection to autoconfigure as soon as a compiler is specified. Does nothing by default. I'm sure we could find a way to distinguish between cases where a user explicitly sets it to True without setting the compiler to raise a warning, though I'd just document the option is ignored unless a compiler is selected.

  • It's autoconfiguring without a compiler, which means autoconfiguring to libclang's defaults as the next best available guess. In that case, if a user explicitly did None, False, yeah, I'd prefer to stick a -nostdinc there as in 'give the user a parser and nothing else'.

Option 1 aside, do you give any weight to the argument that most / all users' 1st concern should be with selecting the right compiler and not autoconf? Or maybe see option 2 as viable? It's really a toss between 1 and 2 for me: they both allow the user to select a compiler and get going.

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

I must be missing something. To me option 1 is an impossible default, because you can't autoconfigure without the compiler, which defaults to None here. I would flag that combo as an error in configuration!

To me it makes sense in 1 of 2 ways, though the 1st one is better:

  • It's a latent predilection to autoconfigure as soon as a compiler is specified. Does nothing by default. I'm sure we could find a way to distinguish between cases where a user explicitly sets it to True without setting the compiler to raise a warning, though I'd just document the option is ignored unless a compiler is selected.

Right. IOW, autoconfigure by default if we have a compiler to work with. Finally 💡

  • It's autoconfiguring without a compiler, which means autoconfiguring to libclang's defaults as the next best available guess. In that case, if a user explicitly did None, False, yeah, I'd prefer to stick a -nostdinc there as in 'give the user a parser and nothing else'.

Option 1 aside, do you give any weight to the argument that most / all users' 1st concern should be with selecting the right compiler and not autoconf? Or maybe see option 2 as viable? It's really a toss between 1 and 2 for me: they both allow the user to select a compiler and get going.

I agree selecting the right compiler is more important than enabling autoconf. However, the 1st concern shouldn't be fiddling with either, really.

If we had no backwards compatibility to consider, I think I'd probably go with clang & True as defaults, ignoring errors if we fail to find clang in PATH. (If compiler is set or unset by the user, we should fail if we can't find it.) I believe most systems will have only one libclang in the library path, will have clang available, and will have only one clang in PATH. Sanest possible default? If the user had to specify libclang location, they should specify clang too.

How about backwards compatibility? If the user doesn't specify anything manually, or uses compiler.get_include_args() without specifying the compiler, I don't expect problems. However, if the user specifies system include paths manually, the ones from autoconf will have precedence, and it can lead to problems. They'd have to disable autoconf. Is that too much of a surpise?

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

We've been taking backwards compatibility quite seriously considering we're still at version < 1.0. Too seriously?

@BrunoMSantos
Copy link
Collaborator Author

If we had no backwards compatibility to consider, I think I'd probably go with clang & True as defaults, ignoring errors if we fail to find clang in PATH. (If compiler is set or unset by the user, we should fail if we can't find it.) I believe most systems will have only one libclang in the library path, will have clang available, and will have only one clang in PATH. Sanest possible default? If the user had to specify libclang location, they should specify clang too.

I agree, that option seems to be the perfect one in hindsight.

How about backwards compatibility? If the user doesn't specify anything manually, or uses compiler.get_include_args() without specifying the compiler, I don't expect problems. However, if the user specifies system include paths manually, the ones from autoconf will have precedence, and it can lead to problems. They'd have to disable autoconf. Is that too much of a surpise?

I may have done it wrong here, but we can and should ensure we add the autoconf -I options after any user clang arguments, so that they would not take precedence. The only surprise would be if someone expected libclang defaults and that those defaults should differ from clang's.

Judging by the tickets we got and intuitively, I don't think any of those scenarios are much to worry about.

@BrunoMSantos
Copy link
Collaborator Author

We've been taking backwards compatibility quite seriously considering we're still at version < 1.0. Too seriously?

I think it's a judgment call. Sometimes you've got to break things even when they are in use (e.g. imagine introducing a breaking change that is rolled out and becomes itself depended upon...).

Fortunately I don't think that's our case. I think it's fair to say we have few users, and the ones we have had more problems working around the issue in the first place than not, and the fix (assuming option 2) would take a frankly adversarially constructed case to break anything 🤷‍♂️

I wouldn't place too much weight on being below v1.0 though. Open source projects have a tendency to dwell there long after general adoption and we haven't committed to a definition of v1.0 yet (have we?).

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

I may have done it wrong here, but we can and should ensure we add the autoconf -I options after any user clang arguments, so that they would not take precedence. The only surprise would be if someone expected libclang defaults and that those defaults should differ from clang's.

Good point. My gut feeling was to add this first so they can be overridden, but really this is stuff that can't be overridden. I commented on the extension and tests having a different order. That's still valid, they should be the same, but these options should come last.

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

We've been taking backwards compatibility quite seriously considering we're still at version < 1.0. Too seriously?

I think it's a judgment call. Sometimes you've got to break things even when they are in use (e.g. imagine introducing a breaking change that is rolled out and becomes itself depended upon...).

Fortunately I don't think that's our case. I think it's fair to say we have few users, and the ones we have had more problems working around the issue in the first place than not, and the fix (assuming option 2) would take a frankly adversarially constructed case to break anything 🤷‍♂️

I wouldn't place too much weight on being below v1.0 though. Open source projects have a tendency to dwell there long after general adoption and we haven't committed to a definition of v1.0 yet (have we?).

The case for v1.0+ being different wrt breaking changes comes from semantic versioning. It specifically says the rules aren't the same for < v1.0. We follow it, but we haven't really committed to it.

In any case, I think we should just go ahead with this.

@jnikula
Copy link
Owner

jnikula commented Dec 17, 2024

I'll try to recap.

Add -nostdinc to compiler.get_include_args().

Modify compiler.get_include_args() to allow passing in the compiler.

Add hawkmoth_compiler config option, default to clang.

Add hawkmoth_autoconf (?) config option, default to True.

If hawkmoth_autoconf is enabled, use hawkmoth_compiler early (at extension setup if possible) to figure out header search path for each language. Add -nostdinc too. Debug log the figured out options once.

  • If hawkmoth_compiler is the default clang but not found, ignore errors, don't add anything, but debug log about it.

  • If hawkmoth_compiler is modified by user but not found or unset, flag errors.

If hawkmoth_autoconf is disabled, hawkmoth_compiler is don't care. Do not add -nostdinc either. (It could be a future autoconf option to only do -nostdinc, but frankly the user could set that via hawkmoth_clang too.)

Agreed? I don't think that's too far off from your original proposal. Just the extra config option and figuring out stuff early.

@BrunoMSantos
Copy link
Collaborator Author

I'll get to it in a few days ;)

@BrunoMSantos BrunoMSantos force-pushed the fix-include-paths branch 2 times, most recently from a3b5c61 to f22be55 Compare December 23, 2024 21:37
@BrunoMSantos
Copy link
Collaborator Author

I think it works as we discussed now. Let me know what you think ;)

I'm still exploring how to tackle testing though.

Include paths differ between C and C++ domains (typically C++ has
additional search directories), so we need to take that into account.

Furthermore, it makes sense to specify `-nostdinc` if we are including
the standard include directories explicitly already. Not only we prevent
potential duplication of search paths, we ensure that the resulting
search is strictly confined to the paths the given compiler would search
and not the disjunction of those and libclang's own default search
paths.

Since this is going to be called from within the directives now, it also
makes sense to improve error handling, which we do.
It's a common problem to want hawkmoth to parse the sources with the
exact headers a given code is meant to be compiled with and doing so
requires some verbosity upfront which can easily be avoided. Provide a
couple of options to do just that.

The defaults are chosen in a technically backwards incompatible way, but
it would take a frankly adversarial case to break it. In fact the new
defaults should yield the default behaviour everyone expected from the
beginning: we all expected libclang's search paths to be an exact match
to clang's, and indeed they are in some distributions.

Finally, the new search paths are added after other user's options since
the headers are searched in a left to right manner, giving the user the
possibility of overriding specific headers despite using this option.

For maximum flexibility, this functionality is gated on the 'autoconf'
settings, which can easily be extended later to support additional
features.
This reverts the effects of commit 1243852, while fixing the
inconsistent handling of C/C++ domains since libclang can't always
figure out the right search paths for its own version of Clang despite
prior claims.

It is still unclear whether the upstream issue is an actual issue or if
this is the intended behaviour
(llvm/llvm-project#18150), though it _is_ an
issue and simply using the fixed helper function and configuration
parameter comes at a suitably low cost to maintenance.

Fixes jnikula#262.
Allow each test case to select which of all the supported tests are
active. By default, all of them are, but sometimes it's impossible to
create tests that will pass as they target something specific to those
tests.

The motivation here is the expanding set of configuration variables,
which have not been tested so far, but which only affect extension
tests. Creating a test case for a breaking configuration would then not
work since the parser / cli tests would not see the same breaking
configuration.

Technically, it's unlikely a test case will ever need to be _just_ for
cli or parser, but not extension: anything that breaks those would break
the extension ones. But we do it in a general way anyway for simplicity.
- bool.c
conf-overrides:
hawkmoth_clang: -nostdinc
hawkmoth_autoconf:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's this one failing in CI, though it passes locally. Don't have the time now (short lunch break). The idea was to add -nostdinc to normalize behaviour across distros and disable autoconf such that stdbool.h wouldn't be found that way either.

Not sure this one tests what we want to though, so I may just remove it: is it failing to find the header because we disabled auto configuring or because we set -nostdinc?


This is a shortcut to specify ``-nostdinc -I<dir 1> ... -I<dir n>`` in
:data:`hawkmoth_clang` with the search directories of the specified compiler.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the documentation for hawkmoth_compiler should merely briefly reference what it's used for, and other places (i.e. hawkmoth_autoconf) should describe the details.

self._clang_args_post = compiler.get_include_args(cpath, 'c++')
else:
self.logger.warning('autoconf: \'stdinc\' option ignored (missing compiler)')

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still gets called once per each directive used in the documentation. There's one instance for directive.

The hard part is that setup() can't use the config options. IIUC you'd have to use a Sphinx event to do it once.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to simplify and go back to the first version (instead of the constructor) and optimize later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course. Well, I did this on a crowded flight, it was bound to happen 😅

I'll look into it when I can. There are other (maybe less clean) alternatives like caching the result if we go for a temporary solution, but the events should work well enough if there's a suitable one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants