-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
isort
behavior depends on which files are available in the sandbox
#15069
Comments
isort
may require transient sourcesisort
may require transitive sources
One thing to check is whether it needs transitive dependencies ("everything recursively") or just direct dependencies (things literally mentioned in the imports or explicitly in BUILD files): it seems like given that it is sorting the imports, there is a distinct possibility that it only requires direct dependencies. But yea: we should definitely fix this. Accuracy is more important than performance. |
Ah, yes. Direct dependencies should be enough. |
Adding my 2 cents: we should instead seed |
That would be better. I've not found how to configure my way out of this, so any help with that is appreciated. Not urgent, so whenever time permits works for me. |
I've dug into this. It's not that it needs dependencies, it actually may need the entire source root. The reason is that isort has subtle logic to deal with namespace packages, presumably so that if So if we only have But if some Note that So we can either: A) Bring in the entire source root A seems like massive overkill. So, B I guess? I slightly fear unintended consequences, but it's probably fine since those are real files in the repo, that could have been brought in some other way anyway. |
Thanks @benjyw!
+1 to go with option B. I guess the |
@benjyw just curious, have you looked into the config route? |
Nevermind just saw your comment about how the config is wrong for namespace packages |
Thanks for digging into this Benjy, and reporting on it Andreas. For context, this was known when isort 5 first came out and our docs call attention to it: https://www.pantsbuild.org/docs/python-linters-and-formatters#isort-possible-issues-with-its-import-classifier-algorithm But our hand-wavey workaround of config files isn't robust, it sounds like. It's also a total gotcha. |
I suspect that that only worked because isort's heuristics respond to the presence of pyproject.toml, regardless of its content. |
For isort specifically, would it be feasible to specify first party modules via the In our project we were having similar issues with |
@huonw Well, you can certainly set that in an isort config file, and Pants will use that config file. But it sounds like you're suggesting generating config? |
Yeah, I'm wondering if, for instance, the following function could be expanded to include something like pants/src/python/pants/backend/python/lint/isort/rules.py Lines 41 to 61 in 161bbbe
|
This hit me again in another project.. ;) And yea, the "known first party" config option seems like a reasonable way to resolve this. Putting it in a generated isort config rather than as args on the command line could help avoid issues with too long command lines as well. |
I thought we were saying the config approach is not viable for namespace packages? @kaos If config is viable, then I strongly prefer we auto-setup config for you rather than including all the projects in the sandbox. The latter would be substantially slower, and I think not properly tracked with |
I'm not sure if I tried with the "known first party" config option. I only tried the "src paths" one, from what I can read in my issue/PR. The suggested approach by Benjy I think was to add empty |
If I understand correctly, doing this automatically might break the |
We may have to start by documenting this and offering the workaround of setting the firstparty package explicitly in isort config. |
To me, we're starting to get a little bit in the weeds on third-vs-first party, but point taken. Perhaps there's an implementation which uses the most-specific (i.e longest path) "superset" of modules? |
isort
may require transitive sourcesisort
behavior depends on which files are available in the sandbox
This is the crux of the issue, as I understand it. So I think it's the right amount of weeds... |
Well specifically I mean if "foo.bar" is "third party" when we talk about "foo.baz". They share an ancestor, so presumably they're in the same party, albeit different packages and codebases |
True, but Pants doesn't care about whether the same people wrote it or own the IP or whatever. It cares about how code is consumed. The salient point here is that |
Right, my point, outside of Pants, is "are packages outside of this one, but still in your namespace considered first party or third-party?" And then slap an isort lens on the question as well |
OK, so good news and bad news: Good news: This is easily fixable in a not-nasty way. Essentially specify every import we're going to provide as a firstparty module. It makes sense, and works around the namespace issue. This works purely on the argument level. Bad news: It requires a bugfix to |
Re |
Setting up a test case repo is a good idea though, I will try and do that asap |
Not dep inference, but Pants wholistically. Whether it's inferred or explicit, if we're providing a Python module via a file on disk, that must be firstparty. That's what I like about this. It's super simple. |
We literally already have this logic in dep inference though. Seems like we can consume that mapping and save ourselves some work. |
We don't need the mapping though, we just need the dependencies were about to provide 🙂 (which is a product of the mapping) |
As I mentioned in #15069 (comment) - looking at all deps (even transitively) is not sufficient. So can you explain what "specify every import we're going to provide as a firstparty module" means? |
Basically, if an import is being provided by Pants, that's first party, everything else if third party |
ELI5? I don't know what "import id" means here. |
Sorry fat fingered it on mobile: "import is". Basically we just tell isort what is first party. And we know that because we know what first party (direct) dependencies we provide. I think it's probably worth just demonstrating via a reproduction/test, especially since namespace packages are in the mix |
Yeah, I fear it's a little more subtle than that, but it's on me to make a repro of the problem. |
OK, I wanted this off my stack, so I decided to jump in: josh@cephandrius:~/work/splash$ tree .
.
├── bad
│ ├── foo
│ │ └── bar
│ │ ├── bar.py
│ │ └── __init__.py
│ └── run_isort.sh
└── good
├── foo
│ ├── bar
│ │ ├── bar.py
│ │ └── __init__.py
│ └── baz.py
└── run_isort.sh
6 directories, 7 files Files are identical across tree, with: josh@cephandrius:~/work/splash$ cat bad/foo/bar/bar.py
import foo.qux
import requests
josh@cephandrius:~/work/splash$ cat bad/run_isort.sh
#!/usr/bin/env bash
cd "$(dirname "$0")"
pipx run isort foo/bar/bar.py "$@" Then running each: josh@cephandrius:~/work/splash$ ./bad/run_isort.sh
josh@cephandrius:~/work/splash$ ./good/run_isort.sh
Fixing /home/josh/work/splash/good/foo/bar/bar.py and for contents: josh@cephandrius:~/work/splash$ diff -u bad/foo/bar/bar.py good/foo/bar/bar.py
--- bad/foo/bar/bar.py 2023-09-08 09:52:26.240101161 -0500
+++ good/foo/bar/bar.py 2023-09-08 09:53:32.650789329 -0500
@@ -1,2 +1,3 @@
-import foo.qux
import requests
+
+import foo.qux So @benjyw your theory holds. Now, let's reset everything and try again with my proposed fix (which is "if Pants could provide the direct dependency as a file, we pass josh@cephandrius:~/work/splash$ ./bad/run_isort.sh -p=foo.qux
Fixing /home/josh/work/splash/bad/foo/bar/bar.py
josh@cephandrius:~/work/splash$ ./good/run_isort.sh -p=foo.qux
Fixing /home/josh/work/splash/good/foo/bar/bar.py
josh@cephandrius:~/work/splash$ diff -u bad/foo/bar/bar.py good/foo/bar/bar.py So, as expected. We know what's firstparty because we're the all-seeing-eye of firstparty/thirdparty |
Reproducer in script form, for easier reproduction on other machines: cd $(mktemp -d)
# isort setup
python -m venv venv
. venv/bin/activate
pip install isort==5.12.0
# configure the directory structure
mkdir -p foo/bar/
touch foo/bar/__init__.py
cat > foo/bar/bar.py <<EOF
import foo.qux
import requests
EOF
echo '"BUG": foo.qux is treated as third-party'
python -m isort foo/bar/bar.py
cat foo/bar/bar.py
echo 'Proposed fix: explicit -p arg'
python -m isort foo/bar/bar.py -p=foo.qux
cat foo/bar/bar.py
echo 'Proposed fix matches behaviour when the file actually exists'
touch foo/qux.py
python -m isort foo/bar/bar.py
cat foo/bar/bar.py
(FWIW, this is a independent reinvention of a suggestion slightly earlier in the thread, e.g. #15069 (comment)) |
Yup I saw that comment after having the same thought. Great minds think alike? I have the code in a branch. I need to turn this reproduction into a test on |
So I've made a repro of the subtle problem, and I think I've convinced myself that it's not subtle after all, and that pulling direct deps into the sandbox will be sufficient to solve the problem without mucking around with config: https://github.com/benjyw/isort-issue I got tangled up in the idea that a file outside your deps could change the classification, which is true. But it's also true that it can only change it in the same direction as a direct dep would. So... |
#17403 (comment) describes a bit of a split between `lint` and `fix` in terms of batching. This change fixes #19634 by switching `AbstractFixRequest.files` to report `self.snapshot.files` which is already de-duplicated. (Note that the elements in the case of `lint` were the filepaths directly translated from addresses. Maybe it's worth de-duplicating at a higher layer, but 🤷♂️ ) This change also fixes #17403 (which really is a duplicate of #15069 manifest in new and exciting ways) by de-duplicating the `fix`-specific batching strategy. It didn't sit right to me that in the reproduction repo we did 1 batches of 2 files for `fix`, but 1 batch of 4 files for `lint`.
Hi, what's the status of this issue? Did #19796 fix this, as the description there implies?
|
I solved this at least temporarily by increasing batch size to larger than the repo, to get consistent results. |
Unfortunately isort doesn't play nicely with sandboxing, so the best fix is to manually add your top-level package(s) to the "known first party" isort config. That is more robust than attempting to finesse the batches. |
Describe the bug
When running
isort
on a single source file, the result is not always consistent compared to when running on all sources.Pants version
2.11.0rc1, 2.12.0.dev1
OS
Mac
Additional info
Test case demonstrating this issue in #15002
It seems there are cases even in
fmt
that requires transitive deps to be present, not only forcheck
.See related comment: #14186 (comment)
I've run across this a couple of times now at work, where I have a pre-commit hook that runs on a subset of files (exactly the subset of changed files only), and fails, where as when I run by hand I usually use globs like a hammer and includes a whole lot more, which exposes this issue, due to
isort
not being stable for the two ways it is being invoked.The text was updated successfully, but these errors were encountered: