Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make Table.join() use the same logic as Table.rename() #10753

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Jan 30, 2025

There are a few different changes here, I can split them into separate PRs if you want. I think the first two refactor commits are strictly improvements, with no behavior change (I think), and feel pretty good about them. The second two are behavior changes and could use more discussion.

  1. refactor: merely simplify/improve the implementation of Table.rename(). Pulling it out into its own util function so it's obvious what the inputs are, before it was tricky to see what was actually getting used. No changes to behavior here.
  2. refactor: change the default values for lname param of the .join() methods, from "" to "{name}". This by itself doesn't actually have any behavior change, it just is a lot more symmetrical with the other argument. This default is a remnant of when we used the lsuffix and rsuffix API.
  3. reuse .rename()'s implementation inside .join(). This increases the scope of .join(), so it can accept a wider range of inputs. But makes our API more consistent. I wonder if there are other places where we could take advantage of this util method?
  4. Change the behavior of .join(..., lname=""). Before. it would act the same as passing "{name}", now it errors. I doubt this affects many users, but probably affects some. I can adjust this in several ways (In order of my preference):
    1. adjust util.rename() so that resolving to falsy/None is an error. Users MUST return a string that they want as the result. This is the most breaking, but also I think the most sane. I already proposed this during the old .rename() refactor, but we decided against it.
    2. adjust util.rename() so that anything falsy is interpreted as a no-op. This is similar to how currently None is interpreted. This would not be breaking at all, but would increase the scope of both .join() and .rename()
    3. special case just .join() so that anything falsy is interpreted as a no-op. This has the least scope increase, also isn't breaking, but leads to inconsistency.

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jan 30, 2025
There is this code inside expr/types/join.py:

```python
def disambiguate_fields(
    how,
    predicates,
    equalities,
    left_fields,
    right_fields,
    left_template,
    right_template,
):
    """Resolve name collisions between the left and right tables."""
    collisions = set()
    left_template = left_template or "{name}"
    right_template = right_template or "{name}"
   ...
```

This makes it so that "" is treated equivalently to "{name}". So this change doesn't have any behavior change, it just makes the defaults follow a more consistent pattern. The old patterns was a remnant from the lsuffix API we used to have.
before, any falsy values were interpreted as a noop. Now this errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant