Add union_tag scalar function #14687

@gstvg gstvg commented Feb 16, 2025

Which issue does this PR close?

Rationale for this change

Retrieve the name of the currently selected field of a union; there is currently no way to do this.

What changes are included in this PR?

union_tag scalar function implementation

Are these changes tested?

Yes, with sqllogictests where possible, and with unit tests for union scalars, which are not yet supported in SQL.

Are there any user-facing changes?

A new scalar function union_tag

github-actions bot added the documentation, sqllogictest, and functions labels Feb 16, 2025
alamb mentioned this pull request Feb 17, 2025
// The only constraints on union field type IDs are that they are unique and in the 0..128 range:
// they are not required to start at 0, be sequential, or even be contiguous.
// Therefore, we allocate a values vector with a length equal to the highest type ID plus one,
// ensuring that each field's name can be placed at the index corresponding to its type ID.

@gstvg gstvg Feb 17, 2025

The union column used in the sqllogictests contains a single field with type ID 3, so the non-zero, non-contiguous case is exercised by the tests:

fn register_union_table(ctx: &SessionContext) {
    let union = UnionArray::try_new(
        UnionFields::new(vec![3], vec![Field::new("int", DataType::Int32, false)]),
        ScalarBuffer::from(vec![3, 3]),
        None,
        vec![Arc::new(Int32Array::from(vec![1, 2]))],
    )
    .unwrap();
    let schema = Schema::new(vec![Field::new(
        "union_column",
        union.data_type().clone(),
        false,
    )]);
    let batch =
        RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(union)]).unwrap();
    ctx.register_batch("union_table", batch).unwrap();
}

"union_function.slt" => {
    info!("Registering table with union column");
    register_union_table(test_ctx.session_ctx())
}
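
The lookup described in the code comment above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the PR's implementation: `tag_lookup` and `union_tags` are hypothetical names, fields are modeled as plain `(type_id, name)` pairs instead of Arrow `UnionFields`, and type IDs are assumed to be non-negative.

```rust
/// Build a table indexed by type ID (hypothetical helper, not from the PR).
/// The table length is the highest type ID plus one; unused slots stay `None`.
/// Assumes every type ID is non-negative (in the 0..128 range).
fn tag_lookup(fields: &[(i8, &str)]) -> Vec<Option<String>> {
    let len = fields
        .iter()
        .map(|(id, _)| *id as usize + 1)
        .max()
        .unwrap_or(0);
    let mut names: Vec<Option<String>> = vec![None; len];
    for (id, name) in fields {
        names[*id as usize] = Some(name.to_string());
    }
    names
}

/// Resolve each row's type ID to the name of its selected field.
/// Unknown or negative type IDs yield `None`.
fn union_tags(fields: &[(i8, &str)], type_ids: &[i8]) -> Vec<Option<String>> {
    let lookup = tag_lookup(fields);
    type_ids
        .iter()
        .map(|id| {
            usize::try_from(*id)
                .ok()
                .and_then(|i| lookup.get(i))
                .cloned()
                .flatten()
        })
        .collect()
}

fn main() {
    // Mirrors the test table: a single field named "int" with type ID 3.
    let fields: [(i8, &str); 1] = [(3, "int")];
    let type_ids: [i8; 2] = [3, 3];
    println!("{:?}", union_tags(&fields, &type_ids));
    // prints [Some("int"), Some("int")]
}
```

With a single field at type ID 3, the table has four slots and only index 3 is populated, which is the sparse case the test table is meant to cover.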

@Omega359

@alamb - here is another function coming in (xxhash, regexp_extract (both versions of it), array_min/array_max functions) where it is not clear what should be accepted and what shouldn't be. Since we've already accepted union_extract this may be a case of fleshing that series of functions out.

I strongly think we need 'official' documentation as to what will and won't be accepted, and a recommended repository/sub-project where additional functions can be located, hopefully under the Apache umbrella so that they can be maintained by the community and work across multiple DF versions.

alamb commented Feb 19, 2025

Yeah I agree. I think we should file a "discussion" type ticket to have this discussion. I can file one at some point later (I am low on time this week) or if you can that would be sweet.

We have some generic guidance here: https://datafusion.apache.org/contributor-guide/index.html#what-contributions-are-good-fits

Omega359 commented Feb 19, 2025

on it. #14777

Development

Successfully merging this pull request may close these issues.

Add union_tag function