Add union_tag scalar function #14687
// The only constraints on union field type IDs are that they are unique and within the 0..128 range:
// they need not start at 0, be sequential, or even be contiguous.
// Therefore, we allocate a vector whose length is the highest type ID plus one,
// so that each field's name can be placed at the index corresponding to its type ID.
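The allocation strategy described in the comment above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `build_tag_table` is a hypothetical helper, and the `(type_id, name)` slice stands in for iterating a union's fields, assuming (as the comment states) that IDs are unique and non-negative.

```rust
/// Hypothetical helper (not from the PR): builds a lookup table indexed by type ID.
/// `fields` pairs each union field's type ID with its name.
/// Assumes type IDs are unique and in 0..128, so casting to usize is safe.
fn build_tag_table(fields: &[(i8, &str)]) -> Vec<Option<String>> {
    // Length = highest type ID + 1, so every declared ID gets its own slot.
    let len = fields
        .iter()
        .map(|(id, _)| *id as usize + 1)
        .max()
        .unwrap_or(0);
    let mut table = vec![None; len];
    for (id, name) in fields {
        // Place the name at the index equal to its type ID.
        table[*id as usize] = Some(name.to_string());
    }
    table
}

fn main() {
    // Non-sequential, non-contiguous IDs: slots 0, 2, 3 and 4 stay empty.
    let table = build_tag_table(&[(1, "a"), (5, "b")]);
    assert_eq!(table.len(), 6);
    assert_eq!(table[1].as_deref(), Some("a"));
    assert_eq!(table[5].as_deref(), Some("b"));
    assert_eq!(table[0], None);
}
```

Unused slots cost one `None` each, which is cheap given the 128-entry upper bound, and lookups by type ID are a direct index.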
The union column used in the sqllogictests contains a single field with type ID 3, so this case is exercised:
datafusion/datafusion/sqllogictest/src/test_context.rs
Lines 411 to 430 in e4b78c7
fn register_union_table(ctx: &SessionContext) {
    let union = UnionArray::try_new(
        UnionFields::new(vec![3], vec![Field::new("int", DataType::Int32, false)]),
        ScalarBuffer::from(vec![3, 3]),
        None,
        vec![Arc::new(Int32Array::from(vec![1, 2]))],
    )
    .unwrap();
    let schema = Schema::new(vec![Field::new(
        "union_column",
        union.data_type().clone(),
        false,
    )]);
    let batch =
        RecordBatch::try_new(Arc::new(schema.clone()), vec![Arc::new(union)]).unwrap();
    ctx.register_batch("union_table", batch).unwrap();
}
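To illustrate what the array path has to do with this table, here is a std-only sketch that maps each row's type ID to the selected field's name. The function `union_tags` and its slice-based inputs are hypothetical stand-ins for the `UnionFields` and the type ID buffer (`ScalarBuffer::from(vec![3, 3])`) above; it is not the PR's implementation.

```rust
/// Hypothetical sketch: given a union's (type_id, name) fields and its per-row
/// type ID buffer, return the selected field name for every row.
fn union_tags<'a>(fields: &[(i8, &'a str)], type_ids: &[i8]) -> Vec<&'a str> {
    // Slot i holds the name of the field whose type ID is i; grow as needed.
    let mut names: Vec<Option<&'a str>> = Vec::new();
    for &(id, name) in fields {
        let idx = id as usize;
        if names.len() <= idx {
            names.resize(idx + 1, None);
        }
        names[idx] = Some(name);
    }
    // One O(1) lookup per row.
    type_ids
        .iter()
        .map(|&id| {
            names
                .get(id as usize)
                .copied()
                .flatten()
                .expect("type ID not declared in union fields")
        })
        .collect()
}

fn main() {
    // Same shape as union_table: one field "int" with type ID 3, two rows.
    let tags = union_tags(&[(3, "int")], &[3, 3]);
    assert_eq!(tags, vec!["int", "int"]);
}
```

Building the table once and then indexing per row keeps the per-row cost constant regardless of how sparse the type IDs are.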
datafusion/datafusion/sqllogictest/src/test_context.rs
Lines 117 to 120 in e4b78c7
"union_function.slt" => {
    info!("Registering table with union column");
    register_union_table(test_ctx.session_ctx())
}
@alamb - here is another function coming in (like xxhash, regexp_extract (both versions of it), and the array_min/array_max functions) where it is not clear what should and shouldn't be accepted. Since we've already accepted union_extract, this may be a case of fleshing out that series of functions. I strongly think we need 'official' documentation on what will and won't be accepted, and a recommended repository or subproject where additional functions can live, ideally under the Apache umbrella so that they can be maintained by the community and work across multiple DataFusion versions.
Yeah, I agree. I think we should file a "discussion"-type ticket to have this discussion. I can file one later (I am low on time this week), or if you can, that would be sweet. We have some generic guidance here: https://datafusion.apache.org/contributor-guide/index.html#what-contributions-are-good-fits
On it: #14777
Which issue does this PR close?
union_tag function #11080

Rationale for this change
Retrieve the name of the currently selected field of a union, as there is currently no way to do so.
What changes are included in this PR?
Implementation of the union_tag scalar function.

Are these changes tested?
Yes: with sqllogictests where possible, and with unit tests for union scalars, which are not yet supported in SQL.
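Union scalars add a case the array sketches do not cover: the scalar itself may be null. The following std-only model is hypothetical (`UnionScalar` and `union_tag_scalar` are not DataFusion types) and assumes that a null union scalar yields a null tag; it only illustrates the kind of behavior such unit tests would check.

```rust
/// Hypothetical model of a union scalar (not DataFusion's type):
/// `value: None` is a null union; `Some((type_id, v))` selects the field with that ID.
struct UnionScalar {
    value: Option<(i8, i64)>,
    fields: Vec<(i8, String)>,
}

/// Returns the selected field's name, or None for a null union scalar
/// (assumption: null in, null out).
fn union_tag_scalar(s: &UnionScalar) -> Option<&str> {
    let (type_id, _) = s.value?;
    s.fields
        .iter()
        .find(|(id, _)| *id == type_id)
        .map(|(_, name)| name.as_str())
}

fn main() {
    let fields = vec![(3, "int".to_string())];
    let set = UnionScalar { value: Some((3, 1)), fields: fields.clone() };
    let null = UnionScalar { value: None, fields };
    assert_eq!(union_tag_scalar(&set), Some("int"));
    assert_eq!(union_tag_scalar(&null), None);
}
```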
Are there any user-facing changes?
A new scalar function, union_tag.