Proper NULL handling in array functions #14451
Comments
take |
Hello @jkosh44, what do you think about this?
Also, since |
Good point, I don't think we can use
Yes, we should make sure that null handling is consistent for |
@alan910127 I have a very rough proposal, that might help you with this issue. When working with datafusion/datafusion/expr-common/src/signature.rs Lines 228 to 255 in ad60ffc
However, what if we modified it to look like this:

    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
    pub struct ArrayFunctionSignature {
        args: Vec<ArrayFunctionArg>,
    }

    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
    pub enum ArrayFunctionArg {
        Element,
        Index,
        Array,
        RecursiveArray,
        MapArray,
        DataType {
            data_type: DataType,
            name: &'static str,
        },
    }

I may try to put a PoC together today. |
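To make the proposal a bit more concrete, here is a minimal, self-contained sketch of how an argument-list signature like this could resolve a `(list, element-type, element-type, i64)` call such as `array_replace_n`. Everything beyond the `ArrayFunctionSignature` and `ArrayFunctionArg` names is an assumption for illustration: `DataType` is a small stand-in rather than Arrow's type, only three of the proposed variants are modeled, and `coerce` and `array_replace_n_signature` are hypothetical helpers, not DataFusion APIs.

```rust
// Stand-in for Arrow's DataType (assumption): only what this sketch needs.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
pub enum DataType {
    Null,
    Int64,
    Utf8,
    List(Box<DataType>),
}

// A subset of the proposed argument kinds.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
pub enum ArrayFunctionArg {
    Element,
    Index,
    Array,
}

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]
pub struct ArrayFunctionSignature {
    args: Vec<ArrayFunctionArg>,
}

/// Hypothetical signature for array_replace_n: (array, from, to, max).
pub fn array_replace_n_signature() -> ArrayFunctionSignature {
    use ArrayFunctionArg::{Array, Element, Index};
    ArrayFunctionSignature { args: vec![Array, Element, Element, Index] }
}

/// Hypothetical coercion: resolve the given argument types against the
/// signature, or return None if they cannot match.
pub fn coerce(sig: &ArrayFunctionSignature, given: &[DataType]) -> Option<Vec<DataType>> {
    if sig.args.len() != given.len() {
        return None;
    }
    // The element type comes from the first Array argument that is a list.
    let elem = sig.args.iter().zip(given).find_map(|(arg, ty)| match (arg, ty) {
        (ArrayFunctionArg::Array, DataType::List(inner)) => Some((**inner).clone()),
        _ => None,
    })?;
    sig.args
        .iter()
        .zip(given)
        .map(|(arg, ty)| match arg {
            // Arrays must be lists of the resolved element type.
            ArrayFunctionArg::Array => match ty {
                DataType::List(inner) if **inner == elem => Some(ty.clone()),
                _ => None,
            },
            // An untyped NULL element is coerced to the list's element type.
            ArrayFunctionArg::Element => match ty {
                DataType::Null => Some(elem.clone()),
                t if *t == elem => Some(elem.clone()),
                _ => None,
            },
            // Indexes are i64; an untyped NULL becomes a typed i64.
            ArrayFunctionArg::Index => match ty {
                DataType::Int64 | DataType::Null => Some(DataType::Int64),
                _ => None,
            },
        })
        .collect()
}
```

With a signature like this, a `DataType::Null` argument would be given the list's element type before the function body runs, and a mismatched element type would be rejected during planning instead of reaching the implementation.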
Also FWIW, it seems like the issue with some of these functions is with passing in any incorrect type, not just NULL. For example,
which makes me think that fixing the signatures is the correct way to address this. |
This commit allows for more expressive array function signatures. Previously, `ArrayFunctionSignature` was an enum of potential argument combinations and orders. For many array functions, none of the `ArrayFunctionSignature` variants work, so they use `TypeSignature::VariadicAny` instead. This commit will allow those functions to use more descriptive signatures which will prevent them from having to perform manual type checking in the function implementation. As an example, this commit also updates the signature of the `array_replace` family of functions to use a new expressive signature, which removes a panic that existed previously. Works towards resolving apache#14451
OK, I put this together here: #14532, if you're interested in helping review it. It fixes array_resize, but doesn't touch any of the other array functions. If people are happy with that change, then you should be able to build off of it to fix the rest of the array functions. EDIT: Or, if you're interested, you can help debug the test failure in the PR, because I'm done working on it for the day. |
This commit allows for more expressive array function signatures. Previously, `ArrayFunctionSignature` was an enum of potential argument combinations and orders. For many array functions, none of the `ArrayFunctionSignature` variants worked, so they used `TypeSignature::VariadicAny` instead. This commit will allow those functions to use more descriptive signatures which will prevent them from having to perform manual type checking in the function implementation. As an example, this commit also updates the signature of the `array_replace` family of functions to use a new expressive signature, which removes a panic that existed previously. There are still a couple of limitations with this approach. First of all, there's no way to describe a function that has multiple different arrays of different type or dimension. Additionally, there isn't support for functions with map arrays and recursive arrays that have more than one argument. Works towards resolving apache#14451
@jkosh44 Thanks for all this information! I’ll take a look at the PR later and likely review it. If I have any thoughts or suggestions, I’ll leave comments there. |
Describe the bug
The following array functions do not properly handle NULL input; they either return an error or panic (non-exhaustive):
array_sort
array_replace
array_replace_all
array_replace_n
array_resize
Discovered in #14289
To Reproduce
Scalar input
Array Input
Expected behavior
The functions should either return NULL or correctly handle the NULL input without panicking.
Additional context
I briefly investigated the functions and found the following:
array_sort
To fix this function we can add `NullHandling::Propagate` to the `impl ScalarUDFImpl for ArraySort`. Then we also have to update `array_sort_inner` to return NULL on NULL inputs.

array_resize
I haven't looked into this one yet, but I think the solution is probably similar to `array_sort`.

array_replace
Note this applies to `array_replace_all` and `array_replace_n`.

This function actually does not want to propagate NULLs; it should be able to find NULL elements and replace with NULL elements. When the inputs are `ArrayRef`s, everything works correctly; it's only when the inputs are `Scalar` that things break. The issue is that the function accepts `DataType::Null` instead of a null value of the list element type. Then we get type errors when trying to use the `DataType::Null` with a typed list.

I believe that if we updated the signature from `Signature::any(3, Volatility::Immutable)` to something that includes type information, then we would fix the panic. However, we face a familiar problem: there is no way to represent `(list, element-type, element-type, i64)` with the current `Signature` struct.
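The two behaviors described above can be sketched without any DataFusion types. This is only an illustration under stated assumptions: lists are modeled as `Option<Vec<Option<i64>>>`, where the outer `Option` is a NULL list and the inner one a NULL element, and `sort_propagating` and `replace_all` are hypothetical names. Placing NULL elements first when sorting, and returning NULL from a replace on a NULL list, are assumptions of this sketch rather than statements about DataFusion's behavior.

```rust
/// A nullable list of nullable integers: the outer Option models a NULL
/// list, the inner Option models a NULL element.
type NullableList = Option<Vec<Option<i64>>>;

/// Propagating behavior, as suggested for array_sort: NULL in, NULL out.
fn sort_propagating(list: NullableList) -> NullableList {
    let mut values = list?;
    // Option<i64> orders None before Some, so NULL elements sort first here.
    values.sort();
    Some(values)
}

/// Non-propagating behavior, as described for array_replace: a NULL `from`
/// matches NULL elements, and a NULL `to` writes NULL elements.
fn replace_all(list: NullableList, from: Option<i64>, to: Option<i64>) -> NullableList {
    // Assumption: a NULL list still yields NULL.
    let values = list?;
    Some(
        values
            .into_iter()
            .map(|v| if v == from { to } else { v })
            .collect(),
    )
}
```

The point of the contrast is that NULL arguments to the replace function are ordinary typed values to compare against, which is why blanket NULL propagation would be the wrong fix for the `array_replace` family.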