ListingTable cannot handle partition evolution #13270
Comments
cc @alamb I had promised you this a long time ago but only got around to it now
Thanks @adriangb
take
@logan-keede I see you're doing some work on
@adriangb my focus has been on refactoring
@adriangb Sorry for the delay, I am starting to investigate this issue this week.
First-round investigation: partition evolution has to happen at runtime, and the inferred partitions need to overwrite the empty table_partition_cols in FileScanConfig. I haven't found a good way to do this so far, because many places in the code use FileScanConfig's table_partition_cols to pass parameters. @adriangb Do you have any suggestions for how we can do this in the current architecture? Update: maybe we could keep a runtime cache of the partition evolution result and use it when FileScanConfig's table_partition_cols is empty?
I think the fundamental issue is that the partition columns are specified on a per-exec basis via
+1 I'm also blocked on this. It'd be nice if schema evolution could be a first-class citizen in DataFusion. It's been pretty painful/stressful running into schema evolution bugs with https://telemetry.sh. It feels like a ticking time bomb before a schema gets corrupted :(
Just noticed we already have a solution for partition evolution in the dynamic file catalog; see the PR for details. Maybe we need some improvements based on it? https://github.com/apache/datafusion/pull/12683/files I still can't find a good solution on the code side, so feel free to take it.
@zhuqi-lucas here's one current failure scenario with evolution: #14755
Describe the bug
With CSV:
With Parquet:
To Reproduce
No response
Expected behavior
Partition evolution is handled and both cases return
Additional context
Having played around quite a bit with ParquetExec and the SchemaAdapter machinery I think what should happen is:
PartitionedFile and not on the FileScanConfig
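To illustrate the per-file idea, here is a rough sketch in Python (not Rust, and not DataFusion code). It assumes hive-style key=value directories and made-up paths. FileEntry, parse_hive_partitions, unified_partition_columns and partition_row are hypothetical names used only for illustration. Each file carries its own partition values, and the table-level partition columns are derived as the union across files, with a missing column read as null.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileEntry:
    """Hypothetical per-file record; a stand-in, not DataFusion's PartitionedFile."""
    path: str
    partition_values: Dict[str, str] = field(default_factory=dict)


def parse_hive_partitions(table_root: str, path: str) -> Dict[str, str]:
    """Collect key=value directory segments between the table root and the file name.

    Assumes `path` starts with `table_root`.
    """
    relative = path[len(table_root):].strip("/")
    segments = relative.split("/")[:-1]  # drop the file name itself
    values: Dict[str, str] = {}
    for segment in segments:
        key, sep, value = segment.partition("=")
        if sep and key:  # ignore directories that are not key=value
            values[key] = value
    return values


def list_files(table_root: str, paths: List[str]) -> List[FileEntry]:
    """Attach partition values to each file instead of to the scan as a whole."""
    return [FileEntry(p, parse_hive_partitions(table_root, p)) for p in paths]


def unified_partition_columns(files: List[FileEntry]) -> List[str]:
    """Union of partition column names across files, in first-seen order."""
    columns: List[str] = []
    for f in files:
        for key in f.partition_values:
            if key not in columns:
                columns.append(key)
    return columns


def partition_row(f: FileEntry, columns: List[str]) -> List[Optional[str]]:
    """Partition values for one file; columns the file lacks become None (null)."""
    return [f.partition_values.get(c) for c in columns]


if __name__ == "__main__":
    files = list_files(
        "data/",
        ["data/a=1/f1.parquet", "data/a=2/b=3/f2.parquet"],
    )
    columns = unified_partition_columns(files)
    for f in files:
        print(f.path, dict(zip(columns, partition_row(f, columns))))
```

In this sketch the second file adds a partition column b that the first file does not have, and the first file simply reports null for b rather than failing the scan.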