Describe the problem you faced
I created a Hudi table with the record level index enabled and performed upsert operations on it. The first upsert read the record index files, figured out which data files needed an update, and wrote them to S3. On the second upsert against the same table, I saw the record_index folder under the metadata folder being deleted and recreated. My Hudi table is quite large, and re-creating the entire record level index on every upsert is too expensive to support.
To Reproduce
Steps to reproduce the behavior:
Create a Hudi table in insert mode with the record level index enabled
Perform an upsert
Perform another upsert
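For concreteness, a minimal PySpark sketch of these steps might look like the following. The table name, schema, key/partition fields, and S3 path are illustrative assumptions, not the reporter's actual job, and it assumes a Hudi Spark bundle matching Spark 3.4 is on the classpath:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("rli-upsert-repro")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

base_path = "s3a://my-bucket/warehouse/rli_test"  # hypothetical path

hudi_opts = {
    "hoodie.table.name": "rli_test",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.partitionpath.field": "part",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.metadata.enable": "true",
    # Build the record level index in the metadata table and use it for tagging.
    "hoodie.metadata.record.index.enable": "true",
    "hoodie.index.type": "RECORD_INDEX",
}

def write(df, operation, mode):
    (
        df.write.format("hudi")
        .options(**hudi_opts)
        .option("hoodie.datasource.write.operation", operation)
        .mode(mode)
        .save(base_path)
    )

cols = ["id", "val", "part", "ts"]

# Step 1: the initial insert creates the table and builds the record level index.
write(spark.createDataFrame([(1, "a", "p0", 1), (2, "b", "p0", 1)], cols), "insert", "overwrite")

# Steps 2-3: each upsert should update the index incrementally; the reported
# problem is .hoodie/metadata/record_index being deleted and rebuilt here.
write(spark.createDataFrame([(1, "a2", "p0", 2)], cols), "upsert", "append")
write(spark.createDataFrame([(2, "b2", "p0", 3)], cols), "upsert", "append")
```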
Expected behavior
I expect every upsert after the initial creation of the record level index to update the index as needed, not re-create the whole index.
Environment Description
Hudi version : 0.15.0
Spark version : 3.4.1
Hive version :
Hadoop version : 3.3.6
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
Config I used for upsert operation:
I do not see any errors, but it does not make sense that Hudi would clear away my index and recreate it.
The MDT (metadata table) is updated incrementally on each upsert; if it got re-initialized, the likely cause is a data consistency issue between the MDT and the data table.
Is there a log line I could search for to determine what might have caused it? Since only one writer writes to this Hudi table, is there a way to know what caused the inconsistency?
I may not be able to share driver logs, as this happened on one of our production tables. However, if there is a specific error message I can search for in the logs, I can confirm whether it appears.
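One low-tech way to see when a rebuild happens, independent of driver logs, is to watch the object timestamps under the metadata table's record_index partition in S3. This is only an observation aid under assumed bucket and path names, not a Hudi API; if the index was rebuilt rather than updated incrementally, the oldest LastModified timestamp jumps forward to the time of the rebuilding commit:

```python
import boto3

BUCKET = "my-bucket"  # hypothetical bucket
PREFIX = "warehouse/rli_test/.hoodie/metadata/record_index/"  # hypothetical table path

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Collect every object under the record_index partition.
objects = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    objects.extend(page.get("Contents", []))

if objects:
    oldest = min(o["LastModified"] for o in objects)
    newest = max(o["LastModified"] for o in objects)
    print(f"{len(objects)} files, oldest={oldest}, newest={newest}")
else:
    print("record_index partition is empty")
```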