
[HUDI-8934] Claim of RFC-87. Avro elimination for Flink writer #12729

Open · wants to merge 1 commit into base: master
Conversation

@Alowator (Contributor) commented Jan 29, 2025

Change Logs

Inspired by RFC-84 (HUDI-8920): there is an opinion that Avro is not the best choice for Hudi. It requires extra ser/de operations, and not only between Flink operators (those will be fixed by RFC-84).

I decided to benchmark a POC version of a native Flink RowData writer for Hudi. It was simple enough, because Hudi already has a native RowData-to-Parquet writer used by append mode; I reused this writer to create log blocks, and two bottlenecks were found:

  1. Hudi performs a lot of Avro ser/de operations at writer runtime.

  2. Hudi stores Avro records as a List, which causes GC pressure at writer runtime; in my benchmarks garbage collection accounts for about 30% of the total Hudi writer runtime (a sketch of this per-record conversion follows the profile below).

[profiler screenshot]
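
To illustrate bottleneck 2, here is a minimal sketch (my illustration, not Hudi's actual code) of the kind of per-record RowData-to-Avro conversion the current write path performs. The two-field schema and class name are hypothetical; the point is that every record allocates a fresh GenericRecord plus boxed field values, which is what shows up as GC time in the profile:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.table.data.GenericRowData;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.StringData;

    public class AvroHotPathSketch {

        // Hypothetical two-field schema, for illustration only.
        private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"lineitem\",\"fields\":["
                + "{\"name\":\"l_orderkey\",\"type\":\"long\"},"
                + "{\"name\":\"l_comment\",\"type\":\"string\"}]}");

        // Called once per incoming record on the write path: every call
        // allocates a new GenericRecord and boxes every field value.
        static GenericRecord toAvro(RowData row) {
            GenericRecord rec = new GenericData.Record(SCHEMA); // fresh allocation per record
            rec.put("l_orderkey", row.getLong(0));              // long -> Long boxing
            rec.put("l_comment", row.getString(1).toString());  // StringData -> String copy
            return rec;
        }

        public static void main(String[] args) {
            RowData row = GenericRowData.of(1L, StringData.fromString("quickly final deposits"));
            System.out.println(toAvro(row));
        }
    }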

Impact

As a result, I reduced write time from ~4 min to ~1 min 20 sec (a ~3x write performance boost):

[benchmark results screenshot]

I have a POC version that we are already testing in our cloud environment. Key improvements:

  1. This POC is based on RFC-84 and already includes the POC from RFC-84.
  2. Native RowData is written to Parquet log blocks (eliminates unnecessary ser/de).
  3. Records are stored in a BinaryInMemorySortBuffer (reduces GC pressure).
  4. Records are sorted via new QuickSort().sort(sortBuffer) before a log block is written (maybe it is possible to perform data compaction without sorting? In any case the sort is fast enough that it does not affect write performance). See the sketch after this list.
  5. RowDataStreamWriteFunction flushes buckets asynchronously (reduces backpressure on preceding operators).
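
To make items 3 and 4 concrete, a minimal sketch of the flush path, assuming Flink's internal BinaryInMemorySortBuffer and QuickSort classes. The ParquetLogBlockWriter interface is hypothetical (standing in for Hudi's RowData Parquet writer), and the sort buffer is passed in as a parameter because its factory needs code-generated comparators and a memory segment pool; this is a sketch of the flow the POC describes, not the POC itself:

    import java.io.IOException;

    import org.apache.flink.runtime.operators.sort.QuickSort;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.binary.BinaryRowData;
    import org.apache.flink.table.runtime.operators.sort.BinaryInMemorySortBuffer;
    import org.apache.flink.util.MutableObjectIterator;

    public class SortedLogBlockFlushSketch {

        // Hypothetical writer interface standing in for Hudi's RowData Parquet writer.
        interface ParquetLogBlockWriter {
            int fieldCount();
            void write(RowData row) throws IOException;
        }

        // Buffers records in Flink-managed binary memory, sorts them in place,
        // then streams them to the log-block writer.
        static void flushBucket(BinaryInMemorySortBuffer sortBuffer,
                                Iterable<RowData> bucketRecords,
                                ParquetLogBlockWriter writer) throws IOException {
            for (RowData record : bucketRecords) {
                // write() serializes into managed MemorySegments, so no per-record
                // object graph stays on the heap: this is the GC-pressure win.
                if (!sortBuffer.write(record)) {
                    throw new IOException("sort buffer is full; flush earlier");
                }
            }

            // In-place binary sort, as described in item 4.
            new QuickSort().sort(sortBuffer);

            MutableObjectIterator<BinaryRowData> it = sortBuffer.getIterator();
            BinaryRowData reuse = new BinaryRowData(writer.fieldCount());
            BinaryRowData row;
            while ((row = it.next(reuse)) != null) {
                writer.write(row);
            }
        }
    }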

My config

PC: 32 CPUs, 128 GiB
Data: 60 million records of the TPC-H lineitem table
Java: OpenJDK 17
Flink: 1.20, single JM + single TM, standalone, taskmanager.memory.process.size: 8G
Write: Hadoop HDFS 3.3.1, 9-node cluster
Read: Kafka 2.8, 3-node cluster, 8 partitions
Hudi table:
    'connector' = 'hudi',
    'path' = '<hdfs_path>',
    'table.type' = 'MERGE_ON_READ',
    'metadata.enabled' = 'false',
    'index.type'='BUCKET',
    'hoodie.bucket.index.hash.field'='l_orderkey,l_linenumber',
    'hoodie.bucket.index.num.buckets'='8',
    'hoodie.parquet.compression.codec' = 'snappy',
    'hoodie.logfile.data.block.format' = 'parquet',
    'hoodie.enable.fast.sort.write' = 'true',

    'write.operation' = 'upsert',
    'write.batch.size'='256',
    'write.tasks'='8',
    'compaction.async.enabled' = 'false',
    'clean.async.enabled' = 'false',
    'hoodie.archive.automatic' = 'false',
    'hoodie.clean.automatic' = 'false'
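
For reproducibility, the same options as a complete DDL sketch via the Java Table API. The column list is abbreviated (the benchmark used the full TPC-H lineitem schema), <hdfs_path> is left as a placeholder, and 'hoodie.enable.fast.sort.write' is the POC-only flag introduced by this work, not an option that exists in Hudi master:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class BenchmarkTableSetup {
        public static void main(String[] args) {
            // Streaming mode, matching the Kafka -> Hudi upsert pipeline.
            TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

            // Columns abbreviated for the sketch.
            tEnv.executeSql(
                "CREATE TABLE lineitem_hudi (\n"
              + "  l_orderkey BIGINT,\n"
              + "  l_linenumber INT,\n"
              + "  l_comment STRING\n"
              + ") WITH (\n"
              + "  'connector' = 'hudi',\n"
              + "  'path' = '<hdfs_path>',\n"
              + "  'table.type' = 'MERGE_ON_READ',\n"
              + "  'metadata.enabled' = 'false',\n"
              + "  'index.type' = 'BUCKET',\n"
              + "  'hoodie.bucket.index.hash.field' = 'l_orderkey,l_linenumber',\n"
              + "  'hoodie.bucket.index.num.buckets' = '8',\n"
              + "  'hoodie.parquet.compression.codec' = 'snappy',\n"
              + "  'hoodie.logfile.data.block.format' = 'parquet',\n"
              + "  'hoodie.enable.fast.sort.write' = 'true',\n"
              + "  'write.operation' = 'upsert',\n"
              + "  'write.batch.size' = '256',\n"
              + "  'write.tasks' = '8',\n"
              + "  'compaction.async.enabled' = 'false',\n"
              + "  'clean.async.enabled' = 'false',\n"
              + "  'hoodie.archive.automatic' = 'false',\n"
              + "  'hoodie.clean.automatic' = 'false'\n"
              + ")");
        }
    }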

Risk level (write none, low, medium or high below)

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Jan 29, 2025
@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-runs the last Azure build

@danny0405 (Contributor) commented:

@cshuo is working on an RFC to introduce basic infrastructure to Hudi, such as schemas/data types/expressions. One of the goals is to integrate with the engine-native "row" for both the reader and the writer. You are welcome to contribute to it; I will cc you once the RFC comes out.

@Alowator (Contributor, Author) commented Jan 30, 2025

> @cshuo is working on an RFC to introduce basic infrastructure to Hudi, such as schemas/data types/expressions. One of the goals is to integrate with the engine-native "row" for both the reader and the writer. You are welcome to contribute to it; I will cc you once the RFC comes out.

So do you think it's better to do this as part of another RFC? I think creating an engine-native row for Flink is too big a task to make it part of another RFC, so it's better to clearly separate this work.

@danny0405 (Contributor) commented:

> > @cshuo is working on an RFC to introduce basic infrastructure to Hudi, such as schemas/data types/expressions. One of the goals is to integrate with the engine-native "row" for both the reader and the writer. You are welcome to contribute to it; I will cc you once the RFC comes out.
>
> So do you think it's better to do this as part of another RFC? I think creating an engine-native row for Flink is too big a task to make it part of another RFC, so it's better to clearly separate this work.

That's my thought. @cshuo's RFC is huge, and it would be great if you and @geserdugarov could get involved.

@Alowator (Contributor, Author) commented:

> > > @cshuo is working on an RFC to introduce basic infrastructure to Hudi, such as schemas/data types/expressions. One of the goals is to integrate with the engine-native "row" for both the reader and the writer. You are welcome to contribute to it; I will cc you once the RFC comes out.
> >
> > So do you think it's better to do this as part of another RFC? I think creating an engine-native row for Flink is too big a task to make it part of another RFC, so it's better to clearly separate this work.
>
> That's my thought. @cshuo's RFC is huge, and it would be great if you and @geserdugarov could get involved.

OK, I will look forward to @cshuo's RFC. This work requires scrutiny with regard to Flink write performance, including the points I presented in this ticket.
