Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up uuid UDF (40x faster) #14675

Merged
merged 4 commits into from
Feb 15, 2025
Merged

Speed up uuid UDF (40x faster) #14675

merged 4 commits into from
Feb 15, 2025

Conversation

simonvandel
Copy link
Contributor

@simonvandel simonvandel commented Feb 14, 2025

Which issue does this PR close?

N/A

Rationale for this change

It seems to be faster to generate random u128's in bulk, and then converting them to Uuids.

However, we are not exactly using the same random number generator. Uuid::new_v4() is using getrandom. In this PR, we are using rand.

$ cargo bench -p datafusion-functions --bench uuid
uuid                    time:   [19.516 µs 19.541 µs 19.565 µs]
                        change: [-97.652% -97.648% -97.643%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Yes, existing tests. And added a new test for UUIDv4 format.

Are there any user-facing changes?

Yes, faster uuid() UDF.

@simonvandel
Copy link
Contributor Author

Oops, need to generate valid uuidv4

let mut randoms = vec![0u128; num_rows];
rng.fill(&mut randoms[..]);

let values = randoms.into_iter().map(|x| Uuid::from_u128(x).to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simonvandel
I'm just curious how this is faster as under the hood it uses the same approach

    pub fn new_v4() -> Uuid {
        // This is an optimized method for generating random UUIDs that just masks
        // out the bits for the version and variant and sets them both together
        Uuid::from_u128(
            crate::rng::u128() & 0xFFFFFFFFFFFF4FFFBFFFFFFFFFFFFFFF | 0x40008000000000000000,
        )
    }

probably repeat_with().take() is slow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried calling Uuid::new_v4() instead of Uuid::from_u128 with thread_rng. The performance was the same as before this PR.

I think the performance comes from not having to repeatedly call the rng for bytes. Instead, num_rows*36 bytes can be rng'ed at once.

@comphead
Copy link
Contributor

comphead commented Feb 15, 2025

@simonvandel I'd like to ask you to create a slt test for UUID(), I know it is non guaranteed output, but we can check the v4 validity format I suppose.

UPD: something like

> SELECT REGEXP_LIKE(uuid(), 
       '^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$') 
       AS is_valid;
+----------+
| is_valid |
+----------+
| true     |
+----------+

let mut randoms = vec![0u128; num_rows];
rng.fill(&mut randoms[..]);

let values = randoms.into_iter().map(|x| Uuid::from_u128(x).to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to avoid a to_string here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in f4f87c8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doubled performance :)

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 15, 2025
@simonvandel simonvandel changed the title Speed up uuid UDF (20x faster) Speed up uuid UDF (40x faster) Feb 15, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @simonvandel and @Dandandan and @comphead

I think this optimization makes sense

I also tested it out locally and it seems to work well 👍 -- thanks also for the additional test @simonvandel (and the idea @comphead )

DataFusion CLI v45.0.0
> select uuid() from generate_series(1, 10);
+--------------------------------------+
| uuid()                               |
+--------------------------------------+
| 40e9df43-5fb8-4159-8ce9-1ce8313983b2 |
| 1a386033-ff17-4c6d-af0a-e4a09bb7dfc9 |
| cf410b99-6fb4-40ba-ad15-caf2a832a966 |
| 7c38582c-e7d2-4881-8ace-a1acc94734b3 |
| 4d708c96-1cfd-4b28-8f76-1f72762575cf |
| dce47ffd-6522-49d0-bf76-9bb9127ef948 |
| a61dbb63-b849-49a4-99eb-eb6e614beec8 |
| 9177acdb-3cb6-4ed4-a9b8-02f6070e26c5 |
| 4f904c21-f463-40b9-b38d-3c20db7aef8a |
| 3caec5d3-cf81-4888-bc66-92487b8bb539 |
+--------------------------------------+
10 row(s) fetched.
Elapsed 0.026 seconds.

> select uuid() from generate_series(1, 10);
+--------------------------------------+
| uuid()                               |
+--------------------------------------+
| 5252b5a1-cc7b-4f2a-b7fc-8f02274ef12e |
| 6c752c96-df99-420d-a1c2-73e2f21e4383 |
| a7b8c7bf-0a15-4522-bb6a-67f2a22fb73a |
| c284478e-cf24-4d14-9427-246c144fc481 |
| d13bcef0-d008-480b-8d16-e02b10ceeaca |
| 6667939e-900f-4e56-a3a9-8dbb6259b633 |
| d21f6b5c-4579-490a-acdb-06f448fc73f3 |
| 261e6621-aaac-4e86-81c3-a5c2fabf96f8 |
| 32a6c5b4-16c5-4e78-a613-a55cec48a391 |
| 0448bbf5-39f6-4cef-9dbc-1d6a7dd2dd08 |
+--------------------------------------+
10 row(s) fetched.
Elapsed 0.007 seconds.

Looks random to me 🤷

@@ -87,9 +88,25 @@ impl ScalarUDFImpl for UuidFunc {
if !args.is_empty() {
return internal_err!("{} function does not accept arguments", self.name());
}
let values = std::iter::repeat_with(|| Uuid::new_v4().to_string()).take(num_rows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This old code will allocate a new string, create uuid, then copy that string into the output.

Removing that allocation alone is likely to make a big difference


let mut buffer = [0u8; 36];
for x in &mut randoms {
// From Uuid::new_v4(): Mask out the version and variant bits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked this is the case:
https://docs.rs/uuid/latest/src/uuid/v4.rs.html#33-39

@@ -720,6 +720,14 @@ select count(distinct u) from uuid_table;
----
2

# must be valid uuidv4 format
query B
SELECT REGEXP_LIKE(uuid(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb added the performance Make DataFusion faster label Feb 15, 2025
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @simonvandel

@comphead comphead merged commit 40bb75f into apache:main Feb 15, 2025
29 checks passed
@comphead
Copy link
Contributor

Thanks everyone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions performance Make DataFusion faster sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants