XLA flags: No speed ups on GPUs and segmentation fault #17103

Open
AakashKumarNain opened this issue Sep 12, 2024 · 5 comments

@AakashKumarNain

I am developing some code in Equinox and JAX and running it on A100 GPUs. According to the JAX GPU performance tips, enabling certain XLA flags should provide better performance.

On my end, I don't see any difference in performance after enabling these flags. Also, some flags, such as --xla_gpu_enable_triton_softmax_fusion=true, result in a segmentation fault. Please see the detailed description in the related issue.
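For readers following along, here is a minimal sketch of one way to enable such a flag from Python. It assumes the flags are passed through the `XLA_FLAGS` environment variable and that the variable is set before JAX initializes its backend (most safely, before the first `import jax`). The helper names `merge_xla_flags` and `apply_xla_flags` are hypothetical and not part of JAX or XLA; flag values are assumed to be strings.

```python
import os


def merge_xla_flags(new_flags, existing=""):
    """Merge new_flags (dict of flag name -> string value) into an XLA_FLAGS string.

    Flags already present in `existing` are overridden, not duplicated.
    """
    merged = {}
    for token in existing.split():
        name, _, value = token.partition("=")
        merged[name] = value
    merged.update(new_flags)
    # Flags without a value are emitted bare; the rest as name=value.
    return " ".join(f"{k}={v}" if v else k for k, v in merged.items())


def apply_xla_flags(new_flags):
    """Write the merged flags into the environment (hypothetical helper).

    Call this before the first `import jax`, so the flags are in place
    when XLA parses them.
    """
    os.environ["XLA_FLAGS"] = merge_xla_flags(
        new_flags, os.environ.get("XLA_FLAGS", "")
    )
    return os.environ["XLA_FLAGS"]
```

Typical use would be `apply_xla_flags({"--xla_gpu_enable_triton_softmax_fusion": "true"})` at the very top of the script, followed by `import jax`. Whether a given flag is still accepted depends on the installed XLA version.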

@cheshire
Contributor

Performance flags are not considered to be stable/supported/safe API to use.

@hawkinsp
Member

@cheshire I don't think that's a realistic description of the current state. The current state is that because we haven't succeeded at landing many useful flags as defaults, lots of users can and do have to override XLA flags.

While we may not expect performance improvements from doing that, I think the segfault at least is a real bug and shouldn't be summarily closed.

@bchetioui
Member

@AakashKumarNain thanks for pointing this out to me. Do you have a reproducer for your segmentation fault?

I have not been able to reproduce the crash with the code provided in jax-ml/jax#22705 (comment), at least.

If there is no way to re-trigger the crash, then I'm tempted to close this bug as "WAI" for the performance issue for now. (If this worked out of the box every time, it wouldn't be a flag :))
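One way to package a reproducer for a crash like this is to run the candidate script in a child process with the flag set and check whether it died from SIGSEGV. The sketch below is an assumption about how that could be done, not the procedure used in this thread. It relies on POSIX behavior, where a negative `returncode` means the child was killed by a signal; `run_with_xla_flags` is a hypothetical helper name.

```python
import os
import signal
import subprocess
import sys


def run_with_xla_flags(python_args, xla_flags, timeout=600):
    """Run `python <python_args>` in a child process with XLA_FLAGS set.

    Returns (returncode, segfaulted, stderr). On POSIX, a child killed by
    a signal reports the negated signal number as its return code.
    """
    env = dict(os.environ, XLA_FLAGS=xla_flags)
    proc = subprocess.run(
        # -X faulthandler makes the child dump a Python traceback on SIGSEGV.
        [sys.executable, "-X", "faulthandler", *python_args],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    segfaulted = proc.returncode == -signal.SIGSEGV
    return proc.returncode, segfaulted, proc.stderr
```

Running the same script twice, once with `xla_flags=""` and once with `"--xla_gpu_enable_triton_softmax_fusion=true"`, and attaching both stderr outputs gives maintainers a self-contained comparison.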

@AakashKumarNain
Author

> If there is no way to re-trigger the crash, then I'm tempted to close this bug as "WAI" for the performance issue for now. (If this worked out of the box every time, it wouldn't be a flag :))

I will try to reproduce it on my side and post an update.

@bchetioui self-assigned this on Nov 13, 2024

@AakashKumarNain
Author

@bchetioui I tested this with the latest version, and it no longer causes a segfault. However, it didn't improve performance either.
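To make a "no speedup" observation easier to verify, a timing comparison along the following lines may help. This is a generic sketch, not the benchmark used in this thread: `time_call` is a hypothetical helper that discards warmup runs (which, for a jitted function, include compilation) and reports the median of the remaining runs. Because JAX dispatches work asynchronously, the `sync` argument is assumed to be something like `jax.block_until_ready` when timing JAX code.

```python
import statistics
import time


def time_call(fn, *args, sync=lambda x: x, warmup=2, iters=10):
    """Return the median wall-clock time in seconds of fn(*args).

    `sync` is applied to each result before the clock stops. For JAX,
    pass sync=jax.block_until_ready so that asynchronously dispatched
    work is finished before the measurement ends.
    """
    # Warmup runs are not timed; for jitted functions they absorb compilation.
    for _ in range(warmup):
        sync(fn(*args))
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        sync(fn(*args))
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Running the same measurement in two separate processes, one with the flag and one without, avoids any state carried over from an earlier backend initialization.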
