[ROCm EP] Fix transpose helper for gfx gridsize constraints (#23527)
Remove the inline default transpose helper check and ensure the proper check
is used via CanUse_hipblasTransposeHelper_MLFloat16.

Related to a change in the ROCm ONNX Runtime repo:
ROCm#82

### Description

Required to correctly limit the grid size of the transpose helper kernel.

### Motivation and Context
The build was picking up the inline default definition, which this commit
removes, instead of the overloaded version with the proper checks.
The inline default always returned true, which is incorrect for newer
AMD cards/targets.
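
For illustration only, here is a minimal sketch of what a grid-size check of this kind might look like. It is not the code from this repository: the tile width `kTileDim`, the `GridLimits` struct, and the function name `CanUseTransposeHelperSketch` are all hypothetical, and real code would presumably query the limits from the device rather than take them as arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tile width; the tiling used by the real kernel is not shown
// in this commit.
constexpr int kTileDim = 32;

// Hypothetical container for the per-dimension grid limits of the target.
struct GridLimits {
  std::int64_t max_x;
  std::int64_t max_y;
};

// Ceiling division for non-negative a and positive b.
constexpr std::int64_t CeilDiv(std::int64_t a, std::int64_t b) {
  return (a + b - 1) / b;
}

// Returns true when a tiled transpose of an m x n matrix needs no more
// blocks per grid dimension than the given limits allow.
inline bool CanUseTransposeHelperSketch(int m, int n, GridLimits limits) {
  assert(m >= 0 && n >= 0);
  const std::int64_t grid_x = CeilDiv(n, kTileDim);  // blocks across columns
  const std::int64_t grid_y = CeilDiv(m, kTileDim);  // blocks across rows
  return grid_x <= limits.max_x && grid_y <= limits.max_y;
}
```

A caller would evaluate such a check before dispatching to the half-precision transpose helper and fall back to another path when it returns false. An unconditional `return true` skips that decision entirely, which is the behavior this commit removes.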

Co-authored-by: Ted Themistokleous <[email protected]>
TedThemistokleous and Ted Themistokleous authored Jan 29, 2025
1 parent 80bc1d2 commit e3e4173
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h
@@ -501,7 +501,6 @@ inline hipblasStatus_t hipblasTransposeHelper(hipStream_t /*stream*/, hipblasHan
   return hipblasDgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc);
 }
 
-inline bool CanUse_hipblasTransposeHelper_MLFloat16(int /*m*/, int /*n*/) { return true; } // CUDA has a limited grid size of 65536, ROCm has higher limits.
 hipblasStatus_t hipblasTransposeHelper(hipStream_t stream, hipblasHandle_t, hipblasOperation_t, hipblasOperation_t, int m, int n, const half*, const half* A, int, const half*, const half*, int, half* C, int);
 
 // copy
