[ROCm EP] Fix transpose helper for gfx gridsize constraints (#23527)

Remove the inline default transpose helper check and ensure the proper check is used via CanUse_hipblasTransposeHelper_MLFloat16.

Related to change in ROCm Onnxruntime repo: ROCm#82

### Description

Required to correctly limit the grid size of the transpose helper kernel.

### Motivation and Context

The build was defaulting to the inline definition (removed by this change) instead of using the version with the proper checks. The inline default "true" case is removed because it is incorrect for newer AMD cards/targets.

Co-authored-by: Ted Themistokleous <[email protected]>
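To illustrate why an unconditional `return true` is unsafe, here is a minimal host-side sketch of a grid-size check for a tiled transpose. It is not the code from fpgeneric.h or its ROCm implementation. The names `kTileDim`, `GridLimits`, `TilesFor` and `CanUseTiledTranspose` are hypothetical, the tile width of 32 is an assumption, and the limits are passed in by the caller rather than queried from a device.

```cpp
#include <cstdint>

// Hypothetical tile width; the real kernel's tiling is not shown in this diff.
constexpr int kTileDim = 32;

// Hypothetical per-dimension launch-grid limits. In practice these would be
// queried from the device; here the caller supplies them.
struct GridLimits {
  std::int64_t max_x;
  std::int64_t max_y;
};

// Number of tiles needed to cover `extent` elements (ceiling division).
constexpr std::int64_t TilesFor(int extent) {
  return (static_cast<std::int64_t>(extent) + kTileDim - 1) / kTileDim;
}

// Returns true only when an m x n tiled transpose fits inside the grid limits.
// An unconditional "return true" would skip this comparison entirely.
inline bool CanUseTiledTranspose(int m, int n, const GridLimits& limits) {
  if (m <= 0 || n <= 0) return false;
  return TilesFor(n) <= limits.max_x && TilesFor(m) <= limits.max_y;
}
```

With assumed limits of 65535 per dimension, a 1024 x 1024 matrix passes the check, while a matrix with 4,000,000 rows needs 125,000 tiles along one dimension and is rejected, so the caller would have to fall back to another transpose path.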
microsoft · Jan 29, 2025 · e3e4173
1 parent 80bc1d2
Showing 1 changed file with 0 additions and 1 deletion.
diff --git a/onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h b/onnxruntime/core/providers/rocm/shared_inc/fpgeneric.h
@@ -501,7 +501,6 @@ inline hipblasStatus_t hipblasTransposeHelper(hipStream_t /*stream*/, hipblasHan
   return hipblasDgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc);
 }
 
-inline bool CanUse_hipblasTransposeHelper_MLFloat16(int /*m*/, int /*n*/) { return true; }  // CUDA has a limited grid size of 65536, ROCm has higher limits.
 hipblasStatus_t hipblasTransposeHelper(hipStream_t stream, hipblasHandle_t, hipblasOperation_t, hipblasOperation_t, int m, int n, const half*, const half* A, int, const half*, const half*, int, half* C, int);
 
 // copy