Torch Profiler Shows Zero Tensor Core Utilization for torch.nn.Conv3d, While Nsight Compute Confirms Usage #1041

BurkeHulk · 2025-02-14T06:39:54Z

Description

I profiled torch.nn.Conv3d using both PyTorch's built-in profiler and Nsight Compute. When viewing the results in TensorBoard, the PyTorch profiler reports zero Tensor Core utilization. However, Nsight Compute indicates that Tensor Cores are actually being used.

Upon investigating the codebase, I found that the Tensor Core allowlist (TC_Allowlist) in [tb_plugin/torch_tb_profiler/profiler/tensor_core.py](https://github.com/pytorch/kineto/blob/main/tb_plugin/torch_tb_profiler/profiler/tensor_core.py) appears to be outdated.

The kernel used in Conv3d is:

sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn
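One way to find this name without Nsight Compute is to read the trace that PyTorch's profiler exports. The sketch below lists unique GPU kernel names from an uncompressed Chrome-format trace file. It assumes that kernel events have a `"cat"` field equal to `"kernel"`, compared case-insensitively. The helper names `load_trace` and `gpu_kernel_names` are hypothetical and are not part of the profiler or the plugin.

```python
import json
from typing import Any, Dict, List


def load_trace(path: str) -> Dict[str, Any]:
    # Assumes an uncompressed JSON trace (not a .json.gz file).
    with open(path) as f:
        return json.load(f)


def gpu_kernel_names(trace: Dict[str, Any]) -> List[str]:
    """Return unique GPU kernel names in first-seen order.

    Assumption: kernel events are tagged with cat == "kernel"
    (case-insensitive). All other events, such as CPU ops, are skipped.
    """
    names: List[str] = []
    for event in trace.get("traceEvents", []):
        if str(event.get("cat", "")).lower() == "kernel":
            name = event.get("name", "")
            if name and name not in names:
                names.append(name)
    return names
```

Printing `gpu_kernel_names(load_trace(path))` for a Conv3d run should surface the cuDNN kernel quoted above, assuming the trace follows this layout.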

However, xmma_fprop_implicit_gemm is not included in the allowlist, which might explain the discrepancy.
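The sketch below shows why the reported kernel can be missed. It assumes that the plugin counts a kernel as a Tensor Core kernel when any allowlisted pattern appears as a substring of the kernel name. `ASSUMED_ALLOWLIST` is an illustrative subset and is not copied from `tensor_core.py`, so the real list may differ. A pattern such as `xmma_implicit_gemm` does not match `xmma_fprop_implicit_gemm`, because `fprop_` sits between the two parts.

```python
# Illustrative subset of substring patterns (an assumption; the real
# TC_Allowlist in tensor_core.py may contain different entries).
ASSUMED_ALLOWLIST = [
    "h884", "s884", "h1688", "s1688", "hmma", "16816",
    "xmma_implicit_gemm", "xmma_gemm",
]

CONV3D_KERNEL = (
    "sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
    "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn"
)


def uses_tensor_cores(kernel_name, allowlist=ASSUMED_ALLOWLIST):
    # Tensor Core kernel if any pattern occurs anywhere in the name.
    return any(pattern in kernel_name for pattern in allowlist)


print(uses_tensor_cores(CONV3D_KERNEL))  # False
```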

Expected Behavior

The PyTorch profiler's TensorBoard plugin should correctly report Tensor Core utilization when kernels that use Tensor Cores are executed.

Suggested Fix

The allowlist should be updated to include xmma_fprop_implicit_gemm and other relevant kernels.
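Here is a minimal sketch of the proposed change. It uses the same substring-matching assumption and the same illustrative `ASSUMED_ALLOWLIST` as above. Only the pattern named in this issue is added. The "other relevant kernels" are left unspecified because the issue does not list them.

```python
# Illustrative subset (an assumption, not the plugin's actual list).
ASSUMED_ALLOWLIST = [
    "h884", "s884", "h1688", "s1688", "hmma", "16816",
    "xmma_implicit_gemm", "xmma_gemm",
]
# Proposed addition from this issue.
PATCHED_ALLOWLIST = ASSUMED_ALLOWLIST + ["xmma_fprop_implicit_gemm"]

CONV3D_KERNEL = (
    "sm90_xmma_fprop_implicit_gemm_bf16bf16_bf16f32_f32_nhwckrsc_nhwc_"
    "tilesize128x128x64_warpgroupsize1x1x1_g1_execute_segment_k_off_kernel__5x_cudnn"
)


def uses_tensor_cores(kernel_name, allowlist):
    # Substring match, as assumed for the plugin's classifier.
    return any(pattern in kernel_name for pattern in allowlist)


print(uses_tensor_cores(CONV3D_KERNEL, ASSUMED_ALLOWLIST))  # False
print(uses_tensor_cores(CONV3D_KERNEL, PATCHED_ALLOWLIST))  # True
```

Adding a narrow pattern keeps non-Tensor Core kernels, such as elementwise kernels, from being misclassified. The trade-off is that each new cuDNN naming scheme needs its own entry.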

Environment

PyTorch Version: 2.6.0+cu124
CUDA Version: 12.4
GPU: NVIDIA H200
Profiling Tools: PyTorch Profiler, Nsight Compute 2024.1.1.0 (build 33998838)
torch-tb-profiler: 0.4.3


davidberard98 added the "plugin" label (PyTorch Profiler TensorBoard Plugin related) on Feb 15, 2025.
