HiFiGAN Finetune "Cannot re-initialize CUDA in forked subprocess." #12178

Open

Fournogo opened this issue Feb 13, 2025 · 1 comment

Labels
bug Something isn't working

Comments

Describe the bug

When attempting to run the hifigan_finetune.py script from the provided TTS examples (examples/tts), the following error is raised:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
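For reference, the same class of failure can be reproduced outside NeMo with a minimal stand-alone sketch (toy code, not NeMo's): the parent process initializes CUDA, and a child created with the default 'fork' start method then tries to touch CUDA as well.

import torch
import torch.multiprocessing as mp

def use_cuda_in_child(_):
    # In a forked child this raises:
    # RuntimeError: Cannot re-initialize CUDA in forked subprocess.
    torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.zeros(1, device="cuda")   # parent initializes CUDA
    ctx = mp.get_context("fork")    # the default start method on Linux
    with ctx.Pool(1) as pool:
        pool.map(use_cuda_in_child, [0])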

I've tried the standard approach for fixing this issue, which is adding the following at the top of the hifigan_finetune.py script:
import torch.multiprocessing as mp
mp.set_start_method('spawn', force=True)

This just leads to a further cascade of errors. I'm not sure whether I'm doing something grossly wrong or whether HiFiGAN finetuning is grossly bugged. I've tried this in NeMo Docker containers of various versions and in conda environments of various configurations, all of which lead to the same issue.

Steps/Code to reproduce bug

  1. Clone NeMo repository.
  2. Add proper manifest.json files to examples/tts (for simplicity; see the manifest sketch after this list).
  3. Update the hifigan.yaml config to point to those manifests, and update hifigan_finetune.py to use the "hifigan" config (instead of hifigan_44100).
  4. Run hifigan_finetune.py
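
For completeness, here is roughly what I mean by "proper manifest.json files" in step 2: a JSON-lines file per split, one record per utterance. The field names below are an assumption on my part; adjust them to whatever the dataset section of hifigan.yaml actually expects.

import json

# Hypothetical records - paths, durations and text are placeholders.
records = [
    {"audio_filepath": "/data/wavs/utt_0001.wav", "duration": 3.2, "text": "hello world"},
    {"audio_filepath": "/data/wavs/utt_0002.wav", "duration": 2.7, "text": "goodbye world"},
]

with open("examples/tts/train_manifest.json", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")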

Expected behavior

I expect training to begin. Instead, the model is downloaded and sanity checking begins, followed shortly by the above error.

Environment overview (please complete the following information)

Environment location: Docker

I've used the following docker pull commands, ALL of which have reproduced the issue:
docker pull nvcr.io/nvidia/nemo:23.03
docker pull nvcr.io/nvidia/nemo:23.06
docker pull nvcr.io/nvidia/nemo:24.05
docker pull nvcr.io/nvidia/nemo:24.12.01

The issue does NOT seem to be present in the following version:
docker pull nvcr.io/nvidia/nemo:22.09

Environment details

If an NVIDIA Docker image is used, these don't need to be specified.

Additional context

I've tried this on two machines: a Windows 11 PC running WSL with an RTX 2070, and a Linux server running Ubuntu 22.04 with an RTX 2000 Ada. Both hit the same issue. The Windows machine had a conda environment that ran FastPitch training with no problem. The Linux machine also ran FastPitch in its conda environment, but I switched to the Docker containers to make sure this issue was actually pervasive and not just due to my install process.

Fournogo added the bug label on Feb 13, 2025
Fournogo (Author) commented Feb 13, 2025

I take that back - the issue is present in 22.09 as well. I was able to move past it by setting num_workers=0. Is this a bug, or am I mistaken about some part of this process?
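
For anyone hitting the same thing, a toy illustration (not NeMo code, and I'm only assuming the data pipeline is what touches CUDA) of why num_workers=0 sidesteps the error: with no worker processes the DataLoader runs in the main process, so CUDA is never re-initialized in a forked child.

import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    # Hypothetical dataset whose __getitem__ touches CUDA, standing in
    # for whatever does so during NeMo's sanity-check loop.
    def __len__(self):
        return 4
    def __getitem__(self, i):
        return torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.zeros(1, device="cuda")  # main process initializes CUDA first
    # num_workers=0: no fork, so this iterates cleanly.
    # num_workers>0 with the default 'fork' start method would raise
    # "Cannot re-initialize CUDA in forked subprocess." in the workers.
    for batch in DataLoader(ToyDataset(), batch_size=2, num_workers=0):
        pass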
