Transcoding hangs under load #158
Comments
The same thing also happened on the Genesis box with one 1080 Ti while trying to transcode 50 streams of |
Thanks for the report, couple notes:
|
In the provided link I can't find any problems reported, can you point me to what they're reporting?
Will do.
For me it hung with eight GPUs, two streams per GPU, and four renditions per stream (P144p30fps16x9;P360p30fps16x9;P576p30fps16x9;P720p30fps16x9). |
@darkdarkdragon See the note on "Exceeding VRAM":
So just 8 encodes per GPU? Curious, is there a pattern to which streams complete and which remain incomplete? Does it make a difference if the streams themselves are staggered a bit, e.g. each started ~500 ms to 1 s after the previous one? |
I thought it was something specific to their Plex client. And they are talking about new transcodes; in our case the transcode hangs halfway through.
How can I distinguish between streams? |
I usually have a file naming convention. Something like:
Ah, interesting. Is this consistently reproducible? Does it stop working at the same place each time? Do more streams make the issue worse? Anything in the logs? |
I'm using random names. I just took a piece of code from our
It hangs in probably 90% of cases. Once, the same test with 10 streams passed and hung only on the next iteration, but most of the time it hangs. |
example results from one run:
|
@j0sh can I enable logging inside the ffmpeg lib we're using? |
For the benchmarking, wouldn't it be useful to have a more consistent naming scheme to help diagnose issues like these?
Either:
|
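For illustration only, a naming helper for the benchmark could look like the sketch below. The layout (GPU index, stream index, rendition, `.ts` extension) and the function name `outputName` are assumptions made for this example, not the scheme proposed in the thread.

```go
package main

import "fmt"

// outputName builds an output file name that encodes the GPU index,
// the stream index, and the rendition, so that an incomplete file can be
// traced back to the stream that produced it. The layout is hypothetical.
func outputName(gpu, stream int, rendition string) string {
	return fmt.Sprintf("gpu%02d_stream%02d_%s.ts", gpu, stream, rendition)
}

func main() {
	fmt.Println(outputName(3, 1, "P720p30fps16x9"))
}
```

With names like these, sorting the output directory groups files by GPU and stream, which makes any pattern in the incomplete files easier to spot.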
@j0sh and it is not a VRAM-exceeded problem. I've just looked at the VRAM consumption: it was using around 7M of memory on a card with 16M. |
Just 7MB? Is that right? Also, how is the measuring done? From the Plex issue, it seems to have more to do with throughput than with the amount of memory in use. Anyway, agreed that it's hard to really say if it's the same issue we're having, but it seemed like an interesting and somewhat similar data point. |
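On the question of how the measuring is done: one way to get numbers that are easy to compare is to query nvidia-smi in CSV form (`nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`, which prints one "used, total" line per GPU in MiB) and parse the result. The sketch below only does the parsing; `gpuMem` and `parseGPUMem` are hypothetical names, and the sample string in `main` is made-up data, not a measurement from this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuMem holds used and total memory for one GPU, in MiB.
type gpuMem struct {
	UsedMiB  int
	TotalMiB int
}

// parseGPUMem parses the output of
//   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
// which is expected to contain one "used, total" line per GPU.
func parseGPUMem(out string) ([]gpuMem, error) {
	var res []gpuMem
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		used, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, fmt.Errorf("bad used value in %q: %v", line, err)
		}
		total, err := strconv.Atoi(strings.TrimSpace(fields[1]))
		if err != nil {
			return nil, fmt.Errorf("bad total value in %q: %v", line, err)
		}
		res = append(res, gpuMem{UsedMiB: used, TotalMiB: total})
	}
	return res, nil
}

func main() {
	// Made-up sample output; in practice this string would come from
	// running the nvidia-smi command above.
	sample := "7000, 16280\n120, 16280\n"
	mems, err := parseGPUMem(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for i, m := range mems {
		fmt.Printf("GPU %d: %d / %d MiB used\n", i, m.UsedMiB, m.TotalMiB)
	}
}
```

Sampling this periodically during a run would show whether memory use climbs toward the limit before the hang or stays flat.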
Just
|
Traced it down to this line: It enters
For the next thing I'll try Nvidia's beta driver. |
Possibly related: cuda-memcheck is indicating issues within If the beta driver doesn't help, it would be interesting to see if the hang is also reproducible with ffmpeg itself.
This also occurs with ffmpeg:
|
While doing throughput testing, I hit a situation where transcoding hangs (reproduced three times).
It was transcoding one 10-minute generated video into seven 720 renditions in ten streams simultaneously. Looking at the output files, most were transcoded completely, but some were not.
nvidia-smi shows that the Livepeer process still uses the GPU, but the output files (the ones not completed) are not growing in size.
There is a stopped GCE instance on which I saw this; I've left it intact.
To reproduce:
Ivan-gpu-p100 instance
/home/dark/go-livepeer
bench.sh
/disk-1-temp directory: there will be 70 output files, most complete (same size) and some smaller, and the smaller files are not growing in size.
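The check described above (finding the smaller files that have stopped growing) could be automated with a sketch like the following. It assumes, as in the report, that all complete outputs end up the same size, so a file counts as stalled when its size is unchanged between two snapshots and it is smaller than the largest file. The names `fileSizes` and `stalled` and the 5-second interval are choices made for this example.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"time"
)

// fileSizes returns the size of every regular file in dir, keyed by name.
func fileSizes(dir string) (map[string]int64, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	sizes := make(map[string]int64)
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return nil, err
		}
		sizes[e.Name()] = info.Size()
	}
	return sizes, nil
}

// stalled returns, in sorted order, the files whose size did not change
// between the two snapshots and that are smaller than the largest file
// in the second snapshot.
func stalled(before, after map[string]int64) []string {
	var max int64
	for _, s := range after {
		if s > max {
			max = s
		}
	}
	var out []string
	for name, s := range after {
		if prev, ok := before[name]; ok && prev == s && s < max {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	dir := "/disk-1-temp" // output directory named in the report
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	before, err := fileSizes(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "first snapshot failed:", err)
		return
	}
	time.Sleep(5 * time.Second) // arbitrary interval between snapshots
	after, err := fileSizes(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "second snapshot failed:", err)
		return
	}
	for _, name := range stalled(before, after) {
		fmt.Println("not growing:", name)
	}
}
```

Files that are still being written will show a size change between snapshots and are not reported, so this is best run once the GPU appears idle or stuck.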