Simplify ffdec.py by using Popen.communicate() method #113

Bomme · 2021-11-30T10:35:58Z

Hi, thanks for this very useful library!
I was looking into ffdec.py since I need faster loading of mp3 and m4a files.
I believe that the module could be improved and simplified by using the Popen.communicate() method. This seems to be the recommended way of retrieving output from a subprocess.
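
For reference, here is a minimal sketch of what the communicate() approach could look like. It is not ffdec's code: the helper name `read_all_output` is made up for illustration, and a small Python child process stands in for ffmpeg so the snippet runs anywhere. Note that it buffers the whole output in memory before returning.

```python
import subprocess
import sys


def read_all_output(cmd, timeout=None):
    """Run cmd and return (stdout_bytes, stderr_bytes) after it exits.

    Hypothetical helper for illustration only, not part of ffdec.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes to EOF and waits for the process.
        return proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the child and reap it before re-raising to the caller.
        proc.kill()
        proc.communicate()
        raise


# Stand-in for an ffmpeg command: a child that writes 1 MiB to stdout.
cmd = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'x' * (1 << 20))"]
data, errors = read_all_output(cmd, timeout=30)
```

The `timeout` argument to communicate() is only available in Python 3, which is why dropping Python 2 would simplify the timeout handling.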

The current implementation only allows reading the data in blocks, which is suboptimal since a user might not be able to adapt the block size (e.g. librosa just calls audio_open(), which has no way of setting a block size).

I did a speed comparison that shows that this way of reading data is slower than it needs to be, especially for large files:
https://gist.github.com/Bomme/d9aee452c8c1e68fb5fac743df6b2a07

If you decide to drop Python 2 support (#112), the timeout handling might become easier.
And for later versions of Python 3, the helpers described at https://docs.python.org/3/library/subprocess.html#windows-popen-helpers might come in handy.


sampsyo · 2021-11-30T11:44:12Z

Hi! It could be worth looking into! But there are a few reasons why ffdec works this way now:

1. Streaming. The interface lets clients process the audio while reading it in, rather than loading the entire file into memory first. This is important for handling large files without allocating too much memory.
2. As a kind of corollary, we need to read simultaneously from the stdout and stderr streams. Otherwise, the process could fill up its stderr buffer while we're waiting for data from the stdout buffer, and the interaction would deadlock.

I believe Popen.communicate() only works in a synchronous/blocking style that reads everything until EOF. So I'm not sure we can use that, but maybe there's something else useful in modern subprocess we could rely on?
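
To make the two constraints above concrete, here is a rough sketch of streaming stdout in blocks while a second thread drains stderr. It only illustrates the general pattern and is not ffdec's actual implementation: `pump` and `stream_stdout` are hypothetical names, the 4096-byte block size is an arbitrary choice, and a Python child process stands in for ffmpeg.

```python
import queue
import subprocess
import sys
import threading


def pump(stream, sink, block_size):
    """Copy blocks from stream into sink until EOF, then add a None sentinel."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        sink.put(block)
    sink.put(None)


def stream_stdout(cmd, block_size=4096):
    """Yield stdout blocks as they arrive while stderr is drained in parallel."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_q, err_q = queue.Queue(), queue.Queue()  # unbounded queues
    workers = [
        threading.Thread(target=pump, args=(proc.stdout, out_q, block_size), daemon=True),
        threading.Thread(target=pump, args=(proc.stderr, err_q, block_size), daemon=True),
    ]
    for worker in workers:
        worker.start()
    try:
        while True:
            block = out_q.get()
            if block is None:  # sentinel: stdout reached EOF
                break
            yield block
    finally:
        # Both pipes are being drained by the workers, so wait() cannot stall
        # on a full pipe buffer.
        proc.wait()
        for worker in workers:
            worker.join()
        proc.stdout.close()
        proc.stderr.close()


# Stand-in child that writes more to stderr than a pipe buffer usually holds
# before it writes anything to stdout.
child = (
    "import sys;"
    "sys.stderr.buffer.write(b'e' * 200000);"
    "sys.stdout.buffer.write(b'o' * 200000)"
)
decoded = b"".join(stream_stdout([sys.executable, "-c", child]))
```

If only stdout were read here, the child would likely block while writing to stderr and never reach its stdout write, which is the deadlock described above.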

Bomme · 2021-12-01T10:21:38Z

Hi! Thanks for the explanation.
Since the queue is infinite, the entire file could get read in the separate thread before the client code gets to call read_data. That's why I assumed it would be a simplification to make this explicit, i.e. loading the file into memory.
Shouldn't the queue size be limited to support real streaming?
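
A small sketch of what a bounded queue would change, assuming a reader thread that pushes blocks into a `queue.Queue`. The names, the `maxsize=8` limit, and the 4096-byte block size are illustrative choices, and an in-memory `io.BytesIO` stands in for the ffmpeg stdout pipe.

```python
import io
import queue
import threading


def pump_bounded(stream, sink, block_size):
    """Read blocks from stream into sink; an empty block marks EOF."""
    while True:
        block = stream.read(block_size)
        sink.put(block)  # blocks while the queue is full (backpressure)
        if not block:
            return


source = io.BytesIO(b"a" * 100_000)  # stand-in for the decoder's stdout
blocks = queue.Queue(maxsize=8)      # at most 8 blocks buffered at a time
reader = threading.Thread(target=pump_bounded, args=(source, blocks, 4096), daemon=True)
reader.start()

received = bytearray()
peak = 0
while True:
    peak = max(peak, blocks.qsize())  # qsize() is approximate but never exceeds maxsize
    block = blocks.get()
    if not block:
        break
    received += block
reader.join()
```

With a real pipe, a full queue would stop the reader thread from reading, and the child process would then block on its own write, so memory use stays bounded. The sketch glosses over shutdown: if the consumer stops reading early, the reader thread stays blocked in put(), so a real implementation would need a way to unblock it.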

Bomme · 2021-12-01T10:33:22Z

Since my main goal was to speed up the file loading and you plan to release a new version soon, what do you think about raising the block_size? Maybe by setting it to io.DEFAULT_BUFFER_SIZE as a (still) conservative size.
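
As a rough illustration of why the block size matters, the snippet below counts how many read() calls it takes to consume a 10 MiB stream. The 4096-byte figure is just an example of a smaller block size, not necessarily the value ffdec used, and `count_reads` is a made-up helper. The value of io.DEFAULT_BUFFER_SIZE depends on the Python version, so no fixed number is assumed for it.

```python
import io


def count_reads(total_bytes, block_size):
    """Return how many non-empty read() calls are needed to drain the stream."""
    stream = io.BytesIO(bytes(total_bytes))
    reads = 0
    while stream.read(block_size):
        reads += 1
    return reads


total = 10 * 1024 * 1024
small_block_reads = count_reads(total, 4096)
default_block_reads = count_reads(total, io.DEFAULT_BUFFER_SIZE)
print(io.DEFAULT_BUFFER_SIZE, small_block_reads, default_block_reads)
```

Each read in the real decoder also involves a queue operation and a thread handoff, so fewer, larger blocks should reduce that per-block overhead.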

sampsyo · 2021-12-01T12:29:27Z

Ah, yeah, that's a good point. We probably should be limiting the queue size! It seems tricky to get exactly right, but probably worth a shot nonetheless.

Raising the block size seems like a great idea! Would you mind putting together a small PR for that?

Bomme mentioned this issue Dec 1, 2021

set default block size in FFmpegAudioFile to io.DEFAULT_BUFFER_SIZE #116

Merged

mightbecharles mentioned this issue Nov 15, 2022

ffdec hangs on wait #130

Closed
