In a typical voice sample, a significant portion of the recording contains no speech activity. Typical speech applications remove these low-activity regions, because otherwise we end up extracting the MFCCs of silence, which aren't very helpful in most situations.
I've modified your code slightly to allow for Voice Activity Detection (VAD). Its default behaviour is still intact, but anyone who wishes to implement a voice activity detector function has the template, documentation and a simple threshold to play with, as well as an example showing simple applications.
The code allows both the frames and the entire signal to be passed to the detector function, which should be flexible enough for anyone to write their own version depending on their purpose. I considered using the frame power provided as the first MFCC, but decided that this approach was more flexible overall and allowed comparison against the entire signal at once.
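To illustrate the idea of a detector that sees both the frames and the whole signal, here is a minimal sketch of a threshold-based VAD. It is not the code from this pull request: the names `mean_power`, `threshold_vad`, `drop_inactive` and the `ratio` parameter are hypothetical, and it uses plain Python lists rather than arrays to stay dependency-free. A frame is kept when its mean power exceeds a fraction of the mean power of the entire signal.

```python
def mean_power(samples):
    """Average squared amplitude of a sequence of samples (0.0 if empty)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def threshold_vad(frames, signal, ratio=0.5):
    """Hypothetical detector: return one boolean per frame.

    A frame counts as active when its mean power is greater than
    `ratio` times the mean power of the entire signal.
    """
    threshold = ratio * mean_power(signal)
    return [mean_power(frame) > threshold for frame in frames]


def drop_inactive(frames, mask):
    """Keep only the frames whose mask entry is True."""
    return [frame for frame, keep in zip(frames, mask) if keep]


# Toy signal: near-silence, a loud burst, then near-silence again.
signal = [0.0, 0.01, -0.01, 0.0,
          0.8, -0.7, 0.9, -0.6,
          0.0, 0.02, -0.01, 0.0]

# Non-overlapping frames of 4 samples each (real framing usually overlaps).
frames = [signal[i:i + 4] for i in range(0, len(signal), 4)]

mask = threshold_vad(frames, signal)
active = drop_inactive(frames, mask)
print(mask)    # [False, True, False]
print(active)  # [[0.8, -0.7, 0.9, -0.6]]
```

Because the detector receives the full signal as well as the frames, the threshold adapts to the overall level of each recording instead of relying on a fixed absolute value.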
This is a modification I made for my thesis in which I used your code to extract the MFCCs from a bunch of files, and I thought other people may find this handy too.