
[Feature Request] Support Speaker Diarization #1039

Open
uniqueness-ae opened this issue Oct 12, 2024 · 2 comments

Comments

@uniqueness-ae

Implement speaker diarization on top of the existing mlx-whisper support to:

  1. Enhance transcription accuracy in multi-speaker conversations
  2. Distinguish between different speakers in the output
  3. Improve overall usability of the transcription feature

This addition will provide more insightful and structured transcripts, making it easier to analyze and understand complex audio content. Thanks

@Hoohm

Hoohm commented Oct 20, 2024

Would love to see this as well.
I can help out with building the feature, but I need some pointers on how it could be done.

@uniqueness-ae
Author

I tried the pyannote.audio model on rented cloud GPUs and had some success. If there were a way to run it on MLX, it would probably be faster, and better still if it were coupled with whisper to simplify the process. There is a repo from m-bain called WhisperX that does this and could serve as a reference.
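For anyone who wants to prototype this, here is a minimal sketch of that pipeline: pyannote.audio for speaker turns, mlx-whisper for timestamped segments, and a simple overlap vote to label each segment. The model names, the token placeholder, and the overlap heuristic are assumptions for illustration, not a tested design.

```python
# Rough sketch, not a tested implementation: diarize with pyannote.audio,
# transcribe with mlx-whisper, then label each transcript segment with the
# speaker whose diarization turns overlap it most.
import mlx_whisper
from pyannote.audio import Pipeline

AUDIO = "meeting.wav"  # hypothetical input file

# 1. Speaker turns from pyannote (needs a Hugging Face access token).
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="hf_...")
turns = [(turn.start, turn.end, speaker)
         for turn, _, speaker in diarizer(AUDIO).itertracks(yield_label=True)]

# 2. Timestamped transcript segments from mlx-whisper.
result = mlx_whisper.transcribe(
    AUDIO, path_or_hf_repo="mlx-community/whisper-large-v3-mlx")

# 3. Assign each segment the speaker with the largest total time overlap.
def best_speaker(start, end):
    overlap = {}
    for t0, t1, spk in turns:
        dur = min(end, t1) - max(start, t0)
        if dur > 0:
            overlap[spk] = overlap.get(spk, 0.0) + dur
    return max(overlap, key=overlap.get) if overlap else "UNKNOWN"

for seg in result["segments"]:
    print(f"[{best_speaker(seg['start'], seg['end'])}] {seg['text'].strip()}")
```

WhisperX does essentially this alignment step (plus word-level timestamps), so its merge logic is worth reading before settling on a heuristic.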
