Voice Assistant

Voice Assistant provides basic functionality to enable the creation of a voice activated software agent. An incoming stream of audio is chunked according to boundaries defined by a VAD (voice activity detector). When a complete chunk is found, it is sent to a voice recognition API, and the resulting text is used to determine what action, if any, should be taken

Voice Activity Detector

Google's venerable open source package WebRTC VAD is used, and 30ms chunks of audio are sent at 16000hz At some point it would be nice to upgrade to a more modern system.

Voice Recognition API

OpenAI's Whisper model is used to recognize incoming audio and transform it to text. A simple wrapper API is provided, but other API's could be used instead.

LLM Interface

Audio assistant supports open ended question answering by integrating with an LLM of your choice using a custom streaming client for the Ollama API

Audio Processing

In order to deal with audio in a streaming fashion, without the need to read all the samples into memory, we use the conduit-audio module to splice, resample and re-chunk.

Desired actions

Building

Linux

The followin packages are required for building and running:

zlib
pulseaudioFull
libsndfile
libsamplerate
lame
libao
pcre

If you have NixOS you can use the provided default.nix

OS X

brew install zlib pulseaudio libsndfile libsamplerate libao pcre lame sox
export CPATH=/opt/homebrew/include
export LIBRARY_PATH=/opt/homebrew/lib
cabal build

GHC

Tested with GHC 9.2.5, Cabal 3.6.2.0

Running

cabal run vad-audio -- --debug

Planned Improvements

I'd like to match the utterances with the actions using cosine similarity of sentence BERT embeddings rather than REGEX matches. Before that can happen, I need a few new pieces of functionality:

Calculate BERT embeddings for both source and target text
Rank by similarity
Perform dependency parsing probably with spaCy
Extract relevant parameters from dependency parse to feed to target functions

Name		Name	Last commit message	Last commit date
Latest commit History 116 Commits
app		app
lib		lib
scripts		scripts
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Makefile		Makefile
README.md		README.md
cabal.project.freeze		cabal.project.freeze
default.nix		default.nix
test.json		test.json
vad-audio.cabal		vad-audio.cabal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Voice Assistant

Voice Activity Detector

Voice Recognition API

LLM Interface

Audio Processing

Desired actions

Building

Linux

OS X

GHC

Running

Planned Improvements

About

Releases 1

Packages

Languages

ludflu/audio-assistant

Folders and files

Latest commit

History

Repository files navigation

Voice Assistant

Voice Activity Detector

Voice Recognition API

LLM Interface

Audio Processing

Desired actions

Building

Linux

OS X

GHC

Running

Planned Improvements

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages