Support Web Speech API #661

Open · wants to merge 4 commits into base: v2-dev

Conversation

@zoollcar

New feature:
[image]

TODO:

  • Some languages have very few voices, so pre-selecting can leave the voice list empty; perhaps we should hide the pre-select option in those cases.
  • When switching the TTS engine, the ElevenLabs test voice doesn't stop when I switch to the Web Speech API.
  • The upstream project has some broken JSON files; I'll fix them and open a PR there.

Some previous questions:

> I noticed the module is modules/browser; are there better alternatives?

The module has been moved to modules/browser/speech-synthesis.

> The Web Speech API test sentence is from the web-speech-recommended-voices project and contains a placeholder {name}.

{name} has been replaced with the name of the voice (a one-line sketch follows below).

> Using a Git submodule might not be the best solution.

en.json has been copied as a local file, Languages.json, and the voice list is fetched when a language is selected.
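For reference, the placeholder substitution amounts to a single string replace; a minimal sketch (identifiers are illustrative, not the PR's actual code):

```ts
// Hypothetical helper: substitute the {name} placeholder in the recommended
// test sentence with the selected voice's name
function buildTestSentence(template: string, voice: SpeechSynthesisVoice): string {
  return template.replace('{name}', voice.name);
}
```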


@enricoros (Owner)

Thanks for the feature. This patch is now in a state where I can review it and potentially merge it.

@enricoros self-requested a review on October 17, 2024.

@enricoros (Owner)

Update: thanks for resubmitting the PR; this is definitely higher-quality code that considers the rest of the application (e.g. other modules).

I'm testing it on mobile and it's hanging a couple of times (I believe it's a stability issue with some changing React reference). It's possibly something I can fix, but it's going to require some time for me to check out and develop.

On the UX side, there could be some rough edges (on my Android phone the High quality list doesn't do much: no matter what one chooses, the experience doesn't change, and this happens for the 4 available voices as well). So there's something I can look into to improve the UX. Why is this key? Because every Big AGI feature gets the same scrutiny and UX perfection.

Thanks again. I'll follow up when I have time to check this out, review it, and change what needs changing. Let me know in the meantime if anything can improve on your side.

[Screenshots: Screenshot_20241020_023710_Chrome.jpg, Screenshot_20241020_023626_Chrome.jpg, Screenshot_20241020_023702_Chrome.jpg]

@enricoros (Owner) left a review:

Overall, there are some details (and a crash) to be ironed out, but this is going in the right direction and increasing in quality.

I wonder if the whole module should get a cleanup, meaning bringing both elevenlabs and webspeech under the same umbrella, e.g. /modules/tts/* or similar.

The module would likely benefit from an abstraction (e.g. interface ISpeechSynthesis or similar) with the 2 engines implementing the same interface. This way every call becomes more abstract and the caller doesn't need to know "if elevenlabs, do this; otherwise do that".

In general it's good; please take a look at the comments (only some are necessary).
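To make the suggestion concrete, here is a minimal sketch of such an abstraction; the interface shape, class names, and method signatures are assumptions for illustration, not the PR's actual code:

```ts
export interface ISpeechSynthesis {
  speakText(text: string, voiceName?: string): Promise<void>;
  cancel(): void;
}

class WebSpeechEngine implements ISpeechSynthesis {
  speakText(text: string, voiceName?: string): Promise<void> {
    return new Promise((resolve, reject) => {
      const utterance = new SpeechSynthesisUtterance(text);
      const voice = window.speechSynthesis.getVoices().find((v) => v.name === voiceName);
      if (voice) utterance.voice = voice;
      utterance.onend = () => resolve();
      utterance.onerror = (e) => reject(new Error(e.error));
      window.speechSynthesis.speak(utterance);
    });
  }
  cancel(): void {
    window.speechSynthesis.cancel();
  }
}

class ElevenLabsEngine implements ISpeechSynthesis {
  async speakText(_text: string, _voiceName?: string): Promise<void> {
    // would call the ElevenLabs TTS endpoint and play the returned audio
  }
  cancel(): void {
    // would stop the currently playing audio element
  }
}

// callers resolve an engine once, instead of branching per call
export function getSpeechSynthesis(engineId: 'elevenlabs' | 'webspeech'): ISpeechSynthesis {
  return engineId === 'elevenlabs' ? new ElevenLabsEngine() : new WebSpeechEngine();
}
```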

throw new Error('TTSEngine is not found');
}

export function useCapability(): CapabilitySpeechSynthesis {
@enricoros (Owner) commented:

Crash issue traced to this hook (the one that gave the black screen in the screenshot). It seems that when switching providers there's a React out-of-order hooks issue; only when switching TTS providers, I believe.

@enricoros (Owner) commented:

[image]

@enricoros (Owner) commented:

It's possible that to fix this properly, we may have to overhaul the TTS engine reactivity (hooks).
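For illustration, this is the classic shape of that bug and its fix; the hook names here are hypothetical stand-ins, not the PR's actual hooks:

```tsx
import * as React from 'react';

// hypothetical per-engine capability hooks with different internal hook counts
function useElevenLabsCapability() {
  const [ready, setReady] = React.useState(false);
  React.useEffect(() => setReady(true), []);
  return { mayWork: ready };
}

function useWebSpeechCapability() {
  return { mayWork: typeof window !== 'undefined' && 'speechSynthesis' in window };
}

// BUG: which hooks get called depends on the engine, so switching engines
// changes the hook count between renders and React throws
function useCapabilityCrashing(engine: string) {
  if (engine === 'elevenlabs') return useElevenLabsCapability();
  return useWebSpeechCapability();
}

// FIX: call every hook unconditionally, then select the result
function useCapabilityStable(engine: string) {
  const elevenLabs = useElevenLabsCapability();
  const webSpeech = useWebSpeechCapability();
  return engine === 'elevenlabs' ? elevenLabs : webSpeech;
}
```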

setPersonaTextInterim(text);

// Maintain and say the current sentence
@enricoros (Owner) commented:

I love this.
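For readers following along, the pattern being praised is incremental sentence flushing while the response streams in; a rough sketch of the idea (names and the split regex are illustrative, not the PR's actual code):

```ts
let pending = '';

// buffer streamed text and hand complete sentences to the TTS engine
function onStreamedText(chunk: string, speak: (sentence: string) => void): void {
  pending += chunk;
  const parts = pending.split(/(?<=[.!?])\s+/); // split after sentence-ending punctuation
  pending = parts.pop() ?? ''; // keep the still-growing tail for the next chunk
  for (const sentence of parts)
    speak(sentence);
}
```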

@@ -51,6 +52,12 @@ interface AppChatStore {
micTimeoutMs: number;
setMicTimeoutMs: (micTimeoutMs: number) => void;

TTSEngine: string;
@enricoros (Owner) commented:

For now this could be TTSEngine: 'elevenlabs' | 'webspeech', to force TypeScript to do its job.
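A sketch of that typing; the union values are assumptions based on the engine IDs discussed in this thread:

```ts
type TTSEngineKey = 'elevenlabs' | 'webspeech';

interface AppChatStore {
  TTSEngine: TTSEngineKey; // TypeScript now rejects any other string
  setTTSEngine: (TTSEngine: TTSEngineKey) => void;
}
```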

@@ -114,6 +121,12 @@ const useAppChatStore = create<AppChatStore>()(persist(
micTimeoutMs: 2000,
setMicTimeoutMs: (micTimeoutMs: number) => _set({ micTimeoutMs }),

TTSEngine: TTSEngineList[0],
@enricoros (Owner) commented:

If TTSEngine: 'elevenlabs' | 'webspeech', then this becomes one of the two values (probably 'webspeech' by default); the conversion to a nice display string can be done in the settings UI, and in the code we only match against those IDs.

As an alternative, this could be left undefined and the UI will decide what to use each time, unless the user makes a choice; undefined would default to 'webspeech'.

TTSEngine: TTSEngineList[0],
setTTSEngine: (TTSEngine: string) => _set({ TTSEngine }),

ASREngine: ASREngineList[0],
@enricoros (Owner) commented:

Same here: we could keep undefined and hardcode 'webspeech' as the ID, so we can fall back to that as autodetect.
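A sketch of the undefined-until-chosen alternative, with 'webspeech' as the hardcoded autodetect fallback suggested here (function name is illustrative):

```ts
function resolveTTSEngine(stored?: 'elevenlabs' | 'webspeech'): 'elevenlabs' | 'webspeech' {
  return stored ?? 'webspeech'; // undefined means the user hasn't chosen yet
}
```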

React.useEffect(() => {
if (languageCode) {
const fetchFunction = async () => {
let res = await fetch(`https://raw.githubusercontent.com/HadrienGardeur/web-speech-recommended-voices/refs/heads/main/json/${languageCode}.json`);
@enricoros (Owner) commented:

Well done here.

};
fetchFunction().catch((err) => {
console.log('Error getting voice list: ', err);
addSnackbar({ key: 'browser-speech-synthesis', message: 'Error getting voice list', type: 'issue' });
@enricoros (Owner) commented:

I got this message with some of the languages in the list. Strange, because I thought the list would only contain valid languages.

@zoollcar (Author) commented:

Upstream error: some languages listed in the JSON file don't have a corresponding file, so the fetch 404s. I'll delete all invalid languages.
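A hedged sketch of the voice-list fetch with the 404 case made explicit; the URL comes from the diff above, the error-path details are assumptions:

```ts
async function fetchVoiceList(languageCode: string): Promise<unknown> {
  const url = `https://raw.githubusercontent.com/HadrienGardeur/web-speech-recommended-voices/refs/heads/main/json/${languageCode}.json`;
  const res = await fetch(url);
  if (!res.ok) // a language listed upstream without a matching file 404s here
    throw new Error(`No voice list for '${languageCode}' (HTTP ${res.status})`);
  return res.json();
}
```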

import { persist } from 'zustand/middleware';
import { useShallow } from 'zustand/react/shallow';

export type BrowsePageTransform = 'html' | 'text' | 'markdown';
@enricoros (Owner) commented:

This probably doesn't belong here. We already have a browser store (for the browsing capability), but it's different.

export type BrowsePageTransform = 'html' | 'text' | 'markdown';

interface BrowseState {

@enricoros (Owner) commented:

I'm wondering if the TTSEngine settings could also live here, to keep everything together.

import { useBrowseVoiceId } from './store-module-browser';
import { speakText, cancel } from './browser.speechSynthesis.client';

function VoicesDropdown(props: {
@enricoros (Owner) commented:

I see this is a duplication of the ElevenLabs one, probably needed because of the different logic.

@zoollcar (Author)

I'll make an abstraction (under modules/tts/, ISpeechSynthesis). The current plan is to model it on the llms module.

The refactored version will be pushed in the coming days.

@zoollcar (Author)

The abstraction is basically done. One UI change worth mentioning:

[image]

Engine selection changed to a drop-down box, for more options and mobile compatibility.

@enricoros (Owner)

Hi @zoollcar - just FYI - I won't have the time to merge this before the official V2 launch. I can't disclose dates, but I'll be very busy for a while. If you have a clean patch that doesn't require any work from my side, I'll see what I can do. In the meantime, enjoy the fact that you're the only person with a custom big-AGI that supports multiple TTS/ASR engines.

@michieal

Okay, so... from my initial tests, this PR is spot on!
I did notice, though, that it doesn't work in offline mode. This raises a couple of questions for me; not a bad thing, mind you, just curiosity.
First: is the output text sent out to become speech?
Second: is there a way to have it download the voice files and process them locally? I'm asking because one of the reasons to use a localized AI is that people don't want everything going out across the internet. On that thought, maybe a notice in the Voice area stating that it requires an active connection would be a good idea.
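(For reference, the Web Speech API does report whether a voice runs on-device: each SpeechSynthesisVoice carries a standard localService boolean. A minimal sketch of detecting offline-capable voices follows; the notice text is just an example, not the PR's actual code:)

```ts
function getLocalVoices(): SpeechSynthesisVoice[] {
  return window.speechSynthesis.getVoices().filter((v) => v.localService);
}

if (getLocalVoices().length === 0)
  console.warn('No on-device voices: speech synthesis needs an active connection.');
```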

All in all, quite impressed with this! It has everything that I put into the feature request I made (#703), so you rock! :)

@michieal commented on Dec 30, 2024:

So, a couple of days into testing, here's what I have found:

So far, no direct crashes from the voices that I can tell. The first thing I noticed was that the console window was flooded with [DEV] AIX: OpenAI-dispatch missing completion tokens in usage { usage: { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 } } lines.
The next thing I noticed was that the Speak option's "First Paragraph" only speaks X characters and stops at the first period it finds, so... only a sentence at best.
Then... I don't know whether it's related to this PR's actual code or to old V2 code, but the Ollama bindings don't work... and after 2 days of testing, LocalAI seems to stop working too. I say this because models started failing with a "[Streaming Issue] Localai: terminated"... but when I check LocalAI via its web interface, it does indeed work, even right after this version of Big AGI fails.

Also, a request: can the asterisks * be stripped out before the generated text is sent to the speech synthesis? It gets really annoying when you have it speak the full text and it's reading out headings and italicized text. You end up hearing "asterisk asterisk word asterisk asterisk asterisk italicized text asterisk asterisk asterisk word asterisk asterisk asterisk text asterisk" -- I asked for an outline of a basic task, and I heard the word "asterisk" so much I can still hear it.

Flubs: I did notice that occasionally it will say the Eleven Labs voice sample sentence if one has that set up. And once or twice, it started to speak that and then stopped.

Aside from the above, everything looks great. Still working on getting a local AI to work with Big AGI again after the one issue above.

EDIT: First edit was to correct the error message. Second edit: I had to clear the cookies and stored data to restore functionality. I backed up the conversations, and then started a new conversation to get this version of Big AGI to work with LocalAI.

@enricoros (Owner)

@michieal thanks for reviewing and testing this. It's a great change that shall come in. Quick replies:

  • I've pushed 7626b48, which should remove the flooding of "[DEV] AIX: OpenAI-dispatch missing completion...". This was probably because the local models didn't return the correct completion tokens (reporting 0 when the real count isn't zero). I've silenced the message and now accept a 0 value.
  • Do you have more details on the LocalAI issue?
  • You're correct that Ollama isn't working, because this PR is not on the tip of v2-dev but earlier.
  • Great point on asterisk stripping: there should be some de-markdown'ification of the text before it's sent to any TTS (a rough sketch follows this list). Will try to remember this.
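A rough sketch of that de-markdown'ification; the regexes are illustrative, not an actual implementation:

```ts
function stripMarkdownForTTS(text: string): string {
  return text
    .replace(/^#{1,6}\s+/gm, '')            // headings: "## Title" -> "Title"
    .replace(/(\*\*|__)(.*?)\1/g, '$2')     // bold markers
    .replace(/(\*|_)(.*?)\1/g, '$2')        // italic markers
    .replace(/`{1,3}([^`]+)`{1,3}/g, '$1')  // inline code backticks
    .replace(/^\s*[-*+]\s+/gm, '');         // list bullets
}
```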

Huge thanks to @zoollcar for this PR; as @michieal says, it's tested and solid. Let us know if there are recent changes.

@michieal commented on Jan 1, 2025:

> I've pushed 7626b48, which should remove the flooding of "[DEV] AIX: OpenAI-dispatch missing completion...". [...] I've silenced the message and now accept a 0 value.

Thank you!! Big help!

> Do you have more details on the LocalAI issue?

Apparently, it's a LocalAI issue with its backends on the latest release. I routed its output to a log file and matched up timestamps. I double-checked with the V2-dev version I have that worked, and the issue is still there... so yeah, not Big-AGI's fault. (It's a gRPC issue: "failed to load backend".)

Happy to help out with this. Sorry that I am not able to help out on the coding side. :)
