Add finetuning support for `Speaker` class #679

ain-soph · 2024-08-10T18:29:56Z

This PR makes the speaker_embeds tensor tunable. It includes:

Rename _sample_random -> sample_random_tensor so it can be used outside the class.
Change the type of the spk_emb argument in Speaker.apply() from str to Union[str, torch.Tensor].
Add an inplace argument; setting it to False lets gradients propagate backward.
Update the return type to torch.Tensor.
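
To illustrate why an out-of-place path matters for fine-tuning, here is a minimal sketch. It is not the code in ChatTTS/model/speaker.py: `apply_speaker`, its arguments, and the boolean mask `is_spk_token` are hypothetical names chosen for this example, and wrapping the in-place write in `torch.no_grad()` is an assumption of the sketch.

```python
import torch
import torch.nn.functional as F


def apply_speaker(emb, spk_emb, is_spk_token, inplace=True):
    """Hypothetical helper: write a normalized speaker vector into marked positions.

    emb:          (batch, seq, dim) token embeddings
    spk_emb:      (dim,) speaker embedding, possibly a trainable tensor
    is_spk_token: (batch, seq) bool mask of positions to overwrite
    """
    n = F.normalize(spk_emb, p=2.0, dim=0).expand(emb.shape)
    cond = is_spk_token.unsqueeze(-1).expand(emb.shape)
    if inplace:
        # In-place path: overwrite emb without recording an autograd graph.
        # This saves memory at inference time, but no gradient reaches spk_emb.
        with torch.no_grad():
            emb.copy_(torch.where(cond, n, emb))
        return emb
    # Out-of-place path: a new tensor that stays connected to spk_emb.
    return torch.where(cond, n, emb)


spk = torch.randn(4, requires_grad=True)      # the tensor being tuned
emb = torch.zeros(1, 3, 4)
mask = torch.tensor([[True, False, False]])   # only the first position is a speaker slot

out = apply_speaker(emb, spk, mask, inplace=False)
out.sum().backward()                          # populates spk.grad
```

With `inplace=True` the same call returns `emb` itself and leaves `spk.grad` untouched, which is fine for inference but not for tuning.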

Besides, I don't quite see the advantage and motivation to encode the speaker_embeds tensor to string. To me, it only makes things more complex. I would personally prefer a CPU torch.Tensor or an np.ndarray.

cc @fumiama

fumiama

> Besides, I don't quite see the advantage and motivation to encode the speaker_embeds tensor to string.

People can still use the tensor, but a .pt file can easily contain a virus. The str encoding is equivalent to a numpy array dump but is easy to copy and paste.

ChatTTS/model/speaker.py

fumiama · 2024-08-11T06:51:01Z

Also, you can save the str to a txt file (with the ext .pt if you like) and use it just like a .pt file if you like.

ain-soph · 2024-08-11T06:54:03Z

> Also, you can save the str to a txt file (with the ext .pt if you like) and use it just like a .pt file if you like.

"use it just like a .pt file if you like" -> You mean we could directly read the tensor via torch.load() instead of using pybase16384 to encode/decode? I didn't know that.

fumiama · 2024-08-11T06:55:14Z

> > Also, you can save the str to a txt file (with the ext .pt if you like) and use it just like a .pt file if you like.

> "use it just like a .pt file if you like" -> You mean we could directly read the tensor via torch.load() instead of using pybase16384 to encode/decode? I didn't know that.

Just treat the str as a kind of binary.
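
As a rough sketch of the "str as a kind of binary" idea: the raw bytes of a float vector can be turned into copy-pasteable text, saved to a plain text file, and decoded back without unpickling anything. The thread mentions pybase16384 for this; the example below substitutes the standard library's base64 and the `array` module so it runs without extra dependencies, and `encode_embedding` / `decode_embedding` are hypothetical helper names, not ChatTTS functions.

```python
import base64
import tempfile
from array import array
from pathlib import Path


def encode_embedding(values):
    """Pack floats as raw float32 bytes, then encode the bytes as ASCII text."""
    raw = array("f", values).tobytes()
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(text):
    """Reverse of encode_embedding: ASCII text -> raw bytes -> list of floats."""
    buf = array("f")
    buf.frombytes(base64.b64decode(text))
    return list(buf)


emb = [0.5, -1.25, 2.0]
text = encode_embedding(emb)            # plain str, safe to paste into a chat or issue

# The string can live in any text file; the extension does not matter.
path = Path(tempfile.gettempdir()) / "speaker_emb.txt"
path.write_text(text, encoding="ascii")
restored = decode_embedding(path.read_text(encoding="ascii"))
```

Unlike a pickle-based .pt file, decoding this text only ever produces numbers, so loading a string from an untrusted source cannot execute code. With NumPy, `arr.tobytes()` and `np.frombuffer` would play the role of the `array` calls.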

fumiama

Thanks!

ain-soph added 3 commits August 10, 2024 14:25

Update speaker class

a7fb442

add inplace argument

677722a

update return type

9302a8a

fumiama requested changes Aug 11, 2024


ChatTTS/model/speaker.py (outdated, resolved)

fumiama added the enhancement label Aug 11, 2024

revert the naming

f8f3f22

ain-soph requested a review from fumiama August 11, 2024 15:11

fumiama approved these changes Aug 14, 2024


fumiama merged commit d93ed8d into 2noise:dev Aug 14, 2024
5 checks passed

ain-soph deleted the speaker-support-finetuning branch August 14, 2024 13:39
