Add finetuning support for Speaker class #679
Conversation
> Besides, I don't quite see the advantage and motivation to encode the speaker_embeds tensor to string.

People can still use the tensor, but a .pt file can easily contain a virus. The str encoding is equivalent to a numpy array dump, but is easy to copy & paste.

Also, you can save the str to a txt file (with the ext .pt if you like) and use it just like a .pt file.
"use it just like a .pt file if you like" -> You mean we could directly read the tensor via |
Just treat the str as a kind of binary. |
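The "str as a kind of binary" idea can be sketched as follows. This is a minimal illustration assuming a plain base64-of-raw-bytes scheme; the encoding actually used by the project may differ, and `encode_embedding`/`decode_embedding` are hypothetical helper names:

```python
import base64
import numpy as np

def encode_embedding(arr: np.ndarray) -> str:
    """Dump a float array to a copy&paste-friendly ASCII string."""
    return base64.b64encode(arr.astype(np.float32).tobytes()).decode("ascii")

def decode_embedding(s: str) -> np.ndarray:
    """Recover the array from its string form. Unlike unpickling a .pt
    file, decoding raw bytes cannot execute arbitrary code."""
    return np.frombuffer(base64.b64decode(s), dtype=np.float32)

emb = np.random.randn(768).astype(np.float32)
s = encode_embedding(emb)
assert np.array_equal(decode_embedding(s), emb)
```

The round trip is lossless for float32 data, and the resulting string can be stored in any text file, including one with a .pt extension.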
Thanks!
To allow tuning for the speaker_embeds tensor. The PR includes:

- _sample_random -> sample_random_tensor, to allow usage outside the class.
- The spk_emb argument in Speaker.apply() changes from str to Union[str, torch.Tensor].
- An inplace argument to support gradient backward when set to False.

Besides, I don't quite see the advantage and motivation to encode the speaker_embeds tensor to string. To me, it only makes things more complex. I would personally prefer a CPU torch.Tensor or np.ndarray.
cc @fumiama
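The changes above can be illustrated with a toy sketch of why accepting a torch.Tensor (with out-of-place application) enables finetuning. `apply_speaker` below is a hypothetical stand-in for the PR's Speaker.apply(), not the real implementation:

```python
import torch

def apply_speaker(hidden: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for mixing a speaker embedding into activations.
    Out-of-place addition keeps the autograd graph intact, which is
    what an inplace=False code path makes possible."""
    return hidden + spk_emb

# A tensor (rather than an encoded str) can carry requires_grad=True,
# so the embedding itself becomes a tunable parameter.
spk_emb = torch.randn(8, requires_grad=True)
optimizer = torch.optim.SGD([spk_emb], lr=0.1)
target = torch.zeros(8)  # stand-in optimization target

for _ in range(200):
    optimizer.zero_grad()
    out = apply_speaker(torch.zeros(8), spk_emb)
    loss = torch.nn.functional.mse_loss(out, target)
    loss.backward()   # gradients flow back into spk_emb
    optimizer.step()
```

An in-place write into a buffer would either error on a leaf tensor requiring grad or silently break the gradient graph, which is why the inplace=False path matters for tuning.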