Add example usage for SpeechT5ForSpeechToText #438

Merged: 1 commit, Dec 6, 2023
src/models.js (39 additions, 0 deletions)
@@ -3859,6 +3859,43 @@ export class SpeechT5Model extends SpeechT5PreTrainedModel { };

/**
* SpeechT5 Model with a speech encoder and a text decoder.
*
* **Example:** Generate speech from text with `SpeechT5ForTextToSpeech`.
* ```javascript
* import { AutoTokenizer, AutoProcessor, SpeechT5ForTextToSpeech, SpeechT5HifiGan, Tensor } from '@xenova/transformers';
*
* // Load the tokenizer and processor
* const tokenizer = await AutoTokenizer.from_pretrained('Xenova/speecht5_tts');
* const processor = await AutoProcessor.from_pretrained('Xenova/speecht5_tts');
*
* // Load the models
* // NOTE: We use the unquantized versions as they are more accurate
* const model = await SpeechT5ForTextToSpeech.from_pretrained('Xenova/speecht5_tts', { quantized: false });
* const vocoder = await SpeechT5HifiGan.from_pretrained('Xenova/speecht5_hifigan', { quantized: false });
*
* // Load speaker embeddings from URL
* const speaker_embeddings_data = new Float32Array(
*     await (await fetch('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin')).arrayBuffer()
* );
* const speaker_embeddings = new Tensor(
*     'float32',
*     speaker_embeddings_data,
*     [1, speaker_embeddings_data.length]
* );
*
* // Run tokenization
* const { input_ids } = tokenizer('Hello, my dog is cute');
*
* // Generate waveform
* const { waveform } = await model.generate_speech(input_ids, speaker_embeddings, { vocoder });
* console.log(waveform);
* // Tensor {
* // dims: [ 26112 ],
* // type: 'float32',
* // size: 26112,
* // data: Float32Array(26112) [ -0.00043630177970044315, -0.00018082228780258447, ... ],
* // }
* ```
*/
export class SpeechT5ForSpeechToText extends SpeechT5PreTrainedModel { }
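The `waveform` returned by `generate_speech` in the example above is a tensor of raw float32 samples, which most players cannot open directly. The following is a minimal sketch for packing those samples into a 16-bit PCM mono WAV file. `encodeWav` is a hypothetical helper, not part of `@xenova/transformers`, and the 16 kHz default sample rate is an assumption based on the SpeechT5 HiFi-GAN vocoder; check the vocoder's config before relying on it.

```javascript
/**
 * Hypothetical helper (not a library API): encode mono float32 samples in [-1, 1]
 * as a 16-bit PCM WAV file and return it as an ArrayBuffer.
 * The 16 kHz default is an assumption; verify it against the vocoder config.
 */
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte RIFF/WAVE header + samples
  const view = new DataView(buffer);

  // Write an ASCII tag byte by byte
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; ++i) view.setUint8(offset + i, str.charCodeAt(i));
  };

  // RIFF header
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // remaining file size (little-endian)
  writeString(8, 'WAVE');

  // "fmt " chunk: uncompressed PCM, one channel
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample

  // "data" chunk: clip each sample to [-1, 1], then scale to the int16 range
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; ++i) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In Node.js, the result could then be written to disk with something like `writeFileSync('speech.wav', Buffer.from(encodeWav(waveform.data)))`, using `writeFileSync` from `node:fs`. Clipping before scaling avoids integer wrap-around if the vocoder produces samples slightly outside [-1, 1].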

@@ -3984,6 +4021,8 @@ export class SpeechT5ForTextToSpeech extends SpeechT5PreTrainedModel {

/**
* HiFi-GAN vocoder.
*
* See [SpeechT5ForSpeechToText](./models#module_models.SpeechT5ForSpeechToText) for example usage.
*/
export class SpeechT5HifiGan extends PreTrainedModel {
main_input_name = 'spectrogram';