add support for Kokoro via Replicate #1153
@@ -22,6 +22,7 @@ export const REPLICATE_SUPPORTED_MODEL_IDS: ProviderMapping<ReplicateId> = {
},
"text-to-speech": {
"OuteAI/OuteTTS-0.3-500M": "jbilcke/oute-tts:39a59319327b27327fa3095149c5a746e7f2aee18c75055c3368237a6503cd26",
"hexgrad/Kokoro-82M": "jaaari/kokoro-82m:dfdf537ba482b029e0a761699e6f55e9162cfd159270bfe0e44857caa5f275a6"
need to add a test but you'll see, you'll need to change the param name
I was meaning to ask @jbilcke-hf to update the param name in jbilcke/oute-tts to text instead of inputs
Not sure if this will work, there is an input payload schema mismatch between text-to-speech models in Replicate.
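To make the mismatch concrete, here is a minimal sketch of how a request body could be built when two Replicate text-to-speech models disagree on the name of the text field. Everything in it is illustrative: `TTS_INPUT_KEY` and `buildTtsBody` are hypothetical names that do not exist in the library under review, and the `text` key for `jaaari/kokoro-82m` is an assumption based on the comments above. The only part taken from Replicate itself is the `{ version, input }` shape of a prediction request.

```typescript
// Hypothetical helper, not part of the library under review.
type TtsInputKey = "text" | "inputs";

// Assumption: the field each Replicate model expects for the text to speak.
// "inputs" for oute-tts comes from the review comment; "text" for Kokoro is assumed.
const TTS_INPUT_KEY: Record<string, TtsInputKey> = {
  "jbilcke/oute-tts": "inputs",
  "jaaari/kokoro-82m": "text",
};

interface ReplicatePredictionBody {
  version: string;
  input: Record<string, unknown>;
}

// Splits an "owner/name:version" mapping value and wraps the text
// under the key that the target model expects.
function buildTtsBody(replicateId: string, text: string): ReplicatePredictionBody {
  const sep = replicateId.indexOf(":");
  if (sep < 0) {
    throw new Error(`Expected "owner/name:version", got "${replicateId}"`);
  }
  const modelName = replicateId.slice(0, sep);
  const version = replicateId.slice(sep + 1);
  // Fall back to "text" for models that are not listed.
  const key: TtsInputKey = TTS_INPUT_KEY[modelName] ?? "text";
  return { version, input: { [key]: text } };
}
```

A per-model table like this works, but it has to be kept in sync with every model, which is why asking model owners to converge on one parameter name is the simpler option.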
it was too easy to be true, yeah will update the cog ID to a different one to match the spec!
yes, imo you can remove the oute-tts one for now (but keep the test as skipped and ask if @jbilcke-hf can update his replicate model in the future) and change the param name to
https://replicate.com/jbilcke/oute-tts was not Warm anyways so not great UX (a bit slow to start)
I'll take care of doing the equivalent Python side. Also, it seems to me that we have more text-to-speech models using the
Approving, modulo my comment
const res = await client.textToSpeech({
model: "hexgrad/Kokoro-82M",
provider: "replicate",
text: "Kokoro is a frontier TTS model for its size of 1 Billion parameters",
Does it accept parameters?
Would be nice to add some to make sure the payload has the proper shape
yes, but it does not accept any of the parameters listed in TextToSpeechParameters (see the schema of the model here). Since each model on Replicate runs in its own docker container, models of the same task can have different parameters (and even more problematic: different input/output format). In huggingface_hub, we added an argument extra_body to allow users to pass any provider- or model-specific parameters.
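As a rough illustration of the extra_body idea on the JS side, the sketch below merges caller-supplied, model-specific parameters into the input object sent to a model. It is not the huggingface_hub implementation and not an API of the library under review: `buildReplicateInput` is a hypothetical name, and the `voice` and `speed` keys in the usage lines are assumed examples of model-specific parameters, not a confirmed schema.

```typescript
// Hypothetical helper illustrating an extra_body-style pass-through.
// Extra parameters are spread first so the required "text" field
// always wins if a caller accidentally supplies the same key.
function buildReplicateInput(
  text: string,
  extraBody: Record<string, unknown> = {},
): Record<string, unknown> {
  return { ...extraBody, text };
}

// Usage: "voice" and "speed" are assumed parameter names for illustration only.
const exampleInput = buildReplicateInput(
  "Kokoro is a frontier TTS model for its size",
  { voice: "af_bella", speed: 1.2 },
);
console.log(JSON.stringify(exampleInput));
```

Passing the extras through untyped keeps the task-level parameter type small, at the cost of no compile-time check that a given model actually accepts them.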
Co-authored-by: Simon Brandeis <[email protected]>
i'll merge this now, to be able to test it on the Hub 🔥
oops, i did not add the VCR tape => #1171