TTS - Google Cloud API (#512)
TianTan2024 authored Mar 17, 2024
2 parents e6fa639 + 8f67ab1 commit f330880
Showing 4 changed files with 119 additions and 1 deletion.
120 changes: 119 additions & 1 deletion Topics/Software_Engineering/Text_to_Speech.md
@@ -127,16 +127,134 @@ By using this code, the audio will be able to be played before the full file has

### Google Cloud TTS:

Google Cloud Text-to-Speech API is a powerful tool offered by Google Cloud Platform for converting text into natural-sounding speech. It utilizes advanced machine learning techniques to generate high-quality audio output, allowing developers to integrate speech synthesis capabilities into their applications with ease.


*Price*: Billed by the number of characters: $4 per 1 million characters for Standard voices, with higher rates depending on the voice type and features used. The first 4 million characters each month are free for Standard voices.
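
To make the pricing concrete, here is a minimal sketch that estimates a monthly bill for Standard voices. It assumes only the two figures quoted above ($4 per 1 million characters, first 4 million characters free each month); the function and constant names are made up for illustration, and the current pricing page should be checked before relying on the numbers.

```python
from decimal import Decimal

# Figures taken from the text above; they are assumptions, not a live price list.
STANDARD_PRICE_PER_MILLION = Decimal("4.00")  # USD per 1M characters
STANDARD_FREE_CHARS_PER_MONTH = 4_000_000     # free Standard-voice characters


def estimate_standard_cost(chars_this_month: int) -> Decimal:
    """Estimate the monthly cost in USD for Standard voices."""
    if chars_this_month < 0:
        raise ValueError("character count cannot be negative")
    # Only characters beyond the free monthly allowance are billed.
    billable = max(0, chars_this_month - STANDARD_FREE_CHARS_PER_MONTH)
    return Decimal(billable) / Decimal(1_000_000) * STANDARD_PRICE_PER_MILLION


# 3.5M characters stay inside the free tier; 10M leaves 6M billable characters.
for usage in (3_500_000, 10_000_000):
    print(usage, "characters ->", estimate_standard_cost(usage), "USD")
```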

*Voice choices*: Supports most languages. Only the default voice is included, but you can upgrade to other voices.

*Supported output formats*: MP3, Linear16, OGG Opus, and a number of other audio formats.

*Key features*: Custom voices, long audio synthesis, text and SSML support, pitch tuning.
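
As an illustration of the SSML and pitch-tuning features, the sketch below wraps plain text in SSML `<speak>` and `<prosody>` tags using only the standard library. The helper name `build_ssml` and its default values are hypothetical; the tag names come from the W3C SSML specification listed in the references.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, pitch: str = "+2st", rate: str = "medium") -> str:
    """Wrap plain text in <speak>/<prosody>, escaping XML special characters."""
    body = escape(text)  # turns &, < and > into XML entities
    return (
        f"<speak><prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{body}</prosody></speak>"
    )


ssml = build_ssml("Tom & Jerry start at 5 < 6.")
print(ssml)
```

With the Python client library, a string like this would be passed as `texttospeech.SynthesisInput(ssml=ssml)` instead of `text=...`. Escaping matters because an unescaped `&` or `<` makes the SSML document invalid.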

**Set-up**

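- Install the Cloud Client Libraries for Python:
Each API has its own package. For example, to install the package for an individual API like Cloud Storage, use a command similar to the following (the Text-to-Speech package itself is installed in the Quickstart section):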
```bash
pip install --upgrade google-cloud-storage
```

- Install the gcloud CLI [here](https://cloud.google.com/sdk/docs/install).

**API acquisition**
Before you can begin using Text-to-Speech, you must enable the API in the Google Cloud Platform Console.

- Make sure billing is enabled for Text-to-Speech:
- You need a Google Cloud Platform (GCP) account. If you don’t have one, sign up for a free trial [here](https://cloud.google.com/free?hl=en).

- Enable Text-to-Speech on a project:
- Sign in to the [Google Cloud console](https://console.cloud.google.com/) and go to the [project selector page](https://console.cloud.google.com/projectselector2/home/dashboard).

- Once you have selected a project and linked it to a billing account, you can enable the Text-to-Speech API. Go to the **Search products and resources** bar at the top of the page and type in "speech". ![search products and resources](Text_to_Speech_CloudAPI.png)
Select the **Cloud Text-to-Speech API** from the list of results.![Cloud Text-to-Speech API](Text_to_Speech_CloudAPI_2.png)

- To try Text-to-Speech without linking it to your project, choose the **TRY THIS API** option. To enable the Text-to-Speech API for use with your project, click **ENABLE**.![Product details](Text_to_Speech_CloudAPI_3.png)

**Quickstart**
Text-to-Speech supports programmatic access. You can access the API in two ways: client libraries and REST.

- Client libraries:
Install the client library:
```bash
pip install --upgrade google-cloud-texttospeech
```

- To use the client library, you must first create a `TextToSpeechClient` object.
```python
from google.cloud import texttospeech

# Instantiate a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(text="Hello, World!")

# Build the voice request: select the language code ("en-US") and the SSML
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
```

- REST:
Calling this service through the Google-provided client libraries is recommended. However, if you need to use your own libraries to call the service, the following information will help you make the API requests.



The service endpoint (base URL) for this API service is https://texttospeech.googleapis.com


A Discovery Document is a machine-readable specification that describes a REST API and how to consume it. It is used to build client libraries, IDE plugins, and other tools that interact with Google APIs. The Cloud Text-to-Speech API provides the following Discovery Documents: [v1](https://texttospeech.googleapis.com/$discovery/rest?version=v1) and [v1beta1](https://texttospeech.googleapis.com/$discovery/rest?version=v1beta1).
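
To show how a tool might read such a document, the sketch below walks a Discovery Document that has already been loaded as a Python dictionary and lists its methods. The `SAMPLE` dictionary is a hand-trimmed, illustrative stand-in rather than the real document, and `iter_methods` is a hypothetical helper; it assumes the usual Discovery layout of `resources` containing `methods` with `id`, `httpMethod`, and `path` fields.

```python
from typing import Iterator

# Illustrative excerpt only; the real document contains many more fields.
SAMPLE = {
    "resources": {
        "text": {
            "methods": {
                "synthesize": {
                    "id": "texttospeech.text.synthesize",
                    "httpMethod": "POST",
                    "path": "v1/text:synthesize",
                }
            }
        }
    }
}


def iter_methods(discovery: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (method id, HTTP verb, path) for every method in the document."""
    for resource in discovery.get("resources", {}).values():
        for method in resource.get("methods", {}).values():
            yield method["id"], method["httpMethod"], method["path"]
        # Resources may nest further resources, so recurse into each one.
        yield from iter_methods(resource)


for method_id, verb, path in iter_methods(SAMPLE):
    print(method_id, verb, path)
```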

Here is an example of the `text.synthesize` method:
```http
POST https://texttospeech.googleapis.com/v1/text:synthesize
```
Request Body:
```JSON
{
  "input": {
    object (SynthesisInput)
  },
  "voice": {
    object (VoiceSelectionParams)
  },
  "audioConfig": {
    object (AudioConfig)
  }
}
```
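
A concrete request for the schema above could be assembled as in the following sketch, which uses only the Python standard library and builds the request without sending it. The function name and the `YOUR_ACCESS_TOKEN` placeholder are assumptions for illustration; in practice a token might come from `gcloud auth print-access-token`, and the field values mirror the client-library quickstart.

```python
import json
import urllib.request

ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for text:synthesize."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "ssmlGender": "NEUTRAL"},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder token: replace it with a real OAuth 2.0 access token before sending.
request = build_synthesize_request("Hello, World!", access_token="YOUR_ACCESS_TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(request)` would perform the call; that step is left out here so the sketch runs without credentials or network access.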

Response body:

```JSON
{
  "audioContent": string
}
```
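
In the REST response, `audioContent` is a base64-encoded string, so it has to be decoded before it can be saved as audio. The sketch below shows that step on a fabricated stand-in response (the bytes are not real audio); `extract_audio` is a hypothetical helper name.

```python
import base64
import json


def extract_audio(response_body: str) -> bytes:
    """Decode the base64 'audioContent' field of a text:synthesize response."""
    payload = json.loads(response_body)
    return base64.b64decode(payload["audioContent"])


# Stand-in for a real API response; the payload is placeholder bytes, not audio.
fake_response = json.dumps(
    {"audioContent": base64.b64encode(b"not real audio").decode("ascii")}
)

audio = extract_audio(fake_response)
print(len(audio), "bytes decoded")
# With a real response, the bytes could then be written to a file opened in
# binary mode, for example open("output.mp3", "wb").
```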


## Comparison Between the Three Models

In summary, the choice between these TTS APIs depends on the level of customization needed, pricing, ease of integration, and the specific requirements of your project or application. Google Cloud Text-to-Speech API and gTTS are suitable for general-purpose TTS tasks, while OpenAI's TTS models offer advanced capabilities and natural-sounding speech synthesis at a much higher cost. Google Cloud Text-to-Speech API and gTTS both have relatively straightforward pricing models, and gTTS may be simpler to use for basic text-to-speech tasks.

## Reference
* [Text to Speech Explained](https://speechify.com/blog/text-to-speech-explained-a-comprehensive-guide/)
* [gTTS](https://pypi.org/project/gTTS/)
* [OpenAI-TTS](https://platform.openai.com/docs/guides/text-to-speech/)
* [Text-to-Speech documentation](https://cloud.google.com/text-to-speech/docs)
* [Speech-synthesis](https://www.w3.org/TR/speech-synthesis/)