Kokoro TTS Plugin for Obsidian

This plugin integrates the Kokoro TTS engine into Obsidian, providing high-quality text-to-speech with multiple voice options. It uses the lightweight Kokoro v0.19 model which offers natural-sounding speech in both American and British English.

I have only tested it in Windows 11, and it requires manual installation as it works with a locally hosted Kokoro TTS backend. The backend will use CUDA if available, or fallback to CPU.

NEW IN 1.3:

Added inline voice selection with ktts prefix
Added special voice settings for quoted and emphasized text
Improved quote handling:
- Support for all types of quotation marks (", ", ")
- Fixed processing of multiple quoted sections on the same line
- Better handling of quotes within sentences
Added voice selection for emphasized text (within asterisks)
- Text between * marks is now spoken without saying "asterisk"
- Can assign a different voice for emphasized text in settings
New Settings:
- Special text voices section for configuring distinct voices
- Toggle for enabling/disabling distinct voices feature
- Separate voice selection for quoted and emphasized text

Here is a demonstration of the inline generation. This particular audio generated at 250 characters per second, despite switching between every available voice.

Listen to the demo

This is the default narrator voice.

kttsbella"Hi, I'm Bella from the US!" kttsemma"And I'm Emma from the UK!"

kttssarah"Sarah here, also from the US." kttsisabella"Isabella speaking, from the UK!"

kttsadam"Adam here, representing the US male voices." kttsgeorge"George here, from the UK!"

kttsmichael"Michael speaking, another US voice." kttslewis"And Lewis, another UK voice!"

kttsnicole"Nicole here!" kttssky"And Sky, wrapping up our voice demo!"

*This text will use the emphasized voice setting*

"This quoted text will use the default quote voice setting"

kttsbella"You can also have multiple voices in one paragraph!" kttsemma"Indeed you can!" And back to the narrator.

NEW IN 1.2:

Force US/GB variant (very little impact)
Fixed audio chunk concatenation
Added message with generation stats (generation's total characters, chunks, time, and chunks/second)
Tested on CPU (50-100 character/second) and CUDA, Kokoro is astonishingly fast (>1500 character/second on 4090).

Features

Multiple voice options (Bella, Sarah, Adam, Michael, Emma, Isabella, George, Lewis, Nicole, Sky)
Support for both American and British English
Advanced voice control:
- Inline voice selection with ktts prefix
- Distinct voices for quoted and emphasized text
- Support for all types of quotation marks
- Emphasis markers (text) without speaking the markers
Text selection and full note reading
Audio file saving and auto-embedding
Configurable text processing (codeblocks, emphasis)
Simple interface with keyboard shortcuts and context menu

To do

Add the possibility of merging voices at inference
Add custom syntax options for saved audio file names

Prerequisites

Python Environment:

Note: The plugin requires NumPy < 2.0.0 for compatibility with PyTorch.
espeak-ng 1.51 (required for phonemization)
- Windows:
  - Download espeak-ng-1.51.msi from espeak-ng releases
  - Install the MSI package Note: Version 1.51 is required. Other versions may not work correctly.
Kokoro TTS model and voices
- Download model file (kokoro-v0_19.pth) and voice files from Kokoro-82M repository

Installation

Manual Installation

Create a kokoro-tts folder in your vault's .obsidian/plugins/ directory
Copy these files to the kokoro-tts folder:
- main.js
- manifest.json
- styles.css
- kokoro_backend.py
- requirements.txt

Install Python dependencies:

# If using conda (recommended):
conda activate kokoro-tts
pip install -r requirements.txt

# If using system Python:
cd YOUR_VAULT/.obsidian/plugins/kokoro-tts
pip install -r requirements.txt

Enable the plugin in Obsidian's Community Plugins settings

From Community Plugins (not available)

Configuration

Open Settings → Kokoro TTS
Configure the required paths:
- Python executable path (e.g., python or full path to Python)
- Model path: Full path to kokoro-v0_19.pth
- Voices path: Full path to the directory containing voice files
- Backend script path: Path to kokoro_backend.py (default is in plugin directory)
Configure optional settings:
- Voice selection
- Audio settings (auto-play, save, embed)
- Text processing options

File Locations

The plugin expects the following structure:

YOUR_VAULT/
├── .obsidian/
│   └── plugins/
│       └── kokoro-tts/
│           ├── main.js
│           ├── manifest.json
│           ├── styles.css
│           ├── kokoro_backend.py    # Python backend script
│           └── requirements.txt     # Python dependencies
└── tts-audio/                      # Generated audio files (if enabled)

Usage

Commands

The plugin adds three commands that you can use:

Read Highlighted Text: Reads the currently selected text
Read Whole Page: Reads the entire current note
Stop Speech: Stops the current speech playback

Access these commands through:

Command Palette (Ctrl/Cmd + P)
Custom hotkeys (configurable in Settings → Hotkeys)
Context menu (right-click on text)

Special Text Voices

The plugin can use different voices for:

Quoted text (any type of quotation marks)
Emphasized text (text between asterisks)

Configure these options in Settings → Kokoro TTS → Special text voices.

Inline Voice Selection

The plugin supports dynamic voice selection using inline codes:

Use ktts followed by a voice name before quoted text:

kttsbella"Hello everyone!" → Spoken by Bella
kttsemma"Lovely weather!" → Spoken by Emma
kttsadam"Hi there!" → Spoken by Adam

Available voice codes:

US Voices: bella, sarah, adam, michael, nicole, sky
GB Voices: emma, isabella, george, lewis

Notes:

Any unrecognized voice code will use the default selected voice
This feature works independently of the "Special text voices" settings
Voice codes are case-sensitive

Audio Files

If audio saving is enabled:

Files are saved in the same folder as the note being read
Files are named with the format: notename_timestamp_chunk.wav
Files can be automatically embedded in notes at the cursor position
Standard Obsidian audio player is used for playback
Use the "Stop Speech" command or button to stop playback

Troubleshooting

Use ctrl+shift+I to open the Obsidian developer console to identify errors.

Support

For plugin issues: GitHub Issues
For Kokoro TTS issues: Kokoro Discord

Credits

Kokoro TTS by hexgrad
StyleTTS2 architecture by Li et al.
espeak-ng for phonemization

Name		Name	Last commit message	Last commit date
Latest commit History 79 Commits
.codegpt		.codegpt
res		res
src		src
.editorconfig		.editorconfig
.eslintignore		.eslintignore
.eslintrc		.eslintrc
.gitignore		.gitignore
.npmrc		.npmrc
README.md		README.md
esbuild.config.mjs		esbuild.config.mjs
generation-stats.png		generation-stats.png
kokoro_backend.py		kokoro_backend.py
manifest.json		manifest.json
package-lock.json		package-lock.json
package.json		package.json
requirements.txt		requirements.txt
styles.css		styles.css
tsconfig.json		tsconfig.json
version-bump.mjs		version-bump.mjs
versions.json		versions.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kokoro TTS Plugin for Obsidian

Features

To do

Prerequisites

Installation

Manual Installation

From Community Plugins (not available)

Configuration

File Locations

Usage

Commands

Special Text Voices

Inline Voice Selection

Audio Files

Troubleshooting

Support

Credits

About

Releases 5

Packages

Languages

Mithadon/obsidian-kokoro-tts-plugin

Folders and files

Latest commit

History

Repository files navigation

Kokoro TTS Plugin for Obsidian

Features

To do

Prerequisites

Installation

Manual Installation

From Community Plugins (not available)

Configuration

File Locations

Usage

Commands

Special Text Voices

Inline Voice Selection

Audio Files

Troubleshooting

Support

Credits

About

Resources

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages