Alemon (also known as: アレモン, 柠檬君) DiffSinger is a high-quality AI singing voice synthesis model based on DiffSinger. It uses OpenUtau as its renderer.
About DiffSinger:
https://github.com/openvpi/DiffSinger
About OpenUtau:
https://github.com/stakira/OpenUtau
- Before installation, make sure that the latest NSF-HiFiGAN vocoder for DiffSinger is already installed in OpenUtau. About this vocoder: https://github.com/openvpi/vocoders
- Download a .zip file from the Releases page of this repository.
- Drag the .zip file into the main UI window of OpenUtau, then follow the prompts to finish the installation.
- In OpenUtau's multitrack view, select the installed Alemon DiffSinger model as the active singer.
- Select DIFFS as the active phonemizer. Alemon DiffSinger uses a custom dictionary that comes with the model package, which is only supported by DIFFS.
- Select DiffSinger as the active engine.
Click on the selected track to create a new part. Double-click on the part on the selected track to open the piano roll interface.
Using a DiffSinger model in OpenUtau is similar to using a UTAU voicebank in OpenUtau. For a detailed guide, please refer to the OpenUtau Wiki.
The expression (parameters) panel is located at the bottom of the piano roll window. Switch the active expression and edit expression settings in the lower left corner. Between the piano roll and the expression panel is the phoneme panel, where you can drag to edit phoneme length and start position, and edit phoneme names.
Alemon DiffSinger 2.1 and later versions support the following curve parameters:
Name | Abbrv. | Description | Range | Default |
---|---|---|---|---|
dynamics(curve) | DYN | Volume control. The range corresponds to -24dB~12dB. | -240~120 | 0 |
velocity(curve) | VELC | Velocity control. | 0~200 | 100 |
pitch deviation(curve) | PIT | Pitch deviation. Can be drawn directly in the piano roll window using Pitch Tool | -1200~1200 | 0 |
gender(curve) | GENC | Gender parameter control. | -100~100 | 0 |
breathiness(curve) | BREC | Breathiness control. | -100~100 | 0 |
tension(curve) | TENC | Tension. Not compatible with VOIC in the same model package. Please refer to the model file name. | -100~100 | 0 |
voicing(curve) | VOIC | Voicing. Not compatible with TENC in the same model package. Please refer to the model file name. | 0~200 (the default maximum is 100; click the gear in the lower left corner to set a custom range) | 100 |
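The DYN row states only that the range -240~120 corresponds to -24dB~12dB. As a minimal sketch, assuming the mapping between those endpoints is linear (10 curve units per dB) and that the decibel figure refers to amplitude, the conversion could look like the following. The function names `dyn_to_db` and `db_to_gain` are hypothetical and are not part of OpenUtau or DiffSinger.

```python
def dyn_to_db(dyn: float) -> float:
    """Map a DYN curve value (-240..120) to decibels (-24..12 dB).

    Assumption: the mapping is linear between the documented endpoints.
    """
    if not -240 <= dyn <= 120:
        raise ValueError("DYN is expected to lie in -240..120")
    return dyn / 10.0


def db_to_gain(db: float) -> float:
    """Convert decibels to a linear amplitude ratio (20*log10 convention, assumed)."""
    return 10.0 ** (db / 20.0)


# Show the two endpoints and the default value from the table.
for dyn in (-240, 0, 120):
    db = dyn_to_db(dyn)
    print(f"DYN {dyn:>4} -> {db:+.1f} dB -> amplitude x{db_to_gain(db):.3f}")
```

Under these assumptions the default value 0 leaves the volume unchanged, and each step of 60 curve units changes the level by 6 dB, which is roughly a doubling or halving of amplitude.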
Alemon DiffSinger 2.1 and later versions support automatic pitch. Select the notes to be rendered, then, in the Batch Edits menu, select Notes->Load rendered pitch to render automatic pitch.
DiffSinger renders automatic pitch one sentence at a time. A sentence is a run of notes whose phonemes are continuously connected. If you select only some of the notes in a sentence for automatic pitch rendering, the pitch of the entire sentence is re-rendered.
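The per-sentence behavior can be illustrated with a small sketch. It assumes, for simplicity, that "continuously connected" means each note starts exactly where the previous one ends, and that notes are sorted by start time. The `Note` class and both functions are hypothetical illustrations, not OpenUtau's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Note:
    start: int  # position in ticks (hypothetical unit)
    end: int


def split_sentences(notes):
    """Group note indices into sentences: runs with no gap between neighbors."""
    sentences = []
    for i, note in enumerate(notes):
        if sentences and notes[i - 1].end == note.start:
            sentences[-1].append(i)  # connected to the previous note
        else:
            sentences.append([i])    # a gap (or the first note) starts a new sentence
    return sentences


def notes_rerendered(notes, selected):
    """Return indices of every note whose sentence contains a selected note."""
    affected = []
    for sentence in split_sentences(notes):
        if any(i in selected for i in sentence):
            affected.extend(sentence)
    return affected


# Two sentences separated by a rest between tick 960 and tick 1200.
notes = [Note(0, 480), Note(480, 960), Note(1200, 1680), Note(1680, 2160)]
print(notes_rerendered(notes, {1}))  # prints [0, 1]
```

Selecting only the second note still affects the first, because both belong to the same sentence. The notes after the rest are left alone.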
In Alemon DiffSinger 2.1 and later versions, the Native versions can synthesize Mandarin Chinese and English, while the Multi versions can synthesize Mandarin Chinese, English, Japanese and French. Please refer to the model file name for the applicable languages.
The default language is Mandarin Chinese.
Recommended lyrics input method:
- Chinese: Pinyin (recommended), Chinese characters
- English: English spelling. Note: When an English spelling conflicts with a Pinyin spelling, Pinyin takes precedence. Prefix the word with EN to force the English pronunciation. (Example: sing, ENsong)
In versions 2.1 and later, some English words support multiple pronunciations, which can be selected with the (1) suffix. (Example: record(1))
- Japanese (not supported in some versions): Hiragana
- French (not supported in some versions): FR+French spelling (Example: FRchanter)
Phonemes of all languages coexist in a fusion dictionary, which can be directly accessed by entering phonemes in "[]". (Example: [s IH NG])
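The input conventions above can be summarized as a small classifier. This is only a reading aid, not how the DIFFS phonemizer is implemented: the function name, the returned labels, and the case-sensitive prefix check are all assumptions made for illustration.

```python
from typing import Tuple


def classify_lyric(lyric: str) -> Tuple[str, str]:
    """Return (kind, payload) for one note lyric, following the conventions above."""
    # "[...]" addresses phonemes in the fusion dictionary directly.
    if lyric.startswith("[") and lyric.endswith("]"):
        return ("phonemes", lyric[1:-1].strip())
    # "EN" + spelling forces the English reading.
    if lyric.startswith("EN") and len(lyric) > 2:
        return ("english", lyric[2:])
    # "FR" + spelling selects French (Multi versions only).
    if lyric.startswith("FR") and len(lyric) > 2:
        return ("french", lyric[2:])
    # Anything else goes through the default lookup, where Pinyin wins conflicts.
    return ("default", lyric)


for example in ("sing", "ENsong", "FRchanter", "[s IH NG]", "record(1)"):
    print(example, "->", classify_lyric(example))
```

In this sketch, "sing" and "record(1)" fall through to the default lookup, while "ENsong", "FRchanter", and "[s IH NG]" are routed by their prefix or brackets.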
Mandarin Chinese
Phoneme label | Language | Pinyin |
---|---|---|
E | zh | Vowel part of "ye" |
En | zh | Vowel part of "yan" |
a | zh | a |
ai | zh | ai |
an | zh | an |
ang | zh | ang |
ao | zh | ao |
e | zh | e |
ei | zh | ei |
en | zh | en |
eng | zh | eng |
er | zh | er |
i | zh | i |
i0 | zh | i0 |
ia | zh | ia |
ian | zh | ian |
iang | zh | iang |
iao | zh | iao |
ie | zh | ie |
in | zh | in |
ing | zh | ing |
iong | zh | iong |
ir | zh | ir |
iu | zh | iu |
o | zh | o |
ong | zh | ong |
ou | zh | ou |
u | zh | u |
ua | zh | ua |
uai | zh | uai |
uan | zh | uan |
uang | zh | uang |
ui | zh | ui |
un | zh | un |
uo | zh | uo |
v | zh | ü |
van | zh | üan |
ve | zh | üe |
vn | zh | ün |
b | zh | b |
c | zh | c |
ch | zh | ch |
d | zh | d |
f | zh/ja/en/fr | f |
g | zh | g |
h | zh | h |
j | zh | j |
k | zh | k |
l | zh | l |
m | zh/ja/en/fr | m |
n | zh | n |
p | zh | p |
q | zh | q |
r | zh | r |
s | zh/ja/en/fr | s |
sh | zh | sh |
t | zh | t |
w | zh | w |
x | zh | x |
y | zh | y |
z | zh | z |
zh | zh | zh |
English (GenAm)
Phoneme label | Language | IPA |
---|---|---|
AA | en | ɑ |
AE | en | æ |
AH | en | ʌ |
AO | en | ɔ |
AW | en | aʊ |
AX | en | ɚ |
AY | en | aɪ |
EH | en | ɛ |
ER | en | ər |
EY | en | eɪ |
IH | en | ɪ |
IY | en | i |
OW | en | oʊ |
OY | en | ɔɪ |
UH | en | ʊ |
UW | en | u |
B | en | b |
CH | en | tʃ |
D | en | d |
DH | en | ð |
DX | en | ɾ |
DR | en | dr |
f | zh/ja/en/fr | f |
HH | en/ja | h |
JH | en | dʒ |
L | en | l |
m | zh/ja/en/fr | m |
N | en | n |
NG | en | ŋ |
P | en | p |
R | en | r |
s | zh/ja/en/fr | s |
SH | en | ʃ |
T | en | t |
TH | en | θ |
TR | en | tr |
W | en | w |
Y | en | j |
Z | en | z |
ZH | en | ʒ |
Japanese
Phoneme label | Language | Romaji |
---|---|---|
A | ja | a |
I | ja | i |
U | ja | u |
E1 | ja | e |
O | ja | o |
I. | ja | i (devoiced) |
U. | ja | u (devoiced) |
b1 | ja | b |
d1 | ja | d |
G | ja | g |
HH | en/ja | h |
J | ja | j |
K | ja | k |
m | zh/en/ja/fr | m |
n | zh/ja | n (consonant) |
p1 | ja | p |
t1 | ja | t |
V | ja | v |
y1 | ja | y |
z1 | ja | z |
ch1 | ja | ch |
F | ja | f |
r1 | ja | r |
s | zh/en/ja/fr | s |
sh1 | ja | sh |
ts | ja | ts |
nn | ja | n (moraic) |
ky | ja | ky |
gy | ja | gy |
ty | ja | ty |
dy | ja | dy |
hy | ja | hy |
by | ja | by |
my | ja | my |
ny | ja | ny |
py | ja | py |
ry | ja | ry |
French
Phoneme label | Language | IPA |
---|---|---|
ah4 | fr | a/ɑ |
eh4 | fr | e |
ae4 | fr | ɛ/ɛ: |
ee | fr | ə/ø |
oe | fr | œ |
ih4 | fr | i |
oh | fr | o |
oo | fr | ɔ |
ou4 | fr | u |
uh4 | fr | y |
en4 | fr | ɑ̃ |
in4 | fr | ɛ̃/œ̃ |
un4 | fr | œ̃ |
on4 | fr | ɔ̃ |
uy | fr | ɥ |
b4 | fr | b |
d4 | fr | d |
f | zh/en/fr | f |
g4 | fr | g |
k4 | fr | k |
l4 | fr | l |
m | zh/en/ja/fr | m |
n4 | fr | n |
p4 | fr | p |
r4 | fr | ʁ |
s | zh/en/ja/fr | s |
t4 | fr | t |
v4 | fr | v |
j4 | fr | ʒ |
y4 | fr | j |
w4 | fr | w |
ng4 | fr | ŋ |
rx | fr | r |
Special symbols:
- SP (Space)
- AP (Aspiration)
- CL (Japanese sokuon stop)
Voice Provider: KM
Data processing: Hisaka (hrukalive)
Data labeling: UG, Hisaka, Vivi, KM
Training: Hisaka
Character design: UG
Splash art: UG
The following public and private databases were used as augmentation data during the development of this model:
The use of the above databases has been authorized by the corresponding rights holders.