We present audio examples from our model, IAFF-VC. To demonstrate any-to-any voice conversion, we consider two scenarios: seen-to-seen (S2S) and unseen-to-unseen (U2U). In each scenario, we present four converted utterances: male-to-male (M-M), female-to-female (F-F), male-to-female (M-F), and female-to-male (F-M). We also perform multi-utterance conversion using 3 samples, under the same two scenarios.
S2S: seen-to-seen; U2U: unseen-to-unseen
F: female; M: male
Single-utterance:
S2S:
F-F:
-
source:
-
target:
-
converted:
F-M:
-
source:
-
target:
-
converted:
M-M:
-
source:
-
target:
-
converted:
M-F:
-
source:
-
target:
-
converted:
U2U:
F-F:
-
source:
-
target:
-
converted:
F-M:
-
source:
-
target:
-
converted:
M-M:
-
source:
-
target:
-
converted:
M-F:
-
source:
-
target:
-
converted:
Multi-utterance:
S2S:
F-F:
-
source:
-
target:
-
converted:
F-M:
-
source:
-
target:
-
converted:
M-M:
-
source:
-
target:
-
converted:
M-F:
-
source:
-
target:
-
converted:
U2U:
F-F:
-
source:
-
target:
-
converted:
F-M:
-
source:
-
target:
-
converted:
M-M:
-
source:
-
target:
-
converted:
M-F:
-
source:
-
target:
-
converted: