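The following line from the VITS architecture shows the reparameterization trick in action:
TTS/TTS/tts/layers/vits/networks.py
Line 287 in d309f50
VITS uses a VAE, so the encoder predicts a mean and a log_scale (the log of the standard deviation, not of the variance). Both are predicted together by a layer with 2 x output_channels outputs, since we want output_channels values and each value needs its own mean and scale. We could sample directly from the resulting distribution, but the sampling step is not differentiable, so the errors could not be backpropagated through the computation graph.
Therefore, we equivalently compute mean + noise * exp(log_scale), where noise is drawn from a standard normal distribution. The randomness is confined to noise, so gradients can flow through mean and log_scale.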
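To make the trick concrete, here is a minimal PyTorch sketch. It is not the code from networks.py: the tensor sizes and the random `stats` tensor standing in for the encoder's projection output are assumptions made for illustration. It splits the 2 x output_channels prediction into mean and log_scale, draws a reparameterized sample, and checks that gradients reach both halves.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, out_channels, frames = 2, 4, 5

# Stand-in for the encoder's projection output: 2 * out_channels channels.
stats = torch.randn(batch, 2 * out_channels, frames, requires_grad=True)

# The first half of the channels is the mean, the second half is log_scale.
mean, log_scale = torch.split(stats, out_channels, dim=1)

# Reparameterized sample: all randomness lives in `noise`,
# so z is a differentiable function of mean and log_scale.
noise = torch.randn_like(mean)
z = mean + noise * torch.exp(log_scale)

# Backpropagate a dummy loss to confirm gradients reach both halves.
z.sum().backward()
grad_mean, grad_log_scale = torch.split(stats.grad, out_channels, dim=1)
```

Here the gradient of z with respect to mean is 1 everywhere, and with respect to log_scale it is noise * exp(log_scale). That is the path that plain sampling would not provide.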
Another place this is useful is in sampling for the reverse flows. In this code in VITS:
TTS/TTS/tts/models/vits.py
Line 1154 in c2d15cd
a latent tensor with the shape of z is initialized as random noise scaled by exp(logs_p) and shifted by m_p. This tensor is called z_p. It is then sent through the flows in reverse to get a predicted z, which is further sent to waveform_decoder.
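The sampling step described above can be sketched roughly as follows. This is a toy under stated assumptions, not the VITS implementation: `ToyAffineFlow` is a hypothetical invertible layer standing in for the real flow stack, the `m_p` and `logs_p` tensors are random placeholders for the prior statistics, and the waveform decoder is left out.

```python
import torch

torch.manual_seed(0)


class ToyAffineFlow(torch.nn.Module):
    """Hypothetical stand-in for the VITS flows: an invertible per-channel affine map."""

    def __init__(self, channels):
        super().__init__()
        self.shift = torch.nn.Parameter(torch.randn(1, channels, 1))
        self.log_gain = torch.nn.Parameter(0.1 * torch.randn(1, channels, 1))

    def forward(self, x, reverse=False):
        if reverse:
            # Exact inverse of the forward direction below.
            return (x - self.shift) * torch.exp(-self.log_gain)
        return x * torch.exp(self.log_gain) + self.shift


channels, frames = 4, 6

# Placeholder prior statistics (in VITS these come from the text side).
m_p = torch.randn(1, channels, frames)
logs_p = 0.1 * torch.randn(1, channels, frames)

# Sample z_p: noise scaled by exp(logs_p) and shifted by m_p.
z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p)

flow = ToyAffineFlow(channels)
with torch.no_grad():
    z = flow(z_p, reverse=True)   # reverse pass: z_p -> predicted z
    z_p_roundtrip = flow(z)       # forward pass recovers z_p
```

Because the flow is invertible, running it forward on z recovers z_p. In the real model the predicted z would then go on to waveform_decoder.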
This is just for anyone interested in how the math relates to the code.