Generation still taking an hour on GPUs w/12 GB using quantized models #22
Comments
The warnings should be ignored. I am not sure how Gemini managed to extract this info: if you generate only two segments, stage 1 should have only two substages (here there are 4, according to Gemini). On the other hand, there are probably more substages in stage 2.
I can confirm similar overall speeds on an RTX 3060 12GB, using profile 3.
Hi, general question. Would this repo still get a speedup on beefy GPUs like an A100?
Oh I think it was hallucinating 😆 but I did get some value out of the analysis.
--Stage 1.1: Generating Sequence 1 out of 4
---Stage 1.2: Generating Sequence 2 out of 4
---Stage 1.3: Generating Sequence 3 out of 4
---Stage 1.4: Generating Sequence 4 out of 4
---Stage 2.1: Sampling Vocal track
Segment 1 / 5
['./output\\stage2\\atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_vtrack.npy', './output\\stage2\\atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_itrack.npy']
Stage 2 DONE.
Processing ./output\stage2\atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_vtrack.npy
Compressed shape: (8, 5398)
Decoded in 0.07s (1579.49x RTF)
Saved: ./output\vocoder\stems\vtrack.mp3
Processing ./output\stage2\atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_itrack.npy
Compressed shape: (8, 5398)
Decoded in 0.07s (1636.89x RTF)
Saved: ./output\vocoder\stems\itrack.mp3
Created mix: ./output\vocoder\mix\atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_mixed.mp3
Successfully created 'atmospheric-space-rock-male-vocals_tp0@93_T1@0_rp1@2_maxtk3000_d95c6f5a-d656-40b0-866d-58127bed3ab0_mixed.mp3' with matched low-frequency energy.
Unfortunately, the information related to the duration is missing (I need the last "tokens:" line of each sequence). Anyway, according to your log you are generating 4 segments, which means a 2-minute song if each segment has 3000 tokens. If that's the case, you are probably running out of VRAM, which triggers swapping between CPU memory and VRAM, and that is quite slow. Please check that you generate only 2 segments of 3000 tokens.
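For anyone who wants to run that check without an LLM, here is a minimal Python sketch that reads the segment count straight from a saved console log and gives a rough audio length per track. It relies only on the two line formats visible in the log quoted in this thread ("Generating Sequence N out of M" and "Decoded in Xs (Yx RTF)"). The function names are made up for illustration, and it assumes "Yx RTF" means Y seconds of audio per second of decoding, which the thread does not confirm.

```python
import re

# Stage-1 progress lines, as they appear in the log quoted in this thread.
STAGE1_RE = re.compile(r"Stage 1\.(\d+): Generating Sequence (\d+) out of (\d+)")
# Vocoder lines, e.g. "Decoded in 0.07s (1579.49x RTF)".
DECODE_RE = re.compile(r"Decoded in ([\d.]+)s \(([\d.]+)x RTF\)")


def count_stage1_segments(log_text):
    """Return the 'out of M' total reported by stage 1, or None if no such line exists."""
    totals = {int(m.group(3)) for m in STAGE1_RE.finditer(log_text)}
    return max(totals) if totals else None


def estimate_audio_seconds(log_text):
    """Estimate decoded audio length per track as decode_time * RTF.

    Assumption: "Yx RTF" means Y seconds of audio per second of decoding.
    The decode time is printed with two decimals, so the result is approximate.
    """
    return [float(m.group(1)) * float(m.group(2)) for m in DECODE_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = """\
---Stage 1.3: Generating Sequence 3 out of 4
---Stage 1.4: Generating Sequence 4 out of 4
Decoded in 0.07s (1579.49x RTF)
"""
    print("stage 1 segments:", count_stage1_segments(sample))
    print("approx. seconds per track:", estimate_audio_seconds(sample))
```

On the log posted above this reports 4 segments and roughly 110 seconds of audio per track, which is in line with the 2-minute estimate. Since the decode time is rounded to 0.07s, the duration figure is only good to within several seconds.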
Console output is below. Using defaults of 2 segments w/3000 tokens. This is repeatable using full, 12GB, and 10GB quantized models on my 3060 w/12GB VRAM.
Any ideas if the warnings here indicate an issue worth fixing, or is this normal?
Update: another user on Pinokio is seeing the same with a 4070s that also has 12GB VRAM. Discord link.
Had Gemini parse the console output to give me timing, which surpassed my expectations. :)