[FINETUNE] [INFO] Initializing output directory: C:\Users\User\code\alltalk_tts\finetune\newvoice
[FINETUNE] [MODEL] Using device: cuda
[FINETUNE] [GPU] GPU Memory Status:
[FINETUNE] [GPU] Total: 12282.00 MB
[FINETUNE] [GPU] Used: 1282.73 MB
[FINETUNE] [GPU] Free: 10999.27 MB
[FINETUNE] [MODEL] Loading Whisper model: large-v2
[FINETUNE] [MODEL] Using mixed precision
[FINETUNE] [MODEL] Initializing Silero VAD
Using cache found in C:\Users\User/.cache\torch\hub\snakers4_silero-vad_master
[FINETUNE] [GPU] GPU Memory Status:
[FINETUNE] [GPU] Total: 12282.00 MB
[FINETUNE] [GPU] Used: 10870.58 MB
[FINETUNE] [GPU] Free: 1411.42 MB
[FINETUNE] [INFO] Using existing language setting
[FINETUNE] [AUDIO] Found 1 audio files to process
[FINETUNE] [INFO] Processing: sample1
[FINETUNE] [AUDIO] Original audio duration: 304.25s
[FINETUNE] [AUDIO] Processing with VAD
[FINETUNE] [SEG] Merged 0 segments into 22 segments with mid-range preference
[FINETUNE] [SEG] VAD processing: 22 original segments, 22 after merging
[FINETUNE] [SEG] Merged 0 segments into 22 segments with mid-range preference
[FINETUNE] [SEG] Merged 0 short segments
<...>
[FINETUNE] [AUDIO] Audio Processing Statistics:
[FINETUNE] [AUDIO] Total segments: 22
[FINETUNE] [AUDIO] Average duration: 14.42s
[FINETUNE] [AUDIO] Segments under minimum: 0
[FINETUNE] [AUDIO] Segments over maximum: 3
[FINETUNE] [DATA] Processing metadata and handling duplicates
[FINETUNE] [DUP] Found 15 files with multiple transcriptions
[FINETUNE] [DUP] wavs/sample1_00000020.wav: 4 occurrences
<...>
[FINETUNE] [DUP] Re-transcribing duplicate files to get best transcription
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000020.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000003.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000004.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000014.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000018.wav
Traceback (most recent call last):
File "C:\Users\User\code\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\finetune.py", line 1367, in format_audio_list
best_transcriptions = handle_duplicates(
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\finetune.py", line 1920, in handle_duplicates
"confidence": sum(s.get("confidence", 0) for s in result["segments"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
Traceback (most recent call last):
File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
data = await self.postprocess_data(block_fn, result["prediction"], state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1717, in postprocess_data
self.validate_outputs(block_fn, predictions) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1691, in validate_outputs
raise ValueError(
ValueError: An event handler (preprocess_dataset) didn't receive enough output values (needed: 6, received: 3).
Wanted outputs:
[<gradio.components.label.Label object at 0x00000218E74B2950>, <gradio.components.textbox.Textbox object at 0x00000218E7A466D0>, <gradio.components.textbox.Textbox object at 0x00000218E74972D0>, <gradio.components.textbox.Textbox object at 0x00000218E2B38A10>, <gradio.components.textbox.Textbox object at 0x00000218E8CB5690>, <gradio.components.textbox.Textbox object at 0x00000218E8CCA110>]
Received outputs:
["The data processing was interrupted due to an error!! Please check the console to verify the full error message!
Error summary: Traceback (most recent call last):
File "C:\Users\User\code\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\finetune.py", line 1367, in format_audio_list
best_transcriptions = handle_duplicates(
^^^^^^^^^^^^^^^^^^
File "C:\Users\User\code\alltalk_tts\finetune.py", line 1920, in handle_duplicates
"confidence": sum(s.get("confidence", 0) for s in result["segments"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
", "", ""]
Desktop (please complete the following information):
AllTalk was updated: Jan 8th 2025 (fresh setup)
Custom Python environment: no
Text-generation-webUI was updated: Jan 8th 2025 (fresh setup)
Apologies for the late reply. I believe I have a fix for this, which I will apply at some point; however, I am currently travelling for a family emergency. I will get back to this when I can.
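For anyone hitting this before that fix lands, here is a minimal sketch of one possible guard. It is not the maintainer's fix. It assumes, based only on the traceback, that `handle_duplicates` averages the per-segment `"confidence"` values by dividing by `len(result["segments"])`, and that Whisper returned an empty `segments` list for one of the re-transcribed clips. The helper name `mean_segment_confidence` and the `default` parameter are hypothetical and do not exist in `finetune.py`.

```python
def mean_segment_confidence(segments, default=0.0):
    """Average the "confidence" field across transcription segments.

    Hypothetical helper: returns `default` instead of raising
    ZeroDivisionError when the segment list is empty or None.
    """
    if not segments:
        # No segments were produced for this clip, so there is nothing to average.
        return default
    return sum(s.get("confidence", 0) for s in segments) / len(segments)


# Example: an empty result no longer raises.
result = {"segments": []}
print(mean_segment_confidence(result["segments"]))
```

With a guard like this, a clip that yields no segments would simply score the default confidence, so any other transcription of the same file would be preferred over it.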
diagnostics.log
While finetuning with AllTalk's .\start_finetune.bat, dataset preprocessing fails with a ZeroDivisionError in handle_duplicates while duplicate files are being re-transcribed (see the log).
Steps to Reproduce:
start_finetune.bat
1 sample file uploaded:
ffmpeg output:
Text/logs