Division by zero error when generating dataset as part of .\start_finetune.bat #489

Open
raza-qazi opened this issue Jan 10, 2025 · 1 comment

raza-qazi commented Jan 10, 2025

diagnostics.log

During the finetuning process started by AllTalk's .\start_finetune.bat, the "Step 1 - Create dataset" action fails with the ZeroDivisionError shown in the logs below.

Steps to Reproduce:

  1. Run start_finetune.bat
  2. Upload New Audio Sample
  3. Set a project name
  4. Whisper Model: large-v3 and large-v2 (both fail)
  5. Model Precision: Mixed
  6. Dataset Language: en
  7. Evaluation Data Split: 15
  8. BPE Tokenizer: Enabled
  9. VAD: Enabled
  10. Min/Max audio length: default and 4-20 seconds (both fail)
  11. Click "Step 1 - Create dataset"
  12. The error appears in the log

1 sample file uploaded:
ffmpeg output:

Input #0, wav, from 'sample1.wav':
  Duration: 00:05:04.25, bitrate: 705 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s

Text/logs

[FINETUNE] [INFO] Initializing output directory: C:\Users\User\code\alltalk_tts\finetune\newvoice
[FINETUNE] [MODEL] Using device: cuda
[FINETUNE] [GPU] GPU Memory Status:
[FINETUNE] [GPU] Total: 12282.00 MB
[FINETUNE] [GPU] Used:  1282.73 MB
[FINETUNE] [GPU] Free:  10999.27 MB
[FINETUNE] [MODEL] Loading Whisper model: large-v2
[FINETUNE] [MODEL] Using mixed precision
[FINETUNE] [MODEL] Initializing Silero VAD
Using cache found in C:\Users\User/.cache\torch\hub\snakers4_silero-vad_master
[FINETUNE] [GPU] GPU Memory Status:
[FINETUNE] [GPU] Total: 12282.00 MB
[FINETUNE] [GPU] Used:  10870.58 MB
[FINETUNE] [GPU] Free:  1411.42 MB
[FINETUNE] [INFO] Using existing language setting
[FINETUNE] [AUDIO] Found 1 audio files to process
[FINETUNE] [INFO] Processing: sample1
[FINETUNE] [AUDIO] Original audio duration: 304.25s
[FINETUNE] [AUDIO] Processing with VAD
[FINETUNE] [SEG] Merged 0 segments into 22 segments with mid-range preference
[FINETUNE] [SEG] VAD processing: 22 original segments, 22 after merging
[FINETUNE] [SEG] Merged 0 segments into 22 segments with mid-range preference
[FINETUNE] [SEG] Merged 0 short segments
<...>
[FINETUNE] [AUDIO] Audio Processing Statistics:
[FINETUNE] [AUDIO] Total segments: 22
[FINETUNE] [AUDIO] Average duration: 14.42s
[FINETUNE] [AUDIO] Segments under minimum: 0
[FINETUNE] [AUDIO] Segments over maximum: 3
[FINETUNE] [DATA] Processing metadata and handling duplicates
[FINETUNE] [DUP] Found 15 files with multiple transcriptions
[FINETUNE] [DUP] wavs/sample1_00000020.wav: 4 occurrences
<...>
[FINETUNE] [DUP] Re-transcribing duplicate files to get best transcription
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000020.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000003.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000004.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000014.wav
[FINETUNE] [DUP] Re-transcribing wavs/sample1_00000018.wav
Traceback (most recent call last):
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
    pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
                                                       ^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 1367, in format_audio_list
    best_transcriptions = handle_duplicates(
                          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 1920, in handle_duplicates
    "confidence": sum(s.get("confidence", 0) for s in result["segments"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    data = await self.postprocess_data(block_fn, result["prediction"], state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1717, in postprocess_data
    self.validate_outputs(block_fn, predictions)  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1691, in validate_outputs
    raise ValueError(
ValueError: An event handler (preprocess_dataset) didn't receive enough output values (needed: 6, received: 3).
Wanted outputs:
    [<gradio.components.label.Label object at 0x00000218E74B2950>, <gradio.components.textbox.Textbox object at 0x00000218E7A466D0>, <gradio.components.textbox.Textbox object at 0x00000218E74972D0>, <gradio.components.textbox.Textbox object at 0x00000218E2B38A10>, <gradio.components.textbox.Textbox object at 0x00000218E8CB5690>, <gradio.components.textbox.Textbox object at 0x00000218E8CCA110>]
Received outputs:
    ["The data processing was interrupted due to an error!! Please check the console to verify the full error message! 
 Error summary: Traceback (most recent call last):
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 3866, in preprocess_dataset
    pd_train_meta, pd_eval_meta, pd_audio_total_size = format_audio_list(
                                                       ^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 1367, in format_audio_list
    best_transcriptions = handle_duplicates(
                          ^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\code\alltalk_tts\finetune.py", line 1920, in handle_duplicates
    "confidence": sum(s.get("confidence", 0) for s in result["segments"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
", "", ""]
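
One possible reading of the first traceback, sketched here for illustration only: the `"confidence"` expression at line 1920 of `handle_duplicates` presumably continues with a division by `len(result["segments"])`, and a re-transcription returned no segments. The continuation line is not visible in the traceback, so this is an assumption. The helper name `mean_confidence` and the sample dictionaries are hypothetical and are not taken from finetune.py.

```python
from typing import Any, Dict, List


def mean_confidence(segments: List[Dict[str, Any]], default: float = 0.0) -> float:
    """Average the per-segment "confidence" values, tolerating an empty list."""
    if not segments:
        # Nothing was transcribed, so there is nothing to average.
        # Returning a default avoids dividing by len(segments) == 0.
        return default
    total = sum(s.get("confidence", 0) for s in segments)
    return total / len(segments)


# Hypothetical results shaped like the `result` dict seen in the traceback.
empty_result = {"segments": []}
normal_result = {"segments": [{"confidence": 0.8}, {"confidence": 0.6}, {}]}

print(mean_confidence(empty_result["segments"]))   # falls back to the default
print(mean_confidence(normal_result["segments"]))  # mean over three segments
```

Whether a default of 0.0 is the right fallback depends on how the best transcription is chosen afterwards; skipping the file could be just as reasonable.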

Desktop (please complete the following information):
AllTalk was updated: Jan 8th 2025 (fresh setup)
Custom Python environment: no
Text-generation-webUI was updated: Jan 8th 2025 (fresh setup)
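
A note on the second traceback in the logs: Gradio reports that `preprocess_dataset` returned 3 values on its error path while 6 output components were expected, which hides the original error behind a `ValueError`. A minimal sketch of an error path that always returns the expected number of values is below. It does not import Gradio, and the names `N_OUTPUTS`, `error_outputs`, `safe_handler`, and `failing_step` are hypothetical, not AllTalk's actual code.

```python
import traceback
from typing import Callable, Tuple

# Assumption: six output components, as stated in the Gradio error message.
N_OUTPUTS = 6


def error_outputs(message: str, n_outputs: int = N_OUTPUTS) -> Tuple[str, ...]:
    """Return the message followed by empty strings, n_outputs items in total."""
    return (message,) + ("",) * (n_outputs - 1)


def safe_handler(work: Callable[[], Tuple[str, ...]]) -> Tuple[str, ...]:
    """Run `work`; on failure, return a full-length tuple carrying the traceback."""
    try:
        return work()
    except Exception:
        summary = "The data processing was interrupted due to an error:\n"
        return error_outputs(summary + traceback.format_exc())


def failing_step() -> Tuple[str, ...]:
    # Stand-in for the dataset step that raised ZeroDivisionError.
    return (str(1 / 0),) * N_OUTPUTS


result = safe_handler(failing_step)
print(len(result))  # one value per output component
```

With the tuple length matching the number of outputs, the UI would show the ZeroDivisionError summary directly instead of the secondary validation error.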

Owner

erew123 commented Jan 19, 2025

Hi @raza-qazi

Apologies for the late reply. I believe I have a fix for this, which I will apply at some point; however, I am currently travelling for a family emergency. I will get back to this when I can.
