Log for Standalone Faster-Whisper & Faster-Whisper-XXL
### Note: This is not a comprehensive log.
## r192.3.4 [XXL]:
New feature: Write intermediate files to temp folder. [ignored on dump]
New feature: Can take JSON files as input and generate subtitles from them according to the settings.
New alternative VAD method: `pyannote_v3` [the previous option of the same name was renamed to `pyannote_onnx_v3`]
New arg: `--nullify_non_speech` [for WIP experiments]
## r192.3.3 [XXL]:
CTranslate2 updated to v4.2.0
Transcription on CPU is up to ~26% faster.
## r192.3.2 [XXL]:
Various improvements and bugfixes to `--vad_alt_method`
## r192.3:
Bugfix: 'one_word' was broken in r192.2.
## r192.2:
`--max_line_width` and `--max_line_count` can now be used without `--sentence`.
## r192.1.1 [XXL]:
`--ff_mdx_kim2`: Preprocess audio with MDX23 Kim vocal v2 model.
`--vad_alt_method`: Alternative VAD methods - "silero_v3", "silero_v4", "pyannote_v3", "auditok", "webrtc".
## r192.1:
Bugfix: It was unnecessarily going through ffmpeg routines. [introduced in r189.1]
## r189.1:
Supports `distil-large-v3` model.
Switched auto prompt for Serbian to the Latin alphabet.
Auto-disable VAD when --clip_timestamps is in use.
JSON output has an additional 'text' key with all text aggregated.
Fixed --task=translate and --postfix bug.
--highlight_words now supports --max_line_width and --max_line_count
Changed patience default.
Adjusted the score algorithm of `--hallucination_silence_threshold`, and added new `--hallucination_silence_th_temp`.
Automatic pseudo-VAD threshold offsets for `large-v3`; can be disabled with `--v3_offsets_off`.
`--language_detection_threshold`
`--language_detection_segments`
`--ff_track` - Audio track selection.
`--ff_fc` - Front-Center channel selection.
`--ff_dump` - Additionally prevents deletion of some intermediate audio files.
`--vad_dump` - Writes VAD timings to srt file.
Updated PyInstaller to v6.5.0
Fix Linux exec issue with forward compatibility. v2
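The aggregated 'text' key noted in this release can be pictured with a minimal Python sketch. The `segments` layout and the `add_aggregated_text` helper are assumptions made for illustration; they are not taken from the tool's actual JSON schema or code.

```python
def add_aggregated_text(result: dict) -> dict:
    """Return a copy of `result` with a top-level 'text' key joining all segment texts."""
    # Assumed (hypothetical) layout: {"segments": [{"text": "..."}, ...]}
    pieces = [seg.get("text", "").strip() for seg in result.get("segments", [])]
    out = dict(result)
    out["text"] = " ".join(p for p in pieces if p)
    return out


# Hypothetical transcription result with two segments.
example = {
    "segments": [
        {"start": 0.0, "end": 1.2, "text": " Hello there."},
        {"start": 1.2, "end": 2.5, "text": " How are you?"},
    ]
}
aggregated = add_aggregated_text(example)
print(aggregated["text"])
```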
## r186.1:
Fix quirks with "The last progress update is passed to the title bar." [r160.11]
Fix Linux exec issue with forward compatibility.
## r182.2:
Bugfix: PR705 was incorporated a bit incorrectly.
Bugfix: Bug with the custom prompt presets and task "translate".
## r182.1:
Includes PR705 & PR706
Excludes PR694 [aka CUDA12]
Skip micro chunks.
New args:
`--hallucination_silence_threshold`
`--clip_timestamps`
Updated PyAV to v11.0.0 [ffmpeg 6.0]
Updated PyInstaller to v6.4.0
Updated CTranslate2 to v3.24.0
## r172.4:
Bugfix: With some CPUs rnndn filters were not working.
Improved: `--sentence` doesn't treat ellipses (...) as the end of a sentence, unless the ellipses are at the end of a segment.
Changed default of `--initial_prompt` to `auto`.
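The ellipsis rule for `--sentence` described in this release could look roughly like the sketch below. The `ends_sentence` helper and its punctuation set are hypothetical; the real implementation is not shown in this log.

```python
def ends_sentence(word: str, is_last_in_segment: bool) -> bool:
    """Hypothetical rule: decide whether `word` closes a sentence."""
    word = word.strip()
    if word.endswith("..."):
        # An ellipsis only closes a sentence when it also ends the segment.
        return is_last_in_segment
    return word.endswith((".", "!", "?"))


# An ellipsis in the middle of a segment keeps the sentence open.
print(ends_sentence("well...", is_last_in_segment=False))
# The same ellipsis as the last word of a segment closes it.
print(ends_sentence("well...", is_last_in_segment=True))
```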
## r172.3:
Better error handling for the audio filters.
## r172.2:
Increased "recursion" limit from 1000 to 10000.
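Assuming this entry refers to the Python interpreter's recursion limit (the log does not say so explicitly), the change would correspond to something like this sketch:

```python
import sys

# CPython's default recursion limit is usually 1000.
previous_limit = sys.getrecursionlimit()

# Raise the limit to the value named in this release.
sys.setrecursionlimit(10000)
print(previous_limit, "->", sys.getrecursionlimit())
```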
## r172.1:
Supports "Distil-Whisper" models:
`distil-large-v2`
`distil-medium.en`
`distil-small.en`
New args:
`--max_new_tokens`
`--chunk_length`
## r167.5:
Additional option for `--prompt_reset_on_no_end`
Various audio filters:
`--ff_dump`
`--ff_mp3`
`--ff_sync`
`--ff_rnndn_sh`
`--ff_rnndn_xiph`
`--ff_fftdn`
`--ff_tempo`
`--ff_gate`
`--ff_speechnorm`
`--ff_loudnorm`
`--ff_silence_suppress`
`--ff_lowhighpass`
## r167.4:
Bugfix: Failure when the number of CPU cores couldn't be detected.
Updated psutil to v5.9.7
## r167.3:
Bugfix: Illogical "Avoid computing higher temperatures on no_speech". [It could cause bad hallucination loops]
https://github.com/openai/whisper/pull/1903
https://github.com/SYSTRAN/faster-whisper/pull/625
## r167.2:
Fix: Commit #165 was missing in `r167.1` release.
## r167.1:
`large-v3` now uses the official model.
Updated PyInstaller to v6.3.0
Updated CTranslate2 to v3.23.0
## r160.12:
Segments triggered by the common hallucinations in hallucinations_list are no longer added to the prompt.
New parameters:
`--max_comma_cent`
`--min_dist_to_end`
## r160.11:
Bugfix: `--prompt_reset_on_temperature` was broken. Regression since r139.
Improved the default preset of `--initial_prompt`; there is also an alternative experimental `auto` preset.
Improved `--sentence` with an ignore-words list for words like `Mr.`, etc.
The last progress update is passed to the title bar.
New parameters:
`--prompt_max`
`--reprompt` [enabled by default]
`--prompt_reset_on_no_end` [enabled by default]
## r160.10:
Bugfix: The final line disappeared if the final word wasn't punctuated when using `--sentence`.
## r160.9:
Bugfix: The final word could disappear with some combinations of the new parameters from r160.8.
New parameter: `--standard_asia`
## r160.8:
Improved --skip to work with all output folders.
New parameters:
`--sentence`
`--standard`
`--max_comma`
`--max_gap`
`--max_line_width`
`--max_line_count`
## r160.7:
Bugfix: Wildcard input could fail if brackets were in a folder's name.
Improved `--one_word` with a second option.
Improved `--skip` to work with all output formats. If output is "all" - checks only "srt".
New `--check_files` argument.
Added a catch for invalid `audio` input.
Turns off word_timestamps if output format is "text".
Exposed options :
`--prefix`
`--suppress_blank`
`--without_timestamps`
`--max_initial_timestamp`
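The wildcard bugfix in this release concerns brackets in folder names. In Python's `glob` module, `[` and `]` start a character class, so a folder such as `show [2023]` is not matched literally. The sketch below shows one common remedy, `glob.escape`, applied to the folder part only. The `expand_wildcard` helper and the folder name are hypothetical, and this is not necessarily how the tool fixed the issue.

```python
import glob
import os
import tempfile


def expand_wildcard(folder: str, pattern: str) -> list[str]:
    """Expand `pattern` inside `folder`, treating the folder name literally."""
    # Escape only the folder part so "[" and "]" in its name are not wildcards.
    return sorted(glob.glob(os.path.join(glob.escape(folder), pattern)))


with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "show [2023]")
    os.makedirs(folder)
    for name in ("a.wav", "b.wav"):
        open(os.path.join(folder, name), "w").close()

    # Unescaped: "[2023]" is read as a character class, so the folder is missed.
    naive = glob.glob(os.path.join(folder, "*.wav"))
    # Escaped: the brackets are matched literally.
    found = [os.path.basename(p) for p in expand_wildcard(folder, "*.wav")]

print(len(naive), found)
```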
## r160.6:
Bugfix: Autodownload wasn't working for `large-v3` [ regression in r160.5 ]
## r160.5:
`large-v3` uses the fp16 model by default. [Removed other v3 options]
New switch: `--one_word`
Updated PyInstaller to v6.2.0
## r160.4:
Bugfix: -prompt=None wasn't working.
UTF-8 chars are now supported in SE's "console".
## r160.3:
'large-v3' model.
'Cantonese' language.
## r160.2:
Failed release with a code typo.
Was online for ~15 mins...
## r160.1:
Bugfix: Fixed bug on macOS -> https://github.com/Purfview/whisper-standalone-win/issues/86