#18 1) adding audio transcription #43
Thank you so much for the PR <3
I tried to further clean up the code and the README. I definitely need your eyes on all of this! I still have a silly bug in the UI: I display an icon and the text "record" as the action button to record the audio. When you click it, I switch the CSS classes, but when I click again to stop the recording, the icon disappears. You may be cleverer than me!
Thank you for the work @ndrean! I'll take a look at it tomorrow! ❤️
Then the tests.....😥
OK, I found my mistake and corrected it. No more problems with the audio recording button!
Lovely, thanks a ton @ndrean !
@LuchoTurtle The most interesting part is the next one; this one is "just" adding the JS audio capture and the Speech-to-Text model, closely following the first post below. I should cite my sources:
I've made changes to the README. I kept most of the info; it was simply formatting and some writing style changes. I'm now trying to get this to work but since the
This makes sense :D
However, inside `lib/app/models.ex`, we have a module that is testable and manages all the models that need to be downloaded for image captioning.
I believe we should do the same thing for the whisper models. I can rename `models.ex` -> `caption_models.ex` and use the same template for `speech_to_text.ex` (which we can rename to `semantic_search_models.ex`).
Doing this makes it much easier to test and mock the models (since we've already done so with `models.ex`).
Yes, totally, I'll wait for your clever changes.
I've tried it, and the recording and audio transcription seem to be working smoothly!
Thank you!
I'm marking this as "request changes" but I'll gladly do them.
The only thing missing (besides the feedback I've given) is the tests. I can take care of both, if you want :)
Ah yes, my version was updated via brew, so I had to update the code. It is not disruptive (besides the order of the arguments in
As for the tests, I started by adding a hard-coded audio file, using it as input to the model, and checking that the result is what you expect. I did not commit this, and I really prefer, trust, and am impressed by your style of dealing with tests. Room for improvement for me, definitely. So yes! Please go on!
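For reference, the test idea described above (feed a fixed audio file to the model and check the text) could be sketched roughly as follows. This is only an illustration: `TranscriptionSketch`, the fixture path, and the `%{chunks: [%{text: ...}]}` output shape are assumptions made for the example, and the model is replaced by a stub function so the sketch runs without downloading anything.

```elixir
ExUnit.start()

defmodule TranscriptionSketch do
  # Hypothetical wrapper, not code from this PR.
  # `transcribe_fun` stands in for running the speech-to-text model on a file path.
  # The `%{chunks: [%{text: ...}]}` shape is an assumption for this sketch.
  def transcribe(path, transcribe_fun) do
    path
    |> transcribe_fun.()
    |> Map.fetch!(:chunks)
    |> Enum.map_join(& &1.text)
    |> String.trim()
  end
end

defmodule TranscriptionSketchTest do
  use ExUnit.Case, async: true

  test "joins the chunks returned for a fixture file" do
    # Stub in place of the real model: it only accepts the (hypothetical) fixture path.
    stub = fn "test/fixtures/hello.wav" ->
      %{chunks: [%{text: " Hello"}, %{text: " world."}]}
    end

    assert TranscriptionSketch.transcribe("test/fixtures/hello.wav", stub) == "Hello world."
  end
end
```

Passing the model call in as a function keeps the test fast and deterministic; a separate, slower test could run the real model against the same fixture.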
@LuchoTurtle One point I added in 2) that might be worth adding already in 1) is a Spinner component, since the same code is used in two places (caption and transcription).
defmodule AppWeb.Spinner do
  use Phoenix.Component

  attr :spin, :boolean, default: false

  def spin(assigns) do
    ~H"""
    <div :if={@spin} role="status">
      <div class="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400 ">
        <div class="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white">
        </div>
      </div>
    </div>
    """
  end
end
You can use it in place of the current code with:
<div class="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl">
  <span>Description: </span>
  <!-- Spinner -->
  <AppWeb.Spinner.spin spin={@running?} />
  <%= if @label do %>
    <span class="text-gray-700 font-light"><%= @label %></span>
  <% else %>
    <span class="text-gray-300 font-light">Waiting for image input.</span>
  <% end %>
</div>
# and
...
<audio id="audio" controls></audio>
<AppWeb.Spinner.spin spin={@speech_spin} />
@LuchoTurtle Help me with Git. Should I merge your commit into my branch? If so, should I just merge it?
@ndrean Check https://adiati.com/git-how-to-fetch-a-branch-from-the-upstream-to-the-local-repo-in-5-steps.
I merged your changes. In "models.ex", you have a guard Dialyzer warning when you use
Instead, you should use
info =
  if Map.get(model, :local_featurizer) do
    {:ok, featurizer} = Bumblebee.load_featurizer(loading_settings)
    Map.put(info, :featurizer, featurizer)
  else
    info
  end
In my next code update, I modified this, if you agree.
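To make the suggestion concrete, here is a small self-contained sketch of the same pattern. It is not the project's `models.ex`: the module name `ModelLoadingSketch`, the `build_info/2` function, and the model names are made up for illustration, and the featurizer loader is passed in as a function (in the app it would wrap `Bumblebee.load_featurizer/1`) so the snippet runs without Bumblebee.

```elixir
defmodule ModelLoadingSketch do
  # Hypothetical helper for illustration only.
  # `load_featurizer` is injected so this runs without Bumblebee installed.
  def build_info(model, load_featurizer) do
    info = %{name: Map.fetch!(model, :name)}

    # A plain `if` on `Map.get/2`: a missing or false flag skips the featurizer.
    if Map.get(model, :local_featurizer) do
      {:ok, featurizer} = load_featurizer.(model)
      Map.put(info, :featurizer, featurizer)
    else
      info
    end
  end
end

# Fake loader standing in for the real featurizer download.
fake_loader = fn model -> {:ok, {:fake_featurizer, model.name}} end

# Placeholder model descriptors (names are not from the PR).
with_featurizer =
  ModelLoadingSketch.build_info(%{name: "example/speech-model", local_featurizer: true}, fake_loader)

without_featurizer =
  ModelLoadingSketch.build_info(%{name: "example/other-model"}, fake_loader)
```

Because the loader is a parameter, the same function can be exercised in tests with a fake loader, which fits the goal of keeping the model code easy to mock.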
Check the last push: I modified `application.ex` and `models.ex` to include Whisper.
Thanks for the changes :) I'm trying to run
Weird stuff, since the model is clearly downloaded :/
Running
…de is contained in `models.ex`.
…EADME with ffmpeg.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##              main      #43    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            3         3
  Lines           85        94     +9
=========================================
+ Hits            85        94     +9
☔ View full report in Codecov by Sentry.
@ndrean I've made changes:
- changed `models.ex` and put all model-related code there to maintain consistency and single responsibility.
- added some notes to the README regarding `ffmpeg` and divided some of the text you wrote into chapters (just to make it clearer to the person reading).
- added a test to get coverage back to 100%.
I think this looks good for now, but I'll wait for your approval to see if everything is cool before working on the second PR #45 you've awesomely implemented :D
@LuchoTurtle Much better indeed!
I give the green light 🟢 😃
This is the first push where I add audio-to-text. I have not written any tests so far. The idea is more to check whether the README is understandable and the code meaningful.
If you are still interested, I will continue with the second step: