Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#18 1) adding audio transcription #43

Merged
merged 29 commits into from
Jan 23, 2024
Merged

#18 1) adding audio transcription #43

merged 29 commits into from
Jan 23, 2024

Conversation

ndrean
Copy link
Collaborator

@ndrean ndrean commented Jan 11, 2024

This is the first push where I add audio-to-text. I did not made any tests so far. The idea is more to check if the Readme is understandable and the code meaningful.

If you are still interested, I will continue with the second step:

  • modify the DB to add the HNSWLib index.
  • add embeddings
  • add HNSWLib and the knn-search

@LuchoTurtle
Copy link
Member

LuchoTurtle commented Jan 12, 2024

Thank you so much for the PR <3
If you want me to take a look at it, even if it's not ready and for feedback, you can ping me :)
Once I have time, I'll take a look and give you the green light to continue if you want to (though that usually goes without saying, your code is usually impeccable :D)

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 14, 2024

@LuchoTurtle

I try to further clean the code and Readme. I definitely need your eye on all this!

I still have a stupid error in the UI: I display an icon and a text "record" as the action button to record the audio. When you click it, I switch the CSS classes, but when I click to stop the recording, then the icon disappears. You maybe more clever than me!

@LuchoTurtle
Copy link
Member

Thank you for the work @ndrean! I'll take a look at it tomorrow! ❤️

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 14, 2024

Then the tests.....😥

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 15, 2024

ok, I found my mistake and corrected it. No more pb with the audio recording button!

@LuchoTurtle
Copy link
Member

Lovely, thanks a ton @ndrean !
I'm taking longer than usual with other stuff but I'll try to get to this today, even if it's later at night. Sorry about that! Can't wait 😍

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 15, 2024

@LuchoTurtle The most interesting part is the next part, this one is "just" adding the JS audio capture and the Speech-to-Text model, following closely the first post below.

I should cite my sources:

  1. https://dockyard.com/blog/2023/03/07/audio-speech-recognition-in-elixir-with-whisper-bumblebee?utm_source=elixir-merge
  2. https://dockyard.com/blog/2023/01/11/semantic-search-with-phoenix-axon-bumblebee-and-exfaiss

@LuchoTurtle
Copy link
Member

I've made changes to the README. I kept most of the info, it was simply formatting and some writing style changes.
I've also added comments to micro.js.

I'm now trying to get this to work but since the elixir version was changed in the PR, I'm trying to install v16.

README.md Show resolved Hide resolved
Copy link
Member

@LuchoTurtle LuchoTurtle Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense :D

However, inside lib/app/models.ex, we have a module that is testable and manages all the models that need to be downloaded for image captioning.

I believe we should do the same thing for the whisper models. I can rename the models.ex -> caption_models.ex and use the same template for speech_to_text.ex (which we can rename to semantic_search_models.ex.

Doing this makes it much easier to test and mock the models (since we've already done so with models.ex.

Copy link
Collaborator Author

@ndrean ndrean Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, totally, I wait for your clever changes

Copy link
Member

@LuchoTurtle LuchoTurtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried it and the recording and audio transcription seems to be working smoothly!
Thank you!

I'm marking this to "request some changes" but I'll gladly do the.
The only thing missing (besides the feedback I've given) are the tests. I can take care of both, if you want to :)

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 16, 2024

since the elixir version was changed in the PR, I'm trying to install v16.

Ah yes, my version updated via brew, so I had to update the code. It is not disruptive (besides the order of the arguments in File.stream I believe; source.

As for the tests, I started to add a hard-coded audio file, use it as an input to the model and check that the result is what you expect. I did not commit this and I really prefer, trust and am impressed by your style of dealing with tests. Space for progression for me, definitively.

So yes! Please go on!

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 16, 2024

@LuchoTurtle One point I added in 2) that might be interesting to add already in 1) is a Spinner component since the same code is used in 2 places (caption and transcription).

defmodule AppWeb.Spinner do
  use Phoenix.Component

  attr :spin, :boolean, default: false

  def spin(assigns) do
    ~H"""
    <div :if={@spin} role="status">
      <div class="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400 ">
        <div class="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white">
        </div>
      </div>
    </div>
    """
  end
end

You can use it in place of the current code with:

<div class="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl">
  <span>Description: </span>
  <!-- Spinner -->
  <AppWeb.Spinner.spin spin={@running?} />

  <%= if @label do %>
    <span class="text-gray-700 font-light"><%= @label %></span>
  <% else %>
    <span class="text-gray-300 font-light">Waiting for image input.</span>
  <% end %>
</div>

# and
...
<audio id="audio" controls></audio>
<AppWeb.Spinner.spin spin={@speech_spin} />

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 17, 2024

@LuchoTurtle Help me with GIT. Should I merge your commit into my branch? If so, should I just merge it?

@LuchoTurtle
Copy link
Member

@ndrean Check https://adiati.com/git-how-to-fetch-a-branch-from-the-upstream-to-the-local-repo-in-5-steps.
Another alternative that makes it much easier is to use Github's CLI, which makes it much easier to work with PRs from forks. It's what I use :) For example, to get this PR on my local computer, I simply type gh pr checkout 43.

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 20, 2024

@LuchoTurtle

I merged your changes.
I integrated the whisper model into "models.ex".

In "models.ex", you have a guard Dialyzer warning when you use info = if (model.load_featurizer) do... because the first argument of the if macro is not a boolean.

Instead, you should use Map.get/3. The verbose explanation is because it can return nil, and if executes the else part if the condition is false or nil so you can omit the != nil condition.

info = 
  if Map.get(model, :local_featurizer)  do
    {:ok, featurizer} = Bumblebee.load_featurizer(loading_settings)
    Map.put(info, :featurizer, featurizer)
  else
    info
  end

In my next code update, I modified this if you agree.

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 21, 2024

Check the last push: I modified Application.Ex and Models to include Whisper.
I basically added an argument to App.Models.verify_and_download_models. I also renamed the image model to captioning_prod_model (and the test to captioning_test_model). I hope this is inline with your thoughts.

@LuchoTurtle
Copy link
Member

Thanks for the changes :)

I'm trying to run mix c but I can't seem to run the tests, I get the following error:

** (Mix) Could not start application app: exited in: App.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, "could not find file in local cache and outgoing traffic is disabled, url: https://huggingface.co/api/models/openai/whisper-small/tree/main"}
            (app 0.1.0) lib/app/models.ex:159: App.Models.load_offline_model/1
            (app 0.1.0) lib/app/models.ex:119: App.Models.whisper_serving/0
            (app 0.1.0) lib/app/application.ex:52: App.Application.start/2
            (kernel 9.2) application_master.erl:293: :application_master.start_it_old/4

Weird stuff, since the model is clearly downloaded :/

@LuchoTurtle
Copy link
Member

Running mix phx.server still works though :)

Copy link

codecov bot commented Jan 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (442bb69) 100.00% compared to head (98caedd) 100.00%.

❗ Current head 98caedd differs from pull request most recent head 9f4cdf4. Consider uploading reports for the commit 9f4cdf4 to get more accurate results

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #43   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           85        94    +9     
=========================================
+ Hits            85        94    +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@LuchoTurtle LuchoTurtle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndrean I've made changes:

  • changed the models.ex and put all model-related code there to maintain consistency and single responsibility.
  • added some notes on the README regarding ffmpeg and dividing some texts you've wrote into chapters (just to be more clear to the person reading).
  • added a test to get coverage back to 100%.

I think this looks good for now but I'll wait for your approval to see if everything is cool to later work on the second PR #45 you've awesomely implemented :D

@ndrean
Copy link
Collaborator Author

ndrean commented Jan 22, 2024

@LuchoTurtle Much better indeed!
So if you agree, this will be the starting point of the next part. Please confirm before I do anything.

@LuchoTurtle
Copy link
Member

I give the green light 🟢 😃

@LuchoTurtle LuchoTurtle merged commit b86f01d into dwyl:main Jan 23, 2024
1 check passed
@LuchoTurtle LuchoTurtle mentioned this pull request Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants