#18 1) adding audio transcription #43

ndrean · 2024-01-11T04:22:46Z

This is the first push where I add audio-to-text. I did not made any tests so far. The idea is more to check if the Readme is understandable and the code meaningful.

If you are still interested, I will continue with the second step:

modify the DB to add the HNSWLib index.
add embeddings
add HNSWLib and the knn-search

LuchoTurtle · 2024-01-12T22:11:48Z

Thank you so much for the PR <3
If you want me to take a look at it, even if it's not ready and for feedback, you can ping me :)
Once I have time, I'll take a look and give you the green light to continue if you want to (though that usually goes without saying, your code is usually impeccable :D)

ndrean · 2024-01-14T23:11:25Z

@LuchoTurtle

I try to further clean the code and Readme. I definitely need your eye on all this!

I still have a stupid error in the UI: I display an icon and a text "record" as the action button to record the audio. When you click it, I switch the CSS classes, but when I click to stop the recording, then the icon disappears. You maybe more clever than me!

LuchoTurtle · 2024-01-14T23:15:21Z

Thank you for the work @ndrean! I'll take a look at it tomorrow! ❤️

ndrean · 2024-01-14T23:19:56Z

Then the tests.....😥

ndrean · 2024-01-15T18:10:17Z

ok, I found my mistake and corrected it. No more pb with the audio recording button!

LuchoTurtle · 2024-01-15T18:40:10Z

Lovely, thanks a ton @ndrean !
I'm taking longer than usual with other stuff but I'll try to get to this today, even if it's later at night. Sorry about that! Can't wait 😍

ndrean · 2024-01-15T19:09:51Z

@LuchoTurtle The most interesting part is the next part, this one is "just" adding the JS audio capture and the Speech-to-Text model, following closely the first post below.

I should cite my sources:

LuchoTurtle · 2024-01-16T03:23:02Z

I've made changes to the README. I kept most of the info, it was simply formatting and some writing style changes.
I've also added comments to micro.js.

I'm now trying to get this to work but since the elixir version was changed in the PR, I'm trying to install v16.

README.md

LuchoTurtle · 2024-01-16T03:37:58Z

lib/app/speech_to_text.ex

This makes sense :D

However, inside lib/app/models.ex, we have a module that is testable and manages all the models that need to be downloaded for image captioning.

I believe we should do the same thing for the whisper models. I can rename the models.ex -> caption_models.ex and use the same template for speech_to_text.ex (which we can rename to semantic_search_models.ex.

Doing this makes it much easier to test and mock the models (since we've already done so with models.ex.

Yes, totally, I wait for your clever changes

LuchoTurtle

I've tried it and the recording and audio transcription seems to be working smoothly!
Thank you!

I'm marking this to "request some changes" but I'll gladly do the.
The only thing missing (besides the feedback I've given) are the tests. I can take care of both, if you want to :)

ndrean · 2024-01-16T12:27:22Z

since the elixir version was changed in the PR, I'm trying to install v16.

Ah yes, my version updated via brew, so I had to update the code. It is not disruptive (besides the order of the arguments in File.stream I believe; source.

As for the tests, I started to add a hard-coded audio file, use it as an input to the model and check that the result is what you expect. I did not commit this and I really prefer, trust and am impressed by your style of dealing with tests. Space for progression for me, definitively.

So yes! Please go on!

ndrean · 2024-01-16T16:03:59Z

@LuchoTurtle One point I added in 2) that might be interesting to add already in 1) is a Spinner component since the same code is used in 2 places (caption and transcription).

defmodule AppWeb.Spinner do
  use Phoenix.Component

  attr :spin, :boolean, default: false

  def spin(assigns) do
    ~H"""
    <div :if={@spin} role="status">
      <div class="relative w-6 h-6 animate-spin rounded-full bg-gradient-to-r from-purple-400 via-blue-500 to-red-400 ">
        <div class="absolute top-1/2 left-1/2 transform -translate-x-1/2 -translate-y-1/2 w-3 h-3 bg-gray-200 rounded-full border-2 border-white">
        </div>
      </div>
    </div>
    """
  end
end

You can use it in place of the current code with:

<div class="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl">
  <span>Description: </span>
  <!-- Spinner -->
  <AppWeb.Spinner.spin spin={@running?} />

  <%= if @label do %>
    <span class="text-gray-700 font-light"><%= @label %></span>
  <% else %>
    <span class="text-gray-300 font-light">Waiting for image input.</span>
  <% end %>
</div>

# and
...
<audio id="audio" controls></audio>
<AppWeb.Spinner.spin spin={@speech_spin} />

ndrean · 2024-01-17T15:03:08Z

@LuchoTurtle Help me with GIT. Should I merge your commit into my branch? If so, should I just merge it?

LuchoTurtle · 2024-01-17T15:15:59Z

@ndrean Check https://adiati.com/git-how-to-fetch-a-branch-from-the-upstream-to-the-local-repo-in-5-steps.
Another alternative that makes it much easier is to use Github's CLI, which makes it much easier to work with PRs from forks. It's what I use :) For example, to get this PR on my local computer, I simply type gh pr checkout 43.

ndrean · 2024-01-20T14:40:40Z

@LuchoTurtle

I merged your changes.
I integrated the whisper model into "models.ex".

In "models.ex", you have a guard Dialyzer warning when you use info = if (model.load_featurizer) do... because the first argument of the if macro is not a boolean.

Instead, you should use Map.get/3. The verbose explanation is because it can return nil, and if executes the else part if the condition is false or nil so you can omit the != nil condition.

info = 
  if Map.get(model, :local_featurizer)  do
    {:ok, featurizer} = Bumblebee.load_featurizer(loading_settings)
    Map.put(info, :featurizer, featurizer)
  else
    info
  end

In my next code update, I modified this if you agree.

ndrean · 2024-01-21T14:55:19Z

Check the last push: I modified Application.Ex and Models to include Whisper.
I basically added an argument to App.Models.verify_and_download_models. I also renamed the image model to captioning_prod_model (and the test to captioning_test_model). I hope this is inline with your thoughts.

LuchoTurtle · 2024-01-21T22:08:12Z

Thanks for the changes :)

I'm trying to run mix c but I can't seem to run the tests, I get the following error:

** (Mix) Could not start application app: exited in: App.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, "could not find file in local cache and outgoing traffic is disabled, url: https://huggingface.co/api/models/openai/whisper-small/tree/main"}
            (app 0.1.0) lib/app/models.ex:159: App.Models.load_offline_model/1
            (app 0.1.0) lib/app/models.ex:119: App.Models.whisper_serving/0
            (app 0.1.0) lib/app/application.ex:52: App.Application.start/2
            (kernel 9.2) application_master.erl:293: :application_master.start_it_old/4

Weird stuff, since the model is clearly downloaded :/

LuchoTurtle · 2024-01-21T22:15:40Z

Running mix phx.server still works though :)

lib/app/models.ex

…de is contained in `models.ex`.

…EADME with ffmpeg.

codecov · 2024-01-22T03:45:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (442bb69) 100.00% compared to head (98caedd) 100.00%.

❗ Current head 98caedd differs from pull request most recent head 9f4cdf4. Consider uploading reports for the commit 9f4cdf4 to get more accurate results

Additional details and impacted files

@@            Coverage Diff            @@
##              main       #43   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines           85        94    +9     
=========================================
+ Hits            85        94    +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

LuchoTurtle

@ndrean I've made changes:

changed the models.ex and put all model-related code there to maintain consistency and single responsibility.
added some notes on the README regarding ffmpeg and dividing some texts you've wrote into chapters (just to be more clear to the person reading).
added a test to get coverage back to 100%.

I think this looks good for now but I'll wait for your approval to see if everything is cool to later work on the second PR #45 you've awesomely implemented :D

ndrean · 2024-01-22T11:53:31Z

@LuchoTurtle Much better indeed!
So if you agree, this will be the starting point of the next part. Please confirm before I do anything.

LuchoTurtle · 2024-01-22T12:16:58Z

I give the green light 🟢 😃

ndrean added 5 commits January 10, 2024 23:15

adding audio transcription

581c94e

modified Readme

c5b366f

modified Readme

f4b0d7e

modified Readme

feee811

modified Readme

992724f

LuchoTurtle mentioned this pull request Jan 12, 2024

#18 2) Embedding & knn #45

Closed

ndrean added 2 commits January 12, 2024 20:49

cleaning micro.js

4db947c

cleaning audio

f8070e4

changed error button

c637363

ndrean added 3 commits January 15, 2024 13:20

changed error button

5d64d93

changed error button & Readme

4507dc4

changed error button & Readme

204d2bf

ndrean and others added 3 commits January 15, 2024 15:16

added source part 1 Readme

e2ef6f4

corrected Readme

445cc4c

fix: Going over README with new Semantic Search section.

b92221c

LuchoTurtle reviewed Jan 16, 2024

View reviewed changes

README.md Show resolved Hide resolved

LuchoTurtle reviewed Jan 16, 2024

View reviewed changes

LuchoTurtle requested changes Jan 16, 2024

View reviewed changes

fix: Add missing content to README.

90d9813

Merge branch 'dwyl:main' into audio-transcription

232f41f

ndrean added 4 commits January 20, 2024 10:46

modified Models for Whisper

be9ea3e

refactor Readme

be40d62

modified parallel start-up to check due to HD full

f6b8a74

modified Application & models to include Whisper

1a28955

line 158 Models forgot to restore boolean

2aef3c2

LuchoTurtle reviewed Jan 21, 2024

View reviewed changes

lib/app/models.ex Show resolved Hide resolved

LuchoTurtle added 7 commits January 22, 2024 02:16

fix: Making models work with mix c. Refactoring so model-related co…

8117b96

…de is contained in `models.ex`.

chore: Basic refactoring for testing with whisper.

41a3489

chore: Spacing

bfce616

feat: Adding test.

f637cde

chore: Updating Elixir version so tests can run.

72d7b11

chore: Adding ffmpeg to github actions and adding pre-requisites in R…

492af91

…EADME with ffmpeg.

chore: Refactoring README to be more clear.

98caedd

LuchoTurtle approved these changes Jan 22, 2024

View reviewed changes

typos

9f4cdf4

LuchoTurtle merged commit b86f01d into dwyl:main Jan 23, 2024
1 check passed

LuchoTurtle mentioned this pull request Jan 23, 2024

Semantic search added #47

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#18 1) adding audio transcription #43

#18 1) adding audio transcription #43

ndrean commented Jan 11, 2024 •

edited

Loading

LuchoTurtle commented Jan 12, 2024 •

edited

Loading

ndrean commented Jan 14, 2024

LuchoTurtle commented Jan 14, 2024

ndrean commented Jan 14, 2024

ndrean commented Jan 15, 2024

LuchoTurtle commented Jan 15, 2024

ndrean commented Jan 15, 2024 •

edited

Loading

LuchoTurtle commented Jan 16, 2024

LuchoTurtle Jan 16, 2024 •

edited

Loading

ndrean Jan 16, 2024 •

edited

Loading

LuchoTurtle left a comment

ndrean commented Jan 16, 2024 •

edited

Loading

ndrean commented Jan 16, 2024 •

edited

Loading

ndrean commented Jan 17, 2024

LuchoTurtle commented Jan 17, 2024

ndrean commented Jan 20, 2024 •

edited

Loading

ndrean commented Jan 21, 2024

LuchoTurtle commented Jan 21, 2024

LuchoTurtle commented Jan 21, 2024

codecov bot commented Jan 22, 2024 •

edited

Loading

LuchoTurtle left a comment •

edited

Loading

ndrean commented Jan 22, 2024 •

edited

Loading

LuchoTurtle commented Jan 22, 2024

#18 1) adding audio transcription #43

#18 1) adding audio transcription #43

Conversation

ndrean commented Jan 11, 2024 • edited Loading

LuchoTurtle commented Jan 12, 2024 • edited Loading

ndrean commented Jan 14, 2024

LuchoTurtle commented Jan 14, 2024

ndrean commented Jan 14, 2024

ndrean commented Jan 15, 2024

LuchoTurtle commented Jan 15, 2024

ndrean commented Jan 15, 2024 • edited Loading

LuchoTurtle commented Jan 16, 2024

LuchoTurtle Jan 16, 2024 • edited Loading

Choose a reason for hiding this comment

ndrean Jan 16, 2024 • edited Loading

Choose a reason for hiding this comment

LuchoTurtle left a comment

Choose a reason for hiding this comment

ndrean commented Jan 16, 2024 • edited Loading

ndrean commented Jan 16, 2024 • edited Loading

ndrean commented Jan 17, 2024

LuchoTurtle commented Jan 17, 2024

ndrean commented Jan 20, 2024 • edited Loading

ndrean commented Jan 21, 2024

LuchoTurtle commented Jan 21, 2024

LuchoTurtle commented Jan 21, 2024

codecov bot commented Jan 22, 2024 • edited Loading

Codecov Report

LuchoTurtle left a comment • edited Loading

Choose a reason for hiding this comment

ndrean commented Jan 22, 2024 • edited Loading

LuchoTurtle commented Jan 22, 2024

ndrean commented Jan 11, 2024 •

edited

Loading

LuchoTurtle commented Jan 12, 2024 •

edited

Loading

ndrean commented Jan 15, 2024 •

edited

Loading

LuchoTurtle Jan 16, 2024 •

edited

Loading

ndrean Jan 16, 2024 •

edited

Loading

ndrean commented Jan 16, 2024 •

edited

Loading

ndrean commented Jan 16, 2024 •

edited

Loading

ndrean commented Jan 20, 2024 •

edited

Loading

codecov bot commented Jan 22, 2024 •

edited

Loading

LuchoTurtle left a comment •

edited

Loading

ndrean commented Jan 22, 2024 •

edited

Loading