diff --git a/README.md b/README.md
index ac7ab0f..ce93ec8 100644
--- a/README.md
+++ b/README.md
@@ -10,14 +10,10 @@

Let's use `Elixir` machine learning capabilities
to build an application that performs **image captioning**
and **semantic search**
-to search for uploaded images
+to look for uploaded images
with your voice! 🎙️

-

- -

-
@@ -89,29 +85,31 @@ with your voice! 🎙️ - [4.1.2.3 Using the embeddings to semantically search images](#4123-using-the-embeddings-to-semantically-search-images) - [4.1.2.4 Creating embeddings when uploading images](#4124-creating-embeddings-when-uploading-images) - [4.1.2.5 Update the LiveView view](#4125-update-the-liveview-view) + - [5. Tweaking our UI](#5-tweaking-our-ui) - [_Please_ star the repo! ⭐️](#please-star-the-repo-️)
## Why? 🤷

-Building our [app](https://github.com/dwyl/app),
+Whilst building our [app](https://github.com/dwyl/app),
we consider `images` an _essential_
medium of communication.
-You personally may have a collection of images that you want to caption
-and semantically retrieve them fast.
+We needed a fully offline-capable (no 3rd-party APIs/services) image captioning service
+using state-of-the-art pre-trained image and embedding models to describe images uploaded to our
+[`App`](https://github.com/dwyl/app).
By adding a way of captioning images,
we make it _easy_ for people
to suggest meta tags that describe images
so they become **searchable**.

## What? 💭

-This run-through will create a simple
-`Phoenix` web application
-that will allow you to choose/drag an image
-and automatically caption the image.
+A step-by-step tutorial building a fully functional
+`Phoenix LiveView` web application that allows anyone
+to upload an image and have it described
+and made searchable.

In addition to this,
-the app will allow the user to record an audio
+the app will allow the person to record an audio clip
which describes the image
they want to find.
The audio will be transcribed into text
@@ -119,6 +117,15 @@ and be semantically queryable.

We do this by encoding the image captions as vectors and running
`knn search` on them.

+We'll be using three different models:
+
+- Salesforce's BLIP model [`blip-image-captioning-large`](https://huggingface.co/Salesforce/blip-image-captioning-large)
+for image captioning.
+- OpenAI's speech recognition model
+[`whisper-small`](https://huggingface.co/openai/whisper-small) for audio transcription.
+- the [`sentence-transformers/paraphrase-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2)
+embedding model for semantic search.
+
## Who? 👤

This tutorial is aimed at `Phoenix` beginners
@@ -146,7 +153,7 @@ You'll learn how to do this _yourself_,
so grab some coffee and let's get cracking!

This section will be divided into two parts.
One will go over **image captioning**
while the second one will expand the application
-by adding **semantic search**.
+by adding **semantic search** capabilities.

## Prerequisites

@@ -169,13 +176,18 @@ In addition to this,
**_some_ knowledge of `AWS`** - what it is, what an `S3` bucket is/does.

## 🌄 Image Captioning in `Elixir`

-In this section, we'll start building our application
-with `Bumblebee` that supports Transformer models.
-At the end of this section,
-you'll have a fully functional application
-that receives an image,
-processes it accordingly
-and captions it.
+> In this section, we'll start building our application
+> with `Bumblebee`, which supports Transformer models.
+> At the end of this section,
+> you'll have a fully functional application
+> that receives an image,
+> processes it accordingly
+> and captions it.
+
+
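+To give you a taste of where we're headed,
+here's a rough sketch (a preview, not the final code we'll write in this section)
+of what captioning an image with `Bumblebee` boils down to:
+
+```elixir
+# Sketch only: assumes `bumblebee`, `exla` and `stb_image` are in your deps.
+repo = {:hf, "Salesforce/blip-image-captioning-large"}
+
+{:ok, model_info} = Bumblebee.load_model(repo)
+{:ok, featurizer} = Bumblebee.load_featurizer(repo)
+{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
+{:ok, generation_config} = Bumblebee.load_generation_config(repo)
+
+# Wrap the model in a serving that takes an image and generates a caption.
+serving =
+  Bumblebee.Vision.image_to_text(model_info, featurizer, tokenizer, generation_config,
+    compile: [batch_size: 1],
+    defn_options: [compiler: EXLA]
+  )
+
+# `StbImage` structs can be passed straight into the serving.
+image = StbImage.read_file!("example.jpg")
+%{results: [%{text: caption}]} = Nx.Serving.run(serving, image)
+```
+
+If this looks dense, don't worry:
+the rest of this section walks through every step of it.
+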

+ +

@@ -3171,13 +3183,18 @@ and all of the code inside the [`_comparison`](./_comparison/) folder. +
## 🔍 Semantic search

-> Imagine a person wants to see an image that was uploaded
-> under a certain theme.
-> One way to solve this problem is to perform a **_full-text_ search query** on specific words among these image captions.
+> In this section, we will focus on implementing
+> **semantic search** over the captions of the images.
+> At the end of this section,
+> you'll be able to transcribe audio,
+> create embeddings from the audio transcription
+> and search for the closest related image.
+
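+To make the idea concrete before we dive in,
+here's a rough sketch (illustrative only; we'll implement this properly below)
+of embedding captions and finding the closest one:
+
+```elixir
+# Sketch only: assumes `bumblebee` and `nx` are in your deps.
+repo = {:hf, "sentence-transformers/paraphrase-MiniLM-L6-v2"}
+{:ok, model_info} = Bumblebee.load_model(repo)
+{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
+
+# Mean-pool and L2-normalise the hidden state to get one vector per sentence.
+serving =
+  Bumblebee.Text.text_embedding(model_info, tokenizer,
+    output_attribute: :hidden_state,
+    output_pool: :mean_pooling,
+    embedding_processor: :l2_norm
+  )
+
+embed = fn text -> Nx.Serving.run(serving, text).embedding end
+
+captions = ["a dog playing on the beach", "a plate of pasta", "two people hiking"]
+index = captions |> Enum.map(embed) |> Nx.stack()
+
+# With L2-normalised vectors, the dot product is the cosine similarity,
+# so the closest caption is simply the row with the highest score.
+query = embed.("golden retriever running on sand")
+best = index |> Nx.dot(query) |> Nx.argmax() |> Nx.to_number()
+Enum.at(captions, best) # most likely "a dog playing on the beach"
+```
+
+This is the essence of the `knn search` mentioned earlier;
+a brute-force scan like this is fine for a handful of captions,
+but the real implementation below will be more robust.
+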

@@ -3416,12 +3433,14 @@ so this part of your code will shrink to: ```elixir - - -<%= if @label do %> - <%= @label %> +<%= if @upload_running? do %> + <% else %> - Waiting for image input. + <%= if @label do %> + <%= @label %> + <% else %> + Waiting for image input. + <% end %> <% end %> ``` @@ -5344,9 +5363,9 @@ def handle_progress(:image_list, entry, socket) when entry.done? do )} # Otherwise, if there was an error uploading the image, we log the error and show it to the person. - %{error: errors} -> - Logger.warning("⚠️ Error uploading image. #{inspect(errors)}") - {:noreply, push_event(socket, "toast", %{message: "Image couldn't be uploaded to S3"})} + %{error: error} -> + Logger.warning("⚠️ Error uploading image. #{inspect(error)}") + {:noreply, push_event(socket, "toast", %{message: "Image couldn't be uploaded to S3.\n#{error}"})} end end ``` @@ -5951,13 +5970,14 @@ and update it as so: class="flex mt-2 space-x-1.5 items-center font-bold text-gray-900 text-xl" > Transcription: - - <%= if @transcription do %> - <%= @transcription %> + <%= if @audio_running? do %> + <% else %> - Waiting for audio input. + <%= if @transcription do %> + <%= @transcription %> + <% else %> + Waiting for audio input. + <% end %> <% end %>
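+For context on how a spinner assign like `@audio_running?` gets toggled,
+here is a hypothetical sketch (the `:speech` upload name,
+`App.TaskSupervisor` and the `Whisper` serving name are placeholders;
+match them to your own code):
+
+```elixir
+# When the audio upload completes, copy the file out of the temporary
+# upload directory, start transcription in a task, and show the spinner.
+def handle_progress(:speech, entry, socket) when entry.done? do
+  path =
+    consume_uploaded_entry(socket, entry, fn %{path: path} ->
+      dest = Path.join(System.tmp_dir!(), Path.basename(path))
+      File.cp!(path, dest)
+      {:ok, dest}
+    end)
+
+  task =
+    Task.Supervisor.async_nolink(App.TaskSupervisor, fn ->
+      # assumes a `Whisper` serving is running under your application supervisor
+      Nx.Serving.batched_run(Whisper, {:file, path})
+    end)
+
+  {:noreply, assign(socket, audio_ref: task.ref, audio_running?: true)}
+end
+
+# When the task reports back, store the transcription and hide the spinner.
+# (The exact result shape can vary between `Bumblebee` versions.)
+def handle_info({ref, %{chunks: [%{text: text} | _]}}, %{assigns: %{audio_ref: ref}} = socket) do
+  Process.demonitor(ref, [:flush])
+  {:noreply, assign(socket, transcription: String.trim(text), audio_running?: false)}
+end
+```
+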


@@ -6033,6 +6053,408 @@
You've expanded your knowledge
in key areas of machine learning and artificial intelligence,
which are becoming increasingly prevalent!

+
+### 5. Tweaking our UI
+
+Now that we have all the features we want in our application,
+let's make it prettier!
+As it stands, it's responsive enough.
+But we can always make it better!
+
+We're going to show you the changes you need to make
+and then explain what they mean!
+
+Head over to `lib/app_web/live/page_live.html.heex` and change it like so:
+
+```html