Semantic search added #47

Merged
merged 105 commits into from
Mar 4, 2024
Changes from 14 commits
Commits
105 commits
140c49c
first trial
ndrean Jan 22, 2024
c78f111
corrected dialyzer error case
ndrean Jan 23, 2024
4bf1276
corrected dialyzer error case
ndrean Jan 23, 2024
76ac055
corrected dialyzer error case in Readme
ndrean Jan 23, 2024
41ea744
align transcription output with caption
ndrean Jan 23, 2024
d61913d
align transcription output & image with caption
ndrean Jan 23, 2024
bfca6d2
add index in db
ndrean Jan 23, 2024
85ef304
loading from DB if index file erased
ndrean Jan 24, 2024
67476f2
loading from DB if index file erased
ndrean Jan 24, 2024
c88ef83
loading from DB if index file erased: cleaning
ndrean Jan 24, 2024
9c65846
with Ecto.Multi & transaction
ndrean Jan 24, 2024
ffc4739
added SHA to get unique files
ndrean Jan 24, 2024
b82df7d
created unique index on :sha1 for images & schema capture
ndrean Jan 24, 2024
db26ee7
changed SHA1 test to early failure
ndrean Jan 24, 2024
7993b5d
refactor SHA & doc with functions
ndrean Jan 25, 2024
6f99a58
added pg index on image_idx
ndrean Jan 25, 2024
aebaa10
added optimistic lock & refactored Multi
ndrean Jan 25, 2024
631c5d7
modified image saving sequence & keep lock on index file
ndrean Jan 27, 2024
1bf22cf
refactored to remove credo warnings for depth
ndrean Jan 27, 2024
a1897a1
refactored to remove credo warnings for depth
ndrean Jan 27, 2024
78c6597
recreate index, uuid on tmp.wav, try/rescue, Embedding in Models
ndrean Feb 3, 2024
5529a16
added check_load to Models
ndrean Feb 3, 2024
00a03e4
starting to modify Readme
ndrean Feb 3, 2024
860eae1
starting to modify Readme
ndrean Feb 3, 2024
b5dee2c
continue Readme
ndrean Feb 4, 2024
e0a60d3
added plug_cowboy ~>2.7 since bug and check_index on start-up
ndrean Feb 5, 2024
e6c6041
removed index from socket state and put into KnnIndex GenServer
ndrean Feb 5, 2024
4d414da
added on_mount for Index integrity => 404 warning
ndrean Feb 6, 2024
69dff02
refactored to remove nested credo warnings
ndrean Feb 6, 2024
d9ad772
changed order in transaction
ndrean Feb 7, 2024
336efdb
tested release
ndrean Feb 8, 2024
28ba1e2
added starte tests
ndrean Feb 8, 2024
c4684d3
remove duplicated
ndrean Feb 8, 2024
f5a5fd8
remove duplicated
ndrean Feb 8, 2024
ef4fc75
remove duplicated
ndrean Feb 8, 2024
8fc5f7f
chore: Trying to get `mix c` to work.
LuchoTurtle Feb 9, 2024
5bff3c9
chore: Mix test now runs. Though the tests fail.
LuchoTurtle Feb 10, 2024
d1182a9
rm on_mount and add integrity in GS init
ndrean Feb 10, 2024
5dc9c90
forgot hnwslib 0.14; credo warning...
ndrean Feb 10, 2024
49d4bd0
changed HnwslibIndex test file
ndrean Feb 10, 2024
6f2c3ea
chore: Changing html in hopes to fix test.
LuchoTurtle Feb 11, 2024
7950425
chore: Tests now can be executed properly and reset after every run.
LuchoTurtle Feb 12, 2024
7f46e6c
chore: Simplifying test runs.
LuchoTurtle Feb 12, 2024
de7f544
fix: Ignoring index files (shouldn't be on git).
LuchoTurtle Feb 12, 2024
c8e4c5d
chore: Forcing tests to run sync.
LuchoTurtle Feb 12, 2024
905b2db
chore: Adding tests for image.
LuchoTurtle Feb 13, 2024
74669db
refactored check_integrity for GenServer testing.
ndrean Feb 13, 2024
368cf3a
refactored check_integrity for GenServer testing.
ndrean Feb 13, 2024
23cb3e0
chore: Adding testing to hnswlib_index schema.
LuchoTurtle Feb 13, 2024
0f1f904
Merge branch 'semantic' of https://github.com/ndrean/image-classifier…
LuchoTurtle Feb 13, 2024
e564de5
removed edge case and added indexes1,2.bin"
ndrean Feb 13, 2024
0ebdb38
removed edge case and added indexes_gen_test_1,2.bin"
ndrean Feb 13, 2024
282498b
Merge branch 'semantic' of https://github.com/ndrean/image-classifier…
LuchoTurtle Feb 13, 2024
fb3d122
chore: Remove unused test.
LuchoTurtle Feb 14, 2024
5548ea1
comments on geneserver init tests corrected
ndrean Feb 14, 2024
9af8a07
comments on genserver init tests corrected
ndrean Feb 14, 2024
1a2952c
chore: Removing unnecessary code. It won't ever be used because it's …
LuchoTurtle Feb 15, 2024
e8a2d4b
Merge branch 'semantic' of https://github.com/ndrean/image-classifier…
LuchoTurtle Feb 15, 2024
ccc582a
adding test on early stop Index empty
ndrean Feb 15, 2024
80247d3
chore: Changing test timeout, since it takes more than a minute.
LuchoTurtle Feb 15, 2024
2a7d127
Merge branch 'semantic' of https://github.com/ndrean/image-classifier…
LuchoTurtle Feb 15, 2024
d7ffec5
fix: Fixing failing test of notification when audio is uploaded on em…
LuchoTurtle Feb 15, 2024
1550ff4
add GenServer knn_search nil test
ndrean Feb 15, 2024
874e901
continue GS tests
ndrean Feb 16, 2024
685395a
end GS tests
ndrean Feb 16, 2024
19d6b2e
update bump
ndrean Feb 16, 2024
38d8d3d
tests doc
ndrean Feb 16, 2024
daf4a84
improved tests doc
ndrean Feb 16, 2024
8da6f40
tests on image operations
ndrean Feb 16, 2024
dddaae0
tests on image operations
ndrean Feb 16, 2024
dd49fb9
moved from Cowboy to Bandit & test correction
ndrean Feb 16, 2024
b9a4132
moved from Cowboy to Bandit & test correction
ndrean Feb 16, 2024
b85dbc1
chore: Adding resetting with empty indexes helper to tests and coveri…
LuchoTurtle Feb 17, 2024
666660a
chore: Adding failed index "please retry" test.
LuchoTurtle Feb 17, 2024
a9dffbd
fix: Fixing bucket error and testing it.
LuchoTurtle Feb 18, 2024
0aa1a3e
fix: Fixing upload error handling and partial image edge case tested.
LuchoTurtle Feb 18, 2024
77b3395
chore: Formatting hnswlib_index.ex
LuchoTurtle Feb 18, 2024
b5154ea
chore: Commenting and formatting knn_index.ex.
LuchoTurtle Feb 18, 2024
a8839f6
chore: Commenting and formatting models.
LuchoTurtle Feb 18, 2024
5fe4f7f
chore: Removing unused code and commenting.
LuchoTurtle Feb 18, 2024
7a68517
chore: Page_live.ex general formatting.
LuchoTurtle Feb 18, 2024
57d895a
chore: Formatting README (before image captioning).
LuchoTurtle Feb 19, 2024
540b08e
chore: Fixing some Image Captioning section errors.
LuchoTurtle Feb 19, 2024
58e915b
chore: Fixing typos and numbering.
LuchoTurtle Feb 19, 2024
da01dcd
chore: Formatting and fixing typos on the Semantic Search part.
LuchoTurtle Feb 19, 2024
777b412
chore: Formatting the README and adding `hnswlib_index` schema code.
LuchoTurtle Feb 19, 2024
40070c3
readme: Add section of image schema changes.
LuchoTurtle Feb 21, 2024
6f1fe4f
chore: Renaming socket assigns.
LuchoTurtle Feb 22, 2024
9993b98
readme: Adding section for page_live
LuchoTurtle Feb 22, 2024
90f0368
readme: Adding view section.
LuchoTurtle Feb 23, 2024
6fe7202
Merge branch 'main' into semantic
LuchoTurtle Feb 23, 2024
9d42ca9
fix: Fixing mix.lock
LuchoTurtle Feb 23, 2024
579c5fb
chore: Normalizing all loggers.
LuchoTurtle Feb 23, 2024
e1675ad
fix: Fixing models loading while testing and on prod.
LuchoTurtle Feb 23, 2024
8b0dca5
readme: Updating README.
LuchoTurtle Feb 23, 2024
12204a5
minor changes on redundant & shorter code: :if and remove dir creationH
ndrean Feb 24, 2024
2528176
Update README.md
ndrean Feb 24, 2024
a34f132
Update README.md
ndrean Feb 24, 2024
b2c7882
Update README.md
ndrean Feb 24, 2024
96f6ccd
Update README.md
ndrean Feb 24, 2024
72693bf
padding to record button
ndrean Feb 24, 2024
9a11eab
readme: Adding example gif.
LuchoTurtle Feb 24, 2024
e679569
Merge branch 'main' into semantic
LuchoTurtle Mar 3, 2024
a9c2d83
merge: Fixing conflicts.
LuchoTurtle Mar 3, 2024
7b5287d
chore: Not using cowboy.
LuchoTurtle Mar 4, 2024
688 changes: 686 additions & 2 deletions README.md

10 changes: 4 additions & 6 deletions assets/js/micro.js
@@ -11,23 +11,21 @@ export default {
blue = ["bg-blue-500", "hover:bg-blue-700"],
pulseGreen = ["bg-green-500", "hover:bg-green-700", "animate-pulse"];


_this = this;

// Adding event listener for "click" event
recordButton.addEventListener("click", () => {

// Check if it's recording.
// If it is, we stop the record and update the elements.
if (mediaRecorder && mediaRecorder.state === "recording") {
mediaRecorder.stop();
// audioChunks.getAudioTracks()[0].stop();
text.textContent = "Record";
}
}

// Otherwise, it means the user wants to start recording.
else {
navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {

// Instantiate MediaRecorder
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.start();
@@ -39,7 +37,7 @@ export default {

// Add "dataavailable" event handler
mediaRecorder.addEventListener("dataavailable", (event) => {
audioChunks.push(event.data);
event.data.size > 0 && audioChunks.push(event.data);
});

// Add "stop" event handler for when the recording stops.
@@ -57,4 +55,4 @@ export default {
}
});
},
};
};
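The micro.js change guards the "dataavailable" handler so that only non-empty chunks are collected. A minimal sketch of that guard, pulled out into a hypothetical helper `collectChunk` so it can run outside a browser; the `{ size }` objects stand in for the Blobs that MediaRecorder delivers as `event.data`.

```javascript
// Hypothetical helper mirroring the guard added in micro.js:
//   event.data.size > 0 && audioChunks.push(event.data);
// A "dataavailable" event may carry an empty Blob (size 0), which would
// otherwise end up as a useless entry in the recorded chunks.
function collectChunk(chunks, data) {
  if (data && data.size > 0) {
    chunks.push(data);
  }
  return chunks;
}

// Usage with stand-in objects; in the browser these would be event.data Blobs.
const audioChunks = [];
collectChunk(audioChunks, { size: 0 });    // ignored
collectChunk(audioChunks, { size: 1024 }); // kept
```

The explicit `if` behaves the same as the short-circuit `&&` in the diff; it is simply easier to read and extend.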
20 changes: 10 additions & 10 deletions lib/app/application.ex
@@ -7,7 +7,6 @@ defmodule App.Application do

@impl true
def start(_type, _args) do

App.Models.verify_and_download_models()

children = [
@@ -18,17 +17,18 @@ defmodule App.Application do
# Start the PubSub system
{Phoenix.PubSub, name: App.PubSub},
# Nx serving for the embedding
# App.TextEmbedding,
App.TextEmbedding,
App.KnnIndex,

# Nx serving for Speech-to-Text
{Nx.Serving,
serving:
if Application.get_env(:app, :use_test_models) == true do
App.Models.audio_serving_test()
else
App.Models.audio_serving()
end,
name: Whisper},
serving:
if Application.get_env(:app, :use_test_models) == true do
App.Models.audio_serving_test()
else
App.Models.audio_serving()
end,
name: Whisper},
# Nx serving for image classifier
{Nx.Serving,
serving:
@@ -39,7 +39,7 @@ defmodule App.Application do
end,
name: ImageClassifier},
{GenMagic.Server, name: :gen_magic},

# Adding a supervisor
{Task.Supervisor, name: App.TaskSupervisor},
# Start the Endpoint (http/https)
48 changes: 48 additions & 0 deletions lib/app/hnswlib_index.ex
@@ -0,0 +1,48 @@
defmodule App.HnswlibIndex do
use Ecto.Schema

alias App.HnswlibIndex
alias App.Repo

require Logger

schema "hnswlib_index" do
field(:file, :binary)
end

def changeset(struct \\ %__MODULE__{}, params \\ %{}) do
struct
|> Ecto.Changeset.cast(params, [:id, :file])
|> Ecto.Changeset.validate_required([:id])
end

def save() do
path = App.KnnIndex.get_index_path()
file = File.read!(path)

Repo.get!(HnswlibIndex, 1)
|> HnswlibIndex.changeset(%{file: file})
|> Repo.update()
end

def maybe_load_index_from_db(space, dim, max_elements) do
Repo.get_by(HnswlibIndex, id: 1)
|> case do
nil ->
# create a singleton row
HnswlibIndex.changeset(%__MODULE__{}, %{id: 1})
|> Repo.insert()

Logger.info("New Index")
HNSWLib.Index.new(space, dim, max_elements)

index_file ->
Logger.info("Loading Index from DB")
path = App.KnnIndex.get_index_path()
# save on disk
File.write!(path, index_file.file)
# load the index file
HNSWLib.Index.load_index(space, dim, path)
end
end
end
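At start-up the index is resolved in a fixed order: App.KnnIndex.init/1 loads indexes.bin if it exists on disk, otherwise App.HnswlibIndex.maybe_load_index_from_db/3 restores it from the singleton row (id 1) or, when no row exists, inserts that row and creates a fresh index. A minimal sketch of that decision, assuming Maps as stand-ins for the file system and the hnswlib_index table (`resolveIndex`, `disk` and `db` are hypothetical names, not part of the PR). One deviation is labelled as an assumption: a row whose `file` was never saved is treated like a missing row.

```javascript
// Hypothetical sketch of the index start-up order in App.KnnIndex.init/1 and
// App.HnswlibIndex.maybe_load_index_from_db/3. `disk` and `db` are Maps standing
// in for the file system and the hnswlib_index table; they are not real APIs.
function resolveIndex(path, disk, db) {
  // 1. An index file already on disk is loaded directly.
  if (disk.has(path)) {
    return { source: "disk", bytes: disk.get(path) };
  }

  const row = db.get(1); // the singleton row, id = 1
  // 2. Otherwise restore the DB copy to disk and load it.
  //    (Assumption: a row whose file was never saved is treated as missing.)
  if (row !== undefined && row.file !== null) {
    disk.set(path, row.file);
    return { source: "db", bytes: row.file };
  }

  // 3. Otherwise make sure the singleton row exists and start a fresh index.
  if (row === undefined) {
    db.set(1, { id: 1, file: null });
  }
  return { source: "new", bytes: null };
}
```

On the first start the row is created empty; HnswlibIndex.save/0 later fills `file`, so a node that loses indexes.bin can rebuild it from the database.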
35 changes: 31 additions & 4 deletions lib/app/image.ex
@@ -8,26 +8,53 @@ defmodule App.Image do
field(:width, :integer)
field(:url, :string)
field(:height, :integer)
field(:idx, :integer)
field(:sha1, :string)

timestamps(type: :utc_datetime)
end

def changeset(image, params \\ %{}) do
image
|> Ecto.Changeset.cast(params, [:url, :description, :width, :height])
|> Ecto.Changeset.cast(params, [:url, :description, :width, :height, :idx, :sha1])
|> Ecto.Changeset.validate_required([:url, :description, :width, :height])
|> Ecto.Changeset.unique_constraint(:sha1, name: :images_sha1_index)
end

@doc """
Uploads the given image to S3
and adds the image information to the database.
"""
def insert(image) do
%Image{}
|> changeset(image)
|> Repo.insert!()
{:ok,
%Image{}
|> changeset(image)
|> Repo.insert!()}
end

def check_sha1(sha1) do
App.Repo.get_by(App.Image, %{sha1: sha1})
|> case do
nil ->
:ok

_ ->
nil
end
end

# def check_sha(image) do
# {:ok,
# App.Repo.get_by(App.Image, %{sha1: image.sha1})
# |> case do
# nil ->
# image

# _ ->
# nil
# end}
# end

@doc """
Uploads the given image to S3.
Returns {:ok, response} if the upload is successful.
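The image schema now stores a `sha1` per upload, backed by the unique index `images_sha1_index`, and `check_sha1/1` returns `:ok` only when no image with that digest exists yet. A minimal sketch of the same duplicate check using Node's crypto module, assuming a lowercase hex digest of the raw file bytes (the diff does not show how the digest is computed or encoded); `sha1Hex`, `checkSha1` and the `knownShas` Set are hypothetical stand-ins for the application code and the images table.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-1 of the raw file bytes (assumed encoding for the `sha1` column).
function sha1Hex(buffer) {
  return createHash("sha1").update(buffer).digest("hex");
}

// Mirrors the idea of App.Image.check_sha1/1: "ok" when the digest is unseen,
// null when an image with the same content is already stored.
// `knownShas` is a hypothetical stand-in for the images table.
function checkSha1(knownShas, sha1) {
  return knownShas.has(sha1) ? null : "ok";
}
```

The lookup is only an early exit; the unique index on `sha1` remains the guarantee when two identical uploads race past the check.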
42 changes: 42 additions & 0 deletions lib/app/knn_index.ex
@@ -0,0 +1,42 @@
defmodule App.KnnIndex do
use GenServer

@indexes "indexes.bin"
@upload_dir Application.app_dir(:app, ["priv", "static", "uploads"])

def start_link(_) do
GenServer.start_link(__MODULE__, {}, name: __MODULE__)
end

def init(_) do
File.mkdir_p!(@upload_dir)

path = get_index_path()
space = :cosine
dim = 384
max_elements = 200

require Logger

case File.exists?(path) do
false ->
{:ok, _index} = App.HnswlibIndex.maybe_load_index_from_db(space, dim, max_elements)

true ->
Logger.info("Existing Index")
{:ok, _index} = HNSWLib.Index.load_index(space, dim, path)
end
end

def get_index_path do
Path.join([@upload_dir, @indexes])
end

def load_index do
GenServer.call(__MODULE__, :load)
end

def handle_call(:load, _from, state) do
{:reply, state, state}
end
end
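KnnIndex configures HNSWLib with `space = :cosine`, `dim = 384` and `max_elements = 200`. HNSW answers nearest-neighbour queries approximately; with a few hundred vectors an exhaustive scan is cheap and gives an exact reference to compare against. A minimal brute-force sketch, assuming distance is measured as 1 minus cosine similarity (the convention hnswlib uses for its cosine space) and that a vector's position in the array plays the role of the `idx` label stored on each image; `cosineDistance` and `knnSearch` are hypothetical names.

```javascript
// Cosine distance = 1 - cosine similarity. Zero-length vectors would divide by
// zero (NaN), so callers are assumed to pass non-zero embeddings.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest-neighbour search: score every stored vector, keep the k closest.
// The returned `idx` is the vector's position, standing in for the image's idx label.
function knnSearch(vectors, query, k) {
  return vectors
    .map((vector, idx) => ({ idx, distance: cosineDistance(vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

In the application each vector would be a 384-dimensional sentence embedding; the arithmetic is the same for any dimension.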
20 changes: 0 additions & 20 deletions lib/app/speech_to_text.ex

This file was deleted.

38 changes: 38 additions & 0 deletions lib/app/text_embedding.ex
@@ -0,0 +1,38 @@
defmodule App.TextEmbedding do
use GenServer

def start_link(_) do
GenServer.start_link(__MODULE__, {}, name: __MODULE__)
end

def init(_) do
model_info = nil
tokenizer = nil
{:ok, {model_info, tokenizer}, {:continue, :load}}
end

def handle_continue(:load, {_, _}) do
transformer = "sentence-transformers/paraphrase-MiniLM-L6-v2"

{:ok, %{model: _model, params: _params} = model_info} =
Bumblebee.load_model({:hf, transformer})

{:ok, tokenizer} =
Bumblebee.load_tokenizer({:hf, transformer})

require Logger
Logger.info("Transformer loaded")
{:noreply, {model_info, tokenizer}}
end

# called in Liveview `mount`
def serve() do
GenServer.call(__MODULE__, :serve)
end

def handle_call(:serve, _from, {model_info, tokenizer} = state) do
embedding_serving = Bumblebee.Text.TextEmbedding.text_embedding(model_info, tokenizer)

{:reply, embedding_serving, state}
end
end