candidate release january 2025
AstraBert committed Jan 14, 2025
1 parent 6d6951f commit d95999e
Showing 28 changed files with 804 additions and 11,443 deletions.
46 changes: 46 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,46 @@
# Contributing to `qdurllm`

Do you want to contribute to this project? Make sure to read these guidelines first :)

## Issue

**When to do it**:

- You found bugs but don't know how to solve them, or don't have the time/will to fix them
- You want new features but don't know how to implement them, or don't have the time/will to implement them

> ⚠️ _Always check open and closed issues before you submit yours to avoid duplicates_

**How to do it**:

- Open an issue
- Give the issue a meaningful title (short but effective problem description)
- Describe the problem following the issue template

## Traditional contribution

**When to do it**:

- You found bugs and corrected them
- You optimized/improved the code
- You added new features that you think could be useful to others

**How to do it**:

1. Fork this repository
2. Commit your changes
3. Submit a pull request (make sure to provide a thorough description of the changes)


## Showcase your qdurllm

**When to do it**:

- You modified the base application with new features, but you don't want to (or can't) merge them into the original qdurllm

**How to do it**:

- Go to the [_GitHub Discussions > Show and tell_](https://github.com/AstraBert/qdurllm/discussions/categories/show-and-tell) page
- Open a new discussion there, describing your qdurllm application

### Thanks for contributing!
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Astra Clelia Bertelli
Copyright (c) 2025 Clelia (Astra) Bertelli

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
134 changes: 37 additions & 97 deletions README.md
@@ -1,134 +1,74 @@
<h1 align="center">qdurllm</h1>
<h2 align="center">Search your favorite websites and chat with them, on your desktop🌐</h2>

# Docs in active development!👷‍♀️

<div align="center">
<img src="https://img.shields.io/github/languages/top/AstraBert/qdurllm" alt="GitHub top language">
<img src="https://img.shields.io/github/commit-activity/t/AstraBert/qdurllm" alt="GitHub commit activity">
<img src="https://img.shields.io/badge/Status-stable-green" alt="Static Badge">
<img src="https://img.shields.io/badge/Release-v0.0.0-purple" alt="Static Badge">
<img src="https://img.shields.io/docker/image-size/astrabert/local-search-application
" alt="Docker image size">
<img src="https://img.shields.io/badge/Supported_platforms-Windows/macOS/Linux-brown" alt="Static Badge">
<div>
<img src="./imgs/qdurllm.png" alt="Flowchart" align="center">
<p><i>Flowchart for qdurllm</i></p>
</div>
</div>
They will soon be available at: https://astrabert.github.io/qdurllm/

**qdurllm** (**Qd**rant **URL**s and **L**arge **L**anguage **M**odels) is a local search engine that lets you select and upload URL content to a vector database: after that, you can search, retrieve and chat with this content.
In the meantime, refer to the **Quickstart guide** in this README!

This is provisioned through a multi-container Docker application, leveraging Qdrant, Langchain, llama.cpp, quantized Gemma and Gradio.
## Quickstart

## Demo!
### 1. Prerequisites

Head over to the [demo space on HuggingFace](https://huggingface.co/spaces/as-cle-bert/qdurllm-demo)🦀
- [`conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) package manager
- [`docker`](https://www.docker.com/) and [`docker compose`](https://docs.docker.com/compose/).

## Requirements
### 2. Installation

The only requirement is to have `docker` and `docker-compose`.
> [!IMPORTANT]
> _This is only for the pre-release of `v1.0.0`, i.e. `v1.0.0-rc.0`_
If you don't have them, make sure to install them [here](https://docs.docker.com/get-docker/).

## Installation

You can install the application by cloning the GitHub repository
1. Clone the `january-2025` branch of this GitHub repo:

```bash
git clone https://github.com/AstraBert/qdurllm.git
cd qdurllm
git clone -b january-2025 --single-branch https://github.com/AstraBert/qdurllm.git
cd qdurllm/
```

Or you can simply paste the following text into a `compose.yaml` file:

```yaml
networks:
mynet:
driver: bridge
services:
local-search-application:
image: astrabert/local-search-application
networks:
- mynet
ports:
- "7860:7860"
qdrant:
image: qdrant/qdrant
ports:
- "6333:6333"
volumes:
- "./qdrant_storage:/qdrant/storage"
networks:
- mynet
llama_server:
image: astrabert/llama.cpp-gemma
ports:
- "8000:8000"
networks:
- mynet
```
2. Create the `conda` environment:

Place the file in whatever directory you want in your file system.
```bash
conda env create -f environment.yml
```

Prior to running the application, you can optionally pull all the needed images from Docker Hub:
3. Pull `qdrant` from Docker Hub:

```bash
docker pull qdrant/qdrant
docker pull astrabert/llama.cpp-gemma
docker pull astrabert/local-search-application
```

## How does it work?

When launched (see [Usage](#usage)), the application runs three containers:

- `qdrant`(port 6333): serves as vector database provider for semantic search-based retrieval
- `llama.cpp-gemma` (port 8000): an implementation of a [quantized Gemma model](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF) provided by LMStudio and Google, served with the `llama.cpp` server. It handles text generation, enriching the user's search experience.
- `local-search-application`(port 7860): a Gradio tabbed interface with:
+ The possibility to upload one or multiple contents by specifying the URL (thanks to Langchain)
+ The possibility to chat with the uploaded URLs thanks to `llama.cpp-gemma`
+ The possibility to perform a direct search that leverages double-layered retrieval with `all-MiniLM-L6-v2` (which identifies the 10 best matches) and `sentence-t5-base` (which re-encodes the 10 best matches and extracts the best hit from them) - this is the same RAG implementation used in combination with `llama.cpp-gemma`; a sketch of how it works follows below. Wanna see how double-layered RAG performs compared to single-layered RAG? Head over [here](./double-layered-rag-benchmarks/)!

> _The overall computational burden is light enough to make the application run not only without a GPU, but also with low RAM availability (>=8GB, although it can take up to 10 mins for Gemma to respond on 8GB RAM)._
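
To make the double-layered retrieval concrete, here is a minimal, self-contained sketch of the idea (it uses `sentence-transformers` directly and skips Qdrant; the function name and corpus handling are illustrative, not the app's actual code):

```python
from typing import List

from sentence_transformers import SentenceTransformer, util

first_pass = SentenceTransformer("all-MiniLM-L6-v2")
second_pass = SentenceTransformer("sentence-t5-base")

def double_layered_search(query: str, corpus: List[str]) -> str:
    # Layer 1: all-MiniLM-L6-v2 identifies the 10 best matches
    corpus_embeddings = first_pass.encode(corpus, convert_to_tensor=True)
    query_embedding = first_pass.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=10)[0]
    candidates = [corpus[hit["corpus_id"]] for hit in hits]
    # Layer 2: sentence-t5-base re-encodes the candidates and extracts the best hit
    candidate_embeddings = second_pass.encode(candidates, convert_to_tensor=True)
    rescored_query = second_pass.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(rescored_query, candidate_embeddings)[0]
    return candidates[int(scores.argmax())]
```
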
## Usage

### Run it
### 3. Launching

You can get the application up and running with a single command, run from the same directory where you stored your `compose.yaml` file:
1. Launch `qdrant` vector database services with `docker compose` (from within the `qdurllm` folder):

```bash
docker compose up -d
docker compose up
```
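
If you want to double-check that Qdrant is reachable before moving on, a quick probe with the Python client works (a hypothetical snippet, not part of the repo):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
# Raises if Qdrant is not reachable; prints the (possibly empty) collection list otherwise
print(client.get_collections())
```
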

If you've already pulled all the images, you'll find the application running at `http://localhost:7860` or `http://0.0.0.0:7860` in less than a minute.
2. Activate the `qdurllm` conda environment you just created:

If you have not pulled the images, you'll have to wait until their download is complete before actually using the application.

### Use it

Once the app is loaded, you'll find a first tab in which you can write the URLs whose content you want to interact with:

![upload_URLs](./imgs/tutorial1.png)

Now that your URLs are uploaded, you can either chat with their content through `llama.cpp-gemma`:

![chat_with_URLs](./imgs/tutorial2.png)
```bash
conda activate qdurllm
```

> _Note that you can also set parameters like maximum output tokens, temperature, repetition penalty and generation seed_
3. Go inside the `app` directory and launch the Gradio application:

Or you can use double-layered-retrieval semantic search to query your URL content(s) directly:
```bash
cd app/
python3 app.py
```

![direct_search](./imgs/tutorial3.png)
You should see the app running on `http://localhost:7860` once all the models are downloaded from HuggingFace Hub.

## License and rights of usage
## Relies on

The software is (and will always be) open-source, provided under [MIT license](./LICENSE).
- [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), with Apache 2.0 license
- [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base), with Apache 2.0 license
- [prithivida/Splade_PP_en_v1](https://huggingface.co/prithivida/Splade_PP_en_v1), with Apache 2.0 license
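
As a quick orientation, these models can be loaded roughly as follows (a sketch assuming `transformers`, `sentence-transformers` and `fastembed` are installed; the app's own loading code may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
from fastembed import SparseTextEmbedding

# Text generation: Qwen2.5-1.5B-Instruct
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Dense embeddings for retrieval
dense_encoder = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Sparse (SPLADE) embeddings for hybrid search
sparse_encoder = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
```
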

Anyone can use, modify and redistribute any portion of it, as long as the author, [Astra Clelia Bertelli](https://astrabert.vercel.app), is cited.

## Contributions and funding
## Give feedback!

Contributions are always more than welcome! Feel free to flag issues, open PRs or [contact the author](mailto:[email protected]) to suggest any changes, request features or improve the code.
Comment on the [**discussion thread created for this release**](https://github.com/AstraBert/qdurllm/discussions) with your feedback or create [**issues**](https://github.com/AstraBert/qdurllm/issues) :)

If you found the application useful, please consider [funding it](https://github.com/sponsors/AstraBert) in order to allow improvements!
1 change: 0 additions & 1 deletion _config.yml

This file was deleted.

Binary file added app/__pycache__/loadUrls.cpython-312.pyc
Binary file not shown.
Binary file added app/__pycache__/rag.cpython-312.pyc
Binary file not shown.
Binary file added app/__pycache__/texInference.cpython-312.pyc
Binary file not shown.
68 changes: 68 additions & 0 deletions app/app.py
@@ -0,0 +1,68 @@
from rag import client, SemanticCache, NeuralSearcher, dense_encoder, sparse_encoder
from texInference import pipe
from loadUrls import urlload, to_db
import gradio as gr
import time


searcher = NeuralSearcher("memory", client, dense_encoder, sparse_encoder)
semantic_cache = SemanticCache(client, dense_encoder, "semantic_cache")


def upload2qdrant(url):
    global client
    documents = urlload(url)
    if isinstance(documents, list):
        try:
            to_db(documents)
            return "URLs successfully uploaded to Qdrant collection!"
        except Exception as e:
            return f"An error occurred: {e}"
    else:
        # urlload returned an error message instead of a list of documents
        return documents

demo0 = gr.Interface(fn=upload2qdrant, title="Upload URL content to Qdrant", inputs=gr.Textbox(label="URL(s)", info="Add one URL or more (if more, you should provide them comma-separated, like this: URL1,URL2,...,URLn)"), outputs=gr.Textbox(label="Logs"))


def reply(message, history, ntokens, rep_pen, temp, topp, systemins):
    # Serve from the semantic cache when a similar question was already answered
    sr = semantic_cache.search_cache(message)
    if sr:
        response = sr
        this_hist = ''
        # Stream the response character by character
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist
    else:
        # Retrieve context from Qdrant, generate an answer, then cache it
        context, url = searcher.search_text(message)
        prompt = [{"role": "system", "content": systemins}, {"role": "user", "content": f"This is the context information to reply to my prompt:\n\n{context}"}, {"role": "user", "content": message}]
        results = pipe(prompt, temp, topp, ntokens, rep_pen)
        results = results.split("<|im_start|>assistant\n")[1]
        response = results.replace("<|im_end|>", "")
        semantic_cache.upload_to_cache(message, response)
        this_hist = ''
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist

def direct_search(input_text):
    context, url = searcher.search_text(input_text)
    return context, f"Reference website [here]({url})"

demo2 = gr.Interface(fn=direct_search, inputs=gr.Textbox(label="Search Query", placeholder="Input your search query here..."), outputs=[gr.Textbox(label="Retrieved Content"), gr.Markdown(label="URL")], title="Search your URLs")

user_max_new_tokens = gr.Slider(0, 4096, value=512, label="Max new tokens", info="Select max output tokens (a higher number of tokens will result in longer latency)")
user_max_temperature = gr.Slider(0, 1, value=0.1, step=0.1, label="Temperature", info="Select generation temperature")
user_max_rep_pen = gr.Slider(0, 10, value=1.2, step=0.1, label="Repetition penalty", info="Select repetition penalty")
user_top_p = gr.Slider(0.1, 1, value=1, step=0.1, label="top_p", info="Select top_p for the generation")
system_ins = gr.Textbox(label="System Prompt", info="Insert your system prompt here", value="You are a helpful web searching assistant. You reply based on the contextual information you are provided with and on your knowledge.")
additional_accordion = gr.Accordion(label="Parameters to be set before you start chatting", open=True)
demo1 = gr.ChatInterface(fn=reply, title="Chat with your URLs", additional_inputs=[user_max_new_tokens, user_max_temperature, user_max_rep_pen, user_top_p, system_ins], additional_inputs_accordion=additional_accordion)

my_theme = gr.themes.Soft(primary_hue=gr.themes.colors.rose, secondary_hue=gr.themes.colors.pink)

demo = gr.TabbedInterface([demo0, demo1, demo2], ["Upload URLs", "Chat with URLs", "Direct Search"], theme=my_theme)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
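
The `rag` module imported at the top of `app.py` is not rendered in this view. As a rough sketch of the interface the app relies on, a Qdrant-backed semantic cache could look something like the following (class internals, payload fields and the similarity threshold are assumptions, not the actual implementation; it presumes a fastembed-style dense encoder exposing an `embed()` method):

```python
import uuid

from qdrant_client import QdrantClient, models

class SemanticCache:
    """Cache question/answer pairs in a Qdrant collection and serve
    answers to semantically similar questions (assumed interface)."""

    def __init__(self, client: QdrantClient, encoder, collection: str, threshold: float = 0.75):
        self.client = client
        self.encoder = encoder        # dense embedding model
        self.collection = collection
        self.threshold = threshold    # minimum similarity to count as a cache hit

    def search_cache(self, question: str):
        vector = list(self.encoder.embed([question]))[0].tolist()
        hits = self.client.search(collection_name=self.collection, query_vector=vector, limit=1)
        # Only return the cached answer if it is similar enough to the new question
        if hits and hits[0].score >= self.threshold:
            return hits[0].payload["answer"]
        return None

    def upload_to_cache(self, question: str, answer: str) -> None:
        vector = list(self.encoder.embed([question]))[0].tolist()
        point = models.PointStruct(id=str(uuid.uuid4()), vector=vector, payload={"question": question, "answer": answer})
        self.client.upsert(collection_name=self.collection, points=[point])
```
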
29 changes: 29 additions & 0 deletions app/loadUrls.py
@@ -0,0 +1,29 @@
from langchain_community.document_loaders.url import UnstructuredURLLoader
from langchain.text_splitter import CharacterTextSplitter
from rag import upload_text_to_qdrant, client
from typing import List, Dict

def urlload(urls: str) -> List[Dict[str, str]]:
    links = urls.split(",")
    try:
        loader = UnstructuredURLLoader(
            urls=links, mode="elements",
            strategy="fast"
        )
        docs = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        pages = text_splitter.split_documents(docs)
        contents = [{"text": page.page_content, "url": page.metadata["source"]} for page in pages]
        return contents
    except Exception as e:
        return f"An error occurred while parsing the URLs: {e}"


def to_db(contents: List[Dict[str, str]]) -> None:
    # Upload each chunk to the "memory" collection with a progressive integer ID
    for c, content in enumerate(contents):
        upload_text_to_qdrant(client, "memory", content, c)
    return
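
For reference, a typical round trip with these helpers might look like this (hypothetical URLs; it assumes a running Qdrant instance with a `memory` collection already set up):

```python
docs = urlload("https://qdrant.tech/documentation/,https://example.com")
if isinstance(docs, list):
    to_db(docs)   # chunks are now searchable from the app
else:
    print(docs)   # urlload returned an error message
```
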


