Showing 28 changed files with 804 additions and 11,443 deletions.
@@ -0,0 +1,46 @@
# Contributing to `conda`

Do you want to contribute to this project? Make sure to read these guidelines first :)

## Issue

**When to do it**:

- You found bugs but you don't know how to solve them, or don't have the time or will to fix them
- You want new features but you don't know how to implement them, or don't have the time or will to implement them

> ⚠️ _Always check open and closed issues before you submit yours to avoid duplicates_

**How to do it**:

- Open an issue
- Give the issue a meaningful title (a short but effective problem description)
- Describe the problem following the issue template

## Traditional contribution

**When to do it**:

- You found bugs and corrected them
- You optimized or improved the code
- You added new features that you think could be useful to others

**How to do it**:

1. Fork this repository
2. Commit your changes
3. Submit a pull request (make sure to provide a thorough description of the changes)

## Showcase your PrAIvateSearch

**When to do it**:

- You modified the base application with new features but you don't want to (or can't) merge them into the original PrAIvateSearch

**How to do it**:

- Go to the [_GitHub Discussions > Show and tell_](https://github.com/AstraBert/PrAIvateSearch/discussions/categories/show-and-tell) page
- Open a new discussion there, describing your PrAIvateSearch application

### Thanks for contributing!
@@ -1,134 +1,74 @@
  <h1 align="center">qdurllm</h1>
  <h2 align="center">Search your favorite websites and chat with them, on your desktop🌐</h2>

+ # Docs in active development!👷‍♀️

  <div align="center">
  <img src="https://img.shields.io/github/languages/top/AstraBert/qdurllm" alt="GitHub top language">
  <img src="https://img.shields.io/github/commit-activity/t/AstraBert/qdurllm" alt="GitHub commit activity">
  <img src="https://img.shields.io/badge/Status-stable-green" alt="Static Badge">
  <img src="https://img.shields.io/badge/Release-v0.0.0-purple" alt="Static Badge">
  <img src="https://img.shields.io/docker/image-size/astrabert/local-search-application" alt="Docker image size">
  <img src="https://img.shields.io/badge/Supported_platforms-Windows/macOS/Linux-brown" alt="Static Badge">
  <div>
  <img src="./imgs/qdurllm.png" alt="Flowchart" align="center">
  <p><i>Flowchart for qdurllm</i></p>
  </div>
  </div>
+ They will be soon available on: https://astrabert.github.io/qdurllm/

- **qdurllm** (**Qd**rant **URL**s and **L**arge **L**anguage **M**odels) is a local search engine that lets you select and upload URL content to a vector database: after that, you can search, retrieve and chat with this content.
+ In the meantime, refer to the **Quickstart guide** in this README!
- This is provisioned through a multi-container Docker application, leveraging Qdrant, Langchain, llama.cpp, quantized Gemma and Gradio.
+ ## Quickstart

- ## Demo!
+ ### 1. Prerequisites

- Head over to the [demo space on HuggingFace](https://huggingface.co/spaces/as-cle-bert/qdurllm-demo)🦀
+ - [`conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) package manager
+ - [`docker`](https://www.docker.com/) and [`docker compose`](https://docs.docker.com/compose/).

- ## Requirements
+ ### 2. Installation

- The only requirement is to have `docker` and `docker-compose`.
- If you don't have them, make sure to install them [here](https://docs.docker.com/get-docker/).
+ > [!IMPORTANT]
+ > _This is only for the pre-release of `v1.0.0`, i.e. `v1.0.0-rc.0`_

- ## Installation

- You can install the application by cloning the GitHub repository
+ 1. Clone the `january-2025` branch of this GitHub repo:

  ```bash
- git clone https://github.com/AstraBert/qdurllm.git
- cd qdurllm
+ git clone -b january-2025 --single-branch https://github.com/AstraBert/qdurllm.git
+ cd qdurllm/
  ```
- Or you can simply paste the following text into a `compose.yaml` file:

- ```yaml
- networks:
-   mynet:
-     driver: bridge
- services:
-   local-search-application:
-     image: astrabert/local-search-application
-     networks:
-       - mynet
-     ports:
-       - "7860:7860"
-   qdrant:
-     image: qdrant/qdrant
-     ports:
-       - "6333:6333"
-     volumes:
-       - "./qdrant_storage:/qdrant/storage"
-     networks:
-       - mynet
-   llama_server:
-     image: astrabert/llama.cpp-gemma
-     ports:
-       - "8000:8000"
-     networks:
-       - mynet
- ```
+ 2. Create the `conda` environment:

- Place the file in whatever directory you want in your file system.
+ ```bash
+ conda env create -f environment.yml
+ ```

- Prior to running the application, you can optionally pull all the needed images from Docker Hub:
+ 3. Pull `qdrant` from Docker Hub:

  ```bash
  docker pull qdrant/qdrant
- docker pull astrabert/llama.cpp-gemma
- docker pull astrabert/local-search-application
  ```
- ## How does it work?

- When launched (see [Usage](#usage)), the application runs three containers:

- - `qdrant` (port 6333): serves as the vector database provider for semantic-search-based retrieval
- - `llama.cpp-gemma` (port 8000): an implementation of a [quantized Gemma model](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF) provided by LMStudio and Google, served with the `llama.cpp` server. This handles text generation, enriching the user's search experience.
- - `local-search-application` (port 7860): a Gradio tabbed interface with:
-   + The possibility to upload one or more contents by specifying the URL (thanks to Langchain)
-   + The possibility to chat with the uploaded URLs thanks to `llama.cpp-gemma`
-   + The possibility to perform a direct search that leverages double-layered retrieval with `all-MiniLM-L6-v2` (which identifies the 10 best matches) and `sentence-t5-base` (which re-encodes the 10 best matches and extracts the best hit from them) - this is the same RAG implementation used in combination with `llama.cpp-gemma`. Wanna see how double-layered RAG performs compared to single-layered RAG? Head over [here](./double-layered-rag-benchmarks/)!

- > _The overall computational burden is light enough to make the application run not only GPUless, but also with low RAM availability (>=8GB, although it can take up to 10 mins for Gemma to respond on 8GB RAM)._
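The double-layered retrieval mentioned in the README (a fast first-pass encoder shortlists the top 10 candidates, then a second encoder re-scores only that shortlist) can be sketched in a few lines of pure Python. This is an illustrative sketch, not project code: the hand-written vectors stand in for real `all-MiniLM-L6-v2` and `sentence-t5-base` embeddings, and `double_layer_search` is a hypothetical helper name.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def double_layer_search(query_vecs, doc_vecs, k=10):
    """Stage 1: rank all docs with the coarse vectors and keep the top k.
    Stage 2: re-score only those k with the fine vectors; return the best hit."""
    coarse_q, fine_q = query_vecs
    shortlist = sorted(doc_vecs,
                       key=lambda d: cosine(coarse_q, d["coarse"]),
                       reverse=True)[:k]
    return max(shortlist, key=lambda d: cosine(fine_q, d["fine"]))

# Toy corpus: "coarse" plays the role of the first-pass embedding,
# "fine" the role of the re-scoring embedding (both made up here).
docs = [
    {"id": "a", "coarse": [1.0, 0.0], "fine": [0.0, 1.0]},
    {"id": "b", "coarse": [0.9, 0.1], "fine": [1.0, 0.0]},
    {"id": "c", "coarse": [0.0, 1.0], "fine": [1.0, 0.0]},
]
best = double_layer_search(([1.0, 0.0], [1.0, 0.0]), docs, k=2)
print(best["id"])  # "c" is cut in stage 1; "b" wins the fine re-scoring
```

The point of the two layers is that the cheap encoder bounds how many documents the expensive encoder ever has to touch.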
- ## Usage

- ### Run it
+ ### 3. Launching

- You can make the application work with the following - really simple - command, which has to be run within the same directory where you stored your `compose.yaml` file:
+ 1. Launch the `qdrant` vector database service with `docker compose` (from within the `qdurllm` folder):

  ```bash
- docker compose up -d
+ docker compose up
  ```

- If you've already pulled all the images, you'll find the application running at `http://localhost:7860` or `http://0.0.0.0:7860` in less than a minute.
- If you have not pulled the images, you'll have to wait until their installation is complete before actually using the application.

- ### Use it

- Once the app is loaded, you'll find a first tab in which you can write the URLs whose content you want to interact with:

- ![upload_URLs](./imgs/tutorial1.png)

- Now that your URLs are uploaded, you can either chat with their content through `llama.cpp-gemma`:

- ![chat_with_URLs](./imgs/tutorial2.png)

- > _Note that you can also set parameters like maximum output tokens, temperature, repetition penalty and generation seed_
+ 2. Activate the `qdurllm` conda environment you just created:

+ ```bash
+ conda activate qdurllm
+ ```

- Or you can use double-layered-retrieval semantic search to query your URL content(s) directly:
+ 3. Go inside the `app` directory and launch the Gradio application:

+ ```bash
+ cd app/
+ python3 app.py
+ ```

- ![direct_search](./imgs/tutorial3.png)
+ You should see the app running on `http://localhost:7860` once all the models are downloaded from the HuggingFace Hub.
- ## License and rights of usage
+ ## Relies on

- The software is (and will always be) open-source, provided under the [MIT license](./LICENSE).
+ - [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), with Apache 2.0 license
+ - [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base), with Apache 2.0 license
+ - [prithivida/Splade_PP_en_v1](https://huggingface.co/prithivida/Splade_PP_en_v1), with Apache 2.0 license

- Anyone can use, modify and redistribute any portion of it, as long as the author, [Astra Clelia Bertelli](https://astrabert.vercel.app), is cited.

- ## Contributions and funding
+ ## Give feedback!

- Contributions are always more than welcome! Feel free to flag issues, open PRs or [contact the author](mailto:[email protected]) to suggest any changes, request features or improve the code.
+ Comment on the [**discussion thread created for this release**](https://github.com/AstraBert/qdurllm/discussions) with your feedback or create [**issues**](https://github.com/AstraBert/qdurllm/issues) :)

- If you found the application useful, please consider [funding it](https://github.com/sponsors/AstraBert) in order to allow improvements!
@@ -0,0 +1,68 @@
from rag import client, SemanticCache, NeuralSearcher, dense_encoder, sparse_encoder
from texInference import pipe
from loadUrls import urlload, to_db
import gradio as gr
import time


searcher = NeuralSearcher("memory", client, dense_encoder, sparse_encoder)
semantic_cache = SemanticCache(client, dense_encoder, "semantic_cache")


def upload2qdrant(url):
    # Load the (comma-separated) URLs and index their content in Qdrant
    documents = urlload(url)
    if isinstance(documents, list):
        try:
            to_db(documents)
            return "URLs successfully uploaded to Qdrant collection!"
        except Exception as e:
            return f"An error occurred: {e}"
    else:
        # urlload returned an error string instead of a list of documents
        return documents


demo0 = gr.Interface(
    fn=upload2qdrant,
    title="Upload URL content to Qdrant",
    inputs=gr.Textbox(label="URL(s)", info="Add one URL or more (if more, provide them comma-separated, like this: URL1,URL2,...,URLn)"),
    outputs=gr.Textbox(label="Logs"),
)


def reply(message, history, ntokens, rep_pen, temp, topp, systemins):
    # Serve from the semantic cache when an equivalent question was already answered
    sr = semantic_cache.search_cache(message)
    if sr:
        response = sr
        this_hist = ""
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist
    else:
        context, url = searcher.search_text(message)
        prompt = [
            {"role": "system", "content": systemins},
            {"role": "user", "content": f"This is the context information to reply to my prompt:\n\n{context}"},
            {"role": "user", "content": message},
        ]
        results = pipe(prompt, temp, topp, ntokens, rep_pen)
        # Strip the ChatML wrapper produced by the model
        results = results.split("<|im_start|>assistant\n")[1]
        response = results.replace("<|im_end|>", "")
        semantic_cache.upload_to_cache(message, response)
        this_hist = ""
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist


def direct_search(input_text):
    context, url = searcher.search_text(input_text)
    return context, f"Reference website [here]({url})"


demo2 = gr.Interface(
    fn=direct_search,
    inputs=gr.Textbox(label="Search Query", placeholder="Input your search query here..."),
    outputs=[gr.Textbox(label="Retrieved Content"), gr.Markdown(label="URL")],
    title="Search your URLs",
)

user_max_new_tokens = gr.Slider(0, 4096, value=512, label="Max new tokens", info="Select max output tokens (a higher number of tokens will result in longer latency)")
user_max_temperature = gr.Slider(0, 1, value=0.1, step=0.1, label="Temperature", info="Select generation temperature")
user_max_rep_pen = gr.Slider(0, 10, value=1.2, step=0.1, label="Repetition penalty", info="Select repetition penalty")
user_top_p = gr.Slider(0.1, 1, value=1, step=0.1, label="top_p", info="Select top_p for the generation")
system_ins = gr.Textbox(label="System Prompt", info="Insert your system prompt here", value="You are a helpful web searching assistant. You reply based on the contextual information you are provided with and on your knowledge.")
additional_accordion = gr.Accordion(label="Parameters to be set before you start chatting", open=True)
demo1 = gr.ChatInterface(fn=reply, title="Chat with your URLs", additional_inputs=[user_max_new_tokens, user_max_temperature, user_max_rep_pen, user_top_p, system_ins], additional_inputs_accordion=additional_accordion)

my_theme = gr.themes.Soft(primary_hue=gr.themes.colors.rose, secondary_hue=gr.themes.colors.pink)

demo = gr.TabbedInterface([demo0, demo1, demo2], ["Upload URLs", "Chat with URLs", "Direct Search"], theme=my_theme)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
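The `SemanticCache` used above lives in the project's `rag` module, which is not part of this diff. The idea (store a reply under the query's embedding, and serve it again for any sufficiently similar query) can be sketched with a toy embedding function; `ToySemanticCache`, `toy_embed` and the 0.95 threshold are illustrative assumptions, not the project's actual implementation.

```python
from math import sqrt

def toy_embed(text):
    """Stand-in for a real dense encoder: counts of a few characters."""
    return [float(text.count(c)) for c in "aeiou "]

def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    """Serve a stored answer when a new query embeds close to an old one."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def upload_to_cache(self, query, answer):
        self.entries.append((self.embed(query), answer))

    def search_cache(self, query):
        qv = self.embed(query)
        best = max(self.entries, key=lambda e: _cosine(qv, e[0]), default=None)
        if best and _cosine(qv, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: the caller falls through to full RAG

cache = ToySemanticCache(toy_embed)
cache.upload_to_cache("what is qdrant", "Qdrant is a vector database.")
print(cache.search_cache("what is qdrant"))  # repeated query: cache hit
```

This mirrors the control flow in `reply`: a cache hit skips retrieval and generation entirely, which is why repeated questions stream back almost instantly.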
@@ -0,0 +1,29 @@
from langchain_community.document_loaders.url import UnstructuredURLLoader
from langchain.text_splitter import CharacterTextSplitter
from rag import upload_text_to_qdrant, client
from typing import Dict, List, Union


def urlload(urls: str) -> Union[List[Dict[str, str]], str]:
    """Load comma-separated URLs, split their content into ~1000-character
    chunks, and return them as {"text": ..., "url": ...} dicts.
    On failure, return the error message as a string."""
    links = urls.split(",")
    try:
        loader = UnstructuredURLLoader(
            urls=links, mode="elements",  # was `method="elements"`: the loader's keyword is `mode`
            strategy="fast",
        )
        docs = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        pages = text_splitter.split_documents(docs)
        return [{"text": page.page_content, "url": page.metadata["source"]} for page in pages]
    except Exception as e:
        return f"An error occurred while parsing the URLs: {e}"


def to_db(contents: List[Dict[str, str]]) -> None:
    # `contents = List[...]` made the annotation a default value; it must be `contents: List[...]`
    for i, content in enumerate(contents):
        upload_text_to_qdrant(client, "memory", content, i)
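`CharacterTextSplitter` above chunks each loaded document to roughly 1000 characters before indexing. A dependency-free sketch of that kind of fixed-size splitting could look like the following; `split_text` is a hypothetical helper (the real splitter is separator-based and more nuanced), splitting on whitespace so words stay intact.

```python
def split_text(text: str, chunk_size: int = 1000) -> list:
    """Greedily pack whole words into chunks of at most `chunk_size` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= chunk_size or not current:
            # Fits in the current chunk (or the single word is itself oversized)
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

pieces = split_text("one two three four five", chunk_size=9)
print(pieces)  # ['one two', 'three', 'four five']
```

Chunking like this keeps each Qdrant point small enough to embed and retrieve precisely, at the cost of splitting long passages across points.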