candidate release january 2025
AstraBert committed Jan 14, 2025
1 parent 6d6951f commit d95999e
Showing 28 changed files with 804 additions and 11,443 deletions.
46 changes: 46 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,46 @@
# Contributing to `qdurllm`

Do you want to contribute to this project? Make sure to read these guidelines first :)

## Issue

**When to do it**:

- You found bugs but don't know how to solve them, or don't have the time/will to fix them
- You want new features but don't know how to implement them, or don't have the time/will to implement them

> ⚠️ _Always check open and closed issues before you submit yours to avoid duplicates_

**How to do it**:

- Open an issue
- Give the issue a meaningful title (short but effective problem description)
- Describe the problem following the issue template

## Traditional contribution

**When to do it**:

- You found bugs and corrected them
- You optimized/improved the code
- You added new features that you think could be useful to others

**How to do it**:

1. Fork this repository
2. Commit your changes
3. Submit a pull request (make sure to provide a thorough description of the changes)


## Showcase your qdurllm

**When to do it**:

- You modified the base application with new features, but you don't want to (or can't) merge them into the original qdurllm

**How to do it**:

- Go to the [_GitHub Discussions > Show and tell_](https://github.com/AstraBert/qdurllm/discussions/categories/show-and-tell) page
- Open a new discussion there, describing your qdurllm application

### Thanks for contributing!
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Astra Clelia Bertelli
Copyright (c) 2025 Clelia (Astra) Bertelli

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
134 changes: 37 additions & 97 deletions README.md
@@ -1,134 +1,74 @@
<h1 align="center">qdurllm</h1>
<h2 align="center">Search your favorite websites and chat with them, on your desktop🌐</h2>

# Docs in active development!👷‍♀️

<div align="center">
<img src="https://img.shields.io/github/languages/top/AstraBert/qdurllm" alt="GitHub top language">
<img src="https://img.shields.io/github/commit-activity/t/AstraBert/qdurllm" alt="GitHub commit activity">
<img src="https://img.shields.io/badge/Status-stable-green" alt="Static Badge">
<img src="https://img.shields.io/badge/Release-v0.0.0-purple" alt="Static Badge">
<img src="https://img.shields.io/docker/image-size/astrabert/local-search-application
" alt="Docker image size">
<img src="https://img.shields.io/badge/Supported_platforms-Windows/macOS/Linux-brown" alt="Static Badge">
<div>
<img src="./imgs/qdurllm.png" alt="Flowchart" align="center">
<p><i>Flowchart for qdurllm</i></p>
</div>
</div>
They will soon be available at: https://astrabert.github.io/qdurllm/

**qdurllm** (**Qd**rant **URL**s and **L**arge **L**anguage **M**odels) is a local search engine that lets you select and upload URL content to a vector database: after that, you can search, retrieve and chat with this content.
In the meantime, refer to the **Quickstart guide** in this README!

This is provisioned through a multi-container Docker application, leveraging Qdrant, Langchain, llama.cpp, quantized Gemma and Gradio.
## Quickstart

## Demo!
### 1. Prerequisites

Head over to the [demo space on HuggingFace](https://huggingface.co/spaces/as-cle-bert/qdurllm-demo)🦀
- [`conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) package manager
- [`docker`](https://www.docker.com/) and [`docker compose`](https://docs.docker.com/compose/).

## Requirements
### 2. Installation

The only requirement is to have `docker` and `docker-compose`.
> [!IMPORTANT]
> _This is only for the pre-release of `v1.0.0`, i.e. `v1.0.0-rc.0`_
If you don't have them, make sure to install them [here](https://docs.docker.com/get-docker/).

## Installation

You can install the application by cloning the GitHub repository
1. Clone the `january-2025` branch of this GitHub repo:

```bash
git clone https://github.com/AstraBert/qdurllm.git
cd qdurllm
git clone -b january-2025 --single-branch https://github.com/AstraBert/qdurllm.git
cd qdurllm/
```

Or you can simply paste the following text into a `compose.yaml` file:

```yaml
networks:
mynet:
driver: bridge
services:
local-search-application:
image: astrabert/local-search-application
networks:
- mynet
ports:
- "7860:7860"
qdrant:
image: qdrant/qdrant
ports:
- "6333:6333"
volumes:
- "./qdrant_storage:/qdrant/storage"
networks:
- mynet
llama_server:
image: astrabert/llama.cpp-gemma
ports:
- "8000:8000"
networks:
- mynet
```
2. Create the `conda` environment:

Place the file in whatever directory you want in your file system.
```bash
conda env create -f environment.yml
```

Prior to running the application, you can optionally pull all the needed images from Docker Hub:
3. Pull `qdrant` from Docker Hub:

```bash
docker pull qdrant/qdrant
docker pull astrabert/llama.cpp-gemma
docker pull astrabert/local-search-application
```

## How does it work?

When launched (see [Usage](#usage)), the application runs three containers:

- `qdrant`(port 6333): serves as vector database provider for semantic search-based retrieval
- `llama.cpp-gemma` (port 8000): an implementation of a [quantized Gemma model](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF) provided by LMStudio and Google, served with the `llama.cpp` server. It handles text generation, enriching the user's search experience.
- `local-search-application`(port 7860): a Gradio tabbed interface with:
+ The possibility to upload one or multiple contents by specifying the URL (thanks to Langchain)
+ The possibility to chat with the uploaded URLs thanks to `llama.cpp-gemma`
+ The possibility to perform a direct search that leverages double-layered retrieval with `all-MiniLM-L6-v2` (which identifies the 10 best matches) and `sentence-t5-base` (which re-encodes the 10 best matches and extracts the best hit from them) - this is the same RAG implementation used in combination with `llama.cpp-gemma`; a sketch of how it works follows below. Wanna see how double-layered RAG performs compared to single-layered RAG? Head over [here](./double-layered-rag-benchmarks/)!

> _The overall computational burden is light enough to make the application run not only without a GPU, but also with low RAM availability (>=8GB, although it can take up to 10 mins for Gemma to respond on 8GB RAM)._
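
To make the double-layered retrieval concrete, here is a minimal, self-contained sketch of the idea (it uses `sentence-transformers` directly and skips Qdrant; the function name and corpus handling are illustrative, not the app's actual code):

```python
from typing import List

from sentence_transformers import SentenceTransformer, util

first_pass = SentenceTransformer("all-MiniLM-L6-v2")
second_pass = SentenceTransformer("sentence-t5-base")

def double_layered_search(query: str, corpus: List[str]) -> str:
    # Layer 1: all-MiniLM-L6-v2 identifies the 10 best matches
    corpus_embeddings = first_pass.encode(corpus, convert_to_tensor=True)
    query_embedding = first_pass.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=10)[0]
    candidates = [corpus[hit["corpus_id"]] for hit in hits]
    # Layer 2: sentence-t5-base re-encodes the candidates and extracts the best hit
    candidate_embeddings = second_pass.encode(candidates, convert_to_tensor=True)
    rescored_query = second_pass.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(rescored_query, candidate_embeddings)[0]
    return candidates[int(scores.argmax())]
```
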
## Usage

### Run it
### 3. Launching

You can get the application up and running with a single command, run from the same directory where you stored your `compose.yaml` file:
1. Launch `qdrant` vector database services with `docker compose` (from within the `qdurllm` folder):

```bash
docker compose up -d
docker compose up
```
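
If you want to double-check that Qdrant is reachable before moving on, a quick probe with the Python client works (a hypothetical snippet, not part of the repo):

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
# Raises if Qdrant is not reachable; prints the (possibly empty) collection list otherwise
print(client.get_collections())
```
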

If you've already pulled all the images, you'll find the application running at `http://localhost:7860` or `http://0.0.0.0:7860` in less than a minute.
2. Activate the `qdurllm` conda environment you just created:

If you have not pulled the images, you'll have to wait until their download is complete before actually using the application.

### Use it

Once the app is loaded, you'll find a first tab in which you can write the URLs whose content you want to interact with:

![upload_URLs](./imgs/tutorial1.png)

Now that your URLs are uploaded, you can either chat with their content through `llama.cpp-gemma`:

![chat_with_URLs](./imgs/tutorial2.png)
```bash
conda activate qdurllm
```

> _Note that you can also set parameters like maximum output tokens, temperature, repetition penalty and generation seed_
3. Go inside the `app` directory and launch the Gradio application:

Or you can use double-layered-retrieval semantic search to query your URL content(s) directly:
```bash
cd app/
python3 app.py
```

![direct_search](./imgs/tutorial3.png)
You should see the app running on `http://localhost:7860` once all the models are downloaded from HuggingFace Hub.

## License and rights of usage
## Relies on

The software is (and will always be) open-source, provided under [MIT license](./LICENSE).
- [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), with Apache 2.0 license
- [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base), with Apache 2.0 license
- [prithivida/Splade_PP_en_v1](https://huggingface.co/prithivida/Splade_PP_en_v1), with Apache 2.0 license
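
As a quick orientation, these models can be loaded roughly as follows (a sketch assuming `transformers`, `sentence-transformers` and `fastembed` are installed; the app's own loading code may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer
from fastembed import SparseTextEmbedding

# Text generation: Qwen2.5-1.5B-Instruct
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

# Dense embeddings for retrieval
dense_encoder = SentenceTransformer("nomic-ai/modernbert-embed-base")

# Sparse (SPLADE) embeddings for hybrid search
sparse_encoder = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
```
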

Anyone can use, modify and redistribute any portion of it, as long as the author, [Astra Clelia Bertelli](https://astrabert.vercel.app), is cited.

## Contributions and funding
## Give feedback!

Contributions are always more than welcome! Feel free to flag issues, open PRs or [contact the author](mailto:[email protected]) to suggest any changes, request features or improve the code.
Comment on the [**discussion thread created for this release**](https://github.com/AstraBert/qdurllm/discussions) with your feedback or create [**issues**](https://github.com/AstraBert/qdurllm/issues) :)

If you found the application useful, please consider [funding it](https://github.com/sponsors/AstraBert) in order to allow improvements!
1 change: 0 additions & 1 deletion _config.yml

This file was deleted.

Binary file added app/__pycache__/loadUrls.cpython-312.pyc
Binary file not shown.
Binary file added app/__pycache__/rag.cpython-312.pyc
Binary file not shown.
Binary file added app/__pycache__/texInference.cpython-312.pyc
Binary file not shown.
68 changes: 68 additions & 0 deletions app/app.py
@@ -0,0 +1,68 @@
from rag import client, SemanticCache, NeuralSearcher, dense_encoder, sparse_encoder
from texInference import pipe
from loadUrls import urlload, to_db
import gradio as gr
import time


searcher = NeuralSearcher("memory", client, dense_encoder, sparse_encoder)
semantic_cache = SemanticCache(client, dense_encoder, "semantic_cache")


def upload2qdrant(url):
    global client
    documents = urlload(url)
    if isinstance(documents, list):
        try:
            to_db(documents)
            return "URLs successfully uploaded to Qdrant collection!"
        except Exception as e:
            return f"An error occurred: {e}"
    else:
        # urlload returned an error message instead of a list of documents
        return documents

demo0 = gr.Interface(fn=upload2qdrant, title="Upload URL content to Qdrant", inputs=gr.Textbox(label="URL(s)", info="Add one URL or more (if more, you should provide them comma-separated, like this: URL1,URL2,...,URLn)"), outputs=gr.Textbox(label="Logs"))


def reply(message, history, ntokens, rep_pen, temp, topp, systemins):
    # Serve from the semantic cache when a similar question was already answered
    sr = semantic_cache.search_cache(message)
    if sr:
        response = sr
        this_hist = ''
        # Stream the response character by character
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist
    else:
        # Retrieve context from Qdrant, generate an answer, then cache it
        context, url = searcher.search_text(message)
        prompt = [{"role": "system", "content": systemins}, {"role": "user", "content": f"This is the context information to reply to my prompt:\n\n{context}"}, {"role": "user", "content": message}]
        results = pipe(prompt, temp, topp, ntokens, rep_pen)
        results = results.split("<|im_start|>assistant\n")[1]
        response = results.replace("<|im_end|>", "")
        semantic_cache.upload_to_cache(message, response)
        this_hist = ''
        for c in response:
            this_hist += c
            time.sleep(0.001)
            yield this_hist

def direct_search(input_text):
    context, url = searcher.search_text(input_text)
    return context, f"Reference website [here]({url})"

demo2 = gr.Interface(fn=direct_search, inputs=gr.Textbox(label="Search Query", placeholder="Input your search query here..."), outputs=[gr.Textbox(label="Retrieved Content"), gr.Markdown(label="URL")], title="Search your URLs")

user_max_new_tokens = gr.Slider(0, 4096, value=512, label="Max new tokens", info="Select max output tokens (a higher number of tokens will result in longer latency)")
user_max_temperature = gr.Slider(0, 1, value=0.1, step=0.1, label="Temperature", info="Select generation temperature")
user_max_rep_pen = gr.Slider(0, 10, value=1.2, step=0.1, label="Repetition penalty", info="Select repetition penalty")
user_top_p = gr.Slider(0.1, 1, value=1, step=0.1, label="top_p", info="Select top_p for the generation")
system_ins = gr.Textbox(label="System Prompt", info="Insert your system prompt here", value="You are a helpful web searching assistant. You reply based on the contextual information you are provided with and on your knowledge.")
additional_accordion = gr.Accordion(label="Parameters to be set before you start chatting", open=True)
demo1 = gr.ChatInterface(fn=reply, title="Chat with your URLs", additional_inputs=[user_max_new_tokens, user_max_temperature, user_max_rep_pen, user_top_p, system_ins], additional_inputs_accordion=additional_accordion)

my_theme = gr.themes.Soft(primary_hue=gr.themes.colors.rose, secondary_hue=gr.themes.colors.pink)

demo = gr.TabbedInterface([demo0, demo1, demo2], ["Upload URLs", "Chat with URLs", "Direct Search"], theme=my_theme)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
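
The `rag` module imported at the top of `app.py` is not rendered in this view. As a rough sketch of the interface the app relies on, a Qdrant-backed semantic cache could look something like the following (class internals, payload fields and the similarity threshold are assumptions, not the actual implementation; it presumes a fastembed-style dense encoder exposing an `embed()` method):

```python
import uuid

from qdrant_client import QdrantClient, models

class SemanticCache:
    """Cache question/answer pairs in a Qdrant collection and serve
    answers to semantically similar questions (assumed interface)."""

    def __init__(self, client: QdrantClient, encoder, collection: str, threshold: float = 0.75):
        self.client = client
        self.encoder = encoder        # dense embedding model
        self.collection = collection
        self.threshold = threshold    # minimum similarity to count as a cache hit

    def search_cache(self, question: str):
        vector = list(self.encoder.embed([question]))[0].tolist()
        hits = self.client.search(collection_name=self.collection, query_vector=vector, limit=1)
        # Only return the cached answer if it is similar enough to the new question
        if hits and hits[0].score >= self.threshold:
            return hits[0].payload["answer"]
        return None

    def upload_to_cache(self, question: str, answer: str) -> None:
        vector = list(self.encoder.embed([question]))[0].tolist()
        point = models.PointStruct(id=str(uuid.uuid4()), vector=vector, payload={"question": question, "answer": answer})
        self.client.upsert(collection_name=self.collection, points=[point])
```
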
29 changes: 29 additions & 0 deletions app/loadUrls.py
@@ -0,0 +1,29 @@
from langchain_community.document_loaders.url import UnstructuredURLLoader
from langchain.text_splitter import CharacterTextSplitter
from rag import upload_text_to_qdrant, client
from typing import List, Dict

def urlload(urls: str) -> List[Dict[str, str]]:
    links = urls.split(",")
    try:
        loader = UnstructuredURLLoader(
            urls=links, mode="elements",
            strategy="fast"
        )
        docs = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        pages = text_splitter.split_documents(docs)
        contents = [{"text": page.page_content, "url": page.metadata["source"]} for page in pages]
        return contents
    except Exception as e:
        return f"An error occurred while parsing the URLs: {e}"


def to_db(contents: List[Dict[str, str]]) -> None:
    # Upload each chunk to the "memory" collection with a progressive integer ID
    for c, content in enumerate(contents):
        upload_text_to_qdrant(client, "memory", content, c)
    return
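
For reference, a typical round trip with these helpers might look like this (hypothetical URLs; it assumes a running Qdrant instance with a `memory` collection already set up):

```python
docs = urlload("https://qdrant.tech/documentation/,https://example.com")
if isinstance(docs, list):
    to_db(docs)   # chunks are now searchable from the app
else:
    print(docs)   # urlload returned an error message
```
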


