Feature: End-to-end RAG pipeline testing and Gradio frontend, HF SmolAgents, Comet, Opik, and MongoDB Atlas backend vectorDB with ParentChild search-via langchain (#pr_online_pipeline_integration_002.notes.md)
ernestol0817 committed Jan 21, 2025
1 parent 0589ba0 commit 275f8e5
Showing 2 changed files with 90 additions and 0 deletions.
90 changes: 90 additions & 0 deletions pr_online_pipeline_integration_002.notes.md
@@ -0,0 +1,90 @@
# **Summarized PR Notes**

## **Release Notes**

### **Features**

#### **1. Enhanced Documentation**
- Expanded the code documentation in `main.py` so that it is more detailed and beginner-friendly.
  - File: `src/second_brain_online/main.py`

#### **2. Opik Integration**
- Integrated Opik for monitoring pipeline performance and tracking evaluations.
- Updated files:
  - `src/second_brain_online/ONLINE_README.md`
  - `src/second_brain_online/pyproject.toml`
  - `src/second_brain_online/main.py`

#### **3. Dynamic Dataset Creation**
- Implemented dynamic dataset creation, which uses Opik to generate evaluation datasets from the `rag_data` collection in MongoDB Atlas.
- New file:
  - `src/second_brain_online/opik_eval_dataset.py`
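
As a rough illustration of the dataset-creation step, the sketch below turns raw documents into evaluation items. It is not the contents of `opik_eval_dataset.py`: the `content` field name, the item shape, and the `build_eval_items` helper are all assumptions made for the example, and the MongoDB read and Opik upload are left out so the block runs on its own.

```python
from typing import Any, Iterable


def build_eval_items(
    docs: Iterable[dict[str, Any]], max_items: int = 50
) -> list[dict[str, str]]:
    """Turn raw documents into dataset items, skipping empty or duplicate text."""
    items: list[dict[str, str]] = []
    seen: set[str] = set()
    for doc in docs:
        # "content" is an assumed field name, not confirmed by the PR notes.
        text = str(doc.get("content", "")).strip()
        if not text or text in seen:
            continue
        seen.add(text)
        items.append({"context": text, "source_id": str(doc.get("_id", ""))})
        if len(items) >= max_items:
            break
    return items


# Stand-in for documents read from the `rag_data` collection.
sample_docs = [
    {"_id": 1, "content": "RAG combines retrieval with generation."},
    {"_id": 2, "content": "   "},  # empty after stripping, skipped
    {"_id": 3, "content": "RAG combines retrieval with generation."},  # duplicate
    {"_id": 4, "content": "Opik tracks LLM evaluations."},
]
items = build_eval_items(sample_docs)
print(len(items))
```

In the actual script the documents would presumably be read from MongoDB Atlas and the resulting items pushed to an Opik dataset; both steps depend on credentials and are intentionally omitted here.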

#### **4. Evaluation Script**
- Added a script that evaluates the model following the approach in Opik’s LLM evaluation documentation.
  - Reference: [Opik Evaluation Documentation](https://www.comet.com/docs/opik/evaluation/evaluate_your_llm/#1-add-tracking-to-your-llm-application)
- New file:
  - `src/second_brain_online/opik_online_pipline_eval.py`
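
The general shape of such an evaluation (run a task over each dataset item, score the output, aggregate) can be sketched without any external dependency. This is a simplified stand-in, not Opik's API and not the script added in this PR: `token_overlap`, `run_evaluation`, and `echo_pipeline` are hypothetical names, and the overlap metric is a deliberately crude placeholder for a real LLM metric.

```python
from statistics import mean
from typing import Callable


def token_overlap(expected: str, output: str) -> float:
    """Fraction of expected tokens that also appear in the output (0.0 to 1.0)."""
    expected_tokens = set(expected.lower().split())
    if not expected_tokens:
        return 0.0
    output_tokens = set(output.lower().split())
    return len(expected_tokens & output_tokens) / len(expected_tokens)


def run_evaluation(dataset: list[dict[str, str]], task: Callable[[str], str]) -> float:
    """Run the task on every item and return the mean score."""
    scores = [
        token_overlap(item["expected_output"], task(item["input"]))
        for item in dataset
    ]
    return mean(scores) if scores else 0.0


def echo_pipeline(question: str) -> str:
    # Placeholder for the real RAG pipeline call.
    return f"answer about {question}"


dataset = [
    {"input": "vector search", "expected_output": "vector search"},
    {"input": "rank fusion", "expected_output": "reranking"},
]
average = run_evaluation(dataset, echo_pipeline)
print(average)
```

Swapping `echo_pipeline` for the real pipeline entry point and `token_overlap` for an Opik scoring metric would give the same control flow with meaningful scores.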

#### **5. SmolAgents Integration**
- Integrated SmolAgents to route each query to either RAG retrieval or direct LLM inference.
- Updated file:
  - `src/second_brain_online/main.py`
- SmolAgents now controls the workflow, covering both document retrieval and LLM interactions.
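
To make the routing idea concrete, here is a minimal sketch of choosing between a retrieval path and a direct LLM path. It does not use the SmolAgents API, where an LLM-driven agent picks the tool; the keyword rule and every name below (`route`, `handle`, `RETRIEVAL_HINTS`, the two handlers) are hypothetical and only show the control flow.

```python
from typing import Callable


def retrieve_and_answer(query: str) -> str:
    # Placeholder for the RAG path: retrieve documents, then generate.
    return f"[rag] {query}"


def answer_directly(query: str) -> str:
    # Placeholder for the plain LLM inference path.
    return f"[llm] {query}"


# Assumed hints that a query needs stored documents; an agent would decide this itself.
RETRIEVAL_HINTS = ("document", "notes", "source", "according to")


def route(query: str) -> Callable[[str], str]:
    """Pick the handler for a query based on simple keyword hints."""
    lowered = query.lower()
    if any(hint in lowered for hint in RETRIEVAL_HINTS):
        return retrieve_and_answer
    return answer_directly


def handle(query: str) -> str:
    return route(query)(query)


print(handle("What do my notes say about RAG?"))
print(handle("Say hello"))
```

A keyword rule is brittle compared with letting an agent choose a tool, but it keeps the two paths and the decision point easy to see.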

---

### **Modified Files**

#### **Core Configuration**
- Updated `.env.example` and `pyproject.toml` to include Opik-related settings and dependencies.

#### **Pipeline Components**
- Enhanced `src/second_brain_online/main.py` with better documentation and integrated Opik monitoring.
- Added SmolAgents for query routing and document retrieval.

#### **Infrastructure**
- Updated `src/second_brain_online/ONLINE_README.md` to reflect the Opik monitoring and evaluation integration.

#### **Build and Test**
- Incorporated Opik pipeline tracking into the project configuration and evaluation scripts.
- Updated `uv.lock` to match the dependency changes.

---

### **New Files**

#### **1. Dynamic Dataset Creation Script**
- File: `src/second_brain_online/opik_eval_dataset.py`

#### **2. Evaluation Pipeline Script**
- File: `src/second_brain_online/opik_online_pipline_eval.py`

---

### **Bug Fixes**

- **No changes.**

---

### **Summary of Changes**

- **Additions**:
  - Detailed code documentation for `src/second_brain_online/main.py`.
  - Integration of Opik for monitoring and evaluation.
  - Dynamic dataset creation script for evaluation.
  - Evaluation script following Opik’s guidelines.
  - SmolAgents for query routing and workflow control.
- **Modifications**:
  - Updated project dependencies and documentation to support Opik and SmolAgents.

---

### **Outstanding Work To Be Completed**

- **Rank Fusion and Re-Ranking**:
  - Integrate rank fusion and re-ranking techniques to improve RAG retrieval quality.
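
The notes do not say which fusion method is planned. One common option is Reciprocal Rank Fusion (RRF), sketched below under that assumption: each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function name, the sample document ids, and `k = 60` (a commonly used default) are illustrative choices, not part of this PR.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one fused ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers over the same corpus.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF needs only ranks, not raw scores, so it can combine vector and keyword results without normalizing their score scales; a re-ranker could then reorder the fused top results.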

---

Empty file added src/__init__.py
Empty file.