
rLLM

rLLM (relationLLM) focuses on LLM-powered relational data learning, prioritizing: Accuracy, Efficiency, and Economy.

  • Accuracy: the quality of being true, correct, or exact.
  • Efficiency: running time, measured in seconds.
  • Economy: monetary cost, measured in dollars.

Dependencies

  • pytorch 2.1.2
  • scikit-learn 1.4.0
  • llama_cpp_python 0.2.52
  • langchain 0.1.8
  • langchain-community 0.0.21
  • langchain-experimental 0.0.52
  • tiktoken 0.6.0
  • sentence-transformers 2.3.1
  • numpy 1.26.4
  • pandas 2.1.4
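
For convenience, the pinned versions above can be expressed as a requirements file. This is a sketch: the file name and the use of pip are assumptions, and note that "pytorch" installs from PyPI as the torch package.

```text
# requirements.txt (sketch) -- pinned to the versions listed above
torch==2.1.2                     # "pytorch" installs as the torch package
scikit-learn==1.4.0
llama_cpp_python==0.2.52
langchain==0.1.8
langchain-community==0.0.21
langchain-experimental==0.0.52
tiktoken==0.6.0
sentence-transformers==2.3.1
numpy==1.26.4
pandas==2.1.4
```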

LLM Models

  • We recommend the 4-bit quantized Gemma 2b model, which can be downloaded from HuggingFace.
  • In practice, the Gemma 2b model above is too weak to generate accurate responses, so we use the Mistral-7B model from Ollama instead (see the sketch below).
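
A minimal sketch of how these two backends might be wired up with the pinned langchain-community release. The GGUF file path and the Ollama model tag are assumptions; adjust them to your local setup.

```python
# Sketch: two interchangeable LLM backends -- llama.cpp for a local 4-bit
# Gemma 2b GGUF, and Ollama for Mistral-7B. Paths and tags are assumptions.
from langchain_community.llms import LlamaCpp, Ollama

# Option 1: 4-bit quantized Gemma 2b served locally via llama_cpp_python.
gemma = LlamaCpp(
    model_path="models/gemma-2b-it-q4_k_m.gguf",  # hypothetical local path
    n_ctx=2048,
    temperature=0.0,
)

# Option 2: Mistral-7B served by a running Ollama daemon (`ollama pull mistral`).
mistral = Ollama(model="mistral", temperature=0.0)

print(mistral.invoke("In one sentence, what is relational data?"))
```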

LM Model

  • We recommend a light BERT-like model, all-MiniLM-L6-v2, for sentence embeddings; it can be obtained directly from HuggingFace.
  • The embedding functions available when constructing the VectorStore database include GPT4AllEmbedding and HuggingFaceEmbedding (see the sketch below).
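
A minimal sketch of building a small VectorStore with all-MiniLM-L6-v2 embeddings via HuggingFaceEmbeddings (which wraps sentence-transformers). The Chroma backend and the toy corpus are assumptions for illustration; Chroma additionally requires the chromadb package.

```python
# Sketch: embed documents with all-MiniLM-L6-v2 and query a vector store.
# Chroma is one possible backend (needs the chromadb package); the repo may
# use a different store -- this is an assumption for illustration.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embedder = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

texts = [  # toy corpus
    "rLLM studies LLM-powered relational data learning.",
    "all-MiniLM-L6-v2 is a light BERT-like sentence encoder.",
]
store = Chroma.from_texts(texts, embedder)

# Retrieve the chunk most similar to the query.
docs = store.similarity_search("Which model makes sentence embeddings?", k=1)
print(docs[0].page_content)
```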

Retrieval Augmented Generation (RAG)

  • RAG is used to refine the generation process, yielding more accurate predictions and fewer hallucinated results.
  • The implementation code follows the paper Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection.
  • Self-RAG is a RAG strategy that incorporates self-reflection (grading) on retrieved documents and generations. In the paper, a few decisions are made (a sketch of one grader follows this list):
    • Should I retrieve documents?
      • Input: x (question), y (generation)
      • Decides when to retrieve D chunks with R
      • Output: {yes, no, continue}
    • Are the retrieved passages D relevant to the question x?
      • Input: x (question), d (chunk) for d in D
      • d provides useful information to solve x
      • Output: {relevant, irrelevant}
    • Is the LLM generation from each chunk in D supported by that chunk (hallucination check)?
      • Input: x (question), d (chunk), y (generation) for d in D
      • All of the verification-worthy statements in y (generation) are supported by d
      • Output: {fully supported, partially supported, no support}
    • Is the LLM generation from each chunk in D a useful response to x (question)?
      • Input: x (question), y (generation) for d in D
      • y (generation) is a useful response to x (question).
      • Output: {yes, no}
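
A minimal sketch of one of these graders, the document-relevance decision, implemented as a plain prompt over the Mistral backend. The prompt wording and the parse-to-label step are assumptions; Self-RAG proper fine-tunes a critic model to emit reflection tokens rather than prompting an off-the-shelf model.

```python
# Sketch: the "is chunk d relevant to question x" grader as a plain prompt.
# Self-RAG trains a critic to emit reflection tokens; prompting an off-the-
# shelf Mistral-7B like this is a simplification for illustration.
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

llm = Ollama(model="mistral", temperature=0.0)

grade_prompt = PromptTemplate.from_template(
    "You are grading retrieved context.\n"
    "Question: {question}\n"
    "Chunk: {chunk}\n"
    "Does the chunk provide useful information to answer the question? "
    "Answer with exactly one word: relevant or irrelevant."
)

def grade_chunk(question: str, chunk: str) -> str:
    """Return 'relevant' or 'irrelevant' for one (x, d) pair."""
    raw = llm.invoke(grade_prompt.format(question=question, chunk=chunk)).lower()
    # 'relevant' is a substring of 'irrelevant', so check the negative first.
    return "irrelevant" if "irrelevant" in raw else "relevant"

question = "What embedding model does rLLM recommend?"
for d in ["rLLM recommends all-MiniLM-L6-v2.", "Pandas 2.1.4 is required."]:
    print(grade_chunk(question, d), "->", d)
```

The other three decisions (retrieve or not, support grading, usefulness grading) follow the same pattern with different prompts and output label sets.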