
Evaluations #286

Closed
5 of 7 tasks
bjwswang opened this issue Nov 24, 2023 · 5 comments
bjwswang (Collaborator) commented Nov 24, 2023

Overall workflow

TO BE DEFINED

Evaluation Types

RAG Evaluation

@Lanture1064 @bjwswang

Our current RAG solution flow:

  1. Dataset/VersionedDataset provides the source files
  2. Data processing converts the source files into a QA CSV using QAGenerationChain
  3. Knowledgebase generates embeddings from the QA CSV file and stores them in the vector store
  4. RetrievalQAChain performs a similarity search against the knowledgebase
  5. The LLM chats with the similarity-search results as context
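
A minimal Python sketch of steps 2-5, assuming LangChain's QAGenerationChain/RetrievalQA with OpenAI-backed models and a Chroma vector store; the concrete LLM, embedding model, and vector store here are illustrative assumptions, not the platform's actual components:

```python
from langchain.chains import QAGenerationChain, RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Stand-in for step 1: a source file provided by Dataset/VersionedDataset.
source_text = open("source.txt").read()

llm = ChatOpenAI(temperature=0)

# Step 2: generate QA pairs from the source document.
qa_pairs = QAGenerationChain.from_llm(llm).run(source_text)  # [{"question": ..., "answer": ...}, ...]

# Step 3: embed the QA pairs into a vector store (the knowledgebase).
vectorstore = Chroma.from_texts(
    texts=[p["question"] + "\n" + p["answer"] for p in qa_pairs],
    embedding=OpenAIEmbeddings(),
)

# Steps 4-5: similarity search against the knowledgebase, then chat with the hits as context.
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
answer = qa_chain.run("What does the source document say about X?")
```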

Based on our research, we decided to use the ragas evaluation framework: https://github.com/explodinggradients/ragas
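
As a rough sketch of what a ragas run could look like, assuming the ragas evaluate() API with a Hugging Face datasets.Dataset input (column names and metric choices here are illustrative, not a fixed design):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One evaluation record per question answered by the RAG pipeline above.
eval_ds = Dataset.from_dict({
    "question": ["What does the source document say about X?"],
    "answer": ["<the RAG pipeline's answer>"],
    "contexts": [["<retrieved chunk 1>", "<retrieved chunk 2>"]],
})

# Metrics such as context_recall additionally require ground-truth answers.
result = evaluate(eval_ds, metrics=[faithfulness, answer_relevancy])
print(result)  # e.g. {'faithfulness': ..., 'answer_relevancy': ...}
```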

Subtasks:

Evaluation Lifecycle management

@0xff-dev @bjwswang

For definitions:

For task runner:

For apiserver:

Overall Workflow

[Diagram: evaluation_jobflow (draw.io)]

nkwangleiGIT (Contributor):

@Lanture1064 please do further investigation.

nkwangleiGIT (Contributor):

Here is another project related to evaluation:
https://github.com/promptfoo/promptfoo

bjwswang added this to the v0.2.0 milestone on Dec 13, 2023
nkwangleiGIT (Contributor) commented Dec 23, 2023

Some other thoughts:

  • Support evaluating a prompt across different LLMs and generating a test report
    • RAG evaluation, RAG question generation
    • Automatically generate questions, analyze question quality, and filter out low-similarity questions
    • Evaluation metrics: retrieval evaluation - Hit Rate, MRR; answer evaluation - fairness, relevance, consistency, etc. (see the Hit Rate/MRR sketch below)
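
For the retrieval-side metrics, a plain-Python sketch of Hit Rate and MRR, assuming each query has a single known relevant document id and an ordered list of retrieved ids:

```python
def hit_rate(results, k=5):
    """Fraction of queries whose relevant document id appears in the top-k retrieved ids."""
    hits = sum(1 for relevant_id, retrieved in results if relevant_id in retrieved[:k])
    return hits / len(results)

def mrr(results):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit (0 if not retrieved)."""
    total = 0.0
    for relevant_id, retrieved in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id == relevant_id:
                total += 1.0 / rank
                break
    return total / len(results)

# results is a list of (relevant_id, retrieved_ids) pairs, e.g.:
results = [("doc-1", ["doc-3", "doc-1", "doc-7"]), ("doc-2", ["doc-2", "doc-5", "doc-9"])]
print(hit_rate(results, k=3), mrr(results))  # 1.0 0.75
```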

bjwswang changed the title from "RAG Evaluations" to "Evaluations" on Jan 4, 2024
bjwswang (Collaborator, Author) commented Jan 4, 2024

Some other thoughts:

  • Support evaluating a prompt across different LLMs and generating a test report

    • RAG evaluation, RAG question generation
    • Automatically generate questions, analyze question quality, and filter out low-similarity questions
    • Evaluation metrics: retrieval evaluation - Hit Rate, MRR; answer evaluation - fairness, relevance, consistency, etc.
  1. RAG question generation can simply reuse the capability already provided by the data processing service (see the filtering sketch below)
  2. The metrics in ragas are fairly complete. A comparison of several frameworks is available at https://shimo.im/docs/1d3aMzQ1wXHnwV3g
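
One way to implement the "filter out low-similarity questions" step mentioned above, as a sketch that assumes OpenAI embeddings and an illustrative 0.75 threshold:

```python
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

def filter_questions(questions, source_chunks, threshold=0.75):
    """Keep only generated questions whose best cosine similarity against
    any source chunk reaches the threshold; drop low-relevance questions."""
    emb = OpenAIEmbeddings()
    q_vecs = np.array(emb.embed_documents(questions))
    c_vecs = np.array(emb.embed_documents(source_chunks))
    # Normalize rows so the dot product below is cosine similarity.
    q_vecs /= np.linalg.norm(q_vecs, axis=1, keepdims=True)
    c_vecs /= np.linalg.norm(c_vecs, axis=1, keepdims=True)
    best = (q_vecs @ c_vecs.T).max(axis=1)   # best-matching chunk per question
    return [q for q, s in zip(questions, best) if s >= threshold]
```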

bjwswang (Collaborator, Author):

This story can be closed. Other features, such as Tekton CI support and additional LLMs, can be tracked in individual issues.
