Feature/enhance harness report to include detailed score counts and grouped results #1132

chakravarthik27
Collaborator

This pull request introduces several changes to the langtest package, focusing on enhancing the evaluation framework and improving code structure. The key changes include the addition of the EvalTemplate class, modifications to the is_pass_llm_eval function, and updates to the model_report function.

Enhancements to Evaluation Framework:

  • Addition of EvalTemplate Class: Introduced the EvalTemplate class, whose build_prompt method constructs a prompt for grading a student answer against a given rubric. (langtest/metrics/llm_eval.py)

  • Updates to is_pass_llm_eval Function: is_pass_llm_eval now accepts an eval_template parameter, so callers can customize the evaluation template used for grading. (langtest/utils/custom_types/helpers.py)
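
A minimal sketch of how a template class with a build_prompt method could feed a pass/fail check that takes an eval_template argument. This is not the langtest implementation: the template wording, the rubric field, the (query, result, answer) parameters, the grade callable standing in for the LLM call, and the CORRECT/INCORRECT verdict convention are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalTemplate:
    """Hypothetical stand-in for langtest's EvalTemplate: builds a grading prompt."""

    # Assumed default wording; the real template text may differ.
    template: str = (
        "You are a teacher grading a quiz.\n"
        "Rubric: {rubric}\n"
        "QUESTION: {query}\n"
        "STUDENT ANSWER: {result}\n"
        "TRUE ANSWER: {answer}\n"
        "Reply with CORRECT or INCORRECT."
    )
    rubric: str = "Grade the student answer on factual accuracy only."

    def build_prompt(self, query: str, result: str, answer: str) -> str:
        # Fill the placeholders to produce the final grading prompt.
        return self.template.format(
            rubric=self.rubric, query=query, result=result, answer=answer
        )


def is_pass_llm_eval(
    grade: Callable[[str], str],
    query: str,
    result: str,
    answer: str,
    eval_template: Optional[EvalTemplate] = None,
) -> bool:
    """Hypothetical signature: `grade` sends a prompt to an LLM and returns its reply."""
    # Fall back to a default template when the caller supplies none.
    template = eval_template or EvalTemplate()
    prompt = template.build_prompt(query=query, result=result, answer=answer)
    verdict = grade(prompt).strip().upper()
    # Only an explicit CORRECT verdict passes; "INCORRECT" does not start with "CORRECT".
    return verdict.startswith("CORRECT")


# Usage with a stub grader in place of a real LLM call.
def stub_grader(prompt: str) -> str:
    student = prompt.split("STUDENT ANSWER: ")[1].splitlines()[0]
    truth = prompt.split("TRUE ANSWER: ")[1].splitlines()[0]
    return "CORRECT" if student.strip().lower() == truth.strip().lower() else "INCORRECT"


strict = EvalTemplate(rubric="Accept only exact matches, ignoring case.")
print(is_pass_llm_eval(stub_grader, "Capital of France?", "paris", "Paris", eval_template=strict))
```

Passing the template as an optional argument keeps existing call sites working while letting a caller swap in a stricter or domain-specific rubric.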

Code Structure and Typing Improvements:

  • Typing Enhancements: Updated type annotations to include Mapping and Union for better type safety and clarity. (langtest/metrics/llm_eval.py, langtest/utils/custom_types/helpers.py)

  • Changes in BaseQASample Class: Modified the config attribute in the BaseQASample class to use a Mapping type for better structure and clarity. (langtest/utils/custom_types/sample.py)
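
The annotations described above can be illustrated with a small sketch: a Mapping for a config that consumers should only read, and a Union for a parameter that accepts several input shapes. QASampleSketch, its fields, resolve_template, and the threshold config key are hypothetical names chosen for illustration; the real BaseQASample has more fields, and this PR does not say which types the Union combines.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping, Optional, Union


@dataclass
class QASampleSketch:
    """Hypothetical, trimmed-down QA sample; not the real BaseQASample."""

    original_question: str
    expected_results: Optional[str] = None
    # Mapping (rather than dict) signals that consumers should only read the config.
    config: Mapping[str, Any] = field(default_factory=dict)


def resolve_template(eval_template: Union[str, Mapping[str, str], None]) -> str:
    """Accept a raw template string, a mapping holding one, or None (hypothetical helper)."""
    if eval_template is None:
        return "Default grading template"
    if isinstance(eval_template, str):
        return eval_template
    # Any Mapping works here, e.g. a dict or a read-only MappingProxyType.
    return eval_template["template"]


# Usage: a read-only config satisfies the Mapping annotation just as a dict would.
sample = QASampleSketch("What is 2 + 2?", "4", config=MappingProxyType({"threshold": 0.5}))
print(sample.config["threshold"], resolve_template({"template": "Grade: {answer}"}))
```

Annotating with Mapping instead of dict lets callers pass immutable views without type-checker complaints, while a static checker will flag code that tries to mutate the config.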

Reporting Improvements:

  • Enhanced model_report Function: Improved the model_report function to handle multiple keys in the summary dictionary, calculate pass rates more accurately, and rearrange the columns in the final report for better readability. (langtest/utils/report_utils.py)
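
As a rough illustration of grouped score counts, the sketch below aggregates per-sample outcomes into pass and fail counts plus a pass rate per (category, test type) pair, compares each rate to a minimum, and emits rows in a fixed column order. The input shape (dicts with category, test_type, and passed keys), the column names, and the assumed 0.5 default minimum are hypothetical; the actual model_report operates on langtest's own summary structure, which this PR does not describe.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical column order for the final report.
COLUMNS = ["category", "test_type", "fail_count", "pass_count", "pass_rate", "minimum_pass_rate", "pass"]


def model_report_sketch(
    results: Iterable[Mapping[str, object]],
    min_pass_rates: Mapping[str, float],
    default_min_pass_rate: float = 0.5,  # assumed default, not langtest's value
) -> List[Dict[str, object]]:
    # Group by (category, test_type) and count outcomes in each group.
    counts: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: {"pass": 0, "fail": 0})
    for row in results:
        key = (str(row["category"]), str(row["test_type"]))
        counts[key]["pass" if row["passed"] else "fail"] += 1

    report = []
    for (category, test_type), c in sorted(counts.items()):
        total = c["pass"] + c["fail"]  # always >= 1: each group saw at least one row
        pass_rate = c["pass"] / total
        minimum = min_pass_rates.get(test_type, default_min_pass_rate)
        row = {
            "test_type": test_type,
            "category": category,
            "pass_count": c["pass"],
            "fail_count": c["fail"],
            "pass_rate": round(pass_rate, 2),
            "minimum_pass_rate": minimum,
            "pass": pass_rate >= minimum,
        }
        # Re-key the row in the fixed column order for readability.
        report.append({col: row[col] for col in COLUMNS})
    return report


# Usage with illustrative results for two robustness tests.
results = [
    {"category": "robustness", "test_type": "uppercase", "passed": True},
    {"category": "robustness", "test_type": "uppercase", "passed": False},
    {"category": "robustness", "test_type": "add_typo", "passed": True},
]
for line in model_report_sketch(results, {"uppercase": 0.6}):
    print(line)
```

Keeping raw counts next to the rate makes small groups easy to spot: a 100% pass rate from one sample reads very differently from one over hundreds.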

These changes collectively enhance the flexibility, readability, and maintainability of the codebase.

chakravarthik27 self-assigned this Oct 26, 2024
chakravarthik27 merged commit 70a7d3a into release/2.5.0 Nov 18, 2024
3 checks passed
Linked issue: Enhance Harness Report to Include Detailed Score Counts and Grouped Results