Added parallel processing for metric evaluations + Progress Bar (Error handling done properly) #107

abhishekchauhan15 · 2024-11-19T11:25:37Z

Pull Request Template

Description

This pull request adds parallel evaluation support, so that multiple metrics in metric_list can be evaluated simultaneously. The max_workers and max_evaluations_per_thread parameters distribute metric evaluations across multiple threads for concurrent processing. For example, when evaluating 5 metrics with max_workers=2, the system splits the work into smaller chunks (2-2-1), and two workers process chunks at the same time.

This parallelization reduces total evaluation time, especially for large metric sets. Evaluating 100 metrics sequentially could take 100 units of time; with 2 workers, the same run could theoretically finish in roughly 50 units. The max_evaluations_per_thread parameter caps the workload per thread, which keeps any single thread from becoming a bottleneck.
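The chunk-and-dispatch idea described above can be sketched with the standard library's `ThreadPoolExecutor`. This is a minimal illustration, not the code in this PR: `evaluate_metric`, `evaluate_chunk`, and `evaluate_parallel` are hypothetical names, and treating max_evaluations_per_thread as the chunk size is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_metric(name):
    # Hypothetical stand-in for a real metric evaluation.
    return {"metric": name, "score": len(name)}


def evaluate_chunk(chunk):
    # Each worker thread evaluates one chunk of metrics in order.
    return [evaluate_metric(name) for name in chunk]


def evaluate_parallel(metric_list, max_workers=2, max_evaluations_per_thread=2):
    if not metric_list:
        raise ValueError("metric_list must not be empty")
    step = max_evaluations_per_thread  # assumed to act as the chunk size
    chunks = [metric_list[i:i + step] for i in range(0, len(metric_list), step)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        chunk_results = list(pool.map(evaluate_chunk, chunks))
    # Flatten the per-chunk lists back into one list of results.
    return [result for chunk in chunk_results for result in chunk]


results = evaluate_parallel(["m1", "m2", "m3", "m4", "m5"])
```

Threads help most when each evaluation spends its time waiting on I/O, such as network or database calls; CPU-bound work in CPython is limited by the global interpreter lock, so the theoretical halving of runtime may not be reached.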

Related Issue

1.4. Add Batch Evaluation Support

Enable running multiple evaluations simultaneously.
Include progress tracking and error handling.
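One way to combine progress tracking with per-metric error handling is to collect futures as they complete. The sketch below is an assumption about how this could look, not the implementation in this PR: `run_with_progress`, `fake_evaluate`, and the `on_progress` callback are hypothetical, and a progress bar library could be driven from the same callback.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_progress(metric_names, evaluate, max_workers=2, on_progress=None):
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the metric it evaluates.
        futures = {pool.submit(evaluate, name): name for name in metric_names}
        for done, future in enumerate(as_completed(futures), start=1):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # A failing metric is recorded; the others keep running.
                errors[name] = repr(exc)
            if on_progress is not None:
                on_progress(done, len(futures))
    return results, errors


def fake_evaluate(name):
    # Hypothetical evaluator: one metric name is treated as unknown.
    if name == "unknown_metric":
        raise KeyError(name)
    return 1.0


progress = []
results, errors = run_with_progress(
    ["goal_fulfillment_rate", "unknown_metric"],
    fake_evaluate,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

Catching the exception at `future.result()` keeps one bad metric from aborting the whole run, and the callback receives a completed count regardless of which metric finishes first.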

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Evaluation Tests Documentation

This document outlines the details of the evaluation tests implemented to ensure proper functionality and robustness.

Test Descriptions

`test_evaluation_initialization`: Verifies Proper Setup

Purpose: Ensures the evaluation instance is correctly initialized with the required attributes:
- Project name
- Trace ID
- Engine
- Session objects

`test_chunk_metrics`: Tests Metric Chunking

Purpose: Validates that metrics are properly divided into smaller chunks for parallel processing.
Outcome: Ensures balanced workload distribution across worker threads.
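A chunking check of this kind might look like the following sketch. The helper name `chunk_metrics` is taken from the test name, but its signature and behavior here are assumptions rather than the PR's actual code.

```python
def chunk_metrics(metric_list, chunk_size):
    # Hypothetical helper: split a list into consecutive chunks of at most chunk_size.
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        metric_list[i:i + chunk_size]
        for i in range(0, len(metric_list), chunk_size)
    ]


metrics = ["m1", "m2", "m3", "m4", "m5"]
chunks = chunk_metrics(metrics, 2)

# Five metrics in chunks of two give the 2-2-1 split.
assert [len(chunk) for chunk in chunks] == [2, 2, 1]
# No metric is lost or reordered by chunking.
assert [m for chunk in chunks for m in chunk] == metrics
```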

`test_evaluate_empty_metric_list`: Validates Empty List Handling

Purpose: Confirms that the system raises a ValueError when attempting to evaluate an empty list of metrics.

`test_evaluate_with_invalid_metric`: Tests Invalid Metric Handling

Purpose: Ensures the system handles non-existent metrics gracefully without crashing the evaluation process.

`test_evaluate_with_valid_metrics`: Tests Valid Metric Evaluation

Purpose: Verifies that valid metrics (e.g., goal_decomposition_efficiency, goal_fulfillment_rate) are properly evaluated.
Outcome: Ensures the evaluation results are stored successfully.

`test_get_results`: Verifies Result Retrieval

Purpose: Confirms that evaluation results are:
- Successfully stored in the database.
- Retrieved with the correct format and content.

`test_get_trace_data_invalid_id`: Tests Invalid Trace ID Handling

Purpose: Validates that the system properly handles non-existent trace IDs and raises appropriate errors.

`test_parallel_processing_configuration`: Tests Parallel Processing

Purpose: Ensures that parallel processing settings (e.g., workers and thread limits) are correctly applied.
Outcome: Confirms that the system handles parallel metric evaluation effectively.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules

Additional Context

Progress Bar

Parallel Processing

Impact on Roadmap

Optimizes performance
Prevents system crashes

abhishekchauhan15 added 4 commits November 18, 2024 20:26

parallel processing added in evaluation module

d103626

added tests + updated the docs

32cb7fe

Include progress tracking and error handling

446d94b

max_evaluations_per_thread added + updated docs with example

58897b1

abhishekchauhan15 changed the title ~~Added parallel processing + Progress Bar (Error handling done properly)~~ Added parallel processing for metric evaluations + Progress Bar (Error handling done properly) Nov 19, 2024

Delete tests/test_agentneo.py

8c88119

vijayc9 merged commit d011c6f into raga-ai-hub:hackathon Jan 16, 2025
