Optimizing Feedback Functions
This release adds important changes for improving the alignment of your LLM-judge evals to human evaluations.
Global Improvement of Groundedness Feedback
The first is a global improvement of the groundedness feedback function (benchmarks and methods forthcoming). We invite all users to submit feedback (positive or negative) on the effectiveness of the new groundedness function via GitHub Issues or Discussions.
You can view the new groundedness criteria in the diff for #1710.
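For intuition, groundedness scores how well each statement in a response is supported by the source evidence. A toy, overlap-based sketch of that idea (illustrative only; TruLens computes groundedness with an LLM judge, not word overlap):

```python
# Conceptual sketch only: TruLens uses an LLM judge for groundedness,
# not this toy word-overlap proxy.
def naive_groundedness(source: str, response: str) -> float:
    """Score each statement in `response` by word overlap with `source`."""
    statements = [s.strip() for s in response.split(".") if s.strip()]
    if not statements:
        return 0.0
    source_words = set(source.lower().split())
    scores = []
    for statement in statements:
        words = set(statement.lower().split())
        scores.append(len(words & source_words) / max(len(words), 1))
    # Average per-statement support, in [0, 1].
    return sum(scores) / len(statements)
```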
New levers for aligning feedback functions
The second change adds new, easy-to-use levers for changing the behavior of feedback functions using few-shot examples and custom criteria. Early customers have seen meaningful benefit from aligning their feedback functions to their collected expert evaluations using these levers.
Adding custom criteria to a feedback function
```python
# Any TruLens LLM provider can be used here; OpenAI is shown as an example.
from trulens.providers.openai import OpenAI

provider = OpenAI()

custom_criteria = """
A positive sentiment should be expressed with an extremely encouraging and enthusiastic tone.
"""

provider.sentiment(
    "When you're ready to start your business, you'll be amazed at how much you can achieve!",
    criteria=custom_criteria,
)
```
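Conceptually, custom criteria refine the instructions given to the judge. A minimal sketch of the mechanism (the default prompt text and helper below are illustrative assumptions, not TruLens internals):

```python
# Illustrative sketch: how custom criteria might be folded into a judge
# prompt. The default instructions here are a placeholder, not TruLens's.
DEFAULT_SENTIMENT_INSTRUCTIONS = "Rate the sentiment of the text from 0 to 10."

def build_judge_prompt(text, criteria=None):
    """Append optional custom criteria to the judge's instructions."""
    instructions = DEFAULT_SENTIMENT_INSTRUCTIONS
    if criteria:
        instructions += "\nAdditional criteria: " + criteria.strip()
    return instructions + "\n\nTEXT: " + text
```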
Adding few-shot examples to guide feedback functions
```python
from trulens.feedback.v2 import feedback

# Each few-shot example pairs the feedback function's inputs with an
# expert-assigned score for the judge to imitate.
fewshot_relevance_examples_list = [
    (
        {
            "query": "What are the key considerations when starting a small business?",
            "response": "You should focus on building relationships with mentors and industry leaders. Networking can provide insights, open doors to opportunities, and help you avoid common pitfalls.",
        },
        3,
    ),
]

# `provider` is any instantiated TruLens LLM provider (e.g. OpenAI).
provider.relevance(
    "What are the key considerations when starting a small business?",
    "Find a mentor who can guide you through the early stages and help you navigate common challenges.",
    examples=fewshot_relevance_examples_list,
)
```
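Under the hood, few-shot examples guide the judge by being rendered into its prompt alongside the expert scores. A simplified sketch of that rendering (the formatting and names here are assumptions, not the actual TruLens template):

```python
# Illustrative sketch: rendering (inputs, expert_score) pairs into a
# judge prompt block. Formatting is assumed, not TruLens's template.
def render_fewshot(examples):
    """Render few-shot (inputs, expert_score) pairs as prompt text."""
    lines = []
    for inputs, expert_score in examples:
        for key, value in inputs.items():
            lines.append(key.upper() + ": " + value)
        lines.append("EXPERT SCORE: " + str(expert_score))
        lines.append("")  # blank line between examples
    return "\n".join(lines).strip()
```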
What's Changed
- Feedback customization (including few-shot examples) by @sfc-gh-jreini in #1674
- Custom criteria for feedback by @sfc-gh-jreini in #1705
- Update groundedness criteria (with a more optimized prompt) by @sfc-gh-dhuang in #1710
- Allow existing tables to be used in ground truth datasets by @sfc-gh-dhuang in #1698
Bug Fixes
- Allow passthrough of feedback parameters (including temperature and groundedness configs) in the `Feedback` class by @sfc-gh-jreini in #1674
- Remove / retire SQL instrumentation in Cortex Endpoint by @sfc-gh-dhuang in #1715
- Pin Poetry to < 2.0.0 by @sfc-gh-jreini in #1709
- Update docs to use postgres + psycopg in order to avoid known issues with psycopg2 by @sfc-gh-gtokernliang in #1701
- Update prpr example notebook to reflect latest Cortex provider API by @sfc-gh-dhuang in #1712
Preparations for Open Telemetry compatibility
- Introduce Event table for ORM to prepare for OTEL traces by @sfc-gh-gtokernliang in #1692
- Prototype OTEL exporter by @sfc-gh-gtokernliang in #1694
- Prototype @Instrument with OTEL by @sfc-gh-gtokernliang in #1693
- Move `main_input`, `main_output`, and `_extract_content` out of app.py by @sfc-gh-gtokernliang in #1706
- Move span-related validation + setting logic out of instrument.py by @sfc-gh-gtokernliang in #1707
Full Changelog: trulens-1.2.11...trulens-1.3.0