Commit
* only require keys if they are being used
* add missing file
* refactor fixes
* run_dashboard() in tru
* minor
* small fixes
* null handling for cost and tokens
* help run_dashboard get leaderboard file
* made endpoints singletons and added checking of feedback parameters
* format
* actually format
* make config and logo available in pkg dist
* quickstart
* Update README.md (×2)
* add comments to quickstart notebook
* update readme
* docs and bugfixes on keys
* que hora es?
* mv folder structures
* fix imports
* more namechange
* rename notebook
* manifest approach for noncode files
* take out local write
* doc fixes
* mv docs to from eval_chain
* change docs to eval
* add feedbacks
* add colab notebooks
* add feedback functions docs
* documentation plus format
* huggingface docstring
* cleanup old feedback functions
* add model agreement
* set to our expected release functions
* remove pkg resource stream
* parallelized some things, moved utils to a new utils file, added the db logging and feedback eval back to the truchain class, removed record return from truchain but you can use call_with_record for old behaviour
* feedback serialization and out of chain evaluation
* fixes
* moved feedback evaluation back to where the chains are running
* singleton bugfix
* remove example app
* small fix to empty db
* remove print
* cleaning up public interfaces and quickstart
* write streamlit config in run_dashboard
* remove type var for older python
* work
* updated quickstart
* fixes
* minor
* Update tru_db.py: add back chain_id to get_records
* Fix typing issue
* ux stuff
* millify
* remove commented out code
* remove obsolete generic dashboards
* work
* clear
* ux updates
* small fixes
* ux (×2)
* work
* small fixes
* start/stop dashboard
* misc
* threading fixes
* Update README.md (×6)
* Create README.md
* Update welcome.md
* fix welcome symlink
* fix links
* update quickstart md
* fix readme and quickstarts
* fix record call
* fix truchain and tc naming confusion
* cleanup and bugfix
* remove colab
* remove confusing documentation
* chain docs
* Add files via upload (×2)
* image paths
* README updates
* Remove ability to provide own database for now
* remove colab
* update image paths
* remove .env.example
* quickstart
* Updated tru documentation
* change hora es
* few more docs
* versioning

Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: piotrm <[email protected]>
Co-authored-by: Piotr Mardziel <[email protected]>
Co-authored-by: Shayak Sen <[email protected]>
Commit 56c17a2 (1 parent: 655db8b). Showing 168 changed files with 7,387 additions and 112 deletions.
```
SHELL := /bin/bash
CONDA_ENV := demo3
CONDA := source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate ; conda activate $(CONDA_ENV)

format:
	$(CONDA); bash format.sh

lab:
	$(CONDA); jupyter lab --ip=0.0.0.0 --no-browser --ServerApp.token=deadbeef
```
# Welcome to TruLens!

![](https://www.trulens.org/Assets/image/Neural_Network_Explainability.png)

TruLens provides a set of tools for developing and monitoring neural nets, including large language models. This includes both tools for evaluation of LLMs and LLM-based applications with TruLens-Eval and deep learning explainability with TruLens-Explain. TruLens-Eval and TruLens-Explain are housed in separate packages and can be used independently.

**TruLens-Eval** contains instrumentation and evaluation tools for large language model (LLM) based applications. It supports the iterative development and monitoring of a wide range of LLM applications by wrapping your application to log key metadata across the entire chain (or off chain if your project does not use chains) on your local machine. Importantly, it also gives you the tools you need to evaluate the quality of your LLM-based applications.

For more information, see [TruLens-Eval Documentation](trulens_eval/install.md).

**TruLens-Explain** is a cross-framework library for deep learning explainability. It provides a uniform abstraction layer over TensorFlow, PyTorch, and Keras and allows input and internal explanations.

For more information, see [TruLens-Explain Documentation](trulens_explain/install.md).
(Several files renamed without changes; several binary files, such as images, could not be displayed.)
# Tru

::: trulens_eval.trulens_eval.tru.Tru
# Feedback Functions

::: trulens_eval.trulens_eval.tru_feedback
# Tru Chain

::: trulens_eval.trulens_eval.tru_chain
## Getting access to TruLens

These installation instructions assume that you have conda installed and added to your path.

1. Create a virtual environment (or modify an existing one).
```
conda create -n "<my_name>" python=3  # Skip if using existing environment.
conda activate <my_name>
```

2. [Pip installation] Install the trulens-eval pip package.
```
pip install trulens-eval
```

3. [Local installation] If you would like to develop or modify TruLens, you can download the source code by cloning the trulens repo.
```
git clone https://github.com/truera/trulens.git
```

4. [Local installation] Install the trulens repo.
```
cd trulens/trulens_eval
pip install -e .
```
## Quickstart

### Playground

To quickly play around with the TruLens Eval library, download this notebook: [trulens_eval_quickstart.ipynb](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval_quickstart.ipynb).

### Install & Use

Install trulens-eval from PyPI.

```
pip install trulens-eval
```

Import from LangChain to build the app, and from TruLens to log and get feedback on the chain.

```python
from IPython.display import JSON

# Imports from langchain to build the app.
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate

# Imports from trulens to log and get feedback on the chain.
from trulens_eval import tru_chain
from trulens_eval.tru import Tru

tru = Tru()
```

### API Keys

Our example chat app and feedback functions call external APIs such as OpenAI or Huggingface. You can add keys by setting the environment variables.

#### In Python

```python
import os
os.environ["OPENAI_API_KEY"] = "..."
```

#### In Terminal

```bash
export OPENAI_API_KEY="..."
```
### Create a basic LLM chain to evaluate

This example uses LangChain and OpenAI, but the same process can be followed with any framework and model provider. Once you've created your chain, just call TruChain to wrap it. Doing so allows you to capture the chain metadata for logging.

```python
full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Provide a helpful response with relevant background information for the following: {prompt}",
        input_variables=["prompt"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.9)

chain = LLMChain(llm=chat, prompt=chat_prompt_template)

# Wrap with TruChain to instrument your chain.
truchain = tru_chain.TruChain(chain)
```

### Set up logging and instrumentation

Make the first call to your LLM application. The instrumented chain can operate like the original but can also produce a log or "record" of the chain execution.

```python
prompt_input = 'que hora es?'
gpt3_response, record = truchain(prompt_input)
```

We can log the records, but first we need to log the chain itself.

```python
tru.add_chain(chain_json=truchain.json)
```

Now we can log the record:

```python
tru.add_record(
    prompt=prompt_input,             # prompt input
    response=gpt3_response['text'],  # LLM response
    record_json=record               # record is returned by the TruChain wrapper
)
```
## Evaluate Quality

Following the request to your app, you can then evaluate LLM quality using feedback functions. This is completed in a sequential call to minimize latency for your application, and evaluations will also be logged to your local machine.

To get feedback on the quality of your LLM, you can use any of the provided feedback functions or add your own.

To assess your LLM quality, you can provide the feedback functions to `tru.run_feedback_functions()` in a list, as shown below. Here we'll just add a simple language match checker.

```python
from trulens_eval.tru_feedback import Feedback, Huggingface

os.environ["HUGGINGFACE_API_KEY"] = "..."

# Initialize the Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on(
    text1="prompt", text2="response"
)

# Run feedback functions. This might take a moment if the public API needs to
# load the language model used by the feedback function.
feedback_result = f_lang_match.run_on_record(
    chain_json=truchain.json, record_json=record
)

JSON(feedback_result)

# We can also run a collection of feedback functions.
feedback_results = tru.run_feedback_functions(
    record_json=record,
    feedback_functions=[f_lang_match]
)
display(feedback_results)
```

After capturing feedback, you can then log it to your local database:

```python
tru.add_feedback(feedback_results)
```
### Automatic logging

The above logging and feedback function evaluation steps can be done automatically by TruChain.

```python
truchain = tru_chain.TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tru=tru
)
# Note: providing `tru` causes the above constructor to log the wrapped chain in the database specified.
# Note: any `feedbacks` specified here will be evaluated and logged whenever the chain is used.

truchain("This will be automatically logged.")
```

### Out-of-band Feedback evaluation

In the above example, the feedback function evaluation is done in the same process as the chain evaluation. The alternative approach is to use the provided persistent evaluator started via `tru.start_evaluator`. Then specify the `feedback_mode` for `TruChain` as `"deferred"` to let the evaluator handle the feedback functions.

For demonstration purposes, we start the evaluator here, but it can be started in another process.

```python
truchain = tru_chain.TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tru=tru,
    feedback_mode="deferred"
)

tru.start_evaluator()
truchain("This will be logged by deferred evaluator.")
tru.stop_evaluator()
```
### Run the dashboard!

```python
tru.run_dashboard()   # open a Streamlit app to explore
# tru.stop_dashboard()  # stop if needed
```

### Chain Leaderboard: Quickly identify quality issues

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics, including cost and average feedback value, across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).

![](https://www.trulens.org/Assets/image/Leaderboard.png)

To dive deeper on a particular chain, click "Select Chain".

### Understand chain performance with Evaluations

To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses, and they can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance, and more.

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![](https://www.trulens.org/Assets/image/Leaderboard.png)

Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain.

![](https://www.trulens.org/Assets/image/Leaderboard.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.
### Out-of-the-box Feedback Functions

See: <https://www.trulens.org/trulens_eval/api/tru_feedback/>

#### Relevance

This evaluates the *relevance* of the LLM response to the given text by LLM prompting.

Relevance is currently only available with the OpenAI ChatCompletion API.

#### Sentiment

This evaluates the *positive sentiment* of either the prompt or the response.

Sentiment is currently available to use with OpenAI, HuggingFace, or Cohere as the model provider.

* The OpenAI sentiment feedback function prompts a Chat Completion model to rate the sentiment from 1 to 10, and then scales the response down to 0-1.
* The HuggingFace sentiment feedback function returns a raw score from 0 to 1.
* The Cohere sentiment feedback function uses the classification endpoint and a small set of examples stored in `feedback_prompts.py` to return either a 0 or a 1.
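The 1-10 to 0-1 rescaling used by the OpenAI sentiment function above is a simple linear map. As a minimal sketch (the helper name `rescale_rating` is illustrative, not part of the TruLens API):

```python
def rescale_rating(rating: float, low: float = 1.0, high: float = 10.0) -> float:
    """Linearly map a rating on [low, high] onto the common 0-1 feedback scale."""
    return (rating - low) / (high - low)

# A rating of 1 maps to 0.0 (worst) and 10 maps to 1.0 (best).
print(rescale_rating(1.0))   # 0.0
print(rescale_rating(10.0))  # 1.0
print(rescale_rating(5.5))   # 0.5
```

Keeping every provider on the same 0-1 scale is what lets heterogeneous feedback functions be averaged and compared on the leaderboard.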
#### Model Agreement

Model agreement uses OpenAI to attempt an honest answer to your prompt, using system prompts for correctness, and then evaluates the agreement of your LLM response with this model's answer on a scale from 1 to 10. The agreement with each honest bot is then averaged and scaled from 0 to 1.

#### Language Match

This evaluates whether the language of the prompt and the response match.

Language match is currently only available to use with HuggingFace as the model provider. This feedback function returns a score in the range from 0 to 1, where 1 indicates a match and 0 indicates a mismatch.
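Conceptually, language match reduces to comparing detected language codes for the two texts. A toy sketch of that idea (the `detect_language` heuristic below is a hypothetical stand-in for HuggingFace's language classifier, purely for illustration):

```python
def detect_language(text: str) -> str:
    """Hypothetical stand-in for a real language classifier;
    a crude keyword heuristic used purely for illustration."""
    spanish_markers = {"que", "hora", "es", "hola", "gracias"}
    words = {w.strip("?,!.") for w in text.lower().split()}
    return "es" if words & spanish_markers else "en"

def language_match(prompt: str, response: str) -> float:
    """Return 1.0 if the detected languages agree, else 0.0."""
    return 1.0 if detect_language(prompt) == detect_language(response) else 0.0

print(language_match("que hora es?", "hola, son las tres"))   # 1.0
print(language_match("que hora es?", "It is three o'clock"))  # 0.0
```

The real feedback function replaces the heuristic with a model-backed classifier, but the match/mismatch comparison is the same.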
#### Toxicity

This evaluates the toxicity of the prompt or response.

Toxicity is currently only available to be used with HuggingFace, and uses a classification endpoint to return a score from 0 to 1. The feedback function is negated as `not_toxicity`, and returns a 1 if not toxic and a 0 if toxic.

#### Moderation

The OpenAI Moderation API is made available for use as feedback functions. This includes hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. Each is negated (e.g. `not_hate`) so that a 0 indicates that the moderation rule is violated. These feedback functions return a score in the range 0 to 1.
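The negation pattern used by `not_toxicity` and the `not_*` moderation functions is just a complement on the 0-1 scale. A minimal sketch (the wrapper name `negate` and the toy `hate` scorer are illustrative, not the TruLens API):

```python
from typing import Callable

def negate(feedback_fn: Callable[[str], float]) -> Callable[[str], float]:
    """Wrap a 0-1 feedback function so that 1.0 means the rule is NOT violated."""
    def not_fn(text: str) -> float:
        return 1.0 - feedback_fn(text)
    return not_fn

# Hypothetical raw moderation score: higher means more strongly flagged.
def hate(text: str) -> float:
    return 0.75 if "hateful" in text else 0.0

not_hate = negate(hate)
print(not_hate("a friendly message"))  # 1.0
print(not_hate("a hateful message"))   # 0.25
```

Flipping the scale this way keeps "higher is better" true for every feedback function, so they aggregate consistently on the dashboard.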
## Adding new feedback functions

Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application by updating `trulens_eval/tru_feedback.py`. If your contributions would be useful for others, we encourage you to contribute to TruLens!

Feedback functions are organized by model provider into `Provider` classes.

The process for adding new feedback functions is:

1. Create a new `Provider` class or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class:

    ```python
    class StandAlone(Provider):
        def __init__(self):
            pass
    ```

2. Add a new feedback function method to your selected class. Your new method can either take a single text (`str`) as a parameter, or both a prompt (`str`) and a response (`str`). It should return a float between 0 (worst) and 1 (best).

    ```python
    def feedback(self, text: str) -> float:
        """
        Describe how the model works.

        Parameters:
            text (str): Text to evaluate.
            Can also be prompt (str) and response (str).

        Returns:
            float: A value between 0 (worst) and 1 (best).
        """
        return 1.0  # Replace with your computed score.
    ```
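Putting the two steps together, here is a runnable toy example. To keep the sketch self-contained, a minimal stand-in for TruLens's `Provider` base class is defined inline, and the `conciseness` scoring rule is invented for illustration:

```python
class Provider:
    """Minimal stand-in for trulens_eval's Provider base class (illustration only)."""
    pass

class StandAlone(Provider):
    def __init__(self):
        pass

    def conciseness(self, text: str) -> float:
        """
        Toy feedback function: reward shorter responses.

        Parameters:
            text (str): Text to evaluate.

        Returns:
            float: 1.0 for empty text, decreasing toward 0.0 as the
            text grows past roughly 100 words.
        """
        n_words = len(text.split())
        return max(0.0, 1.0 - n_words / 100.0)

provider = StandAlone()
print(provider.conciseness("Short and sweet."))  # roughly 0.97
```

A method like this could then be wrapped in `Feedback(provider.conciseness)` and used like any of the built-in feedback functions.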
# Attribution Methods

::: trulens_explain.trulens.nn.attribution
# Distributions of Interest

::: trulens_explain.trulens.nn.distributions
# Model Wrappers

::: trulens_explain.trulens.nn.models
# Quantities of Interest

::: trulens_explain.trulens.nn.quantities
# Slices

::: trulens_explain.trulens.nn.slices
# Visualization Methods

::: trulens_explain.trulens.visualizations
(Remaining files renamed without changes.)