Llm release (#118)

* only require keys if they are being used

* add missing file

* refactor fixes

* run_dashboard() in tru

* minor

* small fixes

* null handling for cost and tokens

* help run_dashboard get leaderboard file

* made endpoints singletons and added checking of feedback parameters

* format

* actually format

* make config and logo available in pkg dist

* quickstart

* Update README.md

* Update README.md

* add comments to quickstart notebook

* update readme

* docs and bugfixes on keys

* que hora es?

* mv folder structures

* fix imports

* more namechange

* rename notebook

* manifest approach for noncode files

* take out local write

* doc fixes

* mv docs to from eval_chain

* change docs to eval

* add feedbacks

* add colab notebooks

* add feedback functions docs

* documentation plus format

* huggingface docstring

* cleanup old feedback functions

* add model agreement

* set to our expected release functions

* remove pkg resource stream

* parallelized some things,
moved utils to a new utils file,
added the db logging and feedback eval
back to the truchain class,
removed record return from truchain but
you can use call_with_record for old behaviour

* feedback serialization and out of chain evaluation

* fixes

* moved feedback evaluation back to where the chains are running

* singleton bugfix

* remove example app

* small fix to empty db

* remove print

* cleaning up public interfaces and quickstart

* write streamlit config in run_dashboard

* remove type var for older python

* work

* updated quickstart

* fixes

* minor

* Update tru_db.py

add back chain_id to get_records

* Fix typing issue

* ux stuff

* millify

* remove commented out code

* remove obsolete generic dashboards

* work

* clear

* ux updates

* small fixes

* ux

* ux

* work

* small fixes

* start/stop dashboard

* misc

* threading fixes

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Create README.md

* Update welcome.md

* fix welcome symlink

* fix links

* update quickstart md

* fix readme and quickstarts

* fix record call

* fix truchain and tc naming confusion

* cleanup and bugfix

* remove colab

* remove confusing documentation

* chain docs

* Add files via upload

* Add files via upload

* image paths

* README updates

* Remove ability to provide own database for now

* remove colab

* update image paths

* remove .env.example

* quickstart

* Updated tru documentation

* change hora es

* few more docs

* versioning

---------

Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: piotrm <[email protected]>
Co-authored-by: Piotr Mardziel <[email protected]>
Co-authored-by: Shayak Sen <[email protected]>
6 people authored May 23, 2023
1 parent 655db8b commit 56c17a2
Showing 168 changed files with 7,387 additions and 112 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -31,3 +31,7 @@ MANIFEST
# Notebook tests generate these files:
imagenet_class_index.json
imagenet_class_index.json.*

*/.env
*/*.db
*/*.sqlite
9 changes: 9 additions & 0 deletions Makefile
@@ -0,0 +1,9 @@
SHELL := /bin/bash
CONDA_ENV := demo3
CONDA := source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate ; conda activate $(CONDA_ENV)

format:
	$(CONDA); bash format.sh

lab:
	$(CONDA); jupyter lab --ip=0.0.0.0 --no-browser --ServerApp.token=deadbeef
13 changes: 13 additions & 0 deletions README.md
@@ -0,0 +1,13 @@
# Welcome to TruLens!

![TruLens](https://www.trulens.org/Assets/image/Neural_Network_Explainability.png)

TruLens provides a set of tools for developing and monitoring neural nets, including large language models. This includes both tools for evaluation of LLMs and LLM-based applications with TruLens-Eval and deep learning explainability with TruLens-Explain. TruLens-Eval and TruLens-Explain are housed in separate packages and can be used independently.

**TruLens-Eval** contains instrumentation and evaluation tools for large language model (LLM) based applications. It supports the iterative development and monitoring of a wide range of LLM applications by wrapping your application to log key metadata across the entire chain (or off chain if your project does not use chains) on your local machine. Importantly, it also gives you the tools you need to evaluate the quality of your LLM-based applications.

For more information, see [TruLens-Eval Documentation](trulens_eval/install.md).

**TruLens-Explain** is a cross-framework library for deep learning explainability. It provides a uniform abstraction layer over TensorFlow, PyTorch, and Keras, and allows input and internal explanations.

For more information, see [TruLens-Explain Documentation](trulens_explain/install.md).
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes.
Binary file added docs/Assets/image/Chain_Explore.png
Binary file added docs/Assets/image/Evaluations.png
Binary file added docs/Assets/image/Leaderboard.png
Binary file added docs/Assets/image/TruLens_Architecture.png
File renamed without changes.
4 changes: 2 additions & 2 deletions trulens_explain/docs/conf.py → docs/conf.py
@@ -20,8 +20,8 @@
# -- Project information -----------------------------------------------------

project = 'trulens'
copyright = '2020, Klas Leino'
author = 'Klas Leino'
copyright = '2023, TruEra'
author = 'TruEra'

# -- General configuration ---------------------------------------------------

File renamed without changes.
File renamed without changes
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added docs/trulens_eval/Assets/image/Chain_Explore.png
Binary file added docs/trulens_eval/Assets/image/Evaluations.png
Binary file added docs/trulens_eval/Assets/image/Leaderboard.png
3 changes: 3 additions & 0 deletions docs/trulens_eval/api/tru.md
@@ -0,0 +1,3 @@
# Tru

::: trulens_eval.trulens_eval.tru.Tru
3 changes: 3 additions & 0 deletions docs/trulens_eval/api/tru_feedback.md
@@ -0,0 +1,3 @@
# Feedback Functions

::: trulens_eval.trulens_eval.tru_feedback
3 changes: 3 additions & 0 deletions docs/trulens_eval/api/truchain.md
@@ -0,0 +1,3 @@
# Tru Chain

::: trulens_eval.trulens_eval.tru_chain
27 changes: 27 additions & 0 deletions docs/trulens_eval/install.md
@@ -0,0 +1,27 @@
## Getting access to TruLens

These installation instructions assume that you have conda installed and added to your path.

1. Create a virtual environment (or modify an existing one).
```
conda create -n "<my_name>" python=3 # Skip if using existing environment.
conda activate <my_name>
```

2. [Pip installation] Install the trulens-eval pip package.
```
pip install trulens-eval
```

3. [Local installation] If you would like to develop or modify trulens, you can download the source code by cloning the trulens repo.
```
git clone https://github.com/truera/trulens.git
```

4. [Local installation] Install the trulens repo.
```
cd trulens/trulens_eval
pip install -e .
```


267 changes: 267 additions & 0 deletions docs/trulens_eval/quickstart.md
@@ -0,0 +1,267 @@
## Quickstart

### Playground

To quickly play around with the TruLens Eval library, download this notebook: [trulens_eval_quickstart.ipynb](https://github.com/truera/trulens/blob/main/trulens_eval/trulens_eval_quickstart.ipynb).


### Install & Use

Install trulens-eval from pypi.

```
pip install trulens-eval
```

Import from langchain to build the app, and from trulens_eval for logging and evaluation:

```python
from IPython.display import JSON
# imports from langchain to build app
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import ChatPromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate
# imports from trulens to log and get feedback on chain
from trulens_eval import Tru
from trulens_eval import tru_chain
tru = Tru()
```

### API Keys

Our example chat app and feedback functions call external APIs such as OpenAI or Huggingface. You can add keys by setting the environment variables.

#### In Python

```python
import os
os.environ["OPENAI_API_KEY"] = "..."
```
#### In Terminal

```bash
export OPENAI_API_KEY="..."
```
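
The Huggingface API key used by the feedback functions later in this quickstart can be set the same way. For example, in Python:

```python
import os

# Needed for the Huggingface-based feedback functions used below.
os.environ["HUGGINGFACE_API_KEY"] = "..."
```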

### Create a basic LLM chain to evaluate

This example uses langchain and OpenAI, but the same process can be followed with any framework and model provider. Once you've created your chain, just call TruChain to wrap it. Doing so allows you to capture the chain metadata for logging.

```python
full_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Provide a helpful response with relevant background information for the following: {prompt}",
        input_variables=["prompt"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([full_prompt])

chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.9)

chain = LLMChain(llm=chat, prompt=chat_prompt_template)

# wrap with truchain to instrument your chain
tc = tru_chain.TruChain(chain)
```

### Set up logging and instrumentation

Make the first call to your LLM Application. The instrumented chain can operate like the original but can also produce a log or "record" of the chain execution.

```python
prompt_input = 'que hora es?'
gpt3_response, record = tc.call_with_record(prompt_input)
```

We can log the records but first we need to log the chain itself.

```python
tru.add_chain(chain_json=tc.json)
```

Now we can log the record:
```python
tru.add_record(
    prompt=prompt_input,             # prompt input
    response=gpt3_response['text'],  # LLM response
    record_json=record               # record is returned by the TruChain wrapper
)
```

## Evaluate Quality

Following the request to your app, you can then evaluate LLM quality using feedback functions. This is done in a subsequent call so it does not add latency to your application, and the evaluations are also logged to your local machine.

To get feedback on the quality of your LLM, you can use any of the provided feedback functions or add your own.

To assess your LLM quality, you can provide the feedback functions to tru.run_feedback_functions() in a list as shown below. Here we'll just add a simple language match checker.
```python
from trulens_eval.tru_feedback import Feedback, Huggingface

os.environ["HUGGINGFACE_API_KEY"] = "..."

# Initialize Huggingface-based feedback function collection class:
hugs = Huggingface()

# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on(
    text1="prompt", text2="response"
)

# Run feedback functions. This might take a moment if the public API needs to load the language model used by the feedback function.
feedback_result = f_lang_match.run_on_record(
    chain_json=tc.json, record_json=record
)

JSON(feedback_result)

# We can also run a collection of feedback functions
feedback_results = tru.run_feedback_functions(
    record_json=record,
    feedback_functions=[f_lang_match]
)
display(feedback_results)
```

After capturing feedback, you can then log it to your local database
```python
tru.add_feedback(feedback_results)
```

### Automatic logging
The above logging and feedback function evaluation steps can be done by TruChain.
```python
truchain = TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tru=tru
)
# Note: providing `db: TruDB` causes the above constructor to log the wrapped chain in the database specified.
# Note: any `feedbacks` specified here will be evaluated and logged whenever the chain is used.

truchain("This will be automatically logged.")
```

### Out-of-band Feedback evaluation

In the above example, the feedback function evaluation is done in the same process as the chain evaluation. The alternative approach is to use the provided persistent evaluator, started via `tru.start_evaluator` as shown below. Then specify the `feedback_mode` for `TruChain` as `deferred` to let the evaluator handle the feedback functions.

For demonstration purposes, we start the evaluator here but it can be started in another process.
```python
truchain: TruChain = TruChain(
    chain,
    chain_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tru=tru,
    feedback_mode="deferred"
)

tru.start_evaluator()
truchain("This will be logged by deferred evaluator.")
tru.stop_evaluator()
```


### Run the dashboard!
```python
tru.run_dashboard() # open a streamlit app to explore
# tru.stop_dashboard() # stop if needed
```

### Chain Leaderboard: Quickly identify quality issues.

Understand how your LLM application is performing at a glance. Once you've set up logging and evaluation in your application, you can view key performance statistics including cost and average feedback value across all of your LLM apps using the chain leaderboard. As you iterate new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up.

Note: Average feedback values are returned and displayed in a range from 0 (worst) to 1 (best).

![Chain Leaderboard](Assets/image/Leaderboard.png)

To dive deeper on a particular chain, click "Select Chain".

### Understand chain performance with Evaluations

To learn more about the performance of a particular chain or LLM model, we can select it to view its evaluations at the record level. LLM quality is assessed through the use of feedback functions. Feedback functions are extensible methods for determining the quality of LLM responses and can be applied to any downstream LLM task. Out of the box we provide a number of feedback functions for assessing model agreement, sentiment, relevance and more.

The evaluations tab provides record-level metadata and feedback on the quality of your LLM application.

![Evaluations](Assets/image/Evaluations.png)

Click on a record to dive deep into all of the details of your chain stack and underlying LLM, captured by tru_chain.

![Explore a Chain](Assets/image/Chain_Explore.png)

If you prefer the raw format, you can quickly get it using the "Display full chain json" or "Display full record json" buttons at the bottom of the page.

### Out-of-the-box Feedback Functions
See: <https://www.trulens.org/trulens_eval/api/tru_feedback/>

#### Relevance

This evaluates the *relevance* of the LLM response to the given text by LLM prompting.

Relevance is currently only available with OpenAI ChatCompletion API.
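
As a rough sketch of how this could be wired up, assuming tru_feedback exposes an `OpenAI` provider class with a `relevance` method taking `prompt` and `response` parameters (analogous to the Huggingface example above), usage might look like this:

```python
from trulens_eval.tru_feedback import Feedback, OpenAI  # OpenAI provider class assumed

# Initialize the OpenAI-based feedback function collection class.
openai_provider = OpenAI()

# Score how relevant the response is to the prompt, from 0 (worst) to 1 (best).
f_relevance = Feedback(openai_provider.relevance).on(
    prompt="prompt", response="response"
)
```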

#### Sentiment

This evaluates the *positive sentiment* of either the prompt or response.

Sentiment is currently available to use with OpenAI, HuggingFace or Cohere as the model provider.

* The OpenAI sentiment feedback function prompts a Chat Completion model to rate the sentiment from 1 to 10, and then scales the response down to 0-1.
* The HuggingFace sentiment feedback function returns a raw score from 0 to 1.
* The Cohere sentiment feedback function uses the classification endpoint and a small set of examples stored in feedback_prompts.py to return either a 0 or a 1.

#### Model Agreement

Model agreement uses OpenAI to attempt an honest answer to your prompt with system prompts for correctness, and then evaluates the agreement of your LLM response with this model on a scale from 1 to 10. The agreement with each honest bot is then averaged and scaled from 0 to 1.
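
As an illustration of the arithmetic only (the exact scaling used by the library may differ), averaging the per-bot ratings and mapping the 1 to 10 scale onto 0 to 1 could look like this:

```python
from typing import List

def aggregate_agreement(ratings: List[int]) -> float:
    """Average the 1-10 agreement ratings from each honest-bot answer,
    then rescale the average onto 0-1 (assumed scaling, for illustration)."""
    avg = sum(ratings) / len(ratings)
    return (avg - 1) / 9.0

print(aggregate_agreement([8, 9, 7]))  # -> 0.777...
```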

#### Language Match

This evaluates if the language of the prompt and response match.

Language match is currently only available to use with HuggingFace as the model provider. This feedback function returns a score in the range from 0 to 1, where 1 indicates match and 0 indicates mismatch.

#### Toxicity

This evaluates the toxicity of the prompt or response.

Toxicity is currently only available to be used with HuggingFace, and uses a classification endpoint to return a score from 0 to 1. The feedback function is negated as not_toxicity, and returns a 1 if not toxic and a 0 if toxic.

#### Moderation

The OpenAI Moderation API is made available for use as feedback functions. This includes hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. Each is negated (ex: not_hate) so that a 0 would indicate that the moderation rule is violated. These feedback functions return a score in the range 0 to 1.

## Adding new feedback functions

Feedback functions are an extensible framework for evaluating LLMs. You can add your own feedback functions to evaluate the qualities required by your application by updating trulens_eval/tru_feedback.py. If your contributions would be useful for others, we encourage you to contribute to trulens!

Feedback functions are organized by model provider into Provider classes.

The process for adding new feedback functions is:
1. Create a new Provider class or locate an existing one that applies to your feedback function. If your feedback function does not rely on a model provider, you can create a standalone class:

```python
class StandAlone(Provider):
    def __init__(self):
        pass
```

2. Add a new feedback function method to your selected class. Your new method can either take a single text (str) as a parameter or both prompt (str) and response (str). It should return a float between 0 (worst) and 1 (best).

```python
def feedback(self, text: str) -> float:
    """
    Describe how the model works.

    Parameters:
        text (str): Text to evaluate.
            Can also be prompt (str) and response (str).

    Returns:
        float: A value between 0 (worst) and 1 (best).
    """
    return 0.0  # Replace with the computed score between 0 (worst) and 1 (best).
```
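
Once the method exists, it can be wrapped the same way as the built-in feedback functions. The snippet below is a sketch: the `text="response"` selector mirrors the keyword-to-record mapping used by the language-match example in the quickstart and is an assumption, not a documented signature.

```python
from trulens_eval.tru_feedback import Feedback

# Wrap the custom provider class defined above.
standalone = StandAlone()

# Map the `text` parameter of the new feedback method to the record's response,
# following the `.on(...)` pattern shown earlier (assumed selector name).
f_custom = Feedback(standalone.feedback).on(text="response")

# It can then be passed to TruChain like any other feedback function:
# TruChain(chain, chain_id='Chain1_ChatApplication', feedbacks=[f_custom])
```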
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/attribution.md
@@ -0,0 +1,3 @@
# Attribution Methods

::: trulens_explain.trulens.nn.attribution
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/distributions.md
@@ -0,0 +1,3 @@
# Distributions of Interest

::: trulens_explain.trulens.nn.distributions
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/model_wrappers.md
@@ -0,0 +1,3 @@
# Model Wrappers

::: trulens_explain.trulens.nn.models
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/quantities.md
@@ -0,0 +1,3 @@
# Quantities of Interest

::: trulens_explain.trulens.nn.quantities
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/slices.md
@@ -0,0 +1,3 @@
# Slices

::: trulens_explain.trulens.nn.slices
3 changes: 3 additions & 0 deletions docs/trulens_explain/api/visualizations.md
@@ -0,0 +1,3 @@
# Visualization Methods

::: trulens_explain.trulens.visualizations
@@ -27,7 +27,7 @@ git clone https://github.com/truera/trulens.git

4. [Local installation] Install the trulens repo.
```
cd trulens
cd trulens_explain
pip install -e .
```

@@ -8,4 +8,4 @@ To quickly play around with the TruLens library, check out the following CoLab n


### Install & Use
Check out the [Installation](https://truera.github.io/trulens/install/) instructions for information on how to install the library, use it, and contribute.
Check out the [Installation](https://truera.github.io/trulens/trulens_explain/install/) instructions for information on how to install the library, use it, and contribute.
File renamed without changes.