
Low performance hotspots #1160

Open · wants to merge 586 commits into main

Changes from 1 commit

Commits (586)
159c0be
exposed AzureOpenAI provider (#698)
epinzur Dec 20, 2023
91bb9e0
ollama quickstart (#703)
joshreini1 Dec 20, 2023
5fa4b82
allow debug timeout to be adjusted (#713)
piotrm0 Dec 21, 2023
332858b
import llama only if needed (#714)
joshreini1 Dec 22, 2023
426fa70
fix dashboard starts for colab (#721)
piotrm0 Dec 22, 2023
37772e2
savE (#719)
walnutdust Dec 22, 2023
c26988c
first (#720)
piotrm0 Dec 22, 2023
ab57de9
Add shortcut to select_context() (#706)
joshreini1 Dec 22, 2023
a791a96
add optional (#723)
piotrm0 Dec 22, 2023
7e024c3
pydantic2 deprecation fix to model config (#724)
piotrm0 Dec 23, 2023
f2eefbd
Fix correctness prompt (#725)
shayaks Dec 23, 2023
0c0b484
Releases/rc trulens eval 0.20.0 (#727)
shayaks Dec 23, 2023
27b664c
Releases/rc trulens eval 0.20.0 (#729)
shayaks Dec 23, 2023
eff8562
debug migration issue in release pipeline (#726)
piotrm0 Dec 27, 2023
d7290be
azureopenai fixes (#735)
piotrm0 Jan 2, 2024
5c84ecc
fix typo (#739)
piotrm0 Jan 2, 2024
b8d9303
Update extract_score_and_reasons to work across providers (#732)
joshreini1 Jan 2, 2024
3b3b969
Update instrumentation docs (#737)
joshreini1 Jan 3, 2024
e2b5ad1
add instructions for installing from github (#740)
piotrm0 Jan 3, 2024
7e2d753
Automated File Generation from Docs Notebook Changes (#744)
github-actions[bot] Jan 4, 2024
1b7c516
adjust optional llama (#745)
piotrm0 Jan 4, 2024
c28c0af
adjust human feedback notebook (#746)
piotrm0 Jan 4, 2024
caa9205
Automated File Generation from Docs Notebook Changes (#749)
github-actions[bot] Jan 4, 2024
7343252
WithClassInfo bugfixes (#741)
piotrm0 Jan 4, 2024
b9a2fc9
update notebooks to test (#753)
piotrm0 Jan 4, 2024
98d9623
langchain thread executor rehack (#755)
piotrm0 Jan 4, 2024
7bdf01f
Fix subscripted generics typechecking for Python<3.10 (#754)
coreyhu Jan 4, 2024
feabdba
check for langchain legacy (#757)
piotrm0 Jan 4, 2024
02b11f0
pass bedrock provider to ground truth eval (#743)
joshreini1 Jan 4, 2024
3d58eee
convert structures to str in feedback result tables (#758)
piotrm0 Jan 4, 2024
4989a14
langchain provider fix (#759)
piotrm0 Jan 4, 2024
fb2f424
Releases/rc trulens eval 0.20.1 (#761)
joshreini1 Jan 5, 2024
408b838
Automated File Generation from Docs Notebook Changes (#762)
github-actions[bot] Jan 5, 2024
2488e64
add missing langchain provider docs (#760)
joshreini1 Jan 5, 2024
d511973
include excluded pydantic fields in json dumps (#768)
piotrm0 Jan 8, 2024
4f73448
changed the default model id to titan lite (#774)
rajib76 Jan 8, 2024
3fef759
documentation and fix to weakref usage (#771)
piotrm0 Jan 8, 2024
58649e2
more optional annotations and some bugfixes (#770)
piotrm0 Jan 8, 2024
966a716
Releases/rc trulens eval 0.20.2 (#780)
joshreini1 Jan 9, 2024
5146001
Automated File Generation from Docs Notebook Changes (#781)
github-actions[bot] Jan 9, 2024
724eee5
removes summarize_provider from groundedness variable (#785)
ingridstevens Jan 9, 2024
e36b997
more fixes to utility imports (#786)
piotrm0 Jan 9, 2024
c16b700
better prompt for GroundTruth feedback function + pydantic v2 valudat…
daniel-huang-1230 Jan 9, 2024
748400d
update langchain_agents notebook (#778)
piotrm0 Jan 10, 2024
ba83563
Josh/aws updates (#788)
joshreini1 Jan 10, 2024
af81e76
Fix missing f-strings (#790)
andrewisplinghoff Jan 10, 2024
845dcaa
optionals readme (#787)
piotrm0 Jan 10, 2024
def8172
TruLens-Eval v0.20.3 (#791)
joshreini1 Jan 10, 2024
ddd0150
Automated File Generation from Docs Notebook Changes (#792)
github-actions[bot] Jan 11, 2024
9db4b0c
Minor fixes- changing g"generation" to "generated_text" and updating …
vivekgangasani Jan 12, 2024
91569a4
fix precision error (#798)
joshreini1 Jan 16, 2024
e420568
Groundedness refactor (#801)
joshreini1 Jan 16, 2024
921b8c8
Evaluations Page Nits (#797)
joshreini1 Jan 22, 2024
12b71c9
deduplicating sync/async code (#793)
piotrm0 Jan 22, 2024
9bc29ea
error on deprecated passthrough methods (#803)
piotrm0 Jan 23, 2024
f6e61c6
virtual models for logging and evaluating existing data (#806)
piotrm0 Jan 23, 2024
194acd8
oopenai -> openai (#815)
joshreini1 Jan 23, 2024
9eb94ad
Fix summarization, rename to comprehensiveness (#816)
joshreini1 Jan 23, 2024
26fe140
deprecate more things (#817)
piotrm0 Jan 24, 2024
ac7d235
Generate Test Cases (#705)
joshreini1 Jan 24, 2024
ac1dc54
Refactor Evaluation Docs (#823)
joshreini1 Jan 25, 2024
9a34cae
Make OpenAI Optional (#827)
joshreini1 Jan 26, 2024
159b8dd
enable async unit tests (#831)
piotrm0 Jan 26, 2024
a303230
Releases/rc trulens eval 0.21.0 (#830)
joshreini1 Jan 29, 2024
2ad0670
Automated File Generation from Docs Notebook Changes (#837)
github-actions[bot] Jan 29, 2024
1cc234a
fix ellipsis issue (#840)
piotrm0 Jan 31, 2024
42a3520
factor out common error from app types (#832)
piotrm0 Jan 31, 2024
4ed5706
Docs refactor (#829)
joshreini1 Jan 31, 2024
43af1be
Automated File Generation from Docs Notebook Changes (#842)
github-actions[bot] Jan 31, 2024
c18afbc
few more ellipses (#843)
piotrm0 Feb 1, 2024
47983b4
update azure example, also show provider extension (#847)
joshreini1 Feb 2, 2024
2aea0b8
add testing with older python versions (#841)
piotrm0 Feb 2, 2024
d6682bd
Bump vite in /trulens_eval/trulens_eval/react_components/record_viewe…
dependabot[bot] Feb 2, 2024
5e2421c
Integration Testing (#838)
joshreini1 Feb 2, 2024
7eea89d
TruLens-Eval v0.22.0 release (#851)
joshreini1 Feb 3, 2024
5e55846
Automated File Generation from Docs Notebook Changes (#853)
github-actions[bot] Feb 5, 2024
47f806c
init cleanup (#852)
piotrm0 Feb 5, 2024
29d93a8
Randomly run evals based on record_id hash (#850)
joshreini1 Feb 6, 2024
9f917ed
fix typo (#857)
joshreini1 Feb 6, 2024
99d747c
Fix typo and adjust some debug printouts. (#866)
piotrm0 Feb 7, 2024
ace1f5b
non-threaded pacer (#874)
piotrm0 Feb 8, 2024
703cdaf
better deferred evaluation (#807)
piotrm0 Feb 8, 2024
2e9c9cf
Automated File Generation from Docs Notebook Changes (#876)
github-actions[bot] Feb 8, 2024
b2a979a
streamlit experimental query params -> query params (#860)
joshreini1 Feb 8, 2024
603972d
deferred progress status (#879)
piotrm0 Feb 8, 2024
f513157
Allow different schemas for Bedrock provider calls (#878)
joshreini1 Feb 8, 2024
326d535
pull from other branch (#881)
piotrm0 Feb 9, 2024
c044a88
fix st.query_params (#883)
joshreini1 Feb 9, 2024
3a9d0d4
TruLens-Eval v0.22.1 (#882)
joshreini1 Feb 9, 2024
9afa162
metadata datatype validation fix (#888)
aaronvarghese Feb 10, 2024
2198b3a
ensure agreement prompt outputs integer score as the last part of the…
daniel-huang-1230 Feb 12, 2024
8758d7c
pin llama-index temp (#893)
joshreini1 Feb 13, 2024
6b70a12
first (#892)
piotrm0 Feb 13, 2024
848898e
bedrock provider branching fix (#887)
joshreini1 Feb 13, 2024
0d7035a
TruLens-Eval 0.22.2 (#894)
joshreini1 Feb 13, 2024
fe3d79a
cleanup (#880)
piotrm0 Feb 14, 2024
05e1d5e
fix for in-memory sqlite params (#904)
piotrm0 Feb 15, 2024
a72aca1
Fix use case colab links (#900)
joshreini1 Feb 15, 2024
f5f3b1d
few site-related fixes to recently merged pr (#903)
piotrm0 Feb 15, 2024
895d0f6
comprehensiveness updates (#901)
joshreini1 Feb 15, 2024
3c31e71
model rebuilds as necessary (#905)
piotrm0 Feb 15, 2024
28fbf60
Deeper Instrumentation for Hybrid Retrievers + examples (#873)
joshreini1 Feb 16, 2024
5b3e8fc
various documentation fixes (#907)
piotrm0 Feb 16, 2024
deced20
Releases/rc trulens eval 0.23.0 (#908)
joshreini1 Feb 16, 2024
b141279
cost tracking tests and litellm cost tracking (#910)
piotrm0 Feb 21, 2024
df9adac
check packages on init (#917)
piotrm0 Feb 21, 2024
9bc058a
Increase provider test coverage to Huggingface feedback provider (#919)
venkatkakoju Feb 22, 2024
0569396
upgrade Llama-Index integration to 0.10 (#891)
joshreini1 Feb 22, 2024
da29a66
Automated File Generation from Docs Notebook Changes (#922)
github-actions[bot] Feb 22, 2024
5f98b00
Update issue templates (#923)
joshreini1 Feb 22, 2024
3699ce5
async handling adjustments (#918)
piotrm0 Feb 23, 2024
3c2fe22
update instrumentation notebooks and related nits (#931)
piotrm0 Feb 23, 2024
60d4206
TruLens-Eval 0.24.0 (#927)
joshreini1 Feb 23, 2024
c959a92
Update selecting_components.md (#926)
joshreini1 Feb 23, 2024
a91e294
bump patchlevel (#933)
piotrm0 Feb 23, 2024
979295b
makefile targets for release process (#934)
piotrm0 Feb 23, 2024
c147bbc
Better selection of main input/output (#938)
joshreini1 Feb 28, 2024
9538049
Documentation structure and heading pages (#945)
piotrm0 Mar 1, 2024
8826a03
update tru virtual docs (#949)
piotrm0 Mar 4, 2024
1f2a828
[MLNN-1217] Improve regex matching for structured output extraction f…
daniel-huang-1230 Mar 5, 2024
a8d473e
instrumentation notebook updates and fixes (#953)
piotrm0 Mar 6, 2024
6dfd13d
add nemo guardrails integrations (#824)
piotrm0 Mar 6, 2024
491bb10
Fix release test pipeline (#962)
joshreini1 Mar 6, 2024
f008169
extract response attr if app returns object with that attr (#865)
joshreini1 Mar 6, 2024
d896a4e
fix links in docs (#963)
piotrm0 Mar 6, 2024
bdc883f
Automated File Generation from Docs Notebook Changes (#964)
github-actions[bot] Mar 6, 2024
dc1c5d6
add back all_tolls (#965)
joshreini1 Mar 7, 2024
80d2500
version bump (#966)
joshreini1 Mar 7, 2024
062ade8
canopy quickstart (#925)
joshreini1 Mar 7, 2024
d4e0fdd
add missing virtual app setup and redirects from old core concept lin…
joshreini1 Mar 7, 2024
d371af5
fix links in docs (#968)
piotrm0 Mar 8, 2024
9654f2b
Update typo in pip install on azure_openai.ipynb (#973)
ingridstevens Mar 8, 2024
dafabc6
readd redirects (#972)
piotrm0 Mar 8, 2024
f6984f1
Automated File Generation from Docs Notebook Changes (#975)
github-actions[bot] Mar 8, 2024
c0a0cf3
fix colab link - langchain ensemble notebook (#980)
joshreini1 Mar 8, 2024
1d0c0fc
allow deserialization for faiss example (#978)
joshreini1 Mar 8, 2024
0a2e5b5
fix truchain (#974)
joshreini1 Mar 8, 2024
f77e7ee
patch (#982)
joshreini1 Mar 8, 2024
1c2a646
QS Relevance -> Context Relevance (#977)
joshreini1 Mar 8, 2024
15a19f7
Automated File Generation from Docs Notebook Changes (#983)
github-actions[bot] Mar 9, 2024
b25575e
version bump (#981)
joshreini1 Mar 9, 2024
40f78bb
existing data quickstart (#976)
joshreini1 Mar 9, 2024
1058a5d
Adds Azure Quickstart for LangChain (#984)
ingridstevens Mar 9, 2024
4398e5c
fix more docs links (#987)
piotrm0 Mar 11, 2024
d92e79f
update (#990)
piotrm0 Mar 12, 2024
5927f66
verify feedback selectors on recorder init (#961)
piotrm0 Mar 12, 2024
759d035
relax llama version (#985)
joshreini1 Mar 12, 2024
83223c4
Allow VirtualRecords to have multiple calls to the same component. (#…
piotrm0 Mar 12, 2024
76e28ff
Fix broken colab links (#994)
joshreini1 Mar 13, 2024
c5353c4
docs updates/additions (#996)
piotrm0 Mar 13, 2024
bb11c2d
add install redirect in docs (#995)
joshreini1 Mar 13, 2024
5c1d2ff
Update feedback docs (#999)
joshreini1 Mar 14, 2024
0c6ba29
doc usage formatting (#1002)
piotrm0 Mar 15, 2024
eaa10f2
Allow Feedback.run with args even if they had selectors specified. (#…
piotrm0 Mar 15, 2024
57b3078
version bump (#1004)
joshreini1 Mar 15, 2024
6fccee8
Python 3.12 support (#1012)
joshreini1 Mar 19, 2024
75c5838
documentation nits and moves (#1015)
piotrm0 Mar 20, 2024
41936d8
Fix shield link to docs (#1019)
joshreini1 Mar 21, 2024
0387610
Create pull_request_template.md (#1021)
piotrm0 Mar 22, 2024
15fe242
Automated File Generation from Docs Notebook Changes (#1020)
github-actions[bot] Mar 22, 2024
45ee7b1
Fix DB Issues (#1023)
arn-tru Mar 22, 2024
0f9901e
typo in rag_triad.md (#1022)
joshreini1 Mar 22, 2024
502889e
Fix paul_graham_essay.txt links to new location (#1024)
joshreini1 Mar 22, 2024
c7814d6
Feedback upgrades (#1018)
joshreini1 Mar 22, 2024
cae1e2f
updated App.select_context to support MultiQueryRetriever of langchai…
sayedsohan Mar 22, 2024
bff1cdc
Add Vectara Hallucination Detection Model (#950)
Josephrp Mar 23, 2024
6f05a29
Parametrize temperature for create chat completion (#1026)
daniel-huang-1230 Mar 23, 2024
7974d12
0.27.0 version bump (#1027)
joshreini1 Mar 23, 2024
82f2d68
first (#1030)
piotrm0 Mar 26, 2024
f9fbef4
docs | standards on proper names (#997)
markdavidmc0 Mar 26, 2024
41d37f3
docs glossary (#1029)
piotrm0 Mar 26, 2024
9086f3e
fix doc link in hybrid retriever notebook (#1035)
daniel-huang-1230 Mar 27, 2024
63e436b
docs README (#1034)
joshreini1 Mar 27, 2024
23bad45
Update with_app.md (#1036)
nicoloboschi Mar 27, 2024
cbd6d36
more pipelines docs (#1033)
piotrm0 Mar 28, 2024
c30842a
Automated File Generation from Docs Notebook Changes (#1031)
github-actions[bot] Mar 28, 2024
2b5e303
add missing job name (#1037)
joshreini1 Mar 28, 2024
d013107
Add if_missing. (#1038)
piotrm0 Apr 1, 2024
f0484de
Docs updates for feedback, instrumentation apis, examples (#1032)
joshreini1 Apr 1, 2024
8a84b49
fix (#1043)
piotrm0 Apr 2, 2024
94088be
[DOCS] more proper names and glossary terms (#1042)
piotrm0 Apr 2, 2024
66fd06d
Added feedback button to trulens (#1046)
arn-tru Apr 2, 2024
99a80e0
Import improvements, fix version conflicts (#1047)
joshreini1 Apr 3, 2024
3da61a0
Fix import and favicon (#1049)
arn-tru Apr 3, 2024
ccf03b4
remove pkg_resources and distutils (#1052)
piotrm0 Apr 3, 2024
a9944b8
version bump (#1053)
joshreini1 Apr 4, 2024
29f1efb
add missing pprint import (#1054)
joshreini1 Apr 4, 2024
b6c96e9
bump 27 2 (#1055)
joshreini1 Apr 4, 2024
a8c350a
Meta-eval / feeback functions benchmarking notebooks, ranking-based e…
daniel-huang-1230 Apr 5, 2024
acba5ea
MongoDB Atlas quickstart (#1056)
joshreini1 Apr 6, 2024
5dbe80a
OpenAI Assistants API (quickstart) (#1041)
joshreini1 Apr 6, 2024
b9189ce
App delete functionality added (#1061)
arn-tru Apr 9, 2024
6f0973e
Queue fixed for python version lower than 3.9 (#1066)
arn-tru Apr 9, 2024
5644b1a
Added lanchain provider tests (#1062)
arn-tru Apr 17, 2024
d03fa8b
docs fixes (#1075)
piotrm0 Apr 17, 2024
1368970
configurable table prefix (#971)
piotrm0 Apr 17, 2024
5af1347
add example service file (#1072)
piotrm0 Apr 17, 2024
ba4cae3
fix test-tru (#1070)
piotrm0 Apr 17, 2024
14f1813
commented out broken tests (#1076)
arn-tru Apr 17, 2024
02c1bf9
fix legacy db missing abstract method (#1077)
piotrm0 Apr 17, 2024
3d33ffb
release test fixes (#1078)
piotrm0 Apr 17, 2024
ece8892
Version Bump 0.28.0 (#1079)
arn-tru Apr 17, 2024
b79a91a
Merge branch 'releases/rc-trulens-eval-0.28.0' into main
piotrm0 Apr 17, 2024
5cd3967
Update Main Branch to Match Latest Release (#1083)
arn-tru Apr 17, 2024
2e93282
feature(Improved Trace Display): [MLNN-1342] Improved Trace Display (…
walnutdust Apr 19, 2024
e4b17f0
improvements(configs): Add import-related configs (#1091)
walnutdust Apr 22, 2024
2a9ae32
add maintainers and releases files (#1085)
piotrm0 Apr 22, 2024
c5dceee
Fix Colab links in expositional example notebooks (#1095)
daniel-huang-1230 Apr 23, 2024
7bc2ec2
better singleton already made warnings (#1088)
piotrm0 Apr 23, 2024
8deebee
Fixes to package build. (#1093)
piotrm0 Apr 23, 2024
57bc3b6
remove legacy db implementation (#1084)
piotrm0 Apr 23, 2024
3da0f6a
split schema.py (#1090)
piotrm0 Apr 23, 2024
85d1ed5
patch release 0.28.1 (#1094)
piotrm0 Apr 23, 2024
47e7d77
Fix dividing by zero error in context_relevance_with_cot_reasons + at…
daniel-huang-1230 Apr 24, 2024
46d362e
Prepare notebook for Assistants API hackathon (#1102)
daniel-huang-1230 Apr 26, 2024
96d6765
Show OSS models (and tracking) in LiteLLM application (#1109)
joshreini1 Apr 29, 2024
d4d7973
fix(pills): Fixed bug with trace view initialization when no feedback…
walnutdust Apr 30, 2024
0591fcc
chore: remove unused code cell (#1113)
stokedout Apr 30, 2024
73385b4
Automated File Generation from Docs Notebook Changes (#1114)
github-actions[bot] Apr 30, 2024
6f1fd6a
Remove references to running moderation endpoint on AzureOpenAI (#1116)
joshreini1 May 1, 2024
837784c
swap rag utility (qs)relevance (#1120)
piotrm0 May 2, 2024
7bd5d00
Fix Link (#1128)
timbmg May 7, 2024
a311b0b
update groundedness prompt (#1112)
bpmcgough May 7, 2024
f231f9c
Fix docs links in instrumentation (#1129)
joshreini1 May 7, 2024
11f22e1
fix rag triad and awaitable calls (#1110)
piotrm0 May 7, 2024
fc2c5a2
trurails: update to getattr (#1130)
joshreini1 May 9, 2024
529c68c
Remove placeholder feedback (#1127)
arn-tru May 9, 2024
7457e53
Default names for rag triad utility (#1122)
joshreini1 May 9, 2024
45338b0
add reasons to answer, context relevance; add collect to groundedness
joshreini1 May 10, 2024
9f7bbab
Update custom functions notebook to reflect context relevance change …
joshreini1 May 13, 2024
e365a6e
Update custom_feedback_functions.ipynb
joshreini1 May 14, 2024
744ea34
Update README.md (#1136)
eltociear May 14, 2024
d8ff339
Unify groundedness interface (#1135)
joshreini1 May 14, 2024
78dbb12
dont iterate streams in openai cost tracking (#1138)
piotrm0 May 15, 2024
95d8d0b
Automated File Generation from Docs Notebook Changes (#1137)
github-actions[bot] May 15, 2024
c424cd4
Fix a few old groundedness references (#1139)
joshreini1 May 15, 2024
e557792
Update all_tools.py
joshreini1 May 15, 2024
9a6edec
Update llama_index_quickstart.ipynb
joshreini1 May 15, 2024
e41e513
Automated File Generation from Docs Notebook Changes (#1141)
github-actions[bot] May 15, 2024
32de002
update comprehensiveness + nb (#1064)
joshreini1 May 16, 2024
05f7e74
0.29.0 version bump (#1140)
joshreini1 May 17, 2024
0c1d745
Automated File Generation from Docs Notebook Changes (#1143)
github-actions[bot] May 17, 2024
43bfb74
glossary additions (#1144)
piotrm0 May 20, 2024
e06afaf
Update ollama_quickstart.ipynb
joshreini1 May 21, 2024
cf8cd67
Patch imports error (#1146)
joshreini1 May 24, 2024
8487b2c
add langchain_community to optional imports and checks for use of ope…
piotrm0 May 24, 2024
e92b4a2
0.30.0 bump (#1158)
joshreini1 May 25, 2024
b53c597
0.30.1 bump (#1159)
joshreini1 May 25, 2024
ad70beb
Added the additional hotspot analysis pages and the notebook to rende…
bodhisaha May 25, 2024
7fac8cc
changed the notebook to run on 50 queries without all feedback scores…
bodhisaha May 30, 2024
Randomly run evals based on record_id hash (#850)
* prototype

* update md

* fix run_feedback

* update

* update

* update
joshreini1 authored Feb 6, 2024
commit 29d93a80e4408ccdd7d807d38c109760f171fd82
12 changes: 4 additions & 8 deletions docs/trulens_eval/feedback_functions_existing_data.md
@@ -14,15 +14,11 @@ feedback_result = provider.relevance("<some prompt>", "<some response>")
 In the case that you have already logged a run of your application with TruLens and have the record available, the process for running an (additional) evaluation on that record is to use `tru.run_feedback_functions`:
 
 ```python
-tru_recorder = TruChain(
-    chain,
-    app_id='Chain1_ChatApplication'
-)
-
-with tru_recorder as recording:
-    record = chain("What is langchain?")
+tru_rag = TruCustomApp(rag, app_id = 'RAG v1')
 
-tru.run_feedbacks(record, feedbacks=[f_lang_match, f_qa_relevance, f_context_relevance])
+result, record = tru_rag.with_record(rag.query, "How many professors are at UW in Seattle?")
+feedback_results = tru.run_feedback_functions(record, feedbacks=[f_lang_match, f_qa_relevance, f_context_relevance])
+tru.add_feedbacks(feedback_results)
 ```
 
 ### TruVirtual
390 changes: 390 additions & 0 deletions trulens_eval/examples/experimental/random_evaluation.ipynb
@@ -0,0 +1,390 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Random Evaluation of Records\n",
"\n",
"This notebook walks through the random evaluation of records with TruLens.\n",
"\n",
"This is useful in cases where we want to log all application runs, but it is expensive to run evaluations each time. To gauge the performance of the app, we need *some* evaluations, so it is useful to evaluate a representative sample of records. We can do this after each record selectively running and logging feedback based on some randomization scheme.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/experimental/random_evaluation.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ! pip install trulens_eval==0.22.0 chromadb==0.4.18 openai==1.3.7"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Data\n",
"\n",
"In this case, we'll just initialize some simple text in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"university_info = \"\"\"\n",
"The University of Washington, founded in 1861 in Seattle, is a public research university\n",
"with over 45,000 students across three campuses in Seattle, Tacoma, and Bothell.\n",
"As the flagship institution of the six public universities in Washington state,\n",
"UW encompasses over 500 buildings and 20 million square feet of space,\n",
"including one of the largest library systems in the world.\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Vector Store\n",
"\n",
"Create a chromadb vector store in memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"oai_client = OpenAI()\n",
"\n",
"oai_client.embeddings.create(\n",
" model=\"text-embedding-ada-002\",\n",
" input=university_info\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import chromadb\n",
"from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction\n",
"\n",
"embedding_function = OpenAIEmbeddingFunction(api_key=os.environ.get('OPENAI_API_KEY'),\n",
" model_name=\"text-embedding-ada-002\")\n",
"\n",
"chroma_client = chromadb.Client()\n",
"vector_store = chroma_client.get_or_create_collection(name=\"Universities\",\n",
" embedding_function=embedding_function)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Add the university_info to the embedding database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"vector_store.add(\"uni_info\", documents=university_info)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build RAG from scratch\n",
"\n",
"Build a custom RAG from scratch, and add TruLens custom instrumentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trulens_eval import Tru\n",
"from trulens_eval.tru_custom_app import instrument\n",
"tru = Tru()\n",
"tru.reset_database()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RAG_from_scratch:\n",
" @instrument\n",
" def retrieve(self, query: str) -> list:\n",
" \"\"\"\n",
" Retrieve relevant text from vector store.\n",
" \"\"\"\n",
" results = vector_store.query(\n",
" query_texts=query,\n",
" n_results=2\n",
" )\n",
" return results['documents'][0]\n",
"\n",
" @instrument\n",
" def generate_completion(self, query: str, context_str: list) -> str:\n",
" \"\"\"\n",
" Generate answer from context.\n",
" \"\"\"\n",
" completion = oai_client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
" temperature=0,\n",
" messages=\n",
" [\n",
" {\"role\": \"user\",\n",
" \"content\": \n",
" f\"We have provided context information below. \\n\"\n",
" f\"---------------------\\n\"\n",
" f\"{context_str}\"\n",
" f\"\\n---------------------\\n\"\n",
" f\"Given this information, please answer the question: {query}\"\n",
" }\n",
" ]\n",
" ).choices[0].message.content\n",
" return completion\n",
"\n",
" @instrument\n",
" def query(self, query: str) -> str:\n",
" context_str = self.retrieve(query)\n",
" completion = self.generate_completion(query, context_str)\n",
" return completion\n",
"\n",
"rag = RAG_from_scratch()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up feedback functions.\n",
"\n",
"Here we'll use groundedness, answer relevance and context relevance to detect hallucination."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trulens_eval import Feedback, Select\n",
"from trulens_eval.feedback import Groundedness\n",
"from trulens_eval.feedback.provider.openai import OpenAI as fOpenAI\n",
"\n",
"import numpy as np\n",
"\n",
"# Initialize provider class\n",
"fopenai = fOpenAI()\n",
"\n",
"grounded = Groundedness(groundedness_provider=fopenai)\n",
"\n",
"# Define a groundedness feedback function\n",
"f_groundedness = (\n",
" Feedback(grounded.groundedness_measure_with_cot_reasons, name = \"Groundedness\")\n",
" .on(Select.RecordCalls.retrieve.rets.collect())\n",
" .on_output()\n",
" .aggregate(grounded.grounded_statements_aggregator)\n",
")\n",
"\n",
"# Question/answer relevance between overall question and answer.\n",
"f_qa_relevance = (\n",
" Feedback(fopenai.relevance_with_cot_reasons, name = \"Answer Relevance\")\n",
" .on(Select.RecordCalls.retrieve.args.query)\n",
" .on_output()\n",
")\n",
"\n",
"# Question/statement relevance between question and each context chunk.\n",
"f_context_relevance = (\n",
" Feedback(fopenai.qs_relevance_with_cot_reasons, name = \"Context Relevance\")\n",
" .on(Select.RecordCalls.retrieve.args.query)\n",
" .on(Select.RecordCalls.retrieve.rets.collect())\n",
" .aggregate(np.mean)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Construct the app\n",
"Wrap the custom RAG with TruCustomApp, add list of feedbacks for eval"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trulens_eval import TruCustomApp\n",
"from trulens_eval import FeedbackMode\n",
"tru_rag = TruCustomApp(rag, app_id = 'RAG v1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Eval Randomization\n",
"\n",
"Create a function to run feedback functions randomly, depending on the record_id hash"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hashlib\n",
"import random\n",
"\n",
"from typing import Sequence, Iterable\n",
"from trulens_eval.schema import Record, FeedbackResult\n",
"from trulens_eval.feedback import Feedback\n",
"\n",
"def random_run_feedback_functions(\n",
" record: Record,\n",
" feedback_functions: Sequence[Feedback]\n",
" ) -> Iterable[FeedbackResult]:\n",
" \"\"\"\n",
" Given the record, randomly decide to run feedback functions.\n",
"\n",
" args:\n",
" record (Record): The record on which to evaluate the feedback functions\n",
"\n",
" feedback_functions (Sequence[Feedback]): A collection of feedback functions to evaluate.\n",
"\n",
" returns:\n",
" `FeedbackResult`, one for each element of `feedback_functions`, or prints \"Feedback skipped for this record\".\n",
"\n",
" \"\"\"\n",
" # randomly decide to run feedback (50% chance)\n",
" decision = random.choice([True, False])\n",
" # run feedback if decided\n",
" if decision == True:\n",
" print(\"Feedback run for this record\")\n",
" tru.add_feedbacks(tru.run_feedback_functions(record, feedback_functions = [f_context_relevance, f_groundedness, f_qa_relevance]))\n",
" else:\n",
" print(\"Feedback skipped for this record\")"
]
},
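{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a side sketch (not part of the original notebook): the 50% coin flip above generalizes to any sampling rate by mapping the record_id hash onto [0, 1) and comparing it to a threshold. The helper below and its `sample_rate` parameter are hypothetical illustrations, not TruLens APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import hashlib\n",
"\n",
"# Hypothetical helper: deterministic sampling at an arbitrary rate.\n",
"def should_run_feedback(record_id: str, sample_rate: float = 0.5) -> bool:\n",
"    # map the record_id hash onto [0, 1) and compare against the rate\n",
"    bucket = int(hashlib.sha256(record_id.encode()).hexdigest(), 16) % 10_000\n",
"    return (bucket / 10_000) < sample_rate\n",
"\n",
"# e.g. evaluate roughly 20% of records:\n",
"# should_run_feedback(record.record_id, sample_rate=0.2)"
]
},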
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate a test set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trulens_eval.generate_test_set import GenerateTestSet\n",
"test = GenerateTestSet(app_callable = rag.query)\n",
"test_set = test.generate_test_set(test_breadth = 4, test_depth = 1)\n",
"test_set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the app\n",
"Run and log the rag applicaiton for each prompt in the test set. For a random subset of cases, also run evaluations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run feedback across test set\n",
"for category in test_set:\n",
" # run prompts in each category\n",
" test_prompts = test_set[category]\n",
" for test_prompt in test_prompts:\n",
" result, record = tru_rag.with_record(rag.query, \"How many professors are at UW in Seattle?\")\n",
" # random run feedback based on record_id\n",
" random_run_feedback_functions(record, feedback_functions = [f_context_relevance, f_groundedness, f_qa_relevance])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tru.get_leaderboard(app_ids=[\"RAG v1\"])"
]
},
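{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally (a sketch added for illustration), inspect which records actually received feedback using `Tru.get_records_and_feedback`; with ~50% sampling, roughly half of the feedback values should be populated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve logged records along with the names of feedback columns;\n",
"# rows with missing feedback values are the records that were skipped.\n",
"records_df, feedback_cols = tru.get_records_and_feedback(app_ids=[\"RAG v1\"])\n",
"records_df[feedback_cols].notna().mean()  # fraction of records evaluated, per feedback"
]
},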
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tru.run_dashboard()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "trulens18_release",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}