[DRAFT] generators debugging #875

Draft
wants to merge 454 commits into
base: main

Conversation

piotrm0
Contributor

piotrm0 commented Feb 8, 2024

Debugging generator serialization issues responsible for at least two reported Issues.

joshreini1 and others added 30 commits October 6, 2023 16:39
* version bump combine nb to docs

* hotfix
* first try

* starting feedback imp tests

* working on feedback tests

* nits

* more tests

* more adjustments

* disable unit tests for now

* added in-domain tests variants to run for now
* if groundedness output is not list, set as list so agg functions properly

* fix 0 resolving to null then -1 bug

* fix groundedness measure error, move warning from init to only renamed method
* Add groundedness to Pinecone notebook

* Fix definition per suggestion

---------

Co-authored-by: Josh Reini <[email protected]>
* add langchain multi-retrieval agents + chroma vector mgr example

* basic one feedback function working e2e

* fix deps version

* add response length custom feedback func

* update notebook with markdown + more feedback functions + deferred mode

add markdown and colab widget

remove ckp
* Update dependencies for Pinecone example

* Format notebook (isort & yapf)
* split off work on threading issues

* work on dummy example

* prototyping various thread robustness solutions

* work

* working on threading and feedback results

* more ignores

* nevermind that last gitignore addition

* remove unneeded

* added feedback result retrieval into langchain quickstart

* don't use submit inside feedback functions

* what the last thing said
* updating JSONPath, renamed to Lens

* added storage of paths as strings instead of json structures

* python parsing variations

* more parsing variants for python 3.11

* makefile

* version bounds format

---------

Co-authored-by: Josh Reini <[email protected]>
* handle no pii

* to do on error handling

* logger.debug no pii found
* fix link, add trucustom

* finish update

* Update basic_instrumentation.ipynb

remove duplicated text
* fix

* fixes

* component fixes

* make backwards compatible

* remove print message

* fix another old __call__ usage meaning "get"

* remove dist from gitignore

* include dist

* typing fix
* creating quickstart notebook for appui

* fixing some runner bugs and adding info to quickstart

* add screenshots to repo

* clear output

* unneeded try
* version bump to 0.16.0

* combine nb to docs
* dedup

* delete dups

* another try

* make one assets and images folder instead of two

---------

Co-authored-by: Josh Reini <[email protected]>
* Update use_cases_production.md

update wrong descriptions for azure/aws

* switch to new deferred example
* Save working code for feedback direction

* Use supplied_name as key if any for direction lookup

* Fix direction lookup for cell style

* Use OpenAI moderation category score as is (lower score is better)

* Update OpenAI moderation API test cases
* added SummEval test generator

groundedness smoke test for huggingface NLI model and open ai

remove extra files

clean up

clean up

* addressed pr comments

* removed sqlite

* rename f_groundtruth to f_mae

* link docs

* fixed local path

* added gpt-3.5-turbo vs gpt-4

* numeric difference to mae

* to mae, groundedness notebook

* clear noisy cell output

* allow messages and prompt args

* answer relevance smoke tests updated

* remove cot versions

* re-run context relevance with mae

* update function definitions docs with link, fix broken link

* fix symlink

* fix function definitions links

* remove cot definitions

* small nits to md

---------

Co-authored-by: Josh Reini <[email protected]>
* Improve cot reasoning

Add prompting to influence llm to tie reasons back to the evaluation being performed

* Update prompts.py

Add criteria and tie it back language to cot template

* update docstring to include reasons template

* undo

* Update prompts.py

revert template back to supporting evidence

* reason filtering only to supporting evidence

* revert docstring

* small change

* update extract score/reasons

* remove unneeded line. extract_score/reason not used for groundedness

* add higher is better, fix title for moderation notebook
piotrm0 and others added 26 commits January 31, 2024 21:29
* update azure example, also show provider extension

* update trulens version

* remove force dashboard

* add score-only example

* custom feedback docs

* refactor for user-facing generate_score and generate_score_and_reasons methods

* update azure example with more user-friendly methods

* update bedrock with more user-friendly methods

* user friendly methods for provider extension

* types in docstring
* first

* CI update

* global import static test

* disable py312 test

* add nltk optional fix

* remove unrelated

* add format and python bound

* nit

* adjust matrix cell name for ordering

* spec

* disable tests when optional package not installed

* adjust ci script

* syntax issues

* typo

* test

* remove more

* newline

* name first

* comment

* nit

* matrix no sequences

* remove duplicate matrix cell

* displayname again

* unexpected symbol

* hmm

* fix test import

* optional tests marking

* imports tests

* few more ellipses

* remove non-existent import

* expected error

* fixing optionals

* message on subtest

* add subtests requirement

* fix discovered import bugs

* more fixes

* adjust ipython requirement

* downgrade bound even more for python 3.9

* more fixes

* ipython fix

* remove ipykernel installation in unit tests

* change pr job name

* name

* more informative tests name

* rename optional var and re-enable format condition

* cond try fix

* nit

* again

---------

Co-authored-by: Josh Reini <[email protected]>
#839)

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 4.4.2 to 4.5.2.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/v4.5.2/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v4.5.2/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Josh Reini <[email protected]>
* make more tests pass

* small changes

* more tests working

* skip moderation

* test for multiple models

* more passing

* unused e removed

* fix typing issues

* more cot tests

* incorrect prompt

* more cot reasons tests

* stereotypes more extreme

* improve stereotyping prompt

* typo

* unittest only gpt-3.5-turbo

* add missing import

* mark calibration as optional test

* fix typo

* move oai import to top

* oai imports for all tests
* debug why static tests did not fail

* check more base modules

* update module hierarchy doc and delete another deprecated module

* trubot is optional

* don't try to run a script in static tests

* Don't try to import migrations env

* typo

* nit

* cleanup and assertion failure messages

* pinecone optional message

* update pinecone usage

* note
* prototype

* update md

* fix run_feedback

* update

* update

* update
@piotrm0 piotrm0 marked this pull request as draft February 8, 2024 02:17