Refined prompt construction for feedback #1058

Draft
wants to merge 529 commits into
base: main
Conversation

piotrm0 (Contributor) commented Apr 8, 2024

Items to add to release announcement:

  • Heading: delete this list if this PR does not introduce any changes that need announcing.

Other details that are good to know but need not be announced:

  • There should be something here at least.

Work in progress. Designing feedback prompts from several common parts, allowing different prompt sizes depending on the allowable space:

import sys
from typing import Dict, List, Optional, Tuple

# SerialModel is TruLens's pydantic-based serialization base class.
from trulens_eval.utils.serial import SerialModel


class ScoringPromptBase(SerialModel):
    """Common parts to build a scoring prompt out of."""

    prefix_template: str = """"""
    """Text to include before all other parts."""

    interp_template: str = """You are a {purpose} scorer."""
    details_template: str = """"""

    low_score: int = 1
    high_score: int = 10

    shots_template: str = """EXAMPLES:"""
    shot_template: str = """
INPUTS:
{shot_inputs}
EXPECTED OUTPUT: {shot_output}"""

    suffix_template: str = """
Answer only with an integer from {low_score} ({low_interp}) to {high_score} ({high_interp}).

INPUTS:
{inputs}
SCORE: """
    """Text to include after all other parts."""


class ScoringPrompt(ScoringPromptBase):
    """Specific parts to build a scoring prompt out of."""

    interp: str
    """Minimal interpretation of the score."""

    low_interp: str
    """Interpretation of a low score."""

    high_interp: str
    """Interpretation of a high score."""

    details: str
    """Text to include after the purpose to provide some more details about
    the purpose."""

    shots: List[Tuple[Dict[str, str], int]] = []

    def build(
        self,
        max_tokens: int = sys.maxsize,
        inputs: Optional[Dict[str, str]] = None
    ) -> str:
        """Build a prompt for the given inputs while staying under the token
        limit.

        The built prompt will have at least:
            - filled prefix_template,
            - filled interp_template,
            - filled suffix_template.

        If space allows, it will also include:
            - filled details_template,
            - filled shots_template (if at least one shot is included),
            - filled shot_template for each shot.

        Args:
            max_tokens: The maximum number of tokens to use. Defaults to
                sys.maxsize (effectively infinite).

            inputs: The inputs to fill in the prompts with, other than the
                ones defining the scoring task; those are contained in self.

        Returns:
            str: The built prompt.

        Raises:
            ValueError: If the prompt would be too long to fit within the
                token limit even at its minimum size.
        """

piotrm0 and others added 30 commits November 28, 2023 15:34
* first

* working version, maybe

* note

* fixes

* note

* nit
* remove extra reset cell

* fix langchain prompt import, quickstart

* async fix imports, update install pins

* fix langchain prompttemplate imports

* ada embeddings in quickstart, pinned install versions

* pin package versions

* clear output
* fix quickstart imports

* fix langchain trulens imports
Co-authored-by: Josh Reini <[email protected]>
* move model comparison to use cases (expected location)

* multimodal example with trullama

* remove extra commit
Co-authored-by: Shayak Sen <[email protected]>
Co-authored-by: Josh Reini <[email protected]>
* version bump quickstarts

* version bump py quickstarts

* version bump all_tools

* format quickstarts

* version bump init

* update package one-liner
* added wrapper for dynamically generated functions in boto3

* docs and remove debug prints

* typo

* remove unused

* testing out streaming counting
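The "wrapper for dynamically generated functions in boto3" mentioned above hints at a general problem: boto3 client methods are generated at runtime rather than defined on a class, so an instrumenting wrapper has to intercept attribute lookup and wrap whatever callable comes back. A generic sketch of that pattern (no boto3 required; all names here are illustrative):

```python
import functools
from typing import Any, List


class CallRecorder:
    """Proxy that wraps every callable attribute of a target object,
    recording method names as calls happen. Works even when the target's
    methods are generated dynamically (as boto3 client methods are)."""

    def __init__(self, target: Any) -> None:
        self._target = target
        self.calls: List[str] = []

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not found on the proxy itself, so it
        # covers dynamically generated methods on the wrapped target.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            self.calls.append(name)  # record the call, then delegate
            return attr(*args, **kwargs)

        return wrapped
```

The same lookup-time wrapping would apply to a real `boto3.client(...)` object in place of the plain target used here.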
not_toxic -> toxic, fix docstring. code itself is correct.
* add generator for answer relevance using SummEval

save wip work

test (#627)

wip

* remove sqlite

* groundedness eval across 100 examples

* rm

* Update groundedness_smoke_tests.ipynb

fix typo

* typo

---------

Co-authored-by: Josh Reini <[email protected]>
* add langchain prompt template

* add langchain template to _langchain_evaluate

* pass through criteria, use standard cot reasons template
* change assertion from dict to object

* get model, usage as attr not from dict
daniel-huang-1230 and others added 26 commits March 22, 2024 21:57
* fix: italise TruLens-Eval ref

* fix: italise TruLens-Eval ref in root scripts.

* docs: add contribution instructions for proper names with mod to inverted commas.

* Update standards.md

Markdown lint prefers _ to * for emphasis.

---------

Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: Piotr Mardziel <[email protected]>
* working on glossary

* finish glossary draft

* nits

* Add some info regarding makefiles.

---------

Co-authored-by: Josh Reini <[email protected]>
* more pipelines docs

* adjust trigger for release tests

* one more time

* one more time

* again

* one more

* one more try

* nit

* add a docs pipeline

---------

Co-authored-by: Josh Reini <[email protected]>
* Add if_missing.

* add new enum to docs feedbacks page

* make re_0_10 rating a bit more robust

* adjust rating extraction test

* check for integers only and remove unneeded imports
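The "more robust" integer-only rating extraction described in these commits might look something like this sketch (hypothetical; the actual TruLens `re_0_10` parser may differ in name and behavior):

```python
import re
from typing import Optional


def extract_rating(response: str, low: int = 0, high: int = 10) -> Optional[int]:
    """Extract an integer score in [low, high] from an LLM response.

    Scans for standalone integers and returns the first one that falls
    inside the expected range; returns None if no valid score is found.
    """
    for match in re.findall(r"-?\d+", response):
        value = int(match)
        if low <= value <= high:
            return value
    return None
```

Restricting matches to integers within the declared range rejects stray numbers (dates, percentages, out-of-range values) that a naive number search would pick up.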
* fix image

* feedback_function index updates

* implementation and provider docs

* feedback implementations llm-based

* classification implementations

* feedback base provider docstrings

* formatting of numbered lists

* more example admonitions

* tru custom app docs

* instrumentation api docs

* virtual app api ref

* add missing title
* fix some proper names

* nits

* too many, giving up

* remove _ from mkdocs

* llama indexes

---------

Co-authored-by: Josh Reini <[email protected]>
* Spell fix

* Added user feedback button to the sidebar

* Updated share feedback text
* pin packaging

* remove packaging, remove base langchain

* remove langchain requirement

* update comment

* move nltk to required

* nltk required, download punkt on init

* add packaging requirement

* move punkt download

* bump langchain version

* pin packaging 23.2

* logger debug for optional packages
* Fix import and favicon

* Update requirements.txt

---------

Co-authored-by: Josh Reini <[email protected]>
* removed pkg_resources

* add reqs

* remove duplicate

* preserve note from duplicate

* format

* fix for py3.8

* format

* nit

* remove distutils as well and add notes

* notes

* nits

* fix static_resource for py38 again
…val utils, and docs update (#991)

* implement recommendation metrics for benchmark framework

ece fix

Revert "ece fix"

This reverts commit c58ee7e.

run actual evals

add context relevance inference api to hugs ffs

fmt

larger dataset + smarter backoff + recall

nb update (wip)

fix how we handle ties in precision and recall

saving results for GPT-3.5, GPT-4, Claude-1, and Claude-2

remove secrets
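One common way to "fix how we handle ties in precision and recall" is to include every item whose score ties with the k-th ranked score, so the metrics do not depend on an arbitrary ordering among equal scores. A hedged sketch of that approach (function and argument names are illustrative, not the benchmark framework's actual API):

```python
from typing import List, Tuple


def precision_recall_at_k(
    scored: List[Tuple[float, bool]],  # (score, is_relevant) pairs
    k: int,
) -> Tuple[float, float]:
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    cutoff = ranked[k - 1][0]
    # Tie-aware selection: take every item scoring at or above the k-th
    # score, so equal scores are treated identically regardless of order.
    selected = [rel for score, rel in ranked if score >= cutoff]
    n_relevant = sum(rel for _, rel in scored)
    tp = sum(selected)
    precision = tp / len(selected)
    recall = tp / n_relevant if n_relevant else 0.0
    return precision, recall
```

With ties at the cutoff, the selected set may be larger than k; that is the point, since any choice of exactly k items among tied scores would be arbitrary.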

* finished evals with truera context relevance model

* add Verb 2S top 1 prompt

* update ECE method

pushed to server
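For context on the ECE update: Expected Calibration Error is commonly computed by binning predictions by confidence and taking a weighted average of the per-bin gap between accuracy and mean confidence. A minimal sketch under that standard definition (the PR's updated method may differ):

```python
from typing import List, Tuple


def expected_calibration_error(
    preds: List[Tuple[float, bool]],  # (confidence, was_correct) pairs
    n_bins: int = 10,
) -> float:
    # Partition predictions into equal-width confidence bins.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    # Weighted average of |accuracy - mean confidence| across bins.
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(c for _, c in b) / len(b)
        avg_conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / len(preds) * abs(acc - avg_conf)
    return ece
```

A perfectly calibrated scorer (confidence matches empirical accuracy in every bin) yields an ECE of 0.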

* save csv results for tmp scaling

* save

* implement meeting bank generator

* example notebook for comprehensiveness benchmark WIP

* gainsight benchmarking done

remove secrets

* prepping comprehensiveness benchmark notebook

* remove unused test script

* moving results csvs

* updates models

* intermediate results code change

* good stopping point

* cleanup

* symlink docs

* huge doc updates

* fix doc symlink

* fix score range in docstring

* add docstring for truera's context relevance model

* update comprehensiveness notebook

* update comprehensiveness notebook

* fix

* file renames

* new symlinks

* update mkdocs

---------

Co-authored-by: Josh Reini <[email protected]>
Co-authored-by: Josh Reini <[email protected]>
* atlas quickstart

* header updates
* first

* assistants api (rag) quickstart

* fix indent
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 8, 2024
@piotrm0 piotrm0 marked this pull request as draft April 8, 2024 05:59