Upgrade Opengpts #361

Open · wants to merge 12 commits into main
Conversation

@lgesuellip commented Oct 24, 2024

Hi Team,

As a member of the Pampa Team, I’ve been working on this PR to upgrade OpenGPTs to the latest LangChain dependencies. This update ensures compatibility with Pydantic 2 and resolves issues with related packages.

Code changes

  • Migration to Pydantic 2

    • Migrated the codebase to use Pydantic 2 (a brief illustrative sketch of the typical changes follows this list).
  • LangChain Dependency Upgrades

    • Updated all LangChain dependencies to their latest versions (langchain, langchain-core, langgraph, etc.).
    • Removed langchain-robocorp, as it is currently incompatible with Pydantic 2.
    • Updated the unstructured dependency to resolve issues related to nltk and its associated packages, such as punkt.
  • Code Adaptations

    • Refactored the checkpoint logic to use AsyncPostgresSaver from the langgraph implementation for improved compatibility and performance.
    • Updated schemas to work with Pydantic 2's BaseModel.
    • Fixed bugs using GPT-4o.
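
For illustration only (this code is not from the PR), the kind of changes a Pydantic 1 → 2 migration involves:

# Hypothetical example, not taken from this PR: Pydantic 2 renames the
# validator decorator and the serialization/schema methods.
from pydantic import BaseModel, field_validator

class Assistant(BaseModel):
    name: str
    public: bool = False

    # Pydantic 1 used @validator("name"); Pydantic 2 uses @field_validator.
    @field_validator("name")
    @classmethod
    def name_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("name must not be empty")
        return v

a = Assistant(name="my bot")
print(a.model_dump())                  # Pydantic 1: a.dict()
print(Assistant.model_json_schema())   # Pydantic 1: Assistant.schema()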

Testing

  • Updated test cases to ensure compatibility with the new codebase.
  • Refactored and adapted tests to validate changes in schemas, checkpoint handling, and dependencies.

Looking forward to your feedback. Thank you, team!

backend/app/tools.py (review thread resolved, outdated)
@lgesuellip (Author):

Hey team @eyurtsev @nfcampos, I just pushed all the changes.
I look forward to your review and feedback.
Thank you!

@eyurtsev (Contributor) left a comment:

Seeing two issues:

  1. (MAJOR) with the migration
  2. (minor) Not 100% sure, but the UI doesn't seem to load the full screen for creating a new bot, requiring clicking through on a tab. This could be something associated with data returned to the UI through one of the endpoints.

backend/app/tools.py (review thread resolved)
@@ -265,7 +264,7 @@ class ConfigurableRetrieval(RunnableBinding):
     llm_type: LLMType
     system_message: str = DEFAULT_SYSTEM_MESSAGE
     assistant_id: Optional[str] = None
-    thread_id: Optional[str] = None
+    thread_id: Optional[str] = ""
Contributor:
Why is this not a None default?

@lgesuellip (Author) commented Dec 19, 2024:

I spent a lot of time debugging this part. The error indicates a conflict in the configuration specifications for thread_id.

The error occurs during validation in the following code:

@router.get("/config_schema")
async def config_schema() -> dict:
    """Return the config schema of the runnable."""
    return agent.config_schema().model_json_schema()

The issue seems to arise because there are two conflicting ConfigurableFieldSpec definitions for thread_id:
1. Definition 1: ConfigurableFieldSpec with annotation=typing.Optional[str] and default=None.
2. Definition 2: ConfigurableFieldSpec with annotation=<class 'str'> and default=''.

So, I decided to set the default to '', and it works. However, I would prefer to keep it as None. Do you know what might be causing the problem? The assistant_id is similar, but I don’t encounter this issue with it.
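
To make the conflict concrete, here is a minimal sketch (not the repo's code) of the merge that fails; that the two specs come from the agent model and the checkpointer respectively is my inference, not something confirmed in this thread:

# Illustrative reproduction of the conflict, assuming langchain_core's
# config-spec merging utilities; the spec sources are an assumption.
from typing import Optional

from langchain_core.runnables.utils import (
    ConfigurableFieldSpec,
    get_unique_config_specs,
)

spec_from_agent = ConfigurableFieldSpec(
    id="thread_id", annotation=Optional[str], default=None
)
spec_from_checkpointer = ConfigurableFieldSpec(
    id="thread_id", annotation=str, default=""
)

# Two specs with the same id but different annotation/default cannot be
# merged; this raises a ValueError, which is what config_schema() surfaces.
get_unique_config_specs([spec_from_agent, spec_from_checkpointer])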

@@ -135,7 +132,7 @@ class ConfigurableAgent(RunnableBinding):
     retrieval_description: str = RETRIEVAL_DESCRIPTION
     interrupt_before_action: bool = False
     assistant_id: Optional[str] = None
-    thread_id: Optional[str] = None
+    thread_id: Optional[str] = ""
Contributor:
Why is this not None?

ALTER TABLE checkpoints
ADD COLUMN IF NOT EXISTS thread_ts TIMESTAMPTZ,
ADD COLUMN IF NOT EXISTS parent_ts TIMESTAMPTZ;
-- Drop existing checkpoints-related tables if they exist
Contributor:

I believe this migration fails for anyone who has already run migrations 1 through 4; the migration state is kept in the database.

Should this run as step 5 so it'll run at the end?

As it stands, if you run OpenGPTs with the previous version and then apply this PR on top, the migrations will not work.

With the current approach it seems like any old threads are no longer usable from the app. (I'm assuming they're not super easy to recover because of the pickle serde that was used.)

@lgesuellip (Author):
You’re right! I think the same thing happens in LangGraph when people decide to use the new checkpointer, right?

Contributor:

The new checkpointers can be versioned, as far as I understand:

https://github.com/langchain-ai/langgraph/blob/main/libs/checkpoint-postgres/langgraph/checkpoint/postgres/base.py#L27

So at least going forward there's a way to carry out schema migrations automatically.

But yeah, going from the pickle checkpointer to the new checkpointer was a breaking change. I'm OK if we don't worry about this; I don't think it affects that many users.

I'd just prefer that we didn't wipe out any SQL tables that users may want to recover data from.
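
The idea in the linked file, sketched very roughly (my simplification, not the library's actual code; the table layout and DDL here are placeholders):

# Simplified sketch of versioned-migration bookkeeping: record the highest
# applied version in a table and only apply newer migrations on setup.
import psycopg

MIGRATIONS: list[str] = [
    "CREATE TABLE IF NOT EXISTS checkpoint_migrations (v INTEGER PRIMARY KEY);",
    "CREATE TABLE IF NOT EXISTS checkpoints (thread_id TEXT PRIMARY KEY);",  # placeholder DDL
]

async def run_migrations(conn: psycopg.AsyncConnection) -> None:
    # Version 0 bootstraps the bookkeeping table itself (idempotent).
    await conn.execute(MIGRATIONS[0])
    cur = await conn.execute(
        "SELECT v FROM checkpoint_migrations ORDER BY v DESC LIMIT 1"
    )
    row = await cur.fetchone()
    current = row[0] if row else -1
    # Applying only migrations newer than the recorded version lets the
    # schema evolve without dropping existing tables.
    for version, statement in enumerate(MIGRATIONS):
        if version <= current:
            continue
        await conn.execute(statement)
        await conn.execute(
            "INSERT INTO checkpoint_migrations (v) VALUES (%s)", (version,)
        )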

-- Drop existing checkpoints-related tables if they exist
ALTER TABLE IF EXISTS checkpoints RENAME TO old_checkpoints;
DROP TABLE IF EXISTS checkpoint_writes;
DROP TABLE IF EXISTS checkpoint_blobs;
Contributor:
Is dropping any of these checkpoint tables going to result in data loss? If so, should we back them up as well?

backend/app/schema.py (review thread resolved)
if isinstance(value, list) and all(isinstance(v, BaseMessage) for v in value):
    loaded["channel_values"][key] = [v.__class__(**v.__dict__) for v in value]
return loaded

class AsyncPostgresCheckpoint(BasePostgresSaver):
Contributor:

Have you considered initializing the checkpointer on app startup and calling .setup() to run the migration, and then avoiding the wrapping of the checkpointer?

It'll help keep the checkpoints in sync and remove some extra code here.
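
For concreteness, a sketch of that suggestion (my illustration, not code from this PR; it assumes FastAPI's lifespan hook and langgraph's AsyncPostgresSaver, and the DSN and connection kwargs are placeholders):

# Create the saver at startup and let .setup() run langgraph's own
# (versioned) checkpoint migrations.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from psycopg_pool import AsyncConnectionPool

DB_URI = "postgresql://user:pass@localhost:5432/opengpts"  # placeholder DSN

@asynccontextmanager
async def lifespan(app: FastAPI):
    # open=False avoids the deprecated open-in-constructor behavior; the
    # context manager opens the pool on enter and closes it on exit.
    async with AsyncConnectionPool(
        DB_URI, open=False, kwargs={"autocommit": True, "prepare_threshold": 0}
    ) as pool:
        checkpointer = AsyncPostgresSaver(pool)
        await checkpointer.setup()  # creates/migrates the checkpoint tables
        app.state.checkpointer = checkpointer
        yield

app = FastAPI(lifespan=lifespan)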

@lgesuellip (Author):

Hey Eugene,

Checkpoint:

I’ve run into a few challenges while implementing langgraph’s checkpoint:

  1. Global Checkpointer Initialization:
    I defined the checkpointer in the lifespan and declared it as a global in agent.py. However, I encountered a "Checkpointer not initialized" error because the global instance wasn't initialized before being accessed. This happens because the checkpointer depends on application startup completing before it can be used.

  2. Singleton Pattern Issues:
    I tried using a singleton pattern for AsyncPostgresSaver to manage the global instance, initializing it during the lifespan. However, the initialization of AsyncPostgresSaver requires an async event loop, which isn’t always available—such as during testing—resulting in the error: “no running event loop.”

  3. Current Implementation:
    I implemented a solution inspired by the current approach in OpenGPTs, adapted to use the new checkpointer in LangGraph:

  • Singleton with Lazy Initialization: Created a BasePostgresSaver class with a singleton pattern that assigns the instance before initialization.
  • Async Setup Method: Moved the connection pool creation to an async setup() method, ensuring it initializes during the lifespan of the application when an asynchronous loop is available.

I’m open to trying a different approach if you have any suggestions or recommendations!
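
A rough sketch of the lazy-initialization pattern described in point 3 (my illustration, not the PR's exact code):

# Singleton whose connection pool is created only inside an async setup(),
# so importing the module never requires a running event loop.
from typing import Optional

from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from psycopg_pool import AsyncConnectionPool

class LazyCheckpointer:
    _instance: Optional["LazyCheckpointer"] = None
    saver: Optional[AsyncPostgresSaver] = None

    def __new__(cls) -> "LazyCheckpointer":
        # The instance exists immediately (so modules can import it), but it
        # is unusable until setup() runs inside the app's lifespan.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    async def setup(self, db_uri: str) -> None:
        if self.saver is None:
            pool = AsyncConnectionPool(db_uri, open=False)
            await pool.open()  # requires a running event loop
            self.saver = AsyncPostgresSaver(pool)
            await self.saver.setup()

checkpointer = LazyCheckpointer()  # safe at import time, even under tests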

Migration:

Finally, I decided to run the migrations before using the app, to stay consistent with OpenGPTs and to ensure the queries are ready. I am using the same .sql files.

What do you think?

f"{os.environ['POSTGRES_DB']}"
)

conn = AsyncConnectionPool(
@kabylkassymov commented Dec 24, 2024:

/usr/local/lib/python3.11/site-packages/psycopg_pool/pool_async.py:142: RuntimeWarning: opening the async pool AsyncConnectionPool in the constructor is deprecated and will not be supported anymore in a future release. Please use `await pool.open()`, or use the pool as context manager using: `async with AsyncConnectionPool(...) as pool:`
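
For reference, a minimal sketch of the two patterns the warning recommends (illustrative; the DSN is a placeholder):

# Both options avoid the deprecated open-in-constructor behavior.
import asyncio

from psycopg_pool import AsyncConnectionPool

CONNINFO = "postgresql://user:pass@localhost:5432/opengpts"  # placeholder DSN

async def main() -> None:
    # Option 1: construct closed, then open explicitly.
    pool = AsyncConnectionPool(CONNINFO, open=False)
    await pool.open()
    await pool.close()

    # Option 2: use the pool as an async context manager; it opens on enter
    # and closes on exit.
    async with AsyncConnectionPool(CONNINFO, open=False) as pool:
        pass

asyncio.run(main())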
