-
Notifications
You must be signed in to change notification settings - Fork 872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Opengpts #361
base: main
Are you sure you want to change the base?
Upgrade Opengpts #361
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing two issues:
- (MAJOR) w/ migration
- (minor) not 100% sure but UI doesn't seem to load the full screen for creating a new bot --- requiring clicking through on a tab. This could be something associated w/ data returned to the UI through one of the endpoints
@@ -265,7 +264,7 @@ class ConfigurableRetrieval(RunnableBinding): | |||
llm_type: LLMType | |||
system_message: str = DEFAULT_SYSTEM_MESSAGE | |||
assistant_id: Optional[str] = None | |||
thread_id: Optional[str] = None | |||
thread_id: Optional[str] = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a None default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a lot of time debugging this part. The error indicates a conflict in the configuration specifications for thread_id
.
The error occurs during validation in the following code:
@router.get("/config_schema")
async def config_schema() -> dict:
"""Return the config schema of the runnable."""
return agent.config_schema().model_json_schema()
The issue seems to arise because there are two conflicting ConfigurableFieldSpec definitions for thread_id:
1. Definition 1: ConfigurableFieldSpec with annotation=typing.Optional[str] and default=None.
2. Definition 2: ConfigurableFieldSpec with annotation=<class 'str'> and default=''.
So, I decided to set the default to '', and it works. However, I would prefer to keep it as None. Do you know what might be causing the problem? The assistant_id
is similar, but I don’t encounter this issue with it.
@@ -135,7 +132,7 @@ class ConfigurableAgent(RunnableBinding): | |||
retrieval_description: str = RETRIEVAL_DESCRIPTION | |||
interrupt_before_action: bool = False | |||
assistant_id: Optional[str] = None | |||
thread_id: Optional[str] = None | |||
thread_id: Optional[str] = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not a None?
ALTER TABLE checkpoints | ||
ADD COLUMN IF NOT EXISTS thread_ts TIMESTAMPTZ, | ||
ADD COLUMN IF NOT EXISTS parent_ts TIMESTAMPTZ; | ||
-- Drop existing checkpoints-related tables if they exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this migration fails for anyone that's run migrations 1 through 4 already. the migration state is kept of in the database.
Should this run as step 5 so it'll run at the end?
So if you try to run opengpts with the previous version, and then apply PR on top and run things -- the migrations will not work.
W/ current approach it seems like any old threads are no longer usable from the app. (I'm assuming not super easy to recover b/c of the pickle serde that was used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right! I think the same thing happens in LangGraph when people decide to use the new checkpointer, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new checkpointers can be versioned as far as I understand
So at least going forward there's a way to carry out schema migrations automatically.
But yeah going from the pickle checkpointer -> new checkpointer was a breaking change. I'm OK if we don't worry about this, don't think this affects that many users.
I'd just prefer if we didn't wipe out any potential sql tables that users may want to recover data from
-- Drop existing checkpoints-related tables if they exist | ||
ALTER TABLE IF EXISTS checkpoints RENAME TO old_checkpoints; | ||
DROP TABLE IF EXISTS checkpoint_writes; | ||
DROP TABLE IF EXISTS checkpoint_blobs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is dropping any of these checkpoints resulting in dataloss? If so should we back them up as well?
if isinstance(value, list) and all(isinstance(v, BaseMessage) for v in value): | ||
loaded["channel_values"][key] = [v.__class__(**v.__dict__) for v in value] | ||
return loaded | ||
class AsyncPostgresCheckpoint(BasePostgresSaver): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered initializing on app start up and calling .setup()
to set up the migration, and then avoiding doing the wrapping of the checkpointer?
It'll help keep the checkpoints in sync and remove some extra code here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Eugene,
Checkpoint:
I’ve run into a few challenges while implementing langgraph’s checkpoint:
-
Global Checkpointer Initialization:
I defined the checkpointer in the lifespan and declare it as a global inagent.py
. However, I encountered the error “Checkpointer not initialized” because the global instance wasn’t properly initialized before being accessed.
This issue occurs because the checkpointer depends on the application startup completing before it can be used. -
Singleton Pattern Issues:
I tried using a singleton pattern forAsyncPostgresSaver
to manage the global instance, initializing it during the lifespan. However, the initialization ofAsyncPostgresSaver
requires an async event loop, which isn’t always available—such as during testing—resulting in the error: “no running event loop.” -
Current Implementation:
I implemented a solution inspired by the current approach in OpenGPTs, adapted to use the new checkpointer in LangGraph:
- Singleton with Lazy Initialization: Created a
BasePostgresSaver
class with a singleton pattern that assigns the instance before initialization. - Async Setup Method: Moved the connection pool creation to an async
setup()
method, ensuring it initializes during the lifespan of the application when an asynchronous loop is available.
I’m open to trying a different approach if you have any suggestions or recommendations!
Migration:
Finally, I decided to run the migrations before using the app to stay consistent with OpenGPTs and ensure the queries are ready. I am using the same .sql.
What do you think?
f"{os.environ['POSTGRES_DB']}" | ||
) | ||
|
||
conn = AsyncConnectionPool( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/usr/local/lib/python3.11/site-packages/psycopg_pool/pool_async.py:142: RuntimeWarning: opening the async pool AsyncConnectionPool in the constructor is deprecated and will not be supported anymore in a future release. Please use `await pool.open()`, or use the pool as context manager using: `async with AsyncConnectionPool(...) as pool:
Hi Team,
As a member of the Pampa Team, I’ve been working on this PR to upgrade OpenGPTs to the latest version of Langchain dependencies. This update ensures compatibility with Pydantic 2 and resolves issues with related packages.
Code changes
Migration to Pydantic 2
Langchain Dependency Upgrades
langchain
dependencies to their latest versions (likelangchain
,langchain-core
,langgraph
, etc).langchain-robocorp
as it is currently incompatible with Pydantic 2.unstructured
dependency to resolve issues related tonltk
and its associated packages, such aspunkt
.Code Adaptations
AsyncPostgresSaver
from thelanggraph
implementation for improved compatibility and performance.Updated schemas to work with Pydantic 2's BaseModel.
Fixed bugs using GPT-4o.
Testing
Looking forward to your feedback
Thank you Team!