Integration of MongoDB for feedbacks - Revised #107

Open, wants to merge 8 commits into base: master

Kannav02
Collaborator

This PR addresses a small part of issue #75.

The objectives of this PR are:

  • Updating the requirements to include PyMongo
  • Integrating MongoDB as the database where the feedback is stored
  • Referencing contexts by their IDs from the main table, which is the feedback table

To view or test these changes, follow these steps:

  • Run the frontend with the mock server
  • Set MONGO_DB_URI to the MongoDB instance that should receive the feedback and the context
  • Enter a prompt
  • Submit feedback
  • Open the MongoDB instance; the data should be there, including the submission timestamps
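
For reference, the insertion step could look roughly like the sketch below. It is only an illustration, not the exact code in this PR: the document fields, the `feedback_db` and `feedback` names, and the `submit_feedback` signature shown here are assumptions. The collection is passed in as an argument, so anything with an `insert_one` method works (with PyMongo, that would be a `Collection`).

```python
import os
from datetime import datetime, timezone


def build_feedback_document(question, response, sources, context, issue):
    """Assemble one feedback record. Field names are illustrative assumptions."""
    return {
        "question": question,
        "response": response,
        "sources": list(sources),
        "context": list(context),
        "issue": issue,
        # Timezone-aware timestamp of the submission.
        "timestamp": datetime.now(timezone.utc),
    }


def submit_feedback(collection, **fields):
    """Insert one feedback document and return the id reported by the collection.

    `collection` is any object with an `insert_one` method, for example a
    PyMongo Collection.
    """
    document = build_feedback_document(**fields)
    result = collection.insert_one(document)
    return result.inserted_id


def get_collection():
    """Connect using MONGO_DB_URI. Requires pymongo to be installed."""
    # Imported lazily so the helpers above can be used without pymongo.
    from pymongo import MongoClient

    client = MongoClient(os.environ["MONGO_DB_URI"])
    # Hypothetical database and collection names.
    return client["feedback_db"]["feedback"]
```

Usage would be something like `submit_feedback(get_collection(), question=..., response=..., sources=..., context=..., issue=...)`.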

This is what I got when I ran this twice

[Screenshots: 2024-11-25 at 9:49:49 PM and 9:50:04 PM]

Follow-up question: is there a limited number of contexts in this application right now? If so, I could optimize the database insertions so that pre-existing contexts are not inserted again; instead I would reference them and insert their IDs into the main table.

Thank you!

@Kannav02
Collaborator Author

@luarss, I believe there is still a problem with the CI pipeline. What should we do about it?

@Kannav02 Kannav02 requested a review from luarss November 27, 2024 01:55
@luarss
Collaborator

luarss commented Nov 27, 2024

Follow-up question: is there a limited number of contexts in this application right now? If so, I could optimize the database insertions so that pre-existing contexts are not inserted again; instead I would reference them and insert their IDs into the main table.

Do you mean to reference them as context IDs? Where would we see the mapping between context ID and their contents?

frontend/Dockerfile (review comment, outdated and resolved)
@Kannav02
Collaborator Author

Follow-up question: is there a limited number of contexts in this application right now? If so, I could optimize the database insertions so that pre-existing contexts are not inserted again; instead I would reference them and insert their IDs into the main table.

Do you mean to reference them as context IDs? Where would we see the mapping between context ID and their contents?

My idea is to have a main table for all the feedback information and a separate table for contexts. The context table holds all the details related to each context, so the main table can reference them through the context ID assigned to each one. My assumption was that there is a limited number of contexts right now, so this shouldn't be a problem, but I wanted to clarify that with you.
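
To make the idea concrete, here is a rough sketch of how the split could work. Everything in it is an assumption for illustration: the `context_id` and `split_feedback` helpers, the `context_ids` field, and the choice of a SHA-256 hash of the text as the ID (any stable ID scheme would do). The context documents themselves are where the mapping from ID to contents would live.

```python
import hashlib


def context_id(text):
    """Derive a stable ID from the context text (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_feedback(feedback, known_context_ids):
    """Split one feedback record into a main document and new context documents.

    `feedback` is a dict with a "context" list of strings.
    `known_context_ids` is a set of IDs already stored; it is updated in place.
    Returns (main_doc, new_context_docs).
    """
    context_ids = []
    new_context_docs = []
    for text in feedback["context"]:
        cid = context_id(text)
        context_ids.append(cid)
        if cid not in known_context_ids:
            # Only contexts that have not been seen before get inserted.
            known_context_ids.add(cid)
            new_context_docs.append({"_id": cid, "text": text})

    # The main document keeps every field except the raw context text.
    main_doc = {k: v for k, v in feedback.items() if k != "context"}
    main_doc["context_ids"] = context_ids
    return main_doc, new_context_docs
```

The main document would then go into the feedback collection and the new context documents into the context collection, so a repeated context costs one ID in the main table rather than a second copy of the text.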

@luarss
Collaborator

luarss commented Dec 18, 2024

Hello, sorry for the delayed response. Can you please send me an e-mail at [email protected] so we can discuss this further offline?

@Kannav02
Collaborator Author

Sure, I'll send an email about this.

Thank you!

@Kannav02
Collaborator Author

Hey @luarss, I was looking at different options for hosting MongoDB on GCP and found two:

  1. The first is to install MongoDB directly on a VM. It isn't great in terms of flexibility and portability, so I wouldn't go with this.

  2. The second is probably the one we could look into: running MongoDB in a Docker container, deploying it on a small instance for now to test its functionality, and scaling it to a bigger instance later on.

Which one would you prefer?

@luarss
Collaborator

luarss commented Jan 18, 2025

The second one is preferred. Can you let me know what the minimum requirements are?

@Kannav02
Collaborator Author

For the container deployment we can use GKE. The instance type can be really small for now since we're just getting started, so any of the e2 family instances should be fine. If we want separate storage for faster access, we could add a supplementary 5 GB SSD, but that is optional for now.

Screenshot 2025-01-18 at 8 39 22 PM

On another note, I told you about the custom deployment options, but I forgot to mention that MongoDB Atlas can be deployed directly on Google Cloud. It is a good option, but we might need a separate discussion if we want to look into it as well.

This is the link for your reference: https://www.mongodb.com/resources/products/platform/mongodb-on-google-cloud

Thank you

@luarss
Collaborator

luarss commented Jan 19, 2025

Thanks for doing the research! I am more inclined towards the first solution of GKE. That is also what I chose for the prototype deployment of the RAG webapp.

Let us go with the e2-micro for prototyping for now. You can assume this DB will be on the same internal network as the other nodes, so no need to expose ports externally

@Kannav02
Collaborator Author

Got it. Just to be on the same page, should I open a separate issue for the deployment of the database, since this PR is mostly about the development part?

I also had a question: should we proceed with the deployment after we're done with the UI and the MongoDB feedback functionality, or should it be done in parallel?

Thank you Jack!

@luarss
Collaborator

luarss commented Jan 20, 2025

Yup, that is fine. Let's move the deployment details to a later PR.

@Kannav02
Collaborator Author

Perfect, then I will finish the MongoDB implementation this week, along with the documentation, and after that we can work on the UI dashboard.

Once again, thank you Jack for your help and for letting me be a part of this project.

@Kannav02
Collaborator Author

Kannav02 commented Jan 21, 2025

Hey @luarss , hope you're doing well!

There is something I wanted to discuss with you. In the previous meeting we talked about the database structure and how we can link the context to the main schema. It turns out that the response from the backend is returned in the following format:

    if user_input.list_sources and user_input.list_context:
        response = {
            "response": result["answer"],
            "sources": (links),
            "context": (context),
        }

You can see that it doesn't say which source each context comes from.

If we want to know which source a particular context comes from, we might have to change how the data is sent back to the frontend so that we can derive that relationship.

What's your opinion on this?

- schema corrected for the database
- parameters included in the main submit_feedback function
- insertion corrected to utilise the correct datetime.now() function

Signed-off-by: Kannav02 <[email protected]>
- added the function to now submit feedback back to mongoDB
- corrected the sys path to now include common as a package, workaround , kind of like a pseudopackage

Signed-off-by: Kannav02 <[email protected]>
@luarss
Collaborator

luarss commented Jan 21, 2025

Hi @Kannav02, we might have to do some modification of the backend code. The links variable is currently deduplicated, but it should have a one-to-one correspondence with context (i.e. same length)
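
As a rough sketch of what that one-to-one pairing could look like (the `pair_sources_with_context` and `build_response` helpers are hypothetical, and returning source/context pairs under the `context` key is just one possible shape, not the current backend behaviour):

```python
def pair_sources_with_context(links, context):
    """Pair each context chunk with the link it came from.

    Assumes `links` is NOT deduplicated, so it has the same length as
    `context` and links[i] is the source of context[i].
    """
    if len(links) != len(context):
        raise ValueError(
            f"expected one link per context, got {len(links)} links "
            f"and {len(context)} contexts"
        )
    return [{"source": link, "context": text} for link, text in zip(links, context)]


def build_response(answer, links, context):
    """Build a response that keeps the source-to-context relationship."""
    return {
        "response": answer,
        # Deduplicated for display, preserving first-seen order.
        "sources": list(dict.fromkeys(links)),
        "context": pair_sources_with_context(links, context),
    }
```

With this shape the frontend can still show a short deduplicated list of sources, while the feedback record keeps track of which source each context came from.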

@Kannav02
Collaborator Author

Sure, we can work on this. I was wondering if we could have another meeting about it; I also need to show you what I've been thinking.

Thank you
