feat: Add PeerDB input plugin for handling Postgres CDC #2596

Open
wants to merge 1 commit into
base: main
from

iskakaushik

  • Introduced a new PeerDB input plugin for Benthos that reads from a Postgres DB using CDC (Change Data Capture) with pglogrepl - https://github.com/jackc/pglogrepl.
  • Change Data Capture (CDC) is a process that captures changes made to a database and ensures that these changes are reflected in downstream systems. This is crucial for keeping data in sync across distributed systems, enabling real-time data processing, and supporting various data integration scenarios.
  • The implementation maintains a persistent replication connection for continuous data capture. The last processed LSN (Log Sequence Number) is stored in a cache to track replication progress, so the input can resume from the last processed point after a disruption. The cache must be persistent for this to be correct: if the stored LSN is lost, the input no longer knows where to resume.
  • Follow-up updates will include support for initial load and more comprehensive documentation.
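
The checkpointing described above could look roughly like the following. This is a minimal sketch and not the code in this PR: it uses a plain file as a stand-in for the persistent cache, and the names `LSN`, `ParseLSN`, `FileCheckpoint` and `resumeDemo` are hypothetical. It relies only on the standard library and on the textual `hi/lo` hex form that Postgres uses for LSNs.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// LSN is a Postgres log sequence number: a 64-bit position in the WAL.
type LSN uint64

// String renders the LSN in the "hi/lo" hex form Postgres prints.
func (l LSN) String() string {
	return fmt.Sprintf("%X/%X", uint32(l>>32), uint32(l))
}

// ParseLSN converts the textual "hi/lo" hex form back into an LSN.
func ParseLSN(s string) (LSN, error) {
	hi, lo, ok := strings.Cut(strings.TrimSpace(s), "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return LSN(h<<32 | l), nil
}

// FileCheckpoint is a hypothetical stand-in for the persistent cache.
type FileCheckpoint struct{ Path string }

// Load returns the stored LSN, or 0 if nothing has been stored yet.
func (c FileCheckpoint) Load() (LSN, error) {
	b, err := os.ReadFile(c.Path)
	if errors.Is(err, os.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return ParseLSN(string(b))
}

// Save writes to a temporary file and renames it over the checkpoint,
// so a failed write does not leave a half-written LSN behind.
func (c FileCheckpoint) Save(l LSN) error {
	tmp := c.Path + ".tmp"
	if err := os.WriteFile(tmp, []byte(l.String()), 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, c.Path)
}

// resumeDemo stores an LSN, then reloads it as a restarted process would.
func resumeDemo() (LSN, error) {
	dir, err := os.MkdirTemp("", "lsn-checkpoint")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	cp := FileCheckpoint{Path: filepath.Join(dir, "last_lsn")}
	if err := cp.Save(LSN(0x16B374D848)); err != nil {
		return 0, err
	}
	return cp.Load()
}

func main() {
	lsn, err := resumeDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resume replication from", lsn)
}
```

A real input would save the LSN only after the corresponding messages have been acknowledged downstream, so that a restart replays unacknowledged changes rather than skipping them.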

@iskakaushik iskakaushik requested a review from Jeffail as a code owner May 16, 2024 16:51
@iskakaushik iskakaushik force-pushed the peerdb-postgres-cdc branch from e1eccd6 to 464a224 Compare May 16, 2024 16:52
Contributor

@srenatus srenatus left a comment

I'm confused about the name here. Has this something to do with PeerDB and their (commercial) offering? Or would you just share the name because the concept is the same...? https://github.com/jackc/pglogrepl doesn't mention "peerdb" anywhere.


// Run a postgres container
resource, err := pool.RunWithOptions(&dockertest.RunOptions{
Repository: "debezium/postgres",
Contributor

Is this significant? Would the stock postgres image be missing extensions..? 🤔

Author

This image has some changes to the default postgres image. It isn't hard to apply them to the stock image, but it was more convenient to just use this one. Specifically:

wal_level = logical
max_wal_senders = <N > 0>
max_replication_slots = <N > 0>
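
For comparison, here is a minimal sketch of how the same settings might be applied to the stock postgres image instead. It assumes the Postgres server's `-c name=value` startup flags, which the official image accepts as container arguments. The helper name `logicalReplicationCmd` is hypothetical, and wiring the result into the `Cmd` field of `dockertest.RunOptions` is an untested assumption, not something this PR does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// logicalReplicationCmd builds a container command that starts Postgres
// with the settings logical replication needs. Both counts must be > 0,
// matching the <N > 0> placeholders in the settings listed above.
func logicalReplicationCmd(walSenders, replicationSlots int) ([]string, error) {
	if walSenders <= 0 || replicationSlots <= 0 {
		return nil, fmt.Errorf(
			"max_wal_senders (%d) and max_replication_slots (%d) must both be > 0",
			walSenders, replicationSlots,
		)
	}
	return []string{
		"postgres",
		"-c", "wal_level=logical",
		"-c", "max_wal_senders=" + strconv.Itoa(walSenders),
		"-c", "max_replication_slots=" + strconv.Itoa(replicationSlots),
	}, nil
}

func main() {
	cmd, err := logicalReplicationCmd(4, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Assumption: this slice could be passed as Cmd in dockertest.RunOptions
	// alongside Repository: "postgres".
	fmt.Println(strings.Join(cmd, " "))
}
```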

@srenatus
Contributor

I'm just a passer-by reviewing what looked like a very cool addition. Please bear with me 😅

@iskakaushik
Author

> I'm just a passer-by reviewing what looked like a very cool addition. Please bear with me 😅

Thanks for taking the time to look at the PR =)

@srenatus
Contributor

@iskakaushik I'm still confused about the "peerdb" name (see previous comment). Could you clear that up for me? 😅

@Jeffail Jeffail requested a review from asimms41 July 23, 2024 07:23
2 participants