Introduce manually managed partitions #56

Closed
wants to merge 1 commit into from


@ylegat (Contributor) commented Jan 1, 2025

Data used as a partition key is not always of high quality. For instance, a timestamp column may contain invalid dates such as year -1. Ideally these values would be cleaned, but that is not always an option in a system where data is treated as immutable.

One way to handle this is to add a default partition that holds any data that cannot be assigned to one of the existing partitions. Another option, when the range of invalid values is known, is to move this data into one or more dedicated partitions.

Both solutions involve creating partitions that the partition manager cannot manage, because their ranges break the policy defined for the other partitions.

For this reason we propose to introduce a new configuration property: `manuallyManagedPartitions`. It lists the partitions that the partition manager should ignore when checking or cleaning up partition tables.
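A minimal sketch of how such a property could be applied, assuming a hypothetical `config` struct and `managedPartitions` helper (neither is taken from PPM's code or from this PR's diff): partitions named in `manuallyManagedPartitions` are filtered out before the check and cleanup steps run.

```go
package main

import "fmt"

// config is a hypothetical subset of the partition manager settings; only the
// proposed manuallyManagedPartitions list is modeled here.
type config struct {
	ManuallyManagedPartitions []string
}

// managedPartitions drops every partition listed in ManuallyManagedPartitions,
// so that check and cleanup steps only see partitions the tool itself manages.
func managedPartitions(cfg config, existing []string) []string {
	skip := make(map[string]bool, len(cfg.ManuallyManagedPartitions))
	for _, name := range cfg.ManuallyManagedPartitions {
		skip[name] = true
	}
	var kept []string
	for _, name := range existing {
		if !skip[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	cfg := config{ManuallyManagedPartitions: []string{"events_default", "events_invalid_dates"}}
	existing := []string{"events_2025_01_01", "events_default", "events_2025_01_02", "events_invalid_dates"}
	fmt.Println(managedPartitions(cfg, existing)) // [events_2025_01_01 events_2025_01_02]
}
```

Using a set keeps the filter linear in the number of partitions and leaves the order of the remaining partitions unchanged.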

@ylegat force-pushed the `allow-manually-managed-partitions` branch 2 times, most recently from 05f5fde to afe42f3, on January 1, 2025 10:47
Signed-off-by: Yohan Legat <[email protected]>
@ylegat force-pushed the `allow-manually-managed-partitions` branch from afe42f3 to d908602 on January 3, 2025 08:43

github-actions bot commented Jan 3, 2025

Code Coverage

| Package | Line Rate |
| --- | --- |
| github.com/qonto/postgresql-partition-manager/internal/infra/postgresql | 89% |
| github.com/qonto/postgresql-partition-manager/internal/infra/retry | 100% |
| github.com/qonto/postgresql-partition-manager/internal/infra/uuid7 | 91% |
| github.com/qonto/postgresql-partition-manager/pkg/ppm | 74% |
| **Summary** | **79% (568 / 722)** |

Minimum allowed line rate is 60%
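The summary figure follows from the raw counts: 568 covered lines out of 722 gives a line rate of about 78.7%, reported as 79%, which clears the 60% minimum. A small Go sketch of that arithmetic, using a hypothetical `lineRate` helper rather than the bot's actual code:

```go
package main

import "fmt"

// lineRate returns the percentage of covered lines (hypothetical helper).
func lineRate(covered, total int) float64 {
	if total == 0 {
		return 0 // avoid dividing by zero when nothing is measured
	}
	return 100 * float64(covered) / float64(total)
}

func main() {
	const minimum = 60.0 // minimum allowed line rate stated in the report
	rate := lineRate(568, 722)
	fmt.Printf("summary: %.0f%% (568 / 722), passes: %v\n", rate, rate >= minimum)
	// prints "summary: 79% (568 / 722), passes: true"
}
```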

@dcupif (Contributor) commented Jan 3, 2025

Hi @ylegat, thank you for your contribution 👍

Before going into the details of the implementation, I'd like to better understand the needs behind this PR.

I understand that PPM is quite restrictive in how it lets you set up your partitions. Our initial intention was to help apply best practices by leaving almost no flexibility in how partitions are set up.

For example, we intentionally do not support a default partition because it is not considered good practice. Whilst it protects you against gaps between your partitions, it can silently accumulate data and gradually degrade performance, which defeats the purpose of partitioning.

That being said, can you give me more context about the corrupted data? You mention incorrect dates, but it is not clear to me how this prevents you from using the partitions generated by PPM.

I understand the immutability requirement, but how is this data usable if it is not cleaned? 🤔 Honestly, I would rather challenge our ability to clean the data than quickly add support for manually created partitions to PPM.

There are a few reasons that make me hesitant to add this capability:

  1. the whole purpose of this tool is to manage partitions automatically for you
  2. how would we validate that partitions created by PPM will not overlap with the manually created partitions?
  3. it is unclear to me how we could properly coordinate manual and automated partitions. It may just be an irrational feeling, but it sounds like a recipe for issues, wdyt?

@dcupif (Contributor) commented Jan 9, 2025

As discussed, your partitioning use case is not one that PPM is meant to support. We identified together that hash partitioning better suits your needs, which is not the "common" use case that PPM aims to cover.
Consequently, I'm closing this PR, but we will help you set up the most suitable partitioning strategy for your needs. Thanks again for the contribution 👍
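For comparison, a rough Go illustration of one reason hash partitioning can sidestep the problem: with `FOR VALUES WITH (MODULUS m, REMAINDER r)`, every key maps to exactly one of a fixed set of partitions, so even malformed values need no default or dedicated partition. FNV-1a from Go's standard library stands in for PostgreSQL's internal hash function, so actual row placement will differ; `remainderFor` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// remainderFor mimics the idea behind PostgreSQL hash partitioning: a row goes
// to the partition whose REMAINDER equals hash(key) mod MODULUS. FNV-1a is a
// stand-in and does NOT reproduce PostgreSQL's hash, so placements differ.
func remainderFor(key string, modulus uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum64() % modulus
}

func main() {
	const modulus = 4
	counts := make([]int, modulus)
	for i := 0; i < 1000; i++ {
		counts[remainderFor(fmt.Sprintf("customer-%d", i), modulus)]++
	}
	// Every key lands in one of the fixed partitions, whatever its value.
	fmt.Println(counts)
}
```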

@dcupif closed this Jan 9, 2025