feat(storage): support non_pk_prefix_watermark state cleaning #19889

Open

Li0k wants to merge 27 commits into base: main

Conversation

@Li0k (Contributor) commented on Dec 23, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Related to #18802.

This PR supports non_pk_prefix_watermark state cleaning for Hummock.

Since non_pk_prefix_watermark relies on catalogs, filtering on the read path would introduce additional overhead. Therefore, this PR does not provide read filtering for non_pk_prefix_watermark and only cleans up expired data during compaction.

The changes are as follows:

  1. Watermarks of type non_pk_prefix_watermark are not added to ReadWatermarkIndex.

  2. The state table supports writing and serializing non_pk_prefix_watermark.

  3. The compaction catalog agent supports retrieving the watermark serde.

  4. The skip-watermark iterator supports filtering by non_pk_prefix_watermark: a new NonPkPrefixSkipWatermarkIterator filters table data against non-pk-prefix watermarks (a simplified sketch follows this list).
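Below is a minimal, self-contained sketch of the filtering rule behind a non-pk-prefix skip-watermark iterator, assuming an ascending watermark means values below it are expired. All types here (Datum, NonPkPrefixWatermark, WatermarkDirection) are simplified stand-ins for illustration only, not the actual Hummock/state-table types.

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Datum(i64);

/// Direction of the watermark: values strictly below (ascending) or above (descending)
/// the watermark are considered expired.
enum WatermarkDirection {
    Ascending,
    Descending,
}

struct NonPkPrefixWatermark {
    direction: WatermarkDirection,
    watermark: Datum,
    /// Index of the watermark column inside the (simplified) key.
    watermark_col_idx_in_pk: usize,
}

impl NonPkPrefixWatermark {
    /// Returns true if the row should be dropped by compaction. Unlike the pk-prefix
    /// case, this needs to locate and decode the watermark column out of the key (here
    /// modeled as a plain slice of datums), which is why it is only applied during
    /// compaction and not on the read path.
    fn should_delete(&self, key_columns: &[Datum]) -> bool {
        let v = key_columns[self.watermark_col_idx_in_pk];
        match self.direction {
            WatermarkDirection::Ascending => v < self.watermark,
            WatermarkDirection::Descending => v > self.watermark,
        }
    }
}

fn main() {
    let wm = NonPkPrefixWatermark {
        direction: WatermarkDirection::Ascending,
        watermark: Datum(100),
        watermark_col_idx_in_pk: 1,
    };
    // key = (user_id, event_ts): event_ts 42 is below the watermark 100, so it expires.
    assert!(wm.should_delete(&[Datum(7), Datum(42)]));
    assert!(!wm.should_delete(&[Datum(7), Datum(120)]));
}
```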

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@Li0k changed the title from "feat(storage): non_pk_watermark state clean" to "WIP: feat(storage): non_pk_watermark state clean" on Dec 23, 2024
@Li0k marked this pull request as ready for review on December 23, 2024 07:28
@graphite-app bot requested a review from a team on December 23, 2024 08:18
@Li0k requested a review from a team as a code owner on December 25, 2024 12:20
@Li0k requested a review from xxchan on December 25, 2024 12:20
@Li0k changed the title from "WIP: feat(storage): non_pk_watermark state clean" to "feat(storage): support non_pk_prefix_watermark state cleaning" on Dec 30, 2024
@Li0k requested reviews from hzxa21, st1page and chenzl25 on December 30, 2024 07:32
src/storage/src/hummock/store/version.rs (resolved)
src/meta/src/hummock/manager/compaction/mod.rs (outdated, resolved)
.table_watermarks
.iter()
.filter_map(|(table_id, table_watermarks)| {
if table_id_with_pk_prefix_watermark.contains(table_id) {
Collaborator:

We already have a WaterMarkType defined in the version, so why don't we just use that to filter out tables with a non-pk-prefix watermark?

Collaborator:

Also, if we filter out the non-pk-prefix watermark here, how can the compactor retrieve it? Based on the logic here, it seems that we rely on the non-pk-prefix watermark being present in the compact task.

Contributor (author):

Good catch, we should filter the watermarks by WaterMarkType directly.

Also, the filtered results are only passed to the picker, while all relevant watermarks (pk-prefix and non-pk-prefix) are passed to the compactor.

src/storage/hummock_sdk/src/compact_task.rs (outdated, resolved)
src/storage/hummock_sdk/src/compact_task.rs (outdated, resolved)
src/storage/hummock_sdk/src/table_watermark.rs (outdated, resolved)
src/storage/hummock_sdk/src/table_watermark.rs (outdated, resolved)
src/storage/hummock_sdk/Cargo.toml (outdated, resolved)
@@ -42,10 +47,14 @@ pub struct SkipWatermarkIterator<I> {
}

impl<I: HummockIterator<Direction = Forward>> SkipWatermarkIterator<I> {
Collaborator:

nit: since SkipWatermarkIterator is only used by the compactor, how about moving skip_watermark.rs into src/hummock/compactor?

Contributor (author), Jan 8, 2025:

Of course, I will propose a separate PR for it.

});
let watermark_col_in_pk =
row.datum_at(*watermark_col_idx_in_pk);
cmp_datum(
Collaborator:

IIUC, if cmp_datum returns Equal | Greater, based on the logic in L360, the watermark will be advanced. I think this is incorrect for a non-pk-prefix watermark because the non-pk-prefix watermark and the pk don't have the same ordering.

Comment on lines 732 to 746
let table_watermarks = version
.latest_version()
.table_watermarks
.iter()
.filter_map(|(table_id, table_watermarks)| {
if matches!(
table_watermarks.watermark_type,
WatermarkSerdeType::PkPrefix,
) {
Some((*table_id, table_watermarks.clone()))
} else {
None
}
})
.collect();
Collaborator:

Actually, why don't we do the filtering inside the picker instead, like here, if the watermark type is part of TableWatermarks?

We can avoid cloning the table watermarks, which can be large given that they store bytes from user data, with no harm.

Collaborator:

Any ideas on this comment? @Li0k
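As a rough illustration of the suggestion above, a picker-side check could consult the watermark type by reference and avoid cloning TableWatermarks entirely. The types below are simplified stand-ins, not the real hummock_sdk definitions:

```rust
use std::collections::BTreeMap;

#[derive(PartialEq)]
enum WatermarkSerdeType {
    PkPrefix,
    NonPkPrefix,
}

struct TableWatermarks {
    watermark_type: WatermarkSerdeType,
    // watermark payload (potentially large user-data bytes) elided
}

type TableId = u32;

// Picker-side check: iterate the version's watermark map by reference and only consider
// tables with a pk-prefix watermark, so no TableWatermarks clone is needed.
fn tables_with_pk_prefix_watermark(
    table_watermarks: &BTreeMap<TableId, TableWatermarks>,
) -> Vec<TableId> {
    table_watermarks
        .iter()
        .filter(|(_, tw)| tw.watermark_type == WatermarkSerdeType::PkPrefix)
        .map(|(table_id, _)| *table_id)
        .collect()
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(1, TableWatermarks { watermark_type: WatermarkSerdeType::PkPrefix });
    map.insert(2, TableWatermarks { watermark_type: WatermarkSerdeType::NonPkPrefix });
    assert_eq!(tables_with_pk_prefix_watermark(&map), vec![1u32]);
}
```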

}
WatermarkSerdeType::Serde(_watermark) => {
// do not skip the non-pk prefix watermark when vnode is the same
return false;
Collaborator:

I am afraid this is still incorrect based on the semantics of advance_watermark:

/// Return a flag indicating whether the current key will be filtered by the current watermark.

If we always return false when the table and vnode are the same here, that means none of the keys can be filtered by the watermark. Please carefully walk through the logic of advance_watermark, should_delete and advance_key_and_watermark. I am still concerned that the implementation of SkipWatermarkState and SkipWatermarkIterator relies on the assumption that the key ordering and the watermark ordering are the same, and we may still miss some changes.
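A tiny self-contained example of the ordering concern: keys are sorted by the full pk, so a non-pk-prefix watermark column does not advance monotonically with the key order, and a "stop checking once the key passes the watermark" shortcut would wrongly keep expired rows. The data below is made up for illustration:

```rust
fn main() {
    // (pk = user_id, watermark column = event_ts); rows already sorted by pk.
    let rows = [(1u32, 300i64), (2, 50), (3, 400)];
    let watermark = 100i64; // ascending watermark: rows with event_ts < 100 are expired

    // Every key must be checked individually; skipping checks after the first surviving
    // key (as a pk-prefix watermark iterator may do) would keep (2, 50) even though it
    // is expired.
    let survivors: Vec<_> = rows.iter().filter(|(_, ts)| *ts >= watermark).collect();
    assert_eq!(survivors, vec![&(1u32, 300i64), &(3u32, 400i64)]);
}
```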

@xxchan removed their request for review on January 15, 2025 08:01
@@ -52,6 +52,18 @@ impl OrderedRowSerde {
}
}

#[must_use]
pub fn index(&self, idx: usize) -> Cow<'_, Self> {
if 1 == self.order_types.len() {
Collaborator:

should we assert idx == 1 here?

Contributor (author):

No, the row can be of any length and index can be a generic function.
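For illustration, a hedged guess at what such a generic index helper could look like on a simplified OrderedRowSerde that only models order types (the real type also carries data types); this is not the actual implementation:

```rust
use std::borrow::Cow;

#[derive(Clone, Copy, PartialEq, Debug)]
enum OrderType {
    Ascending,
    Descending,
}

// Simplified stand-in for OrderedRowSerde: only the order types are modeled here.
#[derive(Clone)]
struct OrderedRowSerde {
    order_types: Vec<OrderType>,
}

impl OrderedRowSerde {
    // Project the serde down to the single column at `idx`. When the serde already has
    // exactly one column, borrowing `self` avoids an allocation.
    #[must_use]
    fn index(&self, idx: usize) -> Cow<'_, Self> {
        if self.order_types.len() == 1 {
            debug_assert_eq!(idx, 0, "single-column serde can only be indexed at 0");
            Cow::Borrowed(self)
        } else {
            Cow::Owned(OrderedRowSerde {
                order_types: vec![self.order_types[idx]],
            })
        }
    }
}

fn main() {
    let serde = OrderedRowSerde {
        order_types: vec![OrderType::Ascending, OrderType::Descending],
    };
    // Projecting column 1 yields a one-column serde with that column's order type.
    assert_eq!(serde.index(1).order_types, vec![OrderType::Descending]);
}
```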

src/storage/hummock_sdk/src/compact_task.rs (outdated, resolved)
),
&self.compact_task.table_watermarks,
))
let combine_iter = {
Collaborator:

Let's add a comment and a reference to the tracking issue for state cleaning based on the NonPkPrefix watermark iterator as well.

src/storage/src/hummock/compactor/compaction_utils.rs (outdated, resolved)
return direction.datum_filter_by_watermark(
watermark_col_in_pk,
watermark,
watermark_col_serde.get_order_types()[0],
Collaborator:

Given that we use [0] here, does it mean that either the watermark column is always a single column or the order type for all watermark columns must be the same? Is this assumption checked and guaranteed somewhere?

Contributor (author):

This assumes that the watermark is a single column. I've found no such guarantee on the state table side either.

From my point of view, this is something the optimizer needs to guarantee. What is the reasonable way to do it? @hzxa21 @st1page

Contributor (author):

As said above, I think this guarantee should come from clean_watermark_index_in_pk. If the assumption is broken then using [0] should crash, which is fine with me.

Collaborator:

If there are multiple watermark columns with different order types, I think this code won't crash but will generate incorrect results.
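A minimal sketch of the kind of guard discussed in this thread: assert up front that the non-pk-prefix watermark covers exactly one column, so that indexing the order types with [0] is safe. single_watermark_order_type is a hypothetical helper, not actual RisingWave code:

```rust
fn single_watermark_order_type<T: Copy>(watermark_order_types: &[T]) -> T {
    assert_eq!(
        watermark_order_types.len(),
        1,
        "a non-pk-prefix watermark is assumed to cover exactly one column"
    );
    watermark_order_types[0]
}

fn main() {
    // With a single watermark column this returns its order type ...
    assert_eq!(single_watermark_order_type(&["ASC"]), "ASC");
    // ... and with multiple columns it panics instead of silently using the wrong order.
    let result = std::panic::catch_unwind(|| single_watermark_order_type(&["ASC", "DESC"]));
    assert!(result.is_err());
}
```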

src/storage/src/hummock/iterator/skip_watermark.rs (outdated, resolved)

gru-agent bot commented Jan 20, 2025

This pull request has been modified. If you want me to regenerate unit tests for any of the related files, please find the file in the "Files Changed" tab and add a comment @gru-agent. (The GitHub "Comment on this file" feature is in the upper right corner of each file in the "Files Changed" tab.)

fn should_delete(&mut self, key: &FullKey<&[u8]>) -> bool;
fn reset_watermark(&mut self);

// fn new(watermarks: BTreeMap<TableId, ReadTableWatermark>) -> Self;
Collaborator:

Remove?

fn reset_watermark(&mut self);

// fn new(watermarks: BTreeMap<TableId, ReadTableWatermark>) -> Self;
// fn from_safe_epoch_watermarks(safe_epoch_watermarks: BTreeMap<u32, TableWatermarks>) -> Self;
Collaborator:

Remove?

@@ -172,32 +165,30 @@ impl<I: HummockIterator<Direction = Forward>> HummockIterator for SkipWatermarkI
self.inner.value_meta()
}
}
pub struct SkipWatermarkState {

pub trait SkipWatermarkState: Send {
Collaborator:

Please move this trait definition to the top of the file to make the code more readable. Also, can you add documentation for each method?
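For reference, a possible documented form of the trait, following the request above. The method names and the should_delete semantics come from this PR's diff and discussion; FullKey is replaced by a plain byte slice so the sketch stays self-contained:

```rust
pub trait SkipWatermarkState: Send {
    /// Returns a flag indicating whether the current `key` will be filtered by the
    /// current watermark and should therefore be skipped (deleted) by the iterator.
    fn should_delete(&mut self, key: &[u8]) -> bool;

    /// Resets the internal watermark cursor so that the state can be reused for another
    /// iteration pass over the data.
    fn reset_watermark(&mut self);
}
```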

continue;
}
Ordering::Equal => {
self.last_serde = self
Collaborator:

I don't get why we need last_serde. It seems that we always retrieve it from compaction_catalog_agent_ref.watermark_serde.

Collaborator:

I think you probably want to cache it per table id, and update it only when the table id switches.
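A hedged sketch of the suggested per-table caching: remember the serde of the table the iterator is currently in and refresh it only on a table-id switch, instead of asking the catalog for every key. The types below are simplified stand-ins, and `catalog` stands in for compaction_catalog_agent_ref.watermark_serde:

```rust
use std::collections::HashMap;

type TableId = u32;

#[derive(Clone)]
struct WatermarkSerde; // stand-in for the real watermark serde

struct WatermarkSerdeCache {
    catalog: HashMap<TableId, WatermarkSerde>,
    last_table_id: Option<TableId>,
    last_serde: Option<WatermarkSerde>,
}

impl WatermarkSerdeCache {
    /// Returns the serde for `table_id`, hitting the catalog only on a table switch.
    fn serde_for(&mut self, table_id: TableId) -> Option<&WatermarkSerde> {
        if self.last_table_id != Some(table_id) {
            // Table switch: look the serde up once and remember it for subsequent keys.
            self.last_table_id = Some(table_id);
            self.last_serde = self.catalog.get(&table_id).cloned();
        }
        self.last_serde.as_ref()
    }
}

fn main() {
    let mut cache = WatermarkSerdeCache {
        catalog: HashMap::from([(1, WatermarkSerde)]),
        last_table_id: None,
        last_serde: None,
    };
    assert!(cache.serde_for(1).is_some()); // first key of table 1: catalog lookup
    assert!(cache.serde_for(1).is_some()); // subsequent keys of table 1: cached
    assert!(cache.serde_for(2).is_none()); // table switch: lookup again, table 2 has none
}
```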

@hzxa21 (Collaborator) left a comment:

Rest LGTM
