Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Support compression factor for percentile_approx #54115

Merged
merged 1 commit into from
Jan 7, 2025

Conversation

INNOCENT-BOY
Copy link
Contributor

@INNOCENT-BOY INNOCENT-BOY commented Dec 19, 2024

Why I'm doing:

I found that the runtime of compression_factor in percentile_approx hardly changes.

What I'm doing:

According to the official documentation, we can see that the percentile_approx function allows adjusting the compression. However, during actual debugging, I found that the compression remains fixed at 1000. Therefore, I implemented the content described in the function's documentation.

image

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@INNOCENT-BOY INNOCENT-BOY requested review from a team as code owners December 19, 2024 08:26
@CLAassistant
Copy link

CLAassistant commented Dec 19, 2024

CLA assistant check
All committers have signed the CLA.

@INNOCENT-BOY INNOCENT-BOY changed the title support compression factor for percentile_approx [BugFix] Support compression factor for percentile_approx Dec 19, 2024
be/src/util/tdigest.h Outdated Show resolved Hide resolved
stdpain
stdpain previously approved these changes Dec 23, 2024
@satanson
Copy link
Contributor

@INNOCENT-BOY add some tests in test/sql/test_agg_function/T/test_percentile_approx

@kevincai
Copy link
Contributor

@INNOCENT-BOY clang-format check fails, please fix the code style according to the check.

@INNOCENT-BOY
Copy link
Contributor Author

\

@INNOCENT-BOY clang-format check fails, please fix the code style according to the check.

I have added some test sql in tes_percentile_approx. Please check again.

@INNOCENT-BOY
Copy link
Contributor Author

Hi @satanson , please review this pr again when you are free. Thanks in advance.

@stdpain
Copy link
Contributor

stdpain commented Dec 25, 2024

@INNOCENT-BOY
CI clang-format failed

/home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h had clang-format style issues
--- /home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h
/home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h had clang-format style issues
+++ /home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h (after clang format)
@@ -21,[9](https://github.com/StarRocks/starrocks/actions/runs/12490410567/job/34855175667?pr=54115#step:6:10) +21,7 @@
 public:
     PercentileValue() { _type = TDIGEST; }
 
-    explicit PercentileValue(double compression) : _tdigest(compression) {
-        _type = TDIGEST;
-    }
+    explicit PercentileValue(double compression) : _tdigest(compression) { _type = TDIGEST; }
 
     explicit PercentileValue(const Slice& src) {
         switch (*src.data) {

--- /home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h
+++ /home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h (after clang format)
@@ -59,7 +59,7 @@
             compression = ColumnHelper::get_const_value<TYPE_DOUBLE>(ctx->get_constant_column(2));
             if (compression < MIN_COMPRESSION || compression > MAX_COMPRESSION) {
                 LOG(WARNING) << "Compression factor out of range. Using default compression factor: "
-                << DEFAULT_COMPRESSION_FACTOR;
+                             << DEFAULT_COMPRESSION_FACTOR;
                 compression = DEFAULT_COMPRESSION_FACTOR;
             }
         }

@INNOCENT-BOY
Copy link
Contributor Author

@INNOCENT-BOY CI clang-format failed

/home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h had clang-format style issues
--- /home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h
/home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h had clang-format style issues
+++ /home/runner/_work/starrocks/starrocks/be/src/util/percentile_value.h (after clang format)
@@ -21,[9](https://github.com/StarRocks/starrocks/actions/runs/12490410567/job/34855175667?pr=54115#step:6:10) +21,7 @@
 public:
     PercentileValue() { _type = TDIGEST; }
 
-    explicit PercentileValue(double compression) : _tdigest(compression) {
-        _type = TDIGEST;
-    }
+    explicit PercentileValue(double compression) : _tdigest(compression) { _type = TDIGEST; }
 
     explicit PercentileValue(const Slice& src) {
         switch (*src.data) {

--- /home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h
+++ /home/runner/_work/starrocks/starrocks/be/src/exprs/agg/percentile_approx.h (after clang format)
@@ -59,7 +59,7 @@
             compression = ColumnHelper::get_const_value<TYPE_DOUBLE>(ctx->get_constant_column(2));
             if (compression < MIN_COMPRESSION || compression > MAX_COMPRESSION) {
                 LOG(WARNING) << "Compression factor out of range. Using default compression factor: "
-                << DEFAULT_COMPRESSION_FACTOR;
+                             << DEFAULT_COMPRESSION_FACTOR;
                 compression = DEFAULT_COMPRESSION_FACTOR;
             }
         }

Hi @stdpain , I have fixed format issue, please check again.

@Seaven
Copy link
Contributor

Seaven commented Dec 25, 2024

main RELEASE (build 5a005563a0)
query_id:00000000-0000-0000-0000-000000000000, fragment_instance:00000000-0000-0000-0000-000000000000
tracker:process consumption: 612340656
tracker:jemalloc_metadata consumption: 34296544
tracker:query_pool consumption: 11065688
tracker:query_pool/connector_scan consumption: 0
tracker:load consumption: 20928
tracker:metadata consumption: 6475282
tracker:tablet_metadata consumption: 2179810
tracker:rowset_metadata consumption: 1765131
tracker:segment_metadata consumption: 461493
tracker:column_metadata consumption: 2068848
tracker:tablet_schema consumption: 169610
tracker:segment_zonemap consumption: 300278
tracker:short_key_index consumption: 15053
tracker:column_zonemap_index consumption: 534336
tracker:ordinal_index consumption: 386944
tracker:bitmap_index consumption: 0
tracker:bloom_filter_index consumption: 0
tracker:compaction consumption: 0
tracker:schema_change consumption: 0
tracker:page_cache consumption: 1687856
tracker:jit_cache consumption: 60480
tracker:update consumption: 244859
tracker:chunk_allocator consumption: 0
tracker:passthrough consumption: 0
tracker:clone consumption: 0
tracker:consistency consumption: 0
tracker:datacache consumption: 0
tracker:poco_connection_pool consumption: 769208
tracker:replication consumption: 0
*** Aborted at 1735121553 (unix time) try "date -d @1735121553" if you are using GNU date ***
PC: @          0x6e6ca56 starrocks::PercentileApproxAggregateFunction::create(starrocks::FunctionContext*, unsigned char*) const
*** SIGSEGV (@0x10) received by PID 7303 (TID 0x7f0c504b0640) LWP(41142) from PID 16; stack trace: ***
    @     0x7f0d9efe4ee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @          0xfaccc59 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0x7f0d9fec0486 os::Linux::chained_handler(int, siginfo_t*, void*)
    @     0x7f0d9fec678b JVM_handle_linux_signal
    @     0x7f0d9feb878c signalHandler(int, siginfo_t*, void*)
    @     0x7f0d9ef8d520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @          0x6e6ca56 starrocks::PercentileApproxAggregateFunction::create(starrocks::FunctionContext*, unsigned char*) const
    @          0x6e45941 starrocks::AggStateUnion::create(starrocks::FunctionContext*, unsigned char*) const
    @          0x6e4e738 starrocks::AggFuncBasedValueAggregator::AggFuncBasedValueAggregator(starrocks::AggStateDesc*, std::unique_ptr<starrocks::AggregateFunction, std::default_delete<starrocks::AggregateFunction> >)
    @          0x6e42aac starrocks::ColumnAggregatorFactory::create_value_column_aggregator(std::shared_ptr<starrocks::Field> const&)
    @          0x6e3eaee starrocks::ChunkAggregator::ChunkAggregator(starrocks::Schema const*, unsigned int, unsigned int, double, bool, bool)
    @          0x6e3ec6c starrocks::ChunkAggregator::ChunkAggregator(starrocks::Schema const*, unsigned int, unsigned int, double)
    @          0x8c6de0a starrocks::MemTable::_init_aggregator_if_needed()
    @          0x8c6e32d starrocks::MemTable::MemTable(long, starrocks::Schema const*, std::vector<starrocks::SlotDescriptor*, std::allocator<starrocks::SlotDescriptor*> > const*, starrocks::MemTableSink*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<cha��
    @          0x8d46451 starrocks::DeltaWriter::_reset_mem_table()
    @          0x8d49279 starrocks::DeltaWriter::write(starrocks::Chunk const&, unsigned int const*, unsigned int, unsigned int)
    @          0xb00caf4 starrocks::AsyncDeltaWriter::_execute(void*, bthread::TaskIterator<starrocks::AsyncDeltaWriter::Task>&)
    @          0xb00f839 bthread::ExecutionQueue<starrocks::AsyncDeltaWriter::Task>::execute_task(void*, void*, bthread::TaskIteratorBase&)
    @          0xfccde8f bthread::ExecutionQueueBase::_execute(bthread::TaskNode*, bool, int*)
    @          0xfccefbb bthread::ExecutionQueueBase::_execute_tasks(void*)
    @          0xb2b75ea std::_Function_handler<void (), starrocks::bthreads::ThreadPoolExecutor::submit(void* (*)(void*), void*)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
    @          0xb41e238 starrocks::ThreadPool::dispatch_thread()
    @          0xb41f4f5 std::_Function_handler<void (), std::_Bind<void (starrocks::ThreadPool::*(starrocks::ThreadPool*))()> >::_M_invoke(std::_Any_data const&)
    @          0xb41340b starrocks::Thread::supervise_thread(void*)
    @     0x7f0d9efdfac3 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x94ac2)
    @     0x7f0d9f071850 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x12684f)

@LiShuMing be crash in MV cases, help plz

LiShuMing
LiShuMing previously approved these changes Dec 27, 2024
public:
void create(FunctionContext* ctx, AggDataPtr __restrict ptr) const override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give a default compression if ctx is null because agg_state in common aggregate state it may be null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@INNOCENT-BOY
Copy link
Contributor Author

Hi @kevincai @LiShuMing @stdpain , please help review this pr again.

@stdpain
Copy link
Contributor

stdpain commented Dec 27, 2024

@INNOCENT-BOY check the test case: test_agg_state_table_with_all_functions

@LiShuMing
Copy link
Contributor

@mergify rebase

Copy link
Contributor

mergify bot commented Dec 31, 2024

rebase

✅ Branch has been successfully rebased

@LiShuMing
Copy link
Contributor

I found a bug in original sql test, you can rebase the code after pr(#54556) is merged.

image

@LiShuMing
Copy link
Contributor

@mergify rebase

Copy link
Contributor

mergify bot commented Dec 31, 2024

rebase

✅ Branch has been successfully rebased

@INNOCENT-BOY
Copy link
Contributor Author

Hi @LiShuMing @satanson ,What do I need to do to get this PR merged?

Copy link

github-actions bot commented Jan 7, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Jan 7, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Jan 7, 2025

[BE Incremental Coverage Report]

pass : 13 / 16 (81.25%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exprs/agg/percentile_approx.h 12 15 80.00% [58, 59, 60]
🔵 be/src/util/percentile_value.h 1 1 100.00% []

@Seaven Seaven enabled auto-merge (squash) January 7, 2025 12:45
@Seaven Seaven merged commit 2b225a7 into StarRocks:main Jan 7, 2025
43 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants