Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove changing volatile to singleqe for queries with outer quals #1163

Open
wants to merge 6 commits into
base: adb-7.2.0
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion src/backend/optimizer/path/allpaths.c
Original file line number Diff line number Diff line change
Expand Up @@ -754,8 +754,9 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
* Greenplum specific behavior:
* Change the path in pathlist if it is a general or segmentgeneral
* path that contains volatile restrictions.
* Do not do it, if there is outer query.
*/
if (rel->reloptkind == RELOPT_BASEREL)
if (rel->upperrestrictinfo == NULL && rel->reloptkind == RELOPT_BASEREL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does empty upperrestrictinfo sign that there is no outer queries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment above.

Copy link
Collaborator

@Stolb27 Stolb27 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You filtered out cases for bring_to_outer_query from this condition. Why problem manifests itself only in case of not null upperrestrictinfo? What are other cases to execute Segment General only on Single QE?

Function handle_gen_seggen_volatile_path was not removed, because it contains
check, which restricts DML queries on replicated tables with filter, which
contains volatile functions.

It's quite strange to leave complex transformation logic only for exact checks. Do we still needed these transformations?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions handle_gen_seggen_volatile_path, bring_to_outer_query and bring_to_singleQE make the same things, but in the different cases. Function handle_gen_seggen_volatile_path works when locus is general or segment general or replicated and there is volatile function. Function bring_to_outer_query works when locus is not general and not outer query and there is outer quals. Function bring_to_singleQE works when locus is not general and not entry and not single QE and not outer query.
I think removing or union of these functions must be done in the other ticket, which will be complex refactoring of it. My patch just fix double work of these functions, when handle_gen_seggen_volatile_path and bring_to_outer_query work together. It does not needed to execute function handle_gen_seggen_volatile_path if function bring_to_outer_query will be executed (in case of non null upperrestrictinfo).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of some queries, which will be changed if handle_gen_seggen_volatile_path is removed:

create table t_hashdist(a int, b int, c int) distributed by (a);
create table t_replicate_volatile(a int, b int, c int) distributed replicated;
explain (costs off) select * from t_replicate_volatile, t_hashdist where t_replicate_volatile.a > random();

Plan before:

                              QUERY PLAN                              
----------------------------------------------------------------------
 Nested Loop
   ->  Gather Motion 3:1  (slice1; segments: 3)
         ->  Seq Scan on t_hashdist
   ->  Materialize
         ->  Result
               ->  Gather Motion 1:1  (slice2; segments: 1)
                     ->  Seq Scan on t_replicate_volatile
                           Filter: ((a)::double precision > random())
 Optimizer: Postgres query optimizer
(9 rows)

Plan after:

                        QUERY PLAN                        
----------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Nested Loop
         ->  Seq Scan on t_replicate_volatile
               Filter: ((a)::double precision > random())
         ->  Materialize
               ->  Seq Scan on t_hashdist
 Optimizer: Postgres-based planner
(7 rows)

handle_gen_seggen_volatile_path(root, rel);

/*
Expand Down
226 changes: 226 additions & 0 deletions src/test/regress/expected/rpt.out
Original file line number Diff line number Diff line change
Expand Up @@ -958,6 +958,56 @@ explain (costs off) create table rpt_ctas as select a from generate_series(1, 10
Optimizer: Postgres query optimizer
(6 rows)

-- insert into table with serial column
create table t_replicate_dst(id serial, i integer) distributed replicated;
create table t_replicate_src(i integer) distributed replicated;
insert into t_replicate_src select i from generate_series(1, 5) i;
explain (costs off, verbose) insert into t_replicate_dst (i) select i from t_replicate_src;
QUERY PLAN
---------------------------------------------------------------------------------------------
Insert on rpt.t_replicate_dst
-> Broadcast Motion 1:3 (slice1; segments: 1)
Output: ((nextval('t_replicate_dst_id_seq'::regclass))::integer), t_replicate_src.i
-> Seq Scan on rpt.t_replicate_src
Output: nextval('t_replicate_dst_id_seq'::regclass), t_replicate_src.i
Optimizer: Postgres-based planner
Settings: enable_seqscan = 'off'
(7 rows)

explain (costs off, verbose) with s as (select i from t_replicate_src group by i having random() > 0) insert into t_replicate_dst (i) select i from s;
QUERY PLAN
----------------------------------------------------------------------------------------
Insert on rpt.t_replicate_dst
-> Broadcast Motion 1:3 (slice1; segments: 1)
Output: ((nextval('t_replicate_dst_id_seq'::regclass))::integer), "*SELECT*".i
-> Subquery Scan on "*SELECT*"
Output: nextval('t_replicate_dst_id_seq'::regclass), "*SELECT*".i
-> Result
Output: t_replicate_src.i
-> GroupAggregate
Output: t_replicate_src.i
Group Key: t_replicate_src.i
Filter: (random() > '0'::double precision)
-> Sort
Output: t_replicate_src.i
Sort Key: t_replicate_src.i
-> Seq Scan on rpt.t_replicate_src
Output: t_replicate_src.i
Optimizer: Postgres-based planner
Settings: enable_seqscan = 'off'
(18 rows)

insert into t_replicate_dst (i) select i from t_replicate_src;
select distinct id from gp_dist_random('t_replicate_dst') order by id;
id
----
1
2
3
4
5
(5 rows)

-- update & delete
explain (costs off) update t_replicate_volatile set a = 1 where b > random();
ERROR: could not devise a plan (cdbpath.c:2578)
Expand Down Expand Up @@ -1608,6 +1658,182 @@ drop table t1;
drop table t2;
drop table r;
drop table d;
--
-- Check sub-selects with distributed replicated tables and volatile functions
--
create table t (i int) distributed replicated;
create table t1 (a int) distributed by (a);
create table t2 (a int, b float) distributed replicated;
create or replace function f(i int) returns int language sql security definer as $$ select i; $$;
-- ensure we make gather motion when volatile functions in subplan
explain (costs off, verbose) select (select f(i) from t);
QUERY PLAN
-------------------------------------------------------------
Result
Output: $0
InitPlan 1 (returns $0) (slice1)
-> Gather Motion 1:1 (slice2; segments: 1)
Output: (f(t.i))
-> Seq Scan on rpt.t
Output: f(t.i)
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(9 rows)

explain (costs off, verbose) select (select f(i) from t group by f(i));
QUERY PLAN
-------------------------------------------------------------
Result
Output: $0
InitPlan 1 (returns $0) (slice1)
-> Gather Motion 1:1 (slice2; segments: 1)
Output: (f(t.i))
-> GroupAggregate
Output: (f(t.i))
Group Key: (f(t.i))
-> Sort
Output: (f(t.i))
Sort Key: (f(t.i))
-> Seq Scan on rpt.t
Output: f(t.i)
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(15 rows)

explain (costs off, verbose) select (select i from t group by i having f(i) > 0);
QUERY PLAN
-------------------------------------------------------------
Result
Output: $0
InitPlan 1 (returns $0) (slice1)
-> Gather Motion 1:1 (slice2; segments: 1)
Output: t.i
-> GroupAggregate
Output: t.i
Group Key: t.i
Filter: (f(t.i) > 0)
-> Sort
Output: t.i
Sort Key: t.i
-> Seq Scan on rpt.t
Output: t.i
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(16 rows)

-- ensure we do not make broadcast motion
explain (costs off, verbose) select * from t1 where a in (select random() from t where i=a group by i);
QUERY PLAN
-----------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Output: t1.a
-> Seq Scan on rpt.t1
Output: t1.a
Filter: (SubPlan 1)
SubPlan 1
-> Result
Output: random(), t.i
Filter: (t.i = t1.a)
-> Materialize
Output: t.i
-> Broadcast Motion 1:3 (slice2; segments: 1)
Output: t.i
-> Seq Scan on rpt.t
Output: t.i
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(17 rows)

explain (costs off, verbose) select * from t1 where a in (select random() from t where i=a);
QUERY PLAN
-----------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Output: t1.a
-> Seq Scan on rpt.t1
Output: t1.a
Filter: (SubPlan 1)
SubPlan 1
-> Result
Output: random()
Filter: (t.i = t1.a)
-> Materialize
Output: t.i
-> Broadcast Motion 1:3 (slice2; segments: 1)
Output: t.i
-> Seq Scan on rpt.t
Output: t.i
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(17 rows)

-- ensure we make broadcast motion when volatile function in deleting motion flow
explain (costs off, verbose) insert into t2 (a, b) select i, random() from t;
QUERY PLAN
-------------------------------------------------------------
Insert on rpt.t2
-> Broadcast Motion 1:3 (slice1; segments: 1)
Output: t.i, (random())
-> Result
Output: t.i, (random())
-> Seq Scan on rpt.t
Output: t.i, random()
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(9 rows)

-- ensure we make broadcast motion when volatile function in correlated subplan qual
explain (costs off, verbose) select * from t1 where a in (select f(i) from t where i=a and f(i) > 0);
QUERY PLAN
-----------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Output: t1.a
-> Seq Scan on rpt.t1
Output: t1.a
Filter: (SubPlan 1)
SubPlan 1
-> Result
Output: f(t.i)
Stolb27 marked this conversation as resolved.
Show resolved Hide resolved
Filter: (t.i = t1.a)
-> Materialize
Output: t.i
-> Broadcast Motion 1:3 (slice2; segments: 1)
Output: t.i
-> Seq Scan on rpt.t
Output: t.i
Filter: (f(t.i) > 0)
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(18 rows)

-- ensure we do not break broadcast motion
explain (costs off, verbose) select * from t1 where 1 <= ALL (select i from t group by i having random() > 0);
QUERY PLAN
------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Output: t1.a
-> Seq Scan on rpt.t1
Output: t1.a
Filter: (SubPlan 1)
SubPlan 1
-> Materialize
Output: t.i
-> Broadcast Motion 1:3 (slice2; segments: 1)
Output: t.i
-> GroupAggregate
Output: t.i
Group Key: t.i
Filter: (random() > '0'::double precision)
-> Sort
Output: t.i
Sort Key: t.i
-> Seq Scan on rpt.t
Output: t.i
Optimizer: Postgres-based planner
Settings: enable_bitmapscan = 'off', enable_seqscan = 'off'
(21 rows)

drop table if exists t, t1, t2;
drop function if exists f(i int);
-- start_ignore
drop schema rpt cascade;
NOTICE: drop cascades to 7 other objects
Expand Down
Loading