Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-4603: Allow multiple params in the child nodes of <dxl:TestExpr> #1140

Open
wants to merge 10 commits into
base: adb-6.x-dev
Choose a base branch
from

Conversation

mos65o2
Copy link

@mos65o2 mos65o2 commented Dec 3, 2024

Allow multiple params in the child nodes of <dxl:TestExpr> (#1140)

Problem definition:
Some subqueries (example in the test) could output error "Attribute number
... not found in project list". This happened in the DXL to Plan Statement
stage because the attribute was not found in the list of column references
(CMappingColIdVar) when processing the left node of <dxl:TestExpr>.
<dxl:TestExpr> is part of the <dxl:SubPlan> node and the TestExpr acts
as a filter when a tuple is returned from SubPlan at execution
(see ExecScanSubPlan()).
<dxl:TestExpr> is processed by TranslateDXLSubplanTestExprToScalar(). The
left node was processed depending on the flag "has_outer_refs". If it was true,
the TranslateDXLTestExprScalarIdentToExpr() was called to read one <dxl:Ident>
and then convert it to a param. If it was false, the function of handling the
DXL tree was called without taking the param into account. In our case, the
"has_outer_refs" flag was false.
"has_outer_refs" was defined at the DXL generation stage in
PdxlnQuantifiedSubplan(). The params of TestExpr are the common columns of
the scalar and the inner part of Correlated Join node (see the physical plan).
In fact, the flag determined the presence of one param in the left node of
TestExpr. As it turned out, the left node can contain more than one param,
so the defenition of "has_outer_refs" was fixed.
However, another error occurred - "Unsupported subplan test expression"
since the left node contained a tree, but
TranslateDXLSubplanTestExprToScalar() only supported id and cast(id).
Therefore, a tree processing function was used to process the left node of
TestExpr. But in this case, it is impossible to get the TestExpr param
(it is read directly from node). Therefore, a list of params was used instead
of the "has_outer_refs" flag and params were pre-added to the list of column
references(CMappingColIdVar).

Problem summary:
The <dxl:TestExpr> handler could output errors:
"Attribute number ... not found in project list" and "Unsupported subplan
test expression".
The first error occurred if a reference to an attribute processed as a TestExpr
param (don't confuse with subplan params <dxl:ParamList>) was not found. This
occurred due to incorrect detection of the presence of TestExpr params
(m_outer_param flag) at the DXL generation stage in the
PdxlnQuantifiedSubplan function and the lack of adding TestExpr params to the
list of references to attributes (CMappingColIdVar) at the <dxl:TestExpr>
transforming stage.
The second error occurred if the left node dxl:TestExpr contained a tree with
the TestExpr param, since only id and cast(id) were supported (the param was
read directly from the node).
See the minidump for more information, but keep in mind that the TestExpr param
presence flag (m_outer_param) is not serialized.
Support for one param in the left node TestExpr was added in ac6ce1a.

Possible solutions:
The "Unsupported subplan test expression" error is fixed by using the
TranslateDXLToScalar function, which allows processing the tree in the child
nodes of TestExpr (mostly the left node). But in this case, it is impossible
to get the TestExpr param (currently it is read directly from node). It is
impossible to determine whether this is a param or a Var (there are no
markers indicating this), if there is more than one attribute in the
tree - this will cause the error "Attribute number ... not found in project
list". Therefore, the TestExpr params must be obtained in a separate structure
and added in advance to the list of references to attributes
(CMappingColIdVar). Defining the TestExpr params and adding them to the
CMappingColIdVar before handling DXL fixes the "Attribute number ... not
found in project list" error, but since the left node may contain a tree,
this will cause the "Unsupported subplan test expression" error. Therefore,
these errors must be fixed simultaneously.

Solution:

  1. Define the necessary columns as TestExpr params at the DXL generating
    stage;
  2. Add references to attributes processed as TestExpr params at the DXL to
    Plan Statment stage (prefilling CMappingColIdVar);
  3. Add support for processing a tree with params in the left node TestExpr.

Changes:
At the DXL generation stage (in the PdxlnQuantifiedSubplan function),
the definition of the TestExpr params for PhysicalCorrelated***NLJoin
nodes has been unified. Instead of the "has_outer_refs" flag, the presence
of TestExpr params is now determined by the array of columns -
m_test_expr_params (passed via the CDXLScalarSubPlan class, not serialized).
At the DXL to Plan Statement stage in the subplan handler, a copy of the
mapping (list of references to columns) of the subplan is created to
handle <dxl:TestExpr>, including the TestExpr params (obtained from
m_test_expr_params). We use the copy because we do not want the TestExpr
params to be mixed with the subplan params.
In the <dxl:TestExpr> handler, the TranslateDXLToScalar function is
used instead of TranslateDXLTestExprScalarIdentToExpr for the left node,
which allows processing trees.
For the right node <dxl:TestExpr>, TranslateDXLToScalar is also used for
unification purposes. The TranslateDXLTestExprScalarIdentToExpr function
has been removed as unnecessary.

@mos65o2 mos65o2 changed the title ADBDEV-4603: Fix attribute not found in project list subquery with ORCA ADBDEV-4603: Fix ORCA attribute not found in project list subquery Dec 3, 2024
@arenadata arenadata deleted a comment from BenderArenadata Dec 4, 2024
Problem:
Subqueries using test expr may result in an "attribute not found in the
projection list" error.

Solution:
Define the necessary parameters and support nested processing of the test
expr tree.

Changes:
Adds parameter definition for test expression in exprToDXL step for
CorrelatedInLeftSemiNLJoin. Reworked test expr handler in DXLtoPlStmt step.
@mos65o2 mos65o2 changed the title ADBDEV-4603: Fix ORCA attribute not found in project list subquery ADBDEV-4603: Fix ORCA attribute not found in project list Dec 25, 2024
@arenadata arenadata deleted a comment from BenderArenadata Dec 25, 2024
@arenadata arenadata deleted a comment from BenderArenadata Dec 25, 2024
@arenadata arenadata deleted a comment from BenderArenadata Dec 25, 2024
@mos65o2 mos65o2 marked this pull request as ready for review December 26, 2024 14:14
Added a separate function to get projected columns from the Result node.
PdxlRestictRestul returned to its original state.
@mos65o2 mos65o2 marked this pull request as draft December 27, 2024 17:23
@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/90283

@arenadata arenadata deleted a comment from BenderArenadata Dec 28, 2024
@arenadata arenadata deleted a comment from BenderArenadata Dec 28, 2024
@mos65o2 mos65o2 marked this pull request as ready for review December 28, 2024 10:32

const CDXLColRefArray *outer_refs = dxlop->GetDxlOuterColRefsArray();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated line removed

@@ -2020,6 +1962,7 @@ CTranslatorDXLToScalar::TranslateDXLScalarIdentToScalar(
dxlop->GetDXLColRef()->Id()))
{
// not an outer ref -> Translate var node

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated line added

@@ -3089,7 +3089,7 @@ CTranslatorExprToDXL::PdxlnRestrictResult(CDXLNode *dxlnode,
//
// @doc:
// Helper to build a Result expression with project list
// restricted to required columns
// restricted to required columns.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated comment changed

@@ -61,20 +62,25 @@ class CDXLScalarSubPlan : public CDXLScalar
EdxlSubPlanType m_dxl_subplan_type;

// test expression -- not null if quantified/existential subplan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated line added


// private copy ctor
CDXLScalarSubPlan(CDXLScalarSubPlan &);

public:
// ctor/dtor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated line added

@@ -614,3 +614,32 @@ drop table tl1;
drop table tl2;
drop table tl3;
drop table tl4;

-- Check deep tree support in Test expression node when outer params are present with =ANY or IN queries

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment mentions =ANY queries, but only IN queries are actually tested. Maybe add tests for =ANY?

ANALYZE test_tbl_t;

EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM test_tbl_t WHERE
(int_t IN (SELECT int_d FROM test_tbl_d)) IN (SELECT int_p = 1 from test_tbl_p) ORDER BY int_t;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this second part of the test? It produces a slightly different plan without the patch, but the plan seems valid and no GPORCA errors are present.

                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Sort
   Output: test_tbl_t.text_t, test_tbl_t.int_t
   Sort Key: test_tbl_t.int_t
   ->  Result
         Output: test_tbl_t.text_t, test_tbl_t.int_t
         Filter: (SubPlan 2)
         ->  Gather Motion 3:1  (slice1; segments: 3)
               Output: test_tbl_t.text_t, test_tbl_t.int_t
               ->  Seq Scan on public.test_tbl_t
                     Output: test_tbl_t.text_t, test_tbl_t.int_t
         SubPlan 2
           ->  Result
                 Output: true
                 Filter: (((hashed SubPlan 1)) = ((test_tbl_p.int_p = 1)))
                 ->  Result
                       Output: (test_tbl_p.int_p = 1), (hashed SubPlan 1)
                       ->  Materialize
                             Output: test_tbl_p.int_p
                             ->  Gather Motion 3:1  (slice2; segments: 3)
                                   Output: test_tbl_p.int_p
                                   ->  Seq Scan on public.test_tbl_p
                                         Output: test_tbl_p.int_p
                       SubPlan 1
                         ->  Result
                               Output: test_tbl_d.int_d
                               ->  Materialize
                                     Output: test_tbl_d.int_d
                                     ->  Gather Motion 3:1  (slice3; segments: 3)
                                           Output: test_tbl_d.int_d
                                           ->  Seq Scan on public.test_tbl_d
                                                 Output: test_tbl_d.int_d
 Optimizer: GPORCA
(32 rows)

CREATE TABLE test_tbl_p (text_p text, int_p int) DISTRIBUTED BY (int_p);
INSERT INTO test_tbl_p VALUES ('0', 0), ('0', 1);

EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM test_tbl_t WHERE

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear that falling back to Postgres optimizer is a fail condition, maybe add a comment that GPORCA should be able to handle this successfully? Or use set client_min_messages = log the way it's used in the ticket, so that the error is clearly visible in the diff

@andr-sokolov
Copy link
Member

Typo: "physical plane"

@mos65o2 mos65o2 changed the title ADBDEV-4603: Fix ORCA attribute not found in project list ADBDEV-4603: Allow multiple params in the child nodes <dxl:TestExpr> Jan 22, 2025
@mos65o2 mos65o2 changed the title ADBDEV-4603: Allow multiple params in the child nodes <dxl:TestExpr> ADBDEV-4603: Allow multiple params in the child nodes of <dxl:TestExpr> Jan 30, 2025
Removed lines;
Added "optimizer_trace_fallback" in test;
Changed test output.
Previously, the projection of the internal node was redefined only for
EopPhysicalCorrelatedLeftOuterNLJoin, but not for
EopPhysicalCorrelatedInLeftSemiNLJoin and
EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin. This was due (most likely)
to the fact that only for EopPhysicalCorrelatedLeftOuterNLJoin the
<dxl:TestExpr> handler could correctly process the left node of
<dxl:TestExpr> with one parameter.
Since the TestExpr handler can now support trees in the left node with
multiple parameters, the projection can be unified for the rest of the CorrelatedNLJoins.
The logic for determining the projection of the columns of the internal node remains the
same. Assert with the dxl_subplan_type definition remains the same.
@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/92385

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/92429

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/92502

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/92514

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants