-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADBDEV-4603: Allow multiple params in the child nodes of <dxl:TestExpr> #1140
base: adb-6.x-dev
Are you sure you want to change the base?
Conversation
Problem: Subqueries using test expr may result in an "attribute not found in the projection list" error. Solution: Define the necessary parameters and support nested processing of the test expr tree. Changes: Adds parameter definition for test expression in exprToDXL step for CorrelatedInLeftSemiNLJoin. Reworked test expr handler in DXLtoPlStmt step.
Added a separate function to get projected columns from the Result node. PdxlRestictRestul returned to its original state.
Allure report https://allure.adsw.io/launch/90283 |
|
||
const CDXLColRefArray *outer_refs = dxlop->GetDxlOuterColRefsArray(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated line removed
@@ -2020,6 +1962,7 @@ CTranslatorDXLToScalar::TranslateDXLScalarIdentToScalar( | |||
dxlop->GetDXLColRef()->Id())) | |||
{ | |||
// not an outer ref -> Translate var node | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated line added
@@ -3089,7 +3089,7 @@ CTranslatorExprToDXL::PdxlnRestrictResult(CDXLNode *dxlnode, | |||
// | |||
// @doc: | |||
// Helper to build a Result expression with project list | |||
// restricted to required columns | |||
// restricted to required columns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated comment changed
@@ -61,20 +62,25 @@ class CDXLScalarSubPlan : public CDXLScalar | |||
EdxlSubPlanType m_dxl_subplan_type; | |||
|
|||
// test expression -- not null if quantified/existential subplan | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated line added
|
||
// private copy ctor | ||
CDXLScalarSubPlan(CDXLScalarSubPlan &); | ||
|
||
public: | ||
// ctor/dtor | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated line added
src/test/regress/sql/subselect.sql
Outdated
@@ -614,3 +614,32 @@ drop table tl1; | |||
drop table tl2; | |||
drop table tl3; | |||
drop table tl4; | |||
|
|||
-- Check deep tree support in Test expression node when outer params are present with =ANY or IN queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment mentions =ANY queries, but only IN queries are actually tested. Maybe add tests for =ANY?
src/test/regress/sql/subselect.sql
Outdated
ANALYZE test_tbl_t; | ||
|
||
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM test_tbl_t WHERE | ||
(int_t IN (SELECT int_d FROM test_tbl_d)) IN (SELECT int_p = 1 from test_tbl_p) ORDER BY int_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this second part of the test? It produces a slightly different plan without the patch, but the plan seems valid and no GPORCA errors are present.
QUERY PLAN
----------------------------------------------------------------------------------
Sort
Output: test_tbl_t.text_t, test_tbl_t.int_t
Sort Key: test_tbl_t.int_t
-> Result
Output: test_tbl_t.text_t, test_tbl_t.int_t
Filter: (SubPlan 2)
-> Gather Motion 3:1 (slice1; segments: 3)
Output: test_tbl_t.text_t, test_tbl_t.int_t
-> Seq Scan on public.test_tbl_t
Output: test_tbl_t.text_t, test_tbl_t.int_t
SubPlan 2
-> Result
Output: true
Filter: (((hashed SubPlan 1)) = ((test_tbl_p.int_p = 1)))
-> Result
Output: (test_tbl_p.int_p = 1), (hashed SubPlan 1)
-> Materialize
Output: test_tbl_p.int_p
-> Gather Motion 3:1 (slice2; segments: 3)
Output: test_tbl_p.int_p
-> Seq Scan on public.test_tbl_p
Output: test_tbl_p.int_p
SubPlan 1
-> Result
Output: test_tbl_d.int_d
-> Materialize
Output: test_tbl_d.int_d
-> Gather Motion 3:1 (slice3; segments: 3)
Output: test_tbl_d.int_d
-> Seq Scan on public.test_tbl_d
Output: test_tbl_d.int_d
Optimizer: GPORCA
(32 rows)
CREATE TABLE test_tbl_p (text_p text, int_p int) DISTRIBUTED BY (int_p); | ||
INSERT INTO test_tbl_p VALUES ('0', 0), ('0', 1); | ||
|
||
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM test_tbl_t WHERE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear that falling back to Postgres optimizer is a fail condition, maybe add a comment that GPORCA should be able to handle this successfully? Or use set client_min_messages = log
the way it's used in the ticket, so that the error is clearly visible in the diff
Typo: "physical plane" |
Removed lines; Added "optimizer_trace_fallback" in test; Changed test output.
Previously, the projection of the internal node was redefined only for EopPhysicalCorrelatedLeftOuterNLJoin, but not for EopPhysicalCorrelatedInLeftSemiNLJoin and EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin. This was due (most likely) to the fact that only for EopPhysicalCorrelatedLeftOuterNLJoin the <dxl:TestExpr> handler could correctly process the left node of <dxl:TestExpr> with one parameter. Since the TestExpr handler can now support trees in the left node with multiple parameters, the projection can be unified for the rest of the CorrelatedNLJoins. The logic for determining the projection of the columns of the internal node remains the same. Assert with the dxl_subplan_type definition remains the same.
Allure report https://allure.adsw.io/launch/92385 |
Allure report https://allure.adsw.io/launch/92429 |
Allure report https://allure.adsw.io/launch/92502 |
Allure report https://allure.adsw.io/launch/92514 |
Allow multiple params in the child nodes of <dxl:TestExpr> (#1140)
Problem definition:
Some subqueries (example in the test) could output error "Attribute number
... not found in project list". This happened in the DXL to Plan Statement
stage because the attribute was not found in the list of column references
(CMappingColIdVar) when processing the left node of <dxl:TestExpr>.
<dxl:TestExpr> is part of the <dxl:SubPlan> node and the TestExpr acts
as a filter when a tuple is returned from SubPlan at execution
(see ExecScanSubPlan()).
<dxl:TestExpr> is processed by TranslateDXLSubplanTestExprToScalar(). The
left node was processed depending on the flag "has_outer_refs". If it was true,
the TranslateDXLTestExprScalarIdentToExpr() was called to read one <dxl:Ident>
and then convert it to a param. If it was false, the function of handling the
DXL tree was called without taking the param into account. In our case, the
"has_outer_refs" flag was false.
"has_outer_refs" was defined at the DXL generation stage in
PdxlnQuantifiedSubplan(). The params of TestExpr are the common columns of
the scalar and the inner part of Correlated Join node (see the physical plan).
In fact, the flag determined the presence of one param in the left node of
TestExpr. As it turned out, the left node can contain more than one param,
so the defenition of "has_outer_refs" was fixed.
However, another error occurred - "Unsupported subplan test expression"
since the left node contained a tree, but
TranslateDXLSubplanTestExprToScalar() only supported id and cast(id).
Therefore, a tree processing function was used to process the left node of
TestExpr. But in this case, it is impossible to get the TestExpr param
(it is read directly from node). Therefore, a list of params was used instead
of the "has_outer_refs" flag and params were pre-added to the list of column
references(CMappingColIdVar).
Problem summary:
The <dxl:TestExpr> handler could output errors:
"Attribute number ... not found in project list" and "Unsupported subplan
test expression".
The first error occurred if a reference to an attribute processed as a TestExpr
param (don't confuse with subplan params <dxl:ParamList>) was not found. This
occurred due to incorrect detection of the presence of TestExpr params
(m_outer_param flag) at the DXL generation stage in the
PdxlnQuantifiedSubplan function and the lack of adding TestExpr params to the
list of references to attributes (CMappingColIdVar) at the <dxl:TestExpr>
transforming stage.
The second error occurred if the left node dxl:TestExpr contained a tree with
the TestExpr param, since only id and cast(id) were supported (the param was
read directly from the node).
See the minidump for more information, but keep in mind that the TestExpr param
presence flag (m_outer_param) is not serialized.
Support for one param in the left node TestExpr was added in ac6ce1a.
Possible solutions:
The "Unsupported subplan test expression" error is fixed by using the
TranslateDXLToScalar function, which allows processing the tree in the child
nodes of TestExpr (mostly the left node). But in this case, it is impossible
to get the TestExpr param (currently it is read directly from node). It is
impossible to determine whether this is a param or a Var (there are no
markers indicating this), if there is more than one attribute in the
tree - this will cause the error "Attribute number ... not found in project
list". Therefore, the TestExpr params must be obtained in a separate structure
and added in advance to the list of references to attributes
(CMappingColIdVar). Defining the TestExpr params and adding them to the
CMappingColIdVar before handling DXL fixes the "Attribute number ... not
found in project list" error, but since the left node may contain a tree,
this will cause the "Unsupported subplan test expression" error. Therefore,
these errors must be fixed simultaneously.
Solution:
stage;
Plan Statment stage (prefilling CMappingColIdVar);
Changes:
At the DXL generation stage (in the PdxlnQuantifiedSubplan function),
the definition of the TestExpr params for PhysicalCorrelated***NLJoin
nodes has been unified. Instead of the "has_outer_refs" flag, the presence
of TestExpr params is now determined by the array of columns -
m_test_expr_params (passed via the CDXLScalarSubPlan class, not serialized).
At the DXL to Plan Statement stage in the subplan handler, a copy of the
mapping (list of references to columns) of the subplan is created to
handle <dxl:TestExpr>, including the TestExpr params (obtained from
m_test_expr_params). We use the copy because we do not want the TestExpr
params to be mixed with the subplan params.
In the <dxl:TestExpr> handler, the TranslateDXLToScalar function is
used instead of TranslateDXLTestExprScalarIdentToExpr for the left node,
which allows processing trees.
For the right node <dxl:TestExpr>, TranslateDXLToScalar is also used for
unification purposes. The TranslateDXLTestExprScalarIdentToExpr function
has been removed as unnecessary.