forked from greenplum-db/gpdb-archive
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT - Move Orca to a separate Postgres extension #1111
Draft
whitehawk
wants to merge
26
commits into
adb-7.2.0
Choose a base branch
from
feature/ADBDEV-6552
base: adb-7.2.0
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This patch adds a skeleton for a new extension gpcontrib/gp_orca with empty _PG_init and _PG_fini. The extension will be filled in consequent commits. Plus this patch updates 'shared_preload_libraries' value in the postgresql.conf.sample file if ORCA is enabled.
GPMemoryProtect_TrackStartupMemory() contained compile-time dependency on Orca. It prevented moving of all Orca's code into a shared lib and making its work transparent to the GPDB core. This patch reworks the logic of the GPMemoryProtect_TrackStartupMemory(). Now extension should call GPMemoryProtect_RequestAddinStartupMemory() at its _PG_init(), if the extension affects the per-process startup committed memory. Such call is added to the Orca's _PG_init(). Orca related code is removed from the GPMemoryProtect_TrackStartupMemory().
Problem: Orca's code was strongly coupled with the standard_planner() code. It introduced difficulties for moving Orca's functionality into a shared lib. This patch: 1. moves the invocation of Orca planner out from the standard_planner() into a hook in gp_orca shared lib; 2. moves the init procedure of Orca to the first call of the hook (thus the init is done when the backend has already been initialized including its memory protection means); 3. moves the deinit procedure of Orca into a callback, that is registered at the init (thus the deinit is done right before ShutdownPostgres() is called, where it was before). 4. adds clang-format config file
Problem: Explain component had compile-time dependencies on Orca. It introduced difficulties for moving Orca's functionality into a shared lib. This patch: 1. Adds explain hook implementation to the gp_orca extension. It handles DXL output format of explain command. 2. Removes all Orca related code from explain component in the GPDB core. 3. Adds function standard_ExplainOneQuery() that now contains the default explain code (thus it can be invoked from the hook). 4. Removes 'planGen' field from 'PlannedStmt' structure. Its type was enum, and it had only 2 values: Planner and Optimizer (for Postgres planner and Orca planner respectively). It was not suitable for a general planner plugged in from an extension. 5. Adds 'plannerName' field to 'PlannedStmt' structure. It is now used to output the name of the external planner, if it is used. 6. Updates 'compute_jit_flags()' due to changes above. Now gp_orca extension defines its own implementation for computing JIT parameters. Orca related logic for JIT parameters is removed from the core. 7. Adds new ABI ignore file, as the 'PlannedStmt' structure is updated.
…ce (#1126) Problem: Functions in gp_optimizer_functions.c had a compile-time dependency on Orca. It introduced difficulties for moving Orca's functionality into a shared lib. This patch: 1. Moves implementation of 'DisableXform()', 'EnableXform()', 'LibraryVersion()' into gp_orca shared lib. 2. Updates implementation of wrapper functions 'enable_xform()', 'disable_xform()', 'gp_opt_version()'. Now they try to load the underlying functions from a shared lib. Attempt to load a symbol from a shared lib is wrapped into PG_TRY & PG_CATCH (with all necessary shenanigans), as the lib may be missing. In this case we catch the error, and show an appropriate message. 3. Updates tests due to change of the message if Orca is not enabled. Note: we do not move the wrapper's implementation into gp_orca extension, as they are already a part of the system catalog. And we'd like to avoid changing the system catalog.
Problem: File 'guc_gp.c' contained a compile-time dependency on Orca, related to 'optimizer' GUC. It prevented moving of all Orca's code into a shared lib and making its work transparent to the GPDB core. This patch: 1. Sets the default value for 'optimizer' GUC to 'false'. 2. Adds enabling of 'optimizer' GUC in gp_orca extension shared lib. 3. Updates check in 'check_optimizer()' function. Now it utilizes runtime check of planner_hook presence to distinguish if an external planner is available or not. Note: the semantic of 'optimizer' GUC has changed. Previously it was used solely to enable & disable Orca. Now it is used to enable & disable any general external planner. The name of GUC hasn't been changed, because too many places in code and infrastructure rely on it, and we aim not to blow up the size of the patch.
Decouple ORCA from the pg_hint_plan extension. This change moves the definition of plan_hint_hook from ORCA, which is going to be an extension to the planner. Although this hook is currently used only in ORCA, it makes it possible to use it with any planner that needs to get a hint list from an extension, such as pg_hint_plan. Additionally, we make this hook, plan_hint_hook, typed and return a pointer to HintState, as it actually does. There is no reason to keep it generic and returning void *.
…1135) Problem: If a session was started with disabled 'optimizer' GUC, and EnableXform() or DisableXform() was called from a query, the query failed with a crash. The issue was found on 'rowhints' test. Cause: The functions mentioned above tried to access internal Orca's structures, that were not initialized, as the init was done during first planning only if 'optimizer' was on. Fix: Init Orca when planner is invoked first time regardless the value of 'optimizer'. Note: One can think that it is better to perform Orca init somewhen at the backend init. But we'd like to keep Orca init after GPMemoryProtect_Init() (as it was before Orca moved to an extension). And there is no existing standard hook to call after GPMemoryProtect_Init() and before PostgresMain() enters its main loop. Thus it is where it is. No new tests are added, as existing 'rowhints' test can catch this issue.
This patch renames gp_orca shared lib into orca, as we'd like to avoid using any 'gp' prefix now. Also it updates file names and internal namings to avoid the prefix.
Problem: 'explain_format' test failed if 'optimizer' GUC was set to 'on'. 'auto_explain' test failed with any 'optimizer' GUC value. Cause: Explain command outputs values of GUCs that affect planning if their value differ from the default. Default value for 'optimizer' GUC has been changed to 'off' recently. So, the output of the tests has been changed as well. Fix: The patch updates matchsubs in 'explain_format' test to exclude handling of 'optimizer' GUC value and introduces respective updates in the files with expected output. For 'auto_explain' test, it just updates the files with expected output.
Problem description: Tests for 'pg_stat_statements', launched with 'optimizer=on', failed with error: "FATAL: External planner is not registered". Root cause: Setup and teardown steps of 'pg_stat_statements' configured values of 'shared_preload_libraries' without taking into account its previous value. Thus, it was set to 'pg_stat_statements' at test start, and to empty string after the tests. And 'orca' lib was lost. It caused the error. Fix: Now setup and teardown steps take into account previous value of 'shared_preload_libraries'.
As a part of the bigger task of separating Orca into a Postgres extension, this patch: 1. Moves 'src/backend/gporca/' and 'src/backend/gpopt/' folders with source files into the extension. 2. Moves related headers into the extension. 3. Moves 'src/backend/optimizer/plan/orca.c' into the extension. As the extension already contains 'orca.c' file with hooks implementation, old 'orca.c' is renamed to 'orca_entry.c'. 4. All obj file names of newly added files are added into OBJS of 'gpcontrib/orca/Makefile'. It is mandated by the build infrastructure for extensions (called PGXS). Old makefiles in 'gporca' and 'gpopt' are removed, as no more needed. But CMake files are preserved, allowing to launch 'gporca' unit tests. 5. Updates ABI ignore files, as some symbols are now missing in the core. 6. Moves OptimizerMemoryContext into the extension, as now it is referenced only by the code in the extension. 7. Updates 'fmt' and 'tidy' tools and moves them into the extension, as they were used only for Orca code. Note: 'src/tools/vagrant' also contains some references to Orca code, but it looks too obsolete to be workable (as it references the old gporca git repo, when Orca wasn't a part of the core). So it is left unchanged. 8. Updates Readme and infrastructure files due to all changes above.
Problem: After recent changes related to moving Orca into a shared lib, 'gpconfig' behave test started to fail. Cause: Fail could be reproduced with the following commands: "gpconfig -c optimizer -v on" "gpstop -ar" 'gpconfig' alters 'postgresql.conf' by adding 'optimizer = on' there. So, the issue appeared when we tried to set 'optimizer' GUC from the 'postgresql.conf' file. At this moment shared preload libraries are yet not processed, so the planner hook is yet not registered. It led to an error in 'check_optimizer()' validation function when setting 'optimizer'. Fix: In case planner hook is not registered, issue an error in 'check_optimizer()' only if source is more than PGC_S_ARGV. So, for cases when 'optimizer' is set from the config file, env variable or postmaster's cmd argument, we allow setting 'optimizer' to 'on' even if planner hook is yet not registered, as it can't be registered at this stage, only later.
USE_ORCA define is no longer used. This patch removes generation of this define. File 'configure.in' is modified manually. Other files are re-generated by 'autoconf' and 'autoheader' tools.
Move Orca's GUCs into the extension This patch: 1. Moves all GUCs, that are related to Orca and used inside it, into Orca extension. 2. Updates minirepro utility tool. This tool generates a detailed information related to the monitored SQL commands being executed, including non-default GUCs. The patch updates the query used to retrieve non-default GUCs. The update is needed, as, after moving the "optimizer_*" GUCs, their 'category' in 'pg_settings' table had been changed to 'Customized Options' value. Without this change the respective regression test 'minirepro' fails. 3. Adds all the touched GUC names into the file with ignored symbols for ABI check, as they no longer exist in the main executable. 4. Removes GUC 'optimizer_partition_selection_log' completely, as it is not used anywhere. Note. Following GUCs were not moved, as they are used inside the core: - 'optimizer_enable_query_parameter' - 'optimizer_analyze_root_partition' - 'optimizer_analyze_midlevel_partition' - 'optimizer_replicated_table_insert' - 'optimizer' - 'optimizer_control'
This partially reverts commit 341d342: it returns back all GUCs into the core (except 'optimizer_partition_selection_log', which is not used anywhere), but preserves GUC infrastructure inside the extension and preserves changes in minirepro utility tool. Reason: moving the GUCs into the extension has broken the ability to set them in 'postgresql.conf' file. It is so, because the config file is parsed first time before the shared libs are loaded. Thus, if any of the moved GUCs was set in the config file, parsing failed with error "unrecognized configuration parameter". GPDB can handle GUCs from extensions properly, if the GUCs have a qualified name (e.g. "<extension_name>.<guc_name>"). So, in order to move the Orca related GUCs into the extension, we need to rename them in a way "optimizer_guc_name" -> "orca.optimizer_guc_name". That will break backward compatibility, which we want to avoid in current release. Therefore, we leave Orca related GUCs inside the core in the current release, and we'll move them along with the name updating in the next major release.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TDB. Currently this PR is for CI only to test accumulated changes in the feature branch.