[ADBDEV-6847] Implement cluster validation possibility #1164
base: feature/ADBDEV-6608
Conversation
8d73ee3 to d77e666
This is the first commit for building an MVP for the new rebalance utility - gprebalance. This utility is intended to be used when, after a cluster resize (expand or shrink), the cluster is in an unbalanced state. The balanced state is defined very simply: if the number of segments per host is equal across all hosts, then the cluster is balanced. There are many other aspects of a proper implementation of an optimal rebalance algorithm, which will be implemented in the next patches. This patch adds the skeleton of the future utility, providing initial validation of rebalance possibility. It includes checks that validate some basic aspects: whether segments can be distributed uniformly and whether the target mirroring strategy can be achieved. We decided to provide validation through separate classes, which is a different approach from the gpexpand utility. Also, some unit tests have been added.
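As a rough illustration of that balance definition only (a sketch, not the utility's actual code; the helper name and the list-of-host-names input are assumptions):

from collections import Counter

def is_balanced(primary_host_names):
    # primary_host_names: one host name per primary segment,
    # e.g. ['sdw1', 'sdw1', 'sdw2', 'sdw2'] -> balanced.
    counts = Counter(primary_host_names)
    # Balanced in the MVP sense: every host carries the same number of primaries.
    return len(set(counts.values())) <= 1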
d77e666 to 2d9c93f
I think it's worth adding the ability to run a mirrorless rebalance in silent mode. As I understand it, in the current version we require interactive confirmation from the user. It may be worth adding the -y (answer yes to everything) or -a (auto) option, or giving the option to specify mirrorless as mirror-mode.
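One possible shape for such an option, sketched with optparse (which the utility already uses, judging by the traceback further down); the flag names and the silent destination are assumptions, not the final interface:

from optparse import OptionParser

parser = OptionParser()
# Hypothetical flag: answer "yes" to every confirmation prompt.
parser.add_option('-y', '--yes', dest='silent', action='store_true', default=False,
                  help='do not prompt for confirmation, assume yes for all questions')
options, args = parser.parse_args()

# Any later confirmation would then be gated on options.silent, e.g.:
# if not options.silent and not ask_yesno(...):
#     return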
Resolved review comments (outdated) on the following test data files:
gpMgmt/bin/gppylib/test/unit/data/gprebalance/balanced_no_mirrors.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/balanced_spread.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/unbalanced_to_spread_neg.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/unbalanced_to_spread_pos.array
I discovered the following situation. If one of the segments is down, gprebalance may mistakenly consider the cluster to be balanced. Example:
There are 3 primary segments on the sdw1 host and 1 primary segment on sdw2. If one of the segments on sdw1 goes down, its role is taken over by a mirror on the sdw2 host, so each host ends up with two acting primary segments. In this case, gprebalance will consider the cluster to be balanced. Is this the expected behavior?
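One way to guard against this might be to refuse a balance verdict while any segment is acting outside its preferred role. A minimal sketch, assuming segment objects with illustrative role and preferred_role attributes (not the actual gparray API):

def has_segments_out_of_preferred_role(segments):
    # segments: iterable of objects with hypothetical .role and .preferred_role
    # attributes ('p' for primary, 'm' for mirror).
    return any(seg.role != seg.preferred_role for seg in segments)

# The balance check could then warn or bail out early, for example:
# if has_segments_out_of_preferred_role(all_segments):
#     raise StateValidationError("Cluster is degraded; balance cannot be assessed reliably.")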
If you specify an incorrect argument, the utility will output several stack traces, which does not look good.
gpadmin@cdw:~/gpdb_src/gpMgmt/bin$ ./gprebalance asdasdasdas
20250121:08:06:13:010313 gprebalance:cdw:gpadmin-[ERROR]:-Unknown argument asdasdasdas
Traceback (most recent call last):
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 289, in main
options, args = validate_options(options, args, parser)
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 104, in validate_options
parser.exit(1)
File "/usr/lib/python3.10/optparse.py", line 1559, in exit
sys.exit(status)
SystemExit: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 344, in <module>
main(options, args, parser)
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 339, in main
remove_pid_file(options.coordinator_data_directory)
AttributeError: 'Values' object has no attribute 'coordinator_data_directory'
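One possible way to avoid the second traceback, sketched from the structure visible above (the helper name safe_cleanup is hypothetical; remove_pid_file is the existing function from the traceback):

def safe_cleanup(options):
    # Only attempt PID-file cleanup if option parsing got far enough to
    # populate coordinator_data_directory.
    data_dir = getattr(options, 'coordinator_data_directory', None)
    if data_dir:
        remove_pid_file(data_dir)

# In main(), SystemExit raised by parser.exit()/parser.error() could also be
# re-raised as-is instead of falling into the generic error handler:
# except SystemExit:
#     raise
# except Exception as e:
#     logger.error(str(e))
#     safe_cleanup(options)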
if not gpreb.options.silent and not ask_yesno('', ''' %s Are you sure you want
        to continue with this gprebalance session?''' % str(e), "N"):
In the current version, the indentation on the second line will be included in the final string. The following changes help to avoid this.
Suggested change:
if not gpreb.options.silent and not ask_yesno('', ("%s Are you sure you want "
        "to continue with this gprebalance session?") % str(e), "N"):
if total_primary_segments % total_hosts != 0:
    raise StateValidationError(
        f"Cannot evenly distribute {total_primary_segments} segments across {total_hosts} hosts."
    )
This check is done twice. Do we really need to do it twice? If so, can we extract it into a separate function?
Also, according to the specification, we should suggest that the user run gpresize, gpexpand or gpshrink to bring the cluster to the desired state.
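If the check has to stay in two call sites, a sketch of factoring it out into one helper that also adds the hint required by the specification (the helper name is an assumption; StateValidationError is stubbed here only to keep the sketch self-contained):

class StateValidationError(Exception):
    pass  # stub; the real exception class lives elsewhere in the utility

def validate_even_distribution(total_primary_segments, total_hosts):
    # Single place for the divisibility check, plus the hint to resize first.
    if total_primary_segments % total_hosts != 0:
        raise StateValidationError(
            f"Cannot evenly distribute {total_primary_segments} segments across "
            f"{total_hosts} hosts. Consider using gpresize, gpexpand or gpshrink "
            f"to bring the cluster to a suitable configuration first."
        )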
parser.add_option('-f', '--hosts-file', metavar='<hosts_file>', dest='filename',
                  help='yaml containing target hosts configuration')
Why not --target-hosts, as in the specification?
with open(coordinator_data_directory + '/gprebalance_hosts.yaml', 'w') as fp:
    fp.write(config_yaml)
Why not in the current directory, as gpexpand does, for example? It may also be better to print the full path to the file, not just the directory.
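A minimal sketch of that suggestion (the helper name is hypothetical; writing to the current working directory and logging the absolute path reflect the reviewer's proposal, not the patch's current behaviour):

import os

def write_hosts_config(config_yaml, logger):
    # Write next to where the utility was invoked, similar to gpexpand,
    # and report the full path rather than only the directory.
    path = os.path.abspath('gprebalance_hosts.yaml')
    with open(path, 'w') as fp:
        fp.write(config_yaml)
    logger.info("Target hosts configuration written to %s" % path)
    return path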
Implement cluster validation possibility
This is the first commit for building an MVP for the new rebalance utility -
gprebalance. This utility is intended to be used when, after a cluster resize
(expand or shrink), the cluster is in an unbalanced state. The balanced state
is defined very simply: if the number of segments per host is equal across all
hosts, then the cluster is balanced. There are many other aspects of a proper
implementation of an optimal rebalance algorithm, which will be implemented in
the next patches.
This patch adds the skeleton of the future utility, providing initial validation
of rebalance possibility. It includes checks that validate some basic aspects:
whether segments can be distributed uniformly and whether the target mirroring
strategy can be achieved. We decided to provide validation through separate
classes, which is a different approach from the gpexpand utility. Also, some
unit tests have been added.
Validation of available disk space is not implemented, since it cannot be done
at this initial validation step.
Unit tests can be run from gpMgmt/bin with the make rule:
make unitdevel