
[ADBDEV-6847] Implement cluster validation possibility #1164

Open

wants to merge 18 commits into base: feature/ADBDEV-6608
Conversation

bimboterminator1
Member

@bimboterminator1 bimboterminator1 commented Dec 20, 2024

Implement cluster validation possibility

This is the first commit toward building an MVP of the new rebalance utility,
gprebalance. This utility is intended for the situation when, after a cluster
resize (expand or shrink), the cluster is left in an unbalanced state. The
balanced state is defined very simply: if the number of segments per host is
equal across all hosts, the cluster is balanced. There are many other aspects
of a proper implementation of an optimal rebalance algorithm, which will be
addressed in subsequent patches.

This patch adds the skeleton of the future utility, providing initial
validation of rebalance possibility. It includes checks that validate some
basic aspects: whether segments can be distributed uniformly, and whether the
target mirroring strategy can be achieved. Validation is implemented through
separate classes, which is a different approach from the gpexpand utility.
Some unit tests have also been added. Validation of available disk space is
not implemented, since it cannot be performed at this initial validation step.


Unit tests can be run from gpMgmt/bin via the make unitdevel target.
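The class-based validation approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the names StateValidationError, UniformDistributionCheck, and run_checks are illustrative stand-ins.

```python
# Hypothetical sketch of validation through separate classes; all names
# here are illustrative, not the actual gprebalance API.

class StateValidationError(Exception):
    """Raised when the cluster cannot be brought to a balanced state."""


class UniformDistributionCheck:
    """Validates that primary segments can be spread evenly across hosts."""

    def __init__(self, total_primary_segments, total_hosts):
        self.total_primary_segments = total_primary_segments
        self.total_hosts = total_hosts

    def validate(self):
        if self.total_hosts == 0:
            raise StateValidationError("No target hosts configured.")
        if self.total_primary_segments % self.total_hosts != 0:
            raise StateValidationError(
                f"Cannot evenly distribute {self.total_primary_segments} "
                f"segments across {self.total_hosts} hosts."
            )


def run_checks(checks):
    """Run each validator, collecting failures instead of stopping at the first."""
    errors = []
    for check in checks:
        try:
            check.validate()
        except StateValidationError as e:
            errors.append(str(e))
    return errors
```

Keeping each check in its own class makes it straightforward to unit-test checks in isolation, which matches the testing approach mentioned in the description.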

@bimboterminator1 bimboterminator1 marked this pull request as ready for review December 20, 2024 03:01
@bimboterminator1 bimboterminator1 changed the title [ADBDEV-5482] Implement cluster validation possibility [ADBDEV-6847] Implement cluster validation possibility Dec 23, 2024

@KnightMurloc KnightMurloc left a comment


I think it's worth adding the ability to run a mirrorless rebalance in silent mode. As I understand it, the current version requires interactive confirmation from the user. It may be worth adding a -y (answer yes to everything) or -a (auto) option, or allowing mirrorless to be specified as a mirror mode.
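A minimal sketch of the suggested flag, assuming the utility's optparse-based argument parsing; the dest name silent is an assumption chosen to match the gpreb.options.silent check discussed later in this thread.

```python
from optparse import OptionParser

# Sketch only: option names and the 'silent' dest are assumptions,
# not the actual gprebalance parser definition.
parser = OptionParser()
parser.add_option('-y', '--yes', action='store_true', dest='silent',
                  default=False,
                  help='answer yes to all interactive prompts')

options, args = parser.parse_args(['-y'])
```

With such a flag, the confirmation prompt can simply be skipped whenever options.silent is set.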


@KnightMurloc KnightMurloc left a comment


I discovered the following situation. If one of the segments is down, gprebalance may mistakenly consider the cluster to be balanced. Example:
there are 3 primary segments on host sdw1 and 1 primary segment on sdw2. If one of the segments on sdw1 goes down, its role is taken over by its mirror on sdw2, leaving two acting primary segments on each host. In this case, gprebalance considers the cluster balanced. Is this the expected behavior?

If you specify an incorrect argument, the utility will output several stack traces, which does not look good.

gpadmin@cdw:~/gpdb_src/gpMgmt/bin$ ./gprebalance asdasdasdas
20250121:08:06:13:010313 gprebalance:cdw:gpadmin-[ERROR]:-Unknown argument asdasdasdas
Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 289, in main
    options, args = validate_options(options, args, parser)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 104, in validate_options
    parser.exit(1)
  File "/usr/lib/python3.10/optparse.py", line 1559, in exit
    sys.exit(status)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 344, in <module>
    main(options, args, parser)
  File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 339, in main
    remove_pid_file(options.coordinator_data_directory)
AttributeError: 'Values' object has no attribute 'coordinator_data_directory'
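The second traceback appears because parser.exit() raises SystemExit, and the outer error handler then runs cleanup that touches an option attribute that was never set. One possible way to avoid both stack traces, sketched with illustrative stand-in names (Values, main, remove_pid_file are placeholders, not the PR's actual functions):

```python
# Hypothetical sketch: let SystemExit from argument validation propagate
# untouched, and guard the cleanup attribute access. All names here are
# illustrative stand-ins for the utility's real functions.

class Values:
    """Stand-in for optparse.Values with no attributes set yet."""


def remove_pid_file(datadir):
    pass  # placeholder for the real cleanup


def main(options, args):
    # stands in for the path where validate_options calls parser.exit(1)
    raise SystemExit(1)


def run(options, args):
    try:
        main(options, args)
    except SystemExit:
        raise  # clean exit on bad arguments: no cleanup, no second traceback
    except Exception:
        # run cleanup only if the option was actually parsed
        datadir = getattr(options, 'coordinator_data_directory', None)
        if datadir is not None:
            remove_pid_file(datadir)
        raise
```

With this shape, a bad argument produces only the logged [ERROR] line and a nonzero exit status, instead of two chained tracebacks.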

Comment on lines +215 to +216
if not gpreb.options.silent and not ask_yesno('', ''' %s Are you sure you want
to continue with this gprebalance session?''' % str(e), "N"):


In the current version, the indentation on the second line will be included in the final string. The following change avoids this (note the trailing space after "want", needed so the concatenated pieces don't run together):

Suggested change
if not gpreb.options.silent and not ask_yesno('', ''' %s Are you sure you want
to continue with this gprebalance session?''' % str(e), "N"):
if not gpreb.options.silent and not ask_yesno('', ("%s Are you sure you want "
"to continue with this gprebalance session?") % str(e), "N"):

Comment on lines +75 to +78
if total_primary_segments % total_hosts != 0:
raise StateValidationError(
f"Cannot evenly distribute {total_primary_segments} segments across {total_hosts} hosts."
)


This check is done twice. Do we really need it twice? If so, can we extract it into a separate function?
Also, according to the specification, we should suggest that the user run gpresize, gpexpand, or gpshrink to bring the cluster to the desired state.
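The suggestion above might look like this: one helper shared by both call sites, with the error message extended to point the user at the resize utilities mentioned in the specification. The function name is a hypothetical choice; StateValidationError and the message text follow the check quoted in this thread.

```python
# Sketch of extracting the repeated divisibility check into one helper.
# check_even_distribution is an illustrative name, not the PR's API.

class StateValidationError(Exception):
    pass


def check_even_distribution(total_primary_segments, total_hosts):
    """Raise if primaries cannot be spread evenly across the target hosts."""
    if total_hosts == 0 or total_primary_segments % total_hosts != 0:
        raise StateValidationError(
            f"Cannot evenly distribute {total_primary_segments} segments "
            f"across {total_hosts} hosts. Consider running gpresize, "
            f"gpexpand or gpshrink to bring the cluster to a suitable state."
        )
```

Both call sites would then call check_even_distribution instead of duplicating the modulo test and the message.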

Comment on lines +78 to +79
parser.add_option('-f', '--hosts-file', metavar='<hosts_file>', dest='filename',
help='yaml containing target hosts configuration')


Why not --target-hosts, as in the specification?

Comment on lines +198 to +199
with open(coordinator_data_directory + '/gprebalance_hosts.yaml', 'w') as fp:
fp.write(config_yaml)


Why not in the current directory, as gpexpand does, for example? It may also be better to print the full path to the file, not just the directory.
