[ADBDEV-6847] Implement cluster validation possibility #1164
base: feature/ADBDEV-6608
Conversation
8d73ee3 to d77e666
This is the first commit for building an MVP for the new rebalance utility - gprebalance. This utility is intended to be used when, after a cluster resize (expand or shrink), the cluster is in an unbalanced state. The balanced state is defined very simply: if the number of segments per host is equal across all hosts, then the cluster is balanced. There are many other aspects of a proper implementation of an optimal rebalance algorithm, which will be implemented in the next patches. This patch adds the skeleton of the future utility, providing initial validation of rebalance possibility. It includes checks that validate some basic aspects: whether segments can be distributed uniformly and whether the target mirroring strategy can be achieved. We decided to provide validation through separate classes, which is a different approach from the gpexpand utility. Also, some unit tests have been added.
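As a rough illustration of that balance definition only (a sketch, not the utility's actual code; the helper name and the list-of-host-names input are assumptions):

from collections import Counter

def is_balanced(primary_host_names):
    # primary_host_names: one host name per primary segment,
    # e.g. ['sdw1', 'sdw1', 'sdw2', 'sdw2'] -> balanced.
    counts = Counter(primary_host_names)
    # Balanced in the MVP sense: every host carries the same number of primaries.
    return len(set(counts.values())) <= 1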
d77e666 to 2d9c93f
I think it's worth adding the ability to run a mirrorless rebalance in silent mode. As I understand it, in the current version we require interactive confirmation from the user. It may be worth adding the -y (answer yes to everything) or -a (auto) option, or giving the option to specify mirrorless as mirror-mode.
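One possible shape for such an option, sketched with optparse (which the utility already uses, judging by the traceback further down); the flag names and the silent destination are assumptions, not the final interface:

from optparse import OptionParser

parser = OptionParser()
# Hypothetical flag: answer "yes" to every confirmation prompt.
parser.add_option('-y', '--yes', dest='silent', action='store_true', default=False,
                  help='do not prompt for confirmation, assume yes for all questions')
options, args = parser.parse_args()

# Any later confirmation would then be gated on options.silent, e.g.:
# if not options.silent and not ask_yesno(...):
#     return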
Resolved review comments (outdated) on the following test data files:
gpMgmt/bin/gppylib/test/unit/data/gprebalance/balanced_no_mirrors.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/balanced_spread.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/unbalanced_to_spread_neg.array
gpMgmt/bin/gppylib/test/unit/data/gprebalance/unbalanced_to_spread_pos.array
I discovered the following situation. If one of the segments is down, gprebalance may mistakenly consider the cluster to be balanced. Example:
There are 3 primary segments on the sdw1 host and 1 primary segment on sdw2. If one of the segments on sdw1 goes down, its role is taken over by a mirror on the sdw2 host, so each host ends up with two acting primary segments. In this case, gprebalance will consider the cluster to be balanced. Is this the expected behavior?
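One way to guard against this might be to refuse a balance verdict while any segment is acting outside its preferred role. A minimal sketch, assuming segment objects with illustrative role and preferred_role attributes (not the actual gparray API):

def has_segments_out_of_preferred_role(segments):
    # segments: iterable of objects with hypothetical .role and .preferred_role
    # attributes ('p' for primary, 'm' for mirror).
    return any(seg.role != seg.preferred_role for seg in segments)

# The balance check could then warn or bail out early, for example:
# if has_segments_out_of_preferred_role(all_segments):
#     raise StateValidationError("Cluster is degraded; balance cannot be assessed reliably.")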
If you specify an incorrect argument, the utility will output several stack traces, which does not look good.
gpadmin@cdw:~/gpdb_src/gpMgmt/bin$ ./gprebalance asdasdasdas
20250121:08:06:13:010313 gprebalance:cdw:gpadmin-[ERROR]:-Unknown argument asdasdasdas
Traceback (most recent call last):
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 289, in main
options, args = validate_options(options, args, parser)
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 104, in validate_options
parser.exit(1)
File "/usr/lib/python3.10/optparse.py", line 1559, in exit
sys.exit(status)
SystemExit: 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 344, in <module>
main(options, args, parser)
File "/home/gpadmin/gpdb_src/gpMgmt/bin/./gprebalance", line 339, in main
remove_pid_file(options.coordinator_data_directory)
AttributeError: 'Values' object has no attribute 'coordinator_data_directory'
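One possible way to avoid the second traceback, sketched from the structure visible above (the helper name safe_cleanup is hypothetical; remove_pid_file is the existing function from the traceback):

def safe_cleanup(options):
    # Only attempt PID-file cleanup if option parsing got far enough to
    # populate coordinator_data_directory.
    data_dir = getattr(options, 'coordinator_data_directory', None)
    if data_dir:
        remove_pid_file(data_dir)

# In main(), SystemExit raised by parser.exit()/parser.error() could also be
# re-raised as-is instead of falling into the generic error handler:
# except SystemExit:
#     raise
# except Exception as e:
#     logger.error(str(e))
#     safe_cleanup(options)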
if not gpreb.options.silent and not ask_yesno('', ''' %s Are you sure you want
        to continue with this gprebalance session?''' % str(e), "N"):
In the current version, the indentation on the second line will be included in the final string. The following changes help to avoid this.
Suggested change:
if not gpreb.options.silent and not ask_yesno('', ("%s Are you sure you want "
        "to continue with this gprebalance session?") % str(e), "N"):
if total_primary_segments % total_hosts != 0:
    raise StateValidationError(
        f"Cannot evenly distribute {total_primary_segments} segments across {total_hosts} hosts."
    )
This check is done twice. Do we really need to do it twice? If so, can we extract it into a separate function?
Also, according to the specification, we should suggest that the user run gpresize, gpexpand or gpshrink to bring the cluster to the desired state.
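If the check has to stay in two call sites, a sketch of factoring it out into one helper that also adds the hint required by the specification (the helper name is an assumption; StateValidationError is stubbed here only to keep the sketch self-contained):

class StateValidationError(Exception):
    pass  # stub; the real exception class lives elsewhere in the utility

def validate_even_distribution(total_primary_segments, total_hosts):
    # Single place for the divisibility check, plus the hint to resize first.
    if total_primary_segments % total_hosts != 0:
        raise StateValidationError(
            f"Cannot evenly distribute {total_primary_segments} segments across "
            f"{total_hosts} hosts. Consider using gpresize, gpexpand or gpshrink "
            f"to bring the cluster to a suitable configuration first."
        )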
parser.add_option('-f', '--hosts-file', metavar='<hosts_file>', dest='filename',
                  help='yaml containing target hosts configuration')
Why not --target-hosts, as in the specification?
with open(coordinator_data_directory + '/gprebalance_hosts.yaml', 'w') as fp:
    fp.write(config_yaml)
Why not in the current directory, as gpexpand does, for example? It may also be better to print the full path to the file, not just the directory.
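A minimal sketch of that suggestion (the helper name is hypothetical; writing to the current working directory and logging the absolute path reflect the reviewer's proposal, not the patch's current behaviour):

import os

def write_hosts_config(config_yaml, logger):
    # Write next to where the utility was invoked, similar to gpexpand,
    # and report the full path rather than only the directory.
    path = os.path.abspath('gprebalance_hosts.yaml')
    with open(path, 'w') as fp:
        fp.write(config_yaml)
    logger.info("Target hosts configuration written to %s" % path)
    return path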
Implement cluster validation possibility
This is the first commit for building an MVP for the new rebalance utility -
gprebalance. This utility is intended to be used when, after a cluster resize
(expand or shrink), the cluster is in an unbalanced state. The balanced state
is defined very simply: if the number of segments per host is equal across all
hosts, then the cluster is balanced. There are many other aspects of a proper
implementation of an optimal rebalance algorithm, which will be implemented in
the next patches.
This patch adds the skeleton of the future utility, providing initial validation
of rebalance possibility. It includes checks that validate some basic aspects:
whether segments can be distributed uniformly and whether the target mirroring
strategy can be achieved. We decided to provide validation through separate
classes, which is a different approach from the gpexpand utility. Also, some
unit tests have been added.
Validation of available disk space is not implemented, since it cannot be done
at this initial validation step.
Unit tests can be run from gpMgmt/bin with the make rule:
make unitdevel