Skip to content

Commit

Permalink
monclient: try to resend the mon commands to the same monitor if avai…
Browse files Browse the repository at this point in the history
…lable

When we have a socket failure or connection issue, we may send a mon command
and never check if it completed. If we resend the command to another monitor,
the resent command may complete before the first sent command. This can cause
users to send the command twice, which can lead to issues in automated
environments. For example:

We have 2 monitors: mon.a and mon.b

1. Send command to delete pool - monclient targets mon.a
2. A socket failure occurs, and mon.a has a delay in response
3. Monclient hunts for another monitor to resend the delete pool command
   and finds mon.b
4. Mon.b removes the pool and sends an acknowledgment
5. The user script now sends a create pool command, but mon.a now sends the
   acknowledgment for the pool delete from step #1

We end up without a pool, as mon.a deleted it.

The mon_client_hunt_on_resent configuration was added to control the behavior of
retrying commands on monitor connection failures.
By default, this option is enabled to prevent situations where a command is retried
on the same monitor, potentially missing better monitor candidates.
Clients experiencing specific conditions that require retrying on the same monitor
can disable this feature by setting the configuration to false.

Fixes: https://tracker.ceph.com/issues/63789
Signed-off-by: Nitzan Mordechai <[email protected]>
  • Loading branch information
NitzanMordhai committed Jun 4, 2024
1 parent ec0f9cc commit ee8e66a
Show file tree
Hide file tree
Showing 3 changed files with 30 additions and 0 deletions.
8 changes: 8 additions & 0 deletions src/common/options/global.yaml.in
Original file line number Diff line number Diff line change
Expand Up @@ -2308,6 +2308,14 @@ options:
level: advanced
default: 10
with_legacy: true
- name: mon_client_hunt_on_resent
type: bool
level: advanced
default: true
fmt_desc: On commands connection failure, hunt for any monitor.
If false, try to resend the command to the same monitor to prevent
command race conditions.
with_legacy: true
- name: mon_client_max_log_entries_per_message
type: int
level: advanced
Expand Down
21 changes: 21 additions & 0 deletions src/mon/MonClient.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1249,6 +1249,7 @@ void MonClient::_send_command(MonCommand *r)
}

// normal CLI command
r->sent_name = monmap.get_name(active_con->get_con()->get_peer_addr());
ldout(cct, 10) << __func__ << " " << r->tid << " " << r->cmd << dendl;
auto m = ceph::make_message<MMonCommand>(monmap.fsid);
m->set_tid(r->tid);
Expand Down Expand Up @@ -1286,6 +1287,26 @@ void MonClient::_resend_mon_commands()
// starting with octopus, tell commands use their own connetion and need no
// special resend when we finish hunting.
} else {
if (!cct->_conf->mon_client_hunt_on_resent) {
// Ensure the target name of the resend mon command matches the rank
// of the current active connection.
ldout(cct, 20) << __func__ << " " << cmd->tid << " " << cmd->cmd
<< " last sent_name " << cmd->sent_name << dendl;
if (!monmap.contains(cmd->sent_name)) {
ldout(cct, 20) << __func__ << " " << cmd->tid << " " << cmd->cmd
<< " sent_name " << cmd->sent_name << " not in monmap using mon:" <<
monmap.get_name(active_con->get_con()->get_peer_addr()) << dendl;
cmd->sent_name = monmap.get_name(active_con->get_con()->get_peer_addr());
} else if (active_con && cmd->sent_name.length() &&
cmd->sent_name != monmap.get_name(active_con->get_con()->get_peer_addr()) &&
monmap.contains(cmd->sent_name)) {
ldout(cct, 20) << __func__ << " " << cmd->tid << " " << cmd->cmd
<< " wants mon " << cmd->sent_name
<< " current connection is " << monmap.get_name(active_con->get_con()->get_peer_addr())
<< ", reopening session" << dendl;
_reopen_session(monmap.get_rank(cmd->sent_name));
}
}
_send_command(cmd); // might remove cmd from mon_commands
}
}
Expand Down
1 change: 1 addition & 0 deletions src/mon/MonClient.h
Original file line number Diff line number Diff line change
Expand Up @@ -559,6 +559,7 @@ class MonClient : public Dispatcher,
struct MonCommand {
// for tell only
std::string target_name;
std::string sent_name;
int target_rank = -1;
ConnectionRef target_con;
std::unique_ptr<MonConnection> target_session;
Expand Down

0 comments on commit ee8e66a

Please sign in to comment.