Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multipath-tools 0.10.2 #109

Open
wants to merge 32 commits into
base: stable-0.10.y
Choose a base branch
from
Open

Conversation

mwilck
Copy link
Collaborator

@mwilck mwilck commented Jan 24, 2025

multipath-tools 0.10.2, 2025/02

This release contains backported bug fixes from the stable-0.11.y branch.

Bug fixes

  • Fix multipathd crash because of invalid path group index value, for example
    if an invalid path device was removed from a map.
    Fixes #105.
  • Make sure maps are reloaded in the path checker loop after detecting an
    inconsistent or wrong kernel state (e.g. missing or falsely mapped path
    device). Wrongly mapped paths will be unmapped and released to the system.
    Fixes another issue reported in
    #105.
  • Fix the problem that group_by_tpg might be disabled if one or more
    paths were offline during initial configuration.
  • Fix possible misdetection of changed pathgroups in a map.
  • Fix the problem that if a map was scheduled to be reloaded already,
    max_sectors_kb might not be set on a path device that
    was being added to a multipath map. This problem was introduced in 0.9.9.

multipath-tools 0.10.1, 2025/01

This is the first bug fix release on the stable-0.10.y branch. It contains
bug fixes from 0.11.0, and some CI-related fixes.

Bug fixes

  • Fixed the problem that multipathd wouldn't start on systems with certain types
    of device mapper devices, in particular devices with multiple DM targets.
    The problem was introduced in 0.10.0.
    Fixes #102.
  • Fixed a corner case in the udev rules which could cause a device not to be
    activated during boot if a cold plug uevent is processed for a previously
    not configured multipath map while this map was suspended. This problem existed
    since 0.9.8.
  • Fixed the problem that devices with no_path_retry fail and no setting
    for dev_loss_tmo might get the dev_loss_tmo set to 0, causing the
    device to be deleted immediately in the event of a transport disruption.
    This bug was introduced in 0.9.6.
  • Fixed the problem that, if there were multiple maps with deferred failback
    (failback value > 0 in multipath.conf), some maps might fail back later
    than configured. The problem existed since 0.9.6.
  • Removed a warning message that multipathd would print if systemd's
    WATCHDOG_USEC environment variable had the value "0", which means that the
    watchdog is simply disabled. This (minor) problem existed since 0.4.9.
  • Fixed a memory leak in the nvme foreign library. The bug existed since
    0.7.8.
  • Fixed a problem in the marginal path detection algorithm that could cause
    the io error check for a recently failed path to be delayed. This bug
    existed since 0.7.4.

mwilck and others added 30 commits November 14, 2024 17:42
We can't use "container:" any more because upload-artifact@v4 doesn't
work on Debian Jessie.

Signed-off-by: Martin Wilck <[email protected]>
Running "make" in a container and "make clean" outside doesn't
work (access right issues). So just use separate jobs.

Signed-off-by: Martin Wilck <[email protected]>
Avoid "text file busy" error on GitHub.

dmevents-test: Text file busy
Makefile:74: recipe for target 'dmevents.out' failed

Signed-off-by: Martin Wilck <[email protected]>
The ABI should never change on a stable branch. This workflow
asserts that.

Signed-off-by: Martin Wilck <[email protected]>
dm_get_maps() traverses the entire list of dm maps. We shouldn't
give up just because probing a single map failed.

Fixes: bf3a4ad ("libmultipath: simplify dm_get_maps()")
Signed-off-by: Martin Wilck <[email protected]>
We import DM_COLDPLUG_SUSPENDED in all code flows below mpath_coldplug_end.
Clarify this in the code.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Since b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended"), we've handled the case where a device is suspended while
an uevent is processed (e.g. because multipathd is reloading the
map again at the same time). But we were missing the case where
The device had never been initialized before. If .MPATH_DEVICE_READY_OLD
was empty, we'd jump to scan_import without setting MPATH_DEVICE_READY
to 0. This can cause a device not to be fully activated at boot time,
because in follow-up uevents we'd assume that the device had already
been set up.

Treat the case in which an uevent is processed for a previously not
fully set-up, suspended device like other situations where we set
MPATH_DEVICE_READY to 0.

Fixes: b22c273 ("11-dm-mpath.rules: Don't force activation while device is
suspended")
Reviewed-by: Benjamin Marzinski <[email protected]>
Our code is always setting MPATH_UNCHANGED and DM_ACTIVATION in
pairs. While DM_ACTIVATION is a global DM property, MPATH_UNCHANGED
is owned by us. Just set MPATH_UNCHANGED, and adapt DM_ACTIVATION
when necessary just in one place.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
We set .DM_NOSCAN above where we set DM_UDEV_DISABLE_OTHER_RULES_FLAG, too. If
the latter isn't set, .DM_NOSCAN can't be set. Skip the redundant test.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
When multipath reloads a device or fails or restores a path, the udev
rules disable LVM scanning, but since .DM_NOSCAN isn't set, blkid is
still run on the device. When multipath devices that are set to
queue_if_no_path lose all their paths at close to the same time, udev
workers can hang trying to run blkid. The blkid results shouldn't
change when multipathd is adding, removing, failing or reinstating
paths, aside from avoiding hanging udev processes, we're skipping
unnecessary work.

Hence, set .DM_NOSCAN if MPATH_UNCHANGED is set, to avoid blkid from
being called in 13-dm.rules.

Suggested-by: Benjamin Marzinski <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
If pp->dev_loss is DEV_LOSS_TMO_UNSET and min_dev_loss is 0 (which is
the case if no_path_retry is NO_PATH_RETRY_FAIL or NO_PATH_RETRY_UNDEF),
we will set pp->dev_loss to 0, which is wrong. Fix it.

Fixes: 058b5f5 ("libmultipath: fix dev_loss_tmo even if not set in configuration")
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
If reload_and_sync_map() removes the multipath device,
deferred_failback_tick() needs to decrement the counter so that it
doesn't skip the following device.

Reviewed-by: Martin Wilck <[email protected]>
Signed-off-by: Benjamin Marzinski <[email protected]>
Reported by coverity: "i--" may cause an underflow, which will again
cause an overflow when the loop continues. Use a signed int for
loops like this to make coverity happy.

Signed-off-by: Martin Wilck <[email protected]>
Avoid the following error with clang 19:

msort.c:268:27: error: cast from '__compar_fn_t' (aka 'int (*)(const void *, const void *)') to '__compar_d_fn_t' (aka 'int (*)(const void *, const void *, void *)') converts to incompatible function type [-Werror,-Wcast-function-type-mismatch]
  268 |         return msort_r (b, n, s, (__compar_d_fn_t)cmp, NULL);
      |                                  ^~~~~~~~~~~~~~~~~~~~

Signed-off-by: Martin Wilck <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
WATCHDOG_USEC may be set to 0, which means that the watchdog
is disabled in systemd.

Fixes: 9366cfb ("multipathd: Implement systemd watchdog integration")
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
On systems with LVM volumes, "multipath -ll" will spit out lots of messages
like

    libmp_mapinfo: map vg-lv0 has multiple targets

which is irritating. Reduce the log level of these messages to 3, as they
are harmless most of the time.

This is a backport of e8949c2 ("libmultipath: reduce log level of
libmp_mapinfo() messages") from the master branch. We can't apply exactly
the same fix because the stable branch is missing 8c772d3 ("libmultipath:
check DM UUID earlier in libmp_mapinfo__").

Signed-off-by: Martin Wilck <[email protected]>
_find_controllers() needs to free the udev device if it doesn't get
added to a path. Otherwise it can leak memory whenever check_foreign()
is called, causing multipathd's memory usage to continually grow.

Fixes: 7b47762 ("libmultipath: nvme: fix path detection for kernel 4.16")
Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
(cherry picked from commit 5a3d334)
In function io_err_stat_handle_pathfail(), path->io_err_dis_reinstate_time
is set to 0 to enqueue path to io error check as soon as possible. But
multipathd can not do it within marginal_path_err_recheck_gap_time seconds
after power-on, because curr_time is less than
marginal_path_err_recheck_gap_time.

To handle the early marginal path, we can enqueue path when
io_err_dis_reinstate_time is 0.

Signed-off-by: chenrenhui <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>

>

(cherry picked from commit a1e3cf2)
pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possiblities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().
- when a pathgroup is deleted in update_pathvec_from_dm(). In this
  case, pgindex needs to be invalidated for all paths in all pathgroups
  after the one that was deleted.

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better too play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
(cherry picked from commit cd912cf)
(cherry picked from commit 714c20b)
The previous algorithm didn't detect the case case where cpgp
contained a path that was not contained in pgp. Fix this.

Cherry-picked from d4b35f6
Fixes: 90773ba ("libmultipath: resolve hash collisions in pgcmp()")
Signed-off-by: Martin Wilck <[email protected]>
If map creation succeeds, previously not multipathed devices are
now multipathed. udev may not have noticed this yet, thus trigger
path uevents to make it aware of the situation. Likewise, if
creating a map fails, the paths in question were likely considered
multipath members by udev, too. They will now be marked as failed,
so trigger an event in this situation as well.

Fixes: opensvc#103
Suggested-by: Benjamin Marzinski <[email protected]>
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
(cherry picked from commit 98b3a7b)
... if a map has been flushed. In this case, we know that the
the paths haven't been multipathed by coalesce_paths() because of the current
configuration (failure to create the map can't be the reason if the map
exists in coalesce_maps()). Make sure udev sees the paths which have
been released from the map as non-multipath.

Note that this is the only case where maps are flushed where it is correct
to trigger paths uevents. In other cases, e.g. after a "remove map" CLI
command, the configuration is unchanged and if we triggered an uevent,
the map would be re-created by multipathd when the uevent arrived.

Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
(cherry picked from commit ad3ea47)
If a multipath device already has need_reload set when a path is
adopted, it won't call set_path_max_sectors_kb() because of
short-circuit evaluation. This isn't what's intended.

Fixes: e5e20c7 ("libmultipath: set max_sectors_kb in adopt_paths()")
Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
(cherry picked from commit f5c0c4b)
This change doesn't actually fix anything. The code was already safe.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
(cherry picked from commit 85ec51e)
multipath wouldn't autodetect the GROUP_BY_PRIO path grouping policy or
allow the GROUP_BY_TPG policy if there was a path that didn't have its
prioritizer selected (for instance because multipathd was reconfigured
while it was offline). To avoid this, make verify_alua_prio() assume an
alua multipath device if all the paths with a prioritizer selected
(there must be at least one) use an alua-based prioritizer.

Signed-off-by: Benjamin Marzinski <[email protected]>
Reviewed-by: Martin Wilck <[email protected]>
(cherry picked from commit b47a577)
@mwilck mwilck requested a review from bmarzins January 24, 2025 23:26

This comment has been minimized.

@mwilck
Copy link
Collaborator Author

mwilck commented Jan 24, 2025

Hm the "stable abi" test worked in my repo. I will double check.

@mwilck mwilck changed the title Stable 0.10.y multipath-tools 0.10.2 Jan 25, 2025
@mwilck mwilck mentioned this pull request Jan 25, 2025
Copy link
Collaborator

@bmarzins bmarzins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants