Trying to run "showremapped --by-osd" gives the following:
./placementoptimizer.py showremapped --by-osd
Traceback (most recent call last):
File "/root/ceph-balancer/./placementoptimizer.py", line 5497, in
exit(main())
File "/root/ceph-balancer/./placementoptimizer.py", line 5491, in main
run()
File "/root/ceph-balancer/./placementoptimizer.py", line 5461, in
run = lambda: showremapped(args, state)
File "/root/ceph-balancer/./placementoptimizer.py", line 5347, in showremapped
print(f"{osdname}: {cluster.osds[osdid]['host_name']} =>{sum_to} {sum_data_to_pp} <={sum_from} {sum_data_from_pp}"
KeyError: -1
I suspect the crash is caused by the undersized and degraded PGs currently being remapped in this cluster: their missing source OSD shows up as -1 in the remap list, and cluster.osds is then indexed with -1, which raises the KeyError. If so, this should be pretty easy to fix (see the sketch after the excerpt below).
Excerpt from output of "showremapped":
pg 18.38d toofull 223.0G: 4659450 of 4659450, 100.0%, 97->220;221->236;55->77;90->113;124->60;-1->76
pg 18.40a toofull 111.3G: 3874000 of 3874000, 100.0%, 216->72;-1->220;45->162;252->90;46->58;171->130;54->175;186->54;116->112;147->118
pg 18.432 backfill 111.6G: 3881870 of 3881870, 100.0%, -1->72;117->97;139->185;27->240;51->212;175->29;33->95;85->109;239->102;96->44
pg 18.45a backfill 111.4G: 1165623 of 1165623, 100.0%, 92->61;-1->99;114->104
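For what it's worth, here is a minimal sketch of the kind of guard that would avoid the KeyError. The variable names mirror the traceback, but the helper itself and the dict layout of cluster.osds are my assumptions, not the balancer's actual code:

```python
# Hypothetical helper illustrating a guard for osdid == -1.
# In the remap list, -1 stands for "no source OSD" (degraded/undersized PG),
# so there is no matching entry in the osds dict to look up.
def format_remap_summary(osds, osdid, sum_to, sum_data_to_pp, sum_from, sum_data_from_pp):
    osdname = f"osd.{osdid}" if osdid >= 0 else "osd.(none)"
    host = osds[osdid]['host_name'] if osdid in osds else "(no host)"
    return f"{osdname}: {host} =>{sum_to} {sum_data_to_pp} <={sum_from} {sum_data_from_pp}"

# Example: OSD -1 no longer raises a KeyError.
print(format_remap_summary({0: {'host_name': 'sm1'}}, -1, 1, "10.0G", 0, "0B"))
```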
ceph status
  cluster:
    id:     xxx
    health: HEALTH_WARN
            3 failed cephadm daemon(s)
            nodeep-scrub flag(s) set
            10 backfillfull osd(s)
            19 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 105 pgs backfill_toofull
            Degraded data redundancy: 318827/9214981270 objects degraded (0.003%), 1 pg degraded, 12 pgs undersized
            88 pgs not deep-scrubbed in time
            1220 pgs not scrubbed in time
            5 pool(s) backfillfull
            29 slow ops, oldest one blocked for 667 sec, daemons [osd.110,osd.42] have slow ops.

  services:
    mon: 3 daemons, quorum sm1,sm3,sm2 (age 29h)
    mgr: sm2.igewzl(active, since 29h), standbys: sm1.guvysx, sm3.hjkzda
    mds: 1/1 daemons up, 2 standby
    osd: 259 osds: 259 up (since 16m), 259 in (since 18m); 208 remapped pgs
         flags nodeep-scrub
Trying to run "showremapped --by-osd" gives the following:
./placementoptimizer.py showremapped --by-osd
Traceback (most recent call last):
File "/root/ceph-balancer/./placementoptimizer.py", line 5497, in
exit(main())
File "/root/ceph-balancer/./placementoptimizer.py", line 5491, in main
run()
File "/root/ceph-balancer/./placementoptimizer.py", line 5461, in
run = lambda: showremapped(args, state)
File "/root/ceph-balancer/./placementoptimizer.py", line 5347, in showremapped
print(f"{osdname}: {cluster.osds[osdid]['host_name']} =>{sum_to} {sum_data_to_pp} <={sum_from} {sum_data_from_pp}"
KeyError: -1
I'm suspecting this crash could be due to there being undersized & degraded pgs in the cluster that are being remapped and cluster.osds being indexed by -1. If so, this should be pretty easy to fix.
Excerpt from output of "showremapped":
pg 18.38d toofull 223.0G: 4659450 of 4659450, 100.0%, 97->220;221->236;55->77;90->113;124->60;-1->76
pg 18.40a toofull 111.3G: 3874000 of 3874000, 100.0%, 216->72;-1->220;45->162;252->90;46->58;171->130;54->175;186->54;116->112;147->118
pg 18.432 backfill 111.6G: 3881870 of 3881870, 100.0%, -1->72;117->97;139->185;27->240;51->212;175->29;33->95;85->109;239->102;96->44
pg 18.45a backfill 111.4G: 1165623 of 1165623, 100.0%, 92->61;-1->99;114->104
ceph status
cluster:
id: xxx
health: HEALTH_WARN
3 failed cephadm daemon(s)
nodeep-scrub flag(s) set
10 backfillfull osd(s)
19 nearfull osd(s)
Low space hindering backfill (add storage if this doesn't resolve itself): 105 pgs backfill_toofull
Degraded data redundancy: 318827/9214981270 objects degraded (0.003%), 1 pg degraded, 12 pgs undersized
88 pgs not deep-scrubbed in time
1220 pgs not scrubbed in time
5 pool(s) backfillfull
29 slow ops, oldest one blocked for 667 sec, daemons [osd.110,osd.42] have slow ops.
services:
mon: 3 daemons, quorum sm1,sm3,sm2 (age 29h)
mgr: sm2.igewzl(active, since 29h), standbys: sm1.guvysx, sm3.hjkzda
mds: 1/1 daemons up, 2 standby
osd: 259 osds: 259 up (since 16m), 259 in (since 18m); 208 remapped pgs
flags nodeep-scrub
  data:
    volumes: 1/1 healthy
    pools:   14 pools, 3122 pgs
    objects: 1.22G objects, 1.8 PiB
    usage:   2.4 PiB used, 1.2 PiB / 3.6 PiB avail
    pgs:     318827/9214981270 objects degraded (0.003%)
             78100572/9214981270 objects misplaced (0.848%)
             2554 active+clean
             360  active+clean+scrubbing
             99   active+remapped+backfilling
             96   active+remapped+backfill_toofull
             8    active+undersized+remapped+backfill_toofull
             3    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   1.6 MiB/s rd, 43 KiB/s wr, 1.15k op/s rd, 17 op/s wr
    recovery: 1.7 GiB/s, 1.62k objects/s