showremapped --by-osd crash #49

Closed

tatuylonen opened this issue Oct 18, 2024 · 1 comment

@tatuylonen

Trying to run "showremapped --by-osd" gives the following:

./placementoptimizer.py showremapped --by-osd
Traceback (most recent call last):
  File "/root/ceph-balancer/./placementoptimizer.py", line 5497, in <module>
    exit(main())
  File "/root/ceph-balancer/./placementoptimizer.py", line 5491, in main
    run()
  File "/root/ceph-balancer/./placementoptimizer.py", line 5461, in <lambda>
    run = lambda: showremapped(args, state)
  File "/root/ceph-balancer/./placementoptimizer.py", line 5347, in showremapped
    print(f"{osdname}: {cluster.osds[osdid]['host_name']} =>{sum_to} {sum_data_to_pp} <={sum_from} {sum_data_from_pp}"
KeyError: -1

I suspect this crash is caused by undersized and degraded PGs in the cluster that are being remapped, so that cluster.osds ends up indexed by -1 (presumably the placeholder for a missing source OSD). If so, this should be pretty easy to fix; see the sketch after the excerpt below.

Excerpt from output of "showremapped":
pg 18.38d toofull 223.0G: 4659450 of 4659450, 100.0%, 97->220;221->236;55->77;90->113;124->60;-1->76
pg 18.40a toofull 111.3G: 3874000 of 3874000, 100.0%, 216->72;-1->220;45->162;252->90;46->58;171->130;54->175;186->54;116->112;147->118
pg 18.432 backfill 111.6G: 3881870 of 3881870, 100.0%, -1->72;117->97;139->185;27->240;51->212;175->29;33->95;85->109;239->102;96->44
pg 18.45a backfill 111.4G: 1165623 of 1165623, 100.0%, 92->61;-1->99;114->104
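
A minimal sketch of the kind of guard that could avoid the KeyError, assuming cluster.osds is a plain dict keyed by OSD id (which the traceback suggests). The function name and the fallback strings here are hypothetical, not the project's actual code:

# Hypothetical guard, not placementoptimizer.py's actual code.
# Assumes cluster.osds is a plain dict keyed by OSD id.
def format_osd_line(cluster_osds, osdid, sum_to, sum_data_to_pp,
                    sum_from, sum_data_from_pp):
    # -1 marks a shard with no current source OSD (undersized/degraded
    # PG), so fall back to placeholders instead of raising KeyError.
    osdname = f"osd.{osdid}" if osdid >= 0 else "osd.<none>"
    host_name = cluster_osds.get(osdid, {}).get("host_name", "<no host>")
    return (f"{osdname}: {host_name} =>{sum_to} {sum_data_to_pp} "
            f"<={sum_from} {sum_data_from_pp}")

# Example: a missing source OSD (-1) no longer crashes the printout.
osds = {97: {"host_name": "sm1"}}
print(format_osd_line(osds, -1, 0, "0 B", 1, "111.4G"))
print(format_osd_line(osds, 97, 3, "334.9G", 0, "0 B"))

Whether to print such placeholder entries or to skip -1 entirely is up to the maintainers; the point is only that the lookup needs to tolerate the -1 key.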

ceph status
  cluster:
    id:     xxx
    health: HEALTH_WARN
            3 failed cephadm daemon(s)
            nodeep-scrub flag(s) set
            10 backfillfull osd(s)
            19 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 105 pgs backfill_toofull
            Degraded data redundancy: 318827/9214981270 objects degraded (0.003%), 1 pg degraded, 12 pgs undersized
            88 pgs not deep-scrubbed in time
            1220 pgs not scrubbed in time
            5 pool(s) backfillfull
            29 slow ops, oldest one blocked for 667 sec, daemons [osd.110,osd.42] have slow ops.

  services:
    mon: 3 daemons, quorum sm1,sm3,sm2 (age 29h)
    mgr: sm2.igewzl(active, since 29h), standbys: sm1.guvysx, sm3.hjkzda
    mds: 1/1 daemons up, 2 standby
    osd: 259 osds: 259 up (since 16m), 259 in (since 18m); 208 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 3122 pgs
    objects: 1.22G objects, 1.8 PiB
    usage:   2.4 PiB used, 1.2 PiB / 3.6 PiB avail
    pgs:     318827/9214981270 objects degraded (0.003%)
             78100572/9214981270 objects misplaced (0.848%)
             2554 active+clean
             360  active+clean+scrubbing
             99   active+remapped+backfilling
             96   active+remapped+backfill_toofull
             8    active+undersized+remapped+backfill_toofull
             3    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+remapped+backfill_wait+backfill_toofull

  io:
    client:   1.6 MiB/s rd, 43 KiB/s wr, 1.15k op/s rd, 17 op/s wr
    recovery: 1.7 GiB/s, 1.62k objects/s

@tatuylonen (Author)

I just noticed there was already another issue about this same problem (#39).
