Hello,
after having moved 4 SSDs to another host (plus the "ceph tell" hanging
issue - see previous mail), we ran into 241 unknown PGs:
  cluster:
    id:     1ccd84f6-e362-4c50-9ffe-59436745e445
    health: HEALTH_WARN
            noscrub flag(s) set
            2 nearfull osd(s)
            1 pool(s) nearfull
            Reduced data availability: 241 pgs inactive
            1532 slow requests are blocked > 32 sec
            789 slow ops, oldest one blocked for 1949 sec, daemons [osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.33,osd.35,osd.50]... have slow ops.

  services:
    mon: 3 daemons, quorum black1,black2,black3 (age 97m)
    mgr: black2(active, since 96m), standbys: black1, black3
    osd: 85 osds: 85 up, 82 in; 118 remapped pgs
         flags noscrub
    rgw: 1 daemon active (admin)

  data:
    pools:   12 pools, 3000 pgs
    objects: 33.96M objects, 129 TiB
    usage:   388 TiB used, 159 TiB / 548 TiB avail
    pgs:     8.033% pgs unknown
             409151/101874117 objects misplaced (0.402%)
             2634 active+clean
             241  unknown
             107  active+remapped+backfill_wait
             11   active+remapped+backfilling
             7    active+clean+scrubbing+deep

  io:
    client:   91 MiB/s rd, 28 MiB/s wr, 1.76k op/s rd, 686 op/s wr
    recovery: 67 MiB/s, 17 objects/s
This used to be around 700+ unknown PGs; however, these 241 have been stuck
in this state for more than an hour. Below is a sample of PGs from
"ceph pg dump all | grep unknown":
2.7f7 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7c7 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7c2 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.7ab 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.78b 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.788 0 0 0 0 0 0 0 0 0 0 unknown 2020-09-22 19:03:00.694873 0'0 0:0 [] -1 [] -1 0'0 2020-09-22 19:03:00.694873 0'0 2020-09-22 19:03:00.694873 0
2.76e 0
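The same stuck PGs can also be listed via the stuck-PG helper, which should
show the same 241 entries (assuming the mgr still answers these queries):

    # list PGs that have been inactive longer than mon_pg_stuck_threshold
    ceph pg dump_stuck inactive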
Running "ceph pg 2.7f7 query" hangs.
We checked and one server did have an incorrect MTU setting (9204
instead of the correct 9000), but that was fixed some hours ago.
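(We verify the path MTU between hosts roughly like this; 8972 is the
9000-byte MTU minus 20 bytes of IP and 8 bytes of ICMP header, and black1 is
just an example peer:)

    # send non-fragmentable 8972-byte ICMP payloads to check a 9000-byte MTU path
    ping -M do -s 8972 -c 3 black1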
Does anyone have a hint on how to track down the OSDs behind these unknown PGs?
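The only mon-side lookup we know of that does not touch the OSDs is the
PG-to-OSD mapping (using 2.7f7 from above as an example); it at least shows
what the mon thinks the up/acting sets are:

    # mon-side lookup of the up/acting OSD sets for this PG
    ceph pg map 2.7f7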
Version-wise, this is 14.2.9:
[20:42:20] black2.place6:~# ceph versions
{
    "mon": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 85
    },
    "mds": {},
    "rgw": {
        "ceph version 20200428-923-g4004f081ec (4004f081ec047d60e84d76c2dad6f31e2ac44484) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)": 91,
        "ceph version 20200428-923-g4004f081ec (4004f081ec047d60e84d76c2dad6f31e2ac44484) nautilus (stable)": 1
    }
}
From ceph health detail:
[20:42:58] black2.place6:~# ceph health detail
HEALTH_WARN noscrub flag(s) set; 2 nearfull osd(s); 1 pool(s) nearfull; Reduced data availability: 241 pgs inactive; 1575 slow requests are blocked > 32 sec; 751 slow ops, oldest one blocked for 1986 sec, daemons [osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.31,osd.33,osd.35]... have slow ops.
OSDMAP_FLAGS noscrub flag(s) set
OSD_NEARFULL 2 nearfull osd(s)
    osd.36 is near full
    osd.54 is near full
POOL_NEARFULL 1 pool(s) nearfull
    pool 'ssd' is nearfull
PG_AVAILABILITY Reduced data availability: 241 pgs inactive
    pg 2.82 is stuck inactive for 6027.042489, current state unknown, last acting []
    pg 2.88 is stuck inactive for 6027.042489, current state unknown, last acting []
    ...
    pg 19.6e is stuck inactive for 6027.042489, current state unknown, last acting []
    pg 20.69 is stuck inactive for 6027.042489, current state unknown, last acting []
As can be seen, multiple pools are affected, even though most of the missing
PGs are from pool 2.
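For reference, a rough per-pool breakdown of the unknown PGs can be pulled
from the same dump (a quick awk sketch; it assumes the pgs_brief layout with
the PG id in column 1 and the state in column 2):

    # count unknown PGs per pool (pool id = part of the PG id before the dot)
    ceph pg dump pgs_brief 2>/dev/null \
        | awk '$2 == "unknown" { split($1, a, "."); n[a[1]]++ }
               END { for (p in n) print "pool " p ": " n[p] " unknown" }'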
Best regards,
Nico
--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch