Hi *,
an unexpected issue occurred today, at least twice, so it seems kind
of reproducible. I've been preparing a demo in a (virtual) lab cluster
(19.2.2) and wanted to drain multiple hosts. The first time I didn't
pay much attention, but the draining seemed stuck (kind of a common
issue these days), so I intervened and cleaned up until I got into a
healthy state, all good. Then I did my thing: changed the crush tree,
added the removed hosts again, cephadm created the OSDs, and the
backfill finished successfully.
Now I wanted to reset the cluster to my starting point, so I issued
the drain command again for multiple hosts (each host has 2 OSDs):
# for i in {5..8}; do ceph orch host drain host$i; done
This time all OSDs were drained successfully (I watched 'ceph orch osd
rm status'), so I wanted to remove the hosts, but it failed for one of
them:
# for i in {5..8}; do ceph orch host rm host$i --rm-crush-entry; done
Removed host 'host5'
Removed host 'host6'
Removed host 'host7'
Error EINVAL: Not allowed to remove host8 from cluster. The following
daemons are running in the host:
type id
-------------------- ---------------
osd 6
Please run 'ceph orch host drain host8' to remove daemons from host
But there was nothing left to drain: osd.6 had already been removed
successfully from the crush tree. However, on host8 there was still a
daemon left that I had to clean up manually:
host8:~ # cephadm rm-daemon --name osd.6 --fsid 543967bc-e586-32b8-bd2c-2d8b8b168f02 --force
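In case someone wants to verify the same thing: something like the
following should show whether the orchestrator and the host itself
still report the daemon (hostname is from my example, the grep is just
a rough filter for the daemon names in the JSON output):

# ceph orch ps host8
host8:~ # cephadm ls | grep '"name"'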
I compared the cephadm.log files (3 out of the 4 to-be-drained hosts
were drained successfully), and on host8 the rm-daemon command was
never executed until I ran it manually.
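For reference, I basically just checked on each host whether the
removal ever showed up in the log, roughly like this (assuming the
default cephadm log location):

host8:~ # grep rm-daemon /var/log/ceph/cephadm.log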
Is this a known issue? It doesn't seem to happen with only one host,
at least I haven't noticed it in the past. Should I create a tracker
for this?
Thanks,
Eugen