Hi all,
Let's say I have two DCs with replicated pools (size 4) and a
tiebreaker MON somewhere else. Is it possible to control the recovery
traffic in case of a host failure?
Both DCs have enough replicas, so in theory it should be possible to
recover within the DC with the failed host, right? This would reduce
the required bandwidth between the DCs (ignoring an entire DC outage
for now). But I don't think it currently works that way; at least,
nothing I've seen so far suggests that it does.
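
To make the idea concrete, here is a toy model in Python. It is not
Ceph code and it says nothing about how recovery sources are chosen
today. The OSD layout, the host names and the function
surviving_local_replicas are all made up for illustration, and I'm
assuming two replicas per DC, each on a different host. It only shows
that after a single host failure, the affected DC still holds a copy
that could serve as a local recovery source:

```python
# Toy model, not Ceph code: every name below is hypothetical.
from typing import Dict, List, Tuple

# Assumed layout: OSD id -> (datacenter, host)
OSD_LOCATION: Dict[int, Tuple[str, str]] = {
    0: ("dc1", "host-a"), 1: ("dc1", "host-b"), 2: ("dc1", "host-c"),
    3: ("dc2", "host-d"), 4: ("dc2", "host-e"), 5: ("dc2", "host-f"),
}


def surviving_local_replicas(acting: List[int], failed_host: str) -> List[int]:
    """Return the replicas of a PG that survive in the DC that lost a host.

    An empty list means the PG had no replica on the failed host.
    """
    failed = [osd for osd in acting if OSD_LOCATION[osd][1] == failed_host]
    if not failed:
        return []
    dc = OSD_LOCATION[failed[0]][0]
    return [
        osd for osd in acting
        if OSD_LOCATION[osd][0] == dc and OSD_LOCATION[osd][1] != failed_host
    ]


# A size-4 PG with two replicas per DC (assumed placement).
acting_set = [0, 1, 3, 4]
print(surviving_local_replicas(acting_set, "host-a"))  # [1]
```

In this model OSD 1 could feed the replacement replica without any
traffic crossing the inter-DC link.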
I've been scanning the docs for hints, but to no avail so far. Assuming I
didn't miss anything and this is not possible yet, I'm wondering if
the concepts described for the "official" stretch mode [0],[1],[2]
could be adapted:
"While in stretch mode, OSDs will connect only to monitors within the
data center in which they are located. OSDs DO NOT connect to the
tiebreaker monitor."

And for MON election:

"When using stretch mode, the monitor election strategy must be set
to connectivity. This strategy tracks network connectivity between
the monitors and is used to determine which zone should be favored
when the cluster is in a netsplit scenario."

Could the inter-OSD connections be scored in a similar way, so that
recovery traffic stays within the local DC? Or has this already been
discussed and ruled out? I would appreciate any insights!
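
As a rough sketch of what I mean by scoring, again in Python and
again purely hypothetical: the function pick_recovery_source, the
score table and its values are my own invention and do not correspond
to any existing Ceph option or API. The idea is simply to prefer the
candidate source whose DC has the best link score towards the DC of
the OSD being recovered:

```python
# Hypothetical sketch of score-based source selection, not a Ceph API.
from typing import Dict, List, Tuple


def pick_recovery_source(
    target_dc: str,
    candidates: List[Tuple[int, str]],
    link_score: Dict[Tuple[str, str], float],
) -> int:
    """Return the OSD id whose DC scores best towards target_dc.

    candidates is a list of (osd_id, datacenter) pairs; link_score maps
    (source_dc, target_dc) to a score where higher means a better link.
    Unknown links get a score of 0.0.
    """
    if not candidates:
        raise ValueError("no candidate recovery sources")
    best = max(candidates, key=lambda c: link_score.get((c[1], target_dc), 0.0))
    return best[0]


# Assumed scores: local links rate higher than the inter-DC link.
scores = {
    ("dc1", "dc1"): 1.0, ("dc2", "dc2"): 1.0,
    ("dc1", "dc2"): 0.2, ("dc2", "dc1"): 0.2,
}
candidates = [(3, "dc2"), (1, "dc1"), (4, "dc2")]
print(pick_recovery_source("dc1", candidates, scores))  # 1
```

If the local DC has no surviving copy, the same selection would fall
back to a remote OSD, which is the behaviour I would expect in a DC
outage.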
Thanks!
Eugen
[0] https://docs.ceph.com/en/reef/rados/operations/stretch-mode/
[1] https://docs.ceph.com/en/reef/rados/operations/stretch-mode/#connectivity-monitor-election-strategy
[2] https://docs.ceph.com/en/reef/rados/operations/change-mon-elections/