Re: Network traffic with failure domain datacenter

There is also the issue that if you have a 4+8 EC pool, you ideally need at least 4+8 of whatever your failure domain is, in this case DCs. This is more than most people have.

Is this k=4, m=8? What is the benefit of this compared to an ordinary replicated pool with 3 copies?
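
Back-of-the-envelope, assuming it really is k=4, m=8 (a rough Python sketch, just arithmetic, nothing Ceph-specific):

# Rough comparison of EC k=4, m=8 versus 3-way replication.
k, m = 4, 8

ec_overhead = (k + m) / k       # raw bytes stored per byte of user data
ec_failures = m                 # parts that can be lost while staying recoverable

rep_copies = 3
rep_overhead = rep_copies       # raw bytes stored per byte of user data
rep_failures = rep_copies - 1   # copies that can be lost while staying recoverable

print(f"EC {k}+{m}:   {ec_overhead:.1f}x overhead, survives {ec_failures} lost parts")
print(f"{rep_copies}x replica: {rep_overhead:.1f}x overhead, survives {rep_failures} lost copies")
# Same 3x raw-space cost, but 4+8 tolerates 8 failures instead of 2,
# at the price of very wide stripes (12 OSDs touched per object).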

Even if you set the failure domain to, say, rack, there is no guarantee that no PG ends up with more than 8 parts in a single DC without some crushmap trickery.

If this is k=8, m=4, then only 4 lost parts can be tolerated, and there is no way to split 12 parts so that both DCs hold 4 or fewer at the same time.
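
To make the counting argument concrete, a small sketch (plain Python, nothing Ceph-specific):

# Enumerate how 12 parts (k + m) can be split across two DCs and check
# whether losing either whole DC still leaves at least k parts,
# i.e. whether the data stays readable after a DC failure.
def dc_loss_survivable_splits(k, m):
    total = k + m
    # "each DC holds at most m parts" is the same condition as
    # "the surviving DC still holds at least k parts"
    return [(a, total - a) for a in range(total + 1)
            if a >= k and (total - a) >= k]

print("k=4, m=8:", dc_loss_survivable_splits(4, 8))  # 4/8 up to 8/4 all work
print("k=8, m=4:", dc_loss_survivable_splits(8, 4))  # [] -- no split survives a DC loss

So with 4+8 a workable split exists (anything from 4/8 to 8/4), but with failure domain rack nothing forces CRUSH to pick one; with 8+4 there is none at all.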

You really need 3 DCs and a fast, highly available network in between.

/Peter



On 2025-05-08 17:45, Anthony D'Atri wrote:
To be pedantic … backfill usually means copying data in toto, so, like normal write replication, it necessarily has to traverse the WAN.

Recovery of just a lost shard/replica can in theory stay local with the LRC plugin, but as noted that doesn’t seem like a good choice.  With the default EC plugin there *may* be some read-locality preference, but it’s not something I would bank on.

Stretch clusters are great when you need zero RPO, really need a single cluster, and can manage client endpoint use accordingly.  But there are tradeoffs; in many cases two clusters with async replication are a better solution. It depends on your needs and what you’re solving for.

On May 7, 2025, at 5:06 AM, Janne Johansson <icepic.dz@xxxxxxxxx> wrote:

On Wed, 7 May 2025 at 10:59, Torkil Svensgaard <torkil@xxxxxxxx> wrote:
We are looking at a cluster split between two DCs with the DCs as
failure domains.

Am I right in assuming that any recovery or backfill taking place should
largely happen inside each DC and not between them? Or can no such
assumptions be made?
Pools would be EC 4+8, if that matters.
Unless I am mistaken, the first/primary of each PG is the one "doing"
the backfills, so if the primaries are evenly distributed between the
sites, the source of a given backfill would be in the remote DC in 50%
of the cases.
I do not think backfill is going to work out how to use only "local"
pieces to rebuild a missing/degraded PG part without going over the
DC-DC link, even if that is theoretically possible.
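
To put a rough number on that, a toy simulation (Python; the PG count and the perfectly even primary distribution are made-up illustrative assumptions):

# Toy model: if PG primaries are spread evenly over the two sites, roughly
# half of the backfills onto an OSD in DC "A" are driven by a primary in
# DC "B" and therefore cross the inter-DC link.
import random

random.seed(42)
num_pgs = 4096                                         # assumed PG count on the rebuilt OSD
primary_dc = [random.choice(("A", "B")) for _ in range(num_pgs)]
rebuilt_osd_dc = "A"

remote_driven = sum(dc != rebuilt_osd_dc for dc in primary_dc)
print(f"{remote_driven / num_pgs:.1%} of backfills driven from the remote DC")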

--
May the most significant bit of your life be positive.
It’s good to be 8-bit-clean; if you aren’t, then Kermit can compensate.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



