Re: Question about shard placement in erasure code pools

Hi Soeren,

the EC profile only defines a couple of attributes like k and m; the actual placement of the chunks is defined by the crush rule for this pool. So you'll have to deal with crush rules anyway at some point. ;-) Especially to ensure that you have at most one chunk per host. I recommend testing those rules with crushtool before applying them.
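Just as a sketch (rule name and id are placeholders here, and you'd adapt root/device class to your own map), a rule that picks 3 datacenters and then 2 distinct hosts within each would look roughly like this:

rule ec_dc_nvme {
    id 2
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default class nvme
    step choose indep 3 type datacenter
    step chooseleaf indep 2 type host
    step emit
}

To dry-run it against your map without touching the cluster:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt        # add/adjust the rule in the text file
crushtool -c crushmap.txt -o crushmap.new
crushtool -i crushmap.new --test --rule 2 --num-rep 6 --show-mappings --show-bad-mappings

If --show-bad-mappings prints nothing and every mapping contains 6 OSDs, the rule resolves cleanly; you can then spot-check a few mappings against 'ceph osd tree' to confirm no host appears twice.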

It's good that you plan ahead in case you'll have more datacenters etc.; some people forget about that. You can't change the EC profile of an existing pool, you can only change the crush rule for that pool. So in this case you'll always have 6 chunks to distribute (unless you create a new pool and move the data).
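(Assigning a different rule to an existing pool is then just a one-liner, with placeholder names:

ceph osd pool set <poolname> crush_rule <new_rule_name>

and the cluster will remap/backfill the data according to the new rule.)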

In your current setup, you'll have inactive PGs when one DC is down (k=4 means min_size is 5). In such cases you can reduce min_size to 4 temporarily to continue operation, but it should be set back to 5 as soon as the down DC is back. When you get more racks, you can change the rule to have one chunk per rack, which means you'll need at least 6 racks (I recommend having at least one more failure domain to be able to recover, otherwise the PGs will stay degraded until the down rack is back).
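For example, with a placeholder pool name:

ceph osd pool set <poolname> min_size 4    # only while a DC is down
ceph osd pool set <poolname> min_size 5    # restore once the DC is back and recovery has finished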

Regards,
Eugen


Quoting Soeren Malchow <soeren.malchow@xxxxxxxxxxxx>:

Dear all,

maybe someone can explain/answer my question.

We are working on setting up a new Ceph cluster (underneath a Proxmox cluster); the setup is as follows:

3 datacenters
3 hosts per datacenter (all in one rack)
8 physical disks per host

This is the profile configuration:

ceph osd erasure-code-profile set ec_profile plugin=isa technique=reed_sol_van k=4 m=2 crush-root=default crush-failure-domain=datacenter crush-device-class=nvme crush-osds-per-failure-domain=2 crush-num-failure-domains=3

We have a hierarchy (created with a custom_location_hook) that contains:


  * datacenter 01
     * rack 01
        * host 01
        * host 02
        * host 03
  * datacenter 02
     * rack 01
        * host 01
        * host 02
        * host 03
  * datacenter 03
     * rack 01
        * host 01
        * host 02
        * host 03

So this means (to my understanding) that we are placing 2 shards of data in each datacenter.

We want to make sure that we have at most one shard of data per host, and in the future, if possible, also distribute across racks once we have more than one rack per DC.

Is the bucket hierarchy automatically taken into account when choosing the OSDs? Or if not, where do I start, with CRUSH rules? (I am hesitant to manually modify CRUSH rules.)

Thanks in advance

Soeren

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

