Re: squid 19.2.2 - osd_memory_target_autotune - best practices when host has lots of RAM

> The servers are dedicated to Ceph
> Yes, it is perhaps too much but my IT philosophy is "there is always room for more RAM" as it usually helps running things faster 

Unless you're a certain Sun model, but I digress...

The $ spent on all that RAM would IMHO have been more effectively spent on NVMe SSDs instead of HDDs, and then you wouldn't have had to pay for the HBA either.


> Now, since I have it, I would like to use it as efficiently as possible 

That's what the autotuner is all about.

> 
> The 3 NVMEs are 15TB dedicated to OSD - there are 2 more 1.6TB dedicated to DB/WAL
> HDD are 20TB  and SSD are 7TB 
> 
> Is my understanding correct that autotune will dedicate 70% to OSDs indiscriminately ???
> ... or is there some sort of algorithm for differentiating between the disk type and size ?

There is not, as far as I know, but honestly you have so much RAM that there wouldn't be much to gain by customizing.  Diminishing returns.


> If NVME is SSD

NVMe is an interface, not a medium.  A SATA SSD and an NVMe SSD are the same NAND (at similar cost) with a different interface.

> from autotune perspective, it would probably make sense to tune it manually, no ?

If you like.  See the pages linked below for setting it manually on a host-by-host or per-OSD basis.  Or disable the autotuner and do the math yourself to divide RAM by device class, though note that if you add or remove OSDs on a given host the system won't readjust without intervention.  The same applies if you later add less expensive hosts with less RAM and move some OSDs from these nodes over to them to even things out.
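If you go the manual route, a rough sketch of what I mean -- the byte values and OSD id here are placeholders, not recommendations, and this assumes your OSDs carry the usual hdd/ssd device classes:

ceph config set osd osd_memory_target_autotune false
ceph config set osd/class:hdd osd_memory_target 6442450944      # ~6 GiB for each HDD OSD
ceph config set osd/class:ssd osd_memory_target 8589934592      # ~8 GiB for each SSD OSD
ceph config set osd.12 osd_memory_target 17179869184            # ~16 GiB for one particular OSD

You may also want to clear any per-host values the autotuner already wrote, e.g. ceph config rm osd/host:<hostname> osd_memory_target, so they don't conflict with your class masks.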

Since you don't have much else contending for that RAM, you might run

ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.93

which will let the autotuner use even more of it for OSDs.
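To sanity-check that it took effect:

ceph config get mgr mgr/cephadm/autotune_memory_target_ratio

should echo the new ratio back, and the per-host osd_memory_target values in ceph config dump should be recalculated on cephadm's next periodic refresh, IIRC.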

> 
> How would I check status of autotune ...other than checking individual OSD config ?

# ceph config dump | grep osd_memory_target
osd                                       host:cephab92  basic     osd_memory_target                          12083522051
osd                                       host:cephac0f  basic     osd_memory_target                          12083519715
osd                                       host:dd13-25   basic     osd_memory_target                          6156072793
osd                                       host:dd13-29   basic     osd_memory_target                          6235526712
osd                                       host:dd13-33   basic     osd_memory_target                          6780274813
osd                                       host:dd13-37   basic     osd_memory_target                          6601357610
osd                                       host:i18-24    basic     osd_memory_target                          6670077087
osd                                       host:i18-28    basic     osd_memory_target                          6600879861
osd                                       host:k18-23    basic     osd_memory_target                          6663117330
osd                                       host:l18-24    basic     osd_memory_target                          6822190406
osd                                       host:l18-28    basic     osd_memory_target                          6782421978
osd                                       host:m18-33    basic     osd_memory_target                          6593523272
osd                                                               advanced  osd_memory_target_autotune                 true

Here the first two hosts have much more RAM than the others, so the autotuner has more to distribute. 
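If you want to see what a given daemon is actually running with, rather than what's stored centrally, something like:

ceph config show osd.0 osd_memory_target
ceph orch ps --daemon-type osd

The first shows the effective value for that one OSD; IIRC the MEM LIM column in the second reflects the target cephadm applied, and MEM USE is current consumption, so it's also a quick way to spot OSDs blowing past their target.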

You can game the autotuner in various ways, see https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=osds-automatically-tuning-osd-memory 
https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#sections-and-masks
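One example of gaming it: if memory serves, cephadm honors a _no_autotune_memory host label, so you can exempt a specific host from autotuning and pin its OSDs yourself while leaving the rest alone.  Hostname and value below are just illustrative:

ceph orch host label add cephab92 _no_autotune_memory
ceph config set osd/host:cephab92 osd_memory_target 17179869184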

> Many thanks
> 
> Steven
> 
> On Thu, 31 Jul 2025 at 10:43, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>> IMHO the autotuner is awesome.
>> 
>> 1TB of RAM is an embarrassment of riches -- are these hosts perhaps converged compute+storage?
>> 
>> 
>> 
>> > On Jul 31, 2025, at 10:17 AM, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
>> > 
>> > Hi
>> > 
>> > What is the best practice / your expert advice about using
>> > osd_memory_target_autotune
>> > on hosts with lots of RAM  ?
>> > 
>> > My hosts have 1 TB RAM , only 3 NVMEs , 12 HDD and 12 SSD
>> 
>> Remember that NVMe devices *are* SSDs ;)  I'm guessing those are used for WAL+DB offload, and thus you have 24x OSDs per host?
>> 
>> > Should I disable autotune and allocate more RAM?
>> 
>> The autotuner by default will divide 70% of physmem across all the OSDs it finds on a given host, with 30% allocated for the OS and other daemons.  I *think* any RGWs, mons, etc. are assumed to be part of that 30% but am not positive.
>> 
>> > 
>> > I saw some suggestion for 16GB to NVME , 8GB to SSD and 6 to HDD
>> 
>> I personally have a growing sense that more RAM actually can help slower OSDs more, at least with respect to rebalancing without rampant slow ops.  ymmv.
>> 
>> This implies that your NVMe devices are standalone OSDs, so that would mean 27 OSDs per node?  I'm curious what manner of chassis this is.
>> 
>> I would then think that the autotuner would set osd_memory_target to ~26GB per OSD, which is ample by any measure.  ~307GB will be available for non-OSD processes.
>> 
>> 
>> If you're running compute or other significant non-Ceph workloads on the same nodes, you can adjust the reservation factor by running ceph config set mgr mgr/cephadm/autotune_memory_target_ratio xxx.  So if you want to reserve more for non-OSD processes, something like
>> 
>> ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.1
>> 
>> If you do have hungry compute colocated, a good value might be something like 0.25, which would give each OSD > 9GB for osd_memory_target.  If you do want to allot different amounts to different device classes, you can instead set static values using central config device class masks.
>> 
>> 
>> 
>> > 
>> > Many thanks
>> > Steven
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



