> The servers are dedicated to Ceph
>
> Yes, it is perhaps too much but my IT philosophy is "there is always room for more RAM" as it usually helps running things faster

Unless you're a certain Sun model, but I digress... The $ spent on all that RAM would IMHO have been more effectively spent on NVMe SSDs instead of HDDs. And you wouldn't have had to pay for the HBA.

> Now, since I have it, I would like to use it as efficiently as possible

That's what the autotuner is all about.

> The 3 NVMEs are 15TB dedicated to OSD - there are 2 more 1.6TB dedicated to DB/WAL
> HDD are 20TB and SSD are 7TB
>
> Is my understanding correct that autotune will dedicate 70% to OSDs indiscriminately ???
> ... or there is some sort of algorithm for differentiating between the disk type and size ?

There is not, as far as I know, but honestly you have so much RAM that there wouldn't be much to be gained by customizing. Diminishing returns.

> If NVME is SSD

NVMe is an interface, not a medium. A SATA SSD and an NVMe SSD are the same NAND (at similar cost) with a different interface.

> from autotune perspective, it would probably make sense to tune it manually , no ?

If you like. See the pages linked below for setting the target manually on a host-by-host or per-OSD basis. Or disable the autotuner and do the math to divide memory by device class as you like, though note that if you add or remove OSDs on a given host, the system won't adjust without intervention. Or, if you later add less expensive systems with less RAM, move some OSDs from these hosts onto them to even things out.

Since you don't have much else contending for that RAM, you might

    ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.93

which will let the autotuner use even more for OSDs.

> How would I check status of autotune ...other than checking individual OSD config ?

# ceph config dump | grep osd_memory_target
osd   host:cephab92   basic      osd_memory_target            12083522051
osd   host:cephac0f   basic      osd_memory_target            12083519715
osd   host:dd13-25    basic      osd_memory_target            6156072793
osd   host:dd13-29    basic      osd_memory_target            6235526712
osd   host:dd13-33    basic      osd_memory_target            6780274813
osd   host:dd13-37    basic      osd_memory_target            6601357610
osd   host:i18-24     basic      osd_memory_target            6670077087
osd   host:i18-28     basic      osd_memory_target            6600879861
osd   host:k18-23     basic      osd_memory_target            6663117330
osd   host:l18-24     basic      osd_memory_target            6822190406
osd   host:l18-28     basic      osd_memory_target            6782421978
osd   host:m18-33     basic      osd_memory_target            6593523272
osd                   advanced   osd_memory_target_autotune   true

Here the first two hosts have much more RAM than the others, so the autotuner has more to distribute.

You can game the autotuner in various ways, see

https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=osds-automatically-tuning-osd-memory
https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#sections-and-masks
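For the host-by-host or per-OSD route, a minimal sketch -- the hostname, OSD ID, and 16 GiB value are just examples; the _no_autotune_memory host label keeps cephadm from re-tuning that host's OSDs afterward:

    # Take one host out of memory autotuning, then pin a target for its OSDs
    ceph orch host label add cephab92 _no_autotune_memory
    ceph config set osd/host:cephab92 osd_memory_target 17179869184   # 16 GiB

    # Or pin a single OSD
    ceph config set osd.12 osd_memory_target 17179869184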
> Many thanks
>
> Steven
>
> On Thu, 31 Jul 2025 at 10:43, Anthony D'Atri <aad@xxxxxxxxxxxxxx <mailto:aad@xxxxxxxxxxxxxx>> wrote:
>> IMHO the autotuner is awesome.
>>
>> 1TB of RAM is an embarrassment of riches -- are these hosts perhaps converged compute+storage?
>>
>> > On Jul 31, 2025, at 10:17 AM, Steven Vacaroaia <stef97@xxxxxxxxx <mailto:stef97@xxxxxxxxx>> wrote:
>> >
>> > Hi
>> >
>> > What is the best practice / your expert advice about using
>> > osd_memory_target_autotune
>> > on hosts with lots of RAM ?
>> >
>> > My hosts have 1 TB RAM , only 3 NVMEs , 12 HDD and 12 SSD
>>
>> Remember that NVMe devices *are* SSDs ;)  I'm guessing those are used for WAL+DB offload, and thus you have 24x OSDs per host?
>>
>> > Should I disable autotune and allocate more RAM?
>>
>> The autotuner by default will divide 70% of physmem across all the OSDs it finds on a given host, with 30% allocated for the OS and other daemons. I *think* any RGWs, mons, etc. are assumed to be part of that 30% but am not positive.
>>
>> > I saw some suggestion for 16GB to NVME , 8GB to SSD and 6 to HDD
>>
>> I personally have a growing sense that more RAM actually can help slower OSDs more, at least with respect to rebalancing without rampant slow ops. ymmv.
>>
>> This implies that your NVMe devices are standalone OSDs, so that would mean 27 OSDs per node? I'm curious what manner of chassis this is.
>>
>> I would then expect the autotuner to set osd_memory_target to roughly 26 GB per OSD, which is ample by any measure, leaving ~307 GB available for non-OSD processes.
>>
>> If you're running compute or other significant non-Ceph workloads on the same nodes, you can adjust the reservation factor by setting ceph config set mgr mgr/cephadm/autotune_memory_target_ratio xxx. So if you want to reserve less for the OSDs, something like
>>
>>     ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.1
>>
>> If you do have hungry compute colocated, a good value might be something like 0.25, which would give each OSD > 9GB for osd_memory_target. If you do want to allot different amounts to different device classes, you can instead set static values, using central config device class masks.
>>
>> > Many thanks
>> > Steven
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx <mailto:ceph-users@xxxxxxx>
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx <mailto:ceph-users-leave@xxxxxxx>
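Circling back to the device-class masks mentioned above: a minimal sketch of static per-class targets, assuming the default hdd and ssd device classes and borrowing the 6 GB / 8 GB / 16 GB figures from the earlier suggestion (the values and the optional nvme class are examples, not recommendations):

    # Turn the autotuner off so static values stick
    ceph config set osd osd_memory_target_autotune false

    # Per-device-class targets via central config masks
    ceph config set osd/class:hdd osd_memory_target 6442450944     # 6 GiB
    ceph config set osd/class:ssd osd_memory_target 8589934592     # 8 GiB
    ceph config set osd/class:nvme osd_memory_target 17179869184   # 16 GiB, only if you've assigned a separate nvme class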