perf dump:

"bluefs": {
    "db_total_bytes": 88906653696,
    "db_used_bytes": 11631853568,
    "wal_total_bytes": 0,
    "wal_used_bytes": 0,
    "slow_total_bytes": 9796816207872,
    "slow_used_bytes": 1881341952,
    "num_files": 229,
    "log_bytes": 11927552,
    "log_compactions": 78,
    "log_write_count": 281792,
    "logged_bytes": 1154220032,
    "files_written_wal": 179,
    "files_written_sst": 311,
    "write_count_wal": 280405,
    "write_count_sst": 29432,
    "bytes_written_wal": 4015595520,
    "bytes_written_sst": 15728308224,
    "bytes_written_slow": 2691231744,
    "max_bytes_wal": 0,
    "max_bytes_db": 13012828160,
    "max_bytes_slow": 3146252288,
    "alloc_unit_slow": 65536,
    "alloc_unit_db": 1048576,
    "alloc_unit_wal": 0,
    "read_random_count": 1871590,
    "read_random_bytes": 18959576586,
    "read_random_disk_count": 563421,
    "read_random_disk_bytes": 17110012647,
    "read_random_disk_bytes_wal": 0,
    "read_random_disk_bytes_db": 11373755941,
    "read_random_disk_bytes_slow": 5736256706,
    "read_random_buffer_count": 1313456,
    "read_random_buffer_bytes": 1849563939,
    "read_count": 275731,
    "read_bytes": 4825912551,
    "read_disk_count": 225997,
    "read_disk_bytes": 4016943104,
    "read_disk_bytes_wal": 0,
    "read_disk_bytes_db": 3909947392,
    "read_disk_bytes_slow": 106999808,
    "read_prefetch_count": 274534,
    "read_prefetch_bytes": 4785141168,
    "write_count": 591760,
    "write_disk_count": 591838,
    "write_bytes": 21062987776,
    "compact_lat": { "avgcount": 78, "sum": 0.572247346, "avgtime": 0.007336504 },
    "compact_lock_lat": { "avgcount": 78, "sum": 0.182746199, "avgtime": 0.002342899 },
    "alloc_slow_fallback": 0,
    "alloc_slow_size_fallback": 0,
    "read_zeros_candidate": 0,
    "read_zeros_errors": 0,
    "wal_alloc_lat": { "avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000 },
    "db_alloc_lat": { "avgcount": 969, "sum": 0.006368060, "avgtime": 0.000006571 },
    "slow_alloc_lat": { "avgcount": 39, "sum": 0.004502210, "avgtime": 0.000115441 },
    "alloc_wal_max_lat": 0.000000000,
    "alloc_db_max_lat": 0.000113831,
    "alloc_slow_max_lat": 0.000301347
},

config show:

"bluestore_rocksdb_cf": "true",
"bluestore_rocksdb_cfs": "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L=min_write_buffer_number_to_merge=32 P=min_write_buffer_number_to_merge=32",
"bluestore_rocksdb_options": "compression=kLZ4Compression,max_write_buffer_number=64,min_write_buffer_number_to_merge=6,compaction_style=kCompactionStyleLevel,write_buffer_size=16777216,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0",
"bluestore_rocksdb_options_annex": "",

Don't know if it is of any help, but I've compared this against the config from an OSD not reporting any issues, and there is no difference.

________________________________
From: Enrico Bocchi <enrico.bocchi@xxxxxxx>
Sent: Wednesday, May 14, 2025 22:47
To: Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx>; ceph-users <ceph-users@xxxxxxx>
Subject: Re: BLUEFS_SPILLOVER after Reef upgrade

Hi Kasper,

Would you mind sharing the output of `perf dump` and `config show` from the daemon socket of one of the OSDs reporting BlueFS spillover? I am interested in the bluefs part of the former and in the bluestore_rocksdb options of the latter.
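Something along these lines, run on the host where the OSD lives, should do it (osd.110 is just an example id; jq and grep are only there to trim the output down to the interesting parts):

    # query the admin socket of one of the spilling OSDs on its host
    ceph daemon osd.110 perf dump | jq .bluefs                  # bluefs counters only
    ceph daemon osd.110 config show | grep bluestore_rocksdb    # rocksdb-related options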
The warning about slow ops in bluestore is a different story. There have been several messages on this mailing list recently with suggestions on how to tune the alert threshold. From my experience, they very likely relate to some problem with the underlying storage device, so I'd recommend investigating the root cause rather than simply silencing the warning.

Cheers,
Enrico

________________________________
From: Kasper Rasmussen <kasper_steengaard@xxxxxxxxxxx>
Sent: Wednesday, May 14, 2025 8:22:46 PM
To: ceph-users <ceph-users@xxxxxxx>
Subject: BLUEFS_SPILLOVER after Reef upgrade

I've just upgraded our Ceph cluster from Pacific 16.2.15 to Reef 18.2.7. After that I see the warnings:

[WRN] BLUEFS_SPILLOVER: 5 OSD(s) experiencing BlueFS spillover
     osd.110 spilled over 4.5 GiB metadata from 'db' device (8.0 GiB used of 83 GiB) to slow device
     osd.455 spilled over 1.1 GiB metadata from 'db' device (11 GiB used of 83 GiB) to slow device
     osd.533 spilled over 426 MiB metadata from 'db' device (10 GiB used of 83 GiB) to slow device
     osd.560 spilled over 389 MiB metadata from 'db' device (9.8 GiB used of 83 GiB) to slow device
     osd.597 spilled over 8.6 GiB metadata from 'db' device (7.7 GiB used of 83 GiB) to slow device
[WRN] BLUESTORE_SLOW_OP_ALERT: 4 OSD(s) experiencing slow operations in BlueStore
     osd.410 observed slow operation indications in BlueStore
     osd.443 observed slow operation indications in BlueStore
     osd.508 observed slow operation indications in BlueStore
     osd.593 observed slow operation indications in BlueStore

I've tried to run `ceph tell osd.XXX compact` with no result.

Bluefs stats:

ceph tell osd.110 bluefs stats
1 : device size 0x14b33fe000 : using 0x202c00000(8.0 GiB)
2 : device size 0x8e8ffc00000 : using 0x5d31d150000(5.8 TiB)
RocksDBBlueFSVolumeSelector >>Settings<< extra=0 B, l0_size=1 GiB, l_base=1 GiB, l_multi=8 B
DEV/LEV     WAL       DB        SLOW      *         *         REAL      FILES
LOG         0 B       16 MiB    0 B       0 B       0 B       15 MiB    1
WAL         0 B       18 MiB    0 B       0 B       0 B       6.3 MiB   1
DB          0 B       8.0 GiB   0 B       0 B       0 B       8.0 GiB   140
SLOW        0 B       0 B       4.5 GiB   0 B       0 B       4.5 GiB   78
TOTAL       0 B       8.0 GiB   4.5 GiB   0 B       0 B       0 B       220
MAXIMUMS:
LOG         0 B       25 MiB    0 B       0 B       0 B       21 MiB
WAL         0 B       118 MiB   0 B       0 B       0 B       93 MiB
DB          0 B       8.2 GiB   0 B       0 B       0 B       8.2 GiB
SLOW        0 B       0 B       14 GiB    0 B       0 B       14 GiB
TOTAL       0 B       8.2 GiB   14 GiB    0 B       0 B       0 B
>> SIZE <<  0 B       79 GiB    8.5 TiB

Help with what to do next will be much appreciated.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx