Le 2025-08-18 17:00, Gilles Mocellin a écrit :
Le 2025-08-18 16:21, Anthony D'Atri a écrit :
Yes I have zapped all drives before each try...
Did you subsequently check for success with `ceph device ls` and
`lsblk`?
I've found that sometimes the orch zap doesn't succeed fully and one
must manually stop and remove LVMs before the drive can be truly
zapped.
I think yes, but I will retry.
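For reference, the manual cleanup Anthony describes usually looks something like this (a sketch only; /dev/sdX and the VG name are placeholders, check the lsblk / ceph device ls output on each host first):

```shell
# Placeholder device /dev/sdX -- verify with `lsblk` and `ceph device ls` first.
# ceph-volume's own zap also tears down the LVs it created:
ceph-volume lvm zap --destroy /dev/sdX

# If LVs/VGs are still left behind, remove them manually:
lvs                        # list leftover ceph-* LVs
vgremove -f <ceph-vg-name> # placeholder: VG name taken from the lvs output
pvremove /dev/sdX
wipefs -a /dev/sdX
```
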
Another thing I've never done before is using encryption.
Perhaps it adds delays with my config, leading to timeouts...
Does anyone know which exact ceph-volume command is launched by such a
spec file, in case I want to run it manually?
service_type: osd
service_id: throughput_optimized
service_name: osd.throughput_optimized
placement:
  host_pattern: '*'
spec:
  unmanaged: false
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
  encrypted: true
  filter_logic: AND
  objectstore: bluestore
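As far as I know, cephadm translates such a spec into a ceph-volume lvm batch call on each host; roughly something like the following (the device paths are examples and I may be missing flags, so treat it as a sketch):

```shell
# Rough manual equivalent of the spec above (device paths are examples):
#   data_devices rotational:1 -> the spinning disks
#   db_devices   rotational:0 -> the SSD/NVMe DB devices
#   encrypted: true           -> --dmcrypt
ceph-volume lvm batch --bluestore --dmcrypt \
    /dev/sda /dev/sdb /dev/sdc \
    --db-devices /dev/nvme0n1 \
    --report   # dry run: show the plan without creating anything
```

Dropping --report would actually create the OSDs; the exact command cephadm ran can also be found in the cephadm logs (ceph log last cephadm).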
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
Hi,
New try, from scratch, with every device clean (no physical volume).
I've found this issue and these recommendations:
https://access.redhat.com/solutions/6545511
https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=80-bug-fixes
So I set a higher timeout for cephadm commands:
ceph config set global mgr/cephadm/default_cephadm_command_timeout 1800
The default was 900.
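To double-check the value actually in effect (option names can be verified with ceph config help), something like:

```shell
# Verify the new timeout is visible to the mgr:
ceph config get mgr mgr/cephadm/default_cephadm_command_timeout

# A mgr failover may be needed for the orchestrator to pick it up
# (an assumption on my side -- it might also apply immediately):
ceph mgr fail
```
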
I can see the orchestrator launching ceph-volume commands with a timeout
of 1795 (it was 895 before; I don't know why it's 5 s less than the
configured value...).
But I made that change after creating the OSD spec, so some daemons
had already failed/timed out:
root@fidcl-lyo1-sto-sds-lab-01:~# ceph health detail
HEALTH_WARN Failed to apply 1 service(s): osd.throughput_optimized; 7 failed cephadm daemon(s); noout flag(s) set
[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.throughput_optimized
    osd.throughput_optimized: Command timed out on host cephadm deploy (osd daemon) (default 1800 second timeout)
[WRN] CEPHADM_FAILED_DAEMON: 7 failed cephadm daemon(s)
    daemon osd.115 on fidcl-lyo1-sto-sds-lab-01 is in unknown state
    daemon osd.24 on fidcl-lyo1-sto-sds-lab-02 is in unknown state
    daemon osd.116 on fidcl-lyo1-sto-sds-lab-03 is in unknown state
    daemon osd.118 on fidcl-lyo1-sto-sds-lab-04 is in unknown state
    daemon osd.23 on fidcl-lyo1-sto-sds-lab-05 is in unknown state
    daemon osd.14 on fidcl-lyo1-sto-sds-lab-06 is in unknown state
    daemon osd.117 on fidcl-lyo1-sto-sds-lab-07 is in unknown state
As in the Red Hat article, I ran the following on every host:
systemctl daemon-reload
systemctl reset-failed
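In case it helps anyone, one way to run that across all hosts at once (assuming SSH access from the admin node and jq installed; both are assumptions about the setup):

```shell
# Iterate over the hosts known to the orchestrator (requires jq and SSH access):
for h in $(ceph orch host ls --format json | jq -r '.[].hostname'); do
    ssh "$h" 'systemctl daemon-reload && systemctl reset-failed'
done
```
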
But nothing changed until the cephadm OSD spec deployment finished.
Then everything looks normal in ceph health, but I only have 82 OSDs up
out of 119:
root@fidcl-lyo1-sto-sds-lab-01:~# ceph -s
  cluster:
    id:     46030d0e-7d08-11f0-a50b-246e96bd90a4
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 5 daemons, quorum fidcl-lyo1-sto-sds-lab-01,fidcl-lyo1-sto-sds-lab-02,fidcl-lyo1-sto-sds-lab-03,fidcl-lyo1-sto-sds-lab-05,fidcl-lyo1-sto-sds-lab-04 (age 78m)
    mgr: fidcl-lyo1-sto-sds-lab-01.ymlinv(active, since 89m), standbys: fidcl-lyo1-sto-sds-lab-02.otnpcx, fidcl-lyo1-sto-sds-lab-03.zasagv
    osd: 119 osds: 82 up (since 38m), 119 in (since 62m)
         flags noout

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 577 KiB
    usage:   1.8 TiB used, 90 TiB / 91 TiB avail
    pgs:     1 active+clean
The OSDs are only visible in the ceph osd ls output and in the dashboard.
The daemons are not started, but the PVs/VGs/LVs are created.
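Since the LVs already exist, it might be worth asking cephadm to adopt and start them instead of recreating everything; a sketch (the hostname is just one of those from the output above):

```shell
# Inspect what ceph-volume sees on a host:
cephadm ceph-volume lvm list

# Ask cephadm to activate the existing OSDs it finds on that host:
ceph cephadm osd activate fidcl-lyo1-sto-sds-lab-01
```
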
I will drop dmcrypt; I think that with LV tags it's really easier to
find the link between an OSD and its LVs...