Need a clue about what appears to be a phantom duplicate osd that was
automagically created/discovered by the upgrade process and is now
blocking the upgrade.
The upgrade of a known-good 19.2.2 cluster to 19.2.3 proceeded normally
through the mgrs and mons. It upgraded most of the osds, then stopped
with the complaint "Error: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon
osd.1 on host noc3 failed." The roster in the "Daemon Versions" table
on the dashboard looks normal except that there are two entries for
'osd.1': one has the correct version number, 19.2.2; the other is blank.
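In case it is useful, this is roughly how I would cross-check that from
the CLI (standard cephadm/orchestrator commands; I have not pasted their
output here):

root@noc1:~# ceph orch ps noc3                # daemons cephadm believes exist on noc3
root@noc1:~# ceph orch ps | grep 'osd\.1 '    # is osd.1 really listed twice?
root@noc1:~# ceph orch upgrade status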
The upgrade appears 'stuck': an attempt to 'resume' it resulted in the
same error. Cluster operations are otherwise normal, with all osds up
and in. The cluster is IPv6. Oddly, ceph -s reports:
root@noc1:~# ceph -s
  cluster:
    id:     406xxxxxxx0f8
    health: HEALTH_WARN
            Public/cluster network defined, but can not be found on any host
            Upgrading daemon osd.1 on host noc3 failed.

  services:
    mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 39m)
    mgr: noc2.yhyuxd(active, since 4h), standbys: noc3.sybsfb, noc4.tvhgac, noc1.jtteqg
    mds: 1/1 daemons up, 3 standby
    osd: 27 osds: 27 up (since 3h), 27 in (since 10d)

  data:
    volumes: 1/1 healthy
    pools:   16 pools, 1809 pgs
    objects: 14.77M objects, 20 TiB
    usage:   52 TiB used, 58 TiB / 111 TiB avail
    pgs:     1808 active+clean
             1    active+clean+scrubbing

  io:
    client: 835 KiB/s rd, 1.0 MiB/s wr, 24 op/s rd, 105 op/s wr

  progress:
    Upgrade to 19.2.3 (4h)
      [============................] (remaining: 4h)
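The network warning may or may not be related to the stuck upgrade, but
it can be sanity-checked by comparing the configured subnets with the
addresses the hosts actually carry, for example:

root@noc1:~# ceph config dump | grep -E 'public_network|cluster_network'
root@noc1:~# ceph config get mon public_network
root@noc3:~# ip -6 addr show    # confirm each host has an address inside that subnet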
Related log entry:
29/7/25 02:40 PM [ERR] cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126d-01cb-40af-824a-881c130140f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'

Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1145, in _check_daemons
    self.mgr._daemon_action(daemon_spec, action=action)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2545, in _daemon_action
    return self.wait_async(
  File "/usr/share/ceph/mgr/cephadm/module.py", line 815, in wait_async
    return self.event_loop.get_result(coro, timeout)
  File "/usr/share/ceph/mgr/cephadm/ssh.py", line 136, in get_result
    return future.result(timeout)
  File "/lib64/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1381, in _create_daemon
    out, err, code = await self._run_cephadm(
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1724, in _run_cephadm
    raise OrchestratorError(
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd-1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-4067126dXXXXXXXXXXXXXXXXXXX40f8-osd.1
Reconfig daemon osd.1 ...
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5581, in
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 5569, in main
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3051, in command_deploy_from
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3086, in _common_deploy
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 3106, in _deploy_daemon_container
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 1077, in deploy_daemon
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/__main__.py", line 765, in create_daemon_dirs
  File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/cephadm.1a8853661a9c1798390b8e8d13c27688c1b1327a075745af2ee40ac466f0ac36/cephadmlib/file_utils.py", line 52, in write_new
IsADirectoryError: [Errno 21] Is a directory: '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config.new' -> '/var/lib/ceph/4067126dXXXXXXXXXXXXXXXXXXX40f8/osd.1/config'
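The last frame points at cephadm's write_new() renaming osd.1/config.new
onto osd.1/config, and the IsADirectoryError says that 'config' is,
unexpectedly, a directory on noc3. I have not changed anything yet, but
the state should be visible with something like this (paths taken from
the traceback; <FSID> stands for the cluster fsid):

root@noc3:~# ls -ld /var/lib/ceph/<FSID>/osd.1 /var/lib/ceph/<FSID>/osd.1/config*
root@noc3:~# ls -l /var/lib/ceph/<FSID>/osd.1/

If 'config' really is a directory there, I assume moving it aside and
then running 'ceph orch daemon redeploy osd.1' (or 'ceph orch upgrade
resume') would let cephadm write the file again, but I would appreciate
confirmation before touching it.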