Re: [External] Re: Best/Safest way to power off cluster

Thanks for the post, Kristaps Cudars. It was an interesting read and something to keep an eye out for.

This is the script I use to start Ceph back up:

echo "Starting Ceph...."

echo "- Clearing OSD Flags"
ceph osd unset noout
ceph osd unset norecover
ceph osd unset norebalance
ceph osd unset nobackfill
ceph osd unset nodown
ceph osd unset pause

for FS in $(ceph fs ls -f json | jq -r '.[] | .name'); do
   echo "- Setting $FS joinable"
   ceph fs set "$FS" joinable true
done

ceph status

We have 768 OSDs and I haven't had any problem using it.
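
For the shutdown side, the same idea in reverse works. A rough sketch (just mirroring the flags the start script clears, and making the filesystems not joinable before powering off - not necessarily the exact script) would be:

echo "Stopping Ceph...."

for FS in $(ceph fs ls -f json | jq -r '.[] | .name'); do
   echo "- Setting $FS not joinable"
   ceph fs set "$FS" joinable false
done

echo "- Setting OSD Flags"
ceph osd set noout
ceph osd set norecover
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set nodown
ceph osd set pause

ceph status

With the flags in place, the daemons and nodes can then be stopped in the order Eugen quoted below (storage clients first, then gateways, MDS, OSDs, managers, and finally monitors).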

Regards
Gerard

________________________________
From: Kristaps Cudars <kristaps.cudars@xxxxxxxxx>
Sent: 06 August 2025 19:40
To: Eugen Block <eblock@xxxxxx>
Cc: gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>; ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [External]  Re: Best/Safest way to power off cluster

Read this blog post on how (not) to shut down a Ceph cluster:
https://www.croit.io/blog/how-not-to-shut-down-a-ceph-cluster


On Wed, 6 Aug 2025 at 21:24, Eugen Block <eblock@xxxxxx> wrote:

> You don't need to stop all the OSDs (or other daemons) manually, just
> shut down the servers (most likely you have services colocated). When
> they boot again, the Ceph daemons will also start automatically
> (they're handled by systemd). I can check tomorrow exactly which steps
> our shutdown procedure consists of for planned power outages etc.
>
> Zitat von gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>:
>
> > Hi Eugen,
> >                       We have 80 OSDs in the cluster.
> >
> > So, to stop them I will need to run the command *ceph orch stop
> > osd.<ID>* for all 80 OSDs, one by one.
> > Is there any way to stop all of them in one command?
> >
> > Also, once all the nodes are back after the power maintenance activity,
> > will I need to start all the daemons (mon, mds, osd, etc.),
> > or will they come up automatically once the servers are up?
> >
> > Thanks,
> > Gagan
> >
> > On Wed, Aug 6, 2025 at 4:45 PM Eugen Block <eblock@xxxxxx> wrote:
> >
> >> Although SUSE discontinued their product, the procedure is still correct
> >> [0]:
> >>
> >> 1. Tell the Ceph cluster not to mark OSDs as out:
> >>
> >> ceph osd set noout
> >>
> >> 2. Stop daemons and nodes in the following order:
> >>
> >>      Storage clients
> >>
> >>      Gateways, for example NFS Ganesha or Object Gateway
> >>
> >>      Metadata Server
> >>
> >>      Ceph OSD
> >>
> >>      Ceph Manager
> >>
> >>      Ceph Monitor
> >>
> >> 3. If required, perform maintenance tasks.
> >>
> >> 4. Start the nodes and servers in the reverse order of the shutdown
> >> process:
> >>
> >>      Ceph Monitor
> >>
> >>      Ceph Manager
> >>
> >>      Ceph OSD
> >>
> >>      Metadata Server
> >>
> >>      Gateways, for example NFS Ganesha or Object Gateway
> >>
> >>      Storage clients
> >>
> >> 5. Remove the noout flag:
> >>
> >> ceph osd unset noout
> >>
> >> [0]
> >>
> >>
> >> https://documentation.suse.com/en-us/ses/7.1/html/ses-all/storage-salt-cluster.html#sec-salt-cluster-reboot
> >>
> >> Zitat von gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>:
> >>
> >> > Hi Guys,
> >> >                   I have recently set up a production Ceph cluster
> >> > which consists of 3 monitor nodes and 7 OSD nodes.
> >> >
> >> > There is power maintenance activity scheduled at the data centre this
> >> > coming weekend, and because of that I need to power off all the devices.
> >> >
> >> > Can you please advise me on the safest way to power off all the servers?
> >> >
> >> > Should I power off all 7 OSD servers one by one, followed by all 3
> >> > monitor nodes, or vice versa?
> >> >
> >> > Thanks,
> >> > Gagan
> >> > _______________________________________________
> >> > ceph-users mailing list -- ceph-users@xxxxxxx
> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx