Re: discarding an rbd device results in partial zero-filling without any errors

On Fri, Jul 11, 2025 at 8:09 PM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> On Fri, 2025-07-11 at 10:57 -0400, Anthony D'Atri wrote:
> > My sense is that blkdiscard is meant for something different from what you intend.  From the man page:
> >
> > DESCRIPTION
> >        blkdiscard is used to discard device sectors.
> > …
> >        -s, --secure
> >            Perform a secure discard. A secure discard is the same as a regular discard except that all copies of the discarded blocks that were possibly created by garbage collection must also be
> >            erased. This requires support from the device.
> >
> >        -z, --zeroout
> >            Zero-fill rather than discard.
> >
> > It’s about sending TRIM commands to the device telling it that SSD or thin-provisioned blocks may be reallocated. Zero-filling or erasing is a different operation.
> >
> > If your intent is to free RADOS pool capacity, `blkdiscard` should do that, or if there’s a filesystem on the RBD device, mount it and run `fstrim`.  Was there a mounted filesystem when you ran the below?
> >
> > If your intent is to erase data, any new client that is given discarded or freed blocks sees them as thin-provisioned (unallocated), so any old data is not visible to it.
> >
>
> I think I could add something here. I am not sure that RBD should support
> blkdiscard. First of all, "Ceph block devices are thin-provisioned, resizable,
> and store data striped over multiple OSDs." (https://docs.ceph.com/en/reef/rbd/).
> This means that OSDs could use HDDs, SSDs, or any other type of storage device
> that may not support the TRIM command. Even among SSDs, not every SSD supports
> TRIM, and the blkdiscard command will simply be ignored. Also, Ceph is based on
> a block replication model: if the same block is replicated across HDDs and SSDs,
> how should the blkdiscard command be treated? Also, thin provisioning implies
> that some logical space may not yet be allocated to physical space. If you issue
> blkdiscard to such space, what does the command mean? So, it's pretty obvious how
> the blkdiscard command should work for SSDs, but it's not at all clear what it
> means for the RBD case.

Hi Slava,

For RBD, discard translates to "free space at the RADOS level (i.e.
in the object store)".  It absolutely should be supported -- otherwise
things like fstrim wouldn't work and RBD images/devices would remain
thin-provisioned only for some (probably not too long) time after their
creation ;)
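To make the thin-provisioning point concrete, here is a toy Python model (not librbd code). It assumes an image striped over fixed 4 MiB RADOS objects that exist only once written to; the class and method names are hypothetical, and real librbd handles partial-object discards with its own RADOS operations and granularity settings, which are not modeled here.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, the default RBD object size


class ThinImage:
    """Toy thin-provisioned image: objects are allocated on first write only."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.objects: dict[int, bytearray] = {}  # index -> data; absent = unallocated

    def _chunks(self, offset: int, length: int):
        """Yield (object index, offset within object, byte count) for a range."""
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("range outside image")
        pos, end = offset, offset + length
        while pos < end:
            idx, off = divmod(pos, OBJECT_SIZE)
            n = min(OBJECT_SIZE - off, end - pos)
            yield idx, off, n
            pos += n

    def write(self, offset: int, data: bytes) -> None:
        done = 0
        for idx, off, n in self._chunks(offset, len(data)):
            obj = self.objects.get(idx)
            if obj is None:  # first write to this object allocates it
                obj = self.objects[idx] = bytearray(OBJECT_SIZE)
            obj[off:off + n] = data[done:done + n]
            done += n

    def discard(self, offset: int, length: int) -> None:
        for idx, off, n in self._chunks(offset, length):
            if idx not in self.objects:
                continue  # never written: nothing to free
            if n == OBJECT_SIZE:
                del self.objects[idx]  # whole object: free it at the RADOS level
            else:
                # Simplification: keep the object and zero the discarded range.
                self.objects[idx][off:off + n] = bytes(n)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for idx, off, n in self._chunks(offset, length):
            obj = self.objects.get(idx)
            out += obj[off:off + n] if obj is not None else bytes(n)  # holes read as zeros
        return bytes(out)

    def allocated_bytes(self) -> int:
        return len(self.objects) * OBJECT_SIZE
```

Without the discard path, allocated_bytes() in this model can only grow, which is why an image would stay thin-provisioned only briefly; fstrim on a mounted filesystem is what turns the filesystem's free blocks into such discards.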

Whether discard at the RADOS level then gets further translated into
some TRIM or equivalent commands being sent to the actual HDDs or SSDs
that back the OSDs is up to the object store backend (e.g. BlueStore)
and is configurable.  See bdev_enable_discard and related options.
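A second toy sketch, again hypothetical and not BlueStore code, shows how this layering sidesteps the mixed HDD/SSD question: in this model each replica OSD releases its own copy of a deleted object independently, and only sends a TRIM to its own device when discard is enabled for that OSD and the device supports it. Only bdev_enable_discard is a real option name; it is assumed to be off by default here, and the other names are illustrative.

```python
from dataclasses import dataclass

OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, the default RBD object size


@dataclass
class OsdDevice:
    name: str
    supports_discard: bool  # e.g. an SSD that advertises TRIM; many HDDs do not
    bdev_enable_discard: bool = False  # mirrors the per-OSD option; assumed off by default
    freed_bytes: int = 0    # space returned to the OSD's allocator
    trimmed_bytes: int = 0  # space for which a discard/TRIM reached the device

    def release(self, length: int) -> None:
        # RADOS-level free: always happens, regardless of the device type.
        self.freed_bytes += length
        # Device-level discard: only if enabled for this OSD and supported.
        if self.bdev_enable_discard and self.supports_discard:
            self.trimmed_bytes += length


def delete_object(replicas: list[OsdDevice], size: int) -> None:
    """Each replica OSD frees its own copy; there is no cluster-wide TRIM decision."""
    for osd in replicas:
        osd.release(size)
```

In this model an HDD replica and an SSD replica of the same object are handled separately, so the question of how a single discard applies to both does not arise at the RBD layer.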

Thanks,

                Ilya
