Re: [PATCH v3 RESEND] ext4: clear extent index structure after file delete

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 4 Sep 2025 08:45:35 -0400

On Wed, Sep 03, 2025 at 04:30:27AM -0700, Nicolas Bretz wrote:
> The extent index structure in the top inode is not being cleared after a file
> is deleted, which leaves the path to the data blocks intact. This patch clears
> this extent index structure.
> 
> Extent structures are already being cleared, so this also makes the
> behavior consistent between extent and extent _index_ structures.

Actually, if we are going to make things consistent, we would *not*
be clearing the extent leaf blocks if we are deleting the file ---
when possible.

Clearing the extent structures was never done for security reasons.  The
reality is that removing the pointers to the data blocks is security
theater (like the TSA in US airports).  It makes people feel
good, but programs like photorec [1] can still find the data blocks.
If people really want to securely delete a file, they should use shred or
wipe to overwrite the data blocks before deleting the file.

[1] https://www.cgsecurity.org/wiki/photoRec
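
For the "overwrite, then delete" approach, a minimal single-pass sketch in C looks roughly like this. The helper names (zero_fill_fd(), zero_then_unlink()) are made up for illustration. shred(1) does multiple passes by default and has options such as -z (final pass of zeros) and -u (remove the file), so this is a simplified stand-in, not a replacement. shred's own man page also warns that in-place overwriting is of limited effectiveness on some configurations, such as ext3 in data=journal mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the current contents of an open file with zeros and flush
 * them to stable storage. Returns 0 on success, -1 on error. */
int zero_fill_fd(int fd)
{
    static const char zeros[4096];      /* static storage: all zero bytes */
    struct stat st;
    off_t done = 0;

    if (fstat(fd, &st) < 0)
        return -1;
    while (done < st.st_size) {
        off_t left = st.st_size - done;
        size_t chunk = left < (off_t)sizeof(zeros) ? (size_t)left : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, done);
        if (n < 0)
            return -1;
        if (n == 0) {                   /* no progress: bail out, don't spin */
            errno = EIO;
            return -1;
        }
        done += n;
    }
    /* Without this, the dirty zero pages could simply be discarded when
     * the deleted inode is evicted, leaving the old data on disk. */
    return fsync(fd);
}

/* Zero a file's data in place, then remove its name. Single pass of
 * zeros only; a simplified illustration of what shred -z -u does. */
int zero_then_unlink(const char *path)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (zero_fill_fd(fd) < 0) {
        int saved = errno;              /* keep the original error */
        close(fd);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    return unlink(path);
}
```

Note that this only rewrites the file's current logical contents; it does nothing about copies the file system or the device may have made elsewhere.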

The reason we wipe the extent structures is that when journalling
is enabled, a file truncation or deletion might not fit in a single
journal transaction, and might need to span two or more transactions.
For that reason, we put the inode on the orphan list (so that recovery
can finish the job after a crash), and if the operation doesn't fit in
a single transaction, we need to keep the file system in a consistent
state at each transaction boundary.  That's why we zero out the extent
structures as we go: if we need to pause the truncation to do a journal
commit, the pointers to the data blocks that have already been released
are properly zeroed.
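
To make the transaction-boundary argument concrete, here is a toy model in C. It is only a sketch under heavy assumptions: the names (struct toy_fs, toy_truncate(), per_txn) are hypothetical, the inode has a flat array of extents instead of an index/leaf tree, and journal credits are reduced to a fixed number of extents one transaction may free. Each extent is zeroed in the same transaction that frees its blocks, so the consistency check run at every commit holds even when the truncate has to pause.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS  64
#define NEXTENTS 8

/* Hypothetical, heavily simplified model; not ext4's data structures.
 * An extent maps blocks [start, start + len); len == 0 means "zeroed". */
struct toy_extent { unsigned start, len; };

struct toy_fs {
    bool in_use[NBLOCKS];              /* block allocation bitmap */
    struct toy_extent ext[NEXTENTS];   /* the inode's extent array */
    bool on_orphan_list;
    unsigned commits;
};

/* Invariant that must hold at every commit: no live extent points at a
 * block the bitmap considers free. */
bool toy_consistent(const struct toy_fs *fs)
{
    for (int i = 0; i < NEXTENTS; i++)
        for (unsigned b = 0; b < fs->ext[i].len; b++)
            if (!fs->in_use[fs->ext[i].start + b])
                return false;
    return true;
}

/* A crash right after this point must leave a sane file system. */
static void toy_commit(struct toy_fs *fs)
{
    assert(toy_consistent(fs));
    fs->commits++;
}

/* Truncate to zero, freeing at most per_txn extents per transaction. */
void toy_truncate(struct toy_fs *fs, int per_txn)
{
    int used = 0;

    assert(per_txn > 0);
    fs->on_orphan_list = true;         /* recovery can finish the job */
    for (int i = NEXTENTS - 1; i >= 0; i--) {
        struct toy_extent *e = &fs->ext[i];
        if (e->len == 0)
            continue;
        if (used == per_txn) {         /* out of credits: pause and commit */
            toy_commit(fs);
            used = 0;
        }
        for (unsigned b = 0; b < e->len; b++)
            fs->in_use[e->start + b] = false;
        memset(e, 0, sizeof(*e));      /* drop the stale pointer, same txn */
        used++;
    }
    fs->on_orphan_list = false;
    toy_commit(fs);
}
```

Deleting the memset() line makes the assertion in toy_commit() fire at the first commit, which is exactly the stale-pointer state the zeroing exists to prevent.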

Now, if we know that all of the blocks referenced by an extent leaf
block can be released in the current transaction, we could skip zeroing
the leaf block --- so long as we can drop the pointer to the leaf block
in the parent index block in that same transaction.  This also has the
benefit that if we don't need to modify the extent leaf block, we save
two 4k writes to the disk --- one to the journal and one to the extent
leaf block itself, which would improve the performance of an "rm -rf"
workload.
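
The trade-off can be written down as a back-of-the-envelope cost model in C. Everything here is hypothetical (plan_leaf_release() is not an ext4 function), and it assumes each dirtied metadata block costs exactly one journal write plus one checkpoint write, ignoring that a parent index block is often shared by many leaves and so may be dirtied once for several of them.

```c
#include <stdbool.h>

/* Hypothetical cost model, not ext4 code: every metadata block dirtied in
 * a transaction costs one 4 KiB journal write plus one 4 KiB write back to
 * its home location at checkpoint time. */
enum { WRITES_PER_DIRTY_BLOCK = 2 };

struct leaf_plan {
    bool zero_leaf;      /* must the leaf block itself be modified? */
    int  block_writes;   /* minimum 4 KiB writes for leaf + parent */
};

/* Plan the release of one extent leaf block holding nr_extents entries
 * when the current transaction can free at most credits_left extents. */
struct leaf_plan plan_leaf_release(int nr_extents, int credits_left)
{
    struct leaf_plan p;

    if (nr_extents <= credits_left) {
        /* The whole leaf goes away in this transaction: only the parent
         * index entry is dropped; the leaf's contents are left alone. */
        p.zero_leaf = false;
        p.block_writes = WRITES_PER_DIRTY_BLOCK;
    } else {
        /* We may have to pause mid-leaf, so freed entries are zeroed in
         * the leaf, and the parent is updated once the leaf is empty. */
        p.zero_leaf = true;
        p.block_writes = 2 * WRITES_PER_DIRTY_BLOCK;
    }
    return p;
}
```

With this model, plan_leaf_release(10, 50) leaves the leaf untouched and costs 2 writes, while plan_leaf_release(10, 4) has to zero the leaf and costs 4; the difference is the two 4k writes mentioned above.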

The reason we haven't done this is that the benefits aren't that big,
so we haven't gotten around to it.  But if you are interested, feel
free to take a look, as long as we can keep the code complexity down
and avoid hurting the maintainability of the code base.

					- Ted

P.S.  A related project would be adding support for the "secure
deletion" flag (see the chattr man page), which is currently
unimplemented.  The tricky bits are (a) we can't zero the blocks until
the transaction releasing the blocks has been committed, and (b) we need
to avoid a race where a block being zeroed gets reallocated, since we
don't want to zero out a newly reallocated data block that now belongs
to an in-use inode.  The marginal utility is fairly small, since
userspace tools like shred and wipe already exist, which is why no one
has implemented it to date.  But if you're looking for a fun/interesting
project, it's a possibility.
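
For anyone tempted by this, here is one possible shape for the bookkeeping, as a toy C model rather than a kernel patch. All of it is hypothetical (enum blk_state, sd_release(), sd_alloc() and sd_commit() are invented for illustration). Blocks released from a secure-deletion inode go into a pending state and are zeroed only once the transaction that released them has committed, which covers (a): zeroing earlier and then crashing would let recovery roll back to a file that still owns the blocks, now full of zeros. Because the allocator only hands out blocks in the free state, a block cannot be reallocated while it is still waiting to be zeroed, which covers (b).

```c
#include <assert.h>
#include <string.h>

#define NBLK  16
#define BLKSZ 8

/* Hypothetical model of one possible design, not an ext4 interface. */
enum blk_state { BLK_FREE, BLK_USED, BLK_PENDING_ZERO };

struct sd_fs {
    enum blk_state state[NBLK];
    unsigned freed_in_txn[NBLK];     /* transaction that released the block */
    unsigned char data[NBLK][BLKSZ];
};

/* Release a block from a secure-deletion inode in transaction txn.
 * The block is parked, not returned to the allocator. */
void sd_release(struct sd_fs *fs, int blk, unsigned txn)
{
    assert(fs->state[blk] == BLK_USED);
    fs->state[blk] = BLK_PENDING_ZERO;
    fs->freed_in_txn[blk] = txn;
}

/* Allocator: only BLK_FREE blocks are candidates, so a block still
 * waiting to be zeroed can never be handed to another inode. */
int sd_alloc(struct sd_fs *fs)
{
    for (int b = 0; b < NBLK; b++)
        if (fs->state[b] == BLK_FREE) {
            fs->state[b] = BLK_USED;
            return b;
        }
    return -1;
}

/* Called once transaction txn is on disk: zero the blocks it (or an
 * earlier transaction) released, and only then make them allocatable. */
void sd_commit(struct sd_fs *fs, unsigned txn)
{
    for (int b = 0; b < NBLK; b++)
        if (fs->state[b] == BLK_PENDING_ZERO && fs->freed_in_txn[b] <= txn) {
            memset(fs->data[b], 0, BLKSZ);   /* stands in for the zeroing I/O */
            fs->state[b] = BLK_FREE;
        }
}
```

In a real implementation the zeroing would be asynchronous block I/O, so a block would have to stay in the pending state until that write completed, not merely until it was submitted.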