Re: deadlock when swapping to encrypted swapfile

On Wed, 10 Sep 2025, Robert Beckett wrote:

>  > > Yeah, unfortunately we are currently restricted to using a swapfile due to many units already shipped with that.
>  > > We have longer term plans to dynamically allocate the swapfiles as needed based on a new query for estimated size
>  > > required for hibernation etc. Moving to swap partition is just not viable currently.
>  > 
>  > You can try the dm-loop target that I created some time ago. It won't be 
>  > in the official kernel because the Linux developers don't like the idea of 
>  > creating a fixed mapping for a file, but it may work better for you. Unlike 
>  > the in-kernel loop driver, the dm-loop driver doesn't allocate memory when 
>  > processing reads and writes.
> 
> oh interesting. I hadn't seen that.
> I was discussing a quick idea of potentially adding a new fallocate mode bit to request contiguous non-moveable
> block assignment as it pre-allocates, which filesystems could then implement support for.

You can ask the VFS maintainers about this, but I think they'll reject it.

> Then use the known
> file range with dm-crypt directly instead of going via the block device.
> I guess this is roughly analogous to that idea.
> 
> I see that dm-loop is very old at this point. Do you know the rationale for rejection?

The reason was that the filesystem developers think that the filesystems 
should have freedom to move the allocated blocks around.

The dm-loop patch sets the flag S_SWAPFILE to prevent that from happening, 
but they don't want more code to use this flag.

> was there any hope to get it included with more work?

No - because they don't like the idea of creating a map of file blocks in 
advance.

One could rework the dm-loop patch to use the standard filesystem read and 
write methods, but then it would allocate memory when processing requests 
and it would be unsuitable for swapping.

> If the main objection was regarding file spans that they can't guarantee will persist, maybe a new fallocate-based
> contract with the filesystems could alleviate the worries?
> 
> 
>  > 
>  > Create a swap file on the filesystem, load the dm-loop target on the top 
>  > of that file and then create dm-crypt on the top of the dm-loop target. 
>  > Then, run mkswap and swapon on the dm-crypt device.
>  > 
>  > > I tried halving /sys/module/dm_mod/parameters/swap_bios but it didn't help, which based on your more recent
>  > > reply is not unexpected.
>  > > 
>  > > I have a workaround for now, which is to run a userland earlyoom daemon. That seems to get in and oomkill in time.
>  > > I guess another option would be to have the swapfile in a luks encrypted partition, but that equally is not viable for
>  > > steamdeck currently.
>  > > 
>  > > However, I'm still interested in the longer term solution of fixing the kernel so that it can handle scenarios
>  > > like this no matter how ill advised they may be. Telling users not to do something seems like a bad solution :)
>  > 
>  > You would have to rewrite the filesystems not to allocate memory when 
>  > processing reads and writes. I think that this is not feasible.
>  > 
>  > > Do you have any ideas about the unreliable kernel oomkiller stepping in? I definitely fill ram and swap, seems like
>  > > it should be firing.
>  > 
>  > I think that the main problem with the OOM killer is that it sometimes 
>  > doesn't fire for big applications.
> 
> perhaps oom_kill_allocating_task helps in that scenario?
> in this lockup scenario I don't see oomkiller starting at all. It looks like it soft
> locks and never feels the need to step in.
> Perhaps because it sees some (tiny) amount of forward progress with
> some swapout requests completing?

If you are swapping to an encrypted file, it may deadlock even before you 
exhaust the memory and swap.

>  > I think that using userspace OOM killer is appropriate to prevent this 
>  > problem with the kernel OOM killer. 
> 
> Turns out I spoke too soon on the userland earlyoom daemon being a solution.
> It worked for some patterns, but not others.
> It mostly worked well when swap was either pre-filled with data greater than its
> threshold so as soon as ram is exhausted it stepped in, or when the allocations are
> sufficiently spaced for it to fill greater than its threshold without many more
> outstanding swapouts before it gets to evaluate again.
> 
> For now the only really reliable way to work around is to disable memory
> overcommit, but we really don't want to go looking down that route as it 
> will have all sorts of other impacts.

You can allocate a file, use e4defrag to reduce the number of fragments, 
then use the FS_IOC_FIEMAP ioctl to find out the location of the file and 
then use the dm-linear target to map the file - and place encryption and 
swapping on the top of that.

If you have control over the whole device and make sure that no one moves 
the file, it should work.

Note that you can't use device mapper on a block device that has a 
filesystem mounted, so you'll have to add one dm-linear device underneath 
the filesystem.
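
The table-building step of this procedure can be sketched as follows. This
is a minimal sketch, not a tested recipe: it assumes the extents have
already been read with FS_IOC_FIEMAP as (physical byte offset, byte length)
pairs in file order, that the file has no holes, and that the offsets are
relative to the dm-linear device placed underneath the filesystem. The
function name linear_table, the device path /dev/mapper/base and the extent
values are hypothetical.

```python
SECTOR = 512  # device-mapper tables are expressed in 512-byte sectors


def linear_table(extents, device):
    """Build dm-linear table lines from file extents.

    extents: iterable of (physical_byte_offset, byte_length) pairs,
             in logical file order (assumed to come from FS_IOC_FIEMAP).
    device:  path of the block device the offsets refer to (hypothetical).
    """
    lines = []
    start = 0  # running start sector inside the new mapped device
    for physical, length in extents:
        if physical % SECTOR or length % SECTOR:
            raise ValueError("extent is not aligned to a 512-byte sector")
        sectors = length // SECTOR
        # dm-linear line: <start> <length> linear <device> <offset>
        lines.append(f"{start} {sectors} linear {device} {physical // SECTOR}")
        start += sectors
    return lines


if __name__ == "__main__":
    # Hypothetical extents: a 4 MiB fragment at 1 MiB and a 2 MiB fragment at 8 MiB.
    example = [(1048576, 4194304), (8388608, 2097152)]
    for line in linear_table(example, "/dev/mapper/base"):
        print(line)
```

Each output line has the form "start length linear device offset", all in
512-byte sectors, which is the table format dmsetup accepts. Fewer fragments
after e4defrag means fewer table lines. The mapping stays valid only as long
as nothing moves the file's blocks.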

Mikulas




