On 2025/8/8 15:00, Michal Hocko wrote:
On Fri 08-08-25 09:13:30, Zihuan Zhang wrote:
[...]
However, in practice, we’ve observed cases where tasks appear stuck in
uninterruptible sleep (D state) during the freeze phase — and thus cannot
respond to signals or enter the refrigerator. These tasks are technically
TASK_FREEZABLE, but due to the nature of their sleep state, they don’t
freeze promptly, and may require multiple retry rounds, or cause the entire
suspend to fail.
Right, but that is an inherent problem of the freezer implementation.
It is not really clear to me how priorities or layers improve on that.
Could you please elaborate on that?
Thanks for the follow-up.
From our observations, we've seen processes like Xorg that are in a
normal state before freezing begins, but enter D state during the freeze
window. Upon investigation, we found that these processes often depend
on other user processes (e.g., I/O helpers or system services). When
those dependencies are frozen first, the dependent process (like Xorg)
gets stuck and can't be frozen itself.
This led us to treat such processes as “hard to freeze” tasks — not
because they’re inherently unfreezable, but because they are more likely
to become problematic if not frozen early enough.
So our model works as follows:
• By default, the freezer tries to freeze all freezable tasks in each
  round.
• With our approach, each round only attempts to freeze tasks whose
  freeze_priority is less than or equal to the current round number,
  so tasks with a lower freeze_priority value are attempted in
  earlier rounds.
• This ensures that higher-priority (i.e., harder-to-freeze) tasks
  are attempted earlier, increasing the chance that they freeze
  before being blocked by others.
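To make the round rule concrete, here is a rough user-space sketch of the
selection logic. It is only an illustration, not the patch itself: the
struct sim_task type, the helper names, and the assumption that rounds are
numbered from 0 are all made up for this example, and a freeze attempt is
assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a task; not the kernel's task_struct. */
struct sim_task {
	const char *comm;
	int freeze_priority;	/* assumption: lower value = earlier round */
	bool frozen;
};

/* A task is attempted in a round only if freeze_priority <= round. */
bool sim_eligible(const struct sim_task *t, int round)
{
	return !t->frozen && t->freeze_priority <= round;
}

/*
 * Run one simulated round. Every eligible task is marked frozen; the
 * real freezer can of course fail here, which this model ignores.
 * Returns the number of tasks frozen in this round.
 */
size_t sim_freeze_round(struct sim_task *tasks, size_t n, int round)
{
	size_t count = 0;

	assert(tasks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sim_eligible(&tasks[i], round)) {
			tasks[i].frozen = true;
			count++;
		}
	}
	return count;
}
```

With three tasks at freeze_priority 0, 1 and 2, round 0 touches only the
first one, round 1 adds the second, and round 2 reaches the rest, so a
hard-to-freeze task is handled before the tasks it depends on.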
Since we cannot know in advance which tasks will be difficult to freeze,
we use heuristics:
• Any task that causes freeze failure or is found in D state during
  the freeze window is treated as hard-to-freeze in the next attempt
  and its priority is increased.
• Additionally, users can manually raise or lower the freeze priority
  of known problematic tasks via an exposed sysfs interface, giving
  them fine-grained control.
This doesn’t change the fundamental logic of the freezer — it still
retries until all tasks are frozen — but by adjusting the traversal order,
we’ve observed significantly fewer retries and more reliable success
in scenarios where these D state transitions occur.