On Thu 07-08-25 20:14:09, Zihuan Zhang wrote:
> The Linux task freezer was designed in a much earlier era, when userspace was relatively simple and flat.
> Over the years, as modern desktop and mobile systems have become increasingly complex, with intricate IPC,
> asynchronous I/O, and deep event loops, the original freezer model has shown its age.

A modern userspace might be more complex or convoluted, but I do not think
the above statement is accurate or even correct.

> ## Background
>
> Currently, the freezer traverses the task list linearly and attempts to freeze all tasks equally.
> It sends a signal and waits for `freezing()` to become true. While this model works well in many cases, it has several inherent limitations:
>
> - Signal-based logic cannot freeze uninterruptible (D-state) tasks
> - Dependencies between processes can cause freeze retries
> - Retry-based recovery introduces unpredictable suspend latency
>
> ## Real-world problem illustration
>
> Consider the following scenario during suspend:
>
> Freeze Window Begins
>
>     [process A] - epoll_wait()
>           │
>           ▼
>     [process B] - event source (already frozen)
>
> → A enters D-state because it is waiting for B

I thought epoll_wait was waiting in interruptible sleep.

> → Cannot respond to freezing signal
> → Freezer retries in a loop
> → Suspend latency spikes
>
> In such cases, we observed that a normal 1–2ms freezer cycle could balloon to **tens of milliseconds**.
> Worse, the kernel has no insight into the root cause and simply retries blindly.
>
> ## Proposed solution: Freeze priority model
>
> To address this, we propose a **layered freeze model** based on per-task freeze priorities.
>
> ### Design
>
> We introduce 4 levels of freeze priority:
>
> | Priority | Level        | Description                   |
> |----------|--------------|-------------------------------|
> | 0        | HIGH         | D-state tasks                 |
> | 1        | NORMAL       | regular userspace tasks       |
> | 2        | LOW          | not yet used                  |
> | 4        | NEVER_FREEZE | zombie tasks, PF_SUSPEND_TASK |
>
> The kernel will freeze processes **in priority order**, ensuring that higher-priority tasks are frozen first.
> This avoids dependency inversion scenarios and provides a deterministic path forward for tricky cases.
> By freezing control or event-source threads first, we prevent dependent tasks from entering D-state prematurely, effectively avoiding dependency inversion.

I really fail to see how that is supposed to work, to be honest. If a
process is running in userspace then the priority shouldn't really matter
much. Tasks will get a signal, freeze themselves, and you are done. If they
are running in userspace and e.g. sleeping while not TASK_FREEZABLE then
priority simply makes no difference. And if they are TASK_FREEZABLE then the
priority doesn't matter either.

What am I missing?
-- 
Michal Hocko
SUSE Labs