Hello Tiffany.

On Sun, Jul 13, 2025 at 10:00:09PM -0700, Tiffany Yang <ynaffit@xxxxxxxxxx> wrote:
> Other sources of delay can cause similar issues, but this change focuses
> on allowing frozen time to be accounted for in particular because of how
> large it can grow and how unevenly it can affect applications running on
> the system.

I'd like to incorporate the reason from your other mail:

| Since there isn't yet a clear way to identify a set of "lost" time
| that everyone (or at least a wider group of users) cares about, it
| seems like iterating over components of interest is the best way

into this commit message (because that's a stronger point than your
use case alone).

> Any feedback would be much appreciated!

I can see the benefits of this new stat field conceptually; I have
some remarks on the implementation and suggestions on conventions
below.

> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1018,6 +1018,14 @@ All cgroup core files are prefixed with "cgroup."
>  	it's possible to delete a frozen (and empty) cgroup, as well as
>  	create new sub-cgroups.
>
> +  cgroup.freeze.stat

With the given implementation (and use scenario), this'd better be
exposed in cgroup.freeze.stat.local. I grok that hierarchical summing
would make little sense and that it'd make the implementation more
complex. With that in mind, I'm thinking about a formulation like:

	Cumulative time that the cgroup has spent between freezing
	and thawing, regardless of whether it was frozen by itself or
	by an ancestor cgroup. NB: (not) reaching the "frozen" state
	is not accounted for here.

> +	A read-only flat-keyed file which exists in non-root cgroups.
> +	The following entry is defined:
> +
> +	  freeze_time_total_ns
> +		Cumulative time that this cgroup has spent in the freezing
> +		state, regardless of whether or not it reaches "frozen".
> +

Rather use microseconds; it's the cgroup API convention, and I'm not
sure the nanosecond precision exposed here is needed.

  1      _____        frozen
  0   __/     \__
        ab    cd

(a = freeze requested, b = "frozen" state reached, c = thaw
requested, d = fully thawed)

Yeah, I find the measurement between a and c the sanest.

> +static int cgroup_freeze_stat_show(struct seq_file *seq, void *v)
> +{
> +	struct cgroup *cgrp = seq_css(seq)->cgroup;
> +	u64 freeze_time = 0;
> +
> +	spin_lock_irq(&css_set_lock);
> +	if (test_bit(CGRP_FREEZE, &cgrp->flags))
> +		freeze_time = ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
> +
> +	freeze_time += cgrp->freezer.freeze_time_total_ns;
> +	spin_unlock_irq(&css_set_lock);

I don't like taking this spinlock only for the matter of reading this
attribute. The intention should be to keep the (un)freezing paths
mostly unaffected at the expense of these readers (seqcount or u64
stats? see the sketch at the end of this mail).

Alternative approach: either there's an outer watcher who can be
notified by cgroup.events:frozen, or it's an inner watcher who
couldn't actively read the field anyway. So the field could only show
completed freeze/thaw cycles from the past (i.e. not substitute
clock_gettime(2) when the cgroup is frozen), which could simplify
querying the flag too.

> @@ -5758,6 +5780,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
>  	 * if the parent has to be frozen, the child has too.
>  	 */
>  	cgrp->freezer.e_freeze = parent->freezer.e_freeze;
> +	cgrp->freezer.freeze_time_total_ns = 0;

struct cgroup is kzalloc'd, so this is unnecessary.
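
For illustration, a rough and untested sketch of the seqcount variant
I have in mind (the freeze_seq member and both helper names are made
up; it assumes the existing freeze/thaw transitions already serialize
writers, e.g. under css_set_lock, and it includes the microsecond
conversion suggested above):

/*
 * New member next to freeze_time_start_ns in struct
 * cgroup_freezer_state; it'd need seqcount_init() in cgroup_create().
 */
seqcount_t freeze_seq;

/* Writer side, called from the already serialized freeze/thaw transitions. */
static void cgroup_freeze_time_update(struct cgroup *cgrp, bool freeze)
{
	write_seqcount_begin(&cgrp->freezer.freeze_seq);
	if (freeze)
		cgrp->freezer.freeze_time_start_ns = ktime_get_ns();
	else
		cgrp->freezer.freeze_time_total_ns +=
			ktime_get_ns() - cgrp->freezer.freeze_time_start_ns;
	write_seqcount_end(&cgrp->freezer.freeze_seq);
}

/* Reader side, no css_set_lock taken; retries on a concurrent update. */
static u64 cgroup_freeze_time_usec(struct cgroup *cgrp)
{
	unsigned int seq;
	u64 total;

	do {
		seq = read_seqcount_begin(&cgrp->freezer.freeze_seq);
		total = cgrp->freezer.freeze_time_total_ns;
		if (test_bit(CGRP_FREEZE, &cgrp->flags))
			total += ktime_get_ns() -
				 cgrp->freezer.freeze_time_start_ns;
	} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, seq));

	return div_u64(total, NSEC_PER_USEC);
}

With the completed-cycles-only alternative, the reader would drop the
CGRP_FREEZE branch and return just the accumulated total; then a
plain READ_ONCE()/WRITE_ONCE() pair suffices on 64-bit, and
u64_stats_sync would cover the 32-bit tearing case.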