On 2025/8/23 3:32, Tiffany Yang wrote:
> Hi Chen,
>
> Thanks again for taking a look!
>
> Chen Ridong <chenridong@xxxxxxxxxxxxxxx> writes:
>
>> On 2025/8/22 14:14, Chen Ridong wrote:
>
>>> On 2025/8/22 9:37, Tiffany Yang wrote:
>>>> There isn't yet a clear way to identify a set of "lost" time that
>>>> everyone (or at least a wider group of users) cares about. However,
>>>> users can perform some delay accounting by iterating over components of
>>>> interest. This patch allows cgroup v2 freezing time to be one of those
>>>> components.
>
>>>> Track the cumulative time that each v2 cgroup spends freezing and expose
>>>> it to userland via a new local stat file in cgroupfs. Thank you to
>>>> Michal, who provided the ASCII art in the updated documentation.
>
>>>> To access this value:
>>>> $ mkdir /sys/fs/cgroup/test
>>>> $ cat /sys/fs/cgroup/test/cgroup.stat.local
>>>> freeze_time_total 0
>
>>>> Ensure consistent freeze time reads with freeze_seq, a per-cgroup
>>>> sequence counter. Writes are serialized using the css_set_lock.
>
...
>
>>>>  	spin_lock_irq(&css_set_lock);
>>>> -	if (freeze)
>>>> +	write_seqcount_begin(&cgrp->freezer.freeze_seq);
>>>> +	if (freeze) {
>>>>  		set_bit(CGRP_FREEZE, &cgrp->flags);
>>>> -	else
>>>> +		cgrp->freezer.freeze_start_nsec = ts_nsec;
>>>> +	} else {
>>>>  		clear_bit(CGRP_FREEZE, &cgrp->flags);
>>>> +		cgrp->freezer.frozen_nsec += (ts_nsec -
>>>> +					      cgrp->freezer.freeze_start_nsec);
>>>> +	}
>>>> +	write_seqcount_end(&cgrp->freezer.freeze_seq);
>>>>  	spin_unlock_irq(&css_set_lock);
>
>
>>> Hello Tiffany,
>
>>> I wanted to check if there are any specific considerations regarding
>>> how we should input the ts_nsec value.
>
>>> Would it be possible to define this directly within the
>>> cgroup_do_freeze function rather than passing it as a parameter? This
>>> approach might simplify the implementation and potentially improve
>>> timing accuracy when a cgroup has lots of descendants.
>
>
>> I revisited v3, and this was Michal's point.
>>       p
>>     / | \
>>    1 ... n
>> When we freeze the parent group p, is it expected that all descendant
>> cgroups (1 to n) should share the same frozen timestamp?
>
>
> Yes, this is the expectation from the current change. I understand your
> concern about the accuracy of this measurement (especially when there
> are many descendants), but I agree with Michal's point that the time to
> traverse the descendant cgroups is basically noise relative to the
> quantity we're trying to measure here.
>
>> If the cgroup tree structure is stable, the exact frozen time may not
>> really matter. However, if the tree is not stable, is obtaining the
>> same frozen time still acceptable?
>
> I'm a little unclear as to what you mean about when the cgroup tree is
> unstable. In the case where a new descendant of p is being created, I
> believe the cgroup_mutex prevents that from happening at the same time
> as we are freezing p's other descendants. If it won the race, was
> created unfrozen under p, and then became frozen during cgroup_freeze,
> it would have the same timestamp as the other descendants. If it lost
> the race and was created as a frozen cgroup under p, it would get its
> own timestamp in cgroup_create, so its freezing duration would be
> slightly less than that of the others in the hierarchy. Both values
> would be acceptable for our purposes, but if there was a different case
> you had in mind, please let me know!
>
> Thanks,
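Before getting to my point, here is how I am picturing the read side
that freeze_seq is meant to keep consistent. This is only a rough
sketch; the helper name is made up, and everything beyond the fields
quoted in the hunk above is my assumption:

	/*
	 * Hypothetical reader (not from the patch): sample frozen_nsec
	 * under the seqcount and, if the cgroup is currently frozen,
	 * add the still-open interval. Retries if a writer raced us.
	 */
	static u64 freeze_time_total_read(struct cgroup *cgrp)
	{
		unsigned int seq;
		u64 total;

		do {
			seq = read_seqcount_begin(&cgrp->freezer.freeze_seq);
			total = cgrp->freezer.frozen_nsec;
			if (test_bit(CGRP_FREEZE, &cgrp->flags))
				total += ktime_get_ns() -
					 cgrp->freezer.freeze_start_nsec;
		} while (read_seqcount_retry(&cgrp->freezer.freeze_seq, seq));

		return total;
	}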
What I mean by "stable" is that cgroups 1 through n might be deleted,
or more descendants might be created, while the hierarchy is frozen.
For example:

	            n      n-1     n-2    ...     1
	frozen      a      a+1     a+2           a+n
	unfrozen    b      b+1     b+2    ...    b+n
	nsec       b-a     ...

In this case, all frozen_nsec values are b - a, which I believe is
correct. However, consider a scenario where some cgroups are deleted
while frozen:

	            n      n-1     n-2    ...     1
	frozen      a      a+1     a+2           a+n    // 2 ... n-1 are deleted
	unfrozen    b                            b+1

Here, the frozen_nsec for cgroup n would be b - a, but for cgroup 1 it
would be (b + 1) - (a + n). This could introduce some discrepancy /
timing inaccuracies.

--
Best regards,
Ridong
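P.S. A tiny standalone C sketch of the arithmetic above, with made-up
numbers (n = 4, a = 100, b = 500); only the formulas come from the
tables, the concrete values are illustrative:

	#include <stdio.h>

	int main(void)
	{
		const long n = 4;            /* descendants, labeled n .. 1 */
		const long a = 100, b = 500; /* freeze/unfreeze stamps of cgroup n */

		/* Stable tree: per the first table, cgroup n spans [a, b]
		 * and cgroup 1 spans [a + n, b + n], so both get b - a. */
		printf("stable:    n -> %ld, 1 -> %ld\n",
		       b - a, (b + n) - (a + n));

		/* Cgroups 2 .. n-1 deleted while frozen: cgroup 1 now
		 * unfreezes second, at b + 1, but it froze last, at a + n. */
		printf("deletions: n -> %ld, 1 -> %ld\n",
		       b - a, (b + 1) - (a + n));
		return 0;
	}

This prints 400 everywhere except the last value, 397: cgroup 1 comes
up short by n - 1 time units, which is the discrepancy I mean.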