On Tue, Sep 2, 2025 at 2:38 PM Matthieu Baerts <matttbe@xxxxxxxxxx> wrote:
>
> 2 Sept 2025 23:18:56 Catalin Marinas <catalin.marinas@xxxxxxx>:
>
> > On Tue, Sep 02, 2025 at 08:50:19PM +0200, Matthieu Baerts wrote:
> >> Hi Catalin,
> >>
> >> 2 Sept 2025 20:25:19 Catalin Marinas <catalin.marinas@xxxxxxx>:
> >>
> >>> On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
> >>>> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
> >>>>> It is unclear why a second scan is needed and only the second one caught
> >>>>> something. Was it the same with the strange issues you mentioned in
> >>>>> driver tests? Do you think I should re-add the second scan + cat?
> >>>>
> >>>> Not sure, cc: Catalin, from experience it seems like second scan often
> >>>> surfaces issues the first scan missed.
> >>>
> >>> It's some of the kmemleak heuristics to reduce false positives. It does
> >>> a checksum of the object during scanning and only reports a leak if the
> >>> checksum is the same in two consecutive scans.
> >>
> >> Thank you for the explanation!
> >>
> >> Does that mean a scan should be triggered at the end of the tests,
> >> then wait 5 second for the grace period, then trigger another scan
> >> and check the results?
> >>
> >> Or wait 5 seconds, then trigger two consecutive scans?
> >
> > The 5 seconds is the minimum age of an object before it gets reported as
> > a leak. It's not related to the scanning process. So you could do two
> > scans in succession and wait 5 seconds before checking for leaks.
> >
> > However, I'd go with the first option - do a scan, wait 5 seconds and do
> > another. That's mostly because at the end of the scan kmemleak prints if
> > it found new unreferenced objects. It might not print the message if a
> > leaked object is younger than 5 seconds. In practice, though, the scan
> > may take longer, depending on how loaded your system is.
> >
> > The second option works as well but waiting between them has a better
> > chance of removing false positives if, say, some objects are moved
> > between lists and two consecutive scans do not detect the list_head
> > change (and update the object's checksum).
>
> Thank you for this very nice reply, that's very clear!
>
> I will then adapt our CI having CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF
> to do a manual scan at the very end, wait 5 seconds and do another.

FWIW - I am able to pretty reliably reproduce the kmemleak.

However, I also tried adding an inline kmemleak scan to the test
harness (did it once with, once without a sleep). When I do that the
kmemleak disappears :-)

(not saying that adding the scan isn't useful, just pointing out that
this particular leak seems to be related to how quickly we iterate
over the testcases)

Christoph
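
PS: for reference, the "scan, wait 5 seconds, scan again, then check"
sequence discussed above could look roughly like the sketch below. It is
only an illustration, not the actual CI change: the helper name
scan_twice() and its parameters are made up, and it assumes debugfs is
mounted at /sys/kernel/debug, kmemleak is enabled, and the caller is root.

```python
import time
from pathlib import Path

# Usual debugfs location of the kmemleak control file (assumption: debugfs
# is mounted at /sys/kernel/debug and the kernel has CONFIG_DEBUG_KMEMLEAK).
KMEMLEAK = Path("/sys/kernel/debug/kmemleak")


def scan_twice(path=KMEMLEAK, wait_s=5.0, sleep=time.sleep):
    """Hypothetical helper: scan, wait, scan again, then return the report."""
    path = Path(path)
    path.write_text("scan\n")  # first scan, same as `echo scan > kmemleak`
    sleep(wait_s)              # let young objects pass the 5 s minimum age
    path.write_text("scan\n")  # second scan, so checksums can be compared
    return path.read_text()    # same as `cat kmemleak`; empty if no reports
```

Calling scan_twice() once at the very end of the run and failing the job
when the returned string is non-empty would be one way to wire it in. The
`sleep` parameter is only there so the wait can be stubbed out in tests.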