On Thu, Sep 11, 2025 at 6:00 AM James Clark <james.clark@xxxxxxxxxx> wrote:
>
> On 10/09/2025 4:00 pm, Ian Rogers wrote:
> > On Wed, Sep 10, 2025 at 4:14 AM James Clark <james.clark@xxxxxxxxxx> wrote:
> >>
> >> On 28/08/2025 9:59 pm, Ian Rogers wrote:
> >>> Mirroring similar work for software events in commit 6e9fa4131abb
> >>> ("perf parse-events: Remove non-json software events"). These changes
> >>> migrate the legacy hardware and cache events to json. With no hard
> >>> coded legacy hardware or cache events the wild card, case
> >>> insensitivity, etc. is consistent for events. This does, however, mean
> >>> events like cycles will wild card against all PMUs. A change doing the
> >>> same was originally posted and merged from:
> >>> https://lore.kernel.org/r/20240416061533.921723-10-irogers@xxxxxxxxxx
> >>> and reverted by Linus in commit 4f1b067359ac ("Revert "perf
> >>> parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
> >>> his dislike for the cycles behavior on ARM with perf record. Earlier
> >>> patches in this series make perf record event opening failures
> >>> non-fatal and hide the cycles event's failure to open on ARM in perf
> >>> record, so it is expected the behavior will now be transparent in perf
> >>> record on ARM. perf stat with a cycles event will wildcard open the
> >>> event on all PMUs.
> >>
> >> Hi Ian,
> >>
> >> Briefly testing perf record and perf stat seem to work now. i.e "perf
> >> record -e cycles" doesn't fail and just skips the uncore cycles event.
> >> And "perf stat" now includes the uncore cycles event which I think is
> >> harmless.
> >
> > Thanks for confirming this.
> >
> >> But there are a few perf test failures.
> >> For example "test event parsing":
> >>
> >> evlist after sorting/fixing: 'arm_cmn_0/cycles/,{cycles,cache-misses,branch-misses}'
> >> FAILED tests/parse-events.c:1589 wrong number of entries
> >> Event test failure: test 57 '{cycles,cache-misses,branch-misses}:e'running test 58 'cycles/name=name/'
> >
> > I suspect the easiest fix for this is to change "cycles" to the
> > "cpu-cycles" legacy hardware event for this test. The test has always
> > had issues on ARM due to hardcoded expectations of the core PMU being
> > "cpu".
> >
> >> The tests "Perf time to TSC" and "Use a dummy software event to keep
> >> tracking" are using libperf to open the cycles event as a sampling event
> >> which now fails. It seems like we've fixed Perf record to ignore this
> >> failure, but we didn't think about libperf until now.
> >
> > I'm not clear on the connection here. libperf doesn't do event parsing
> > and so there are no changes in tools/lib/perf. If a test has an
> > expectation that "cycles" is a core event, again we can change it to
> > "cpu-cycles" as a workaround for ARM. As "cycles" will wildcard now,
> > we don't want that behavior in say API probing as we'll end up never
> > lazily processing the PMUs. That code has been altered in these
> > changes to specify the core PMU. For tests it is less of an issue and
> > so the changes are more limited.
> >
> > Thanks,
> > Ian
>
> Sure makes sense if there's an easy fix for the tests, we can do that. I
> suppose the main reason I mentioned it was that the tests might be
> highlighting that other genuine non-Perf and non-test users would see
> the same breakage though.

For a non-perf user to see a perf change they must transitively depend
on perf to care. I think the complaint is that we've gone from 1 event
(ignoring big.LITTLE/hybrid) to possibly many, particularly on ARM.
What I'm thinking is we should have something like:

  #if defined(__aarch64__) || defined(__arm__)
  #define HW_CYCLES_STR "cpu-cycles"
  #else
  #define HW_CYCLES_STR "cycles"
  #endif

and replace every use of raw "cycles" in the code with this #define.
This should avoid the >1 event issue on ARM in things like tests. It
does cause a new problem if the evsel->name is assumed to be "cycles",
which is something that can happen a lot in shell scripts. Perhaps all
those use-cases should switch to specifying a PMU, which would be a
good thing performance wise to avoid scanning lots of PMUs. I'll add
some things to v4 to do a mix of this.

Thanks,
Ian
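
For illustration only, here is a rough sketch of how a test might use the proposed HW_CYCLES_STR define. Only the macro itself comes from the message above. The hw_cycles_group string and the is_hw_cycles_name() helper are hypothetical names made up for this example and are not existing perf code. The helper shows one way to avoid assuming that an event's name is literally "cycles".

```c
#include <stdbool.h>
#include <string.h>

/* Macro as proposed above: use the legacy core event name on ARM. */
#if defined(__aarch64__) || defined(__arm__)
#define HW_CYCLES_STR "cpu-cycles"
#else
#define HW_CYCLES_STR "cycles"
#endif

/*
 * Hypothetical test input: adjacent string literals are concatenated at
 * compile time, so the group follows whichever name the macro selects.
 */
static const char hw_cycles_group[] =
	"{" HW_CYCLES_STR ",cache-misses,branch-misses}:e";

/*
 * Hypothetical helper: compare an event name against the macro rather
 * than against a hard coded "cycles".
 */
static bool is_hw_cycles_name(const char *name)
{
	return name != NULL && strcmp(name, HW_CYCLES_STR) == 0;
}
```

A test would then pass hw_cycles_group to the event parser and check names with is_hw_cycles_name(). Shell scripts that match on "cycles" cannot see the C define, which is the case where specifying a PMU explicitly may be the better option.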