On 2025-06-15 20:35, Greg KH wrote: > On Sat, Jun 14, 2025 at 07:53:34AM -0700, Marc Herbert wrote: >>> the kernel relies on this not being "optimized away" by the compiler >>> in many places. >> >> I think "undefined behavior" is the more general topic, more important >> than null pointer checks specifically? > > Is this really "undefined behaviour"? This is apparently debatable: https://stackoverflow.com/questions/26906621/does-struct-name-null-b-cause-undefined-behaviour-in-c11 Also note: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html Dereferencing a NULL Pointer: contrary to popular belief, dereferencing a null pointer in C is undefined. [...] NULL pointer dereferences being undefined enables a broad range of optimizations [...] This significantly punishes scheduling and other optimizations. In C-based languages, NULL being undefined enables a large number of simple scalar optimizations that are exposed as a result of macro expansion and inlining. > There are a lot of things that the kernel requires for a compiler to > be able to build it, and this is one of those things, it can't do this > type of "optimization" and expect the output to actually work > properly. According to page 2, this type of optimizations exists for a reason and makes a real impact https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html While this is intentionally a simple and contrived example, this sort of thing happens all the time with inlining: inlining a function often exposes a number of secondary optimization opportunities. This means that if the optimizer decides to inline a function, a variety of local optimizations can kick in, which change the behavior of the code. This is both perfectly valid according to the standard, and _important for performance in practice_. [emphasis mine] In other words, by turning this off unconditionally at the global level, the kernel could actually lose (surprise!) some performance. > Again, that's not the issue here. The issue is that we rely on this > type of optimization to not happen in order to work properly. I'm interested in examples where this deviation is actually required, if anyone can think of some from the top of their head. But in this particular case it is not required because it's trivial and enough to swap the two lines and check the pointer first. Even if the language lawyers eventually agree that this particular case is not UB (for instance: because it does not dereference "for real"), I still miss the value of involving lawyers (or tripping some analyzers) at all in situations where this can be avoided so easily. There are plenty enough complex situations already, no need for even more C torture :-) > So no need to "fix" anything here except perhaps the compiler for not > attempting to do foolish things like this :) It looks foolish when assuming that C is some sort of low-level language a.k.a. "portable assembly", which it stopped being a long time ago; for performance reasons: https://queue.acm.org/detail.cfm?id=3212479 https://stefansf.de/post/pointers-are-more-abstract-than-you-might-expect/ Does this mean C has evolved into some hybrid monster good at neither low-level nor high-level stuff? I think yes. >>> If "tooling" trips over stuff like this, then we should fix the tooling >> >> Because of its old age, many quirks and limitations, C needs and has a >> pretty large number of external "tools": static and run-time analyzers, >> coding rules (CERT, MISRA,...) and what not. It's not realistic to "fix" >> them all so they all "support" undefined behaviors like this one. > > If they wish to analize Linux, then yes, they do need to be fixed to > recognize that this is not an issue for us. There is no requirement > that we have that _all_ tools must be able to parse our source code. Not sure why I wrote "all", this should have been "any". Undefined behavior is in the "Most Wanted" list of pretty much all these tools for obvious reasons. Most these tools want to make C _in general_ less broken; not any specific C project in particular. If some projects prefer being left out and use some project-specific C dialect instead then it's their loss.