On Thu, 31 Jul 2025 18:13:18 -0600
Jonathan Corbet <corbet@xxxxxxx> wrote:

> dump_struct is one of the longest functions in the kdoc_parser class,
> making it hard to read and reason about. Move the definition of the prefix
> transformations out of the function, join them with the definition of
> "attribute" (which was defined at the top of the file but only used here),
> and reformat the code slightly for shorter line widths.
>
> Just code movement in the end.

This patch itself LGTM:

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>

but see my notes below:

> +struct_prefixes = [
> +    # Strip attributes
> +    (struct_attribute, ' '),
> +    (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
> +    (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
> +    (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
> +    (KernRe(r'\s*__packed\s*', re.S), ' '),
> +    (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
> +    (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
> +    (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
> +    #
> +    # Unwrap struct_group macros based on this definition:
> +    # __struct_group(TAG, NAME, ATTRS, MEMBERS...)
> +    # which has variants like: struct_group(NAME, MEMBERS...)
> +    # Only MEMBERS arguments require documentation.
> +    #
> +    # Parsing them happens on two steps:
> +    #
> +    # 1. drop struct group arguments that aren't at MEMBERS,
> +    #    storing them as STRUCT_GROUP(MEMBERS)
> +    #
> +    # 2. remove STRUCT_GROUP() ancillary macro.
> +    #
> +    # The original logic used to remove STRUCT_GROUP() using an
> +    # advanced regex:
> +    #
> +    #   \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;
> +    #
> +    # with two patterns that are incompatible with
> +    # Python re module, as it has:
> +    #
> +    #   - a recursive pattern: (?1)
> +    #   - an atomic grouping: (?>...)
> +    #
> +    # I tried a simpler version: but it didn't work either:
> +    #   \bSTRUCT_GROUP\(([^\)]+)\)[^;]*;
> +    #
> +    # As it doesn't properly match the end parenthesis on some cases.
> +    #
> +    # So, a better solution was crafted: there's now a NestedMatch
> +    # class that ensures that delimiters after a search are properly
> +    # matched. So, the implementation to drop STRUCT_GROUP() will be
> +    # handled in separate.
> +    #
> +    (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('),
> +    (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GROUP('),
> +    (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'struct \1 \2; STRUCT_GROUP('),
> +    (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP('),
> +    #
> +    # Replace macros
> +    #
> +    # TODO: use NestedMatch for FOO($1, $2, ...) matches

This comment is actually related to patch 03/12: regex cleanups:

If you want to simplify the regular expressions here a lot, the best
approach is to take a look at the NestedMatch class and improve it.
There are lots of regular expressions here that are very complex
because they try to ensure that something like these:

	1. function(<arg1>)
	2. function(<arg1>, <arg2>,<arg3>,...)

are properly parsed[1], but if we turn it into something that handles
(2) as well, we could use it like:

	match = NestedMatch.search("function", string)
	# or, alternatively:
	# match = NestedMatch.search("function($1, $2, $3)", string)
	if match:
		arg1 = match.group(1)
		arg2 = match.group(2)
		arg3 = match.group(3)

or even do more complex changes like:

	NestedMatch.sub("foo($1, $2)", "new_name($2)", string)

A class implementing that will help to transform all sorts of functions
and simplify the more complex regexes on kernel-doc.

Doing that will very likely simplify struct_prefixes a lot, replacing
it with something much easier to understand:

	# Nice and simpler set of replacement rules
	struct_nested_matches = [
		("__aligned", ""),
		("__counted_by", ""),
		("__counted_by_(be|le)", ""),
		...
		# Picked those from stddef.h macro replacement rules
		("struct_group(NAME, MEMBERS...)",
		 "__struct_group(, NAME, , MEMBERS)"),
		("struct_group(TAG, NAME, ATTRS, MEMBERS...)",
		 """ __struct_group(TAG, NAME, ATTRS, MEMBERS...)
		     union {
			struct { MEMBERS } ATTRS;
			struct __struct_group_tag(TAG) { MEMBERS } ATTRS NAME;
		     } ATTRS"""),
		...
	]

	members = trim_private_members(members)
	# "from" is a reserved word in Python, so use other names here
	for pattern, replacement in struct_nested_matches:
		members = NestedMatch.sub(pattern, replacement, members)

Granted, wiring this up takes some time and lots of testing - we should
likely have some unit tests to catch issues there - but IMO it is worth
the effort.

-

[1] NestedMatch() is currently limited to matching function(<args>), as
    it was written to replace really complex regular expressions with
    recursive patterns and atomic grouping, which were used only to
    capture macro calls for:

	STRUCT_GROUP(...)

    I might have used "import regex" instead, but I didn't want to add
    the extra dependency of a non-standard Python library to the kernel
    build.

Thanks,
Mauro
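
For illustration only, the balanced-delimiter idea behind footnote [1] and the proposed "foo($1, $2)" style of substitution can be sketched with the standard re module plus a small parenthesis counter. This is not the kernel's NestedMatch class or its API: find_call() and sub_call() are hypothetical helper names, and the "$N" template syntax is an assumption taken from the examples in the mail above.

```python
import re


def find_call(name, text, start=0):
    """Find the first balanced `name(...)` call at or after `start`.

    Returns (begin, end, args), where `end` is the index just past the
    closing parenthesis and `args` holds the top-level, comma-separated
    arguments. Returns None if no balanced call is found.
    Hypothetical helper, not part of kernel-doc.
    """
    pattern = re.compile(r'\b' + re.escape(name) + r'\s*\(')
    m = pattern.search(text, start)
    while m:
        depth = 1
        args = []
        arg_start = m.end()
        for pos in range(m.end(), len(text)):
            ch = text[pos]
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth == 0:
                    args.append(text[arg_start:pos].strip())
                    return m.start(), pos + 1, args
            elif ch == ',' and depth == 1:
                # Only commas at the top nesting level split arguments
                args.append(text[arg_start:pos].strip())
                arg_start = pos + 1
        # Unbalanced call: try the next occurrence of the name
        m = pattern.search(text, m.end())
    return None


def sub_call(name, template, text):
    """Replace every balanced `name(...)` call in `text` with `template`.

    `$1`, `$2`, ... in the template expand to the matching argument;
    references past the last argument expand to an empty string.
    """
    out = []
    pos = 0
    while True:
        found = find_call(name, text, pos)
        if found is None:
            break
        begin, end, args = found

        def expand(m, args=args):
            index = int(m.group(1)) - 1
            return args[index] if 0 <= index < len(args) else ''

        out.append(text[pos:begin])
        out.append(re.sub(r'\$(\d+)', expand, template))
        pos = end
    out.append(text[pos:])
    return ''.join(out)


members = "int a; STRUCT_GROUP(int b; char c[f(1, 2)];) __packed; int d;"
unwrapped = sub_call("STRUCT_GROUP", "$1", members)
renamed = sub_call("foo", "new_name($2)", "x = foo(a, g(b, c));")
```

Counting parentheses by hand is what replaces the recursive pattern and atomic grouping that Python's re module lacks; a real implementation would also need to handle other delimiters, strings, and comments, which this sketch ignores.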