Re: [PATCH 10/12] docs: kdoc: further rewrite_struct_members() cleanup

Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> · Wed, 6 Aug 2025 11:05:38 +0200

On Tue, 05 Aug 2025 16:46:10 -0600
Jonathan Corbet <corbet@xxxxxxx> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx> writes:
> 
> > Perhaps one alternative would do something like:
> >
> > 	tuples = struct_members.findall(members)
> >         if not tuples:
> >             break
> >
> > 	maintype, -, -, content, -, s_ids = tuples
> >
> > (assuming that we don't need t[1], t[2] and t[4] here)
> >
> > Btw, on this specific case, better to use non-capture group matches
> > to avoid those "empty" spaces, e.g. (if I got it right):  
> 
> The problem is this line here:
> 
>                 oldmember = "".join(t) # Reconstruct the original formatting
> 
> The regex *has* to capture the entire match string so that it can be
> reconstructed back to its original form, which we need to edit the full
> list of members later on.
> 
> This code could use a deep rethink, but it works for now :)

Well, we can still do:

	for t in tuples:
	    maintype, _, _, content, _, s_ids = t
	    oldmember = "".join(t)

This way, we name the relevant fields and still reconstruct
the original form.

IMO, this is a lot better than using t[0], t[3], t[5] in the code,
as the names make it clear what each one actually captured.
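
To make that concrete, here is a minimal sketch using plain re instead
of KernRe. The pattern is a simplified stand-in for the kdoc one, and
split_members() is a hypothetical name used only for illustration. Note
that the number of names on the left-hand side has to match the number
of capture groups in the regex (seven in this stand-in):

```python
import re

# Simplified stand-in pattern (an assumption, not the real kdoc regex).
# Every piece is captured, so "".join(t) rebuilds the matched text.
PATTERN = re.compile(
    r'(struct|union)([^\{\};]+)(\{)([^\{\}]*)(\})([^\{\};]*)(;)'
)


def split_members(members):
    """Return (maintype, content, s_ids, oldmember) for each nested member."""
    found = []
    for t in PATTERN.findall(members):
        # Name the groups we care about; "_" marks the ones we ignore.
        maintype, _, _, content, _, s_ids, _ = t
        oldmember = "".join(t)  # reconstruct the original formatting
        found.append((maintype, content, s_ids, oldmember))
    return found
```

Calling split_members("int a; struct { int x; int y; } foo; int b;")
yields a single entry whose oldmember is the nested struct text exactly
as it appeared in the input.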

-

Btw, re.findall() doesn't return match objects, which is inconsistent
with the rest of the re API. Looking at the docs today(*), there is
an alternative: re.finditer().
We could add it to the KernRe class so that callers get a Match
instance for each hit. Something like:

	# Original regex expression
	res = Re.finditer(...)

	# Not much difference here. Probably not worth using it
	for match in res:
	    oldmember = "".join(match.groups())
	    maintype, _, _, content, _, s_ids = match.groups()

Or alternatively:

	# Same expression, but with named groups for the relevant parts
	res = Re.finditer(...)

	for match in res:
	    oldmember = "".join(match.groups())

	    # replace in the code below:
	    #	maintype -> match.group('maintype')
	    #	content -> match.group('content')
	    #	s_ids -> match.group('s_ids')

No idea about performance differences between findall and finditer.
(*) https://docs.python.org/3/library/re.html
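
A minimal sketch of the named-group variant, again with plain re and a
simplified stand-in pattern (the group names and named_members() are
assumptions made for this example). One side effect worth noting: a
Match object exposes the whole matched text as group(0), so with
finditer() the original form could be recovered without capturing
every piece of the expression:

```python
import re

# Simplified stand-in pattern with named groups (an assumption, not the
# real kdoc regex). Only the interesting parts are captured.
PATTERN = re.compile(
    r'(?P<maintype>struct|union)'
    r'[^\{\};]+'
    r'\{(?P<content>[^\{\}]*)\}'
    r'(?P<s_ids>[^\{\};]*);'
)


def named_members(members):
    """Return one dict per nested struct/union found in members."""
    found = []
    for match in PATTERN.finditer(members):
        found.append({
            "oldmember": match.group(0),  # entire matched text
            "maintype": match.group("maintype"),
            "content": match.group("content"),
            "s_ids": match.group("s_ids"),
        })
    return found
```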

Btw, one possibility that would avoid having tuples and t is
to do something like:

	struct_members = KernRe("(" +			# group 1: the entire pattern
				type_pattern +		# group 2: main type (assumes one capture group)
				r'(?:[^\{\};]+)' +
				r'(?:\{)' +
				r'([^\{\}]*)' +		# group 3: content
				r'(?:\})' +
				r'([^\{\};]*)(?:;)' +	# group 4: IDs
				")")

	res = struct_members.finditer(line)
	for match in res:
	    oldmember, maintype, content, s_ids = match.groups()
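
A quick way to check the group numbering of such a layout is a
stand-alone sketch with plain re. Here type_pattern is assumed to be a
single capture group, and unpack_members() is a hypothetical helper:

```python
import re

# Assumption: type_pattern contributes exactly one capture group.
type_pattern = r'(struct|union)'

PATTERN = re.compile(
    "(" +                    # group 1: the entire pattern
    type_pattern +           # group 2: main type
    r'(?:[^\{\};]+)' +
    r'(?:\{)' +
    r'([^\{\}]*)' +          # group 3: content
    r'(?:\})' +
    r'([^\{\};]*)(?:;)' +    # group 4: IDs
    ")"
)


def unpack_members(members):
    """Return (oldmember, maintype, content, s_ids) tuples."""
    found = []
    for match in PATTERN.finditer(members):
        oldmember, maintype, content, s_ids = match.groups()
        found.append((oldmember, maintype, content, s_ids))
    return found
```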

(disclaimer notice: none of the above was tested)

Thanks,
Mauro