Moumita <dhar61595@xxxxxxxxx> writes:

> PATTERNS("bash",
> - /* Optional leading indentation */
> + /* Optional leading indentation */

What is this change about?

> "^[ \t]*"
> - /* Start of captured text */
> + /* Start of captured function name */
> "("
> "("
> - /* POSIX identifier with mandatory parentheses */
> - "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\))"
> + /* POSIX identifier with mandatory parentheses (allow spaces inside) */
> + "[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\)"

Is the indentation change intended, and is it required for this patch
to work correctly?

> "|"
> - /* Bashism identifier with optional parentheses */
> - "(function[ \t]+[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))"
> + /* Bash-style function definitions, allowing optional `function` keyword */
> + "(?:function[ \t]+(?=[a-zA-Z_]))?[a-zA-Z_][a-zA-Z0-9_]*(([ \t]*\\([ \t]*\\))|([ \t]+))?"

Ditto.  Regular expressions are a write-only language; please make
sure that you do not add any unnecessary changes that distract the
eyes of reviewers from spotting the _real_ changes that improve the
current codebase.

> ")"
> /* Optional whitespace */
> "[ \t]*"
> - /* Compound command starting with `{`, `(`, `((` or `[[` */
> - "(\\{|\\(\\(?|\\[\\[)"
> - /* End of captured text */
> + /* Allow function body to start with `{`, `(` (subshell), `[[` */
> + "(\\{|\\(|\\[\\[)"
> + /* End of captured function name */
> ")",
> /* -- */
> - /* Characters not in the default $IFS value */
> - "[^ \t]+"),

We used to pretty much use "a run of non-whitespace characters is a
token".  Now we are a bit more picky.  That may or may not be good,
but it is hard to tell whether it is an improvement.

> + /* Identifiers: variable and function names */
> + "[a-zA-Z_][a-zA-Z0-9_]*"
> + /* Numeric constants: integers and decimals */
> + "|[-+]?[0-9]+(\\.[0-9]*)?|[-+]?\\.[0-9]+"
> + /* Shell variables: `$VAR`, `${VAR}` */
> + "|\\$[a-zA-Z_][a-zA-Z0-9_]*|\\$\\{[^}]+\\}"
> + /* Logical and comparison operators */
> + "|\\|\\||&&|<<|>>|==|!=|<=|>="
> + /* Assignment and arithmetic operators */
> + "|[-+*/%&|^!=<>]=?"
> + /* Command-line options (to avoid splitting `-option`) */
> + "|--?[a-zA-Z0-9_-]+"
> + /* Brackets and grouping symbols */
> + "|\\(|\\)|\\{|\\}|\\[|\\]"),

The fact that this patch does not have any changes to the "t/"
hierarchy suggests to me that we do not have existing tests to see how
sample text files in the supported languages are tokenized (otherwise
the above changes would require adjusting such existing tests).  So I
think adding such tests should be left outside of this topic, but I
wonder if they would give us a good way to demonstrate the effect of
these changes to userdiff patterns.

Thanks.
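
To illustrate the kind of comparison such a test could make, here is a
rough Python sketch.  It is not part of the patch and is not how Git
itself tokenizes; it only approximates the behaviour under two stated
assumptions: that the leftmost, longest match is taken at each
position (as a POSIX regex engine would do), and that any non-space
character matched by no alternative becomes a one-character token.
The names `tokenize`, `OLD`, `NEW` and `FALLBACK` are made up for this
illustration.

```python
import re

# Old word regex from the quoted diff: any run of non-blank characters.
OLD = [r"[^ \t]+"]

# New word regex from the quoted diff, split at its top-level "|" so that
# the longest alternative can be picked, roughly as a POSIX engine would.
NEW = [
    r"[a-zA-Z_][a-zA-Z0-9_]*",                 # identifiers
    r"[-+]?[0-9]+(\.[0-9]*)?",                 # integers and decimals
    r"[-+]?\.[0-9]+",                          # decimals such as .5
    r"\$[a-zA-Z_][a-zA-Z0-9_]*",               # $VAR
    r"\$\{[^}]+\}",                            # ${VAR}
    r"\|\|", "&&", "<<", ">>", "==", "!=", "<=", ">=",
    r"[-+*/%&|^!=<>]=?",                       # assignment and arithmetic
    r"--?[a-zA-Z0-9_-]+",                      # command-line options
    r"\(", r"\)", r"\{", r"\}", r"\[", r"\]",  # brackets
]

# Assumption: characters matched by no alternative become one-char tokens.
FALLBACK = [r"\S"]


def tokenize(text, alternatives):
    """Split text into tokens, taking the longest match at each position."""
    patterns = [re.compile(a) for a in alternatives + FALLBACK]
    tokens, pos = [], 0
    while pos < len(text):
        # Longest match among all alternatives that starts exactly at pos.
        end = pos
        for pattern in patterns:
            m = pattern.match(text, pos)
            if m and m.end() > end:
                end = m.end()
        if end > pos:
            tokens.append(text[pos:end])
            pos = end
        else:
            pos += 1  # skip whitespace between tokens
    return tokens


sample = "total=$((total+1)) && echo --verbose"
print(tokenize(sample, OLD))
print(tokenize(sample, NEW))
```

With the old pattern, `total=$((total+1))` stays a single token; with
the new one it is broken into identifiers, operators and brackets, so
a word diff would highlight much smaller pieces of a changed line.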