On Fri, 2025-09-12 at 10:07 -0400, Tom Lane wrote: > Dominique Devienne <ddevienne@xxxxxxxxx> writes: > > > This DOES look like a bug, no? I've done regexes for a long time, > > > and these two forms should be equivalent IMHO. --DD > > Yeah, I agree it's busted. You can use EXPLAIN VERBOSE to see the > translated-to-POSIX pattern, and it's wrong: > > regression=# explain verbose with t(v) as (values ('foo:bar'), ('foo/bar'), ('foo0bar')) > select v from t where v similar to 'foo[\d\w]_%'; > QUERY PLAN > -------------------------------------------------------------- > Values Scan on "*VALUES*" (cost=0.00..0.05 rows=1 width=32) > Output: "*VALUES*".column1 > Filter: ("*VALUES*".column1 ~ '^(?:foo[\d\w]_%)$'::text) > (3 rows) > > The _ and % are not getting converted to their POSIX equivalents > ("." and ".*"). Indeed, and I have to take the blame for introducing a bug in a minor release :^( The attached patch should fix the problem. Yours, Laurenz Albe
From 8d0f8ebac6c42fe7da36ec8c30ee091d20270068 Mon Sep 17 00:00:00 2001 From: Laurenz Albe <laurenz.albe@xxxxxxxxxxx> Date: Sat, 13 Sep 2025 01:32:55 +0200 Subject: [PATCH v1] Amend recent fix for SIMILAR TO regex conversion Commit e3ffc3e91d fixed the translation of character classes in SIMILAR TO regular expressions. Unfortunately the fix broke a corner case: if there is an escape character right after the opening bracket, (for example in "[\q]") a closing bracket right after the escape sequence would not be seen as closing the character class. Author: Laurenz Albe <laurenz.albe@xxxxxxxxxxx> Reported-By: Dominique Devienne <ddevienne@xxxxxxxxx> Reported-By: Stephan Springl <springl-psql@xxxxxxxxxxxxx> Backpatch-through: 13 --- src/backend/utils/adt/regexp.c | 6 ++++++ src/test/regress/expected/strings.out | 9 +++++++++ src/test/regress/sql/strings.sql | 3 +++ 3 files changed, 18 insertions(+) diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c index 6e2864cbbda..29692fb1a9d 100644 --- a/src/backend/utils/adt/regexp.c +++ b/src/backend/utils/adt/regexp.c @@ -948,6 +948,12 @@ similar_escape_internal(text *pat_text, text *esc_text) */ *r++ = '\\'; *r++ = pchar; + /* + * If we encounter an escaped character in a character class, + * we must be past an initial ^ or ]. + */ + if (charclass_depth > 0) + charclass_start = 3; } afterescape = false; } diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out index ba302da51e7..2d6cb02ad60 100644 --- a/src/test/regress/expected/strings.out +++ b/src/test/regress/expected/strings.out @@ -693,6 +693,15 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^'; Filter: (f1 ~ '^(?:[^^]\^)$'::text) (2 rows) +-- Closing square bracket after an escape sequence at the beginning of +-- a character closes the character class +EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|'; + QUERY PLAN +--------------------------------------- + Seq Scan on text_tbl + Filter: (f1 ~ '^(?:[\a].*)$'::text) +(2 rows) + -- Test backslash escapes in regexp_replace's replacement string SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3'); regexp_replace diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql index b94004cc08c..5ed421d6205 100644 --- a/src/test/regress/sql/strings.sql +++ b/src/test/regress/sql/strings.sql @@ -218,6 +218,9 @@ EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[]%][^]%][^%]%'; -- Closing square bracket effective after two carets at the beginning -- of character class. EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[^^]^'; +-- Closing square bracket after an escape sequence at the beginning of +-- a character closes the character class +EXPLAIN (COSTS OFF) SELECT * FROM TEXT_TBL WHERE f1 SIMILAR TO '[|a]%' ESCAPE '|'; -- Test backslash escapes in regexp_replace's replacement string SELECT regexp_replace('1112223333', E'(\\d{3})(\\d{3})(\\d{4})', E'(\\1) \\2-\\3'); -- 2.51.0