Thread: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

PG Bug reporting form

Date:

February 13, 10:46:22

The following bug has been logged on the website:

Bug reference:      19406
Logged by:          SATŌ Kentarō
Email address:      ranvis@gmail.com
PostgreSQL version: 15.16
Operating system:   Linux 5.14.0-611.27.1.el9_7.x86_64
Description:

After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
>ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
on valid UTF-8 text stored in a TOAST-compressed column.

The same SQL works on PostgreSQL 15.15.
The data is valid UTF-8. The problem occurs only when operating on a
TOAST-compressed value.

Steps to Reproduce:
$ psql
psql (15.16)
Type "help" for help.

user=> \x
Expanded display is on.
user=> \set VERBOSITY verbose
user=> create table toast_repro (data text);
CREATE TABLE
user=> insert into toast_repro select repeat('日向', 2000); -- e6 97 a5 e5 90 91
INSERT 0 1
user=> select pg_column_size(data), octet_length(data) from toast_repro;
-[ RECORD 1 ]--+------
pg_column_size | 153
octet_length   | 12000
user=> select substring(data from 1 for 2) from toast_repro;
-[ RECORD 1 ]---
substring | 日向
user=> select substring(data from 1 for 1) from toast_repro;
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
LOCATION:  report_invalid_encoding_int, mbutils.c:1796
user=> select substring('日向' from 1 for 1);
-[ RECORD 1 ]-
substring | 日
user=> select substring(data::varchar(100) from 1 for 1) from toast_repro;
-[ RECORD 1 ]-
substring | 日
user=> select substring(data from 1 for 1) from ( select data || '' as data from toast_repro) t;
-[ RECORD 1 ]-
substring | 日

Environment:
version | PostgreSQL 15.16 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11), 64-bit
server_encoding | UTF8
client_encoding | UTF8
lc_collate | ja_JP.UTF-8
lc_ctype | ja_JP.UTF-8

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 13, 20:27:02

On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> >ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> on valid UTF-8 text stored in a TOAST-compressed column.

> user=> select substring(data from 1 for 1) from toast_repro;
> ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97

Thanks for the report.  That is a bug and a regression; I regret missing it
during review.  The substring operation works by taking a 4-byte slice from
the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
then finding the actual first character within those bytes.  However, it
incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
on a fix.
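
Editorial illustration of the failure mode described above, as a minimal standalone sketch rather than PostgreSQL's code. It assumes a 4-byte slice holding the first bytes of '日向' (e6 97 a5 e5). The helper names `utf8_seq_len`, `slice_is_whole_string` and `first_char_len` are hypothetical, and continuation bytes are not validated, for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length implied by a UTF-8 lead byte, or 0 if it is not a lead byte. */
static int
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Strict check: every character must be complete within buf[0..len). */
static int
slice_is_whole_string(const unsigned char *buf, size_t len)
{
    size_t      pos = 0;

    assert(buf != NULL);
    while (pos < len)
    {
        int         n = utf8_seq_len(buf[pos]);

        if (n == 0 || (size_t) n > len - pos)
            return 0;           /* invalid lead byte or truncated character */
        pos += (size_t) n;
    }
    return 1;
}

/*
 * Lenient check: only the first character must be complete.
 * Returns its byte length, or -1 if it does not fit in the slice.
 */
static int
first_char_len(const unsigned char *buf, size_t len)
{
    int         n;

    assert(buf != NULL);
    if (len == 0)
        return -1;
    n = utf8_seq_len(buf[0]);
    if (n == 0 || (size_t) n > len)
        return -1;
    return n;
}
```

For the bytes e6 97 a5 e5, the strict check rejects the slice because the trailing e5 starts a character whose remaining bytes were cut off, while the lenient check still finds the complete 3-byte first character.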

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Thomas Munro

Date:

February 14, 00:58:50

On Sat, Feb 14, 2026 at 6:27 AM Noah Misch <noah@leadboat.com> wrote:
> On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > >ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > on valid UTF-8 text stored in a TOAST-compressed column.
>
> > user=> select substring(data from 1 for 1) from toast_repro;
> > ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
>
> Thanks for the report.  That is a bug and a regression; I regret missing it
> during review.  The substring operation works by taking a 4-byte slice from
> the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> then finding the actual first character within those bytes.  However, it
> incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
> on a fix.

Ack.  Also looking into this.

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 14, 01:48:04

On Fri, Feb 13, 2026 at 09:27:02AM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > >ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > on valid UTF-8 text stored in a TOAST-compressed column.
> 
> > user=> select substring(data from 1 for 1) from toast_repro;
> > ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> 
> Thanks for the report.  That is a bug and a regression; I regret missing it
> during review.  The substring operation works by taking a 4-byte slice from
> the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> then finding the actual first character within those bytes.  However, it
> incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
> on a fix.

Attached.  I may add some more tests, e.g. a toasted invalid string where the
detoasted length is less than the slice we request.  This version is viable,
however.

I audited the other pg_mbstrlen_with_len() callers, and I think they're all okay with
an error if the input has an incomplete char.  Hence, those don't need changes
beyond what we've already released.  Most pass either parser input or an
existing datum with its len.  text_position_get_match_pos() is the most subtle
caller, and I think it's fine.

I audited other uses of slice detoast.  The only other one is bytea substring,
which is obviously indifferent to character encoding.

Attachments

toast-slice-mblen-v1.patch

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 14, 03:21:13

On Fri, Feb 13, 2026 at 02:48:04PM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 09:27:02AM -0800, Noah Misch wrote:
> > On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > > >ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > > on valid UTF-8 text stored in a TOAST-compressed column.
> > 
> > > user=> select substring(data from 1 for 1) from toast_repro;
> > > ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > 
> > Thanks for the report.  That is a bug and a regression; I regret missing it
> > during review.  The substring operation works by taking a 4-byte slice from
> > the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> > then finding the actual first character within those bytes.  However, it
> > incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
> > on a fix.
> 
> Attached.  I may add some more tests, e.g. a toasted invalid string where the
> detoasted length is less than the slice we request.

Tests already covered that in particular, but I added some other tests.  I
think this is now ready.  Review welcome.  I have a Valgrind test run ongoing.

Attachments

toast-slice-mblen-v2.patch

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 14, 08:38:21

On Fri, Feb 13, 2026 at 04:21:13PM -0800, Noah Misch wrote:
> Review welcome.  I have a Valgrind test run ongoing.

Valgrind found the complaint below, but I think this is an instrumentation
problem.  I've added a fix for that instrumentation.  I also made minor edits
to the log message of the main patch, hence v3.

The release team is preparing to announce a 2026-02-26 out-of-cycle release in
light of this regression.  I plan to push these fixes at 2026-02-14T20:00+0000
to unblock that formal announcement.


==00:00:01:11.756 3464664== VALGRINDERROR-BEGIN
==00:00:01:11.756 3464664== Unaddressable byte(s) found during client check request
==00:00:01:11.769 3464664==    at 0xC6076A: pg_mblen_with_len (mbutils.c:1115)
==00:00:01:11.769 3464664==    by 0xC07CD3: pg_mbcharcliplen_chars (varlena.c:807)
==00:00:01:11.770 3464664==    by 0xC07AAD: text_substring (varlena.c:732)
==00:00:01:11.770 3464664==    by 0xC07797: text_substr (varlena.c:553)
==00:00:01:11.770 3464664==    by 0x779688: ExecInterpExpr (execExprInterp.c:953)
==00:00:01:11.770 3464664==    by 0x77BBD6: ExecInterpExprStillValid (execExprInterp.c:2299)
==00:00:01:11.770 3464664==    by 0x7DBBC4: ExecEvalExprNoReturn (executor.h:423)
==00:00:01:11.770 3464664==    by 0x7DBC73: ExecEvalExprNoReturnSwitchContext (executor.h:464)
==00:00:01:11.770 3464664==    by 0x7DBCD3: ExecProject (executor.h:496)
==00:00:01:11.770 3464664==    by 0x7DC134: ExecScanExtended (execScan.h:234)
==00:00:01:11.770 3464664==    by 0x7DC474: ExecSeqScanWithProject (nodeSeqscan.c:162)
==00:00:01:11.770 3464664==    by 0x794561: ExecProcNodeFirst (execProcnode.c:469)
==00:00:01:11.770 3464664==  Address 0x19340a5e is 12,062 bytes inside a block of size 12,064 alloc'd
==00:00:01:11.770 3464664==    at 0x4C29F73: malloc (vg_replace_malloc.c:309)
==00:00:01:11.770 3464664==    by 0xC79F03: AllocSetAllocLarge (aset.c:756)
==00:00:01:11.770 3464664==    by 0xC7AA4C: AllocSetAlloc (aset.c:1033)
==00:00:01:11.770 3464664==    by 0xC8B37E: palloc (mcxt.c:1408)
==00:00:01:11.770 3464664==    by 0x4B0182: detoast_attr_slice (detoast.c:324)
==00:00:01:11.770 3464664==    by 0xC5177F: pg_detoast_datum_slice (fmgr.c:1825)
==00:00:01:11.770 3464664==    by 0xC07A11: text_substring (varlena.c:716)
==00:00:01:11.770 3464664==    by 0xC07797: text_substr (varlena.c:553)
==00:00:01:11.770 3464664==    by 0x779688: ExecInterpExpr (execExprInterp.c:953)
==00:00:01:11.770 3464664==    by 0x77BBD6: ExecInterpExprStillValid (execExprInterp.c:2299)
==00:00:01:11.770 3464664==    by 0x7DBBC4: ExecEvalExprNoReturn (executor.h:423)
==00:00:01:11.770 3464664==    by 0x7DBC73: ExecEvalExprNoReturnSwitchContext (executor.h:464)
==00:00:01:11.770 3464664== 
==00:00:01:11.770 3464664== VALGRINDERROR-END
{
   <insert_a_suppression_name_here>
   Memcheck:User
   fun:pg_mblen_with_len
   fun:pg_mbcharcliplen_chars
   fun:text_substring
   fun:text_substr
   fun:ExecInterpExpr
   fun:ExecInterpExprStillValid
   fun:ExecEvalExprNoReturn
   fun:ExecEvalExprNoReturnSwitchContext
   fun:ExecProject
   fun:ExecScanExtended
   fun:ExecSeqScanWithProject
   fun:ExecProcNodeFirst
}
2026-02-13 21:03:38.905 PST client backend[3464664] pg_regress/encoding ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0x80
2026-02-13 21:03:38.905 PST client backend[3464664] pg_regress/encoding STATEMENT:  SELECT SUBSTRING(c FROM 4001 FOR 1) FROM toast_3b_utf8;
**00:00:01:11.771 3464664** Valgrind detected 1 error(s) during execution of "SELECT SUBSTRING(c FROM 4001 FOR 1) FROM toast_3b_utf8;"

Attachments

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Thomas Munro

Date:

February 14, 11:15:45

On Sat, Feb 14, 2026 at 6:38 PM Noah Misch <noah@leadboat.com> wrote:
> [mblen-valgrind-after-report-v1.patch]

LGTM.  The new valgrind check should clearly be after the new non-local exit.

Studying the other patch...

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Thomas Munro

Date:

February 14, 22:07:22

On Sat, Feb 14, 2026 at 9:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Feb 14, 2026 at 6:38 PM Noah Misch <noah@leadboat.com> wrote:
> > [mblen-valgrind-after-report-v1.patch]
>
> LGTM.  The new valgrind check should clearly be after the new non-local exit.
>
> Studying the other patch...

                        /*
-                        * Total slice size in bytes can't be any longer than the start
-                        * position plus substring length times the encoding max length.
-                        * If that overflows, we can just use -1.
+                        * Total slice size in bytes can't be any longer than the
+                        * inclusive end position times the encoding max length.  If that
+                        * overflows, we can just use -1.
                         */
-                       if (pg_mul_s32_overflow(E, eml, &slice_size))
+                       if (pg_mul_s32_overflow(E - 1, eml, &slice_size))
                                slice_size = -1;

Isn't it still conceptually "exclusive", but adjusted to be zero-indexed?

                /* Now we can get the actual length of the slice in MB characters */
-               slice_strlen = pg_mbstrlen_with_len(VARDATA_ANY(slice),
-                                                   slice_len);
+               slice_strlen =
+                       (slice_size == -1 ?
+                        pg_mbstrlen_with_len(VARDATA_ANY(slice), slice_len) :
+                        pg_mbcharcliplen_chars(VARDATA_ANY(slice), slice_len, E - 1));

Comment presumably needs adjustment to say that we only count as far
as we need to, and why.

There is something a bit strange about all this, though.
pg_mbstrlen_with_len(..., -1) returns 0, so if you ask for characters
that really exist past 2^29 (~500 million), you must get an empty
string, right?   That's hard to reach, pre-existing and out of scope
for the immediate problem report, except ... now we're contorting the
code even further to keep it.

The outline I had come up with before seeing your patch was: let's
just delete it.  The position search can check bounds incrementally,
following our general approach.  This avoids the reported problem by
ditching the pre-flight scan through the slice (up to 4x more
pg_mblen_XXX calls and memory access than we strictly need), and also
the special cases for empty strings since they already fall out of the
general behaviour (am I missing something?), not leaving much code
behind.  As far as I can see so far, the only user-visible side-effect
requires corruption: substring() moves from the
internal-NUL-is-terminator category to internal-NUL-is-character
category, but that's an implementation detail.

When I saw your patch yesterday, I initially abandoned the thought,
thinking that your idea looked more conservative, but after sleeping
on it and reflecting again on these oddities, I have merged my draft
implementation with your tests, ancient detoasting fence post
observation and commit message, just to see if you think this approach
might be worth considering further.
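
Editorial illustration of the single-pass idea outlined above: locate the substring's byte range while checking bounds incrementally, with no separate pre-flight count of the slice. This is a minimal standalone sketch, not the attached patch. The names `lead_len`, `skip_chars` and `substring_bounds` are hypothetical, positions are 1-based as in SQL, and continuation bytes are not validated.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length implied by a UTF-8 lead byte, or 0 if it is not a lead byte. */
static int
lead_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Advance *pos over up to nchars characters, stopping early at end of data.
 * Returns -1 if a character is invalid or truncated, else 0.
 */
static int
skip_chars(const unsigned char *buf, size_t len, size_t *pos, int nchars)
{
    while (nchars > 0 && *pos < len)
    {
        int         n = lead_len(buf[*pos]);

        if (n == 0 || (size_t) n > len - *pos)
            return -1;
        *pos += (size_t) n;
        nchars--;
    }
    return 0;
}

/*
 * Compute the byte offset and byte length of characters
 * [start, start + count) in one pass.  Bytes past the end of the
 * requested substring are never inspected, so a slice that was cut in the
 * middle of a later character is not an error.
 */
static int
substring_bounds(const unsigned char *buf, size_t len, int start, int count,
                 size_t *byte_off, size_t *byte_len)
{
    size_t      pos = 0;

    assert(buf != NULL && start >= 1 && count >= 0);

    if (skip_chars(buf, len, &pos, start - 1) < 0)
        return -1;
    *byte_off = pos;
    if (skip_chars(buf, len, &pos, count) < 0)
        return -1;
    *byte_len = pos - *byte_off;
    return 0;
}
```

In this sketch a start position past the end of the data yields an empty range without a special case, which mirrors the remark above that empty strings fall out of the general behaviour.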

Attachments

v2tm-0001-Fix-SUBSTRING-for-toasted-multibyte-characters.patch

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 14, 22:33:44

On Sun, Feb 15, 2026 at 08:07:22AM +1300, Thomas Munro wrote:
> On Sat, Feb 14, 2026 at 9:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Sat, Feb 14, 2026 at 6:38 PM Noah Misch <noah@leadboat.com> wrote:
> > > [mblen-valgrind-after-report-v1.patch]
> >
> > LGTM.  The new valgrind check should clearly be after the new non-local exit.
> >
> > Studying the other patch...

Thank you.

>                         /*
> -                        * Total slice size in bytes can't be any longer than the start
> -                        * position plus substring length times the encoding max length.
> -                        * If that overflows, we can just use -1.
> +                        * Total slice size in bytes can't be any longer than the
> +                        * inclusive end position times the encoding max length.  If that
> +                        * overflows, we can just use -1.
>                          */
> -                       if (pg_mul_s32_overflow(E, eml, &slice_size))
> +                       if (pg_mul_s32_overflow(E - 1, eml, &slice_size))
>                                 slice_size = -1;
> 
> Isn't it still conceptually "exclusive", but adjusted to be zero-indexed?

Since it's discrete (being an integer), "up to E, exclusive" and "up to E - 1,
inclusive" are the same thing.  My comment may not be the optimal way to
express that.
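
Editorial illustration of the slice-size arithmetic discussed above, as a minimal standalone sketch. `mul_s32_overflow` is a hypothetical stand-in for an overflow-checked multiply (implemented here with 64-bit arithmetic), and `slice_size_for` is a hypothetical wrapper; neither is PostgreSQL's code. With a 1-based exclusive end position E, the characters needed are 1 through E - 1, so at most (E - 1) times the maximum character width in bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Overflow-checked 32-bit multiply.  Returns nonzero on overflow,
 * otherwise stores the product in *result and returns 0.
 */
static int
mul_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
    int64_t     wide = (int64_t) a * (int64_t) b;

    if (wide > INT32_MAX || wide < INT32_MIN)
        return 1;
    *result = (int32_t) wide;
    return 0;
}

/*
 * Upper bound, in bytes, on the data needed to reach the character just
 * before end_exclusive.  Returns -1 ("fetch everything") on overflow.
 */
static int32_t
slice_size_for(int32_t end_exclusive, int32_t max_char_bytes)
{
    int32_t     size;

    assert(end_exclusive >= 1 && max_char_bytes >= 1);
    if (mul_s32_overflow(end_exclusive - 1, max_char_bytes, &size))
        return -1;
    return size;
}
```

For example, a request for the first character only has E = 2, so with a 4-byte maximum character width this sketch asks for 4 bytes.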

>                 /* Now we can get the actual length of the slice in MB characters */
> -               slice_strlen = pg_mbstrlen_with_len(VARDATA_ANY(slice),
> -                                                   slice_len);
> +               slice_strlen =
> +                       (slice_size == -1 ?
> +                        pg_mbstrlen_with_len(VARDATA_ANY(slice), slice_len) :
> +                        pg_mbcharcliplen_chars(VARDATA_ANY(slice), slice_len, E - 1));
> 
> Comment presumably needs adjustment to say that we only count as far
> as we need to, and why.

Changed to:

        /*
         * Now we can get the actual length of the slice in MB characters,
         * stopping at the end of the substring.  Continuing beyond the
         * substring end could find an incomplete character attributable
         * solely to DatumGetTextPSlice() chopping in the middle of a
         * character, and it would be superfluous work at best.
         */
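
Editorial illustration of counting that stops at the end of the substring, as the comment above describes. This is a minimal standalone sketch; `lead_len` and `count_chars_clipped` are hypothetical names, not the real pg_mbcharcliplen_chars(), and continuation bytes are not validated.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length implied by a UTF-8 lead byte, or 0 if it is not a lead byte. */
static int
lead_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Count characters in buf[0..len), but stop once 'limit' characters have
 * been seen.  Returns the count, or -1 if an invalid or truncated
 * character is encountered before the limit is reached.
 */
static int
count_chars_clipped(const unsigned char *buf, size_t len, int limit)
{
    size_t      pos = 0;
    int         chars = 0;

    assert(buf != NULL && limit >= 0);
    while (pos < len && chars < limit)
    {
        int         n = lead_len(buf[pos]);

        if (n == 0 || (size_t) n > len - pos)
            return -1;
        pos += (size_t) n;
        chars++;
    }
    return chars;
}
```

With the slice e6 97 a5 e5 and a limit of 1, the count stops after the first character and never looks at the dangling e5; with a limit of 2 the truncated second character is reported.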

> There is something a bit strange about all this, though.
> pg_mbstrlen_with_len(..., -1) returns 0, so if you ask for characters
> that really exist past 2^29 (~500 million), you must get an empty
> string, right?   That's hard to reach, pre-existing and out of scope
> for the immediate problem report, except ... now we're contorting the
> code even further to keep it.

- slice_size is the amount we *requested* from the toaster.  It can be -1,
  which retrieves the max available.
- slice_len is the amount *returned* from the toaster.  It's nonnegative.

Does a behavior specific to strings >2^29 still exist?
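
Editorial illustration of the requested-versus-returned distinction above, as a toy model over a plain in-memory buffer. `fetch_slice` is a hypothetical function and does not reflect how PostgreSQL's detoasting is implemented; it only shows that a request of -1 means "everything available" while the returned length is always nonnegative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the slice starting at 'offset' and store its actual
 * length in *returned_len.  'requested' may be -1 to ask for everything
 * from 'offset' to the end; the returned length is clamped to what exists.
 */
static const char *
fetch_slice(const char *value, size_t value_len,
            int32_t offset, int32_t requested, size_t *returned_len)
{
    size_t      avail;

    assert(value != NULL && returned_len != NULL && offset >= 0);

    if ((size_t) offset >= value_len)
    {
        /* Nothing available at or past the end. */
        *returned_len = 0;
        return value + value_len;
    }

    avail = value_len - (size_t) offset;
    if (requested < 0 || (size_t) requested > avail)
        *returned_len = avail;
    else
        *returned_len = (size_t) requested;
    return value + offset;
}
```

A caller that then counts characters would pass the returned length, never the requested size, to the counting routine.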

> The outline I had come up with before seeing your patch was: let's
> just delete it.  The position search can check bounds incrementally,
> following our general approach.  This avoids the reported problem by
> ditching the pre-flight scan through the slice (up to 4x more
> pg_mblen_XXX calls and memory access than we strictly need), and also
> the special cases for empty strings since they already fall out of the
> general behaviour (am I missing something?), not leaving much code
> behind.

Like you, I made a note that it's wasteful to make two mblen passes over the
string.  I'm only seeing a 50% reduction in mblen calls, not an 80% reduction,
but I didn't look too closely.  I guessed such a change would be less clearly
correct, so I figured it would be less suitable for back branches.  Hence, I
didn't draft it.

> As far as I can see so far, the only user-visible side-effect
> requires corruption: substring() moves from the
> internal-NUL-is-terminator category to internal-NUL-is-character
> category, but that's an implementation detail.

That does carry some risk, not necessarily too much to accept.

> When I saw your patch yesterday, I initially abandoned the thought,
> thinking that your idea looked more conservative, but after sleeping
> on it and reflecting again on these oddities, I have merged my draft
> implementation with your tests, ancient detoasting fence post
> observation and commit message, just to see if you think this approach
> might be worth considering further.

My first impression, hurried due to the commit ETA in 30 minutes, is that this
is less conservative and should hold for master-only.

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Thomas Munro

Date:

February 14, 23:10:00

On Sun, Feb 15, 2026 at 8:33 AM Noah Misch <noah@leadboat.com> wrote:
> - slice_len is the amount *returned* from the toaster.  It's nonnegative.

Ah, right, that makes more sense.

> > The outline I had come up with before seeing your patch was: let's
> > just delete it.  The position search can check bounds incrementally,
> > following our general approach.  This avoids the reported problem by
> > ditching the pre-flight scan through the slice (up to 4x more
> > pg_mblen_XXX calls and memory access than we strictly need), and also
> > the special cases for empty strings since they already fall out of the
> > general behaviour (am I missing something?), not leaving much code
> > behind.
>
> Like you, I made a note that it's wasteful to make two mblen passes over the
> string.  I'm only seeing a 50% reduction in mblen calls, not an 80% reduction,
> but I didn't look too closely.  I guessed such a change would be less clearly
> correct, so I figured it would be less suitable for back branches.  Hence, I
> didn't draft it.

I was comparing to unpatched master, but yeah of course your patch
already gets part of the way there.

> My first impression, hurried due to the commit ETA in 30 minutes, is that this
> is less conservative and should hold for master-only.

Got it.  Will add it to the pile of master-only fallout from this area.

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Thomas Munro

Date:

February 14, 23:19:04

Also, LGTM.

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 15, 01:46:21

On Sun, Feb 15, 2026 at 09:19:04AM +1300, Thomas Munro wrote:
> Also, LGTM.

Pushed as commit 9f4fd11

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Tom Lane

Date:

February 16, 21:21:45

Noah Misch <noah@leadboat.com> writes:
> Pushed as commit 9f4fd11

Various BF animals are complaining

varlena.c: In function 'text_substring':
varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
  int32  E;    /* end position, exclusive */
         ^

and I've also seen that locally depending on which gcc version and -O
level I'm using.  Could we silence that?


 bollworm      | 2026-02-16 09:32:03 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 camel         | 2026-02-16 09:32:30 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 chimaera      | 2026-02-16 11:40:29 | /home/debian/20-chimaera/buildroot/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/varlena.c:739:5: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 comma         | 2026-02-16 14:55:48 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 conchuela     | 2026-02-16 15:20:01 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 flea          | 2026-02-16 09:31:52 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 krait         | 2026-02-16 09:50:17 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 pipit         | 2026-02-15 02:36:02 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 urocryon      | 2026-02-15 01:45:07 | varlena.c:739:5: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 wireworm      | 2026-02-16 15:20:49 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
 ziege         | 2026-02-15 01:18:25 | varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]

            regards, tom lane
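
Editorial illustration of how this class of warning typically arises and one common way to quiet it, as a minimal standalone sketch. The function `end_or_minus_one` is hypothetical and is not the code in varlena.c; nothing here is meant to describe the change actually applied to PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 'end' is assigned in one branch and read in a later branch guarded by
 * the same condition.  The read is always safe, but some compiler versions
 * and optimization levels cannot prove that and emit
 * -Wmaybe-uninitialized.  Initializing at the declaration quiets the
 * warning without changing behavior.
 */
static int32_t
end_or_minus_one(int32_t start, int32_t length, int has_length)
{
    int32_t     end = 0;        /* initialized only to quiet the warning */

    /* Assumption for this sketch: small inputs, so the sum cannot overflow. */
    assert(start >= 1 && length >= 0);

    if (has_length)
        end = start + length;   /* exclusive end position */

    /* ... unrelated work could happen here ... */

    if (has_length)
        return end;
    return -1;
}
```

The trade-off of a dummy initializer is that it can also hide a genuine missed assignment from the compiler, so restructuring the control flow is sometimes preferred.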

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 17, 07:20:43

On Mon, Feb 16, 2026 at 01:21:45PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Pushed as commit 9f4fd11
> 
> Various BF animals are complaining
> 
> varlena.c: In function 'text_substring':
> varlena.c:590:9: warning: 'E' may be used uninitialized in this function [-Wmaybe-uninitialized]
>   int32  E;    /* end position, exclusive */
>          ^
> 
> and I've also seen that locally depending on which gcc version and -O
> level I'm using.  Could we silence that?

Thanks.  I've pushed commit 8cef93d.

Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16

From

Noah Misch

Date:

February 17, 21:15:18

On Fri, Feb 13, 2026 at 04:21:13PM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 02:48:04PM -0800, Noah Misch wrote:
> > Attached.  I may add some more tests, e.g. a toasted invalid string where the
> > detoasted length is less than the slice we request.
> 
> Tests already covered that in particular, but I added some other tests.

I cast a wider net for string function bugs specific to toasted strings and/or
multibyte text.  That yielded the attached.  This did reproduce $SUBJECT, but
it didn't find additional bugs.  The method is crude, but I'm archiving it
here so folks know the testing happened and in case someone pursues similar
testing in the future.

Attachments

ascii2utf8sql-v0.1.patch
