Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
| From | Noah Misch |
|---|---|
| Subject | Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 |
| Date | |
| Msg-id | 20260214002113.1f.noahmisch@microsoft.com |
| In response to | Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 (Noah Misch <noah@leadboat.com>) |
| Responses | Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16; Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 |
| List | pgsql-bugs |
On Fri, Feb 13, 2026 at 02:48:04PM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 09:27:02AM -0800, Noah Misch wrote:
> > On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > >   ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > > on valid UTF-8 text stored in a TOAST-compressed column.
> > >
> > > user=> select substring(data from 1 for 1) from toast_repro;
> > > ERROR: 22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> >
> > Thanks for the report. That is a bug and a regression; I regret missing it
> > during review. The substring operation works by taking a 4-byte slice from
> > the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> > then finding the actual first character within those bytes. However, it
> > incorrectly requires those four bytes to be a valid UTF8 string. I'll start
> > on a fix.
>
> Attached. I may add some more tests, e.g. a toasted invalid string where the
> detoasted length is less than the slice we request.

Tests already covered that in particular, but I added some other tests. I think
this is now ready. Review welcome. I have a Valgrind test run ongoing.
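
For readers following along, the failure mode described in the quoted analysis can be reproduced outside PostgreSQL. What follows is only an illustrative Python sketch, not the PostgreSQL C code or the attached patch. The function names (`utf8_char_len`, `first_char_strict`, `first_char_lenient`) and the sample string are hypothetical. It assumes the slice is always the first 4 bytes and that only the first character is wanted, as in `substring(data from 1 for 1)`.

```python
def utf8_char_len(lead_byte: int) -> int:
    """Return the sequence length (1-4) implied by a UTF-8 lead byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte & 0xE0 == 0xC0:
        return 2
    if lead_byte & 0xF0 == 0xE0:
        return 3
    if lead_byte & 0xF8 == 0xF0:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead_byte:#x}")


def first_char_strict(data: bytes) -> str:
    """Mimics the regression: require the whole 4-byte slice to be valid UTF-8."""
    chunk = data[:4]
    # Raises UnicodeDecodeError when the slice ends in the middle of a character.
    return chunk.decode("utf-8")[:1]


def first_char_lenient(data: bytes) -> str:
    """Validate only the first character found inside the 4-byte slice."""
    chunk = data[:4]
    if not chunk:
        return ""
    n = utf8_char_len(chunk[0])
    # Decoding still raises if the first character itself is malformed or cut short.
    return chunk[:n].decode("utf-8")


# Hypothetical sample: two ASCII bytes followed by a 3-byte character
# (0xe6 0x97 0xa5), so the 4-byte slice ends with the incomplete pair 0xe6 0x97.
data = "ab日".encode("utf-8")

try:
    first_char_strict(data)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

lenient_result = first_char_lenient(data)
```

With this sample, the strict variant rejects a slice whose trailing bytes merely belong to a character that continues past the slice boundary, while the lenient variant returns `a` and still rejects a string whose first character is itself invalid.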
Attachments