Re: Built-in CTYPE provider

From: Noah Misch
Subject: Re: Built-in CTYPE provider
Msg-id: 20240706195129.fd@rfd.leadboat.com
In reply to: Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Built-in CTYPE provider
           Re: Built-in CTYPE provider
List: pgsql-hackers

On Fri, Jul 05, 2024 at 02:38:45PM -0700, Jeff Davis wrote:
> On Thu, 2024-07-04 at 14:26 -0700, Noah Misch wrote:
> > I think you're saying that if some Unicode update changes the results of a
> > STABLE function but does not change the result of any IMMUTABLE function, we
> > may as well import that update.  Is that about right?  If so, I agree.
> 
> If you are proposing that Unicode updates should not be performed if
> they affect the results of any IMMUTABLE function, then that's a new
> policy.
> 
> For instance, the results of NORMALIZE() changed from PG15 to PG16 due
> to commit 1091b48cd7:
> 
>   SELECT NORMALIZE(U&'\+01E030',nfkc)::bytea;
> 
>   Version 15: \xf09e80b0
> 
>   Version 16: \xd0b0

As a released feature, NORMALIZE() has a different set of remedies to choose
from, and I'm not proposing one.  I may have sidetracked this thread by
talking about remedies without an agreement that pg_c_utf8 has a problem.  My
question for the PostgreSQL maintainers is this:

  textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
  IMMUTABLE, will change behavior in some major releases.  pg_upgrade does not
  have a concept of IMMUTABLE functions changing, so index scans will return
  wrong query results after upgrade.  Is it okay for v17 to release a
  pg_c_utf8 planned to behave that way when upgrading v17 to v18+?

If the answer is yes, the open item closes.  If the answer is no, determining
the remedy can come next.
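
For readers who want the mechanism in isolation, here is a minimal sketch in
Python rather than PostgreSQL code.  It is a toy model: a dict stands in for
an expression index, and the two lowercase tables are hypothetical stand-ins
for the case mappings before and after a Unicode update.  The U+A7DC to U+019B
mapping is an assumption made for illustration.  All function names are
invented for this sketch.

```python
# Toy model of an expression index whose "IMMUTABLE" function changes
# between releases.  Not PostgreSQL code; mapping tables are assumptions.

# Hypothetical lowercase tables for two Unicode versions.
LOWER_OLD = {}                      # U+A7DC unassigned: lower() leaves it alone
LOWER_NEW = {"\ua7dc": "\u019b"}    # assumed mapping after the Unicode update


def make_lower(table):
    """Return a lower() that applies the given per-character mapping."""
    def lower(s):
        return "".join(table.get(ch, ch) for ch in s)
    return lower


def build_index(rows, fn):
    """Build a stand-in expression index: fn(value) -> list of row ids."""
    index = {}
    for rowid, value in enumerate(rows):
        index.setdefault(fn(value), []).append(rowid)
    return index


def seq_scan(rows, fn, key):
    """Evaluate fn on every row, as a sequential scan would."""
    return [rowid for rowid, value in enumerate(rows) if fn(value) == key]


def index_scan(index, key):
    """Probe the prebuilt index without re-evaluating fn on the rows."""
    return index.get(key, [])


rows = ["\ua7dc", "\u1dd3"]
lower_old = make_lower(LOWER_OLD)
lower_new = make_lower(LOWER_NEW)

# Index built before the upgrade, then carried over unchanged.
stale_index = build_index(rows, lower_old)

# After the upgrade, the query computes its search key with the new function.
key = lower_new("\ua7dc")
seq_result = seq_scan(rows, lower_new, key)
idx_result = index_scan(stale_index, key)
print(seq_result, idx_result)
```

The sequential scan recomputes the function on each row and finds the match.
The index probe looks up the new key among entries computed with the old
function and finds nothing.  That is the same shape of disagreement as a seq
scan and an index scan returning different row counts for one query.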


In case concrete details help anyone reading, here are some affected objects:

  CREATE TABLE t (s text COLLATE pg_c_utf8);
  INSERT INTO t VALUES (U&'\+00a7dc'), (U&'\+001dd3');
  CREATE INDEX iexpr ON t ((lower(s)));
  CREATE INDEX ipred ON t (s) WHERE s ~ '[[:alpha:]]';

v17 can simulate the Unicode aspect of a v18 upgrade, like this:

  sed -i 's/^UNICODE_VERSION.*/UNICODE_VERSION = 16.0.0/' src/Makefile.global.in
  # ignore test failures (your ICU likely doesn't have the Unicode 16.0.0 draft)
  make -C src/common/unicode update-unicode
  make
  make install
  pg_ctl restart

Behavior after that:

  -- 2 rows w/ seq scan, 0 rows w/ index scan
  SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';
  SET enable_seqscan = off;
  SELECT 1 FROM t WHERE s ~ '[[:alpha:]]';

  -- bt_index_parent_check() is provided by the amcheck extension
  CREATE EXTENSION IF NOT EXISTS amcheck;
  -- ERROR:  heap tuple (0,1) from table "t" lacks matching index tuple within index "iexpr"
  SELECT bt_index_parent_check('iexpr', heapallindexed => true);
  -- ERROR:  heap tuple (0,1) from table "t" lacks matching index tuple within index "ipred"
  SELECT bt_index_parent_check('ipred', heapallindexed => true);


