Discussion: BUG #17277: write past chunk when calling normalize() on an empty string

BUG #17277: write past chunk when calling normalize() on an empty string

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17277
Logged by:          Matthijs van der Vleuten
Email address:      postgresql@zr40.nl
PostgreSQL version: 14.0
Operating system:   Debian sid
Description:

When calling normalize(''), that is, on an empty string, a warning is
raised: "problem in alloc set ExprContext: detected write past chunk end".

I believe this is due to an error in unicode_norm.c. In unicode_normalize(),
when recompose is true (that is, when using NFC or NFKC normalization) the
loop on line 498 will iterate once before checking count < decomp_size. When
the input is an empty string, this would cause a write outside of the memory
allocated for recomp_chars.

Reproduction:
zr40@[local]:5432 ~=# select version();
                                                     version
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 14.0 (Debian 14.0-1.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.3.0-11) 10.3.0, 64-bit
(1 row)

zr40@[local]:5432 ~=# select normalize('');
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x55793d119620, chunk 0x55793d1196a8
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x55793d119620, chunk 0x55793d1196a8
 normalize 
───────────
 
(1 row)


Re: BUG #17277: write past chunk when calling normalize() on an empty string

From
Michael Paquier
Date:
On Tue, Nov 09, 2021 at 09:55:08PM +0000, PG Bug reporting form wrote:
> When calling normalize(''), that is, on an empty string, a warning is
> raised: "problem in alloc set ExprContext: detected write past chunk end".

Well, direct callers of unicode_normalize_kc() in ~12 would have the
same problem.  As far as I recall after looking at the git history
(60f11b8), this code was not written with this case in mind, because
pg_saslprep() does not allow empty passwords.

> I believe this is due to an error in unicode_norm.c. In unicode_normalize(),
> when recompose is true (that is, when using NFC or NFKC normalization) the
> loop on line 498 will iterate once before checking count < decomp_size. When
> the input is an empty string, this would cause a write outside of the memory
> allocated for recomp_chars.

No, the code does not take the recomposition loop in this case, but
the initialization of target_pos to 1 would cause recomp_chars to be
written past its allocation position by one byte.
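
The index arithmetic behind this can be sketched in isolation. This is only an illustration, not the code in unicode_norm.c: recomp_slots() and terminator_index() are hypothetical helpers, and the sketch assumes the recomposition buffer is sized for decomp_size code points plus one terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the recomposition buffer has room for decomp_size code
 * points plus one terminator. */
static size_t recomp_slots(size_t decomp_size)
{
    return decomp_size + 1;
}

/* Index at which the terminator ends up when no pair of code points is
 * recomposed, following target_pos through the loop. */
static size_t terminator_index(size_t decomp_size)
{
    size_t target_pos = 1;  /* slot 0 is filled before the loop */
    size_t count;

    for (count = 1; count < decomp_size; count++)
        target_pos++;       /* each remaining code point is copied as-is */

    return target_pos;
}

/* Non-empty inputs always leave room for the terminator. */
static void check_nonempty_inputs_fit(void)
{
    size_t n;

    for (n = 1; n <= 8; n++)
        assert(terminator_index(n) < recomp_slots(n));
}
```

Under these assumptions, terminator_index(0) is 1 while recomp_slots(0) is also 1, so for an empty input the terminator lands one element past the end of the buffer even though the loop body never runs.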

As there could be callers of unicode_normalize[_kc]() outside core,
I'd rather fix that at the source and patch unicode_norm.c.  One way
to do that would be to leave once you know that there is nothing to
decompose after the loop over decompose_code() and return decomp_chars
that would be set with an empty set of points, as per the attached.
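
A minimal sketch of that early-exit idea follows. It is not the attached patch: toy_wchar and toy_normalize() are hypothetical names, decomposition is replaced by a plain copy, and recomposition never merges anything, so only the buffer handling is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t toy_wchar;     /* hypothetical stand-in for pg_wchar */

/* Returns a newly allocated, zero-terminated copy of input, or NULL on
 * allocation failure.  The caller frees the result. */
static toy_wchar *toy_normalize(const toy_wchar *input)
{
    size_t      decomp_size = 0;
    size_t      i;
    size_t      target_pos;
    toy_wchar  *decomp_chars;
    toy_wchar  *recomp_chars;

    assert(input != NULL);

    /* "Decomposition" is the identity here, so the size is the length. */
    while (input[decomp_size] != 0)
        decomp_size++;

    decomp_chars = malloc((decomp_size + 1) * sizeof(toy_wchar));
    if (decomp_chars == NULL)
        return NULL;
    for (i = 0; i < decomp_size; i++)
        decomp_chars[i] = input[i];
    decomp_chars[decomp_size] = 0;

    /* Nothing to recompose: return the empty, terminated buffer as-is. */
    if (decomp_size == 0)
        return decomp_chars;

    recomp_chars = malloc((decomp_size + 1) * sizeof(toy_wchar));
    if (recomp_chars == NULL)
    {
        free(decomp_chars);
        return NULL;
    }

    /* Same shape as the recomposition phase: slot 0 first, then the rest. */
    recomp_chars[0] = decomp_chars[0];
    target_pos = 1;
    for (i = 1; i < decomp_size; i++)
        recomp_chars[target_pos++] = decomp_chars[i];
    recomp_chars[target_pos] = 0;   /* in bounds because decomp_size >= 1 */

    free(decomp_chars);
    return recomp_chars;
}
```

With the guard in place, an empty input never reaches the code that starts writing at target_pos = 1, so callers outside core get the same protection as normalize().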

There may be a point in issuing an error if there is an empty string,
though.  Another thing would be to consider if is_normalized() should
return false for an empty string, but we have considered empty strings
as normalized since this has been released:
=# SELECT '' IS NFD NORMALIZED;
 is_normalized
---------------
  t
(1 row)

That feels more natural this way.  Still, I can see some perl modules
that would return false for such a case, by the way.  The
normalization docs don't seem to mention that directly, except for the
stream-safe text format:
https://www.unicode.org/faq/normalization.html
https://unicode.org/reports/tr15/tr15-51.html
--
Michael

Attachments

Re: BUG #17277: write past chunk when calling normalize() on an empty string

From
Michael Paquier
Date:
On Wed, Nov 10, 2021 at 03:33:29PM +0900, Michael Paquier wrote:
> That feels more natural this way.  Still, I can see some perl modules
> that would return false for such a case, by the way.  The
> normalization docs don't seem to mention that directly, except for the
> stream-safe text format:
> https://www.unicode.org/faq/normalization.html
> https://unicode.org/reports/tr15/tr15-51.html

I have expanded the tests, and fixed this one as of 098c1345.  Thanks
for the report, Matthijs!
--
Michael

Attachments