Re: [PATCH] json_lex_string: don't overread on bad UTF8

From: Michael Paquier
Subject: Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date:
Msg-id: ZjL5Ed6LDZGDGILj@paquier.xyz
In reply to: Re: [PATCH] json_lex_string: don't overread on bad UTF8  (Jacob Champion <jacob.champion@enterprisedb.com>)
Responses: Re: [PATCH] json_lex_string: don't overread on bad UTF8
List: pgsql-hackers

On Wed, May 01, 2024 at 04:22:24PM -0700, Jacob Champion wrote:
> On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier <michael@paquier.xyz> wrote:
>> I'm not sure I like the fact that this advances token_terminator
>> first.  Wouldn't it be better to calculate pg_encoding_mblen() first,
>> then save token_terminator?  I feel a bit uneasy about saving a value
>> in token_terminator past the end of the string.  That's a nit in this
>> context, still..
>
> v2 tries it that way; see what you think. Is the concern that someone
> might add code later that escapes that macro early?

Yeah, I am not sure that's something that would really happen, but
it looks like good practice anyway, so as to keep a clean stack at
all times.
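
To illustrate the ordering discussed above, here is a minimal standalone
sketch, not the patch itself.  sketch_utf8_mblen() is a hypothetical
stand-in for pg_encoding_mblen() that only handles UTF-8, and
sketch_char_end() is a hypothetical helper that plays the role of the
value saved in token_terminator.  The character length is computed
first and clamped to the end of the input, so the returned pointer
never lands past the end of the string:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_encoding_mblen(), UTF-8 only: return the
 * sequence length implied by the lead byte, without reading any other byte.
 */
static int
sketch_utf8_mblen(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte, treat as one byte */
}

/*
 * Hypothetical helper: return the end of the character starting at s,
 * clamped so that it never points past "end".  Requires s < end.
 */
static const char *
sketch_char_end(const char *s, const char *end)
{
    size_t      remaining = (size_t) (end - s);
    size_t      len = (size_t) sketch_utf8_mblen((unsigned char) *s);

    /* Compute and clamp first; only then hand the pointer back. */
    if (len > remaining)
        len = remaining;
    return s + len;
}
```

A caller would assign the result to its terminator only after the
clamp, so an early exit between the two steps cannot leave an
out-of-bounds pointer behind.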

>> Ah, that makes sense.  That looks OK here.  A comment around the test
>> would be appropriate to document that, I guess.
>
> Done.

That seems OK at a quick glance.  I don't have much room to do anything
about this patch this week because of Golden Week and the
buildfarm effect, but I should be able to get to it next week once the
next round of minor releases is tagged.

About the fact that we may end up printing incomplete UTF-8
sequences, I'd be curious to hear your thoughts.  Still, the information
provided by the partial byte sequences can also be useful for
debugging, on top of having the error code, no?
--
Michael
