Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.

From: Spyridon Dimitrios Agathos
Subject: Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.
Date:
Msg-id: CAFM5RaqekstfZ9OTOw2Kn5LWs_T9WnC2FfEnB55tzDdF7EGPEA@mail.gmail.com
In reply to: Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.  (Nikolay Shaplov <dhyan@nataraj.su>)
Responses: Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi all,

this is to verify that the patch proposed here:


fixes the issue. I applied the patch and:
1) The build type doesn't affect the query result
2) Valgrind doesn't complain about out-of-bounds reads
3) The output of the "faulty" insertion is shown in "\ooo" format.

Looking forward to the next steps.

--
Spiros
(ServiceNow)


On Thu, 14 Jul 2022 at 2:08 PM, Nikolay Shaplov <dhyan@nataraj.su> wrote:
In a message from Wednesday, 13 July 2022, 16:14:39 MSK, Aleksander Alekseev wrote:

Hi! Let me join the review process. Postgres data types are a field of expertise I
am interested in.

0. Though it looks like a genuine bug, I can't reproduce it, neither with
valgrind nor with ASan (the address sanitizer should catch out-of-bounds reads
too). I am running Debian Bullseye, and tried to build both PostgreSQL 14.4 and
current master.

Nevertheless, I would like to dig into this issue, starting with the parts that
are not covered by the patch but seem important to me.

1. The type name "char" (with quotes) is very, very confusing. It is described
in the documentation, but you need to be a Postgres expert or a careful
documentation reader to notice the important difference between "char" and char.
What is the history of the "char" type? Is it specified by some standard? Maybe
this is a good point to create a more understandable alias, like byte_char or
ascii_char, for practical use, and keep "char" for backward compatibility
only.

2. I totally agree with Tom Lane and Isaac Morland that the problem should
also be fixed on the type conversion side. There is a whole big thread about
it; I guess we should come to some conclusion there.

3. Fixing the out-of-bounds read for broken Unicode is also important, though
for now I am not quite sure it is possible.


> -                     p += pg_mblen(p);
> +             {
> +                     int t = pg_mblen(p);
> +                     p += t;
> +                     max_copy_bytes -= t;
> +             }

Minor issue: here I would change the variable name from "t" to "char_len" or
something similar, to make the code easier to understand.

Major issue: is the pg_mblen() function safe to call on broken encoding at the
end of the buffer? What if the last byte of the buffer is 0xF0 and you call
pg_mblen() on it?
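
For UTF-8 at least, the reported length appears to be derived from the leading
byte alone, so even if the call itself does not read past the buffer, the
length it returns can point past the end when the last character is truncated.
A minimal standalone sketch of that effect (utf8_seq_len is a hypothetical
stand-in for the leading-byte logic, not the server's pg_mblen):

#include <stdio.h>

/*
 * Hypothetical mimic of UTF-8 length detection: the length is computed from
 * the leading byte only, with no check that the continuation bytes actually
 * exist in the buffer.
 */
static int
utf8_seq_len(unsigned char leading)
{
	if (leading < 0x80)
		return 1;
	else if ((leading & 0xE0) == 0xC0)
		return 2;
	else if ((leading & 0xF0) == 0xE0)
		return 3;
	else if ((leading & 0xF8) == 0xF0)
		return 4;
	return 1;					/* invalid leading byte */
}

int
main(void)
{
	/* One byte left of '🀆' (0xF0 0x9F 0x80 0x86) after truncation. */
	unsigned char buf[1] = {0xF0};
	int		len = utf8_seq_len(buf[0]);

	/*
	 * len is 4, but only 1 byte belongs to the buffer: advancing the scan
	 * pointer by len, or copying len bytes, runs past the end.
	 */
	printf("reported length: %d, bytes available: 1\n", len);
	return 0;
}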


>+              copy_bytes = p - s;
>+              if(copy_bytes > max_copy_bytes)
>+                      copy_bytes = max_copy_bytes;

Here I would suggest adding a comment about the broken UTF encoding case. It
would explain how we might end up in a situation where we try to copy more
bytes than we have.

I would also suggest issuing a warning here. I guess a person using Postgres
would prefer to know that they managed to stuff a string with broken UTF
encoding into Postgres before it leads to some terrible consequences.
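
Something along these lines, perhaps (a rough sketch only, reusing
copy_bytes/max_copy_bytes from the quoted hunk; the exact errcode and message
wording are placeholders):

		copy_bytes = p - s;

		/*
		 * If the stored value ends with a truncated multibyte sequence,
		 * pg_mblen() reports the full character length and p may end up
		 * past the end of the buffer, so copy_bytes can exceed the bytes
		 * we actually have.  Clamp it to avoid reading out of bounds.
		 */
		if (copy_bytes > max_copy_bytes)
		{
			ereport(WARNING,
					(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
					 errmsg("string ends with an incomplete multibyte character")));
			copy_bytes = max_copy_bytes;
		}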

> Hi Spyridon,
>
> > The column "single_byte_col" is supposed to store only 1 byte.
> > Nevertheless, the INSERT command implicitly casts the '🀆' text into
> > "char". This means that only the first byte of '🀆' ends up stored in the
> > column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected
> > since the pg_mblen('🀆') is indeed 4. Later at line 1050, the memcpy will
> > copy 4 bytes instead of 1, hence an out of bounds memory read happens for
> > pointer 's', which effectively copies random bytes.
> Many thanks for reporting this!
>
> > - OS: Ubuntu 20.04
> > - PSQL version 14.4
>
> I can confirm the bug exists in the `master` branch as well and
> doesn't depend on the platform.
>
> Although the bug is easy to fix for this particular case (see the
> patch) I'm not sure if this solution is general enough. E.g. is there
> something that generally prevents pg_mblen() from doing out of bound
> reading in cases similar to this one? Should we prevent such an INSERT
> from happening instead?
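
To make the overread in the quoted report concrete, here is a minimal
standalone sketch (plain C, not server code; the length 4 is hard-coded to
stand in for what pg_mblen() reports for a leading 0xF0 byte):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	/*
	 * One-byte allocation, like the stored "char" value: only the first
	 * byte of '🀆' (0xF0) survived the cast.
	 */
	char	   *stored = malloc(1);
	char		out[5] = {0};
	int			char_len = 4;	/* what pg_mblen() reports for 0xF0 */

	if (stored == NULL)
		return 1;
	stored[0] = (char) 0xF0;

	/*
	 * Out-of-bounds read: 4 bytes are copied from a 1-byte allocation,
	 * which is exactly what valgrind flags.
	 */
	memcpy(out, stored, char_len);

	printf("copied %d bytes, only 1 was valid\n", char_len);
	free(stored);
	return 0;
}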


--
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su
