Re: Bytea as C string in pg_convert?

From: Andrew Dunstan
Subject: Re: Bytea as C string in pg_convert?
Message-ID: 46F7A467.8040009@dunslane.net
In reply to: Bytea as C string in pg_convert?  ("Brendan Jurd" <direvus@gmail.com>)
Responses: Re: Bytea as C string in pg_convert?  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Brendan Jurd wrote:
> Hi hackers,
>
> In the process of trying to unify the various text/cstring conversions
> in the backend, I came across some stuff that seemed weird in
> pg_convert().
>
> From src/backend/utils/mb/mbutils.c:345:
>
> Datum
> pg_convert(PG_FUNCTION_ARGS)
> {
>         bytea      *string = PG_GETARG_TEXT_P(0);
>
> Is this a typo?  Seems this should be PG_GETARG_BYTEA_P.
>
> Moving on from that, the function takes the bytea argument and
> converts it into a C string (using exactly the same technique as
> textout, which is why I noticed it).
>
> The documentation is very clear that bytea values "specifically allow
> storing octets of value zero and other "non-printable" octets".  That
> being the case, is it sane to convert a bytea to a cstring at all?
> The possibility of having valid nulls in the value renders the whole
> point of a null-terminated character array ... well, null.
>
> Before putting it into a cstring, the function does put the bytea
> value through pg_verify_mbstr(), so basically the issue goes away if
> we accept the premise that we will never allow a character encoding
> where the null byte is valid.  However, if we reject that premise
> there's a problem.
>
> pg_convert() does pass the length of the bytea along to
> pg_do_encoding_conversion(), so either
>
>  a) the encoding functions properly respect length and ignore nulls in
> the string, in which case the null at the end is worthless and we may
> as well just operate on the VARDATA of the bytea, or
>  b) the encoding functions treat a null byte as the end of the string,
> in which case they are broken w.r.t. bytea input.
>

Please read the recent discussions about encoding issues. convert() now 
returns a bytea precisely because we cannot be sure that the data 
returned will be valid in the database encoding. The behaviour here is 
entirely intentional. We have just closed every hole we are aware of 
whereby data that isn't valid in the database encoding can enter the 
database. We're not about to reopen them.

And yes, we will not accept an encoding with a null byte. We don't even 
accept nulls in Unicode. If we do accept such an encoding then this 
would be among the least of our problems, I suspect.
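To illustrate why the "no null bytes in any accepted encoding" premise matters, here is a minimal standalone sketch. It is not PostgreSQL code, and the helper names `cstring_visible_len` and `has_embedded_nul` are hypothetical. It only shows that C string functions stop at the first zero byte, while a length-counted scan sees the whole buffer.

```c
#include <stddef.h>
#include <string.h>

/* Length as the C string functions see it: counting stops at the
 * first zero byte, no matter how many bytes were actually stored.
 * The buffer must contain at least one zero byte. */
static size_t cstring_visible_len(const char *buf)
{
    return strlen(buf);
}

/* Length-counted check: returns 1 if any of the first len bytes is
 * zero, 0 otherwise. This is the kind of scan a verification step
 * would need so that a later C-string conversion loses no data. */
static int has_embedded_nul(const unsigned char *data, size_t len)
{
    return memchr(data, 0, len) != NULL;
}
```

If a five-byte value holds a zero byte at offset 2, `cstring_visible_len` reports 2 while the stored length is 5. Once input containing a zero byte is rejected up front, the two views of the length agree, which is why the issue goes away under that premise.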

We can and possibly should change the GETARG call, but the varlena types 
are structurally equivalent, so it's not a mortal sin being committed here.
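The structural equivalence can be sketched as follows. In PostgreSQL's headers both `text` and `bytea` are typedefs of `struct varlena`, so the two GETARG macros differ only in the pointer type they cast to. The code below is a simplified standalone model, not the real definitions: `varlena_sketch`, `getarg_text`, `getarg_bytea`, `payload_len`, and the fixed 16-byte payload are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a variable-length value:
 * a 4-byte total length followed by the payload bytes. */
struct varlena_sketch {
    uint32_t total_len;   /* header size + payload size, in bytes */
    char     data[16];    /* fixed size only to keep the sketch simple */
};

/* Both "types" are the same struct under two names. */
typedef struct varlena_sketch text_sketch;
typedef struct varlena_sketch bytea_sketch;

#define SKETCH_HDRSZ ((uint32_t) sizeof(uint32_t))

/* Stand-ins for the argument-fetching macros: each is only a cast. */
static text_sketch *getarg_text(void *datum)
{
    return (text_sketch *) datum;
}

static bytea_sketch *getarg_bytea(void *datum)
{
    return (bytea_sketch *) datum;
}

/* Payload length, computed the same way for either name. */
static uint32_t payload_len(const struct varlena_sketch *v)
{
    return v->total_len - SKETCH_HDRSZ;
}
```

Fetching the same value through either accessor yields the same pointer and the same payload length, so using the "wrong" one is a readability problem rather than a correctness bug in this model.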

cheers

andrew

