Thread: [GENERAL] Support for \u0000?

[GENERAL] Support for \u0000?

From: Matthew Byrne
Date:
Are there any plans to support \u0000 in JSONB and, relatedly, UTF code point 0 in TEXT?  To the best of my knowledge \u0000 is valid in JSON and code point 0 is valid in UTF-8 but Postgres rejects both, which severely limits its usefulness in many cases.

I am currently working around the issue by using the JSON type, which allows \u0000 to be stored, but this is far from ideal because it can't be cast to TEXT or JSONB and can't even be accessed:

mydb=# select '{"thing":"\u0000"}'::json->>'thing';
ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.
CONTEXT:  JSON data, line 1: {"thing":...

Regards,

Matt

Re: [GENERAL] Support for \u0000?

From: Tom Lane
Date:
Matthew Byrne <mjw.byrne@gmail.com> writes:
> Are there any plans to support \u0000 in JSONB and, relatedly, UTF code
> point 0 in TEXT?

No.  It's basically never going to happen because of the widespread use
of C strings (nul-terminated strings) inside the backend.  Making \0 a
legal member of strings would break all those internal APIs, requiring
touching far more code than anyone would want to do.  It'd likely break
a great deal of client-side code as well.

            regards, tom lane


Re: [GENERAL] Support for \u0000?

From: Matthew Byrne
Date:
Thanks for the response, Tom.  I understand this would be a mammoth task.

Would a more feasible approach be to introduce new types (say, TEXT2 and JSONB2 - or something better-sounding) which are the same as the old ones but add support for \u0000 and UTF 0?  This would isolate nul-containing byte arrays to the implementations of those types and keep backward compatibility by leaving TEXT and JSONB alone.

Matt

On Wed, Jul 19, 2017 at 7:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matthew Byrne <mjw.byrne@gmail.com> writes:
> > Are there any plans to support \u0000 in JSONB and, relatedly, UTF code
> > point 0 in TEXT?
>
> No.  It's basically never going to happen because of the widespread use
> of C strings (nul-terminated strings) inside the backend.  Making \0 a
> legal member of strings would break all those internal APIs, requiring
> touching far more code than anyone would want to do.  It'd likely break
> a great deal of client-side code as well.
>
>                         regards, tom lane

Re: [GENERAL] Support for \u0000?

From: Tom Lane
Date:
Matthew Byrne <mjw.byrne@gmail.com> writes:
> Would a more feasible approach be to introduce new types (say, TEXT2 and
> JSONB2 - or something better-sounding) which are the same as the old ones
> but add support for \u0000 and UTF 0?  This would isolate nul-containing
> byte arrays to the implementations of those types and keep backward
> compatibility by leaving TEXT and JSONB alone.

The problem is not inside those datatypes; either text or jsonb could
trivially store \0 bytes.  The problem is passing such values through
APIs that don't support it.  Changing those APIs would affect *all*
datatypes.

            regards, tom lane


Re: [GENERAL] Support for \u0000?

From: Matthew Byrne
Date:
I see.  Thanks for the quick responses!

On Wed, Jul 19, 2017 at 11:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matthew Byrne <mjw.byrne@gmail.com> writes:
> > Would a more feasible approach be to introduce new types (say, TEXT2 and
> > JSONB2 - or something better-sounding) which are the same as the old ones
> > but add support for \u0000 and UTF 0?  This would isolate nul-containing
> > byte arrays to the implementations of those types and keep backward
> > compatibility by leaving TEXT and JSONB alone.
>
> The problem is not inside those datatypes; either text or jsonb could
> trivially store \0 bytes.  The problem is passing such values through
> APIs that don't support it.  Changing those APIs would affect *all*
> datatypes.
>
>                         regards, tom lane