Re: WIP Patch: Add a function that returns binary JSONB as a bytea

Поиск
Список
Период
Сортировка
От Christian Ohler
Тема Re: WIP Patch: Add a function that returns binary JSONB as a bytea
Дата
Msg-id CAOsiKEKAJn0TEVg=_nCOw=vdCDRyMrJGBrPKs4uZuerczj2KfA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP Patch: Add a function that returns binary JSONB as a bytea  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: WIP Patch: Add a function that returns binary JSONB as a bytea  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Oct 31, 2018 at 7:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If we're going to expose the
internal format, let's just change the definition of the type's binary
I/O format, thereby getting a win for purposes like COPY BINARY as well.
We'd need to ensure that jsonb_recv could tell whether it was seeing the
old or new format, but at worst that'd require prepending a header of
some sort.  (In practice, I suspect we'd end up with a wire-format
definition that isn't exactly the bits-on-disk, but something easily
convertible to/from that and more easily verifiable by jsonb_recv.
Numeric subfields, for instance, ought to match the numeric wire
format, which IIRC isn't exactly the bits-on-disk either.)

How would this work from the driver's and application's perspective?  What does the driver do when receiving JSONB data?

Applications currently receive JSON strings when reading JSONB data, and the driver presumably has to stay compatible with that.  Does that mean the driver transparently converts JSONB to JSON before handing it to the application?  That scales better than doing it in Postgres itself, but is still the kind of inefficiency we're trying to avoid.

We want to convert JSONB directly to language-native objects.  This shouldn't be the responsibility of the Postgres driver, since the conversion is complex and can be done in different ways, such as instantiating objects from a class hierarchy vs instantiating generic containers, or eager vs lazy conversion.  Applications that are sensitive to JSONB performance likely want control over these aspects.  Postgres drivers aren't coupled to specific JSON parsers; they shouldn't be coupled to specific JSONB parsers either.

So, AFAICS, when the application requests JSONB data, the driver has to hand it the raw JSONB bytes.  But that's incompatible with what currently happens.  To preserve compatibility, does the application have to opt in by setting some out-of-band per-query per-result-column flags to tell the driver how it wants the JSONB data returned?  That's workable in principle but bloats every driver's API with some rarely-used performance feature.  Seems much simpler to put this into the query.

The idea behind the proposal is to improve efficiency by avoiding conversions, and the most straightforward way to do that is for every layer to pass through the raw bytes.  With an explicit conversion to BYTEA in the query, this is automatic without any changes to drivers, since every layer already knows to leave BYTEAs untouched.

I don't have an argument against _also_ adding a binary format version 2 for JSONB once we define a portable JSONB format; but I am not sure it alone solves the problem we have.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: partitioned indexes and tablespaces
Следующее
От: Corey Huinker
Дата:
Сообщение: Re: COPY FROM WHEN condition