Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From	Terry Laurenzo
Subject	Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date
Msg-id	AANLkTi=Lc2wOp1XMguj_OMQoWYTW5V507gbtirZur-xe@mail.gmail.com
In reply to	Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List	pgsql-hackers
On Tue, Oct 19, 2010 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 19, 2010 at 3:40 PM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
> On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think we should take a few steps back and ask why we think that
>> binary encoding is the way to go.  We store XML as text, for example,
>> and I can't remember any complaints about that on -bugs or
>> -performance, so why do we think JSON will be different?  Binary
>> encoding is a trade-off.  A well-designed binary encoding should make
>> it quicker to extract a small chunk of a large JSON object and return
>> it; however, it will also make it slower to return the whole object
>> (because you're adding serialization overhead).  I haven't seen any
>> analysis of which of those use cases is more important and why.
>
> Speculation: the overhead involved with retrieving/sending and
> receiving/storing JSON (not to mention TOAST
> compression/decompression) will be far greater than that of
> serializing/unserializing.

I speculate that your speculation is incorrect.  AIUI, we, unlike
$COMPETITOR, tend to be CPU-bound rather than IO-bound on COPY.  But
perhaps less speculation and more benchmarking is in order.


After spending a week in the morass of this, I have to say that I am less certain than I was on any front regarding the text/binary distinction.  I'll take some time and benchmark different cases.  My hypothesis is that a well-implemented binary structure and its conversions will add minimal overhead in the IO + Validate case, which would be the typical in/out flow.  It could be substantially faster for binary send/receive because the validation step could be eliminated or reduced.  Further, storing as binary reduces the overhead of random access to the data by database functions.

I'm envisioning staging this up as follows:
   1. Create a "jsontext" type.  jsontext uses text as its internal representation.  in/out functions are essentially a straight copy or a copy + validate.
   2. Create a "jsonbinary" type.  This uses an optimized binary format for internal rep and send/receive.  in/out is a parse/transcode operation to standard JSON text.
   3. Internal data access functions and JSON Path require a jsonbinary.
   4. There are implicit casts to/from jsontext and jsonbinary.

I've got a grammar in mind for the binary structure that I'll share later when I've got some more time.  It's inspired by $COMPETITOR's format but a little more sane, using type tags that implicitly define the size of the operands, simplifying parsing.

I'll then define the various use cases and benchmark using the different types.  Examples include IO No Validate, IO + Validate, Store and Index, Internal Processing, Internal Composition, etc.

The answer may be to have both a jsontext and jsonbinary type as each will be optimized for a different case.

Make sense?  It may be a week before I get through this.
Terry
