Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Дата
Msg-id AANLkTimV=t-xZrXs+cGK=c7QGq3XoYo53CKT7jnJB_m2@mail.gmail.com
обсуждение исходный текст
Ответ на Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Terry Laurenzo <tj@laurenzo.org>)
Ответы Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)  (Terry Laurenzo <tj@laurenzo.org>)
Список pgsql-hackers
On Sun, Oct 24, 2010 at 2:21 AM, Terry Laurenzo <tj@laurenzo.org> wrote:
>> It doesn't do particularly well on my previous example of [1,2,3].  It
>> comes out slightly shorter on ["a","b","c"] and better if the strings
>> need any escaping.  I don't think the float4 and float8 formats are
>> very useful; how could you be sure that the output was going to look
>> the same as the input?  Or alternatively that dumping a particular
>> object to text and reloading it will produce the same internal
>> representation?  I think it would be simpler to represent integers
>> using a string of digits; that way you can be assured of going from
>> text -> binary -> text without change.
>>
>> Perhaps it would be enough to define the high two bits as follows: 00
>> = array, 01 = object, 10 = string, 11 = number/true/false/null.  The
>> next 2 bits specify how the length is stored.  00 = remaining 4 bits
>> store a length of up to 15 bytes, 01 = remaining 4 bits + 1 additional
>> byte store a 12-bit length of up to 4K, 10 = remaining 4 bits + 2
>> additional bytes store a 20-bit length of up to 1MB, 11 = 4 additional
>> bytes store a full 32-bit length word.  Then, the array, object, and
>> string representations can work as you've specified them.  Anything
>> else can be represented by itself, or perhaps we should say that
>> numbers represent themselves and true/false/null are represented by a
>> 1-byte sequence, t/f/n (or perhaps we could define 111100{00,01,10} to
>> mean those values, since there's no obvious reason for the low bits to
>> be non-zero if a 4-bit length word ostensibly follows).
>>
>> So [1,2,3] = 06 C1 '1' C1 '2' C1 '3' and ["a","b","c"] = 06 81 'a' 81 'b'
>> 81 'c'
>>
>> (I am still worried about the serialization/deserialization overhead
>> but that's a different issue.)
>>
>
> Thanks.  I'll play around with the bit and numeric encodings you've
> recommended.  Arrays are certainly the toughest to economize on as a text
> encoding has minimum of 2 + n - 1 overhead bytes.  Text encoding for objects
> has some more wiggle room with 2 + (n-1) + (n*3) extra bytes.  I was
> admittedly thinking about more complicated objects.
> I'm still worried about transcoding overhead as well.  If comparing to a
> simple blind storage of JSON text with no validation or normalization, there
> is obviously no way to beat a straight copy.  However, its not outside the
> realm of reason to think that it may be possible to match or beat the clock
> if comparing against text to text normalization, or perhaps adding slight
> overhead to a validator.  The advantage to the binary structure is the
> ability to scan it hierarchically in sibling order and provide mutation
> operations with simple memcpy's as opposed to parse_to_ast -> modify_ast ->
> serialize_ast.

Yeah, my concern is not whether the overhead will be zero; it's
whether it will be small, yet allow large gains on other operations.
Like, how much slower will it be to pull out a moderately complex 1MB
JSON blob (not just a big string) out of a single-row, single-column
table?  If it's 5% slower, that's probably OK, since this is a
reasonable approximation of a worst-case scenario.  If it's 50%
slower, that sounds painful.  It would also be worth testing with a
much smaller size, such as a 1K object with lots of internal
structure.  In both cases, all data cached in shared_buffers, etc.

Then on the flip side how do we do on val[37]["whatever"]?  You'd like
to hope that this will be significantly faster than the text encoding
on both large and small objects.  If it's not, there's probably not
much point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: ask for review of MERGE
Следующее
От: Robert Haas
Дата:
Сообщение: Re: knngist - 0.8