Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
От | Robert Haas |
---|---|
Тема | Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP) |
Дата | |
Msg-id | AANLkTimV=t-xZrXs+cGK=c7QGq3XoYo53CKT7jnJB_m2@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP) (Terry Laurenzo <tj@laurenzo.org>) |
Ответы |
Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
(Terry Laurenzo <tj@laurenzo.org>)
|
Список | pgsql-hackers |
On Sun, Oct 24, 2010 at 2:21 AM, Terry Laurenzo <tj@laurenzo.org> wrote: >> It doesn't do particularly well on my previous example of [1,2,3]. It >> comes out slightly shorter on ["a","b","c"] and better if the strings >> need any escaping. I don't think the float4 and float8 formats are >> very useful; how could you be sure that the output was going to look >> the same as the input? Or alternatively that dumping a particular >> object to text and reloading it will produce the same internal >> representation? I think it would be simpler to represent integers >> using a string of digits; that way you can be assured of going from >> text -> binary -> text without change. >> >> Perhaps it would be enough to define the high two bits as follows: 00 >> = array, 01 = object, 10 = string, 11 = number/true/false/null. The >> next 2 bits specify how the length is stored. 00 = remaining 4 bits >> store a length of up to 15 bytes, 01 = remaining 4 bits + 1 additional >> byte store a 12-bit length of up to 4K, 10 = remaining 4 bits + 2 >> additional bytes store a 20-bit length of up to 1MB, 11 = 4 additional >> bytes store a full 32-bit length word. Then, the array, object, and >> string representations can work as you've specified them. Anything >> else can be represented by itself, or perhaps we should say that >> numbers represent themselves and true/false/null are represented by a >> 1-byte sequence, t/f/n (or perhaps we could define 111100{00,01,10} to >> mean those values, since there's no obvious reason for the low bits to >> be non-zero if a 4-bit length word ostensibly follows). >> >> So [1,2,3] = 06 C1 '1' C1 '2' C1 '3' and ["a","b","c"] = 06 81 'a' 81 'b' >> 81 'c' >> >> (I am still worried about the serialization/deserialization overhead >> but that's a different issue.) >> > > Thanks. I'll play around with the bit and numeric encodings you've > recommended. Arrays are certainly the toughest to economize on as a text > encoding has minimum of 2 + n - 1 overhead bytes. Text encoding for objects > has some more wiggle room with 2 + (n-1) + (n*3) extra bytes. I was > admittedly thinking about more complicated objects. > I'm still worried about transcoding overhead as well. If comparing to a > simple blind storage of JSON text with no validation or normalization, there > is obviously no way to beat a straight copy. However, its not outside the > realm of reason to think that it may be possible to match or beat the clock > if comparing against text to text normalization, or perhaps adding slight > overhead to a validator. The advantage to the binary structure is the > ability to scan it hierarchically in sibling order and provide mutation > operations with simple memcpy's as opposed to parse_to_ast -> modify_ast -> > serialize_ast. Yeah, my concern is not whether the overhead will be zero; it's whether it will be small, yet allow large gains on other operations. Like, how much slower will it be to pull out a moderately complex 1MB JSON blob (not just a big string) out of a single-row, single-column table? If it's 5% slower, that's probably OK, since this is a reasonable approximation of a worst-case scenario. If it's 50% slower, that sounds painful. It would also be worth testing with a much smaller size, such as a 1K object with lots of internal structure. In both cases, all data cached in shared_buffers, etc. Then on the flip side how do we do on val[37]["whatever"]? You'd like to hope that this will be significantly faster than the text encoding on both large and small objects. If it's not, there's probably not much point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: