On Tue, Oct 19, 2010 at 11:22 AM, Terry Laurenzo <tj@laurenzo.org> wrote:
> Perhaps we should enumerate the attributes of what would make a good binary
> encoding?
Not sure if we're discussing the internal storage format or the binary
send/recv format, but in my humble opinion, some attributes of a good
internal format are:
1. Lightweight - it'd be really nice for the JSON datatype to be
available in core (even if extra features like JSONPath aren't).
2. Efficiency - Retrieval and storage of JSON datums should be
efficient. The internal format should probably closely resemble the
binary send/recv format, so that using the binary format is actually
worthwhile (little or no conversion needed on the way in or out).
A good attribute of the binary send/recv format would be
compatibility with existing binary JSON encodings. For instance, if
MongoDB (which I know very little about) has a binary send/receive
format, perhaps the JSON data type's binary send/recv should use it.
Efficient retrieval and update of values in a large JSON tree would be
cool, but would be rather complex, and IMHO, overkill. JSON's main
advantage is that it's sort of a least common denominator of the type
systems of many popular languages, making it easy to transfer
information between them. Having hierarchical key/value store support
would be pretty cool, but I don't think it's what PostgreSQL's JSON
data type should do.
On Tue, Oct 19, 2010 at 3:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we should take a few steps back and ask why we think that
> binary encoding is the way to go. We store XML as text, for example,
> and I can't remember any complaints about that on -bugs or
> -performance, so why do we think JSON will be different? Binary
> encoding is a trade-off. A well-designed binary encoding should make
> it quicker to extract a small chunk of a large JSON object and return
> it; however, it will also make it slower to return the whole object
> (because you're adding serialization overhead). I haven't seen any
> analysis of which of those use cases is more important and why.
Speculation: the overhead involved in retrieving/sending and
receiving/storing JSON (not to mention TOAST
compression/decompression) will be far greater than the cost of
serializing/deserializing it.
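To make the trade-off Robert describes concrete, here is a toy sketch. It assumes a hypothetical length-prefixed layout for a flat object with string keys and string values only; it is not any format proposed in this thread and not how PostgreSQL stores anything. Pulling out one value can skip over the others using their length prefixes, while returning the whole object means decoding every entry and re-serializing it as text. The names encode, get and to_text are made up for illustration, and the sketch says nothing about which cost dominates in practice; that would need real measurements.

```python
import json
import struct
from typing import Optional, Tuple

# Hypothetical toy layout (illustration only): each entry is
#   <u32 key length><key bytes><u32 value length><value bytes>
# for a flat object whose keys and values are all strings.


def encode(obj: dict) -> bytes:
    out = bytearray()
    for key, value in obj.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        out += struct.pack("<I", len(k)) + k
        out += struct.pack("<I", len(v)) + v
    return bytes(out)


def _read_chunk(buf: bytes, pos: int) -> Tuple[bytes, int]:
    # Read one length-prefixed chunk; return it and the position after it.
    (n,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


def get(buf: bytes, key: str) -> Optional[str]:
    # Extract a single value: compare keys, and jump over non-matching
    # values via their length prefix instead of decoding them.
    target = key.encode("utf-8")
    pos = 0
    while pos < len(buf):
        k, pos = _read_chunk(buf, pos)
        (vlen,) = struct.unpack_from("<I", buf, pos)
        if k == target:
            return buf[pos + 4:pos + 4 + vlen].decode("utf-8")
        pos += 4 + vlen
    return None


def to_text(buf: bytes) -> str:
    # Returning the whole object: every entry must be decoded and the
    # result re-serialized as JSON text (the extra serialization step).
    obj = {}
    pos = 0
    while pos < len(buf):
        k, pos = _read_chunk(buf, pos)
        v, pos = _read_chunk(buf, pos)
        obj[k.decode("utf-8")] = v.decode("utf-8")
    return json.dumps(obj)
```

For example, get(encode({"name": "widget", "color": "blue"}), "color") skips the "widget" bytes without decoding them, whereas to_text has to touch every byte. A real design would also need nesting, numbers and probably an offset table for faster-than-linear key lookup.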