Обсуждение: Add jsonb_compact(...) for whitespace-free jsonb to text

Поиск
Список
Период
Сортировка

Add jsonb_compact(...) for whitespace-free jsonb to text

От
Sehrope Sarkuni
Дата:
Hi,

The default text representation of jsonb adds whitespace in between key/value pairs (after the colon ":") and after successive properties (after the comma ","):

postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
       text       
------------------
 {"a": 1, "b": 2}
(1 row)

AFAIK, there's also no guarantee on the specific order of the resulting properties in the text representation either. I would suppose it's fixed for a given jsonb value within a database major version but across major versions it could change (if the underlying representation changes).

I originally ran into this when comparing the hashes of the text representation of jsonb columns. The results didn't match up because the javascript function JSON.stringify(...) does not add any extra whitespace.

It'd be nice to have a stable text representation of a jsonb value with minimal whitespace. The latter would also save a few bytes per record in text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).

Attached is a *very* work in progress patch that adds a jsonb_compact(jsonb)::text function. It generates a text representation without extra whitespace but does not yet try to enforce a stable order of the properties within a jsonb value.

Here's some sample usage:

postgres=# SELECT x::text, jsonb_compact(x) FROM (VALUES ('{}'::jsonb), ('{"a":1,"b":2}'), ('[1,2,3]'), ('{"a":{"b":1}}')) AS t(x);
        x         | jsonb_compact 
------------------+---------------
 {}               | {}
 {"a": 1, "b": 2} | {"a":1,"b":2}
 [1, 2, 3]        | [1,2,3]
 {"a": {"b": 1}}  | {"a":{"b":1}}
(4 rows)

There aren't any tests yet but I'd like to continue working on it for inclusion in 9.7. I'm posting it now to see if there's interest in the idea and get some feedback on the approach (this is my first real patch...).

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Вложения

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Andrew Dunstan
Дата:

On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
>
> AFAIK, there's also no guarantee on the specific order of the 
> resulting properties in the text representation either. I would 
> suppose it's fixed for a given jsonb value within a database major 
> version but across major versions it could change (if the underlying 
> representation changes).


The order is fixed and very unlikely to change, as it was chosen quite 
deliberately to help ensure efficient processing. Any change in on-disk 
representation of data types is something we work very hard to avoid, as 
it makes it impossible to run pg_upgrade.

It's true that the storage order of keys is not terribly intuitive.

Note that the json type, unlike jsonb, preserves exactly the white space 
and key order of its input. In fact, the input is exactly what it stores.

cheers

andrew




Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Stephen Frost
Дата:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
> >AFAIK, there's also no guarantee on the specific order of the
> >resulting properties in the text representation either. I would
> >suppose it's fixed for a given jsonb value within a database major
> >version but across major versions it could change (if the
> >underlying representation changes).
>
> The order is fixed and very unlikely to change, as it was chosen
> quite deliberately to help ensure efficient processing. Any change
> in on-disk representation of data types is something we work very
> hard to avoid, as it makes it impossible to run pg_upgrade.

We do, from time-to-time, change on-disk formats in a
backwards-compatible way though.  In any case, it's my understanding
that we don't *guarantee* any ordering currently and therefore we should
discourage users from depending on it.  If we *are* going to guarantee
ordering, then we should document what that ordering is.

Thanks!

Stephen

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Sehrope Sarkuni
Дата:
On Sun, Apr 24, 2016 at 9:27 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Andrew Dunstan (andrew@dunslane.net) wrote:
> On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
> >AFAIK, there's also no guarantee on the specific order of the
> >resulting properties in the text representation either. I would
> >suppose it's fixed for a given jsonb value within a database major
> >version but across major versions it could change (if the
> >underlying representation changes).
>
> The order is fixed and very unlikely to change, as it was chosen
> quite deliberately to help ensure efficient processing. Any change
> in on-disk representation of data types is something we work very
> hard to avoid, as it makes it impossible to run pg_upgrade.

We do, from time-to-time, change on-disk formats in a
backwards-compatible way though.  In any case, it's my understanding
that we don't *guarantee* any ordering currently and therefore we should
discourage users from depending on it.  If we *are* going to guarantee
ordering, then we should document what that ordering is.

Yes that's the idea, namely to have a fixed text format that will not change across releases. If the on-disk representation is already supports that then this could just be a doc change (assuming there's agreement that it's a good idea and said guarantee will be maintained).

Separately, I think the compact (i.e. whitespace free) output is useful on it's own. It adds up to two bytes per key/value pair (one after the colon and one after the comma) so the more keys you have the more the savings.

Here's a (contrived) example to show the size difference when serializing information_schema.columns. The row_to_json(...) function returns whitespace free output (as json, not jsonb) so it's a proxy for json_compact(..). It comes out to 7.5% smaller than the default jsonb text format:

app=> SELECT  
  MAX((SELECT COUNT(*) FROM json_object_keys(x))) AS num_keys,
  AVG(length(x::text)) AS json_text,
  AVG(length(x::jsonb::text)) AS jsonb_text,
  AVG(length(x::text)) / AVG(length(x::jsonb::text)) AS ratio
FROM (SELECT row_to_json(z.*) AS x
      FROM information_schema.columns z) t;
 num_keys |       json_text       |      jsonb_text       |         ratio          
----------+-----------------------+-----------------------+------------------------
       44 | 1077.0748522652659225 | 1164.0748522652659225 | 0.92526253803121012857
(1 row)

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Ryan Pedela
Дата:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Hi,

The default text representation of jsonb adds whitespace in between key/value pairs (after the colon ":") and after successive properties (after the comma ","):

postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
       text       
------------------
 {"a": 1, "b": 2}
(1 row)

AFAIK, there's also no guarantee on the specific order of the resulting properties in the text representation either. I would suppose it's fixed for a given jsonb value within a database major version but across major versions it could change (if the underlying representation changes).

I originally ran into this when comparing the hashes of the text representation of jsonb columns. The results didn't match up because the javascript function JSON.stringify(...) does not add any extra whitespace.

It'd be nice to have a stable text representation of a jsonb value with minimal whitespace. The latter would also save a few bytes per record in text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).

+1

I cannot comment on the patch itself, but I welcome jsonb_compact() or some way to get JSON with no inserted whitespace.

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Stephen Frost
Дата:
* Ryan Pedela (rpedela@datalanche.com) wrote:
> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > The default text representation of jsonb adds whitespace in between
> > key/value pairs (after the colon ":") and after successive properties
> > (after the comma ","):

[...]

> > It'd be nice to have a stable text representation of a jsonb value with
> > minimal whitespace. The latter would also save a few bytes per record in
> > text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>
> +1
>
> I cannot comment on the patch itself, but I welcome jsonb_compact() or some
> way to get JSON with no inserted whitespace.

As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace.  It's not like
representing numbers using ASCII characters is terribly efficient
either.

Compression might be another option, though that's certainly less
flexible and only (easily) used in combination with SSL, today.

Thanks!

Stephen

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Merlin Moncure
Дата:
On Sun, Apr 24, 2016 at 8:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Note that the json type, unlike jsonb, preserves exactly the white space and
> key order of its input. In fact, the input is exactly what it stores.

That is true, but the json serialization functions (to_json etc)
really out to have the same whitespace strategy is jsonb text out.  Of
the two ways it's currently done, the json serialization variant seems
better but a completely compact variant seems like a good idea basis
of efficiency.

merlin



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Dave Cramer
Дата:
On 26 April 2016 at 12:49, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > The default text representation of jsonb adds whitespace in between
> > key/value pairs (after the colon ":") and after successive properties
> > (after the comma ","):

[...]

> > It'd be nice to have a stable text representation of a jsonb value with
> > minimal whitespace. The latter would also save a few bytes per record in
> > text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>
> +1
>
> I cannot comment on the patch itself, but I welcome jsonb_compact() or some
> way to get JSON with no inserted whitespace.

As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace.  It's not like
representing numbers using ASCII characters is terribly efficient
either.


+1 

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Merlin Moncure
Дата:
On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
> * Ryan Pedela (rpedela@datalanche.com) wrote:
>> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
>> > The default text representation of jsonb adds whitespace in between
>> > key/value pairs (after the colon ":") and after successive properties
>> > (after the comma ","):
>
> [...]
>
>> > It'd be nice to have a stable text representation of a jsonb value with
>> > minimal whitespace. The latter would also save a few bytes per record in
>> > text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>>
>> +1
>>
>> I cannot comment on the patch itself, but I welcome jsonb_compact() or some
>> way to get JSON with no inserted whitespace.
>
> As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
> compact JSON format to reduce the amount of traffic over the wire or to
> do things with on the client side, we should probably come up with a
> binary format, rather than just hack out the whitespace.  It's not like
> representing numbers using ASCII characters is terribly efficient
> either.

-1

This will benefit pretty much nobody unless you are writing a hand
crafted C application that consumes and processes the data directly.
I'd venture to guess this is a tiny fraction of pg users these days.
I do not understand at all the objection to removing whitespace.
Extra whitespace does nothing but pad the document as humans will
always run the document through a prettifier tuned to their specific
requirements (generally starting with, intelligent placement of
newlines) if reading directly.

Also, binary formats are not always smaller than text formats.

> Compression might be another option, though that's certainly less
> flexible and only (easily) used in combination with SSL, today.

right, exactly.

merlin



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Stephen Frost
Дата:
* Merlin Moncure (mmoncure@gmail.com) wrote:
> On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
> > compact JSON format to reduce the amount of traffic over the wire or to
> > do things with on the client side, we should probably come up with a
> > binary format, rather than just hack out the whitespace.  It's not like
> > representing numbers using ASCII characters is terribly efficient
> > either.
>
> -1
>
> This will benefit pretty much nobody unless you are writing a hand
> crafted C application that consumes and processes the data directly.

That's not accurate.  All that's needed is for the libraries which
either wrap libpq or re-implement it to be updated to understand the
format and then convert the data into whatever structure makes sense for
the language (or library that the language includes for working with
JSON data).

One of the unfortunate realities with JSON is that there isn't a
terribly good binary representation, afaik.  As I understand it, BSON
doesn't support all the JSON structures that we do; if it did, I'd
suggest we provide a way to convert our structure into BSON.

> I'd venture to guess this is a tiny fraction of pg users these days.
> I do not understand at all the objection to removing whitespace.
> Extra whitespace does nothing but pad the document as humans will
> always run the document through a prettifier tuned to their specific
> requirements (generally starting with, intelligent placement of
> newlines) if reading directly.

The objection is that it's a terribly poor solution as it simply makes
things ugly for a pretty small amount of improvement.  Looking at it
from the perspective of just "whitespace is bad!" it might look like a
good answer to just remove whitespace, but we should be looking at it
from the perspective of "how do we make this more efficient?".  Under
that lense, removing whitespace appears to be minimally effective
whereas passing the data back in a binary structure looks likely to
greatly improve the efficiency on a number of levels.

> Also, binary formats are not always smaller than text formats.

While true, I don't think that would be true in this case.

Of course, there's nothing like actually trying it and seeing.

Thanks!

Stephen

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"David G. Johnston"
Дата:
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
It'd be nice to have a stable text representation of a jsonb value with minimal whitespace. The latter would also save a few bytes per record in text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
Attached is a *very* work in progress patch that adds a jsonb_compact(jsonb)::text function. It generates a text representation without extra whitespace but does not yet try to enforce a stable order of the properties within a jsonb value.


This doesn't impact backups unless you do away with the function and change the output i/o function for the type.​  In that scenario the function is no longer necessary.

David J.
 

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"David G. Johnston"
Дата:
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Attached is a *very* work in progress patch that adds a jsonb_compact(jsonb)::text function. It generates a text representation without extra whitespace but does not yet try to enforce a stable order of the properties within a jsonb value.

​I think that having a jsonb_compact function that complements the existing jsonb_pretty function is a well scoped and acceptable​ feature.  I do not believe that it should also take on the role of canonicalization.

I'd suggest that any discussions regarding stability of jsonb output be given its own thread.

That topic also seems separate from a discussion on how to implement a binary transport protocol for jsonb.

​Lastly would be whether we change our default text representation so that users utilizing COPY get the added benefit of a maximally minimized text representation.

As an aside on the last topic, has there ever been considered to have a way to specify a serialization function to use for a given type (or maybe column) specified in a copy command?

Something like: COPY [...] WITH (jsonb USING jsonb_compact)

I'm thinking this would hard and undesirable given the role copy plays and limited intelligence that it has in order to maximize its efficiency in fulfilling its role.

Backups get compressed already so bandwidth seems the bigger goal there.  Otherwise I'd say that we lack any kind of overwhelming evidence that making such a change would be warranted.

While these are definitely related topics it doesn't seem like any are pre-requisites for the others.  I think this thread is going to become hard to follow and trail off it continues to try and address all of these topics randomly as people see fit to reply.  And it will quickly become hard for anyone to jump in and understand the topics at hand.

David J.

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Ryan Pedela
Дата:
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > The default text representation of jsonb adds whitespace in between
> > key/value pairs (after the colon ":") and after successive properties
> > (after the comma ","):

[...]

> > It'd be nice to have a stable text representation of a jsonb value with
> > minimal whitespace. The latter would also save a few bytes per record in
> > text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>
> +1
>
> I cannot comment on the patch itself, but I welcome jsonb_compact() or some
> way to get JSON with no inserted whitespace.

As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace.  It's not like
representing numbers using ASCII characters is terribly efficient
either.

Why build a Ferrari when a skateboard would suffice? Besides, that doesn't help one of the most common cases for JSONB: REST APIs.

Now that PG fully supports JSON, a user can use PG to construct the JSON payload of a REST API request. Then the web server would simply be a pass-through for the JSON payload. I personally have this use case, it is not hypothetical. However currently, a user must parse the JSON string from PG and re-stringify it to minimize the whitespace. Given that HTTP is text-based, removing all extraneous whitespace is the best way to compress it, and on top of that you can do gzip compression. Unless you are suggesting that the binary format is just a gzipped version of the minimized text format, I don't see how a binary format helps at all in the REST API case.

In addition, every JSON implementation I have ever seen fully minimizes JSON by default. PG appears to deviate from standard practice for no apparent reason.

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"David G. Johnston"
Дата:
On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com> wrote:
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
> On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> > The default text representation of jsonb adds whitespace in between
> > key/value pairs (after the colon ":") and after successive properties
> > (after the comma ","):

[...]

> > It'd be nice to have a stable text representation of a jsonb value with
> > minimal whitespace. The latter would also save a few bytes per record in
> > text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
>
> +1
>
> I cannot comment on the patch itself, but I welcome jsonb_compact() or some
> way to get JSON with no inserted whitespace.

As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace.  It's not like
representing numbers using ASCII characters is terribly efficient
either.

Why build a Ferrari when a skateboard would suffice? Besides, that doesn't help one of the most common cases for JSONB: REST APIs.

​I'm agreeing with this sentiment.  This isn't an either-or situation so argue the white-space removal on its own merits.  The fact that we might implement a binary representation in the future doesn't, for me, influence whether we make this white-space change now.
 

Now that PG fully supports JSON, a user can use PG to construct the JSON payload of a REST API request. Then the web server would simply be a pass-through for the JSON payload. I personally have this use case, it is not hypothetical.
However currently, a user must parse the JSON string from PG and re-stringify it to minimize the whitespace. Given that HTTP is text-based, removing all extraneous whitespace is the best way to compress it, and on top of that you can do gzip compression.

Can you clarify what you mean by "and on top of that you can do gzip compression"?​​

Unless you are suggesting that the binary format is just a gzipped version of the minimized text format, I don't see how a binary format helps at all in the REST API case.


No, I'm pretty sure you still end up with uncompressed text in the application layer.​
 
In addition, every JSON implementation I have ever seen fully minimizes JSON by default. PG appears to deviate from standard practice for no apparent reason.

Sorry to nit-pick but that's called convention - the standard is likely silent on this point.  And its not like this was done in a vacuum - why is this only coming up now and not, say, during the beta?  Is it surprising that this seemingly easy-to-overlook dynamic was not explicitly addressed by the author and reviewer of the patch, especially when implementation of said feature consisted of a lot more things of greater import and impact?

David J.

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
David Fetter
Дата:
On Wed, Apr 27, 2016 at 05:05:18PM -0400, Stephen Frost wrote:
> * Merlin Moncure (mmoncure@gmail.com) wrote:
> > On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > > As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
> > > compact JSON format to reduce the amount of traffic over the wire or to
> > > do things with on the client side, we should probably come up with a
> > > binary format, rather than just hack out the whitespace.  It's not like
> > > representing numbers using ASCII characters is terribly efficient
> > > either.
> > 
> > -1
> > 
> > This will benefit pretty much nobody unless you are writing a hand
> > crafted C application that consumes and processes the data directly.
> 
> That's not accurate.  All that's needed is for the libraries which
> either wrap libpq or re-implement it to be updated to understand the
> format and then convert the data into whatever structure makes sense for
> the language (or library that the language includes for working with
> JSON data).
> 
> One of the unfortunate realities with JSON is that there isn't a
> terribly good binary representation, afaik.  As I understand it, BSON
> doesn't support all the JSON structures that we do; if it did, I'd
> suggest we provide a way to convert our structure into BSON.

How about MessagePack?
http://msgpack.org/index.html

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Alvaro Herrera
Дата:
David G. Johnston wrote:
> On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com>
> wrote:

> > In addition, every JSON implementation I have ever seen fully minimizes
> > JSON by default. PG appears to deviate from standard practice for no
> > apparent reason.
> 
> Sorry to nit-pick but that's called convention - the standard is likely
> silent on this point.  And its not like this was done in a vacuum - why is
> this only coming up now and not, say, during the beta?  Is it surprising
> that this seemingly easy-to-overlook dynamic was not explicitly addressed
> by the author and reviewer of the patch, especially when implementation of
> said feature consisted of a lot more things of greater import and impact?

Actually we did have someone come up with a patch to "normalize" how
JSON stuff was output, because our code seems to do it in three
different, inconsistent ways.  And our response was for them to get the
heck outta here, because we're so much in love with our current
practice that we don't need to refactor the three implementations into a
single one.

Personally I don't see any reason we need to care one bit about how the
irrelevant whitespace is laid out, in other words why shouldn't we just
strip them all out?  Surely there's no backwards compatibility argument
there.  If somebody wants to see a nicely laid out structure they can
use the prettification function.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Sehrope Sarkuni
Дата:
On Wed, Apr 27, 2016 at 7:09 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Attached is a *very* work in progress patch that adds a jsonb_compact(jsonb)::text function. It generates a text representation without extra whitespace but does not yet try to enforce a stable order of the properties within a jsonb value.

​I think that having a jsonb_compact function that complements the existing jsonb_pretty function is a well scoped and acceptable​ feature.  I do not believe that it should also take on the role of canonicalization.

I'd suggest that any discussions regarding stability of jsonb output be given its own thread.

I'm fine with removing the stability aspect. I think it's nice-to-have but it definitely complicates things and has longer term consequences.
 
That topic also seems separate from a discussion on how to implement a binary transport protocol for jsonb.

Defining a binary format for jsonb is definitely out of scope.
 
​Lastly would be whether we change our default text representation so that users utilizing COPY get the added benefit of a maximally minimized text representation.

I see this applying to both COPY and the text format on the wire. The latter has the added benefit that it works with existing clients without any driver changes.

Outside of being a bit more pleasant in psql, I don't see a point in the added whitespace for jsonb::text. Even in psql it only helps with small fields as anything big isn't really legible without indenting it via jsonb_pretty(...).
 
As an aside on the last topic, has there ever been considered to have a way to specify a serialization function to use for a given type (or maybe column) specified in a copy command?

Something like: COPY [...] WITH (jsonb USING jsonb_compact)

I'm thinking this would hard and undesirable given the role copy plays and limited intelligence that it has in order to maximize its efficiency in fulfilling its role.

Backups get compressed already so bandwidth seems the bigger goal there.  Otherwise I'd say that we lack any kind of overwhelming evidence that making such a change would be warranted.

While these are definitely related topics it doesn't seem like any are pre-requisites for the others.  I think this thread is going to become hard to follow and trail off it continues to try and address all of these topics randomly as people see fit to reply.  And it will quickly become hard for anyone to jump in and understand the topics at hand.

That's a really cool idea but agree it's way out of scope for this.

I had a related idea, maybe something similar could be done for psql to set a jsonb output format. That way you could automatically prettify jsonb fields client side.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Andrew Dunstan
Дата:

On 04/28/2016 04:29 PM, Alvaro Herrera wrote:
> David G. Johnston wrote:
>> On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com>
>> wrote:
>>> In addition, every JSON implementation I have ever seen fully minimizes
>>> JSON by default. PG appears to deviate from standard practice for no
>>> apparent reason.
>> Sorry to nit-pick but that's called convention - the standard is likely
>> silent on this point.  And its not like this was done in a vacuum - why is
>> this only coming up now and not, say, during the beta?  Is it surprising
>> that this seemingly easy-to-overlook dynamic was not explicitly addressed
>> by the author and reviewer of the patch, especially when implementation of
>> said feature consisted of a lot more things of greater import and impact?
> Actually we did have someone come up with a patch to "normalize" how
> JSON stuff was output, because our code seems to do it in three
> different, inconsistent ways.  And our response was for them to get the
> heck outta here, because we're so much in love with our current
> practice that we don't need to refactor the three implementations into a
> single one.


That's a pretty bad mischaracterization of the discussion. What was 
proposed was broken, as I pointed out to the OP, and then again later to 
you when you asked about it. What he wanted to achieve simply couldn't 
be done they way he was trying to achieve it.

Regarding the present proposal:

I wouldn't necessarily be opposed to us having one or more of the following:

a) suppressing whitespace addition in all json generation and text 
output, possibly governed by a GUC setting so we could maintain 
behaviour compatibility if required
b) a function to remove whitespace from json values, but otherwise 
preserve things like key order
c) a function to pretty-print json similar to the output from jsonb, but 
again preserving key order
d) a function to reorder keys in json so they were sorted according to 
the relevant collation.


None of these things except possibly the last should be terribly 
difficult to do.

cheers

andrew




Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"Shulgin, Oleksandr"
Дата:
On Fri, Apr 29, 2016 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 04/28/2016 04:29 PM, Alvaro Herrera wrote:

Actually we did have someone come up with a patch to "normalize" how
JSON stuff was output, because our code seems to do it in three
different, inconsistent ways.  And our response was for them to get the
heck outta here, because we're so much in love with our current
practice that we don't need to refactor the three implementations into a
single one.

That's a pretty bad mischaracterization of the discussion. What was proposed was broken, as I pointed out to the OP, and then again later to you when you asked about it. What he wanted to achieve simply couldn't be done they way he was trying to achieve it.

Yeah, the original request was pretty invalid, but when I've proposed to resubmit just the normalization of whitespace nobody has shown enthusiasm about the idea either:


Regarding the present proposal:

I wouldn't necessarily be opposed to us having one or more of the following:

a) suppressing whitespace addition in all json generation and text output, possibly governed by a GUC setting so we could maintain behaviour compatibility if required

I'm not thrilled about GUC that would silently break stuff.  That being said, if someone's code is dependent on exact placement of whitespace in the JSON text, it's pretty broken already and it's just a matter of time when they hit an issue there.
 
b) a function to remove whitespace from json values, but otherwise preserve things like key order
c) a function to pretty-print json similar to the output from jsonb, but again preserving key order
d) a function to reorder keys in json so they were sorted according to the relevant collation.

None of these things except possibly the last should be terribly difficult to do.

My vote goes to remove all optional whitespace by default and have a single function to prettify it.  Key reordering can then be handled with an optional parameter to such prettifying function.

It would probably make sense model this function after Python's "dump-to-JSON-string" function: https://docs.python.org/2/library/json.html#json.dumps  With the optional parameters for sorting the keys, indentation size and punctuation.  This way all the prettiness enhancements could be contained in a single function w/o the need for generalized interface used in many places.

How about that?

--
Alex

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Merlin Moncure
Дата:
On Wed, Apr 27, 2016 at 4:05 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Merlin Moncure (mmoncure@gmail.com) wrote:
>> On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
>> > As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
>> > compact JSON format to reduce the amount of traffic over the wire or to
>> > do things with on the client side, we should probably come up with a
>> > binary format, rather than just hack out the whitespace.  It's not like
>> > representing numbers using ASCII characters is terribly efficient
>> > either.
>>
>> -1
>>
>> This will benefit pretty much nobody unless you are writing a hand
>> crafted C application that consumes and processes the data directly.
>
> That's not accurate.  All that's needed is for the libraries which
> either wrap libpq or re-implement it to be updated to understand the
> format and then convert the data into whatever structure makes sense for
> the language (or library that the language includes for working with
> JSON data).

Sure, that's pretty easy.   Note, I cowrote the only libpq wrapper
that demystifies pg binary formats, libpqtypes.  I can tell you that
binary formats are much faster than text formats in any cases where
parsing is non trivial -- geo types, timestamp types, containers etc.However I would be very surprised if postgres
binaryformat json
 
would replace language parsing of json in any popular language like
java for common usage.

I'll go further.   Postgres json support has pretty much made our
binary formats obsolete.  The main problem with text format data was
sending complex structured data to the client over our overlapping
escape mechanisms; client side parsing was slow and in certain
scenarios backslashes would proliferate exponentially.    json support
eliminates all of those problems and the performance advantages of
binary support (mainly parsing of complex types)  rarely justify the
development headaches.  These days, for the vast majority of data
traffic to the application it's a single row, single column json
coming in and out of the database.

>> I'd venture to guess this is a tiny fraction of pg users these days.
>> I do not understand at all the objection to removing whitespace.
>> Extra whitespace does nothing but pad the document as humans will
>> always run the document through a prettifier tuned to their specific
>> requirements (generally starting with, intelligent placement of
>> newlines) if reading directly.
>
> The objection is that it's a terribly poor solution as it simply makes
> things ugly for a pretty small amount of improvement.  Looking at it
> from the perspective of just "whitespace is bad!"

Whitespace is bad, because it simply pads documents on every stage of
processing.  You simply can't please everyone in terms of where it
should go so you don't and reserve that functionality for
prettification functions.  json is for *data serialization*.  We
should not inject extra characters for aesthetics in the general case;
reducing memory consumption by 10% on both the client and server
during parse is a feature.

Andrew mentions several solutions.  I like them all except I would
prefer not to introduce a GUC for controlling the output format.  I do
not think it's a good idea to set the expectation that clients can
rely on text out byte for byte for any type including json.

merlin



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"David G. Johnston"
Дата:
On Fri, Apr 29, 2016 at 7:15 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
Andrew mentions several solutions.  I like them all except I would
prefer not to introduce a GUC for controlling the output format.  I do
not think it's a good idea to set the expectation that clients can
rely on text out byte for byte for any type including json.

​+1​

​I agree on the GUC point and on the general ​desirability of making jsonb output not include insignificant whitespace.

​There seems to be enough coders who agree to this principle: could one of you please write a patch and start a new thread specifically for this change.  If we go that route the need for the subject of this thread becomes moot.

Thanks!

David J.

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Alvaro Herrera
Дата:
Thanks Alex for finding the previous thread.

Andrew Dunstan wrote:
> 
> On 04/28/2016 04:29 PM, Alvaro Herrera wrote:

> >Actually we did have someone come up with a patch to "normalize" how
> >JSON stuff was output, because our code seems to do it in three
> >different, inconsistent ways.  And our response was for them to get the
> >heck outta here, because we're so much in love with our current
> >practice that we don't need to refactor the three implementations into a
> >single one.
> That's a pretty bad mischaracterization of the discussion.

Sorry, I don't agree with that.

> What was proposed
> was broken, as I pointed out to the OP, and then again later to you when you
> asked about it.

I didn't find your argument convincing back then, but this doesn't have
enough value to me that I can spend much time arguing about it.

> I wouldn't necessarily be opposed to us having one or more of the following:
> 
> a) suppressing whitespace addition in all json generation and text output,
> possibly governed by a GUC setting so we could maintain behaviour
> compatibility if required

Sounds great to me, because we can unify the code so that we have *one*
piece to convert json to text instead of N, and not worry about the
non-relevant whitespace.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Merlin Moncure
Дата:
On Fri, Apr 29, 2016 at 12:31 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Thanks Alex for finding the previous thread.
>
> Andrew Dunstan wrote:
>>
>> On 04/28/2016 04:29 PM, Alvaro Herrera wrote:
>
>> >Actually we did have someone come up with a patch to "normalize" how
>> >JSON stuff was output, because our code seems to do it in three
>> >different, inconsistent ways.  And our response was for them to get the
>> >heck outta here, because we're so much in love with our current
>> >practice that we don't need to refactor the three implementations into a
>> >single one.
>> That's a pretty bad mischaracterization of the discussion.
>
> Sorry, I don't agree with that.
>
>> What was proposed
>> was broken, as I pointed out to the OP, and then again later to you when you
>> asked about it.
>
> I didn't find your argument convincing back then, but this doesn't have
> enough value to me that I can spend much time arguing about it.
>
>> I wouldn't necessarily be opposed to us having one or more of the following:
>>
>> a) suppressing whitespace addition in all json generation and text output,
>> possibly governed by a GUC setting so we could maintain behaviour
>> compatibility if required
>
> Sounds great to me, because we can unify the code so that we have *one*
> piece to convert json to text instead of N, and not worry about the
> non-relevant whitespace.

hurk -- no objection to unifying the text serialization code (if that
proves reasonable to do).   However I think using GUC to control
output format is not a good idea.  We did this for bytea and it did
not turn out well; much better to have code anticipating precise
formats to check the server version.  This comes up over and over
again: the GUC is not a solution for backwards compatibility...in
fact, it's pandora's box (see:
https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html) .

merlin



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Andrew Dunstan
Дата:

On 04/29/2016 02:34 PM, Merlin Moncure wrote:

>>> I wouldn't necessarily be opposed to us having one or more of the following:
>>>
>>> a) suppressing whitespace addition in all json generation and text output,
>>> possibly governed by a GUC setting so we could maintain behaviour
>>> compatibility if required
>> Sounds great to me, because we can unify the code so that we have *one*
>> piece to convert json to text instead of N, and not worry about the
>> non-relevant whitespace.
> hurk -- no objection to unifying the text serialization code (if that
> proves reasonable to do).   However I think using GUC to control
> output format is not a good idea.  We did this for bytea and it did
> not turn out well; much better to have code anticipating precise
> formats to check the server version.  This comes up over and over
> again: the GUC is not a solution for backwards compatibility...in
> fact, it's pandora's box (see:
> https://dev.mysql.com/doc/refman/5.5/en/sql-mode.html) .
>



OK, fine by me. It's trivial to do for jsonb - all the white space comes 
from on function, AFAIK. For json it's a bit more spread out, but only 
in  one or two files. Here's a question: say we have this table: 
mytable:(x text, y json). now we do: "select to_json(r) from mytable r;" 
Now y is a json field, which preserves the whitespace of the input. Do 
we squash the whitespace out or not when producing the output of this 
query? I'm inclined to say yes we do, but it's not a slam-dunk no-brainer.

One other point: I think we really need most of these pieces - if we are 
going to squash the whitespace we need functions to do that cleanly for 
json and to pretty-print json.

cheers

andrew



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Merlin Moncure
Дата:
On Fri, Apr 29, 2016 at 4:06 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> One other point: I think we really need most of these pieces - if we are
> going to squash the whitespace we need functions to do that cleanly for json
> and to pretty-print json.

I don't think it should be squashed per se -- we just don't *add* any
whitespace.  So the current behavior of to_json,

postgres=# select to_json(q) from (select 1, '{"a"  :  "foo"}'::json) q;               to_json
───────────────────────────────────────{"?column?":1,"json":{"a"  :  "foo"}}

...is correct to me on the premise that the user deliberately chose
the whitespace preserving json type and did not run compat on it.
However,
postgres=# select row_to_json(q) from (select 1, '{"a"  :  "foo"}'::jsonb) q;            row_to_json
─────────────────────────────────────{"?column?":1,"jsonb":{"a": "foo"}}

really ought to render (note lack of space after "a"):{"?column?":1,"jsonb":{"a":"foo"}}

This is a simple matter of removing spaces in the occasional C string
literal in the serialization routines and adding a json_pretty
function.

merlin



Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Andrew Dunstan
Дата:

On 04/29/2016 06:11 PM, Merlin Moncure wrote:


> This is a simple matter of removing spaces in the occasional C string
> literal in the serialization routines and adding a json_pretty
> function.


I spent a few hours on this. See 
<https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat> for 
WIP - there are three commits. No regression tests yet for the two new 
functions (json_squash and json_pretty), Otherwise fairly complete. 
Removing whitespace generation was pretty simple for both json and jsonb.

cheers

andrew





Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
"Shulgin, Oleksandr"
Дата:
On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 04/29/2016 06:11 PM, Merlin Moncure wrote:

This is a simple matter of removing spaces in the occasional C string
literal in the serialization routines and adding a json_pretty
function.

I spent a few hours on this. See <https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat> for WIP - there are three commits. No regression tests yet for the two new functions (json_squash and json_pretty), Otherwise fairly complete. Removing whitespace generation was pretty simple for both json and jsonb.

Looks good, thank you!

It would make sense IMO to rename FormatState's `indent' field as `pretty': it's being used to add whitespace between the punctuation, not only at start of a line.  I'd also move the "if (indent)" check out of add_indent(): just don't call it if no indent is needed.

I'll try to play with the patch to produce some regression tests for the new functions.

--
Alex

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Andrew Dunstan
Дата:

On 05/02/2016 04:56 AM, Shulgin, Oleksandr wrote:
> On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan <andrew@dunslane.net 
> <mailto:andrew@dunslane.net>> wrote:
>
>
>     On 04/29/2016 06:11 PM, Merlin Moncure wrote:
>
>         This is a simple matter of removing spaces in the occasional C
>         string
>         literal in the serialization routines and adding a json_pretty
>         function.
>
>
>     I spent a few hours on this. See
>     <https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat>
>     for WIP - there are three commits. No regression tests yet for the
>     two new functions (json_squash and json_pretty), Otherwise fairly
>     complete. Removing whitespace generation was pretty simple for
>     both json and jsonb.
>
>
> Looks good, thank you!
>
> It would make sense IMO to rename FormatState's `indent' field as 
> `pretty': it's being used to add whitespace between the punctuation, 
> not only at start of a line.  I'd also move the "if (indent)" check 
> out of add_indent(): just don't call it if no indent is needed.
>
> I'll try to play with the patch to produce some regression tests for 
> the new functions.
>
>


It was done the way it was to be as consistent as possible with how it's 
done for jsonb (c.f. jsonb.c:JsonbToCStringWorker and jsonb.c::add_indent).

cheers

andrew






Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Oleksandr Shulgin
Дата:
On Mon, May 2, 2016 at 4:04 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 05/02/2016 04:56 AM, Shulgin, Oleksandr wrote:
On Sun, May 1, 2016 at 3:22 AM, Andrew Dunstan <andrew@dunslane.net <mailto:andrew@dunslane.net>> wrote:

    On 04/29/2016 06:11 PM, Merlin Moncure wrote:

        This is a simple matter of removing spaces in the occasional C
        string
        literal in the serialization routines and adding a json_pretty
        function.


    I spent a few hours on this. See
    <https://bitbucket.org/adunstan/pgdevel/commits/branch/jsonformat>
    for WIP - there are three commits. No regression tests yet for the
    two new functions (json_squash and json_pretty), Otherwise fairly
    complete. Removing whitespace generation was pretty simple for
    both json and jsonb.


Looks good, thank you!

It would make sense IMO to rename FormatState's `indent' field as `pretty': it's being used to add whitespace between the punctuation, not only at start of a line.  I'd also move the "if (indent)" check out of add_indent(): just don't call it if no indent is needed.

I'll try to play with the patch to produce some regression tests for the new functions.


It was done the way it was to be as consistent as possible with how it's done for jsonb (c.f. jsonb.c:JsonbToCStringWorker and jsonb.c::add_indent).

Ah, I see.

Simply taking regression tests for jsonb_pretty() and using them against json_pretty() revealed a bug with extra indent being added before every array/object start.  Attached patch fixes that and adds the regression tests.

For json_squash() I've taken the same three test values, for lack of a better idea at the moment.  At least we are testing key order stability and that no whitespace is spit out.

Regards,
--
Alex

Вложения

Re: Add jsonb_compact(...) for whitespace-free jsonb to text

От
Jim Nasby
Дата:
On 4/29/16 8:56 AM, Shulgin, Oleksandr wrote:
> It would probably make sense model this function after Python's
> "dump-to-JSON-string"
> function: https://docs.python.org/2/library/json.html#json.dumps  With
> the optional parameters for sorting the keys, indentation size and
> punctuation.  This way all the prettiness enhancements could be
> contained in a single function w/o the need for generalized interface
> used in many places.

+1. I've found the output functions of json.dumps to be very handy.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461