Re: jsonb, unicode escapes and escaped backslashes

From David G Johnston
Subject Re: jsonb, unicode escapes and escaped backslashes
Msg-id 1422468830519-5835824.post@n5.nabble.com
In response to Re: jsonb, unicode escapes and escaped backslashes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb, unicode escapes and escaped backslashes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane-2 wrote
> Andrew Dunstan <andrew@> writes:
>> On 01/27/2015 02:28 PM, Tom Lane wrote:
>>> Well, we can either fix it now or suffer with a broken representation
>>> forever.  I'm not wedded to the exact solution I described, but I think
>>> we'll regret it if we don't change the representation.
>>> 
>>> The only other plausible answer seems to be to flat out reject \u0000.
>>> But I assume nobody likes that.
> 
>> I don't think we can be in the business of rejecting valid JSON.
> 
> Actually, after studying the code a bit, I wonder if we wouldn't be best
> off to do exactly that, at least for 9.4.x.  At minimum we're talking
> about an API change for JsonSemAction functions (which currently get the
> already-de-escaped string as a C string; not gonna work for embedded
> nulls).  I'm not sure if there are already third-party extensions using
> that API, but it seems possible, in which case changing it in a minor
> release wouldn't be nice.  Even ignoring that risk, making sure
> we'd fixed everything seems like more than a day's work, which is as
> much as I for one could spare before 9.4.1.
> 
> So at this point I propose that we reject \u0000 when de-escaping JSON.
> Anybody who's seriously unhappy with that can propose a patch to fix it
> properly in 9.5 or later.
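
To make the C-string problem quoted above concrete, here is a minimal sketch
in Python rather than in the backend's C. It only emulates a NUL-terminated
string with a ctypes buffer; it does not call the JsonSemAction API or any
PostgreSQL code, and the variable names are made up for illustration.

```python
import ctypes
import json

# "\u0000" is a valid JSON escape; de-escaping yields a 7-character string
# with a NUL in the middle.
decoded = json.loads('"abc\\u0000def"')
encoded = decoded.encode("utf-8")

# A ctypes char buffer exposes both views of the same bytes:
#   .raw   -> the whole buffer, including the terminator ctypes appends
#   .value -> the bytes up to the first NUL, which is what C string code sees
buf = ctypes.create_string_buffer(encoded)
as_c_string = buf.value

print(len(encoded), len(as_c_string))  # 7 3
```

Anything that receives only the NUL-terminated view loses "def" without any
error, which is why passing the de-escaped value as a plain C string cannot
represent \u0000.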

The hybrid option is to reject the values for 9.4.1 but then commit to
removing that hack and fixing this properly in 9.4.2; we can always call
that release 9.5...


> I think the "it would mean rejecting valid JSON" argument is utter
> hogwash.  We already reject, eg, "\u00A0" if you're not using a UTF8
> encoding.  And we reject "1e10000", not because that's invalid JSON
> but because of an implementation restriction of our underlying numeric
> type.  I don't see any moral superiority of that over rejecting "\u0000"
> because of an implementation restriction of our underlying text type.
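
As a rough analogy for the point quoted above, here is a small Python sketch
showing that what a system can do with valid JSON depends on the types and
encodings underneath it. This is Python's json module, not PostgreSQL, so the
specific outcomes differ from what the server does; only the general shape of
the argument carries over.

```python
import json
from decimal import Decimal

# Both inputs are valid JSON according to the grammar.
nbsp = json.loads('"\\u00A0"')
big = json.loads("1e10000")

# U+00A0 decodes fine, but whether it can be stored depends on the target
# encoding: Latin-1 has the character, ASCII does not.
latin1_bytes = nbsp.encode("latin-1")
try:
    nbsp.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# 1e10000 does not fit in a double, so the default parser returns infinity.
# Parsing into Decimal instead keeps the exact value.
exact = json.loads("1e10000", parse_float=Decimal)

print(latin1_bytes, ascii_ok, big, exact)
```

In both cases the JSON text is fine; the limit comes from the representation
chosen to hold the value.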

Am I missing something, or has this "forbid" plan given no consideration to
whether users will be able to retrieve, even if partially incorrectly, any
jsonb data that has already been stored?  If we mangled their data on input,
we should at least return the data and give them a chance to fix our mistake
manually (or automatically, depending on their data).

Given that we already disallow NUL in text, ISTM that allowing said data in
other text-like areas is asking for just the kind of trouble we are seeing
here.  I'm OK with the proposition that those wishing to utilize NUL are
relegated to working with bytea.

From the commit Tom references down-thread:

"However, this led to some perverse results in the case of Unicode
sequences."

Given that said results are not documented in the commit, it's hard to judge
whether a complete revert is being overly broad...

Anyway, just some observations since I'm not currently a user of JSON. 
Tom's arguments and counter-arguments ring true to me in the general sense. 
The DBA staying on 9.4.0 because of this change probably just needs to be
told to go back to using "json" and then run the update.  Their data has
issues even if they stay on 9.4.0 with the more accepting version of jsonb.

David J.




--
View this message in context:
http://postgresql.nabble.com/jsonb-unicode-escapes-and-escaped-backslashes-tp5834962p5835824.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


