On Mon, Jan 26, 2015 at 09:20:54AM -0500, Andrew Dunstan wrote:
> On 01/23/2015 02:18 AM, Noah Misch wrote:
> >On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote:
> >>We could probably fix this fairly easily for non- U+0000 cases by having
> >>jsonb_to_cstring use a different escape_json routine.
> >>Maybe we should detect such input and emit a warning of
> >>ambiguity? It's likely to be rare enough, but clearly not as rare as we'd
> >>like, since this is a report from the field.
> >Perhaps. Something like "WARNING: jsonb cannot represent \\u0000; reading as
> >\u0000"? Alas, but I do prefer that to silent data corruption.
>
> Maybe something like this patch. I have two concerns about it, though. The
> first is the possible performance impact of looking for the offending string
> in every jsonb input, and the second is that the test isn't quite right,
> since input of \\\u0000 doesn't raise this issue - i.e. the problem arises
> when u0000 is preceded by an even number of backslashes.
I share your second concern. It is important that this warning not cry wolf;
it should never fire for an input that is actually unaffected.
> For the moment, maybe I could commit the fix for the non U+0000 case for
> escape_json, and we could think some more about detecting and warning about
> the problem strings.
+1 for splitting development that way. Fixing the use of escape_json() is
objective, but the choices around the warning are more subtle.