Обсуждение: BUG #14048: copy issues with jsonb

Поиск
Список
Период
Сортировка

BUG #14048: copy issues with jsonb

От
rotten@windfish.net
Дата:
The following bug has been logged on the website:

Bug reference:      14048
Logged by:          Rick Otten
Email address:      rotten@windfish.net
PostgreSQL version: 9.5.1
Operating system:   Ubuntu 14.04
Description:

I have a few jsonb columns in a table and I've run into several issues with
COPY with that column type.  I didn't see anything specific about it in the
to-do list, and Google didn't turn up anyone else specifically complaining,
so I'm filing it here.  I apologize if this is a known issue.

The table is the backend for a web pixel (tracking) service.  It has 4 jsonb
columns which contain variable data about the browser and session.  (parsed
user agent, list of cookies, query parameters, etc...)

I wanted to copy a section of the table from one 9.5.1 database to another.

For some reason the CSV and TEXT format exports do not quote/escape the
nested JSON very well - after several tries, I gave up on that approach.
The "copy to" output appears to be too mangled.  It was not obvious, at
first glance, where it is getting confused.

Next I tried BINARY mode.  That worked ok for the rows I was moving last
week, but this week (for a different piece of the table) I'm getting the
following when I try to read it back in:

# copy (select * from pixels.pixel_requests where server_timestamp >
'2016-03-26 00:00') to '/home/rotten/pixels27Mar2016.bin' binary;

# copy pixels.pixel_requests_tmp_2 from '/home/rotten/pixels27Mar2016.bin'
binary;

ERROR:  unsupported jsonb version number 63
CONTEXT:  COPY pixel_requests_tmp_2, line 1, column parameters

The new tmp table has the same ddl as the original table.  Both tables are
UTF-8 enabled.  Both are the same version of PostgreSQL running on the same
version of Ubuntu.

I'm willing to share the copy output (313M - 288,000 rows) with someone who
is able to help troubleshoot the issue, but I don't want to post the data in
a public forum.

I'm not sure which row and column is causing the copy command to fail.
(There is already a  to-do that may help with that.)

Meanwhile, I'll use pg_dump to get the rows I need.  Hopefully that will be
able to extract and reload the jsonb data ok.  pg_dump hasn't been a problem
in the past anyway...

Re: BUG #14048: copy issues with jsonb

От
"David G. Johnston"
Дата:
On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:

> The following bug has been logged on the website:
>
> Bug reference:      14048
> Logged by:          Rick Otten
> Email address:      rotten@windfish.net
> PostgreSQL version: 9.5.1
> Operating system:   Ubuntu 14.04
> Description:
>
> I have a few jsonb columns in a table and I've run into several issues wi=
th
> COPY with that column type.  I didn't see anything specific about it in t=
he
> to-do list, and Google didn't turn up anyone else specifically complainin=
g,
> so I'm filing it here.  I apologize if this is a known issue.
>
> The table is the backend for a web pixel (tracking) service.  It has 4
> jsonb
> columns which contain variable data about the browser and session.  (pars=
ed
> user agent, list of cookies, query parameters, etc...)
>
> I wanted to copy a section of the table from one 9.5.1 database to anothe=
r.
>
> For some reason the CSV and TEXT format exports do not quote/escape the
> nested JSON very well - after several tries, I gave up on that approach.
> The "copy to" output appears to be too mangled.  It was not obvious, at
> first glance, where it is getting confused.

=E2=80=8B[...]
>


> Meanwhile, I'll use pg_dump to get the rows I need.  Hopefully that will =
be
> able to extract and reload the jsonb data ok.  pg_dump hasn't been a
> problem
> in the past anyway...
>

=E2=80=8BI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to p=
erform
its work - at least for custom format and plain non-inserts mode=E2=80=8B -=
 so if
that works but whatever you are doing manually does not there is some
degree of suspicion that what you are doing may be incorrect.

We really need a sample record and a self-contained script that you can
make reproduce this behavior.

David J.

Re: BUG #14048: copy issues with jsonb

От
Rick Otten
Дата:
Ok. There are 288K rows in the file that was failing yesterday. I'll
play with date ranges until I can narrow it down to a few rows and send
it along.

I wasn't able to get to trying pgdump yesterday, I'll give it a try
today and let you know.

What I'm doing manually is exactly what I posted.

On 2016-03-27 22:25, David G. Johnston wrote:

> On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 14048
>> Logged by: Rick Otten
>> Email address: rotten@windfish.net
>> PostgreSQL version: 9.5.1
>> Operating system: Ubuntu 14.04
>> Description:
>>
>> I have a few jsonb columns in a table and I've run into several issues with
>> COPY with that column type. I didn't see anything specific about it in the
>> to-do list, and Google didn't turn up anyone else specifically complaining,
>> so I'm filing it here. I apologize if this is a known issue.
>>
>> The table is the backend for a web pixel (tracking) service. It has 4 jsonb
>> columns which contain variable data about the browser and session. (parsed
>> user agent, list of cookies, query parameters, etc...)
>>
>> I wanted to copy a section of the table from one 9.5.1 database to another.
>>
>> For some reason the CSV and TEXT format exports do not quote/escape the
>> nested JSON very well - after several tries, I gave up on that approach.
>> The "copy to" output appears to be too mangled. It was not obvious, at
>> first glance, where it is getting confused.
>
>> ​[...]
>
>> Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
>> able to extract and reload the jsonb data ok. pg_dump hasn't been a problem
>> in the past anyway...
>
> ​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and
plainnon-inserts mode​ - so if that works but whatever you are doing manually does not there is some degree of
suspicionthat what you are doing may be incorrect.  
>
> We really need a sample record and a self-contained script that you can make reproduce this behavior.
>
> David J.

Re: BUG #14048: copy issues with jsonb

От
"David G. Johnston"
Дата:
On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

> On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      14048
>> Logged by:          Rick Otten
>> Email address:      rotten@windfish.net
>> PostgreSQL version: 9.5.1
>> Operating system:   Ubuntu 14.04
>> Description:
>>
>> I have a few jsonb columns in a table and I've run into several issues
>> with
>> COPY with that column type.  I didn't see anything specific about it in
>> the
>> to-do list, and Google didn't turn up anyone else specifically
>> complaining,
>> so I'm filing it here.  I apologize if this is a known issue.
>>
>> The table is the backend for a web pixel (tracking) service.  It has 4
>> jsonb
>> columns which contain variable data about the browser and session.
>> (parsed
>> user agent, list of cookies, query parameters, etc...)
>>
>> I wanted to copy a section of the table from one 9.5.1 database to
>> another.
>>
>> For some reason the CSV and TEXT format exports do not quote/escape the
>> nested JSON very well - after several tries, I gave up on that approach.
>> The "copy to" output appears to be too mangled.  It was not obvious, at
>> first glance, where it is getting confused.
>
> =E2=80=8B[...]
>>
>
>
>> Meanwhile, I'll use pg_dump to get the rows I need.  Hopefully that will
>> be
>> able to extract and reload the jsonb data ok.  pg_dump hasn't been a
>> problem
>> in the past anyway...
>>
>
> =E2=80=8BI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to=
 perform
> its work - at least for custom format and plain non-inserts mode=E2=80=8B=
 - so if
> that works but whatever you are doing manually does not there is some
> degree of suspicion that what you are doing may be incorrect.
>
> We really need a sample record and a self-contained script that you can
> make reproduce this behavior.
>
>
=E2=80=8BRick provided a file offline which I peeked at.  Responding to him
privately but lacking further public comments this report can be considered
"Not a bug".

David J.

Re: BUG #14048: copy issues with jsonb

От
Rick Otten
Дата:
So the problem is a column order thing and not something buried in my
jsonb data. Mr. Johnston was very helpful in resolving this.

Apparently when you do a "copy from with csv header", the 'header'
phrase is ignored in spite of the keyword being present in your
statement. By removing the header from the csv file and then pasting it
into the 'copy from' statement (to specify the column order), I was able
to load from csv format. The error looked to me like something was wrong
with the jsonb data, but really it was a column ordering problem.

Ditto for Binary. Unless you know the column order that was dumped into
the binary, you can't do the 'copy from'. (And since you can't see the
column order in the binary file very easily, you have to know it before
you dump it.) 'copy from binary' doesn't infer column order from the
file via any sort of binary header.

I noticed that pg_dump was being very explicit about the column
ordering, which is what tipped me off, and then it was confirmed by Mr.
Johnston and a series of experiments.

I apologize for thinking this was a bug. The slightly unexpected
behavior (to me) had me stumped for a while.

On 2016-03-28 11:48, David G. Johnston wrote:

> On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 14048
> Logged by: Rick Otten
> Email address: rotten@windfish.net
> PostgreSQL version: 9.5.1
> Operating system: Ubuntu 14.04
> Description:
>
> I have a few jsonb columns in a table and I've run into several issues with
> COPY with that column type. I didn't see anything specific about it in the
> to-do list, and Google didn't turn up anyone else specifically complaining,
> so I'm filing it here. I apologize if this is a known issue.
>
> The table is the backend for a web pixel (tracking) service. It has 4 jsonb
> columns which contain variable data about the browser and session. (parsed
> user agent, list of cookies, query parameters, etc...)
>
> I wanted to copy a section of the table from one 9.5.1 database to another.
>
> For some reason the CSV and TEXT format exports do not quote/escape the
> nested JSON very well - after several tries, I gave up on that approach.
> The "copy to" output appears to be too mangled. It was not obvious, at
> first glance, where it is getting confused.
> ​[...]
> Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be
> able to extract and reload the jsonb data ok. pg_dump hasn't been a problem
> in the past anyway...
>
> ​I'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and
plainnon-inserts mode​ - so if that works but whatever you are doing manually does not there is some degree of
suspicionthat what you are doing may be incorrect.  
>
> We really need a sample record and a self-contained script that you can make reproduce this behavior.

​Rick provided a file offline which I peeked at. Responding to him
privately but lacking further public comments this report can be
considered "Not a bug".

David J.

Re: BUG #14048: copy issues with jsonb

От
"David G. Johnston"
Дата:
On Mon, Mar 28, 2016 at 9:35 AM, Rick Otten <rotten@windfish.net> wrote:

> So the problem is a column order thing and not something buried in my
> jsonb data.  Mr. Johnston was very helpful in resolving this.
>
> Apparently when you do a "copy from with csv header", the 'header' phrase
> is ignored in spite of the keyword being present in your statement.
>
=E2=80=8BIt is not ignored - it is informing COPY that the first line of yo=
ur input
file is not data but meta-data and thus must be ignored.=E2=80=8B

There are external utilities available that make doing the whole copy thing
much more user-friendly.  Given their existence not much effort is expended
trying to enhance the native copy facilities beyond making it extremely
fast for bulk loading well-specified data.

David J.