Обсуждение: BUG #14048: copy issues with jsonb
The following bug has been logged on the website: Bug reference: 14048 Logged by: Rick Otten Email address: rotten@windfish.net PostgreSQL version: 9.5.1 Operating system: Ubuntu 14.04 Description: I have a few jsonb columns in a table and I've run into several issues with COPY with that column type. I didn't see anything specific about it in the to-do list, and Google didn't turn up anyone else specifically complaining, so I'm filing it here. I apologize if this is a known issue. The table is the backend for a web pixel (tracking) service. It has 4 jsonb columns which contain variable data about the browser and session. (parsed user agent, list of cookies, query parameters, etc...) I wanted to copy a section of the table from one 9.5.1 database to another. For some reason the CSV and TEXT format exports do not quote/escape the nested JSON very well - after several tries, I gave up on that approach. The "copy to" output appears to be too mangled. It was not obvious, at first glance, where it is getting confused. Next I tried BINARY mode. That worked ok for the rows I was moving last week, but this week (for a different piece of the table) I'm getting the following when I try to read it back in: # copy (select * from pixels.pixel_requests where server_timestamp > '2016-03-26 00:00') to '/home/rotten/pixels27Mar2016.bin' binary; # copy pixels.pixel_requests_tmp_2 from '/home/rotten/pixels27Mar2016.bin' binary; ERROR: unsupported jsonb version number 63 CONTEXT: COPY pixel_requests_tmp_2, line 1, column parameters The new tmp table has the same ddl as the original table. Both tables are UTF-8 enabled. Both are the same version of PostgreSQL running on the same version of Ubuntu. I'm willing to share the copy output (313M - 288,000 rows) with someone who is able to help troubleshoot the issue, but I don't want to post the data in a public forum. I'm not sure which row and column is causing the copy command to fail. (There is already a to-do that may help with that.) Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be able to extract and reload the jsonb data ok. pg_dump hasn't been a problem in the past anyway...
On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote: > The following bug has been logged on the website: > > Bug reference: 14048 > Logged by: Rick Otten > Email address: rotten@windfish.net > PostgreSQL version: 9.5.1 > Operating system: Ubuntu 14.04 > Description: > > I have a few jsonb columns in a table and I've run into several issues wi= th > COPY with that column type. I didn't see anything specific about it in t= he > to-do list, and Google didn't turn up anyone else specifically complainin= g, > so I'm filing it here. I apologize if this is a known issue. > > The table is the backend for a web pixel (tracking) service. It has 4 > jsonb > columns which contain variable data about the browser and session. (pars= ed > user agent, list of cookies, query parameters, etc...) > > I wanted to copy a section of the table from one 9.5.1 database to anothe= r. > > For some reason the CSV and TEXT format exports do not quote/escape the > nested JSON very well - after several tries, I gave up on that approach. > The "copy to" output appears to be too mangled. It was not obvious, at > first glance, where it is getting confused. =E2=80=8B[...] > > Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will = be > able to extract and reload the jsonb data ok. pg_dump hasn't been a > problem > in the past anyway... > =E2=80=8BI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to p= erform its work - at least for custom format and plain non-inserts mode=E2=80=8B -= so if that works but whatever you are doing manually does not there is some degree of suspicion that what you are doing may be incorrect. We really need a sample record and a self-contained script that you can make reproduce this behavior. David J.
Ok. There are 288K rows in the file that was failing yesterday. I'll play with date ranges until I can narrow it down to a few rows and send it along. I wasn't able to get to trying pgdump yesterday, I'll give it a try today and let you know. What I'm doing manually is exactly what I posted. On 2016-03-27 22:25, David G. Johnston wrote: > On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 14048 >> Logged by: Rick Otten >> Email address: rotten@windfish.net >> PostgreSQL version: 9.5.1 >> Operating system: Ubuntu 14.04 >> Description: >> >> I have a few jsonb columns in a table and I've run into several issues with >> COPY with that column type. I didn't see anything specific about it in the >> to-do list, and Google didn't turn up anyone else specifically complaining, >> so I'm filing it here. I apologize if this is a known issue. >> >> The table is the backend for a web pixel (tracking) service. It has 4 jsonb >> columns which contain variable data about the browser and session. (parsed >> user agent, list of cookies, query parameters, etc...) >> >> I wanted to copy a section of the table from one 9.5.1 database to another. >> >> For some reason the CSV and TEXT format exports do not quote/escape the >> nested JSON very well - after several tries, I gave up on that approach. >> The "copy to" output appears to be too mangled. It was not obvious, at >> first glance, where it is getting confused. > >> â[...] > >> Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be >> able to extract and reload the jsonb data ok. pg_dump hasn't been a problem >> in the past anyway... > > âI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and plainnon-inserts modeâ - so if that works but whatever you are doing manually does not there is some degree of suspicionthat what you are doing may be incorrect. > > We really need a sample record and a self-contained script that you can make reproduce this behavior. > > David J.
On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston < david.g.johnston@gmail.com> wrote: > On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 14048 >> Logged by: Rick Otten >> Email address: rotten@windfish.net >> PostgreSQL version: 9.5.1 >> Operating system: Ubuntu 14.04 >> Description: >> >> I have a few jsonb columns in a table and I've run into several issues >> with >> COPY with that column type. I didn't see anything specific about it in >> the >> to-do list, and Google didn't turn up anyone else specifically >> complaining, >> so I'm filing it here. I apologize if this is a known issue. >> >> The table is the backend for a web pixel (tracking) service. It has 4 >> jsonb >> columns which contain variable data about the browser and session. >> (parsed >> user agent, list of cookies, query parameters, etc...) >> >> I wanted to copy a section of the table from one 9.5.1 database to >> another. >> >> For some reason the CSV and TEXT format exports do not quote/escape the >> nested JSON very well - after several tries, I gave up on that approach. >> The "copy to" output appears to be too mangled. It was not obvious, at >> first glance, where it is getting confused. > > =E2=80=8B[...] >> > > >> Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will >> be >> able to extract and reload the jsonb data ok. pg_dump hasn't been a >> problem >> in the past anyway... >> > > =E2=80=8BI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to= perform > its work - at least for custom format and plain non-inserts mode=E2=80=8B= - so if > that works but whatever you are doing manually does not there is some > degree of suspicion that what you are doing may be incorrect. > > We really need a sample record and a self-contained script that you can > make reproduce this behavior. > > =E2=80=8BRick provided a file offline which I peeked at. Responding to him privately but lacking further public comments this report can be considered "Not a bug". David J.
So the problem is a column order thing and not something buried in my jsonb data. Mr. Johnston was very helpful in resolving this. Apparently when you do a "copy from with csv header", the 'header' phrase is ignored in spite of the keyword being present in your statement. By removing the header from the csv file and then pasting it into the 'copy from' statement (to specify the column order), I was able to load from csv format. The error looked to me like something was wrong with the jsonb data, but really it was a column ordering problem. Ditto for Binary. Unless you know the column order that was dumped into the binary, you can't do the 'copy from'. (And since you can't see the column order in the binary file very easily, you have to know it before you dump it.) 'copy from binary' doesn't infer column order from the file via any sort of binary header. I noticed that pg_dump was being very explicit about the column ordering, which is what tipped me off, and then it was confirmed by Mr. Johnston and a series of experiments. I apologize for thinking this was a bug. The slightly unexpected behavior (to me) had me stumped for a while. On 2016-03-28 11:48, David G. Johnston wrote: > On Sun, Mar 27, 2016 at 7:25 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Sun, Mar 27, 2016 at 8:04 AM, <rotten@windfish.net> wrote: > > The following bug has been logged on the website: > > Bug reference: 14048 > Logged by: Rick Otten > Email address: rotten@windfish.net > PostgreSQL version: 9.5.1 > Operating system: Ubuntu 14.04 > Description: > > I have a few jsonb columns in a table and I've run into several issues with > COPY with that column type. I didn't see anything specific about it in the > to-do list, and Google didn't turn up anyone else specifically complaining, > so I'm filing it here. I apologize if this is a known issue. > > The table is the backend for a web pixel (tracking) service. It has 4 jsonb > columns which contain variable data about the browser and session. (parsed > user agent, list of cookies, query parameters, etc...) > > I wanted to copy a section of the table from one 9.5.1 database to another. > > For some reason the CSV and TEXT format exports do not quote/escape the > nested JSON very well - after several tries, I gave up on that approach. > The "copy to" output appears to be too mangled. It was not obvious, at > first glance, where it is getting confused. > â[...] > Meanwhile, I'll use pg_dump to get the rows I need. Hopefully that will be > able to extract and reload the jsonb data ok. pg_dump hasn't been a problem > in the past anyway... > > âI'm reasonably certain pg_dump/pg_restore makes use of "COPY" to perform its work - at least for custom format and plainnon-inserts modeâ - so if that works but whatever you are doing manually does not there is some degree of suspicionthat what you are doing may be incorrect. > > We really need a sample record and a self-contained script that you can make reproduce this behavior. âRick provided a file offline which I peeked at. Responding to him privately but lacking further public comments this report can be considered "Not a bug". David J.
On Mon, Mar 28, 2016 at 9:35 AM, Rick Otten <rotten@windfish.net> wrote: > So the problem is a column order thing and not something buried in my > jsonb data. Mr. Johnston was very helpful in resolving this. > > Apparently when you do a "copy from with csv header", the 'header' phrase > is ignored in spite of the keyword being present in your statement. > =E2=80=8BIt is not ignored - it is informing COPY that the first line of yo= ur input file is not data but meta-data and thus must be ignored.=E2=80=8B There are external utilities available that make doing the whole copy thing much more user-friendly. Given their existence not much effort is expended trying to enhance the native copy facilities beyond making it extremely fast for bulk loading well-specified data. David J.