Discussion: Trouble Upgrading Postgres
Chuck Martin
I'd be grateful for some help. I am trying to move a large database from PostgreSQL 9.6 on Centos 6 to a different server using PostgreSQL 11 on Centos 7. I can't do a pg_dump because it always fails on the largest table.
What error message?
Angular momentum makes the world go 'round.
On 11/3/18 12:57 PM, Charles Martin wrote:
> I'd be grateful for some help. I am trying to move a large database from
> PostgreSQL 9.6 on Centos 6 to a different server using PostgreSQL 11 on
> Centos 7. I can't do a pg_dump because it always fails on the largest
> table.

I would answer Ron's question on this first as solving it would be the
easiest fix.

> So tried to do pg_basebackup and copy that to the new PG 11
> server. Except that pg_upgrade expects the new and old versions of PG to
> be side-by-side. So I installed 9.6 on the new server, ran initdb,

This is probably the issue: you now have two 9.6 data directory
instances, the one you created with initdb and the one that came over
with pg_basebackup. I am guessing the editing below has left the server
in a confused state about which directory to use. The error messages you
got when trying to restart the server would be helpful.

> verified that it started, then stopped it and edited postgresql.conf
> data path to the location of the pg_basebackup files. Then 9.6 would no
> longer start. So how can I get my PG 9.6 data into a new PG 11 database?
>
> Probably related to my troubles are my attempts to get replication set
> up. But before I dive back into that, I thought I'd better try getting
> my 9.6 data into the new 9.6 server, then run PG 11's pg_upgrade and
> mount the data in PG 11. Then maybe I can get replication started.
>
> I've read that logical replication can be used to migrate from 9.6 to
> 11, but haven't found any documentation on doing that.
>
> Chuck Martin

-- 
Adrian Klaver
adrian.klaver@aklaver.com
pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.docfile (docfile_pkey, docfileoriginalname, ordernumber, versionnum, docfilecontents, docfilepath, docfileextension, enddatetime, endby, editnum, insby, insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;
Chuck
On 11/3/18 2:56 PM, Charles Martin wrote:
Please reply to list also.
Ccing list.
> Yes, if I could get pg_dump to work, I think I'd be fine. Maybe. But it
> doesn't.
Post the error you got to the list and we may be able to help.
Also, which version of Postgres were you using to take the dump?
>
> I agree that I've confused Postgres, but I don't know how to resolve the
> confusion. It is complicated by the fact that my original Centos 7
> install included Postgres 9.2, so those files are hanging around, along
> with 9.6 and 11.
>
> I posted the error messages I got when postgresql.conf had the data
> directory set to my basebackup data:
>
> postgresql-9.6.service: main process exited, code=exited, status=1/FAILURE
>
>
> Not very helpful.
>
>
> systemctl status postgresql-9.6 provided a bit more info:
>
> ● postgresql-9.6.service - PostgreSQL 9.6 database server
>
> Loaded: loaded (/usr/lib/systemd/system/postgresql-9.6.service;
> disabled; vendor preset: disabled)
>
> Active: failed (Result: exit-code) since Sat 2018-11-03 15:05:30
> EDT; 15s ago
>
> Docs: https://www.postgresql.org/docs/9.6/static/
>
> Process: 32570 ExecStart=/usr/pgsql-9.6/bin/postmaster -D ${PGDATA}
> (code=exited, status=1/FAILURE)
>
> Process: 32563
> ExecStartPre=/usr/pgsql-9.6/bin/postgresql96-check-db-dir ${PGDATA}
> (code=exited, status=0/SUCCESS)
>
> Main PID: 32570 (code=exited, status=1/FAILURE)
>
>
> Yet this went away, and PG 9.6 started, when I changed postgresql.conf
> to point to the new (empty) data directory, which is confusing.
No, not confusing. I am not as familiar with RPM packaging as I am with
the Debian/Ubuntu packaging. Still, if I remember correctly it also
allows multiple instances of Postgres to run. To do that it has its own
system of tracking the data directories. Where you created the new data
directory is obviously where the package scripts expect to find it. The
pg_basebackup directory is not.
>
> Chuck
>
>
> On Sat, Nov 3, 2018 at 5:17 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
> On 11/3/18 12:57 PM, Charles Martin wrote:
> > I'd be grateful for some help. I am trying to move a large
> database from
> > PostgreSQL 9.6 on Centos 6 to a different server using PostgreSQL
> 11 on
> > Centos 7. I can't do a pg_dump because it always fails on the
> largest
> > table.
>
> I would answer Ron's question on this first as solving it would be the
> easiest fix.
>
> >So tried to do pg_basebackup and copy that to the new PG 11
> > server. Except that pg_upgrade expects the new and old versions
> of PG to
> > be side-by-side. So I installed 9.6 on the new server, ran initdb,
>
> This is probably the issue: you now have two 9.6 data directory
> instances, the one you created with initdb and the one that came over
> with pg_basebackup. I am guessing the editing below has left the server
> in a confused state about which directory to use. The error messages
> you
> got when trying to restart the server would be helpful.
>
> > verified that it started, then stopped it and edited postgresql.conf
> > data path to the location of the pg_basebackup files. Then 9.6
> would no
> > longer start. So how can I get my PG 9.6 data into a new PG 11
> database?
> >
> > Probably related to my troubles are my attempts to get
> replication set
> > up. But before I dive back into that, I thought I'd better try
> getting
> > my 9.6 data into the new 9.6 server, then run PG 11's pg_upgrade and
> > mount the data in PG 11. Then maybe I can get replication started.
> >
> > I've read that logical replication can be used to migrate from
> 9.6 to
> > 11, but haven't found any documentation on doing that.
> >
> > Chuck Martin
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/3/18 3:47 PM, Charles Martin wrote:
> When I do a pg_dump using PG 9.6, I got this:
>
>> pg_dump: Dumping the contents of table "docfile" failed:
>> PQgetCopyData() failed.
>>
>> pg_dump: Error message from server: server closed the connection
>> unexpectedly

Is this error the client reporting? Is this the same that is showing up
in the server log?

>> This probably means the server terminated abnormally
>> before or while processing the request.

So where is the server located relative to the pg_dump client? On the
same machine? If so is it a virtual machine e.g. AWS? Across a local or
remote network?

>> pg_dump: The command was: COPY public.docfile (docfile_pkey,
>> docfileoriginalname, ordernumber, versionnum, docfilecontents,
>> docfilepath, docfileextension, enddatetime, endby, editnum, insby,
>> insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

> I've looked and been unable to find where Centos 7, or Postgres 9.6,
> stores the path to the config/data directory outside the
> data/postgresql.conf file. But I agree there must be something somewhere.
>
> Chuck

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin wrote:
> When I do a pg_dump using PG 9.6, I got this:
>
>> pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() failed.
>> pg_dump: Error message from server: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_dump: The command was: COPY public.docfile (docfile_pkey, docfileoriginalname,
>> ordernumber, versionnum, docfilecontents, docfilepath, docfileextension, enddatetime,
>> endby, editnum, insby, insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

You probably have a corrupted database. You should get that fixed first,
then you can upgrade. Maybe you should hire a professional for that.

The other alternative is that you have a buggy extension installed that
causes PostgreSQL to crash.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
On 03.11.2018 at 23:47, Charles Martin wrote:
> When I do a pg_dump using PG 9.6, I got this:
>
>> pg_dump: Dumping the contents of table "docfile" failed:
>> PQgetCopyData() failed.
>>
>> pg_dump: Error message from server: server closed the connection
>> unexpectedly
>>
>> This probably means the server terminated abnormally
>> before or while processing the request.
>>
>> pg_dump: The command was: COPY public.docfile (docfile_pkey,
>> docfileoriginalname, ordernumber, versionnum, docfilecontents,
>> docfilepath, docfileextension, enddatetime, endby, editnum, insby,
>> insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

Which exact minor version, please?

Regards, Andreas
-- 
2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
>> pg_dump: Error message from server: server closed the connection
>> unexpectedly
>Is this error the client reporting?
2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG: server process (PID 30438) was terminated by signal 9: Killed
2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL: Failed process was running: COPY public.docfile (docfile_pkey, docfileoriginalname, ordernumber, versionnum, docfilecontents, docfilepath, d$
2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG: terminating any other active server processes
2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING: terminating connection because of crash of another server process
2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnor$
>So where is the server located relative to the pg_dump client?
>On the same machine?
>If so is it a virtual machine e.g AWS?
>Across a local or remote network?
I gave the command in a terminal session after SSHing to the server from the same network. It is not a virtual machine.
Laurenz said:
>You probably have a corrupted database.
>You should get that fixed first, then you can upgrade.
>Maybe you should hire a professional for that.
I suspect this is correct, both that there is corruption in the table and that I need a professional to help. If someone here is available, I'm interested.
Andreas said:
>which exact minor version please?
PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
On 04.11.2018 at 17:38, Charles Martin wrote:
> Andreas said:
>
> >which exact minor version please?
>
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-23), 64-bit

Okay, I asked just to rule out a stale version. 9.6.7, for instance,
contains some important bug fixes.

Regards, Andreas
-- 
2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
On 11/4/18 8:38 AM, Charles Martin wrote:
>
> Adrian said:
>>> pg_dump: Error message from server: server closed the connection
>>> unexpectedly
>
> >Is this error the client reporting?
> >Is this the same that is showing up in the server log?
>
> Yes, that's the client message, i.e. what appeared in the terminal
> window that gave the command. The server log shows:
>
> 2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG: server process
> (PID 30438) was terminated by signal 9: Killed
>
> 2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL: Failed process
> was running: COPY public.docfile (docfile_pkey, docfileoriginalname,
> ordernumber, versionnum, docfilecontents, docfilepath, d$
>
> 2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG: terminating any
> other active server processes
>
> 2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING: terminating
> connection because of crash of another server process
>
> 2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL: The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process exited abnor$
>
>
>
>>So where is the server located relative to the pg_dump client?
>>On the same machine?
>>If so is it a virtual machine e.g AWS?
>>Across a local or remote network?
>
>
> I gave the command in a terminal session after SSHing to the server
> from the same network. It is not a virtual machine.
>
>
> Lsaurenz said:
>
>
>>You probably have a corrupted database.
>>You should get that fixed first, then you can upgrade.
>>Maybe you should hire a professional for that.
>
>
> I suspect this is is correct, both that there is corruption in the table
> and that I need a professional to help. If someone here is available,
> I'm interested.
Given that this involves your largest table I would confirm that the
signal 9 kill was not coming from the system OOM killer. Take a look at
the system logs to see what they show over the same time period.
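That check can be sketched as a small shell snippet. This is a hedged example, not the poster's exact commands: the log path is an assumption (on CentOS the kernel's OOM messages usually land in /var/log/messages), and the pattern just matches the phrases seen in the log excerpt above.

```shell
# Sketch: list recent OOM-killer entries from the syslog.
# /var/log/messages is the assumed CentOS location; adjust as needed.
find_oom() {
    grep -Ei 'out of memory|oom-killer|killed process' "$1" | tail -n 20
}

if [ -r /var/log/messages ]; then
    find_oom /var/log/messages
fi
```

A hit for the same PID and timestamp as the Postgres crash confirms the OOM killer as the cause.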
>
>
> Andreas said:
>
>
> >which exact minor version please?
>
>
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-23), 64-bit
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Yep, you called it:

Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438 (postmaster) score 709 or sacrifice child
Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26, (postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB

So it's running out of memory when trying to dump this table. The "old" server has 4GB of RAM, the "new" server 20GB.
Angular momentum makes the world go 'round.
Charles Martin <ssappeals@gmail.com> writes:
> Yep, you called it:
> Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438
> (postmaster) score 709 or sacrifice child
> Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26,
> (postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB
> So it's running out of memory when trying to dump this table. The "old"
> server has 4GB of ram, the "new" server 20GB.

That's kind of odd: a COPY shouldn't really consume very much working
memory.  I suspect that much of the process's apparent VM consumption may
be shared buffers ... what have you got shared_buffers set to on the old
server?  If it's more than half a GB or so, maybe reducing it would help.

			regards, tom lane
On 11/4/18 2:55 PM, Charles Martin wrote:
> Yep, you called it:
>
> Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438
> (postmaster) score 709 or sacrifice child
> Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26,
> (postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB
>
> So it's running out of memory when trying to dump this table. The "old"
> server has 4GB of ram, the "new" server 20GB.

In addition to the other suggestions, what is the exact pg_dump command
you are using?

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Tom said:

>That's kind of odd: a COPY shouldn't really consume very much working
>memory. I suspect that much of the process's apparent VM consumption may
>be shared buffers ... what have you got shared_buffers set to on the old
>server? If it's more than half a GB or so, maybe reducing it would help.
This is not a VM, but hardware.
Adrian said:

>In addition to the other suggestions, what is the exact pg_dump command
>you are using?

The last time:
[postgres@mandj tmp]$ pg_dump martinandjones > /mnt/4tbB/pgbackup/2018-11-02/mandj.bak
On 11/5/18 5:56 AM, Charles Martin wrote:
> Tom said:
>
> >That's kind of odd: a COPY shouldn't really consume very much working
> >memory. I suspect that much of the process's apparent VM consumption may
> >be shared buffers ... what have you got shared_buffers set to on the old
> >server? If it's more than half a GB or so, maybe reducing it would help.
>
> This is not a VM, but hardware.

Tom was referring to this from your previous post:

(postmaster) total-vm:3068900kB,

where vm(VM) is Virtual Memory:

https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766

So what is your shared_buffers:

https://www.postgresql.org/docs/10/static/runtime-config-resource.html

set to?

> Adrian said:
>
> >In addition to the other suggestions, what is the exact pg_dump command
> >you are using?
>
> The last time:
>
> [postgres@mandj tmp]$ pg_dump martinandjones > /mnt/4tbB/pgbackup/2018-11-02/mandj.bak

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin wrote:
> >So where is the server located relative to the pg_dump client?
> >On the same machine?
> >If so is it a virtual machine e.g AWS?
> >Across a local or remote network?
>
> I gave the command in a terminal session after SSHing to the server from
> the same network. It is not a virtual machine.

That means that pg_dump runs on the same machine as the server.
It's plausible that, with only 4GB of RAM, the table that fails
to dump has some very large rows that can't be allocated, especially
since both the backend and pg_dump need to have it simultaneously
in memory.

> pg_dump: The command was: COPY public.docfile (docfile_pkey,
> docfileoriginalname, ordernumber, versionnum, docfilecontents,
> docfilepath, docfileextension, enddatetime, endby, editnum, insby,
> insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;

The "docfilecontents" column suggests that it might contain
large contents. If its type is bytea, it's going to be expanded
to twice its size to build the hex representation.

You may get a sense of how big the biggest row is expressed
as text with this query:

SELECT max(length(docfile.*::text)) FROM public.docfile;

If it's big enough that it might cause the OOM issue,
try to run pg_dump remotely through an SSH tunnel [1], which you
can already do in terms of network permissions since you log in with
SSH, so pg_dump itself does not use any memory on the server.

Also, if the machine doesn't have swap space, it might be
that just adding a few GB's of swap would make the operation
succeed.

[1] https://www.postgresql.org/docs/current/static/ssh-tunnels.html

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
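The hex-expansion point can be sanity-checked without touching the database. This sketch uses od(1) as a stand-in for the server's bytea hex encoder; the byte count is an arbitrary example:

```shell
# Sketch: a bytea value in pg_dump's hex format needs two characters per
# byte (plus a leading \x), so a large blob roughly doubles in transit.
raw_bytes=1024
hex=$(head -c "$raw_bytes" /dev/zero | od -An -v -tx1 | tr -d '[:space:]')
echo "raw bytes: $raw_bytes, hex chars: ${#hex}"
```

So a row holding a 1GB document would need roughly 2GB as COPY text, on top of the server's own copy of the row, which fits the OOM picture on a 4GB machine.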
>(postmaster) total-vm:3068900kB,
>where vm(VM) is Virtual Memory:
>https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
>So what is your shared_buffers:
>https://www.postgresql.org/docs/10/static/runtime-config-resource.html
>set to?
On 11/5/18 7:04 AM, Charles Martin wrote:
> Adrian said:
>
> >Tom was referring to this from your previous post:
> >(postmaster) total-vm:3068900kB,
> >where vm(VM) is Virtual Memory:
> >https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
> >So what is your shared_buffers:
> >https://www.postgresql.org/docs/10/static/runtime-config-resource.html
> >set to?
>
> Ok, thanks for explaining this. Here is the current value:
> "shared_buffers" "131072" "8kB"

It should be a single value, something like this for the default:

shared_buffers = 128MB

-- 
Adrian Klaver
adrian.klaver@aklaver.com
Charles Martin <ssappeals@gmail.com> writes:
> Ok, thanks for explaining this. Here is the current value:
> "shared_buffers" "131072" "8kB"

Well, that's 1GB, which might be ambitious inside a VM with a hard
restriction to 4GB total RAM.  Postgres can get by with a *lot* less.
Try knocking it down to a tenth of that and see if it makes a difference.

			regards, tom lane
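Tom's conversion, spelled out: pg_settings reported shared_buffers as a count of units with a unit size of 8kB, so the total is the product of the two.

```shell
# Sketch: 131072 units x 8kB per unit = 1048576 kB = 1024 MB = 1GB.
units=131072
unit_kb=8
total_kb=$((units * unit_kb))
echo "shared_buffers = ${total_kb} kB ($((total_kb / 1024)) MB)"
```

A tenth of that, per Tom's suggestion, is roughly 100MB, close to the stock 128MB default.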
to dump has some very large rows that can't be allocated, especially
since both the backend and pg_dump need to have it simultaneously
in memory.
> >pg_dump: The command was: COPY public.docfile (docfile_pkey,
> >docfileoriginalname, ordernumber, versionnum, docfilecontents,
>> docfilepath, docfileextension, enddatetime, endby, editnum, insby,
>> insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;
>The "docfilecontents" column suggests that it might contain
large contents. If its type is bytea, it's going to be expanded
to twice its size to build the hex representation.
>You may get a sense on how big is the biggest row expressed
as text with this query:
SELECT max(length(d::text)) FROM public.docfile d;
>If it's big enough that it might cause the OOM issue,
try to run pg_dump remotely through an SSH tunnel [1], which you
can already do in terms of network permissions since you log in with
SSH, so pg_dump itself does not use any memory on the server.
>Also, if the machine doesn't have swap space, it might be
that just adding a few GB's of swap would make the operation
succeed.
>> "shared_buffers" "131072" "8kB"
>It should be a single value something like this for the default:
>shared_buffers = 128MB
Tom said:
>> "shared_buffers" "131072" "8kB"
>Well, that's 1GB, which might be ambitious inside a VM with a hard
restriction to 4GB total RAM. Postgres can get by with a *lot* less.
>Try knocking it down to a tenth of that and see if it makes a difference.
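Daniel's SSH-tunnel suggestion might look like the following sketch, run from the new server (the host and user names are placeholders, and this assumes the new machine can reach the old one over SSH; pg_dump then runs on the new server's 20GB of RAM, while only the backend's COPY uses memory on the old one):

```shell
# Forward local port 5433 to PostgreSQL on the old server via SSH
# ("olduser" and "oldserver" are placeholders).
ssh -f -N -L 5433:localhost:5432 olduser@oldserver

# Run pg_dump locally on the new machine through the tunnel,
# in custom format so it can be restored selectively later.
pg_dump -h localhost -p 5433 -U postgres -Fc -f mydb.dump mydb
```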
On 11/5/18 7:04 AM, Charles Martin wrote:
> Adrian said:
>
> >Tom was referring to this from your previous post:
> >(postmaster) total-vm:3068900kB,
> >where vm(VM) is Virtual Memory:
> >https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean#22326766
> >So what is your shared_buffers:
> >https://www.postgresql.org/docs/10/static/runtime-config-resource.html
> >set to?
>
> Ok, thanks for explaining this. Here is the current value:
> "shared_buffers" "131072" "8kB"
>
It should be a single value something like this for the default:
shared_buffers = 128MB
--
Adrian Klaver
adrian.klaver@aklaver.com
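One way to apply that change without hand-editing postgresql.conf is ALTER SYSTEM (available since 9.4, so it works on the 9.6 server); note that shared_buffers still requires a restart to take effect:

```sql
-- Writes the setting to postgresql.auto.conf; a restart is still
-- required before the new shared_buffers value is used.
ALTER SYSTEM SET shared_buffers = '128MB';

-- After restarting, verify with:
SHOW shared_buffers;
```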
On 11/05/2018 10:50 AM, Charles Martin wrote:
[snip]
> The results I pasted were from:
> SELECT * FROM pg_settings
> Maybe I didn't get it the right way.

We were expecting the output of the "SHOW SHARED_BUFFERS;" command.

> The system has only 4GB of RAM. I read that a reasonable swap size is
> 1/4 of RAM, so I've created a swap file of 1GB.

Times have changed... (I'd have made a 4GB swap file.)

--
Angular momentum makes the world go 'round.
Ron said:

> We were expecting the output of the "SHOW SHARED_BUFFERS;" command.

Ok, the result from that command is: 1GB

> Times have changed... (I'd have made a 4GB swap file.)
> The system has only 4GB of RAM. I read that a reasonable swap size is
> 1/4 of RAM, so I've created a swap file of 1GB.

I have a spare drive that is 230G, so I have enough space. I suppose I can
set swapoff, delete the swapfile, create a new 4G one, and set swapon. Or
is there a better way?
Linux already knows about the 1GB file; just add a 3GB swap file on the 230GB drive.
Angular momentum makes the world go 'round.
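Ron's approach might look like the following sketch on CentOS 7 (run as root; the mount point /mnt/spare for the 230GB drive is a placeholder):

```shell
# Create and enable a 3GB swap file alongside the existing 1GB one.
dd if=/dev/zero of=/mnt/spare/swapfile2 bs=1M count=3072
chmod 600 /mnt/spare/swapfile2
mkswap /mnt/spare/swapfile2
swapon /mnt/spare/swapfile2

# Persist across reboots:
echo '/mnt/spare/swapfile2 none swap sw 0 0' >> /etc/fstab

# Check the result:
swapon --show
```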
Charles Martin wrote:
> SELECT max(length(docfilecontents::text)) FROM docfile;
> and after a very long time, got:
> ERROR: invalid memory alloc request size 1636085512 SQL state: XX000
It would mean that at least one row has a "docfilecontents"
close to 0.5GB in size. Or that the size fields in certain rows
are corrupted, although that's less plausible if you have
no reason to suspect hardware errors.
Does the following query work:
SELECT max(octet_length(docfilecontents)) FROM docfile;
or maybe a histogram by size in hundred of megabytes:
SELECT octet_length(docfilecontents)/(1024*1024*100),
count(*)
FROM docfile
GROUP BY octet_length(docfilecontents)/(1024*1024*100);
Note that the error message above does not say that there's not enough
free memory, it says that it won't even try to allocate that much, because
1636085512 is over the "varlena limit" of 1GB.
AFAICS I'm afraid that this table as it is now cannot be exported
by pg_dump, even if you had enough free memory, because any individual
row in COPY cannot exceed 1GB in text format.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
On 11/5/18 8:50 AM, Charles Martin wrote:
> Daniel said:
>
> This appears to be the case. I ran:
>
> SELECT max(length(docfilecontents::text)) FROM docfile;
> and after a very long time, got:
> ERROR: invalid memory alloc request size 1636085512 SQL state: XX000
[snip]
> The system has only 4GB of RAM. I read that a reasonable swap size is
> 1/4 of RAM, so I've created a swap file of 1GB.

I would change the shared_buffers setting to 128MB and retry.

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian, I'll try changing shared_buffers the next time I can restart postgres, at least if deleting the largest records and adding VM hasn't worked.
Charles Martin wrote:
> but the second one returned this:
>
> 0 "623140"
> 1 "53"
> 2 "12"
> 3 "10"
> 4 "1"
> 5 "1"
> 7 "1"
> [null] "162"
>
> Not quite sure what that means, but if there is just a small number of
> overly-large records, I might be able to delete them. If I can find them.
The query was:
SELECT octet_length(docfilecontents)/(1024*1024*100),
count(*)
FROM docfile
GROUP BY octet_length(docfilecontents)/(1024*1024*100);
The results above show that there is one document weighing over 700 MB
(the first column being the multiple of 100MB), one between 500 and
600 MB, one between 400 MB and 500 MB, 10 between 300 and 400 MB, and
so on.
The hex expansion performed by COPY must allocate twice that size,
plus the rest of the row, and if that resulting size is above 1GB, it
will error out with the message you mentioned upthread:
ERROR: invalid memory alloc request size <some value over 1 billion>.
So there's no way it can deal with the contents over 500MB, and the
ones just under that limit may also be problematic.
A quick and dirty way of getting rid of these contents would be to
nullify them. For instance, nullify anything over 400MB:
UPDATE docfile SET docfilecontents=NULL
WHERE octet_length(docfilecontents) > 1024*1024*400;
Or a cleaner solution would be to delete them with the application if
that's possible. You may turn the above query into a SELECT that
retrieve the fields of interest (avoid SELECT * because of the huge
column).
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
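Daniel's "turn it into a SELECT" suggestion might look like this sketch (the column names are taken from the COPY column list earlier in the thread; pg_size_pretty just makes the sizes readable):

```sql
-- List the offending rows without fetching the huge column itself.
SELECT docfile_pkey,
       docfileoriginalname,
       pg_size_pretty(octet_length(docfilecontents)::bigint) AS content_size
FROM docfile
WHERE octet_length(docfilecontents) > 1024*1024*400
ORDER BY octet_length(docfilecontents) DESC;
```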
On 11/6/18 3:47 AM, Daniel Verite wrote:
[snip]
> The hex expansion performed by COPY must allocate twice that size,
> plus the rest of the row, and if that resulting size is above 1GB, it
> will error out with the message you mentioned upthread:
> ERROR: invalid memory alloc request size <some value over 1 billion>.
> So there's no way it can deal with the contents over 500MB, and the
> ones just under that limit may also be problematic.

To me that looks like a bug, putting data into a record you cannot get out.

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver wrote:
>> So there's no way it can deal with the contents over 500MB, and the
>> ones just under that limit may also be problematic.
>
> To me that looks like a bug, putting data into a record you cannot get out.

Strictly speaking, it could probably get out with COPY in binary format,
but pg_dump doesn't use that.

It's undoubtedly very annoying that a database can end up with
non-pg_dump'able contents, but it's not an easy problem to solve.
Some time ago, work was done to extend the 1GB limit, but eventually it
got scratched. The thread in [1] discusses many details of the problem
and why the proposed solutions were mostly a band-aid. Basically, the
specs of COPY and other internal aspects of Postgres are from the 32-bit
era, when putting the size of an entire CDROM in a single row/column was
not anticipated as a valid use case.
It's still a narrow use case today, and applications that need to store
big pieces of data like that should slice them in chunks, a bit like in
pg_largeobject, except in much larger chunks, like 1MB.

[1] pg_dump / copy bugs with "big lines" ?
https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
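A chunked layout of the kind Daniel describes might look like this sketch (the table and column names are hypothetical; string_agg with an ORDER BY reassembles the pieces):

```sql
-- Store each document as ~1MB bytea pieces instead of one huge value,
-- so no single row ever approaches the 1GB varlena limit.
CREATE TABLE docfile_chunk (
    doc_fkey  integer NOT NULL,
    chunk_no  integer NOT NULL,
    chunk     bytea   NOT NULL,   -- keep each piece around 1MB
    PRIMARY KEY (doc_fkey, chunk_no)
);

-- Reassemble one document (42 is a placeholder key):
SELECT string_agg(chunk, ''::bytea ORDER BY chunk_no) AS docfilecontents
FROM docfile_chunk
WHERE doc_fkey = 42;
```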
On 11/6/18 8:27 AM, Daniel Verite wrote:
[snip]
> It's still a narrow use case today and applications that need to store
> big pieces of data like that should slice them in chunks, a bit like in
> pg_largeobject, except in much larger chunks, like 1MB.

Should there not be some indication of this in the docs here?:
https://www.postgresql.org/docs/11/datatype-binary.html

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 11/6/18 8:27 AM, Daniel Verite wrote:
>> Adrian Klaver wrote:
>>> To me that looks like a bug, putting data into a record you cannot get out.
>> Strictly speaking, it could probably get out with COPY in binary format,
>> but pg_dump doesn't use that.

Another possibility, seeing that the problematic data is bytea, is that
it might depend on whether you use hex or escape bytea_output format.
Hex format is reliably twice the size of the stored data, but escape
format could be anywhere from the same size as the stored data to four
times the size, depending on the contents. pg_dump is agnostic about this
and will just dump using the prevailing bytea_output setting, so you might
be able to get it to work by changing that setting.

regards, tom lane
Chuck
As someone pointed out, there is a limit with bytea (blobs). To test
whether it is the bytea column, use a COPY with a select statement:

COPY ( SELECT A, B, C, D ...etc FROM table ) TO 'outfile' ;

leaving out the bytea column. If this works, then one of the bytea
columns is way too big.

Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor

-------- Original Message --------
Subject: Re: Trouble Upgrading Postgres
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, November 06, 2018 11:53 am
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: Daniel Verite <daniel@manitou-mail.org>, Charles Martin
<ssappeals@gmail.com>, pgsql-general
<pgsql-general@postgresql.org>
[snip]
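Tom's idea could be tried without changing any server-wide setting: libpq's PGOPTIONS environment variable applies a per-session GUC to the backend that pg_dump connects to. Whether it actually helps depends on the data, since escape format can be up to four times larger than the raw bytes:

```shell
# Dump with escape-format bytea instead of the default hex
# (database and file names are placeholders).
PGOPTIONS='-c bytea_output=escape' pg_dump -Fc -f mydb.dump mydb
```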