Discussion: pg_dump crashes


pg_dump crashes

From: Nico De Ranter
Date:
Hi all,

Postgres version: 9.5
OS: Ubuntu 18.04.4

I have a 144GB Bacula database that crashes the postgres daemon when I try to do a pg_dump.
At some point the server ran out of disk space for the database storage. I expanded the LVM volume and rebooted the server. It seemed to work fine; however, when I try to dump the bacula database, the postgres daemon dies after about 37GB.

I tried copying the database to another machine and upgrading postgres to 11 using pg_upgrade. The upgrade seems to work, but I still get exactly the same problem when trying to dump the database.

postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
pg_dump: Dumping the contents of table "file" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
pg_dumpall: pg_dump failed on database "bacula", exiting

In the logs I see:

2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was terminated by signal 11: Segmentation fault
2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active server processes
2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection because of crash of another server process
2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes terminated; reinitializing
2020-05-22 14:23:30.671 CEST [578] LOG:  database system was interrupted; last known up at 2020-05-22 14:15:19 CEST
2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not properly shut down; automatic recovery in progress
2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at 197/D605EA50: wanted 24, got 0
2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to accept connections
2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request


Any ideas how to fix or debug this?

Nico

--

Nico De Ranter

Operations Engineer

T. +32 16 38 72 10



eSATURNUS
Philipssite 5, D, box 28
3001 Leuven – Belgium

T. +32 16 40 12 82
F. +32 16 40 84 77
www.esaturnus.com

For Service & Support :

Support Line Belgium: +32 2 2009897

Support Line International: +44 12 56 68 38 78

Or via email : medical.services.eu@sony.com


Re: pg_dump crashes

From: Adrian Klaver
Date:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
> 
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
> 
> I have a 144GB Bacula database that crashes the postgres daemon when I 
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage.  
> I expanded the lvm and rebooted the server. It seemed to work fine, 
> however when I try to dump the bacula database the postgres daemon dies 
> after about 37GB.
> 
> I tried copying the database to another machine and upgrading postgres 
> to 11 using pg_upgrade.  The upgrade seems to work but I still get 
> exactly the same problem when trying to dump the database.
> 
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData() 
> failed.
> pg_dump: Error message from server: server closed the connection 
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid, 
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting

What happens if you try to dump just this table?

Something along the lines of:

pg_dump -t file -d some_db -U some_user

Have you looked at the system logs to see if it is the OS killing the 
process?
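
For example, something along these lines (just a sketch; it assumes a systemd-based system such as Ubuntu 18.04):

journalctl -k --since "1 hour ago" | grep -i -E 'out of memory|oom-kill|killed process|segfault'
# or, without systemd:
dmesg -T | grep -i -E 'out of memory|oom-kill|killed process|segfault'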


> 
> In the logs I see:
> 
> 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was 
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was 
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, 
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active 
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection 
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded 
> this server process to roll back the current transaction and exit, 
> because another server process exited abnormally and possibly corrupted 
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able 
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes 
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was 
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not 
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at 
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to 
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request
> 
> 
> Any ideas how to fix or debug this?
> 
> Nico
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From: Adrian Klaver
Date:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
> 
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
> 
> I have a 144GB Bacula database that crashes the postgres daemon when I 
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage.  
> I expanded the lvm and rebooted the server. It seemed to work fine, 
> however when I try to dump the bacula database the postgres daemon dies 
> after about 37GB.
> 
> I tried copying the database to another machine and upgrading postgres 
> to 11 using pg_upgrade.  The upgrade seems to work but I still get 
> exactly the same problem when trying to dump the database.

What was the full command you used to do the pg_upgrade?

> 
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData() 
> failed.
> pg_dump: Error message from server: server closed the connection 
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid, 
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting
> 
> In the logs I see:
> 
> 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was 
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was 
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, 
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active 
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection 
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded 
> this server process to roll back the current transaction and exit, 
> because another server process exited abnormally and possibly corrupted 
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able 
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes 
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was 
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not 
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at 
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to 
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request
> 
> 
> Any ideas how to fix or debug this?
> 
> Nico
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From: Nico De Ranter
Date:
I was just trying that. It's always the same (huge) table that crashes pg_dump: running a dump that excludes that one table works fine, while running a dump of only that one table crashes.
In the system logs I always see a segfault:

May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in postgres[55f1977c0000+727000]

It doesn't seem to be an out-of-memory problem (at least not at the OS level).
The database is currently installed on a dedicated server with 32GB RAM. I tried tweaking some of the memory parameters for postgres, but the crash always happens at exactly the same spot (if I run pg_dump for that one table with and without the memory tweaks, the resulting files are identical).

One thing I just noticed while looking at the dump file: near the end of the file I see this:

2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????

2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR

It looks suspicious; however, there are about 837 more lines before the output stops.

Nico

On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
>
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
>
> I have a 144GB Bacula database that crashes the postgres daemon when I
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage. 
> I expanded the lvm and rebooted the server. It seemed to work fine,
> however when I try to dump the bacula database the postgres daemon dies
> after about 37GB.
>
> I tried copying the database to another machine and upgrading postgres
> to 11 using pg_upgrade.  The upgrade seems to work but I still get
> exactly the same problem when trying to dump the database.
>
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData()
> failed.
> pg_dump: Error message from server: server closed the connection
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid,
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting

What happens if you try to dump just this table?

Something along lines of:

pg_dump -t file -d some_db -U some_user

Have you looked at the system logs to see if it is the OS killing the
process?


>
> In the logs I see:
>
> 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request
>
>
> Any ideas how to fix or debug this?
>
> Nico
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--

Nico De Ranter



Re: pg_dump crashes

From: Nico De Ranter
Date:
/usr/lib/postgresql/11/bin/pg_upgrade \
    --old-datadir /data/postgresql/9.5/main/ \
    --new-datadir /var/lib/postgresql/11/main/ \
    -b /usr/lib/postgresql/9.5/bin \
    -B /usr/lib/postgresql/11/bin \
    -o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' \
    -O ' -c config_file=/etc/postgresql/11/main/postgresql.conf'

On Fri, May 22, 2020 at 3:35 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
>
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
>
> I have a 144GB Bacula database that crashes the postgres daemon when I
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage. 
> I expanded the lvm and rebooted the server. It seemed to work fine,
> however when I try to dump the bacula database the postgres daemon dies
> after about 37GB.
>
> I tried copying the database to another machine and upgrading postgres
> to 11 using pg_upgrade.  The upgrade seems to work but I still get
> exactly the same problem when trying to dump the database.

What was the full command you used to do the pg_upgrade?

>
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData()
> failed.
> pg_dump: Error message from server: server closed the connection
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid,
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting
>
> In the logs I see:
>
> 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request
>
>
> Any ideas how to fix or debug this?
>
> Nico
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--

Nico De Ranter



Re: pg_dump crashes

From: Andreas Kretschmer
Date:

On 22.05.20 at 14:37, Nico De Ranter wrote:
> Postgres version: 9.5

Which minor version?

Can you check whether the table has a TOAST table? Can you try selecting all
columns except the TOASTed ones?
Maybe the data corruption is only in the TOAST table.
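
For example, something along these lines (an untested sketch; the column names are taken from the COPY statement in your error message, and lstat/md5 are presumably the only columns that could be TOASTed) would show whether public.file has a TOAST table at all, and would read the table while skipping those two columns:

SELECT c.oid::regclass           AS table_name,
       c.reltoastrelid::regclass AS toast_table,
       CASE WHEN c.reltoastrelid <> 0
            THEN pg_size_pretty(pg_relation_size(c.reltoastrelid))
       END                       AS toast_size
FROM pg_class c
WHERE c.oid = 'public.file'::regclass;

COPY (SELECT fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid
      FROM public.file) TO STDOUT;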


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: pg_dump crashes

From: Ron
Date:
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;

What happens when you run that COPY ... TO stdout; command (but redirecting it to /dev/null)?
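
For example, something like this (just a sketch; it assumes the Debian/Ubuntu psql cluster wrapper, matching the --cluster option already used with pg_dumpall):

psql --cluster 11/main -d bacula \
     -c 'COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;' \
     > /dev/null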

On 5/22/20 8:40 AM, Nico De Ranter wrote:
I was just trying that.  It's always the same (huge) table that crashes the pg_dump.   Running a dump excluding that one table goes fine, running a dump of only that one table crashes.
In the system logs I always see a segfault

May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in postgres[55f1977c0000+727000]

It doesn't seem to be an Out-of-memory thing (at least not on the OS level).
The database is currently installed on a dedicated server with 32GB RAM.   I tried tweaking some of the memory parameters for postgres, but the crash always happens at the exact same spot (if I run pg_dump for that one table with and without memory tweaks the resulting files are identical).

One thing I just noticed looking at the dump file: at around the end of the file I see this:

2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????

2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR

It looks suspicious however there are about 837 more lines before the output stops.

Nico

On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
>
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
>
> I have a 144GB Bacula database that crashes the postgres daemon when I
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage. 
> I expanded the lvm and rebooted the server. It seemed to work fine,
> however when I try to dump the bacula database the postgres daemon dies
> after about 37GB.
>
> I tried copying the database to another machine and upgrading postgres
> to 11 using pg_upgrade.  The upgrade seems to work but I still get
> exactly the same problem when trying to dump the database.
>
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData()
> failed.
> pg_dump: Error message from server: server closed the connection
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid,
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting

What happens if you try to dump just this table?

Something along lines of:

pg_dump -t file -d some_db -U some_user

Have you looked at the system logs to see if it is the OS killing the
process?


>
> In the logs I see:
>
> 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID 534) was
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other active
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should be able
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is ready to
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown request
>
>
> Any ideas how to fix or debug this?
>
> Nico
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--

Nico De Ranter




--
Angular momentum makes the world go 'round.

Re: pg_dump crashes

From: Adrian Klaver
Date:
On 5/22/20 6:40 AM, Nico De Ranter wrote:
> I was just trying that.  It's always the same (huge) table that crashes 
> the pg_dump.   Running a dump excluding that one table goes fine, 
> running a dump of only that one table crashes.
> In the system logs I always see a segfault
> 
> May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault 
> at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in 
> postgres[55f1977c0000+727000]
> 
> It doesn't seem to be an Out-of-memory thing (at least not on the OS level).
> The database is currently installed on a dedicated server with 32GB 
> RAM.   I tried tweaking some of the memory parameters for postgres, but 
> the crash always happens at the exact same spot (if I run pg_dump for 
> that one table with and without memory tweaks the resulting files are 
> identical).
> 
> One thing I just noticed looking at the dump file: at around the end of 
> the file I see this:

So the output below is from this?:

pg_dumpall --cluster 11/main --file=dump.sql

> 
> 2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC 
> BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
> *4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 
> \N \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????*
> 2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A 
> Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR
> 
> It looks suspicious however there are about 837 more lines before the 
> output stops.
> 
> Nico
> 
> On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/22/20 5:37 AM, Nico De Ranter wrote:
>      > Hi all,
>      >
>      > Postgres version: 9.5
>      > OS: Ubuntu 18.04.4
>      >
>      > I have a 144GB Bacula database that crashes the postgres daemon
>     when I
>      > try to do a pg_dump.
>      > At some point the server ran out of diskspace for the database
>     storage.
>      > I expanded the lvm and rebooted the server. It seemed to work fine,
>      > however when I try to dump the bacula database the postgres
>     daemon dies
>      > after about 37GB.
>      >
>      > I tried copying the database to another machine and upgrading
>     postgres
>      > to 11 using pg_upgrade.  The upgrade seems to work but I still get
>      > exactly the same problem when trying to dump the database.
>      >
>      > postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
>      > pg_dump: Dumping the contents of table "file" failed:
>     PQgetCopyData()
>      > failed.
>      > pg_dump: Error message from server: server closed the connection
>      > unexpectedly
>      > This probably means the server terminated abnormally
>      > before or while processing the request.
>      > pg_dump: The command was: COPY public.file (fileid, fileindex,
>     jobid,
>      > pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
>      > pg_dumpall: pg_dump failed on database "bacula", exiting
> 
>     What happens if you try to dump just this table?
> 
>     Something along lines of:
> 
>     pg_dump -t file -d some_db -U some_user
> 
>     Have you looked at the system logs to see if it is the OS killing the
>     process?
> 
> 
>      >
>      > In the logs I see:
>      >
>      > 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID
>     534) was
>      > terminated by signal 11: Segmentation fault
>      > 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was
>      > running: COPY public.file (fileid, fileindex, jobid, pathid,
>     filenameid,
>      > deltaseq, markid, lstat, md5) TO stdout;
>      > 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other
>     active
>      > server processes
>      > 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection
>      > because of crash of another server process
>      > 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has
>     commanded
>      > this server process to roll back the current transaction and exit,
>      > because another server process exited abnormally and possibly
>     corrupted
>      > shared memory.
>      > 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should
>     be able
>      > to reconnect to the database and repeat your command.
>      > 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes
>      > terminated; reinitializing
>      > 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was
>      > interrupted; last known up at 2020-05-22 14:15:19 CEST
>      > 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not
>      > properly shut down; automatic recovery in progress
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at
>      > 197/D605EA50: wanted 24, got 0
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
>      > 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is
>     ready to
>      > accept connections
>      > 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown
>     request
>      >
>      >
>      > Any ideas how to fix or debug this?
>      >
>      > Nico
>      >
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From: Nico De Ranter
Date:
The original server was running 9.5.14.
The system I am currently testing on is 11.8.

Two fields are marked as 'extended'. However, if I understand correctly, the table isn't actually TOASTed:


  oid  |    table_schema    |       table_name        | total_bytes  |   total    |   index    |   toast    |   table    
-------+--------------------+-------------------------+--------------+------------+------------+------------+------------
 19601 | public             | file                    | 147648061440 | 138 GB     | 57 GB      | 8192 bytes | 80 GB
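
For reference, the 'extended' marking refers to the per-column storage mode; a query along these lines (just a sketch) lists it, with 'x' meaning extended and 'p' meaning plain:

SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'public.file'::regclass
  AND attnum > 0
  AND NOT attisdropped;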


On Fri, May 22, 2020 at 3:58 PM Andreas Kretschmer <andreas@a-kretschmer.de> wrote:


On 22.05.20 at 14:37, Nico De Ranter wrote:
> Postgres version: 9.5

which minor-version?

Can you check if the table has TOAST-Tables? Can you try to select all
columns but not TOASTed columns?
Maybe there is data-corruption only in toast-tables.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com





--

Nico De Ranter



Re: pg_dump crashes

From: Nico De Ranter
Date:
Correct.

If I run 'pg_dumpall --cluster 11/main --file=dump.sql', the end of the file looks like:

###### cut here
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
\.

###### cut here

If I run 'pg_dump --table=public.file --cluster 11/main --file=dump-2.sql bacula', those lines are actually followed by about 850 or so lines that look OK. I'm assuming the difference is due to buffering.
However, the fact that I do see a number of regular lines following this may suggest it's just garbage in the table and not really what is causing the issue after all.

Nico



On Fri, May 22, 2020 at 4:47 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 6:40 AM, Nico De Ranter wrote:
> I was just trying that.  It's always the same (huge) table that crashes
> the pg_dump.   Running a dump excluding that one table goes fine,
> running a dump of only that one table crashes.
> In the system logs I always see a segfault
>
> May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault
> at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in
> postgres[55f1977c0000+727000]
>
> It doesn't seem to be an Out-of-memory thing (at least not on the OS level).
> The database is currently installed on a dedicated server with 32GB
> RAM.   I tried tweaking some of the memory parameters for postgres, but
> the crash always happens at the exact same spot (if I run pg_dump for
> that one table with and without memory tweaks the resulting files are
> identical).
>
> One thing I just noticed looking at the dump file: at around the end of
> the file I see this:

So the below is the output from?:

pg_dumpall --cluster 11/main --file=dump.sql

>
> 2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC
> BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
> *4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191
> \N \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????*
> 2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A
> Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR
>
> It looks suspicious however there are about 837 more lines before the
> output stops.
>
> Nico
>
> On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 5/22/20 5:37 AM, Nico De Ranter wrote:
>      > Hi all,
>      >
>      > Postgres version: 9.5
>      > OS: Ubuntu 18.04.4
>      >
>      > I have a 144GB Bacula database that crashes the postgres daemon
>     when I
>      > try to do a pg_dump.
>      > At some point the server ran out of diskspace for the database
>     storage.
>      > I expanded the lvm and rebooted the server. It seemed to work fine,
>      > however when I try to dump the bacula database the postgres
>     daemon dies
>      > after about 37GB.
>      >
>      > I tried copying the database to another machine and upgrading
>     postgres
>      > to 11 using pg_upgrade.  The upgrade seems to work but I still get
>      > exactly the same problem when trying to dump the database.
>      >
>      > postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
>      > pg_dump: Dumping the contents of table "file" failed:
>     PQgetCopyData()
>      > failed.
>      > pg_dump: Error message from server: server closed the connection
>      > unexpectedly
>      > This probably means the server terminated abnormally
>      > before or while processing the request.
>      > pg_dump: The command was: COPY public.file (fileid, fileindex,
>     jobid,
>      > pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
>      > pg_dumpall: pg_dump failed on database "bacula", exiting
>
>     What happens if you try to dump just this table?
>
>     Something along lines of:
>
>     pg_dump -t file -d some_db -U some_user
>
>     Have you looked at the system logs to see if it is the OS killing the
>     process?
>
>
>      >
>      > In the logs I see:
>      >
>      > 2020-05-22 14:23:30.649 CEST [12768] LOG:  server process (PID
>     534) was
>      > terminated by signal 11: Segmentation fault
>      > 2020-05-22 14:23:30.649 CEST [12768] DETAIL:  Failed process was
>      > running: COPY public.file (fileid, fileindex, jobid, pathid,
>     filenameid,
>      > deltaseq, markid, lstat, md5) TO stdout;
>      > 2020-05-22 14:23:30.651 CEST [12768] LOG:  terminating any other
>     active
>      > server processes
>      > 2020-05-22 14:23:30.651 CEST [482] WARNING:  terminating connection
>      > because of crash of another server process
>      > 2020-05-22 14:23:30.651 CEST [482] DETAIL:  The postmaster has
>     commanded
>      > this server process to roll back the current transaction and exit,
>      > because another server process exited abnormally and possibly
>     corrupted
>      > shared memory.
>      > 2020-05-22 14:23:30.651 CEST [482] HINT:  In a moment you should
>     be able
>      > to reconnect to the database and repeat your command.
>      > 2020-05-22 14:23:30.652 CEST [12768] LOG:  all server processes
>      > terminated; reinitializing
>      > 2020-05-22 14:23:30.671 CEST [578] LOG:  database system was
>      > interrupted; last known up at 2020-05-22 14:15:19 CEST
>      > 2020-05-22 14:23:30.809 CEST [578] LOG:  database system was not
>      > properly shut down; automatic recovery in progress
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  redo starts at 197/D605EA18
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  invalid record length at
>      > 197/D605EA50: wanted 24, got 0
>      > 2020-05-22 14:23:30.819 CEST [578] LOG:  redo done at 197/D605EA18
>      > 2020-05-22 14:23:30.876 CEST [12768] LOG:  database system is
>     ready to
>      > accept connections
>      > 2020-05-22 14:29:07.511 CEST [12768] LOG:  received fast shutdown
>     request
>      >
>      >
>      > Any ideas how to fix or debug this?
>      >
>      > Nico
>      >
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


--

Nico De Ranter



Re: pg_dump crashes

From: Adrian Klaver
Date:
On 5/22/20 7:55 AM, Nico De Ranter wrote:
> Correct.
> 
> If I run 'pg_dumpall --cluster 11/main --file=dump.sql'   the end of the 
> file looks like:
> 
> ###### cut here
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N 
> \N ??????????????????????????????
> \.
> 
> ###### cut here
> 
> If I run 'pg_dump --table=public.file --cluster 11/main 
> --file=dump-2.sql bacula'  those lines are actually followed by about 
> 850 or so lines that look ok.  I'm assuming the difference is due to 
> buffering.
> However the fact that I do see a number of regular lines following this 
> may suggest it's just garbage in the table but not really causing the 
> issue afterall.

Assuming the above matches:

COPY public.file (fileid, fileindex, jobid, pathid, filenameid, 
deltaseq, markid, lstat, md5)

the '????????????????????' would be for the md5 field. I'm going to say 
that is important.

> 
> Nico

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From: Nico De Ranter
Date:

On Fri, May 22, 2020 at 5:02 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 7:55 AM, Nico De Ranter wrote:
> Correct.
>
> If I run 'pg_dumpall --cluster 11/main --file=dump.sql'   the end of the
> file looks like:
>
> ###### cut here
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> 4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N
> \N ??????????????????????????????
> \.
>
> ###### cut here
>
> If I run 'pg_dump --table=public.file --cluster 11/main
> --file=dump-2.sql bacula'  those lines are actually followed by about
> 850 or so lines that look ok.  I'm assuming the difference is due to
> buffering.
> However the fact that I do see a number of regular lines following this
> may suggest it's just garbage in the table but not really causing the
> issue afterall.

Assuming the above matches:

COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
deltaseq, markid, lstat, md5)

the '????????????????????' would be for the md5 field. I'm going to say
that is important.

But that would be content of the database only. That should matter for the application, but not for a dump of the database, right?
 

>
> Nico
>
>
>

>


--
Adrian Klaver
adrian.klaver@aklaver.com






Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 7:48 AM, Nico De Ranter wrote:
> The original server was running 9.5.14
> The system I am currently testing on is  11.8
> 
> 2 fields are marked as 'extended'.   However if I understand correctly 
> the table isn't actually toasted:
> 
> 
>    oid  |    table_schema    |       table_name        | total_bytes  | 
>    total    |   index    |   toast    |   table
>
-------+--------------------+-------------------------+--------------+------------+------------+------------+------------
>   19601 | public             | file                    | 147648061440 | 
> 138 GB     | 57 GB      | 8192 bytes | 80 GB

What query are you using to get above?

> 
> 
> On Fri, May 22, 2020 at 3:58 PM Andreas Kretschmer 
> <andreas@a-kretschmer.de <mailto:andreas@a-kretschmer.de>> wrote:
> 
> 
> 
>     Am 22.05.20 um 14:37 schrieb Nico De Ranter:
>      > Postgres version: 9.5
> 
>     which minor-version?
> 
>     Can you check if the table has TOAST-Tables? Can you try to select all
>     columns but not TOASTed columns?
>     Maybe there is data-corruption only in toast-tables.
> 
> 
>     Regards, Andreas
> 
>     -- 
>     2ndQuadrant - The PostgreSQL Support Company.
>     www.2ndQuadrant.com <http://www.2ndQuadrant.com>
> 
> 
> 
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 8:05 AM, Nico De Ranter wrote:
> 
> On Fri, May 22, 2020 at 5:02 PM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/22/20 7:55 AM, Nico De Ranter wrote:
>      > Correct.
>      >
>      > If I run 'pg_dumpall --cluster 11/main --file=dump.sql'   the end
>     of the
>      > file looks like:
>      >
>      > ###### cut here
>      > 4557430888798830399 1061109567 1061109567 1061109567 1061109567
>     16191 \N
>      > \N ??????????????????????????????
>      > 4557430888798830399 1061109567 1061109567 1061109567 1061109567
>     16191 \N
>      > \N ??????????????????????????????
>      > \.
>      >
>      > ###### cut here
>      >
>      > If I run 'pg_dump --table=public.file --cluster 11/main
>      > --file=dump-2.sql bacula'  those lines are actually followed by
>     about
>      > 850 or so lines that look ok.  I'm assuming the difference is due to
>      > buffering.
>      > However the fact that I do see a number of regular lines
>     following this
>      > may suggest it's just garbage in the table but not really causing
>     the
>      > issue afterall.
> 
>     Assuming the above matches:
> 
>     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>     deltaseq, markid, lstat, md5)
> 
>     the '????????????????????' would be for the md5 field. I'm going to say
>     that is important.
> 
> 
> But that would be content of the database only. The should matter for 
> the application but not for a dump of the database, right?

I would think that depends on what '???????' is really representing.

In the database try:

SELECT md5 FROM public.file where fileid = 4557430888798830399;


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Nico De Ranter
Date:


On Fri, May 22, 2020 at 5:07 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 7:48 AM, Nico De Ranter wrote:
> The original server was running 9.5.14
> The system I am currently testing on is  11.8
>
> 2 fields are marked as 'extended'.   However if I understand correctly
> the table isn't actually toasted:
>
>
>    oid  |    table_schema    |       table_name        | total_bytes  |
>    total    |   index    |   toast    |   table
> -------+--------------------+-------------------------+--------------+------------+------------+------------+------------
>   19601 | public             | file                    | 147648061440 |
> 138 GB     | 57 GB      | 8192 bytes | 80 GB

What query are you using to get above?

SELECT oid, table_schema, table_name, total_bytes,
       pg_size_pretty(total_bytes) AS total,
       pg_size_pretty(index_bytes) AS INDEX,
       pg_size_pretty(toast_bytes) AS toast,
       pg_size_pretty(table_bytes) AS TABLE
FROM (
    SELECT *, total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS table_bytes
    FROM (
        SELECT c.oid, nspname AS table_schema, relname AS table_name,
               c.reltuples AS row_estimate,
               pg_total_relation_size(c.oid) AS total_bytes,
               pg_indexes_size(c.oid) AS index_bytes,
               pg_total_relation_size(reltoastrelid) AS toast_bytes
        FROM pg_class c
        LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE relkind = 'r'
    ) a
) a
ORDER BY total_bytes DESC;
 Stolen from stackoverflow :-)

Nico
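
A smaller cross-check for just this table, as a sketch (using only the pg_class columns the query above already relies on): look up the toast relation directly. A toast size of 8192 bytes, as shown above, would mean the toast table is essentially empty, which suggests whatever is corrupt lives in the main heap rather than in out-of-line TOAST storage.

SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE oid = 'public.file'::regclass;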



Re: pg_dump crashes

From:
Nico De Ranter
Date:


On Fri, May 22, 2020 at 5:09 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 8:05 AM, Nico De Ranter wrote:
>
> On Fri, May 22, 2020 at 5:02 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 5/22/20 7:55 AM, Nico De Ranter wrote:
>      > Correct.
>      >
>      > If I run 'pg_dumpall --cluster 11/main --file=dump.sql'   the end
>     of the
>      > file looks like:
>      >
>      > ###### cut here
>      > 4557430888798830399 1061109567 1061109567 1061109567 1061109567
>     16191 \N
>      > \N ??????????????????????????????
>      > 4557430888798830399 1061109567 1061109567 1061109567 1061109567
>     16191 \N
>      > \N ??????????????????????????????
>      > \.
>      >
>      > ###### cut here
>      >
>      > If I run 'pg_dump --table=public.file --cluster 11/main
>      > --file=dump-2.sql bacula'  those lines are actually followed by
>     about
>      > 850 or so lines that look ok.  I'm assuming the difference is due to
>      > buffering.
>      > However the fact that I do see a number of regular lines
>     following this
>      > may suggest it's just garbage in the table but not really causing
>     the
>      > issue afterall.
>
>     Assuming the above matches:
>
>     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>     deltaseq, markid, lstat, md5)
>
>     the '????????????????????' would be for the md5 field. I'm going to say
>     that is important.
>
>
> But that would be content of the database only. The should matter for
> the application but not for a dump of the database, right?

I would think that depends on what '???????' is really representing.

In the database try:

SELECT md5 FROM public.file where fileid = 4557430888798830399;



bacula=# SELECT md5 FROM public.file where fileid = 4557430888798830399;
 md5
-----
(0 rows)
 
So that fileid is bogus too (max(bigint) I assume)
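
(A quick check, for reference: the bigint maximum is 9223372036854775807, so 4557430888798830399 is a legal value rather than an overflow marker.)

SELECT 9223372036854775807 AS max_bigint,
       4557430888798830399 AS suspect_fileid;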



Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 8:05 AM, Nico De Ranter wrote:
> 

> 
>     Assuming the above matches:
> 
>     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>     deltaseq, markid, lstat, md5)
> 
>     the '????????????????????' would be for the md5 field. I'm going to say
>     that is important.
> 
> 
> But that would be content of the database only. The should matter for 
> the application but not for a dump of the database, right?

Also what does:

\d public.file

show?

In particular are there any triggers on the table?


> 
> -- 
> 
> Nico De Ranter
> 

> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Nico De Ranter
Date:


On Fri, May 22, 2020 at 5:14 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 8:05 AM, Nico De Ranter wrote:
>

>
>     Assuming the above matches:
>
>     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>     deltaseq, markid, lstat, md5)
>
>     the '????????????????????' would be for the md5 field. I'm going to say
>     that is important.
>
>
> But that would be content of the database only. The should matter for
> the application but not for a dump of the database, right?

Also what does:

\d public.file

show?

In particular are there any triggers on the table?

bacula=# \d public.file
                                 Table "public.file"
   Column   |   Type   | Collation | Nullable |               Default                
------------+----------+-----------+----------+--------------------------------------
 fileid     | bigint   |           | not null | nextval('file_fileid_seq'::regclass)
 fileindex  | integer  |           | not null | 0
 jobid      | integer  |           | not null |
 pathid     | integer  |           | not null |
 filenameid | integer  |           | not null |
 deltaseq   | smallint |           | not null | 0
 markid     | integer  |           | not null | 0
 lstat      | text     |           | not null |
 md5        | text     |           | not null |
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_jobid_idx" btree (jobid)
    "file_jpfid_idx" btree (jobid, pathid, filenameid)



Following up on the max(bigint), I tried 

   SELECT md5 FROM public.file where fileid >2087994666;

and got

   ERROR:  compressed data is corrupted

So it does look like those entries are killing it.  Now for the million-dollar question: how do I get them out?

Nico
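
One way to narrow this down, as a minimal sketch (not something tried in this thread): walk the suspect fileid range row by row, force the text columns to be read, and log every id that fails. This assumes the backend raises an ERROR for the bad rows; if it segfaults instead, as in the original report, the loop dies with the session.

DO $$
DECLARE
    r     record;
    dummy text;
BEGIN
    -- 2087994666 is the threshold where the error showed up above
    FOR r IN SELECT fileid FROM public.file
             WHERE fileid > 2087994666 ORDER BY fileid
    LOOP
        BEGIN
            -- force detoasting/decompression of both text columns
            SELECT md5 || lstat INTO dummy
            FROM public.file WHERE fileid = r.fileid;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'fileid % is unreadable: %', r.fileid, SQLERRM;
        END;
    END LOOP;
END $$;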



Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 8:13 AM, Nico De Ranter wrote:
> 
> 

> 
> 
> bacula=# SELECT md5 FROM public.file where fileid = 4557430888798830399;
>   md5
> -----
> (0 rows)
> So that fileid is bogus too (max(bigint) I assume)

No:

select 4557430888798830399::bigint;
         int8
---------------------
  4557430888798830399
(1 row)

It means it cannot find that fileid. I'm putting that down to file corruption.
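
To make the heap/index disagreement visible, a sketch (with the caveat that a sequential scan over a damaged page can itself crash the backend): disable index access and repeat the lookup. COPY, which is a sequential scan, did print this fileid, so a non-zero count here would confirm the mismatch with the indexed lookup above.

SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;

SELECT count(*) FROM public.file WHERE fileid = 4557430888798830399;

RESET enable_indexscan;
RESET enable_indexonlyscan;
RESET enable_bitmapscan;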


> 
> -- 
> 
> Nico De Ranter
> 
>
-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 8:17 AM, Nico De Ranter wrote:
> 
> 
> On Fri, May 22, 2020 at 5:14 PM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/22/20 8:05 AM, Nico De Ranter wrote:
>      >
> 
>      >
>      >     Assuming the above matches:
>      >
>      >     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>      >     deltaseq, markid, lstat, md5)
>      >
>      >     the '????????????????????' would be for the md5 field. I'm
>     going to say
>      >     that is important.
>      >
>      >
>      > But that would be content of the database only. The should matter
>     for
>      > the application but not for a dump of the database, right?
> 
>     Also what does:
> 
>     \d public.file
> 
>     show?
> 
>     In particular are there any triggers on the table?
> 
> 
> bacula=# \d public.file
>                                   Table "public.file"
>     Column   |   Type   | Collation | Nullable |               Default
> ------------+----------+-----------+----------+--------------------------------------
>   fileid     | bigint   |           | not null | 
> nextval('file_fileid_seq'::regclass)
>   fileindex  | integer  |           | not null | 0
>   jobid      | integer  |           | not null |
>   pathid     | integer  |           | not null |
>   filenameid | integer  |           | not null |
>   deltaseq   | smallint |           | not null | 0
>   markid     | integer  |           | not null | 0
>   lstat      | text     |           | not null |
>   md5        | text     |           | not null |
> Indexes:
>      "file_pkey" PRIMARY KEY, btree (fileid)
>      "file_jobid_idx" btree (jobid)
>      "file_jpfid_idx" btree (jobid, pathid, filenameid)
> 
> 
> 
> Following up on the max(bigint), I tried
> 
>     SELECT md5 FROM public.file where fileid >2087994666;
> 
> and got
> 
>     ERROR:  compressed data is corrupted
> 
> So it does look like those entries are killing it.  Now for the 
> million-dollar question: how do I get them out?

Do you have recent previous backup?

> 
> Nico
> 
> -- 
> 
> Nico De Ranter
> 
> Operations Engineer
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/22/20 8:17 AM, Nico De Ranter wrote:
> 
> 
> On Fri, May 22, 2020 at 5:14 PM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 5/22/20 8:05 AM, Nico De Ranter wrote:
>      >
> 
>      >
>      >     Assuming the above matches:
>      >
>      >     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>      >     deltaseq, markid, lstat, md5)
>      >
>      >     the '????????????????????' would be for the md5 field. I'm
>     going to say
>      >     that is important.
>      >
>      >
>      > But that would be content of the database only. The should matter
>     for
>      > the application but not for a dump of the database, right?
> 
>     Also what does:
> 
>     \d public.file
> 
>     show?
> 
>     In particular are there any triggers on the table?
> 
> 
> bacula=# \d public.file
>                                   Table "public.file"
>     Column   |   Type   | Collation | Nullable |               Default
> ------------+----------+-----------+----------+--------------------------------------
>   fileid     | bigint   |           | not null | 
> nextval('file_fileid_seq'::regclass)
>   fileindex  | integer  |           | not null | 0
>   jobid      | integer  |           | not null |
>   pathid     | integer  |           | not null |
>   filenameid | integer  |           | not null |
>   deltaseq   | smallint |           | not null | 0
>   markid     | integer  |           | not null | 0
>   lstat      | text     |           | not null |
>   md5        | text     |           | not null |
> Indexes:
>      "file_pkey" PRIMARY KEY, btree (fileid)
>      "file_jobid_idx" btree (jobid)
>      "file_jpfid_idx" btree (jobid, pathid, filenameid)
> 
> 
> 
> Following up on the max(bigint), I tried
> 
>     SELECT md5 FROM public.file where fileid >2087994666;
> 
> and got
> 
>     ERROR:  compressed data is corrupted
> 
> So it does look like those entries are killing it.  Now for the 
> million-dollar question: how do I get them out?

This is a Bacula table and I'm guessing you are using:

https://www.bacula.org/9.6.x-manuals/en/main/Configuring_Director.html#11585

signature=MD5
     An MD5 signature will be computed for all files saved. Adding this 
option generates about 5% extra overhead for each file saved. In 
addition to the additional CPU time, the MD5 signature adds 16 more 
bytes per file to your catalog. We strongly recommend that this option 
or the SHA1 option be specified as a default for all files.

Maybe you can ask the Bacula folks how that is actually calculated and
test that on a row, to see if you can overwrite the bad data.

> 
> Nico
> 
> -- 
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Nico De Ranter
Date:
Unfortunately not. I discovered the issue rather late. The last working backup is about 2 months old.

On Fri, May 22, 2020 at 5:36 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/22/20 8:17 AM, Nico De Ranter wrote:
>
>
> On Fri, May 22, 2020 at 5:14 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 5/22/20 8:05 AM, Nico De Ranter wrote:
>      >
>
>      >
>      >     Assuming the above matches:
>      >
>      >     COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
>      >     deltaseq, markid, lstat, md5)
>      >
>      >     the '????????????????????' would be for the md5 field. I'm
>     going to say
>      >     that is important.
>      >
>      >
>      > But that would be content of the database only. The should matter
>     for
>      > the application but not for a dump of the database, right?
>
>     Also what does:
>
>     \d public.file
>
>     show?
>
>     In particular are there any triggers on the table?
>
>
> bacula=# \d public.file
>                                   Table "public.file"
>     Column   |   Type   | Collation | Nullable |               Default
> ------------+----------+-----------+----------+--------------------------------------
>   fileid     | bigint   |           | not null |
> nextval('file_fileid_seq'::regclass)
>   fileindex  | integer  |           | not null | 0
>   jobid      | integer  |           | not null |
>   pathid     | integer  |           | not null |
>   filenameid | integer  |           | not null |
>   deltaseq   | smallint |           | not null | 0
>   markid     | integer  |           | not null | 0
>   lstat      | text     |           | not null |
>   md5        | text     |           | not null |
> Indexes:
>      "file_pkey" PRIMARY KEY, btree (fileid)
>      "file_jobid_idx" btree (jobid)
>      "file_jpfid_idx" btree (jobid, pathid, filenameid)
>
>
>
> Following up on the max(bigint), I tried
>
>     SELECT md5 FROM public.file where fileid >2087994666;
>
> and got
>
>     ERROR:  compressed data is corrupted
>
> So it does look like those entries are killing it.  Now for the
> million-dollar question: how do I get them out?

Do you have recent previous backup?

>
> Nico
>
> --
>
> Nico De Ranter
>
> Operations Engineer
>



--
Adrian Klaver
adrian.klaver@aklaver.com




Re: pg_dump crashes

From:
Adrian Klaver
Date:
On 5/24/20 10:30 PM, Nico De Ranter wrote:
> Unfortunately not. I discovered the issue rather late. The last working 
> backup is about 2 months old.

Well first it is entirely possible this is not the only corruption in 
the database.

Second you are probably going to have to reach out to the Bacula folks:

https://www.bacula.org/support/
https://sourceforge.net/projects/bacula/lists/bacula-users

I would say the questions to ask them are:

1) Is the md5 required for a file?

2) If so, how and over what is it calculated?


Then you could experiment with trying to update the md5 field with new 
data.
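
A sketch of what such an experiment could look like, assuming a blank placeholder is acceptable to Bacula and that the damage is confined to md5 in the suspect range (note the follow-up below: blanking md5 across the whole table did not make pg_dump succeed here, which suggests the damage was not limited to that column):

BEGIN;

-- overwrite the suspect signatures with a placeholder (whether Bacula
-- tolerates this is a question for the Bacula list)
UPDATE public.file
   SET md5 = ''
 WHERE fileid > 2087994666;

-- re-run the statement pg_dump was executing, discarding the output
-- (server-side COPY TO a file requires superuser)
COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
                  deltaseq, markid, lstat, md5) TO '/dev/null';

ROLLBACK;   -- or COMMIT once the COPY gets through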

>      >
>      >
>      >
>      > Following up on the max(bigint), I tried
>      >
>      >     SELECT md5 FROM public.file where fileid >2087994666;
>      >
>      > and got
>      >
>      >     ERROR:  compressed data is corrupted
>      >
>      > So it does look like those entries are killing it.  Now for the
>      > million-dollar question: how do I get them out?
> 
>     Do you have recent previous backup?
> 
>      >
>      > Nico
>      >
>      > --
>      >
>      > Nico De Ranter
>      >
>      > Operations Engineer
>      >
> 
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 
> 
> 
> -- 
> 
> Nico De Ranter
> 
> Operations Engineer
> 



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: pg_dump crashes

From:
Nico De Ranter
Date:
FYI:  I tried setting the md5 field to '' in the whole table but that didn't fix the pg_dump issue.  In the end I decided to drop the database and revert to my last successful backup. I'm now still reading in all tapes to reconstruct the latest database state.

Thanks for the help anyway.

Nico

On Mon, May 25, 2020 at 4:20 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/24/20 10:30 PM, Nico De Ranter wrote:
> Unfortunately not. I discovered the issue rather late. The last working
> backup is about 2 months old.

Well first it is entirely possible this is not the only corruption in
the database.

Second you are probably going to have to reach out to the Bacula folks:

https://www.bacula.org/support/
https://sourceforge.net/projects/bacula/lists/bacula-users

I would say the questions to ask them are:

1) Is the md5 required for a file?

2) If so how and over what is calculated?


Then you could experiment with trying to update the md5 field with new
data.

>      >
>      >
>      >
>      > Following up on the max(bigint), I tried
>      >
>      >     SELECT md5 FROM public.file where fileid >2087994666;
>      >
>      > and got
>      >
>      >     ERROR:  compressed data is corrupted
>      >
>      > So it does look like those entries are killing it.  Now for the
>      > million-dollar question: how do I get them out?
>
>     Do you have recent previous backup?
>
>      >
>      > Nico
>      >
>      > --
>      >
>      > Nico De Ranter
>      >
>      > Operations Engineer
>      >
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>
>
> --
>
> Nico De Ranter
>
> Operations Engineer
>



--
Adrian Klaver
adrian.klaver@aklaver.com

