Discussion: "invalid memory alloc request size" + "Could not open file "pg_clog/XXXX"

"invalid memory alloc request size" + "Could not open file "pg_clog/XXXX"

From
scheu_postgresql
Date:
Hi,

On my PostgreSQL 8.4.0 server, some tables have been unavailable since this morning; see the examples below:

--> pg_dump MY_DB > bkp_MY_DB.dmp
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc request size 18446744073709551613
pg_dump: The command was: COPY <schema>.<unavailable_table> (col1, col2, ...).

--> vacuum analyze <schema>.<unavailable_table> ;
            WARNING:  terminating connection because of crash of another server process
            DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
            HINT:  In a moment you should be able to reconnect to the database and repeat your command.

--> select * from <schema>.<unavailable_table> ;
ERROR:  invalid memory alloc request size 18446744073709551613

--> server log file
Feb 29 05:31:44 my_server postgres[6686]: [17-1] user=,db= LOG:  server process (PID 3887) was terminated by signal 11: Segmentation fault
Feb 29 05:31:44 my_server postgres[6686]: [18-1] user=,db= LOG:  terminating any other active server processes
Feb 29 05:31:44 my_server postgres[6686]: [19-1] user=,db= LOG:  all server processes terminated; reinitializing
Feb 29 05:31:44 my_server postgres[3892]: [20-1] user=,db= LOG:  database system was interrupted; last known up at 2012-02-29 05:22:33 CET
Feb 29 05:31:44 my_server postgres[3892]: [21-1] user=,db= LOG:  database system was not properly shut down; automatic recovery in progress
Feb 29 05:31:44 my_server postgres[3892]: [22-1] user=,db= LOG:  redo starts at 10/67C2A3B8
Feb 29 05:31:45 my_server postgres[3892]: [23-1] user=,db= LOG:  record with zero length at 10/68BCF990
Feb 29 05:31:45 my_server postgres[3892]: [24-1] user=,db= LOG:  redo done at 10/68BCF960
Feb 29 05:31:45 my_server postgres[3892]: [25-1] user=,db= LOG:  last completed transaction was at log time 2012-02-29 05:31:42.618352+01
Feb 29 05:31:45 my_server postgres[6686]: [20-1] user=,db= LOG:  database system is ready to accept connections
Feb 29 05:32:52 my_server postgres[4469]: [21-1] user=[unknown],db=[unknown] LOG:  incomplete startup packet
Feb 29 05:33:52 my_server postgres[6686]: [21-1] user=,db= LOG:  server process (PID 5151) was terminated by signal 11: Segmentation fault
Feb 29 05:33:52 my_server postgres[6686]: [22-1] user=,db= LOG:  terminating any other active server processes
Feb 29 05:33:52 my_server postgres[6686]: [23-1] user=,db= LOG:  all server processes terminated; reinitializing
Feb 29 05:33:52 my_server postgres[5152]: [24-1] user=,db= LOG:  database system was interrupted; last known up at 2012-02-29 05:31:45 CET
Feb 29 05:33:52 my_server postgres[5152]: [25-1] user=,db= LOG:  database system was not properly shut down; automatic recovery in progress
Feb 29 05:33:52 my_server postgres[5152]: [26-1] user=,db= LOG:  record with zero length at 10/68BCF9D8
Feb 29 05:33:52 my_server postgres[5152]: [27-1] user=,db= LOG:  redo is not required
Feb 29 05:33:52 my_server postgres[5153]: [24-1] user=match,db=MY_DB FATAL:  the database system is in recovery mode
Feb 29 05:33:52 my_server postgres[6686]: [24-1] user=,db= LOG:  database system is ready to accept connections
Feb 29 05:37:19 my_server postgres[6686]: [25-1] user=,db= LOG:  server process (PID 8065) was terminated by signal 11: Segmentation fault
Feb 29 05:37:19 my_server postgres[6686]: [26-1] user=,db= LOG:  terminating any other active server processes
Feb 29 05:37:19 my_server postgres[6686]: [27-1] user=,db= LOG:  all server processes terminated; reinitializing
Feb 29 05:37:19 my_server postgres[8066]: [28-1] user=,db= LOG:  database system was interrupted; last known up at 2012-02-29 05:33:52 CET
Feb 29 05:37:19 my_server postgres[8066]: [29-1] user=,db= LOG:  database system was not properly shut down; automatic recovery in progress
Feb 29 05:37:19 my_server postgres[8066]: [30-1] user=,db= LOG:  redo starts at 10/68BCFA20
Feb 29 05:37:19 my_server postgres[8066]: [31-1] user=,db= LOG:  record with zero length at 10/68BD5BD0
Feb 29 05:37:19 my_server postgres[8066]: [32-1] user=,db= LOG:  redo done at 10/68BD5BA0
Feb 29 05:37:19 my_server postgres[8066]: [33-1] user=,db= LOG:  last completed transaction was at log time 2012-02-29 05:35:44.468968+01
Feb 29 05:37:19 my_server postgres[6686]: [28-1] user=,db= LOG:  database system is ready to accept connections
Feb 29 05:38:27 my_server postgres[8639]: [29-1] user=[unknown],db=[unknown] LOG:  incomplete startup packet
Feb 29 05:38:53 my_server postgres[6686]: [29-1] user=,db= LOG:  server process (PID 8809) was terminated by signal 11: Segmentation fault


I have tried restarting PostgreSQL, but that did not solve these issues.
I cannot back up the full database because some tables have become unreadable.
There are 7 databases on this server, and only 2 of them have this problem.

What could be the cause of the problem ?
Is there a way to fix it without losing data and without dropping and recreating the db with my nightly pg_dump backup ?

Thank you in advance for your help.

Re: "invalid memory alloc request size" + "Could not open file "pg_clog/XXXX"

From
"Albe Laurenz"
Date:
scheu_postgresql wrote:
> What could be the cause of the problem ?

If a sequential scan fails, I would say that the table is corrupted.
The cause could be faulty hardware, a corrupted file system or a
software bug.
I notice that you are running 8.4.0, which is a really bad idea:
a number of data corruption bugs have been fixed since that release.
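
A side note on the number in the error message, as a small Python check: 18446744073709551613 is what -3 looks like when it is read as an unsigned 64-bit integer. That would fit the corruption diagnosis, assuming the server picked up a length from a damaged row and used it as an allocation size; the exact code path is not shown in this thread, so treat this as a plausible reading only. The function name as_signed_64 is made up for this illustration.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed (two's complement) one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


requested = 18446744073709551613  # size reported in the error message
print(as_signed_64(requested))
```

A small negative number showing up as a "size" is not something a healthy table produces, which is why the hardware and file system checks come next.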

Check the hardware and the file systems.

> Is there a way to fix it without losing data and without dropping and
> recreating the db with my nightly pg_dump backup ?

Without losing data? Not unless you can poke around in the guts of
the corrupted blocks and make sense of what you find there...
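
Short of reading raw blocks by hand, one fallback is to accept some loss, save whatever still reads, and isolate the part that does not. Below is a minimal Python sketch of narrowing down the damaged items by bisection, under heavy assumptions: can_read is a hypothetical callback (not a PostgreSQL API) that you would have to implement yourself, for instance by running a query restricted to a range of rows and reporting whether it succeeded. Since a failed read on this server crashes the backend, every probe would also have to reconnect. The sketch only locates damage; it repairs nothing.

```python
from typing import Callable, List


def find_unreadable(lo: int, hi: int,
                    can_read: Callable[[int, int], bool]) -> List[int]:
    """Return the item numbers in [lo, hi) that cannot be read.

    can_read(a, b) is a hypothetical probe that must return True
    when every item in [a, b) reads cleanly.
    """
    if lo >= hi or can_read(lo, hi):
        return []          # empty range, or the whole range is fine
    if hi - lo == 1:
        return [lo]        # narrowed down to a single bad item
    mid = (lo + hi) // 2
    return (find_unreadable(lo, mid, can_read)
            + find_unreadable(mid, hi, can_read))


# Stand-in for a real probe: pretend items 3 and 17 are damaged.
bad = {3, 17}


def demo_probe(a: int, b: int) -> bool:
    return not any(a <= x < b for x in bad)


print(find_unreadable(0, 32, demo_probe))  # prints [3, 17]
```

Everything outside the reported items could then be copied to a new table, and the rest restored from the last good dump.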

If your requirement is "no data loss", you'll have to use a different
backup strategy.

Yours,
Laurenz Albe